Atlassian System Design: The Backbone of Software Teams
Atlassian is not a single product — it is a platform ecosystem powering software development workflows for millions of teams globally. Its flagship products — Jira, Confluence, Bitbucket, and Jira Service Management — all share a common cloud infrastructure, identity system, and data platform.
1. Requirements
Functional
- Jira: Create, assign, track, and query issues with complex workflows, custom fields, and boards (Scrum/Kanban).
- Confluence: Rich-text collaborative document editing with real-time co-authoring, spaces, and pages.
- Bitbucket: Git repository hosting with Pull Requests, code reviews, and CI/CD Pipelines.
- Identity (Atlassian Account): Single Sign-On (SSO) and OAuth 2.0 across all products.
- Marketplace: Third-party app installation and execution within the platform (Forge/Connect).
- Search: Unified full-text search across issues, pages, and repositories.
- Notifications: Real-time in-app, email, and webhook-based notifications.
Non-Functional
- Multi-Tenancy: Thousands of isolated customer organizations (tenants) on shared infrastructure.
- High Availability: 99.9%+ uptime SLAs for enterprise customers.
- Scalability: Handle teams from 5 to 500,000 users per organization.
- Data Residency: Store customer data in specific geographic regions (EU, US, AU).
- Security & Compliance: SOC 2 Type II, ISO 27001, GDPR compliance.
2. High-Level Architecture
Atlassian Cloud is built on a multi-tenant microservices platform running on AWS. All products share a common infrastructure backbone but operate as independent services.
3. Multi-Tenancy Architecture
This is the most critical design challenge at Atlassian. Thousands of companies (tenants) share the same infrastructure without data leaking between them.
Tenant Isolation Strategy
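Atlassian uses a Silo Model for data isolation with shared compute: each tenant's data lives in its own database (or shard), while the application servers handling requests are shared across tenants.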
Key mechanisms:
- Every API request carries a JWT containing the tenantId (cloud site ID).
- The Tenant Router maps tenantId → database shard/cluster.
- SQL queries are automatically scoped to the correct tenant database.
- Cross-tenant queries are architecturally impossible at the application layer.
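The routing step in the list above can be sketched roughly as follows. This is a minimal illustration, not Atlassian's implementation: `TenantRouter`, `ShardInfo`, and the cluster names are hypothetical, and the JWT is assumed to have been verified upstream so that only its claims reach the router.

```typescript
// Hypothetical sketch: resolve a tenant's database shard from verified JWT claims.
interface JwtClaims {
  tenantId: string; // cloud site ID carried in the request JWT
  sub: string; // user ID
}

interface ShardInfo {
  cluster: string;
  database: string;
}

class TenantRouter {
  private readonly routes = new Map<string, ShardInfo>();

  register(tenantId: string, shard: ShardInfo): void {
    this.routes.set(tenantId, shard);
  }

  // Fails closed: an unknown tenant never falls back to a default shard.
  resolve(claims: JwtClaims): ShardInfo {
    const shard = this.routes.get(claims.tenantId);
    if (!shard) {
      throw new Error(`No shard registered for tenant ${claims.tenantId}`);
    }
    return shard;
  }
}

const router = new TenantRouter();
router.register("tenant-acme-corp", { cluster: "pg-cluster-1", database: "acme" });
router.register("tenant-globex", { cluster: "pg-cluster-2", database: "globex" });

const shard = router.resolve({ tenantId: "tenant-acme-corp", sub: "user-alice" });
console.log(shard.cluster, shard.database);
```

Because the shard is derived only from the token's tenantId, application code never chooses a database by itself, which is what makes an accidental cross-tenant query hard to write.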
4. Jira Deep Dive: Issue Tracking at Scale
Data Model
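Jira Query Language (JQL) Execution
JQL (Jira Query Language) is a powerful SQL-like query language. Processing a query such as project = "MYAPP" AND status = "In Progress" AND assignee = currentUser() involves parsing the clauses, resolving functions such as currentUser() against the request context, and running the resulting filter against the tenant's data.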
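As a rough illustration of JQL processing, the sketch below parses a tiny subset of the language (only `field = value` clauses joined by AND) and evaluates it against in-memory rows. The names `parseJql`, `runQuery`, and `QueryContext` are hypothetical, and real JQL supports far more operators, ordering, and functions than shown here.

```typescript
// Hypothetical sketch: parse a tiny JQL subset and filter in-memory issues.
// Limitation: values containing the word AND are not handled.
interface QueryContext {
  currentUserId: string;
}

interface Clause {
  field: string;
  value: string;
}

type IssueRow = Record<string, string>;

function parseJql(jql: string, ctx: QueryContext): Clause[] {
  return jql.split(/\s+AND\s+/i).map((part) => {
    const match = part.trim().match(/^(\w+)\s*=\s*(.+)$/);
    if (!match) throw new Error(`Unsupported clause: ${part}`);
    const field = match[1];
    const raw = match[2].trim();
    // Resolve currentUser() from the request context; otherwise strip quotes.
    const value =
      raw === "currentUser()" ? ctx.currentUserId : raw.replace(/^"(.*)"$/, "$1");
    return { field, value };
  });
}

function runQuery(rows: IssueRow[], clauses: Clause[]): IssueRow[] {
  return rows.filter((row) => clauses.every((c) => row[c.field] === c.value));
}

const rows: IssueRow[] = [
  { key: "MYAPP-1", project: "MYAPP", status: "In Progress", assignee: "user-alice" },
  { key: "MYAPP-2", project: "MYAPP", status: "Done", assignee: "user-alice" },
  { key: "OTHER-1", project: "OTHER", status: "In Progress", assignee: "user-alice" },
];

const clauses = parseJql(
  'project = "MYAPP" AND status = "In Progress" AND assignee = currentUser()',
  { currentUserId: "user-alice" }
);
const matches = runQuery(rows, clauses);
console.log(matches.map((row) => row.key)); // only MYAPP-1 satisfies all three clauses
```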
5. Confluence Deep Dive: Real-Time Collaborative Editing
Confluence pages support multiple users editing the same document simultaneously using Operational Transformation (OT) — the same technique used by Google Docs.
Collaborative Editing Flow
Key component: Synchrony. Atlassian's Synchrony service is a dedicated WebSocket server responsible for managing collaborative sessions (in Confluence Data Center it runs as a separate JVM process alongside Confluence). It maintains an in-memory representation of each active document and resolves conflicts using OT before persisting.
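To make the idea of OT concrete, here is a minimal sketch of transforming two concurrent insert operations so that both replicas converge on the same text. It is a textbook-style illustration under simplifying assumptions (inserts only, two clients, ties broken by a hypothetical `clientId`), not Synchrony's actual algorithm.

```typescript
// Minimal OT sketch for concurrent text inserts (deletes omitted for brevity).
interface InsertOp {
  pos: number; // index at which the text is inserted
  text: string;
  clientId: string; // used only to break ties deterministically
}

function applyOp(doc: string, op: InsertOp): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Rewrites `op` so it can be applied after `other` has already been applied.
function transform(op: InsertOp, other: InsertOp): InsertOp {
  const otherGoesFirst =
    other.pos < op.pos || (other.pos === op.pos && other.clientId < op.clientId);
  return otherGoesFirst ? { ...op, pos: op.pos + other.text.length } : op;
}

const base = "Hello world";
const opA: InsertOp = { pos: 5, text: ",", clientId: "alice" };
const opB: InsertOp = { pos: 11, text: "!", clientId: "bob" };

// Alice's replica applies her own op, then Bob's op transformed against it.
const viaAlice = applyOp(applyOp(base, opA), transform(opB, opA));
// Bob's replica applies his own op, then Alice's op transformed against it.
const viaBob = applyOp(applyOp(base, opB), transform(opA, opB));

console.log(viaAlice, viaAlice === viaBob);
```

Both replicas end with the same string even though they applied the operations in different orders. A central server that fixes one canonical order keeps this bookkeeping tractable once more than two clients are involved.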
6. Bitbucket Deep Dive: Git Hosting
Bitbucket's architecture for storing Git repositories mirrors problems solved by GitHub, but with Atlassian's pipeline integration as a first-class feature.
7. Atlassian Search: Unified Full-Text Search
Search spans Jira issues, Confluence pages, and Bitbucket code — a massive indexing challenge.
- Index isolation: Each tenant has its own Elasticsearch index namespace (jira-{tenantId}, confluence-{tenantId}).
- Near-real-time: Updates propagate to search within seconds via Kafka consumers.
- ML re-ranking: Recent Atlassian Intelligence features use click-through signals to boost personalized results.
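The index-isolation and event-driven update points above can be sketched with an in-memory stand-in for Elasticsearch. Everything here is illustrative: `InMemorySearch`, `ContentEvent`, and the tenant IDs are hypothetical, and the `handle` method plays the role a Kafka consumer would play in a real deployment.

```typescript
// Hypothetical sketch: per-tenant index naming plus an event-driven indexer.
type Product = "jira" | "confluence";

interface ContentEvent {
  tenantId: string;
  product: Product;
  docId: string;
  text: string;
}

// Mirrors the jira-{tenantId} / confluence-{tenantId} naming scheme.
function indexName(product: Product, tenantId: string): string {
  return `${product}-${tenantId}`;
}

class InMemorySearch {
  private readonly indices = new Map<string, Map<string, string>>();

  // Called once per content event; writes only to the event's own tenant index.
  handle(event: ContentEvent): void {
    const name = indexName(event.product, event.tenantId);
    const index = this.indices.get(name) ?? new Map<string, string>();
    index.set(event.docId, event.text);
    this.indices.set(name, index);
  }

  // A query can only ever see the index of the tenant it is issued for.
  search(product: Product, tenantId: string, term: string): string[] {
    const index = this.indices.get(indexName(product, tenantId));
    if (!index) return [];
    const needle = term.toLowerCase();
    const hits: string[] = [];
    index.forEach((text, docId) => {
      if (text.toLowerCase().includes(needle)) hits.push(docId);
    });
    return hits;
  }
}

const search = new InMemorySearch();
search.handle({ tenantId: "acme", product: "jira", docId: "AUTH-1", text: "Implement OAuth login" });
search.handle({ tenantId: "globex", product: "jira", docId: "INFRA-1", text: "OAuth proxy for cluster" });

const acmeHits = search.search("jira", "acme", "oauth");
console.log(acmeHits);
```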
8. Forge: Atlassian's App Platform (FaaS)
Atlassian Marketplace has 5,000+ apps. Forge is Atlassian's serverless, sandboxed runtime for third-party apps.
- Security: Each app invocation runs in a V8 Isolate — no file system access, no arbitrary network calls (only allowlisted domains).
- Tenancy: Forge Storage is automatically scoped to the current tenant — apps cannot access another tenant's data.
- Cold start mitigation: Forge pre-warms isolates for high-traffic apps, similar to Lambda's provisioned concurrency.
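The network restriction described in the list above amounts to checking every outbound request against an allowlist before it leaves the sandbox. The sketch below shows one plausible form of that check; `isEgressAllowed` and its wildcard convention are assumptions for illustration and do not reproduce Forge's real enforcement code.

```typescript
// Hypothetical sketch: allowlist check applied to an app's outbound request.
function isEgressAllowed(target: string, allowlist: string[]): boolean {
  const host = new URL(target).hostname.toLowerCase();
  return allowlist.some((entry) => {
    const pattern = entry.toLowerCase();
    if (pattern.startsWith("*.")) {
      // "*.example.com" matches "api.example.com" but not "example.com" itself.
      return host.endsWith(pattern.slice(1));
    }
    return host === pattern;
  });
}

const allowlist = ["*.example.com", "hooks.partner.test"];

console.log(isEgressAllowed("https://api.example.com/v1/items", allowlist));
console.log(isEgressAllowed("https://evil.test/exfiltrate", allowlist));
```

Matching on the parsed hostname rather than on the raw URL string avoids being fooled by lookalike paths or query strings that merely contain an allowed domain.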
9. Implementation Example: Jira Issue Service
This TypeScript example demonstrates a simplified Jira issue service with tenant isolation, caching, and event publishing.
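// In production this service would use a Kafka client (e.g. kafkajs) and Redis;
// both are simulated in memory below, so no third-party imports are needed.
import * as crypto from "node:crypto"; // provides crypto.randomUUID()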
// ─── Types ────────────────────────────────────────────────────────────────────
interface TenantContext {
tenantId: string;
userId: string;
}
interface Issue {
id: string;
key: string; // e.g. "PROJ-123"
summary: string;
status: "Todo" | "In Progress" | "Done";
assigneeId: string | null;
projectId: string;
tenantId: string;
createdAt: Date;
updatedAt: Date;
}
interface CreateIssuePayload {
summary: string;
projectId: string;
assigneeId?: string;
}
// ─── Infrastructure (Simulated) ───────────────────────────────────────────────
// Simulates per-tenant database shards (in production: routed via tenant ID)
const tenantDatabases = new Map<string, Map<string, Issue>>();
function getTenantDB(tenantId: string): Map<string, Issue> {
if (!tenantDatabases.has(tenantId)) {
tenantDatabases.set(tenantId, new Map());
}
return tenantDatabases.get(tenantId)!;
}
// ─── Issue Service ────────────────────────────────────────────────────────────
class JiraIssueService {
private cache: Map<string, Issue> = new Map(); // Simulates Redis cache
private issueCounters: Map<string, Map<string, number>> = new Map(); // tenant → project → counter
/**
* Generates a Jira-style issue key like "PROJ-42".
* Each project has an auto-incrementing counter per tenant. This sketch keeps it
* in memory; in production it would live in the tenant's database.
*/
private getNextIssueKey(tenantId: string, projectId: string): string {
if (!this.issueCounters.has(tenantId)) {
this.issueCounters.set(tenantId, new Map());
}
const projectCounters = this.issueCounters.get(tenantId)!;
const current = projectCounters.get(projectId) ?? 0;
const next = current + 1;
projectCounters.set(projectId, next);
return `${projectId.toUpperCase()}-${next}`;
}
/**
* Creates a new issue — scoped to the tenant's database shard.
*/
async createIssue(
ctx: TenantContext,
payload: CreateIssuePayload
): Promise<Issue> {
const { tenantId, userId } = ctx;
const db = getTenantDB(tenantId); // Route to correct tenant shard
const issueKey = this.getNextIssueKey(tenantId, payload.projectId);
const issue: Issue = {
id: `issue-${crypto.randomUUID()}`,
key: issueKey,
summary: payload.summary,
status: "Todo",
assigneeId: payload.assigneeId ?? null,
projectId: payload.projectId,
tenantId, // Always stored with tenantId for audit/compliance
createdAt: new Date(),
updatedAt: new Date(),
};
// 1. Persist to tenant-scoped DB
db.set(issue.id, issue);
// 2. Publish event for Search indexing & Notifications
await this.publishEvent("issue.created", { issue, actorId: userId });
console.log(
`✅ [${tenantId}] Created issue: ${issue.key} — "${issue.summary}"`
);
return issue;
}
/**
* Fetches an issue with a Redis-style cache.
* Cache key includes tenantId to prevent cross-tenant cache poisoning.
*/
async getIssue(ctx: TenantContext, issueId: string): Promise<Issue | null> {
const cacheKey = `${ctx.tenantId}:issue:${issueId}`; // Namespace by tenant!
// 1. Check cache
if (this.cache.has(cacheKey)) {
console.log(`⚡ [Cache HIT] ${cacheKey}`);
return this.cache.get(cacheKey)!;
}
// 2. Fetch from tenant DB (tenant routing enforced)
const db = getTenantDB(ctx.tenantId);
const issue = db.get(issueId) ?? null;
if (issue) {
this.cache.set(cacheKey, issue); // Populate cache (TTL would apply in production)
console.log(`🗄️ [Cache MISS → DB] ${cacheKey}`);
}
return issue;
}
/**
* Transition an issue's status through a workflow.
* Jira validates allowed transitions before applying.
*/
async transitionIssue(
ctx: TenantContext,
issueId: string,
newStatus: Issue["status"]
): Promise<Issue | null> {
const issue = await this.getIssue(ctx, issueId);
if (!issue) return null;
const allowedTransitions: Record<Issue["status"], Issue["status"][]> = {
Todo: ["In Progress"],
"In Progress": ["Done", "Todo"],
Done: [],
};
if (!allowedTransitions[issue.status].includes(newStatus)) {
throw new Error(
`Invalid transition: ${issue.status} → ${newStatus} for issue ${issue.key}`
);
}
const previousStatus = issue.status; // Capture before mutating the issue
issue.status = newStatus;
issue.updatedAt = new Date();
// Invalidate cache on write
const cacheKey = `${ctx.tenantId}:issue:${issueId}`;
this.cache.delete(cacheKey);
// Publish event for notifications ("@alice, PROJ-5 moved to Done")
await this.publishEvent("issue.transitioned", {
issue,
previousStatus,
newStatus,
actorId: ctx.userId,
});
console.log(
`🔄 [${ctx.tenantId}] ${issue.key}: ${previousStatus} → ${newStatus}`
);
return issue;
}
/**
* Simulates publishing to Kafka event bus.
* In production: Kafka Producer sends to 'jira.events' topic.
* Consumers: Search Indexer, Notification Service, Analytics
*/
private async publishEvent(
eventType: string,
payload: object
): Promise<void> {
// Simulated Kafka publish
console.log(
`📨 [Kafka] Event: ${eventType}`,
JSON.stringify(payload, null, 2)
);
}
}
// ─── Demo ─────────────────────────────────────────────────────────────────────
async function runDemo() {
const service = new JiraIssueService();
// Tenant A context
const ctxA: TenantContext = {
tenantId: "tenant-acme-corp",
userId: "user-alice",
};
// Tenant B context (completely isolated)
const ctxB: TenantContext = { tenantId: "tenant-globex", userId: "user-bob" };
console.log("\n=== TENANT A: ACME Corp ===");
const issue1 = await service.createIssue(ctxA, {
summary: "Implement OAuth 2.0 login flow",
projectId: "AUTH",
assigneeId: "user-alice",
});
const issue2 = await service.createIssue(ctxA, {
summary: "Fix null pointer in payment service",
projectId: "PAYMENT",
});
console.log("\n--- Fetching issue (will miss cache first time) ---");
await service.getIssue(ctxA, issue1.id);
console.log("--- Fetching again (should HIT cache) ---");
await service.getIssue(ctxA, issue1.id);
console.log("\n--- Transitioning issue status ---");
await service.transitionIssue(ctxA, issue1.id, "In Progress");
console.log("\n=== TENANT B: Globex (Isolated) ===");
const issue3 = await service.createIssue(ctxB, {
summary: "Set up Kubernetes cluster",
projectId: "INFRA",
});
// Prove isolation: Tenant B CANNOT access Tenant A's issues
console.log("\n--- Proving tenant isolation ---");
const crossTenantAttempt = await service.getIssue(ctxB, issue1.id);
console.log(
`Tenant B trying to access Tenant A's issue: ${crossTenantAttempt === null ? "✅ Blocked (null)" : "❌ BREACH!"}`
);
}
runDemo().catch(console.error);
Sample Output (the JSON event payloads logged by publishEvent are omitted):
=== TENANT A: ACME Corp ===
✅ [tenant-acme-corp] Created issue: AUTH-1 — "Implement OAuth 2.0 login flow"
✅ [tenant-acme-corp] Created issue: PAYMENT-1 — "Fix null pointer in payment service"
--- Fetching issue (will miss cache first time) ---
🗄️ [Cache MISS → DB] tenant-acme-corp:issue:issue-xxx
--- Fetching again (should HIT cache) ---
⚡ [Cache HIT] tenant-acme-corp:issue:issue-xxx
--- Transitioning issue status ---
🔄 [tenant-acme-corp] AUTH-1: Todo → In Progress
=== TENANT B: Globex (Isolated) ===
✅ [tenant-globex] Created issue: INFRA-1 — "Set up Kubernetes cluster"
--- Proving tenant isolation ---
Tenant B trying to access Tenant A's issue: ✅ Blocked (null)
10. Notification System
Atlassian's notification system must deliver millions of emails and in-app alerts with configurable user preferences.
Digest Batching: To prevent flooding users who are @mentioned 50 times in an active PR discussion, Atlassian batches notifications within a 5-minute window before sending a single digest email.
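A simple way to picture digest batching is to bucket notifications per user into fixed 5-minute windows and emit one digest per bucket. The sketch below does exactly that; `buildDigests` and its types are hypothetical, and it uses fixed (tumbling) windows as a simplifying assumption, whereas a production system might instead start the window at the first unsent notification.

```typescript
// Hypothetical sketch: collapse notifications into per-user 5-minute digests.
interface UserNotification {
  userId: string;
  message: string;
  timestampMs: number;
}

interface Digest {
  userId: string;
  windowStartMs: number;
  messages: string[];
}

const WINDOW_MS = 5 * 60 * 1000;

function buildDigests(notifications: UserNotification[]): Digest[] {
  const digests = new Map<string, Digest>();
  for (const n of notifications) {
    // Round the timestamp down to the start of its 5-minute window.
    const windowStartMs = Math.floor(n.timestampMs / WINDOW_MS) * WINDOW_MS;
    const key = `${n.userId}:${windowStartMs}`;
    const digest = digests.get(key) ?? { userId: n.userId, windowStartMs, messages: [] };
    digest.messages.push(n.message);
    digests.set(key, digest);
  }
  return Array.from(digests.values());
}

const digests = buildDigests([
  { userId: "alice", message: "bob mentioned you in PR #1", timestampMs: 0 },
  { userId: "alice", message: "carol mentioned you in PR #1", timestampMs: 60000 },
  { userId: "alice", message: "dave mentioned you in PR #1", timestampMs: 299000 },
  { userId: "alice", message: "erin mentioned you in PR #1", timestampMs: 300000 },
]);

for (const d of digests) {
  console.log(`${d.userId} @ ${d.windowStartMs}: ${d.messages.length} notification(s)`);
}
```

The first three mentions fall into the same window and become one email; the fourth starts a new window.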
11. Data Residency Architecture
Enterprise customers (e.g., German banks, Australian government) require data to stay within specific geographies.
The Global Control Plane stores only the routing metadata (which region a tenant belongs to). All content data — issues, pages, comments, attachments — never leaves the assigned region.
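The split between routing metadata and regional content can be sketched as a lookup followed by a regional dispatch. The class name `GlobalControlPlane` follows the text, but its methods, the `REGIONAL_ENDPOINTS` table, and the hostnames are placeholders invented for illustration.

```typescript
// Hypothetical sketch: the control plane knows only tenant → region.
type Region = "eu" | "us" | "au";

// Placeholder endpoints; real regional hostnames are not given in the text.
const REGIONAL_ENDPOINTS: Record<Region, string> = {
  eu: "https://eu.example.internal",
  us: "https://us.example.internal",
  au: "https://au.example.internal",
};

class GlobalControlPlane {
  private readonly tenantRegions = new Map<string, Region>();

  pin(tenantId: string, region: Region): void {
    this.tenantRegions.set(tenantId, region);
  }

  regionFor(tenantId: string): Region {
    const region = this.tenantRegions.get(tenantId);
    if (!region) throw new Error(`Tenant ${tenantId} has no assigned region`);
    return region;
  }
}

// Builds the regional URL for a request; content is only ever read or written there.
function routeRequest(cp: GlobalControlPlane, tenantId: string, path: string): string {
  return `${REGIONAL_ENDPOINTS[cp.regionFor(tenantId)]}${path}`;
}

const controlPlane = new GlobalControlPlane();
controlPlane.pin("tenant-acme-corp", "eu");
controlPlane.pin("tenant-globex", "au");

console.log(routeRequest(controlPlane, "tenant-acme-corp", "/issues/AUTH-1"));
```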
12. Summary: Atlassian Architecture Trade-offs
| Component | Choice | Rationale |
|---|---|---|
| Multi-Tenancy | Siloed databases + shared compute | Prevents data leakage while keeping infrastructure costs manageable |
| Collaborative Editing | Operational Transformation (OT) | Well-understood algorithm; handles concurrent edits without requiring distributed locks |
| Search | Elasticsearch with per-tenant indices | Tenant isolation at index level; flexible full-text scoring |
| Event Bus | Apache Kafka | Decouples services; enables replay for Search re-indexing and Analytics |
| App Platform | V8 Isolates (Forge) | Sandboxed execution without overhead of full containers; ~ms cold starts |
| Data Residency | Regional AWS deployments | Satisfies enterprise compliance without rebuilding application logic |
| Caching | Redis with tenant-namespaced keys | Prevents cross-tenant cache poisoning; fast JQL result reuse |
| Git Storage | Custom layer over S3 | Object storage cost savings for large repos; S3 durability (11 nines) |
13. Key Lessons from Atlassian's Architecture
Multi-tenancy is not an afterthought: Atlassian baked tenant isolation into every layer. JWT tokens, database routing, cache keys, Elasticsearch index names, and Forge storage APIs all carry tenantId.
Events drive decoupling: By publishing all mutations to Kafka, Atlassian's Search, Notifications, and Analytics systems can evolve independently without tight coupling to core services.
Compliance shapes architecture: Data residency requirements forced Atlassian to build regional deployments years before many competitors — a major engineering investment that became a competitive advantage.
Extensibility is a product: The Forge platform treats security (V8 isolates, network allowlists) and tenancy (automatic data scoping) as platform responsibilities, not app developer responsibilities — reducing the blast radius of bad marketplace apps.
Operational Transformation is hard: Confluence's real-time collaboration was one of the hardest engineering challenges. OT requires a central server (Synchrony) to serialize operations — this is why purely P2P collaborative editors often use CRDTs instead.
