
Atlassian System Design: The Backbone of Software Teams

Atlassian is not a single product — it is a platform ecosystem powering software development workflows for millions of teams globally. Its flagship products — Jira, Confluence, Bitbucket, and Jira Service Management — all share a common cloud infrastructure, identity system, and data platform.


1. Requirements

Functional

  • Jira: Create, assign, track, and query issues with complex workflows, custom fields, and boards (Scrum/Kanban).
  • Confluence: Rich-text collaborative document editing with real-time co-authoring, spaces, and pages.
  • Bitbucket: Git repository hosting with Pull Requests, code reviews, and CI/CD Pipelines.
  • Identity (Atlassian Account): Single Sign-On (SSO) and OAuth 2.0 across all products.
  • Marketplace: Third-party app installation and execution within the platform (Forge/Connect).
  • Search: Unified full-text search across issues, pages, and repositories.
  • Notifications: Real-time in-app, email, and webhook-based notifications.

Non-Functional

  • Multi-Tenancy: Thousands of isolated customer organizations (tenants) on shared infrastructure.
  • High Availability: 99.9%+ uptime SLAs for enterprise customers.
  • Scalability: Handle teams from 5 to 500,000 users per organization.
  • Data Residency: Store customer data in specific geographic regions (EU, US, AU).
  • Security & Compliance: SOC 2 Type II, ISO 27001, GDPR compliance.

2. High-Level Architecture

Atlassian Cloud is built on a multi-tenant microservices platform running on AWS. All products share a common infrastructure backbone but operate as independent services.


3. Multi-Tenancy Architecture

Tenant isolation is the most critical design challenge in this architecture: thousands of companies (tenants) share the same infrastructure, and no data may leak between them.

Tenant Isolation Strategy

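Atlassian uses a Silo Model for data isolation with shared compute: each tenant's data lives in its own database or shard, while the application servers are shared across tenants.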

Key mechanisms:

  • Every API request carries a JWT containing the tenantId (cloud site ID).
  • The Tenant Router maps tenantId → database shard/cluster.
  • SQL queries are automatically scoped to the correct tenant database.
  • Cross-tenant queries are architecturally impossible at the application layer.
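The mechanisms above can be sketched in a few lines. This is a minimal illustration, not Atlassian's implementation: it assumes the JWT has already been verified, and the names `VerifiedClaims`, `ShardHandle`, `TenantRouter`, and the shard IDs are hypothetical.

```typescript
// Hypothetical claim shape: only tenantId comes from the article.
interface VerifiedClaims {
  tenantId: string; // cloud site ID carried in the JWT
  sub: string; // user ID (assumed claim name)
}

interface ShardHandle {
  shardId: string;
  // Simulated per-tenant tables: table name -> rows
  tables: Map<string, Record<string, unknown>[]>;
}

class TenantRouter {
  private shards = new Map<string, ShardHandle>();

  register(tenantId: string, shardId: string): void {
    this.shards.set(tenantId, { shardId, tables: new Map() });
  }

  // Resolve the shard from verified claims; unknown tenants fail closed.
  resolve(claims: VerifiedClaims): ShardHandle {
    const shard = this.shards.get(claims.tenantId);
    if (!shard) {
      throw new Error(`No shard registered for tenant ${claims.tenantId}`);
    }
    return shard;
  }
}

// Data access receives a ShardHandle, never a raw tenantId, so a query
// cannot name another tenant's database.
function insertRow(
  shard: ShardHandle,
  table: string,
  row: Record<string, unknown>
): void {
  const rows = shard.tables.get(table) ?? [];
  rows.push(row);
  shard.tables.set(table, rows);
}

function selectAll(shard: ShardHandle, table: string): Record<string, unknown>[] {
  return shard.tables.get(table) ?? [];
}

const router = new TenantRouter();
router.register("tenant-acme-corp", "shard-eu-01");
router.register("tenant-globex", "shard-us-07");

const acme = router.resolve({ tenantId: "tenant-acme-corp", sub: "user-alice" });
insertRow(acme, "issues", { key: "AUTH-1" });

const globex = router.resolve({ tenantId: "tenant-globex", sub: "user-bob" });
console.log(selectAll(acme, "issues").length); // 1
console.log(selectAll(globex, "issues").length); // 0
```

Passing a resolved handle instead of a tenant ID is one way to make cross-tenant access a type-level impossibility rather than a runtime check.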

4. Jira Deep Dive: Issue Tracking at Scale

Data Model

Jira Query Language (JQL) Execution

JQL (Jira Query Language) is a powerful SQL-like query system. Processing project = "MYAPP" AND status = "In Progress" AND assignee = currentUser() involves:
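As a rough sketch of how such a query might be evaluated, the code below assumes a hypothetical three-step pipeline: parse the AND-joined clauses, resolve functions such as currentUser(), then filter issues that are already scoped to the tenant. The names `JqlIssue`, `parseJql`, and `runJql` are illustrative. Only `=` and `AND` are handled, and quoted values containing " AND " are not supported.

```typescript
interface JqlIssue {
  project: string;
  status: string;
  assignee: string | null;
}

interface JqlClause {
  field: keyof JqlIssue;
  value: string;
}

const JQL_FIELDS: ReadonlyArray<keyof JqlIssue> = ["project", "status", "assignee"];

// Step 1: parse "field = value AND field = value ..." into clauses.
function parseJql(query: string, currentUserId: string): JqlClause[] {
  return query.split(/\s+AND\s+/i).map((part) => {
    const match = /^\s*(\w+)\s*=\s*(.+?)\s*$/.exec(part);
    if (!match) throw new Error(`Unsupported clause: ${part}`);
    const [, rawField = "", rawValue = ""] = match;
    const field = rawField.toLowerCase() as keyof JqlIssue;
    if (!JQL_FIELDS.includes(field)) {
      throw new Error(`Unknown field: ${rawField}`);
    }
    // Step 2: resolve functions and strip quotes from string literals.
    const value =
      rawValue === "currentUser()"
        ? currentUserId
        : rawValue.replace(/^"(.*)"$/, "$1");
    return { field, value };
  });
}

// Step 3: keep only issues that satisfy every clause.
function runJql(issues: JqlIssue[], clauses: JqlClause[]): JqlIssue[] {
  return issues.filter((issue) =>
    clauses.every((clause) => issue[clause.field] === clause.value)
  );
}

const sampleIssues: JqlIssue[] = [
  { project: "MYAPP", status: "In Progress", assignee: "user-alice" },
  { project: "MYAPP", status: "Done", assignee: "user-alice" },
  { project: "OTHER", status: "In Progress", assignee: "user-alice" },
  { project: "MYAPP", status: "In Progress", assignee: "user-bob" },
];

const jqlClauses = parseJql(
  'project = "MYAPP" AND status = "In Progress" AND assignee = currentUser()',
  "user-alice"
);
const jqlMatches = runJql(sampleIssues, jqlClauses);
console.log(jqlMatches.length); // 1
```

A production engine would more likely translate the parsed clauses into an index or database query than filter in memory; the in-memory filter only keeps the sketch self-contained.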


5. Confluence Deep Dive: Real-Time Collaborative Editing

Confluence pages support multiple users editing the same document simultaneously using Operational Transformation (OT) — the same technique used by Google Docs.

Collaborative Editing Flow

Key component: Synchrony. Atlassian's Synchrony service is a dedicated WebSocket service responsible for managing collaborative sessions. It maintains an in-memory representation of each active document and resolves conflicts using OT before persisting.
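To make the OT idea concrete, here is a minimal sketch of transforming two concurrent insert operations so that both replicas converge. It covers inserts only, and the `InsertOp` shape and the `siteId` tie-break are assumptions made for illustration, not a description of Synchrony's internals.

```typescript
interface InsertOp {
  pos: number; // index at which text is inserted
  text: string;
  siteId: string; // used only to break ties deterministically
}

function applyInsert(doc: string, op: InsertOp): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Transform `op` so it can be applied after `against` has already been applied.
function transformInsert(op: InsertOp, against: InsertOp): InsertOp {
  const shiftsRight =
    against.pos < op.pos ||
    (against.pos === op.pos && against.siteId < op.siteId);
  return shiftsRight ? { ...op, pos: op.pos + against.text.length } : op;
}

const baseDoc = "Hello world";
const opAlice: InsertOp = { pos: 5, text: ",", siteId: "alice" };
const opBob: InsertOp = { pos: 11, text: "!", siteId: "bob" };

// Server order: Alice's op first, then Bob's op transformed against it.
const serverDoc = applyInsert(
  applyInsert(baseDoc, opAlice),
  transformInsert(opBob, opAlice)
);

// Bob's client: his own op first, then Alice's op transformed against his.
const bobDoc = applyInsert(
  applyInsert(baseDoc, opBob),
  transformInsert(opAlice, opBob)
);

console.log(serverDoc); // "Hello, world!"
console.log(bobDoc); // "Hello, world!"
```

Both orders produce the same text, which is the convergence property the central server relies on. A full implementation also needs transforms for deletes and for operations issued against older document revisions.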


6. Bitbucket Deep Dive: Git Hosting

Bitbucket's Git repository storage faces the same problems GitHub has had to solve, but with Atlassian's Pipelines integration as a first-class feature.


7. Unified Search

Search spans Jira issues, Confluence pages, and Bitbucket code, which makes it a massive indexing challenge.

  • Index isolation: Each tenant has its own Elasticsearch index namespace (jira-{tenantId}, confluence-{tenantId}).
  • Near-real-time: Updates propagate to search within seconds via Kafka consumers.
  • ML re-ranking: Recent Atlassian Intelligence features use click-through signals to boost personalized results.
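The index-isolation and near-real-time points can be illustrated with an in-memory stand-in for Elasticsearch. This is a sketch under stated assumptions: `IndexEvent` and `InMemorySearch` are hypothetical names, the `consume` method plays the role of a Kafka consumer callback, and updates never remove stale terms.

```typescript
type Product = "jira" | "confluence";

interface IndexEvent {
  tenantId: string;
  product: Product;
  docId: string;
  text: string;
}

// Index namespace follows the pattern above: jira-{tenantId}, confluence-{tenantId}.
function indexNameFor(product: Product, tenantId: string): string {
  return `${product}-${tenantId}`;
}

class InMemorySearch {
  // index name -> term -> IDs of documents containing the term
  private indices = new Map<string, Map<string, Set<string>>>();

  // Called once per change event by a (simulated) Kafka consumer.
  consume(event: IndexEvent): void {
    const indexName = indexNameFor(event.product, event.tenantId);
    const index = this.indices.get(indexName) ?? new Map<string, Set<string>>();
    const terms = event.text.toLowerCase().split(/\W+/).filter(Boolean);
    for (const term of terms) {
      const postings = index.get(term) ?? new Set<string>();
      postings.add(event.docId);
      index.set(term, postings);
    }
    this.indices.set(indexName, index);
  }

  // A search only reads the caller's own index namespaces.
  search(tenantId: string, term: string): string[] {
    const hits: string[] = [];
    for (const product of ["jira", "confluence"] as const) {
      const postings = this.indices
        .get(indexNameFor(product, tenantId))
        ?.get(term.toLowerCase());
      if (postings) hits.push(...Array.from(postings));
    }
    return hits.sort();
  }
}

const searchSvc = new InMemorySearch();
searchSvc.consume({
  tenantId: "tenant-acme-corp",
  product: "jira",
  docId: "AUTH-1",
  text: "Implement OAuth login",
});
searchSvc.consume({
  tenantId: "tenant-acme-corp",
  product: "confluence",
  docId: "page-9",
  text: "OAuth design notes",
});
searchSvc.consume({
  tenantId: "tenant-globex",
  product: "jira",
  docId: "INFRA-1",
  text: "Rotate OAuth secrets",
});

console.log(searchSvc.search("tenant-acme-corp", "oauth")); // [ 'AUTH-1', 'page-9' ]
console.log(searchSvc.search("tenant-globex", "oauth")); // [ 'INFRA-1' ]
```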

8. Forge: Atlassian's App Platform (FaaS)

The Atlassian Marketplace lists 5,000+ apps. Forge is Atlassian's serverless, sandboxed runtime for third-party apps.

  • Security: Each app invocation runs in a V8 Isolate — no file system access, no arbitrary network calls (only allowlisted domains).
  • Tenancy: Forge Storage is automatically scoped to the current tenant — apps cannot access another tenant's data.
  • Cold start mitigation: Forge pre-warms isolates for high-traffic apps, similar to Lambda's provisioned concurrency.
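Two of these platform guarantees, the network allowlist and tenant-scoped storage, can be sketched as follows. Neither function is the Forge API: `isEgressAllowed` and `scopedStorage` are hypothetical helpers showing how a platform could enforce the rules before app code runs.

```typescript
// Returns true when `hostname` matches an allowlist entry.
// Entries are exact hosts ("api.example.com") or wildcard subdomains ("*.example.com").
function isEgressAllowed(hostname: string, allowlist: readonly string[]): boolean {
  const host = hostname.toLowerCase();
  return allowlist.some((entry) => {
    const rule = entry.toLowerCase();
    if (rule.startsWith("*.")) {
      // "*.example.com" matches "a.example.com", but neither "example.com"
      // nor "evilexample.com".
      const suffix = rule.slice(1); // ".example.com"
      return host.endsWith(suffix) && host.length > suffix.length;
    }
    return host === rule;
  });
}

// The platform builds the key prefix from the invocation context, so app code
// never supplies (and cannot forge) the tenant ID.
function scopedStorage(
  backing: Map<string, string>,
  tenantId: string,
  appId: string
) {
  const prefix = `${tenantId}:${appId}:`;
  return {
    set: (key: string, value: string): void => {
      backing.set(prefix + key, value);
    },
    get: (key: string): string | undefined => backing.get(prefix + key),
  };
}

const appAllowlist = ["api.example.com", "*.cdn.example.com"];
console.log(isEgressAllowed("api.example.com", appAllowlist)); // true
console.log(isEgressAllowed("img.cdn.example.com", appAllowlist)); // true
console.log(isEgressAllowed("evil.example.org", appAllowlist)); // false

const sharedBacking = new Map<string, string>();
const acmeStore = scopedStorage(sharedBacking, "tenant-acme-corp", "app-1");
const globexStore = scopedStorage(sharedBacking, "tenant-globex", "app-1");
acmeStore.set("api-token", "secret-a");
console.log(globexStore.get("api-token")); // undefined
```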

9. Implementation Example: Jira Issue Service

This TypeScript example demonstrates a simplified Jira issue service with tenant isolation, caching, and event publishing.

typescript
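// In production these would be real clients, e.g.:
//   import { Kafka, Producer } from "kafkajs";
//   import { createClient, RedisClientType } from "redis";
// This demo simulates both in memory, so only Node's built-in crypto is needed.
import * as crypto from "node:crypto";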

// ─── Types ────────────────────────────────────────────────────────────────────

interface TenantContext {
  tenantId: string;
  userId: string;
}

interface Issue {
  id: string;
  key: string; // e.g. "PROJ-123"
  summary: string;
  status: "Todo" | "In Progress" | "Done";
  assigneeId: string | null;
  projectId: string;
  tenantId: string;
  createdAt: Date;
  updatedAt: Date;
}

interface CreateIssuePayload {
  summary: string;
  projectId: string;
  assigneeId?: string;
}

// ─── Infrastructure (Simulated) ───────────────────────────────────────────────

// Simulates per-tenant database shards (in production: routed via tenant ID)
const tenantDatabases = new Map<string, Map<string, Issue>>();

function getTenantDB(tenantId: string): Map<string, Issue> {
  if (!tenantDatabases.has(tenantId)) {
    tenantDatabases.set(tenantId, new Map());
  }
  return tenantDatabases.get(tenantId)!;
}

// ─── Issue Service ────────────────────────────────────────────────────────────

class JiraIssueService {
  private cache: Map<string, Issue> = new Map(); // Simulates Redis cache
  private issueCounters: Map<string, Map<string, number>> = new Map(); // tenant → project → counter

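  /**
   * Generates a Jira-style issue key like "PROJ-42".
   * Each project has an auto-incrementing counter per tenant. The counter is
   * held in memory here; in production it would be stored in the tenant DB
   * and incremented atomically.
   */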
  private getNextIssueKey(tenantId: string, projectId: string): string {
    if (!this.issueCounters.has(tenantId)) {
      this.issueCounters.set(tenantId, new Map());
    }
    const projectCounters = this.issueCounters.get(tenantId)!;
    const current = projectCounters.get(projectId) ?? 0;
    const next = current + 1;
    projectCounters.set(projectId, next);
    return `${projectId.toUpperCase()}-${next}`;
  }

  /**
   * Creates a new issue — scoped to the tenant's database shard.
   */
  async createIssue(
    ctx: TenantContext,
    payload: CreateIssuePayload
  ): Promise<Issue> {
    const { tenantId, userId } = ctx;
    const db = getTenantDB(tenantId); // Route to correct tenant shard

    const issueKey = this.getNextIssueKey(tenantId, payload.projectId);
    const issue: Issue = {
      id: `issue-${crypto.randomUUID()}`,
      key: issueKey,
      summary: payload.summary,
      status: "Todo",
      assigneeId: payload.assigneeId ?? null,
      projectId: payload.projectId,
      tenantId, // Always stored with tenantId for audit/compliance
      createdAt: new Date(),
      updatedAt: new Date(),
    };

    // 1. Persist to tenant-scoped DB
    db.set(issue.id, issue);

    // 2. Publish event for Search indexing & Notifications
    await this.publishEvent("issue.created", { issue, actorId: userId });

    console.log(
      `✅ [${tenantId}] Created issue: ${issue.key} — "${issue.summary}"`
    );
    return issue;
  }

  /**
   * Fetches an issue with a Redis-style cache.
   * Cache key includes tenantId to prevent cross-tenant cache poisoning.
   */
  async getIssue(ctx: TenantContext, issueId: string): Promise<Issue | null> {
    const cacheKey = `${ctx.tenantId}:issue:${issueId}`; // Namespace by tenant!

    // 1. Check cache
    if (this.cache.has(cacheKey)) {
      console.log(`⚡ [Cache HIT] ${cacheKey}`);
      return this.cache.get(cacheKey)!;
    }

    // 2. Fetch from tenant DB (tenant routing enforced)
    const db = getTenantDB(ctx.tenantId);
    const issue = db.get(issueId) ?? null;

    if (issue) {
      this.cache.set(cacheKey, issue); // Populate cache (TTL would apply in production)
      console.log(`🗄️  [Cache MISS → DB] ${cacheKey}`);
    }

    return issue;
  }

  /**
   * Transition an issue's status through a workflow.
   * Jira validates allowed transitions before applying.
   */
  async transitionIssue(
    ctx: TenantContext,
    issueId: string,
    newStatus: Issue["status"]
  ): Promise<Issue | null> {
    const issue = await this.getIssue(ctx, issueId);
    if (!issue) return null;

    const allowedTransitions: Record<Issue["status"], Issue["status"][]> = {
      Todo: ["In Progress"],
      "In Progress": ["Done", "Todo"],
      Done: [],
    };

    if (!allowedTransitions[issue.status].includes(newStatus)) {
      throw new Error(
        `Invalid transition: ${issue.status} → ${newStatus} for issue ${issue.key}`
      );
    }

    const previousStatus = issue.status; // Capture before mutating
    issue.status = newStatus;
    issue.updatedAt = new Date();

    // Invalidate cache on write
    const cacheKey = `${ctx.tenantId}:issue:${issueId}`;
    this.cache.delete(cacheKey);

    // Publish event for notifications ("@alice, PROJ-5 moved to Done")
    await this.publishEvent("issue.transitioned", {
      issue,
      previousStatus,
      newStatus,
      actorId: ctx.userId,
    });

    console.log(
      `🔄 [${ctx.tenantId}] ${issue.key}: ${previousStatus} → ${newStatus}`
    );
    return issue;
  }

  /**
   * Simulates publishing to Kafka event bus.
   * In production: Kafka Producer sends to 'jira.events' topic.
   * Consumers: Search Indexer, Notification Service, Analytics
   */
  private async publishEvent(
    eventType: string,
    payload: object
  ): Promise<void> {
    // Simulated Kafka publish
    console.log(
      `📨 [Kafka] Event: ${eventType}`,
      JSON.stringify(payload, null, 2)
    );
  }
}

// ─── Demo ─────────────────────────────────────────────────────────────────────

async function runDemo() {
  const service = new JiraIssueService();

  // Tenant A context
  const ctxA: TenantContext = {
    tenantId: "tenant-acme-corp",
    userId: "user-alice",
  };
  // Tenant B context (completely isolated)
  const ctxB: TenantContext = { tenantId: "tenant-globex", userId: "user-bob" };

  console.log("\n=== TENANT A: ACME Corp ===");
  const issue1 = await service.createIssue(ctxA, {
    summary: "Implement OAuth 2.0 login flow",
    projectId: "AUTH",
    assigneeId: "user-alice",
  });

  const issue2 = await service.createIssue(ctxA, {
    summary: "Fix null pointer in payment service",
    projectId: "PAYMENT",
  });

  console.log("\n--- Fetching issue (will miss cache first time) ---");
  await service.getIssue(ctxA, issue1.id);
  console.log("--- Fetching again (should HIT cache) ---");
  await service.getIssue(ctxA, issue1.id);

  console.log("\n--- Transitioning issue status ---");
  await service.transitionIssue(ctxA, issue1.id, "In Progress");

  console.log("\n=== TENANT B: Globex (Isolated) ===");
  const issue3 = await service.createIssue(ctxB, {
    summary: "Set up Kubernetes cluster",
    projectId: "INFRA",
  });

  // Prove isolation: Tenant B CANNOT access Tenant A's issues
  console.log("\n--- Proving tenant isolation ---");
  const crossTenantAttempt = await service.getIssue(ctxB, issue1.id);
  console.log(
    `Tenant B trying to access Tenant A's issue: ${crossTenantAttempt === null ? "✅ Blocked (null)" : "❌ BREACH!"}`
  );
}

runDemo().catch(console.error);

Sample Output (Kafka event logs omitted for brevity):

=== TENANT A: ACME Corp ===
✅ [tenant-acme-corp] Created issue: AUTH-1 — "Implement OAuth 2.0 login flow"
✅ [tenant-acme-corp] Created issue: PAYMENT-1 — "Fix null pointer in payment service"

--- Fetching issue (will miss cache first time) ---
🗄️  [Cache MISS → DB] tenant-acme-corp:issue:issue-xxx
--- Fetching again (should HIT cache) ---
⚡ [Cache HIT] tenant-acme-corp:issue:issue-xxx

--- Transitioning issue status ---
🔄 [tenant-acme-corp] AUTH-1: Todo → In Progress

=== TENANT B: Globex (Isolated) ===
✅ [tenant-globex] Created issue: INFRA-1 — "Set up Kubernetes cluster"

--- Proving tenant isolation ---
Tenant B trying to access Tenant A's issue: ✅ Blocked (null)

10. Notification System

Atlassian's notification system must deliver millions of emails and in-app alerts with configurable user preferences.

Digest Batching: To prevent flooding users who are @mentioned 50 times in an active PR discussion, Atlassian batches notifications within a 5-minute window before sending a single digest email.
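A minimal sketch of the batching window, assuming a hypothetical `DigestBatcher` that a scheduler flushes periodically. Timestamps are passed in explicitly so the example stays deterministic; a real service would use a timer or a delayed queue.

```typescript
interface PendingAlert {
  userId: string;
  message: string;
  at: number; // epoch milliseconds
}

const DIGEST_WINDOW_MS = 5 * 60 * 1000;

class DigestBatcher {
  // userId -> alerts collected since that user's window opened
  private pending = new Map<string, PendingAlert[]>();

  add(alert: PendingAlert): void {
    const queue = this.pending.get(alert.userId) ?? [];
    queue.push(alert);
    this.pending.set(alert.userId, queue);
  }

  // Returns one digest per user whose oldest pending alert is at least one
  // window old, and clears those users' queues.
  flush(now: number): { userId: string; body: string }[] {
    const digests: { userId: string; body: string }[] = [];
    for (const [userId, queue] of Array.from(this.pending.entries())) {
      const oldest = queue[0];
      if (oldest !== undefined && now - oldest.at >= DIGEST_WINDOW_MS) {
        const lines = queue.map((alert) => `- ${alert.message}`).join("\n");
        digests.push({
          userId,
          body: `${queue.length} new notifications:\n${lines}`,
        });
        this.pending.delete(userId);
      }
    }
    return digests;
  }
}

const batcher = new DigestBatcher();
const t0 = 1700000000000;
batcher.add({ userId: "user-alice", message: "Bob mentioned you in PR #12", at: t0 });
batcher.add({ userId: "user-alice", message: "Carol mentioned you in PR #12", at: t0 + 60000 });
batcher.add({ userId: "user-alice", message: "Bob replied in PR #12", at: t0 + 120000 });
batcher.add({ userId: "user-bob", message: "Alice assigned AUTH-1 to you", at: t0 + 240000 });

const firstFlush = batcher.flush(t0 + DIGEST_WINDOW_MS);
console.log(firstFlush.length); // 1 (only Alice's window has elapsed)
const secondFlush = batcher.flush(t0 + 240000 + DIGEST_WINDOW_MS);
console.log(secondFlush.length); // 1 (Bob's digest)
```

Alice receives one email containing three items instead of three separate emails, while Bob's alert waits until his own window has elapsed.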


11. Data Residency Architecture

Enterprise customers (e.g., German banks, Australian government) require data to stay within specific geographies.

The Global Control Plane stores only the routing metadata (which region a tenant belongs to). All content data — issues, pages, comments, attachments — never leaves the assigned region.
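The split between control plane and regional data planes might look like the sketch below. The region codes come from the requirements above; the endpoint URLs, `tenantRegions`, and `writeContent` are placeholders invented for illustration.

```typescript
type ResidencyRegion = "EU" | "US" | "AU";

// Control plane: routing metadata only, no customer content.
const tenantRegions = new Map<string, ResidencyRegion>([
  ["tenant-acme-corp", "EU"],
  ["tenant-globex", "US"],
]);

// Hypothetical regional endpoints (placeholders, not real Atlassian hosts).
const regionalEndpoints: Record<ResidencyRegion, string> = {
  EU: "https://eu.data.example.internal",
  US: "https://us.data.example.internal",
  AU: "https://au.data.example.internal",
};

// Simulated regional data planes: region -> tenant -> content items
const regionalStores = new Map<ResidencyRegion, Map<string, string[]>>();

function resolveRegion(tenantId: string): ResidencyRegion {
  const region = tenantRegions.get(tenantId);
  if (!region) throw new Error(`Unknown tenant: ${tenantId}`);
  return region;
}

// Writes content into the tenant's assigned region only and returns the
// endpoint that handled the write.
function writeContent(tenantId: string, content: string): string {
  const region = resolveRegion(tenantId);
  const store = regionalStores.get(region) ?? new Map<string, string[]>();
  const items = store.get(tenantId) ?? [];
  items.push(content);
  store.set(tenantId, items);
  regionalStores.set(region, store);
  return regionalEndpoints[region];
}

const handledBy = writeContent("tenant-acme-corp", "Confluence page body");
console.log(handledBy); // https://eu.data.example.internal
console.log(regionalStores.has("US")); // false: nothing was written outside the EU
```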


12. Summary: Atlassian Architecture Trade-offs

| Component | Choice | Rationale |
| --- | --- | --- |
| Multi-Tenancy | Siloed databases + shared compute | Prevents data leakage while keeping infrastructure costs manageable |
| Collaborative Editing | Operational Transformation (OT) | Well-understood algorithm; handles concurrent edits without requiring distributed locks |
| Search | Elasticsearch with per-tenant indices | Tenant isolation at index level; flexible full-text scoring |
| Event Bus | Apache Kafka | Decouples services; enables replay for Search re-indexing and Analytics |
| App Platform | V8 Isolates (Forge) | Sandboxed execution without overhead of full containers; ~ms cold starts |
| Data Residency | Regional AWS deployments | Satisfies enterprise compliance without rebuilding application logic |
| Caching | Redis with tenant-namespaced keys | Prevents cross-tenant cache poisoning; fast JQL result reuse |
| Git Storage | Custom layer over S3 | Object storage cost savings for large repos; S3 durability (11 nines) |

13. Key Lessons from Atlassian's Architecture

  1. Multi-tenancy is not an afterthought: Atlassian baked tenant isolation into every layer — JWT tokens, database routing, cache keys, Elasticsearch index names, and Forge storage APIs all carry tenantId.

  2. Events drive decoupling: By publishing all mutations to Kafka, Atlassian's Search, Notifications, and Analytics systems can evolve independently without tight coupling to core services.

  3. Compliance shapes architecture: Data residency requirements forced Atlassian to build regional deployments years before many competitors — a major engineering investment that became a competitive advantage.

  4. Extensibility is a product: The Forge platform treats security (V8 isolates, network allowlists) and tenancy (automatic data scoping) as platform responsibilities, not app developer responsibilities — reducing the blast radius of bad marketplace apps.

  5. Operational Transformation is hard: Confluence's real-time collaboration was one of the hardest engineering challenges. OT requires a central server (Synchrony) to serialize operations — this is why purely P2P collaborative editors often use CRDTs instead.

Released under the ISC License.