[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-mcp-production-transport-auth-scaling-challenges":3},{"article":4,"author":55},{"id":5,"category_id":6,"title":7,"slug":8,"excerpt":9,"content_md":10,"content_html":11,"locale":12,"author_id":13,"published":14,"published_at":15,"meta_title":16,"meta_description":17,"focus_keyword":18,"og_image":19,"canonical_url":19,"robots_meta":20,"created_at":15,"updated_at":15,"tags":21,"category_name":35,"related_articles":36},"da000000-0000-0000-0000-000000000002","a0000000-0000-0000-0000-000000000006","MCP in Production: Solving Transport, Auth, and Scaling Challenges","mcp-production-transport-auth-scaling-challenges","A deep dive into running Model Context Protocol servers in production — transport selection, authentication patterns, scaling strategies, audit logging, and gateway architectures for enterprise deployments.","## From Prototype to Production: What Changes\n\nBuilding an MCP server that works on your laptop is straightforward. Running one that handles thousands of concurrent AI agent sessions across a distributed infrastructure is a different engineering challenge entirely. Production MCP deployments must address five concerns that prototypes can ignore: **transport scalability**, **authentication and authorization**, **session management at scale**, **audit trails**, and **multi-server orchestration**.\n\nThis article is a technical guide for engineering teams moving MCP servers from development to production. We assume you have built at least one MCP server and understand the basics of the protocol. If not, start with our companion article on building your first MCP server.\n\n## Transport Scalability: stdio vs SSE vs Streamable HTTP\n\nMCP defines three transport mechanisms. Choosing the right one for production is your first architectural decision.\n\n### stdio Transport\n\nThe stdio transport communicates via standard input\u002Foutput streams. 
The host application spawns the MCP server as a child process and exchanges JSON-RPC messages through stdin\u002Fstdout.\n\n**Advantages:**\n- Zero network configuration\n- Process-level isolation\n- No port conflicts\n- Lowest latency (no network stack)\n\n**Limitations:**\n- Server must run on the same machine as the host\n- One server process per client session\n- Cannot be load-balanced\n- No horizontal scaling\n\n**Best for:** Local development tools, IDE extensions, single-user desktop applications.\n\n### SSE (Server-Sent Events) Transport\n\nSSE transport uses HTTP for client-to-server messages and Server-Sent Events for server-to-client messages. The server runs as an HTTP service.\n\n**Advantages:**\n- Network-accessible (remote servers)\n- Compatible with existing HTTP infrastructure\n- Supports multiple concurrent clients\n- Works through firewalls and proxies\n\n**Limitations:**\n- Unidirectional streaming (server-to-client only via SSE)\n- Session affinity required (stateful connections)\n- Some load balancers struggle with long-lived SSE connections\n- No built-in reconnection semantics in the protocol\n\n**Best for:** Small to medium deployments, internal tools, teams with existing HTTP infrastructure.\n\n### Streamable HTTP Transport\n\nStreamable HTTP is the newest transport, designed specifically for production deployments. 
It uses standard HTTP POST for all messages, with optional SSE streaming for long-running operations.\n\n**Advantages:**\n- Fully stateless request\u002Fresponse model\n- Works with any HTTP load balancer\n- Built-in session management via `Mcp-Session-Id` header\n- Supports both streaming and non-streaming responses\n- CDN and proxy compatible\n\n**Limitations:**\n- Requires server-side session storage (Redis, database)\n- Slightly higher per-message overhead than stdio\n- Newer transport — less ecosystem tooling\n\n**Best for:** Production cloud deployments, multi-tenant platforms, enterprise environments.\n\n### Transport Comparison Matrix\n\n| Feature | stdio | SSE | Streamable HTTP |\n|---------|-------|-----|-----------------|\n| Network access | Local only | Remote | Remote |\n| Load balancing | Not possible | Session-sticky | Standard HTTP LB |\n| Horizontal scaling | No | Limited | Yes |\n| Firewall-friendly | N\u002FA | Yes | Yes |\n| Reconnection | N\u002FA | Manual | Built-in |\n| Concurrent clients | 1 | Many | Many |\n| Latency | Lowest | Low | Low |\n| Statelessness | Stateful | Stateful | Stateless possible |\n| Production readiness | Dev only | Medium | High |\n\n## Authentication: SSO Integration, API Keys, and OAuth\n\nMCP servers in production must authenticate both the AI host application and the end user on whose behalf the AI is acting.\n\n### OAuth 2.0 for Remote Servers\n\nThe MCP specification includes a built-in OAuth 2.0 flow for remote (HTTP-based) servers. The flow works as follows:\n\n1. The MCP client sends an `initialize` request without credentials.\n2. The server responds with HTTP 401 and a `WWW-Authenticate` header pointing to its OAuth authorization endpoint.\n3. The host application opens a browser for user authentication.\n4. After successful auth, the host receives an access token.\n5. 
Subsequent MCP requests include the token in the `Authorization: Bearer` header.\n\n```typescript\n\u002F\u002F Server-side OAuth middleware for Express\nimport { McpServer } from \"@modelcontextprotocol\u002Fsdk\u002Fserver\u002Fmcp.js\";\nimport { StreamableHTTPServerTransport } from \"@modelcontextprotocol\u002Fsdk\u002Fserver\u002FstreamableHttp.js\";\nimport express from \"express\";\n\nconst app = express();\n\napp.use(\"\u002Fmcp\", async (req, res, next) => {\n  const token = req.headers.authorization?.replace(\"Bearer \", \"\");\n  if (!token) {\n    res\n      .status(401)\n      .set(\"WWW-Authenticate\", 'Bearer realm=\"mcp\"')\n      .json({\n        error: \"unauthorized\",\n        oauth_url: \"https:\u002F\u002Fauth.example.com\u002Foauth\u002Fauthorize\",\n      });\n    return;\n  }\n\n  \u002F\u002F Validate token against your auth provider\n  const user = await validateToken(token);\n  if (!user) {\n    res.status(403).json({ error: \"invalid_token\" });\n    return;\n  }\n\n  \u002F\u002F Attach user context for tool authorization\n  req.mcpUser = user;\n  next();\n});\n```\n\n### API Key Authentication\n\nFor service-to-service MCP communication (where no human user is involved), API keys are simpler:\n\n```typescript\nconst API_KEYS = new Map([\n  [\"sk-prod-abc123\", { name: \"analytics-service\", scopes: [\"read\"] }],\n  [\"sk-prod-def456\", { name: \"admin-service\", scopes: [\"read\", \"write\"] }],\n]);\n\nfunction authenticateApiKey(key: string) {\n  return API_KEYS.get(key) || null;\n}\n```\n\n### SSO Integration Pattern\n\nFor enterprise deployments, integrate with existing SSO (SAML, OIDC):\n\n```\nUser -> AI Host -> MCP Client -> MCP Server -> SSO Provider\n                                      |\n                                      v\n                              Validate OIDC token\n                              Extract user roles\n                              Apply tool-level RBAC\n```\n\nEach MCP tool can check the authenticated user's roles before executing:\n\n```typescript\nserver.tool(\n  
\"delete_record\",\n  \"Delete a database record by ID\",\n  { table: z.string(), id: z.string() },\n  async ({ table, id }, { authContext }) => {\n    if (!authContext.roles.includes(\"admin\")) {\n      return {\n        content: [{ type: \"text\", text: \"Forbidden: admin role required\" }],\n        isError: true,\n      };\n    }\n    \u002F\u002F Proceed with deletion\n  }\n);\n```\n\n## Scaling: Stateful Sessions vs Load Balancers\n\nMCP sessions are inherently stateful: the `initialize` handshake negotiates capabilities, and the server may maintain context across tool calls within a session. This creates a tension with horizontal scaling.\n\n### Session Store Architecture\n\nExtract session state from the server process into an external store:\n\n```typescript\ninterface McpSession {\n  id: string;\n  userId: string;\n  capabilities: ServerCapabilities;\n  createdAt: Date;\n  lastActivityAt: Date;\n  metadata: Record\u003Cstring, unknown>;\n}\n\nclass RedisSessionStore {\n  constructor(private redis: Redis) {}\n\n  async create(session: McpSession): Promise\u003Cvoid> {\n    await this.redis.set(\n      `mcp:session:${session.id}`,\n      JSON.stringify(session),\n      \"EX\",\n      3600 \u002F\u002F 1 hour TTL\n    );\n  }\n\n  async get(sessionId: string): Promise\u003CMcpSession | null> {\n    const data = await this.redis.get(`mcp:session:${sessionId}`);\n    return data ? 
JSON.parse(data) : null;\n  }\n\n  async touch(sessionId: string): Promise\u003Cvoid> {\n    await this.redis.expire(`mcp:session:${sessionId}`, 3600);\n  }\n}\n```\n\n### Horizontal Scaling with Streamable HTTP\n\nWith externalized sessions and Streamable HTTP transport, you can run multiple MCP server instances behind a standard load balancer:\n\n```\n                    ┌──────────────┐\n                    │  Load        │\n     MCP Client ──>│  Balancer    │\n                    │  (nginx\u002FALB) │\n                    └──────┬───────┘\n                           │\n              ┌────────────┼────────────┐\n              v            v            v\n        ┌──────────┐ ┌──────────┐ ┌──────────┐\n        │ MCP      │ │ MCP      │ │ MCP      │\n        │ Server 1 │ │ Server 2 │ │ Server 3 │\n        └────┬─────┘ └────┬─────┘ └────┬─────┘\n             │             │             │\n             v             v             v\n        ┌────────────────────────────────────┐\n        │          Redis (sessions)          │\n        └────────────────────────────────────┘\n```\n\nNo session-sticky routing is required. 
Any server instance can handle any request by loading the session from Redis.\n\n### Auto-Scaling Configuration\n\nExample Kubernetes HPA for MCP servers:\n\n```yaml\napiVersion: autoscaling\u002Fv2\nkind: HorizontalPodAutoscaler\nmetadata:\n  name: mcp-server\nspec:\n  scaleTargetRef:\n    apiVersion: apps\u002Fv1\n    kind: Deployment\n    name: mcp-server\n  minReplicas: 2\n  maxReplicas: 20\n  metrics:\n    - type: Resource\n      resource:\n        name: cpu\n        target:\n          type: Utilization\n          averageUtilization: 70\n    - type: Pods\n      pods:\n        metric:\n          name: mcp_active_sessions\n        target:\n          type: AverageValue\n          averageValue: \"100\"\n```\n\n## Audit Trails and Logging\n\nEvery MCP tool invocation in production must be logged for security, compliance, and debugging.\n\n### Structured Logging Schema\n\n```typescript\ninterface McpAuditLog {\n  timestamp: string;\n  sessionId: string;\n  userId: string;\n  toolName: string;\n  toolInput: Record\u003Cstring, unknown>;\n  toolOutput: string;\n  durationMs: number;\n  success: boolean;\n  errorMessage?: string;\n  ipAddress: string;\n  userAgent: string;\n}\n\n\u002F\u002F Middleware that wraps every tool call\nfunction auditMiddleware(server: McpServer, logger: Logger) {\n  const originalTool = server.tool.bind(server);\n\n  server.tool = (name, description, schema, handler) => {\n    return originalTool(name, description, schema, async (args, context) => {\n      const start = Date.now();\n      try {\n        const result = await handler(args, context);\n        logger.info({\n          event: \"mcp_tool_call\",\n          tool: name,\n          input: args,\n          durationMs: Date.now() - start,\n          success: !result.isError,\n          sessionId: context.sessionId,\n          userId: context.authContext?.userId,\n        });\n        return result;\n      } catch (error) {\n        logger.error({\n          event: \"mcp_tool_error\",\n        
  tool: name,\n          input: args,\n          error: (error as Error).message,\n          durationMs: Date.now() - start,\n          sessionId: context.sessionId,\n        });\n        throw error;\n      }\n    });\n  };\n}\n```\n\n### Compliance Considerations\n\n| Requirement | Implementation |\n|-------------|---------------|\n| GDPR data access | Log which user data the AI accessed via MCP tools |\n| SOC 2 audit trail | Immutable logs of all tool invocations with timestamps |\n| PII protection | Redact sensitive fields in tool inputs before logging |\n| Data retention | Set log TTLs matching your compliance requirements |\n| Access review | Log authentication events and permission checks |\n\n## Gateway Patterns for Multi-Server Deployments\n\nEnterprise environments often need dozens of MCP servers — one for each internal system. An **MCP Gateway** centralizes management:\n\n```\n                          ┌──────────────┐\n                          │  MCP Gateway │\n        MCP Client ──────>│              │\n                          │  - Auth      │\n                          │  - Routing   │\n                          │  - Rate limit│\n                          │  - Audit     │\n                          └──────┬───────┘\n                                 │\n              ┌──────────────────┼──────────────────┐\n              v                  v                  v\n        ┌──────────┐     ┌──────────┐      ┌──────────┐\n        │ Jira MCP │     │ Slack MCP│      │ DB MCP   │\n        │ Server   │     │ Server   │      │ Server   │\n        └──────────┘     └──────────┘      └──────────┘\n```\n\nThe gateway provides:\n\n1. **Unified authentication**: One auth flow for all backend servers.\n2. **Tool namespace routing**: Tools are prefixed with server names (`jira.create_issue`, `slack.send_message`).\n3. **Centralized rate limiting**: Per-user, per-tool, and per-server rate limits.\n4. 
**Request routing**: Route tool calls to the appropriate backend server.\n5. **Circuit breaking**: If a backend server is unhealthy, the gateway returns graceful errors instead of hanging.\n\n### Implementing a Basic Gateway\n\n```typescript\nclass McpGateway {\n  private backends = new Map\u003Cstring, McpClient>();\n\n  async registerBackend(prefix: string, url: string) {\n    const client = new McpClient({ name: `gateway-${prefix}` });\n    const transport = new StreamableHTTPClientTransport(new URL(url));\n    await client.connect(transport);\n    this.backends.set(prefix, client);\n  }\n\n  async routeToolCall(\n    toolName: string,\n    args: Record\u003Cstring, unknown>\n  ) {\n    \u002F\u002F toolName format: \"prefix.actualTool\"\n    const [prefix, ...rest] = toolName.split(\".\");\n    const actualTool = rest.join(\".\");\n    const backend = this.backends.get(prefix);\n\n    if (!backend) {\n      throw new Error(`Unknown backend: ${prefix}`);\n    }\n\n    return backend.callTool({ name: actualTool, arguments: args });\n  }\n}\n```\n\n## FAQ\n\n**Q: Should I use stdio or HTTP transport in production?**\nA: Use Streamable HTTP for any deployment that needs to scale beyond a single machine. Use stdio only for local desktop integrations where the AI host and MCP server run on the same machine.\n\n**Q: How many concurrent sessions can one MCP server handle?**\nA: With Streamable HTTP and externalized sessions, a single Node.js server instance typically handles 500-1,000 concurrent sessions. With Rust or Go servers, 5,000-10,000 is achievable. Scale horizontally beyond that.\n\n**Q: How do I handle MCP server downtime?**\nA: Implement health checks (`\u002Fhealth` endpoint), use Kubernetes liveness\u002Freadiness probes, and configure your gateway with circuit breakers. The MCP client will receive connection errors that the AI host can present as \"tool temporarily unavailable.\"\n\n**Q: Can MCP servers call other MCP servers?**\nA: Yes. 
An MCP server can also be an MCP client, creating a chain. This is the foundation of agent-to-agent communication. Use this pattern carefully — deep chains increase latency and failure probability.\n\n**Q: How do I version my MCP server API?**\nA: Use the server version field in the `initialize` response. For breaking changes (removing tools, changing schemas), bump the major version and support both old and new versions during a migration period. The MCP protocol version itself is negotiated during initialization.\n\n**Q: What is the maximum message size for MCP?**\nA: The protocol does not define a maximum, but practical limits depend on the transport. For Streamable HTTP, keep responses under 10 MB. For stdio, system pipe buffers (typically 64 KB) may require chunking for large responses.","\u003Ch2 id=\"from-prototype-to-production-what-changes\">From Prototype to Production: What Changes\u003C\u002Fh2>\n\u003Cp>Building an MCP server that works on your laptop is straightforward. Running one that handles thousands of concurrent AI agent sessions across a distributed infrastructure is a different engineering challenge entirely. Production MCP deployments must address five concerns that prototypes can ignore: \u003Cstrong>transport scalability\u003C\u002Fstrong>, \u003Cstrong>authentication and authorization\u003C\u002Fstrong>, \u003Cstrong>session management at scale\u003C\u002Fstrong>, \u003Cstrong>audit trails\u003C\u002Fstrong>, and \u003Cstrong>multi-server orchestration\u003C\u002Fstrong>.\u003C\u002Fp>\n\u003Cp>This article is a technical guide for engineering teams moving MCP servers from development to production. We assume you have built at least one MCP server and understand the basics of the protocol. 
If not, start with our companion article on building your first MCP server.\u003C\u002Fp>\n\u003Ch2 id=\"transport-scalability-stdio-vs-sse-vs-streamable-http\">Transport Scalability: stdio vs SSE vs Streamable HTTP\u003C\u002Fh2>\n\u003Cp>MCP defines three transport mechanisms. Choosing the right one for production is your first architectural decision.\u003C\u002Fp>\n\u003Ch3>stdio Transport\u003C\u002Fh3>\n\u003Cp>The stdio transport communicates via standard input\u002Foutput streams. The host application spawns the MCP server as a child process and exchanges JSON-RPC messages through stdin\u002Fstdout.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Advantages:\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Zero network configuration\u003C\u002Fli>\n\u003Cli>Process-level isolation\u003C\u002Fli>\n\u003Cli>No port conflicts\u003C\u002Fli>\n\u003Cli>Lowest latency (no network stack)\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>\u003Cstrong>Limitations:\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Server must run on the same machine as the host\u003C\u002Fli>\n\u003Cli>One server process per client session\u003C\u002Fli>\n\u003Cli>Cannot be load-balanced\u003C\u002Fli>\n\u003Cli>No horizontal scaling\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>\u003Cstrong>Best for:\u003C\u002Fstrong> Local development tools, IDE extensions, single-user desktop applications.\u003C\u002Fp>\n\u003Ch3>SSE (Server-Sent Events) Transport\u003C\u002Fh3>\n\u003Cp>SSE transport uses HTTP for client-to-server messages and Server-Sent Events for server-to-client messages. 
The server runs as an HTTP service.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Advantages:\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Network-accessible (remote servers)\u003C\u002Fli>\n\u003Cli>Compatible with existing HTTP infrastructure\u003C\u002Fli>\n\u003Cli>Supports multiple concurrent clients\u003C\u002Fli>\n\u003Cli>Works through firewalls and proxies\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>\u003Cstrong>Limitations:\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Unidirectional streaming (server-to-client only via SSE)\u003C\u002Fli>\n\u003Cli>Session affinity required (stateful connections)\u003C\u002Fli>\n\u003Cli>Some load balancers struggle with long-lived SSE connections\u003C\u002Fli>\n\u003Cli>No built-in reconnection semantics in the protocol\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>\u003Cstrong>Best for:\u003C\u002Fstrong> Small to medium deployments, internal tools, teams with existing HTTP infrastructure.\u003C\u002Fp>\n\u003Ch3>Streamable HTTP Transport\u003C\u002Fh3>\n\u003Cp>Streamable HTTP is the newest transport, designed specifically for production deployments. 
It uses standard HTTP POST for all messages, with optional SSE streaming for long-running operations.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Advantages:\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Fully stateless request\u002Fresponse model\u003C\u002Fli>\n\u003Cli>Works with any HTTP load balancer\u003C\u002Fli>\n\u003Cli>Built-in session management via \u003Ccode>Mcp-Session-Id\u003C\u002Fcode> header\u003C\u002Fli>\n\u003Cli>Supports both streaming and non-streaming responses\u003C\u002Fli>\n\u003Cli>CDN and proxy compatible\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>\u003Cstrong>Limitations:\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Requires server-side session storage (Redis, database)\u003C\u002Fli>\n\u003Cli>Slightly higher per-message overhead than stdio\u003C\u002Fli>\n\u003Cli>Newer transport — less ecosystem tooling\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>\u003Cstrong>Best for:\u003C\u002Fstrong> Production cloud deployments, multi-tenant platforms, enterprise environments.\u003C\u002Fp>\n\u003Ch3>Transport Comparison Matrix\u003C\u002Fh3>\n\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Feature\u003C\u002Fth>\u003Cth>stdio\u003C\u002Fth>\u003Cth>SSE\u003C\u002Fth>\u003Cth>Streamable HTTP\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\n\u003Ctr>\u003Ctd>Network access\u003C\u002Ftd>\u003Ctd>Local only\u003C\u002Ftd>\u003Ctd>Remote\u003C\u002Ftd>\u003Ctd>Remote\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Load balancing\u003C\u002Ftd>\u003Ctd>Not possible\u003C\u002Ftd>\u003Ctd>Session-sticky\u003C\u002Ftd>\u003Ctd>Standard HTTP LB\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Horizontal 
scaling\u003C\u002Ftd>\u003Ctd>No\u003C\u002Ftd>\u003Ctd>Limited\u003C\u002Ftd>\u003Ctd>Yes\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Firewall-friendly\u003C\u002Ftd>\u003Ctd>N\u002FA\u003C\u002Ftd>\u003Ctd>Yes\u003C\u002Ftd>\u003Ctd>Yes\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Reconnection\u003C\u002Ftd>\u003Ctd>N\u002FA\u003C\u002Ftd>\u003Ctd>Manual\u003C\u002Ftd>\u003Ctd>Built-in\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Concurrent clients\u003C\u002Ftd>\u003Ctd>1\u003C\u002Ftd>\u003Ctd>Many\u003C\u002Ftd>\u003Ctd>Many\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Latency\u003C\u002Ftd>\u003Ctd>Lowest\u003C\u002Ftd>\u003Ctd>Low\u003C\u002Ftd>\u003Ctd>Low\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Statelessness\u003C\u002Ftd>\u003Ctd>Stateful\u003C\u002Ftd>\u003Ctd>Stateful\u003C\u002Ftd>\u003Ctd>Stateless possible\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Production readiness\u003C\u002Ftd>\u003Ctd>Dev only\u003C\u002Ftd>\u003Ctd>Medium\u003C\u002Ftd>\u003Ctd>High\u003C\u002Ftd>\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Ch2 id=\"authentication-sso-integration-api-keys-and-oauth\">Authentication: SSO Integration, API Keys, and OAuth\u003C\u002Fh2>\n\u003Cp>MCP servers in production must authenticate both the AI host application and the end user on whose behalf the AI is acting.\u003C\u002Fp>\n\u003Ch3>OAuth 2.0 for Remote Servers\u003C\u002Fh3>\n\u003Cp>The MCP specification includes a built-in OAuth 2.0 flow for remote (HTTP-based) servers. 
The flow works as follows:\u003C\u002Fp>\n\u003Col>\n\u003Cli>The MCP client sends an \u003Ccode>initialize\u003C\u002Fcode> request without credentials.\u003C\u002Fli>\n\u003Cli>The server responds with HTTP 401 and a \u003Ccode>WWW-Authenticate\u003C\u002Fcode> header pointing to its OAuth authorization endpoint.\u003C\u002Fli>\n\u003Cli>The host application opens a browser for user authentication.\u003C\u002Fli>\n\u003Cli>After successful auth, the host receives an access token.\u003C\u002Fli>\n\u003Cli>Subsequent MCP requests include the token in the \u003Ccode>Authorization: Bearer\u003C\u002Fcode> header.\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Cpre>\u003Ccode class=\"language-typescript\">\u002F\u002F Server-side OAuth middleware for Express\nimport { McpServer } from \"@modelcontextprotocol\u002Fsdk\u002Fserver\u002Fmcp.js\";\nimport { StreamableHTTPServerTransport } from \"@modelcontextprotocol\u002Fsdk\u002Fserver\u002FstreamableHttp.js\";\nimport express from \"express\";\n\nconst app = express();\n\napp.use(\"\u002Fmcp\", async (req, res, next) =&gt; {\n  const token = req.headers.authorization?.replace(\"Bearer \", \"\");\n  if (!token) {\n    res\n      .status(401)\n      .set(\"WWW-Authenticate\", 'Bearer realm=\"mcp\"')\n      .json({\n        error: \"unauthorized\",\n        oauth_url: \"https:\u002F\u002Fauth.example.com\u002Foauth\u002Fauthorize\",\n      });\n    return;\n  }\n\n  \u002F\u002F Validate token against your auth provider\n  const user = await validateToken(token);\n  if (!user) {\n    res.status(403).json({ error: \"invalid_token\" });\n    return;\n  }\n\n  \u002F\u002F Attach user context for tool authorization\n  req.mcpUser = user;\n  next();\n});\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Ch3>API Key Authentication\u003C\u002Fh3>\n\u003Cp>For service-to-service MCP communication (where no human user is involved), API keys are simpler:\u003C\u002Fp>\n\u003Cpre>\u003Ccode class=\"language-typescript\">const API_KEYS = new Map([\n  [\"sk-prod-abc123\", { name: \"analytics-service\", scopes: 
[\"read\"] }],\n  [\"sk-prod-def456\", { name: \"admin-service\", scopes: [\"read\", \"write\"] }],\n]);\n\nfunction authenticateApiKey(key: string) {\n  return API_KEYS.get(key) || null;\n}\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Ch3>SSO Integration Pattern\u003C\u002Fh3>\n\u003Cp>For enterprise deployments, integrate with existing SSO (SAML, OIDC):\u003C\u002Fp>\n\u003Cpre>\u003Ccode>User -&gt; AI Host -&gt; MCP Client -&gt; MCP Server -&gt; SSO Provider\n                                      |\n                                      v\n                              Validate OIDC token\n                              Extract user roles\n                              Apply tool-level RBAC\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cp>Each MCP tool can check the authenticated user’s roles before executing:\u003C\u002Fp>\n\u003Cpre>\u003Ccode class=\"language-typescript\">server.tool(\n  \"delete_record\",\n  \"Delete a database record by ID\",\n  { table: z.string(), id: z.string() },\n  async ({ table, id }, { authContext }) =&gt; {\n    if (!authContext.roles.includes(\"admin\")) {\n      return {\n        content: [{ type: \"text\", text: \"Forbidden: admin role required\" }],\n        isError: true,\n      };\n    }\n    \u002F\u002F Proceed with deletion\n  }\n);\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Ch2 id=\"scaling-stateful-sessions-vs-load-balancers\">Scaling: Stateful Sessions vs Load Balancers\u003C\u002Fh2>\n\u003Cp>MCP sessions are inherently stateful: the \u003Ccode>initialize\u003C\u002Fcode> handshake negotiates capabilities, and the server may maintain context across tool calls within a session. 
This creates a tension with horizontal scaling.\u003C\u002Fp>\n\u003Ch3>Session Store Architecture\u003C\u002Fh3>\n\u003Cp>Extract session state from the server process into an external store:\u003C\u002Fp>\n\u003Cpre>\u003Ccode class=\"language-typescript\">interface McpSession {\n  id: string;\n  userId: string;\n  capabilities: ServerCapabilities;\n  createdAt: Date;\n  lastActivityAt: Date;\n  metadata: Record&lt;string, unknown&gt;;\n}\n\nclass RedisSessionStore {\n  constructor(private redis: Redis) {}\n\n  async create(session: McpSession): Promise&lt;void&gt; {\n    await this.redis.set(\n      `mcp:session:${session.id}`,\n      JSON.stringify(session),\n      \"EX\",\n      3600 \u002F\u002F 1 hour TTL\n    );\n  }\n\n  async get(sessionId: string): Promise&lt;McpSession | null&gt; {\n    const data = await this.redis.get(`mcp:session:${sessionId}`);\n    return data ? JSON.parse(data) : null;\n  }\n\n  async touch(sessionId: string): Promise&lt;void&gt; {\n    await this.redis.expire(`mcp:session:${sessionId}`, 3600);\n  }\n}\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Ch3>Horizontal Scaling with Streamable HTTP\u003C\u002Fh3>\n\u003Cp>With externalized sessions and Streamable HTTP transport, you can run multiple MCP server instances behind a standard load balancer:\u003C\u002Fp>\n\u003Cpre>\u003Ccode>                    ┌──────────────┐\n                    │  Load        │\n     MCP Client ──&gt;│  Balancer    │\n                    │  (nginx\u002FALB) │\n                    └──────┬───────┘\n                           │\n              ┌────────────┼────────────┐\n              v            v            v\n        ┌──────────┐ ┌──────────┐ ┌──────────┐\n        │ MCP      │ │ MCP      │ │ MCP      │\n        │ Server 1 │ │ Server 2 │ │ Server 3 │\n        └────┬─────┘ └────┬─────┘ └────┬─────┘\n             │             │             │\n             v             v             v\n        ┌────────────────────────────────────┐\n        │          Redis 
(sessions)          │\n        └────────────────────────────────────┘\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cp>No session-sticky routing is required. Any server instance can handle any request by loading the session from Redis.\u003C\u002Fp>\n\u003Ch3>Auto-Scaling Configuration\u003C\u002Fh3>\n\u003Cp>Example Kubernetes HPA for MCP servers:\u003C\u002Fp>\n\u003Cpre>\u003Ccode class=\"language-yaml\">apiVersion: autoscaling\u002Fv2\nkind: HorizontalPodAutoscaler\nmetadata:\n  name: mcp-server\nspec:\n  scaleTargetRef:\n    apiVersion: apps\u002Fv1\n    kind: Deployment\n    name: mcp-server\n  minReplicas: 2\n  maxReplicas: 20\n  metrics:\n    - type: Resource\n      resource:\n        name: cpu\n        target:\n          type: Utilization\n          averageUtilization: 70\n    - type: Pods\n      pods:\n        metric:\n          name: mcp_active_sessions\n        target:\n          type: AverageValue\n          averageValue: \"100\"\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Ch2 id=\"audit-trails-and-logging\">Audit Trails and Logging\u003C\u002Fh2>\n\u003Cp>Every MCP tool invocation in production must be logged for security, compliance, and debugging.\u003C\u002Fp>\n\u003Ch3>Structured Logging Schema\u003C\u002Fh3>\n\u003Cpre>\u003Ccode class=\"language-typescript\">interface McpAuditLog {\n  timestamp: string;\n  sessionId: string;\n  userId: string;\n  toolName: string;\n  toolInput: Record&lt;string, unknown&gt;;\n  toolOutput: string;\n  durationMs: number;\n  success: boolean;\n  errorMessage?: string;\n  ipAddress: string;\n  userAgent: string;\n}\n\n\u002F\u002F Middleware that wraps every tool call\nfunction auditMiddleware(server: McpServer, logger: Logger) {\n  const originalTool = server.tool.bind(server);\n\n  server.tool = (name, description, schema, handler) =&gt; {\n    return originalTool(name, description, schema, async (args, context) =&gt; {\n      const start = Date.now();\n      try {\n        const result = await handler(args, context);\n        
logger.info({\n          event: \"mcp_tool_call\",\n          tool: name,\n          input: args,\n          durationMs: Date.now() - start,\n          success: !result.isError,\n          sessionId: context.sessionId,\n          userId: context.authContext?.userId,\n        });\n        return result;\n      } catch (error) {\n        logger.error({\n          event: \"mcp_tool_error\",\n          tool: name,\n          input: args,\n          error: (error as Error).message,\n          durationMs: Date.now() - start,\n          sessionId: context.sessionId,\n        });\n        throw error;\n      }\n    });\n  };\n}\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Ch3>Compliance Considerations\u003C\u002Fh3>\n\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Requirement\u003C\u002Fth>\u003Cth>Implementation\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\n\u003Ctr>\u003Ctd>GDPR data access\u003C\u002Ftd>\u003Ctd>Log which user data the AI accessed via MCP tools\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>SOC 2 audit trail\u003C\u002Ftd>\u003Ctd>Immutable logs of all tool invocations with timestamps\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>PII protection\u003C\u002Ftd>\u003Ctd>Redact sensitive fields in tool inputs before logging\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Data retention\u003C\u002Ftd>\u003Ctd>Set log TTLs matching your compliance requirements\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Access review\u003C\u002Ftd>\u003Ctd>Log authentication events and permission checks\u003C\u002Ftd>\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Ch2 id=\"gateway-patterns-for-multi-server-deployments\">Gateway Patterns for Multi-Server Deployments\u003C\u002Fh2>\n\u003Cp>Enterprise environments often need dozens of MCP servers — one for each internal system. 
An \u003Cstrong>MCP Gateway\u003C\u002Fstrong> centralizes management:\u003C\u002Fp>\n\u003Cpre>\u003Ccode>                          ┌──────────────┐\n                          │  MCP Gateway │\n        MCP Client ──────&gt;│              │\n                          │  - Auth      │\n                          │  - Routing   │\n                          │  - Rate limit│\n                          │  - Audit     │\n                          └──────┬───────┘\n                                 │\n              ┌──────────────────┼──────────────────┐\n              v                  v                  v\n        ┌──────────┐     ┌──────────┐      ┌──────────┐\n        │ Jira MCP │     │ Slack MCP│      │ DB MCP   │\n        │ Server   │     │ Server   │      │ Server   │\n        └──────────┘     └──────────┘      └──────────┘\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cp>The gateway provides:\u003C\u002Fp>\n\u003Col>\n\u003Cli>\u003Cstrong>Unified authentication\u003C\u002Fstrong>: One auth flow for all backend servers.\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Tool namespace routing\u003C\u002Fstrong>: Tools are prefixed with server names (\u003Ccode>jira.create_issue\u003C\u002Fcode>, \u003Ccode>slack.send_message\u003C\u002Fcode>).\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Centralized rate limiting\u003C\u002Fstrong>: Per-user, per-tool, and per-server rate limits.\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Request routing\u003C\u002Fstrong>: Route tool calls to the appropriate backend server.\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Circuit breaking\u003C\u002Fstrong>: If a backend server is unhealthy, the gateway returns graceful errors instead of hanging.\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Ch3>Implementing a Basic Gateway\u003C\u002Fh3>\n\u003Cpre>\u003Ccode class=\"language-typescript\">class McpGateway {\n  private backends = new Map&lt;string, McpClient&gt;();\n\n  async registerBackend(prefix: string, url: string) {\n    const client = new McpClient({ name: 
`gateway-${prefix}`, version: \"1.0.0\" });\n    const transport = new StreamableHTTPClientTransport(new URL(url));\n    await client.connect(transport);\n    this.backends.set(prefix, client);\n  }\n\n  async routeToolCall(\n    toolName: string,\n    args: Record&lt;string, unknown&gt;\n  ) {\n    \u002F\u002F toolName format: \"prefix.actualTool\"\n    const [prefix, ...rest] = toolName.split(\".\");\n    const actualTool = rest.join(\".\");\n    const backend = this.backends.get(prefix);\n\n    if (!backend) {\n      throw new Error(`Unknown backend: ${prefix}`);\n    }\n\n    return backend.callTool({ name: actualTool, arguments: args });\n  }\n}\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Ch2 id=\"faq\">FAQ\u003C\u002Fh2>\n\u003Cp>\u003Cstrong>Q: Should I use stdio or HTTP transport in production?\u003C\u002Fstrong>\nA: Use Streamable HTTP for any deployment that needs to scale beyond a single machine. Use stdio only for local desktop integrations where the AI host and MCP server run on the same machine.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Q: How many concurrent sessions can one MCP server handle?\u003C\u002Fstrong>\nA: With Streamable HTTP and externalized sessions, a single Node.js server instance typically handles 500-1,000 concurrent sessions. With Rust or Go servers, 5,000-10,000 is achievable. Scale horizontally beyond that.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Q: How do I handle MCP server downtime?\u003C\u002Fstrong>\nA: Implement health checks (\u003Ccode>\u002Fhealth\u003C\u002Fcode> endpoint), use Kubernetes liveness\u002Freadiness probes, and configure your gateway with circuit breakers. The MCP client will receive connection errors that the AI host can present as “tool temporarily unavailable.”\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Q: Can MCP servers call other MCP servers?\u003C\u002Fstrong>\nA: Yes. An MCP server can also be an MCP client, creating a chain. This is the foundation of agent-to-agent communication. 
Use this pattern carefully — deep chains increase latency and failure probability.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Q: How do I version my MCP server API?\u003C\u002Fstrong>\nA: Use the server version field in the \u003Ccode>initialize\u003C\u002Fcode> response. For breaking changes (removing tools, changing schemas), bump the major version and support both old and new versions during a migration period. The MCP protocol version itself is negotiated during initialization.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Q: What is the maximum message size for MCP?\u003C\u002Fstrong>\nA: The protocol does not define a maximum, but practical limits depend on the transport. For Streamable HTTP, keep responses under 10 MB. For stdio, system pipe buffers (typically 64 KB) may require chunking for large responses.\u003C\u002Fp>\n","en","b0000000-0000-0000-0000-000000000001",true,"2026-03-28T10:44:33.226940Z","MCP in Production: Transport, Auth, and Scaling Challenges","Technical guide for running MCP servers in production. Covers transport selection, OAuth authentication, horizontal scaling with Redis sessions, audit logging, and gateway patterns.","mcp production scaling",null,"index, follow",[22,27,31],{"id":23,"name":24,"slug":25,"created_at":26},"c0000000-0000-0000-0000-000000000008","AI","ai","2026-03-28T10:44:21.513630Z",{"id":28,"name":29,"slug":30,"created_at":26},"c0000000-0000-0000-0000-000000000012","DevOps","devops",{"id":32,"name":33,"slug":34,"created_at":26},"c0000000-0000-0000-0000-000000000013","Security","security","Engineering",[37,43,49],{"id":38,"title":39,"slug":40,"excerpt":41,"locale":12,"category_name":35,"published_at":42},"d0200000-0000-0000-0000-000000000003","Why Bali Is Becoming Southeast Asia's Impact-Tech Hub in 2026","why-bali-becoming-southeast-asia-impact-tech-hub-2026","Bali ranks #16 among Southeast Asian startup ecosystems. 
With a growing concentration of Web3 builders, AI sustainability startups, and eco-travel tech companies, the island is carving a niche as the region's impact-tech capital.","2026-03-28T10:44:37.748283Z",{"id":44,"title":45,"slug":46,"excerpt":47,"locale":12,"category_name":35,"published_at":48},"d0200000-0000-0000-0000-000000000002","ASEAN Data Protection Patchwork: A Developer's Compliance Checklist","asean-data-protection-patchwork-developer-compliance-checklist","Seven ASEAN countries now have comprehensive data protection laws, each with different consent models, localization requirements, and penalty structures. Here is a practical compliance checklist for developers building multi-country applications.","2026-03-28T10:44:37.374741Z",{"id":50,"title":51,"slug":52,"excerpt":53,"locale":12,"category_name":35,"published_at":54},"d0200000-0000-0000-0000-000000000001","Indonesia's $29 Billion Digital Transformation: Opportunities for Software Companies","indonesia-29-billion-digital-transformation-opportunities-software-companies","Indonesia's IT services market is projected to reach $29.03 billion in 2026, up from $24.37 billion in 2025. Cloud infrastructure, AI, e-commerce, and data centers are driving the fastest growth in Southeast Asia.","2026-03-28T10:44:37.349311Z",{"id":13,"name":56,"slug":57,"bio":58,"photo_url":19,"linkedin":19,"role":59,"created_at":60,"updated_at":60},"Open Soft Team","open-soft-team","The engineering team at Open Soft, building premium software solutions from Bali, Indonesia.","Engineering Team","2026-03-28T08:31:22.226811Z"]