Engineering · Mar 28, 2026

MCP in Production: Solving Transport, Auth, and Scaling Challenges

Open Soft Team, Engineering Team

From Prototype to Production: What Changes

Building an MCP server that works on your laptop is straightforward. Running one that handles thousands of concurrent AI agent sessions across a distributed infrastructure is a different engineering challenge entirely. Production MCP deployments must address five concerns that prototypes can ignore: transport scalability, authentication and authorization, session management at scale, audit trails, and multi-server orchestration.

This article is a technical guide for engineering teams moving MCP servers from development to production. We assume you have built at least one MCP server and understand the basics of the protocol. If not, start with our companion article on building your first MCP server.

Transport Scalability: stdio vs SSE vs Streamable HTTP

MCP defines three transport mechanisms. Choosing the right one for production is your first architectural decision.

stdio Transport

The stdio transport communicates via standard input/output streams. The host application spawns the MCP server as a child process and exchanges JSON-RPC messages through stdin/stdout.

Advantages:

  • Zero network configuration
  • Process-level isolation
  • No port conflicts
  • Lowest latency (no network stack)

Limitations:

  • Server must run on the same machine as the host
  • One server process per client session
  • Cannot be load-balanced
  • No horizontal scaling

Best for: Local development tools, IDE extensions, single-user desktop applications.
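On the wire, stdio transport messages are newline-delimited JSON-RPC 2.0: one JSON object per line on stdin/stdout. A minimal framing sketch (the official SDKs handle this for you; the helper names here are ours):

```typescript
// Frame and parse newline-delimited JSON-RPC messages as the stdio
// transport exchanges them. A sketch, not the SDK implementation.
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number;
  method?: string;
  params?: Record<string, unknown>;
}

// Serialize one message for writing to stdout: JSON plus newline delimiter.
function frame(msg: JsonRpcMessage): string {
  return JSON.stringify(msg) + "\n";
}

// Split a buffered stdin chunk into complete messages; the trailing
// partial line (if any) is returned so the caller can keep buffering it.
function parseChunk(chunk: string): { messages: JsonRpcMessage[]; rest: string } {
  const lines = chunk.split("\n");
  const rest = lines.pop() ?? "";
  return { messages: lines.filter(Boolean).map((l) => JSON.parse(l)), rest };
}
```

Because framing is just a newline, a single malformed line (for example, a stray console.log in the server) can corrupt the stream, which is why stdio servers must never write anything but protocol messages to stdout.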

SSE (Server-Sent Events) Transport

SSE transport uses HTTP for client-to-server messages and Server-Sent Events for server-to-client messages. The server runs as an HTTP service.

Advantages:

  • Network-accessible (remote servers)
  • Compatible with existing HTTP infrastructure
  • Supports multiple concurrent clients
  • Works through firewalls and proxies

Limitations:

  • Unidirectional streaming (server-to-client only via SSE)
  • Session affinity required (stateful connections)
  • Some load balancers struggle with long-lived SSE connections
  • No built-in reconnection semantics in the protocol

Best for: Small to medium deployments, internal tools, teams with existing HTTP infrastructure.

Streamable HTTP Transport

Streamable HTTP is the newest transport, designed specifically for production deployments. It uses standard HTTP POST for all messages, with optional SSE streaming for long-running operations.

Advantages:

  • Fully stateless request/response model
  • Works with any HTTP load balancer
  • Built-in session management via Mcp-Session-Id header
  • Supports both streaming and non-streaming responses
  • CDN and proxy compatible

Limitations:

  • Requires server-side session storage (Redis, database)
  • Slightly higher per-message overhead than stdio
  • Newer transport — less ecosystem tooling

Best for: Production cloud deployments, multi-tenant platforms, enterprise environments.

Transport Comparison Matrix

| Feature | stdio | SSE | Streamable HTTP |
|---|---|---|---|
| Network access | Local only | Remote | Remote |
| Load balancing | Not possible | Session-sticky | Standard HTTP LB |
| Horizontal scaling | No | Limited | Yes |
| Firewall-friendly | N/A | Yes | Yes |
| Reconnection | N/A | Manual | Built-in |
| Concurrent clients | 1 | Many | Many |
| Latency | Lowest | Low | Low |
| Statelessness | Stateful | Stateful | Stateless possible |
| Production readiness | Dev only | Medium | High |

Authentication: SSO Integration, API Keys, and OAuth

MCP servers in production must authenticate both the AI host application and the end user on whose behalf the AI is acting.

OAuth 2.0 for Remote Servers

The MCP specification includes a built-in OAuth 2.0 flow for remote (HTTP-based) servers. The flow works as follows:

  1. The MCP client sends an initialize request without credentials.
  2. The server responds with HTTP 401 and a WWW-Authenticate header pointing to its OAuth authorization endpoint.
  3. The host application opens a browser for user authentication.
  4. After successful auth, the host receives an access token.
  5. Subsequent MCP requests include the token in the Authorization: Bearer header.
// Server-side OAuth middleware for Express (Streamable HTTP transport).
// The MCP transport handler is mounted on /mcp after this middleware;
// validateToken is an assumed helper that checks the bearer token
// against your auth provider and returns the user, or null.
import express from "express";

const app = express();

app.use("/mcp", async (req, res, next) => {
  const token = req.headers.authorization?.replace("Bearer ", "");
  if (!token) {
    res.status(401).json({
      error: "unauthorized",
      oauth_url: "https://auth.example.com/oauth/authorize",
    });
    return;
  }

  // Validate token against your auth provider
  const user = await validateToken(token);
  if (!user) {
    res.status(403).json({ error: "invalid_token" });
    return;
  }

  // Attach user context for tool authorization (augment Express's
  // Request type in a declaration file for strict TypeScript).
  (req as any).mcpUser = user;
  next();
});

API Key Authentication

For service-to-service MCP communication (where no human user is involved), API keys are simpler:

// Illustrative only: in production, load keys from a secret store
// rather than hard-coding them, and never write them to logs.
const API_KEYS = new Map([
  ["sk-prod-abc123", { name: "analytics-service", scopes: ["read"] }],
  ["sk-prod-def456", { name: "admin-service", scopes: ["read", "write"] }],
]);

function authenticateApiKey(key: string) {
  return API_KEYS.get(key) || null;
}
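One caveat: if you ever compare a presented key against a stored secret directly (rather than using it as a map lookup), use a constant-time comparison so an attacker cannot learn key prefixes through timing. A sketch using Node's crypto module (the helper name is ours):

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Compare a presented API key against a stored one without leaking
// length or prefix information through timing. Hashing both sides
// first gives timingSafeEqual the equal-length buffers it requires.
function safeKeyEquals(presented: string, stored: string): boolean {
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(stored).digest();
  return timingSafeEqual(a, b);
}
```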

SSO Integration Pattern

For enterprise deployments, integrate with existing SSO (SAML, OIDC):

User -> AI Host -> MCP Client -> MCP Server -> SSO Provider
                                      |
                                      v
                              Validate OIDC token
                              Extract user roles
                              Apply tool-level RBAC

Each MCP tool can check the authenticated user’s roles before executing:

import { z } from "zod";

server.tool(
  "delete_record",
  "Delete a database record by ID",
  { table: z.string(), id: z.string() },
  async ({ table, id }, { authContext }) => {
    if (!authContext.roles.includes("admin")) {
      return {
        content: [{ type: "text", text: "Forbidden: admin role required" }],
        isError: true,
      };
    }
    // Proceed with deletion
  }
);

Scaling: Stateful Sessions vs Load Balancers

MCP sessions are inherently stateful: the initialize handshake negotiates capabilities, and the server may maintain context across tool calls within a session. This creates a tension with horizontal scaling.

Session Store Architecture

Extract session state from the server process into an external store:

import Redis from "ioredis";

interface McpSession {
  id: string;
  userId: string;
  capabilities: ServerCapabilities;
  createdAt: Date;
  lastActivityAt: Date;
  metadata: Record<string, unknown>;
}

class RedisSessionStore {
  constructor(private redis: Redis) {}

  async create(session: McpSession): Promise<void> {
    await this.redis.set(
      `mcp:session:${session.id}`,
      JSON.stringify(session),
      "EX",
      3600 // 1 hour TTL
    );
  }

  async get(sessionId: string): Promise<McpSession | null> {
    const data = await this.redis.get(`mcp:session:${sessionId}`);
    return data ? JSON.parse(data) : null;
  }

  async touch(sessionId: string): Promise<void> {
    await this.redis.expire(`mcp:session:${sessionId}`, 3600);
  }
}

Horizontal Scaling with Streamable HTTP

With externalized sessions and Streamable HTTP transport, you can run multiple MCP server instances behind a standard load balancer:

                    ┌──────────────┐
                    │  Load        │
      MCP Client ──>│  Balancer    │
                    │  (nginx/ALB) │
                    └──────┬───────┘
                           │
              ┌────────────┼────────────┐
              v            v            v
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │ MCP      │ │ MCP      │ │ MCP      │
        │ Server 1 │ │ Server 2 │ │ Server 3 │
        └────┬─────┘ └────┬─────┘ └────┬─────┘
             │            │            │
             v            v            v
        ┌────────────────────────────────────┐
        │          Redis (sessions)          │
        └────────────────────────────────────┘

No session-sticky routing is required. Any server instance can handle any request by loading the session from Redis.
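Because no stickiness is needed, the load balancer configuration stays trivial. A hypothetical nginx upstream for the diagram above (hostnames, port, and timeouts are placeholders):

```nginx
upstream mcp_backend {
    # Plain round-robin across stateless MCP server instances
    server mcp-1.internal:3000;
    server mcp-2.internal:3000;
    server mcp-3.internal:3000;
}

server {
    listen 443 ssl;

    location /mcp {
        proxy_pass http://mcp_backend;
        proxy_http_version 1.1;
        # Disable buffering so optional SSE streams flush promptly
        proxy_buffering off;
        # Long-running tool calls may stream for minutes
        proxy_read_timeout 300s;
    }
}
```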

Auto-Scaling Configuration

Example Kubernetes HPA for MCP servers:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: mcp_active_sessions
        target:
          type: AverageValue
          averageValue: "100"

Audit Trails and Logging

Every MCP tool invocation in production must be logged for security, compliance, and debugging.

Structured Logging Schema

interface McpAuditLog {
  timestamp: string;
  sessionId: string;
  userId: string;
  toolName: string;
  toolInput: Record<string, unknown>;
  toolOutput: string;
  durationMs: number;
  success: boolean;
  errorMessage?: string;
  ipAddress: string;
  userAgent: string;
}

// Middleware that wraps every tool call
function auditMiddleware(server: McpServer, logger: Logger) {
  const originalTool = server.tool.bind(server);

  // Cast needed because we replace the typed overloads wholesale.
  server.tool = ((name, description, schema, handler) => {
    return originalTool(name, description, schema, async (args, context) => {
      const start = Date.now();
      try {
        const result = await handler(args, context);
        logger.info({
          event: "mcp_tool_call",
          tool: name,
          input: args,
          durationMs: Date.now() - start,
          success: !result.isError,
          sessionId: context.sessionId,
          userId: context.authContext?.userId,
        });
        return result;
      } catch (error) {
        logger.error({
          event: "mcp_tool_error",
          tool: name,
          input: args,
          error: (error as Error).message,
          durationMs: Date.now() - start,
          sessionId: context.sessionId,
        });
        throw error;
      }
    });
  }) as typeof server.tool;
}

Compliance Considerations

| Requirement | Implementation |
|---|---|
| GDPR data access | Log which user data the AI accessed via MCP tools |
| SOC 2 audit trail | Immutable logs of all tool invocations with timestamps |
| PII protection | Redact sensitive fields in tool inputs before logging |
| Data retention | Set log TTLs matching your compliance requirements |
| Access review | Log authentication events and permission checks |
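The PII row is worth making concrete. A sketch of redacting sensitive fields before a tool input reaches the audit log (the field list is illustrative; tailor it to your data):

```typescript
// Recursively replace values of known-sensitive keys before logging.
const SENSITIVE_KEYS = new Set(["password", "ssn", "email", "apiKey"]);

function redact(input: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(input)) {
    if (SENSITIVE_KEYS.has(key)) {
      out[key] = "[REDACTED]";
    } else if (value && typeof value === "object" && !Array.isArray(value)) {
      // Recurse into nested objects so deeply nested PII is caught too
      out[key] = redact(value as Record<string, unknown>);
    } else {
      out[key] = value;
    }
  }
  return out;
}
```

Call redact(args) in the audit middleware's logger.info/logger.error payloads instead of logging raw tool inputs.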

Gateway Patterns for Multi-Server Deployments

Enterprise environments often need dozens of MCP servers — one for each internal system. An MCP Gateway centralizes management:

                          ┌──────────────┐
                          │  MCP Gateway │
        MCP Client ──────>│              │
                          │  - Auth      │
                          │  - Routing   │
                          │  - Rate limit│
                          │  - Audit     │
                          └──────┬───────┘
                                 │
              ┌──────────────────┼──────────────────┐
              v                  v                  v
        ┌──────────┐     ┌──────────┐      ┌──────────┐
        │ Jira MCP │     │ Slack MCP│      │ DB MCP   │
        │ Server   │     │ Server   │      │ Server   │
        └──────────┘     └──────────┘      └──────────┘

The gateway provides:

  1. Unified authentication: One auth flow for all backend servers.
  2. Tool namespace routing: Tools are prefixed with server names (jira.create_issue, slack.send_message).
  3. Centralized rate limiting: Per-user, per-tool, and per-server rate limits.
  4. Request routing: Route tool calls to the appropriate backend server.
  5. Circuit breaking: If a backend server is unhealthy, the gateway returns graceful errors instead of hanging.

Implementing a Basic Gateway

// McpClient and StreamableHTTPClientTransport come from the MCP SDK's
// client package.
class McpGateway {
  private backends = new Map<string, McpClient>();

  async registerBackend(prefix: string, url: string) {
    const client = new McpClient({ name: `gateway-${prefix}`, version: "1.0.0" });
    const transport = new StreamableHTTPClientTransport(new URL(url));
    await client.connect(transport);
    this.backends.set(prefix, client);
  }

  async routeToolCall(
    toolName: string,
    args: Record<string, unknown>
  ) {
    // toolName format: "prefix.actualTool"
    const [prefix, ...rest] = toolName.split(".");
    const actualTool = rest.join(".");
    const backend = this.backends.get(prefix);

    if (!backend) {
      throw new Error(`Unknown backend: ${prefix}`);
    }

    return backend.callTool({ name: actualTool, arguments: args });
  }
}
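Circuit breaking (point 5 above) can be layered onto routeToolCall by tracking failures per backend. A minimal sketch (thresholds are arbitrary, and the injectable clock exists only to make the class testable):

```typescript
// Per-backend circuit breaker: open after N consecutive failures,
// allow a probe request after a cooldown. A sketch, not a
// production-grade implementation.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private maxFailures = 5,
    private cooldownMs = 30_000,
    private now: () => number = Date.now
  ) {}

  canRequest(): boolean {
    if (this.failures < this.maxFailures) return true;
    // Half-open: permit one probe once the cooldown has elapsed.
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures === this.maxFailures) this.openedAt = this.now();
  }
}
```

In the gateway, check canRequest() before calling the backend and return a graceful "tool temporarily unavailable" error when the breaker is open, instead of letting the tool call hang.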

FAQ

Q: Should I use stdio or HTTP transport in production? A: Use Streamable HTTP for any deployment that needs to scale beyond a single machine. Use stdio only for local desktop integrations where the AI host and MCP server run on the same machine.

Q: How many concurrent sessions can one MCP server handle? A: With Streamable HTTP and externalized sessions, a single Node.js server instance typically handles 500-1,000 concurrent sessions. With Rust or Go servers, 5,000-10,000 is achievable. Scale horizontally beyond that.

Q: How do I handle MCP server downtime? A: Implement health checks (/health endpoint), use Kubernetes liveness/readiness probes, and configure your gateway with circuit breakers. The MCP client will receive connection errors that the AI host can present as “tool temporarily unavailable.”

Q: Can MCP servers call other MCP servers? A: Yes. An MCP server can also be an MCP client, creating a chain. This is the foundation of agent-to-agent communication. Use this pattern carefully — deep chains increase latency and failure probability.

Q: How do I version my MCP server API? A: Use the server version field in the initialize response. For breaking changes (removing tools, changing schemas), bump the major version and support both old and new versions during a migration period. The MCP protocol version itself is negotiated during initialization.

Q: What is the maximum message size for MCP? A: The protocol does not define a maximum, but practical limits depend on the transport. For Streamable HTTP, keep responses under 10 MB. For stdio, system pipe buffers (typically 64 KB) may require chunking for large responses.