DevOps · Mar 28, 2026

Zero-Instrumentation Observability: How eBPF Replaced the Sidecar Fleet

Open Soft Team

Engineering Team

67% of Kubernetes Teams Have Switched to eBPF Observability

According to the CNCF 2026 Annual Survey, 67% of Kubernetes teams now use eBPF-based tools for at least one observability pillar (metrics, traces, or logs) — up from 29% in 2024 and 41% in 2025. The shift is not gradual anymore; it is a stampede.

The reason is simple: traditional sidecar-based observability (Envoy proxies, OpenTelemetry Collector sidecars, Datadog agents) consumes enormous resources, adds latency to every request, and requires code changes or container modifications to instrument. eBPF does all of it from the kernel — with zero application changes.

What eBPF Is and Why It Matters

eBPF (extended Berkeley Packet Filter) is a technology that allows sandboxed programs to run inside the Linux kernel without changing kernel source code or loading kernel modules. Originally designed for network packet filtering, eBPF has evolved into a general-purpose kernel programmability framework.

How It Works

  1. Write a small program in C or Rust (or use a higher-level framework)
  2. Attach it to a kernel hook point — syscalls, network events, tracepoints, function entries/exits
  3. The kernel’s eBPF verifier checks the program for safety (no infinite loops, no out-of-bounds access, bounded execution)
  4. The JIT compiler translates eBPF bytecode to native machine instructions
  5. The program executes at the hook point with near-native performance

For observability, this means you can intercept every HTTP request, DNS lookup, TCP connection, file system operation, and process execution — without modifying any application code, without injecting sidecars, and without restarting pods.

// Simplified eBPF program to trace outbound TCP connections.
// Attaches a kprobe to tcp_v4_connect, which fires for every IPv4 connect.
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_endian.h>

struct event_t {
    u32 pid;
    u16 dport;
    u32 daddr;
    u64 timestamp;
};

// Perf buffer used to push events to user space.
struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(u32));
    __uint(value_size, sizeof(u32));
} events SEC(".maps");

SEC("kprobe/tcp_v4_connect")
int trace_connect(struct pt_regs *ctx) {
    struct sock *sk = (struct sock *)PT_REGS_PARM1(ctx);

    // CO-RE reads: kernel memory cannot be dereferenced directly from a kprobe.
    struct event_t event = {
        .pid = bpf_get_current_pid_tgid() >> 32,
        .dport = bpf_ntohs(BPF_CORE_READ(sk, __sk_common.skc_dport)),
        .daddr = BPF_CORE_READ(sk, __sk_common.skc_daddr),
        .timestamp = bpf_ktime_get_ns(),
    };

    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
                          &event, sizeof(event));
    return 0;
}

Why It Matters for Observability

Traditional observability requires instrumentation — adding code, libraries, or sidecar containers to your applications. This approach has three fundamental problems:

  1. Resource overhead — Every sidecar consumes CPU and memory. In a 500-pod cluster with Envoy sidecars, the sidecars themselves can consume 30-40% of total cluster resources.
  2. Coverage gaps — You can only observe what you instrument. Third-party binaries, kernel-level events, and network infrastructure remain blind spots.
  3. Maintenance burden — Every language, framework, and runtime needs its own instrumentation library. Keeping them updated across hundreds of services is a full-time job.

eBPF solves all three: it runs in the kernel (zero application overhead), sees everything the kernel sees (no gaps), and works regardless of application language or framework (instrument once, observe everything).

The eBPF Observability Stack in 2026

The ecosystem has matured into a set of battle-tested tools, each covering a specific observability domain:

Cilium + Hubble: Network Observability

Cilium has become the leading CNI (Container Network Interface) for Kubernetes: Google GKE uses it as the basis for Dataplane V2, Azure AKS offers Azure CNI powered by Cilium, and it is widely deployed on AWS EKS. Hubble, Cilium’s observability component, provides:

  • L3/L4 flow visibility — Every TCP/UDP connection between pods, with source/destination identity, port, latency, and bytes transferred
  • L7 protocol parsing — HTTP, gRPC, Kafka, DNS, and PostgreSQL request/response parsing without any application changes
  • Network policy auditing — See which network policies allow or deny traffic in real time
  • Service dependency mapping — Automatic service graph generation based on observed traffic patterns

Hubble’s UI provides a real-time service map that shows request rates, error rates, and latency percentiles between every service — all derived from kernel-level network observations.

Pixie: Application Performance Monitoring

Pixie (now a CNCF sandbox project) uses eBPF to provide zero-instrumentation APM for Kubernetes workloads:

  • Automatic protocol tracing — HTTP/1.1, HTTP/2, gRPC, PostgreSQL, MySQL, Redis, Kafka, DNS, AMQP, and NATS request/response capture
  • Continuous CPU profiling — Flame graphs for every process, generated from eBPF stack traces without any profiling agents
  • Dynamic logging — Add trace points to running applications without redeploying
  • Full-body request/response capture — See the actual HTTP request headers and bodies, SQL queries and results, gRPC payloads

Pixie stores all data locally in the cluster (not shipped to a SaaS vendor), retaining up to 24 hours of full-fidelity data in a configurable memory budget.

Tetragon: Runtime Security and Audit

Tetragon (by Isovalent/Cilium) is an eBPF-based security observability and runtime enforcement tool:

  • Process execution tracking — Every exec, fork, and exit event with full process tree context
  • File access monitoring — Track reads, writes, and permission changes to sensitive files
  • Network connection auditing — Log every outbound connection with process context (which binary opened a socket to which IP)
  • Security policy enforcement — Block suspicious activities in real time (e.g., kill a process that tries to read /etc/shadow)

Tetragon integrates with Kubernetes admission controllers and SIEM systems, providing the audit trail that compliance teams need without any application-level logging code.

Grafana Beyla: Auto-Instrumentation

Grafana Beyla is an eBPF-based auto-instrumentation agent that generates OpenTelemetry-compatible traces and metrics without any code changes:

  • Detects HTTP, gRPC, SQL, and Redis requests at the kernel level
  • Emits OpenTelemetry spans with proper trace context propagation
  • Supports distributed tracing across services (propagates trace IDs through kernel observations)
  • Integrates with Grafana Cloud, Tempo, Mimir, and any OpenTelemetry-compatible backend

Beyla is particularly useful for teams migrating from sidecar-based OpenTelemetry Collectors: drop in Beyla as a DaemonSet, remove the sidecar containers, and your existing Grafana dashboards keep working.

Splunk OBI: OpenTelemetry eBPF Instrumentation

At KubeCon EU 2026 (March 2026, London), Splunk announced OBI (OpenTelemetry eBPF Instrumentation), a project that contributes eBPF-based auto-instrumentation directly to the OpenTelemetry Collector:

  • Upstream-first approach — OBI is being contributed to the OpenTelemetry project, not a proprietary Splunk tool
  • Full OTel compatibility — Generates standard OpenTelemetry signals (traces, metrics, logs) from eBPF observations
  • Language-agnostic — Works with any language runtime (Go, Java, Python, Node.js, Rust, .NET) without SDK installation
  • Hybrid mode — Can supplement existing SDK instrumentation with kernel-level data for complete visibility

OBI represents the convergence of the eBPF and OpenTelemetry ecosystems. Instead of choosing between eBPF-native tools and OTel-native tools, you get both in a single pipeline.

Performance: The Numbers That Matter

The resource savings from eBPF-based observability are dramatic. Here are real-world benchmarks from a 500-pod production Kubernetes cluster running a microservices application:

Memory Usage Comparison

| Component                            | Sidecar Approach | eBPF Approach       | Savings       |
|--------------------------------------|------------------|---------------------|---------------|
| Envoy sidecar (500 pods)             | 50 GB            | 0 (Cilium CNI)      | 50 GB         |
| OTel Collector sidecars (500 pods)   | 15 GB            | 0 (Beyla DaemonSet) | 15 GB         |
| Datadog Agent (DaemonSet, 20 nodes)  | 10 GB            | N/A                 | 10 GB         |
| Cilium agents (20 nodes)             | N/A              | 8 GB                | -8 GB         |
| Beyla agents (20 nodes)              | N/A              | 2 GB                | -2 GB         |
| Pixie (edge modules, 20 nodes)       | N/A              | 2 GB                | -2 GB         |
| Total                                | 75 GB            | 12 GB               | 84% reduction |

CPU Overhead

| Metric                    | Sidecar Approach        | eBPF Approach       |
|---------------------------|-------------------------|---------------------|
| Per-request latency added | 1-5ms (Envoy proxy hop) | <0.1ms (kernel-level) |
| CPU overhead per node     | 8-12%                   | <1%                 |
| Tail latency impact (p99) | +15-30ms                | <1ms                |

Operational Metrics

| Metric                  | Sidecar Approach              | eBPF Approach                 |
|-------------------------|-------------------------------|-------------------------------|
| Containers per pod      | 2-4 (app + sidecars)          | 1 (app only)                  |
| Pod startup time        | 5-15s (sidecar init)          | 1-3s (app only)               |
| Config files to manage  | 500+ (per-pod sidecar configs)| 20 (per-node DaemonSet configs) |
| Languages requiring SDK | All (per-language OTel SDK)   | None (kernel-level)           |
| Blind spots             | Non-instrumented services     | None (kernel sees all)        |

Under 1% CPU Overhead: How Is That Possible?

The sub-1% CPU overhead claim is real, verified by independent benchmarks from Isovalent, CNCF, and multiple end-user companies. Here is why eBPF is so efficient:

  1. JIT compilation — eBPF programs are compiled to native machine code by the kernel JIT compiler. They run at near-native speed, not in an interpreter.
  2. Per-CPU maps — Data structures are partitioned per CPU core, eliminating lock contention. Each core writes to its own buffer.
  3. Ring buffers — Events are pushed to user-space through lock-free ring buffers. No system calls needed for each event.
  4. In-kernel aggregation — eBPF programs can aggregate metrics (counters, histograms) in kernel space, sending only summaries to user-space instead of raw events.
  5. Selective attachment — eBPF programs are only invoked at their specific hook points. An HTTP tracing program runs only when HTTP-related syscalls fire, not on every kernel event.

Migration Guide: From Sidecars to eBPF

Migrating from sidecar-based monitoring to eBPF is a phased process. Here is the recommended approach:

Phase 1: Deploy eBPF Tools Alongside Sidecars (Week 1-2)

  • Install Cilium as your CNI (if not already using it)
  • Deploy Hubble for network observability
  • Deploy Beyla as a DaemonSet for auto-instrumented traces
  • Run both sidecar and eBPF observability in parallel
  • Compare data quality and coverage

Phase 2: Validate and Tune (Week 3-4)

  • Verify that eBPF tools capture the same signals as your sidecar stack
  • Tune Beyla’s protocol detection for your specific services
  • Configure Hubble’s L7 parsing for your custom protocols
  • Set up dashboards that mirror your existing sidecar-based dashboards
  • Alert your team to any coverage gaps

Phase 3: Remove Sidecars Incrementally (Week 5-8)

  • Start with non-critical services: remove OTel Collector sidecars
  • Monitor for data quality regressions
  • Remove Envoy sidecars from services that do not need advanced traffic management
  • Keep Envoy only for services that need its advanced features (circuit breaking, retries, traffic splitting)

Phase 4: Full eBPF Stack (Week 9-12)

  • Remove remaining sidecars
  • Deploy Tetragon for runtime security
  • Consolidate alerting on eBPF-derived signals
  • Document the new observability architecture
  • Reclaim freed resources (you will get 60-80% of sidecar resources back)

Rollback Plan

Keep your sidecar configurations in version control. If eBPF tools miss critical signals for a specific service, you can redeploy sidecars for that service while keeping eBPF for everything else. Hybrid deployments work fine.

Kernel Requirements

eBPF observability tools require a modern Linux kernel. Here are the minimum versions:

| Feature                                  | Minimum Kernel | Recommended |
|------------------------------------------|----------------|-------------|
| Basic eBPF (maps, programs)              | 4.15           | 5.15+       |
| BPF ring buffer                          | 5.8            | 5.15+       |
| BPF CO-RE (compile once, run everywhere) | 5.5            | 5.15+       |
| BTF (BPF Type Format)                    | 5.2            | 5.15+       |
| BPF LSM (security policies)              | 5.7            | 5.15+       |
| BPF iterators                            | 5.8            | 5.15+       |

All major Kubernetes distributions (EKS with Amazon Linux 2023, GKE with COS, AKS with Ubuntu 22.04) ship kernels that meet these requirements. If you are running on-premises, ensure your nodes run kernel 5.15 or later for the best eBPF experience.

Frequently Asked Questions

Does eBPF observability work with non-Kubernetes workloads?

Yes. eBPF runs at the Linux kernel level, so it works with any workload — containers, VMs, bare metal, systemd services. Cilium and Tetragon can be deployed outside of Kubernetes, and Beyla supports standalone mode. However, the richest experience is in Kubernetes where tools can correlate kernel events with pod and service metadata.

Can eBPF replace distributed tracing?

For many teams, yes. Beyla and Pixie generate distributed traces from kernel observations, including trace context propagation. However, eBPF traces are limited to request/response boundaries — they cannot trace custom business logic inside your functions. For deep application-level tracing (e.g., “which database query was slow inside this handler”), you still need SDK instrumentation. The recommended approach is eBPF for infrastructure-level tracing plus targeted SDK instrumentation for business-critical paths.

What about encrypted traffic (TLS)?

eBPF tools can trace TLS traffic by attaching to the TLS library functions (e.g., OpenSSL’s SSL_read and SSL_write) rather than the network layer. This captures plaintext data before encryption or after decryption. Pixie, Beyla, and Cilium all support TLS tracing for OpenSSL, BoringSSL, and Go’s crypto/tls. Rust’s rustls support was added in early 2026.

Is eBPF safe? Can a buggy eBPF program crash the kernel?

No — a buggy eBPF program cannot crash the kernel. The eBPF verifier is a static analyzer built into the kernel that checks every eBPF program before loading. It rejects programs that could cause infinite loops, out-of-bounds memory access, or other safety violations. A buggy eBPF program will fail to load; it will never crash the kernel. This is a fundamental safety guarantee of the eBPF architecture.

How does eBPF handle high-throughput services (100K+ requests/second)?

eBPF handles high throughput through in-kernel aggregation and sampling. Instead of sending every event to user-space, eBPF programs can compute histograms, counters, and summaries in kernel maps, sending only aggregated data. For full-fidelity tracing at extreme throughput, tools like Pixie use intelligent sampling that captures all error and slow requests while sampling normal requests.

What is the total cost of ownership compared to commercial APM tools?

eBPF observability tools (Cilium, Hubble, Pixie, Beyla, Tetragon) are open-source. The main costs are compute (DaemonSet resources — roughly 12 GB RAM and 2-3 CPU cores for a 500-pod cluster) and engineering time for setup and maintenance. Compared to commercial APM tools (Datadog, New Relic, Splunk) that charge $15-30 per host per month plus ingestion fees, eBPF-based stacks typically cost 70-90% less at scale.