Istio Ambient Mesh Performance Test and Benchmarking

Introduction

Istio is the most popular service mesh, but the DevOps and SRE community constantly complains about its performance.

Istio Ambient is a sidecar-less approach from the Istio community (largely driven by Solo.io) to improve performance. Because Ambient mesh is widely promoted as production-ready, many of our prospects and enterprise customers are eager to try it or migrate to it.

Architecturally, Istio Ambient mesh is a sound design that aims to improve performance. Whether it actually delivers lower latency in practice is still an open question.

We tested Istio Ambient Mesh repeatedly between January 2024 and July 2024, and we have yet to see any significant performance gains.

Below is the lab setup on which we ran our experiments.

Lab setup to load test Istio Ambient Mesh

  1. Load testing tool: Fortio
  2. Application configuration: Bookinfo application
  3. Load profile: 1000 queries per second (QPS) over 10 connections for 30 seconds
  4. Cluster configuration: Azure (AKS) cluster with 3 nodes
  5. Node configuration: 2 vCPUs and 7 GB memory per node
  6. CNI used: Kube CNI and Cilium. (We did not use Flannel because it was not working well with AKS.)
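
The load profile above maps onto Fortio's `load` command and its `-qps`, `-c`, and `-t` flags. The following is a minimal sketch in Python, assuming Fortio is installed where the script runs. The helper names and the target URL in the usage note are hypothetical and are not taken from our lab setup.

```python
import subprocess


def build_fortio_command(url, qps=1000, connections=10, duration="30s"):
    """Return the argv list for a `fortio load` run with the lab-setup parameters."""
    return [
        "fortio", "load",
        "-qps", str(qps),        # target queries per second
        "-c", str(connections),  # number of parallel connections
        "-t", duration,          # how long to run the test
        url,
    ]


def run_load_test(url):
    """Run Fortio and return its combined stdout and stderr.

    Requires the `fortio` binary on PATH; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        build_fortio_command(url),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout + result.stderr
```

For example, `build_fortio_command("http://ratings.example.internal:9080/ratings/1")` (a placeholder URL) yields the equivalent of `fortio load -qps 1000 -c 10 -t 30s <url>`.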

Note:

  1. The application and Fortio run on different nodes.
  2. We exposed the Rating microservice, NOT the Details service, to handle external traffic, because the Details microservice is written in Ruby and is unfit for handling higher QPS. We sent loads of 100 QPS and 1000 QPS to the Details service without Istio: the p99 latency is around 6 ms at 100 QPS, but it rises to about 50 ms at 1000 QPS. Refer to the screenshots below.
Kube CNI + Without Istio sidecar with Details microservice
- 100 QPS

Fig: Performance results of Details service at 1000 QPS with Kube CNI + Without Istio 

Kube CNI + With Istio sidecar with Details microservice
- 1000 QPS

Fig: Performance results of Details service at 1000 QPS with Kube CNI + with Istio sidecar

Performance test on Istio Ambient Mesh with Kube CNI and Cilium

We ran the load test for the following six cases:

  1. Kube CNI 
  2. Kube CNI + Istio sidecar (mTLS enabled)
  3. Kube CNI + Istio Ambient mesh (mTLS enabled)
  4. Cilium CNI
  5. Cilium CNI + Istio sidecar (mTLS enabled)
  6. Cilium CNI + Istio Ambient mesh (mTLS enabled)
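
The six cases are the cross product of two CNIs and three mesh modes, which can be enumerated as in the sketch below. The namespace labels shown are the standard Istio labels for sidecar injection and for Ambient mode. This post does not state how each mode was enabled in our clusters, so treat the labels as an assumption, and the helper name `build_test_matrix` as hypothetical.

```python
from itertools import product

CNIS = ["Kube CNI", "Cilium CNI"]

# Namespace label that selects each data-plane mode; None means no mesh.
# Assumption: the modes were enabled with the standard Istio namespace labels.
MESH_LABELS = {
    "No Istio": None,
    "Istio sidecar": "istio-injection=enabled",
    "Istio Ambient": "istio.io/dataplane-mode=ambient",
}


def build_test_matrix():
    """Return one dict per test case, in the same order as the list above."""
    return [
        {"cni": cni, "mesh": mesh, "namespace_label": MESH_LABELS[mesh]}
        for cni, mesh in product(CNIS, MESH_LABELS)
    ]


if __name__ == "__main__":
    for case in build_test_matrix():
        print(case)
```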

Although we tested each case multiple times, we have attached only three screenshots per case to show the run-to-run variation in P99 latency.
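
One way to compute a median P99 from several saved runs is sketched here. It is an illustration only: it assumes each run was saved with Fortio's `-json` option and that the result holds a `DurationHistogram.Percentiles` list with values in seconds, which is an assumption about Fortio's output format rather than something shown in this post. The function names are hypothetical.

```python
import json
import statistics


def load_result(path):
    """Load one Fortio JSON result file into a dict."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def p99_ms(result):
    """Extract the P99 latency in milliseconds from one Fortio result dict."""
    for entry in result["DurationHistogram"]["Percentiles"]:
        if entry["Percentile"] == 99:
            # Assumption: Fortio reports latencies in seconds.
            return entry["Value"] * 1000.0
    raise ValueError("no P99 entry in result")


def median_p99_ms(results):
    """Median of the per-run P99 latencies, in milliseconds."""
    return statistics.median(p99_ms(r) for r in results)
```

The median is used rather than the mean so that a single unusually slow run does not dominate the reported figure.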

Load test results for Kube CNI without Istio 

Observed (Median) P99 latency: 1.12 ms


Fig 1: Kube CNI + Without Istio

Fig 2: Kube CNI + Without Istio

Load test of Kube CNI and Istio sidecar (mTLS enabled)

Observed (Median) P99 latency: 4.72 ms

Fig 3: Kube CNI + With Istio Sidecar (mTLS enabled)

Fig 4: Kube CNI + With Istio Sidecar (mTLS enabled)

Fig 5: Kube CNI + With Istio Sidecar (mTLS enabled)

Load test of Kube CNI and Istio Ambient mesh (mTLS enabled)

Observed (Median) P99 latency: 3.6 ms

Fig 6: Kube CNI + With Istio Ambient (mTLS enabled)

Fig 7: Kube CNI + With Istio Ambient (mTLS enabled)

Fig 8: Kube CNI + With Istio Ambient (mTLS enabled)

Load test of Cilium CNI without Istio

Observed (Median) P99 latency: 4.5 ms

Fig 9: Cilium CNI + Without Istio

Fig 10: Cilium CNI + Without Istio

Fig 11: Cilium CNI + Without Istio

Load test of Cilium CNI and Istio sidecar (mTLS enabled)

Observed (Median) P99 latency: 8.8 ms

Fig 12: Cilium CNI + With Istio Sidecar

Fig 13: Cilium CNI + With Istio Sidecar

Fig 14: Cilium CNI + With Istio Sidecar

Load test of Cilium CNI and Istio Ambient mesh (mTLS enabled)

Observed (Median) P99 latency: 6.8 ms

Fig 15: Cilium CNI + With Istio Ambient

Fig 16: Cilium CNI + With Istio Ambient

Fig 17: Cilium CNI + With Istio Ambient

Final load test results and benchmarking of Rating service with and without Istio

Here are the benchmarking results for the p99 latency of the Rating service with and without Istio (sidecar and Ambient mesh). 
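
The median P99 values reported in the sections above can be collected in one place and compared against the no-Istio baseline for the same CNI. The sketch below uses only those reported medians. The "added latency" column is simple subtraction, and the names `MEDIAN_P99_MS`, `added_latency_ms`, and `summary_rows` are illustrative.

```python
# Median P99 latencies in milliseconds, as reported in the sections above.
MEDIAN_P99_MS = {
    ("Kube CNI", "No Istio"): 1.12,
    ("Kube CNI", "Istio sidecar"): 4.72,
    ("Kube CNI", "Istio Ambient"): 3.6,
    ("Cilium CNI", "No Istio"): 4.5,
    ("Cilium CNI", "Istio sidecar"): 8.8,
    ("Cilium CNI", "Istio Ambient"): 6.8,
}


def added_latency_ms(cni, mesh):
    """P99 latency added by the mesh over the same CNI without Istio."""
    baseline = MEDIAN_P99_MS[(cni, "No Istio")]
    return round(MEDIAN_P99_MS[(cni, mesh)] - baseline, 2)


def summary_rows():
    """Return (cni, mesh, p99_ms, added_ms) tuples for all six cases."""
    return [
        (cni, mesh, p99, added_latency_ms(cni, mesh))
        for (cni, mesh), p99 in MEDIAN_P99_MS.items()
    ]


if __name__ == "__main__":
    print(f"{'CNI':<12}{'Mesh':<16}{'P99 (ms)':>10}{'Added (ms)':>12}")
    for cni, mesh, p99, added in summary_rows():
        print(f"{cni:<12}{mesh:<16}{p99:>10.2f}{added:>12.2f}")
```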

Conclusion

We draw three conclusions from these experiments:

  1. Istio Ambient mesh will not give you dramatic latency improvements: it is still slower than plain Kube CNI (3.6 ms vs. 1.12 ms median p99), because using ztunnel for encryption still involves extra network hops, which increase latency. It is, however, better than the Istio sidecar architecture.
  2. Irrespective of the CNI used, the p99 latency of Istio Ambient mesh is roughly 23% lower than that of the Istio sidecar (3.6 ms vs. 4.72 ms with Kube CNI; 6.8 ms vs. 8.8 ms with Cilium).
  3. Combining Cilium and Istio (sidecar or Ambient) nearly doubles p99 latency compared with the same Istio mode on Kube CNI (8.8 ms vs. 4.72 ms for sidecar; 6.8 ms vs. 3.6 ms for Ambient). If you are looking for performance improvements, you should avoid this mix.
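
The relative figures behind conclusions 2 and 3 follow from the median P99 values reported earlier. The arithmetic is sketched below, with illustrative function names.

```python
def percent_reduction(sidecar_ms, ambient_ms):
    """Percentage by which Ambient's P99 is lower than the sidecar's."""
    return (sidecar_ms - ambient_ms) / sidecar_ms * 100.0


def slowdown_factor(cilium_ms, kube_ms):
    """How many times higher the P99 is on Cilium than on Kube CNI."""
    return cilium_ms / kube_ms


if __name__ == "__main__":
    # Ambient vs. sidecar, per CNI (median P99 values from this post).
    print(f"Kube CNI:   {percent_reduction(4.72, 3.6):.1f}% lower")  # 23.7% lower
    print(f"Cilium CNI: {percent_reduction(8.8, 6.8):.1f}% lower")   # 22.7% lower

    # Cilium vs. Kube CNI, per Istio mode.
    print(f"Sidecar: {slowdown_factor(8.8, 4.72):.2f}x")  # 1.86x
    print(f"Ambient: {slowdown_factor(6.8, 3.6):.2f}x")   # 1.89x
```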
Debasree Panda


Debasree is the CEO of IMESH. He understands customer pain points in cloud and microservice architecture. Previously, he led product marketing and market research teams at Digitate and OpsMx, where he created a multi-million dollar sales pipeline. He has helped open-source solution providers (Tetrate, OtterTune, and Devtron) design GTM from scratch and achieve product-led growth. He firmly believes serendipity happens to diligent and righteous people.
