With the adoption of microservices, cloud, and Kubernetes, CI/CD processes have proved very useful for deploying features and changes to production faster. Faster time to market through software delivery (CI/CD) pipelines is the new normal. What sets a company apart is releasing high-quality features that meet consumer expectations and deliver an excellent experience.
However, the fear of delivering a faulty software change to production haunts DevOps and SRE teams. That is why canary deployment is becoming popular among architects and DevOps teams.
What is Canary Deployment?
Canary deployment is a strategy for releasing software into production gradually. The process involves allowing a fraction of users to use newly deployed software first. If all the criteria, such as performance and quality, are on par with the previous release, more users are allowed to use the new software. This iteration continues until the newly deployed software is rolled out completely to production users.
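One way to expose only a fraction of users to a new release is to bucket users deterministically, so that the same user keeps seeing the same version. The sketch below is illustrative only: the function name `pick_version`, the hash-based bucketing, and the 5% figure are assumptions made for this example, not something prescribed by any particular CD tool.

```python
import hashlib


def pick_version(user_id: str, canary_percent: int) -> str:
    """Return "canary" or "stable" for a user (hypothetical helper)."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # Hash the user ID into a stable bucket in the range 0..99.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    # Buckets below the threshold go to the canary; the rest stay on stable.
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    on_canary = sum(pick_version(u, 5) == "canary" for u in users)
    print(f"{on_canary / len(users):.1%} of users routed to the canary")
```

Because the bucket depends only on the user ID, a user who lands on the canary at 5% stays on it when the percentage is raised, which keeps the experience consistent as the rollout widens.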
Canary Deployment phases in CI/CD
When performing CI/CD with tools such as Spinnaker or Argo CD, DevOps engineers and developers often want to deploy using a canary strategy. Canary deployment is usually implemented in four phases (refer to Fig A).
- Deployment: The first phase of canary deployment is the deployment of the new release. In this phase, CD tools and GitOps tools are used to deploy the software.
- Release a small portion of traffic: After the new version is deployed, a small portion of the traffic is routed to it, while most of the traffic continues to go to the stable version.
- Analysis and validation: In this phase, the canary is tested to see whether it is working well in terms of performance and quality. If the canary performs as well as the stable version, more traffic is routed to the canary.
- Rollback/Roll forward: Based on the analysis phase, the canary is either rolled back or rolled forward to serve 100% of the production traffic.
Fig A: Phases of Canary deployment in CI/CD
These phases are important for reducing the risks of the software release while maintaining the speed of deployment.
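The four phases can be read as a simple control loop. The following is a minimal sketch, assuming hypothetical callbacks `deploy`, `set_canary_weight`, and `is_healthy` that stand in for whatever a CD tool such as Spinnaker or Argo CD would actually do. The traffic steps are example values, not recommendations.

```python
from typing import Callable, Sequence


def run_canary_rollout(
    deploy: Callable[[], None],
    set_canary_weight: Callable[[int], None],
    is_healthy: Callable[[int], bool],
    steps: Sequence[int] = (5, 15, 50),
) -> str:
    """Walk through the four phases; return "promoted" or "rolled_back"."""
    # Phase 1: deploy the new release.
    deploy()
    for weight in steps:
        # Phase 2: route a portion of the traffic to the canary.
        set_canary_weight(weight)
        # Phase 3: analyze and validate the canary at this traffic level.
        if not is_healthy(weight):
            # Phase 4 (rollback): send all traffic back to the stable version.
            set_canary_weight(0)
            return "rolled_back"
    # Phase 4 (roll forward): the canary now serves 100% of the traffic.
    set_canary_weight(100)
    return "promoted"


if __name__ == "__main__":
    history = []
    result = run_canary_rollout(
        deploy=lambda: history.append("deploy"),
        set_canary_weight=lambda w: history.append(f"weight={w}"),
        is_healthy=lambda w: w < 50,  # pretend the canary degrades at 50%
    )
    print(result, history)
```

In this usage example the pretend health check fails at the 50% step, so the loop resets the canary weight to 0 and reports a rollback.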
Architects and DevOps teams sometimes use the terms canary deployment and canary analysis interchangeably. The next section explains the difference and shows how canary analysis is an integral part of canary deployment.
Canary Deployment vs Canary Analysis
Canary analysis is the part of the canary deployment process in which the Ops team validates the canary at each increment of the traffic percentage. Usually, during canary analysis, an SRE or Ops engineer collects the metrics and logs of the new release and validates its performance, quality, and security. If all the criteria are met, the traffic to the new version is increased further.
Performing canary analysis well can be tricky because it involves statistical evaluation of the metrics and logs of the new version (the canary). Metrics from a canary that receives only a small load are not statistically comparable with those of a baseline that serves most of the traffic, so a third application running the baseline version is created and receives the same amount of traffic as the canary (refer to Fig B). Let us look at the steps in detail.
The steps to perform canary analysis are:
- Two applications running the same stable version (the baseline) must be created. Let us call them B1 and B2.
- A new version of the application is deployed in the same cluster. Let us call it the canary version.
- To route a small percentage of traffic, say 5%, to the canary, route another 5% to B1 and the remaining 90% to B2.
- After a certain amount of time, such as 4-5 minutes, the canary is evaluated for performance and quality.
- Metrics such as CPU and memory utilization, latency, and throughput of the canary are compared with those of the baseline B1. If any metric of the canary falls outside a particular range relative to B1, the canary is deemed unfit for further rollout. Similarly, the application logs, security logs, traffic logs, and API logs are collected to understand the behavior of the canary. If the behavior is very dissimilar to that of B1, the canary can be rolled back.
- If the canary performs as well as B1, the traffic percentage can be increased to, say, 15% each for the canary and B1. The remaining 70% is directed to B2.
Fig B: Way to perform canary analysis in canary deployment
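The steps above can be sketched in code. This is only an illustration under stated assumptions: the names `traffic_split` and `failing_metrics`, the metric keys, the sample values, and the 10% tolerance are all hypothetical. A simple relative-difference check stands in for the statistical evaluation that a real analysis tool would perform.

```python
from typing import Dict, List


def traffic_split(canary_percent: int) -> Dict[str, int]:
    """Give B1 the same share as the canary and send the rest to B2."""
    if not 0 <= canary_percent <= 50:
        raise ValueError("canary_percent must be between 0 and 50")
    return {
        "canary": canary_percent,
        "B1": canary_percent,
        "B2": 100 - 2 * canary_percent,
    }


def failing_metrics(
    canary: Dict[str, float],
    baseline: Dict[str, float],
    tolerance: float = 0.10,  # assumed threshold, not from any tool
) -> List[str]:
    """Return the metrics where the canary deviates from B1 beyond tolerance."""
    failed = []
    for name, base_value in baseline.items():
        canary_value = canary[name]
        # Relative difference against B1; guard against a zero baseline.
        scale = abs(base_value) if base_value != 0 else 1.0
        if abs(canary_value - base_value) / scale > tolerance:
            failed.append(name)
    return failed


if __name__ == "__main__":
    print(traffic_split(5))
    # Made-up sample metrics for B1 and the canary.
    b1 = {"cpu_percent": 40.0, "memory_mb": 512.0,
          "latency_ms": 120.0, "throughput_rps": 200.0}
    canary = {"cpu_percent": 42.0, "memory_mb": 530.0,
              "latency_ms": 150.0, "throughput_rps": 195.0}
    failed = failing_metrics(canary, b1)
    print("roll back" if failed else "increase traffic", failed)
```

With these sample values only the latency deviates by more than 10% from B1, so the sketch would recommend a rollback. The check is symmetric, which matches the idea of staying within a range of B1; a real setup might instead treat lower latency or higher throughput as acceptable.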