Following Part I of this series, where we described the different services available, here in Part II, we’ll take a more in-depth look at each cloud container service and the differences between them. If any of the services are not familiar to you, then it may be worth referring back to Part I to get more context.
Between the container workload services offered by AWS, Azure, and Google Cloud, there are differences pertaining to features, pricing, maintenance, and performance. This post provides direct comparisons of these dimensions and also describes some of the pitfalls when comparing cloud container services.
Kubernetes Feature Comparison
AKS, EKS & GKE
Comparing specific features of a product as fast-moving as Kubernetes is tricky, as the cloud providers are currently in an arms race to outdo each other and grab market share.
Here is a table of some key feature comparisons that are still valid as of June 2020:
Even comparing these features can get murky. For example, AKS offers 1.16.7 (but not as the default, suggesting it isn’t fully “bedded in” yet) and up to 1.18.2 in “preview.” EKS offers a later version by default (1.16), but there’s no “preview” option at all for later versions, and GKE offers an almost identical set of versions, down to the point release. In addition to the fine print, there are also different SLAs depending on how you manage the placement of your servers on zones and regions.
Pricing models are easier to compare but are also in constant flux. Until June 2020, GKE’s control plane was free to run, and EKS cost $0.20/hour. Google and Amazon then each shifted their prices by 10 cents per hour (GKE up, EKS down) to meet at $0.10/hour, while AKS remains (for now) free.
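To put those hourly rates in perspective, here is a minimal sketch that converts them into an approximate monthly control plane bill. The rates are the ones quoted above; the 730-hour month and the function name `monthly_control_plane_cost` are our own assumptions for illustration, not anything published by the providers.

```python
# Assumption: an "average" month of 730 hours (8760 hours / 12 months).
HOURS_PER_MONTH = 730

# Hourly control plane rates in USD, as quoted in the text (June 2020).
CONTROL_PLANE_RATE = {"GKE": 0.10, "EKS": 0.10, "AKS": 0.00}


def monthly_control_plane_cost(provider: str, clusters: int = 1) -> float:
    """Return the approximate monthly control plane fee for `clusters` clusters."""
    return round(CONTROL_PLANE_RATE[provider] * HOURS_PER_MONTH * clusters, 2)


if __name__ == "__main__":
    for provider in CONTROL_PLANE_RATE:
        print(provider, monthly_control_plane_cost(provider, clusters=3))
```

The fee scales linearly with the number of clusters, so it matters most to teams that run many small clusters rather than a few large ones.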
AWS Fargate, Google Cloud Run & Azure Container Instances
These serverless container services offer a similar feature set at a high level, but their architectures result in some significant differences as you start to dig deeper. Similar to AWS Fargate, Google Cloud Run is a serverless container platform that runs on Kubernetes, while Azure Container Instances (ACI) features its own technology underneath.
Neither Cloud Run nor ACI supports the Kubernetes pod concept at the consumer level, though, so if you want similar behavior, you have to define it yourself. For example, if you want to launch multiple containers as one unit, you have to launch them separately and manage them yourself.
Because it’s built on Knative, another Google technology, Cloud Run does have the capability to scale to zero and auto-scale out of the box, while Fargate requires some configuration between AWS’s different cloud services to achieve this. In terms of security, Cloud Run offers less isolation for your workloads, with only gVisor separation available in the Google Cloud managed service. By contrast, Fargate and ACI isolate workloads at the virtual machine (VM) level, which is a more proven and mature separation that security-sensitive organizations are more comfortable with.
While Cloud Run doesn’t offer a huge range of features for peace of mind when it comes to security, it scores high on developer experience. It’s relatively quick and easy to get a service up and able to serve customer requests. Meanwhile, Fargate can give the architect far more flexibility by using the many other services that AWS offers in conjunction with it.
Cloud Container Services Price Comparison
Pricing is a much-debated area when it comes to cloud computing, and containerized workloads are no exception. We’ve already looked at raw control plane costs for Kubernetes products across cloud providers, but there are other options to consider here. For example, for AKS, you can pay for an uptime SLA for the Kubernetes API server at a price of $0.10 per cluster per hour.
The main cost you are likely to bear is for worker nodes running within your clusters. The pricing of these is as complex as the pricing of standard cloud VMs. Just like with standard VMs, you have the choice of using larger or smaller instances, CPU- or memory-optimized instances, spot or reserved instances, and so on.
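As a rough illustration of how worker nodes dominate the bill, the sketch below adds node pool costs to the control plane fee. The `NodePool` type, the node counts, and the per-node hourly rates are all hypothetical placeholders, not real prices from any provider; substitute figures from the current rate cards.

```python
from dataclasses import dataclass

# Assumption: an "average" month of 730 hours.
HOURS_PER_MONTH = 730


@dataclass
class NodePool:
    name: str
    node_count: int
    hourly_rate: float  # USD per node-hour (hypothetical placeholder)


def monthly_cluster_cost(control_plane_rate: float, pools: list[NodePool]) -> float:
    """Sum the control plane fee and all node pools into one monthly figure."""
    node_hourly = sum(p.node_count * p.hourly_rate for p in pools)
    return round((control_plane_rate + node_hourly) * HOURS_PER_MONTH, 2)


if __name__ == "__main__":
    # Hypothetical mix: a few on-demand nodes plus cheaper spot capacity.
    pools = [NodePool("on-demand", 3, 0.10), NodePool("spot", 5, 0.03)]
    print(monthly_cluster_cost(0.10, pools))
```

Even with these made-up numbers, the nodes account for most of the total, which is why the instance type and purchasing model usually matter more than the control plane fee.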
At first glance, serverless costs might seem simpler to compare:
But even here you need to be careful. For example, your mileage may vary in terms of the raw speed of the vCPU provisioned. The startup time of workloads may also differ per provider. And this is all before you consider other associated costs, like network ingress and egress traffic, cost per request, or free tier usage.
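To make these extra factors concrete, here is a simplified sketch of a per-request serverless cost model that includes a per-request fee and a free tier. Every parameter name and rate here is a hypothetical placeholder; each provider bills differently (for example, per running task rather than per request), so treat this as a starting template rather than any provider’s actual formula.

```python
def serverless_monthly_cost(
    vcpu: float,
    memory_gib: float,
    seconds_per_request: float,
    requests: int,
    vcpu_second_rate: float,
    gib_second_rate: float,
    per_million_requests_rate: float = 0.0,
    free_vcpu_seconds: float = 0.0,
) -> float:
    """Estimate a monthly bill from resource-seconds, request fees, and a free tier."""
    vcpu_seconds = vcpu * seconds_per_request * requests
    gib_seconds = memory_gib * seconds_per_request * requests
    # The free tier (applied here to vCPU time only) reduces billable usage.
    billable_vcpu_seconds = max(0.0, vcpu_seconds - free_vcpu_seconds)
    compute = billable_vcpu_seconds * vcpu_second_rate + gib_seconds * gib_second_rate
    request_fee = requests / 1_000_000 * per_million_requests_rate
    return round(compute + request_fee, 2)


if __name__ == "__main__":
    # Hypothetical workload: 1M requests of 200 ms each on 1 vCPU / 0.5 GiB.
    print(serverless_monthly_cost(
        vcpu=1, memory_gib=0.5, seconds_per_request=0.2, requests=1_000_000,
        vcpu_second_rate=0.00002, gib_second_rate=0.000002,
        per_million_requests_rate=0.40, free_vcpu_seconds=50_000,
    ))
```

Note that `seconds_per_request` depends on the raw vCPU speed and startup time mentioned above, so the same workload can produce different inputs on different providers.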
Maintenance and Ease of Setup
The cost of a service is, of course, not just what’s on the rate card. The true cost of ownership also includes the cost of setup and maintenance when using the services. Again, the picture here when comparing providers is complicated, and you need to base your choice on your given circumstances, on whether the provider’s strengths align with your particular needs. But your particular requirements aside, there’s a strong case to be made that Google comes out on top here.
If we take ease of setup as one of the criteria, Google’s services have a relatively good reputation for ease of use and elegance. This may be partly because they are less mature than those of AWS, which must integrate with and conform to many existing (arguably even “legacy”) services. But even though AWS services are known for being relatively clunky, in practice this challenge can be largely sidestepped, because supporting tools like Terraform focus much of their effort on the most popular cloud provider. Azure is catching up fast.
In terms of monitoring, GKE comes out as a winner for its easy setup and maintenance, with Stackdriver being well-integrated with GKE out of the box. AWS provides no such integrated solution, and Azure requires you to set up Istio to get full integration with its Monitor and Application Insights services.
Auto-scaling of worker nodes is another area where Google’s offering is strong. Setting this up requires minimal configuration from the user to define its behavior, whereas AWS and Azure require more configuration and setup.
Performance analysis of running container workloads presents a similarly mixed picture. Obviously, the performance of container workloads rests on the performance of each cloud provider’s infrastructure, such as network and disk speed.
The most reputable report on the subject, the “2020 Cloud Report” from Cockroach Labs, ran standard TPC-C performance tests and network performance tests as well as an analysis of performance per dollar. It concluded that:
- AWS and GCP are roughly tied for CPU performance
- GCP beat out AWS for network performance, while Azure came in third with significantly worse network latency
- AWS did best on storage I/O write throughput, followed by Azure and then GCP
- AWS also came in first for storage I/O read throughput, with Azure and GCP tying for second
- GCP edged out Azure and AWS slightly for TPC-C performance per dollar
Finally, the report noted that GCP significantly improved its overall score in 2020 and that cloud providers are constantly upgrading their infrastructure. For this reason, it’s quite hard to make definitive recommendations when it comes to container performance.
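If you want to repeat a performance-per-dollar comparison on your own workloads, the underlying arithmetic is simple. The sketch below ranks providers by throughput divided by monthly cost; the provider labels and numbers are invented for illustration and are not figures from the Cockroach Labs report.

```python
def performance_per_dollar(throughput: float, monthly_cost: float) -> float:
    """Return throughput units (e.g. transactions per minute) per dollar spent."""
    if monthly_cost <= 0:
        raise ValueError("monthly_cost must be positive")
    return throughput / monthly_cost


def rank_by_value(results: dict[str, tuple[float, float]]) -> list[str]:
    """Order providers from best to worst performance per dollar."""
    return sorted(
        results,
        key=lambda name: performance_per_dollar(*results[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical measurements: (throughput, monthly cost in USD).
    measured = {
        "provider-a": (1000.0, 100.0),
        "provider-b": (1500.0, 200.0),
    }
    print(rank_by_value(measured))
```

Because providers keep upgrading their infrastructure, any such ranking is only a snapshot and is worth re-running periodically.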
Conclusion and Recommendations
When comparing cloud providers’ container services, there are a bewildering number of factors you need to consider. As has been shown in the discussion above, it’s difficult to be definitive about any cloud provider’s objective merits, but some patterns do emerge.
While AWS and Azure container services offer a great deal of power when combined with their other services, this advantage can also make them more difficult to set up, configure, and maintain. On the other hand, if you’re after relative simplicity, Google may be your best bet. In the end, unless you have a very specific reason to prefer one vendor over another, your decision will likely depend on which cloud provider you’re already using.