Breaking down large monolithic applications into multiple microservices has brought its own set of challenges. A Service Mesh addresses some of the problems that have arisen as part of the microservices evolution. This article explores the basics of a Service Mesh and the problem it’s trying to solve, as well as its main features and the open-source and commercial offerings available, so that you can decide whether a Service Mesh is the right solution for your organization.
A Brief History of Service Mesh
Before we dig deeper into the Service Mesh concept, let’s take a look at the history behind Service Meshes today.
When they began breaking down their large monolithic applications into multiple microservices, many big tech companies started facing challenges. For example, there was no dynamic way to reroute network traffic when failures occurred or when new containers needed to be deployed.
The solution these companies arrived at was to write a fat client library, with the networking logic built into every service. For example, in 2000, Google began developing Stubby, an RPC framework that ultimately evolved into gRPC; you can see some gRPC characteristics in today’s Service Mesh framework Istio.
The problem with a fat client library is that every time developers need to make a change, they have to go to every container, re-inject the code, and redistribute it. This simply does not scale, so companies decided to have a centralized component on each node and came up with the node-agent concept. With this approach, all traffic from a container goes to the node agent, which is responsible for transmitting the data from that node to the next node or the control plane.
This approach thinned down the code and made it easier to deploy. The drawback is that if the agent fails, every container on that node is affected. So, developers came up with sidecar containers, which run alongside each application container. Most of today’s Service Mesh implementations are based on sidecar containers.
An early milestone in the Service Mesh field came in 2010, when Twitter created the Scala-powered Finagle, on which Linkerd was later built. Other notable releases in Service Mesh history are HashiCorp Consul (2014), Istio (2017), Linkerd2 (2018), and Kuma (2019).
What Is a Service Mesh?
While designing any solution, the first thing you need to consider is the right architectural pattern. Several options are available, and which one to choose depends upon your particular use case and requirements, as each pattern presents its own strengths and weaknesses. To get the best of both worlds, you can also combine architectural patterns. A Service Mesh is one such example: you express your application as independent microservices that still interact with each other.
What Problem Does a Service Mesh Solve?
Without a Service Mesh, you would need to add custom code to each microservice to manage its communication with the others. Failures can then be challenging to diagnose, because the logic that governs communication between services is hidden inside each individual service.
A Service Mesh resolves this issue by routing requests between microservices through proxies, using a dedicated infrastructure layer built right into the app; this layer controls how different parts of an application share data with each other. These proxies are often called sidecars, because they run alongside each microservice and route its requests to other proxies.
Together, these sidecar proxies and the microservices they serve form a mesh network. The sidecars are one of the primary reasons why Service Meshes are so popular: they handle all the network traffic into and out of each microservice while also providing visibility and traffic control across all of your microservices.
Figure 1: Service Mesh Architecture with sidecar proxy.
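To make the sidecar idea concrete, here is a minimal in-process sketch in Python. It is an illustration under simplifying assumptions, not how any particular mesh is implemented: services are plain functions, the "network" is a shared dictionary, and the names Sidecar, outbound, inbound, and the two example services are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str], str]


class Sidecar:
    """Hypothetical sidecar: every request into or out of one service passes through it."""

    def __init__(self, service: str, app: Handler, mesh: Dict[str, "Sidecar"]) -> None:
        self.service = service
        self.app = app                # the local application this sidecar fronts
        self.mesh = mesh              # shared registry: service name -> its sidecar
        self.seen: List[Tuple[str, str, str]] = []   # (direction, peer, request)
        mesh[service] = self          # register so other sidecars can find this one

    def outbound(self, destination: str, request: str) -> str:
        """The local app calls this instead of contacting the destination itself."""
        self.seen.append(("out", destination, request))
        return self.mesh[destination].inbound(self.service, request)

    def inbound(self, source: str, request: str) -> str:
        """Traffic from another sidecar is recorded, then handed to the local app."""
        self.seen.append(("in", source, request))
        return self.app(request)


mesh: Dict[str, Sidecar] = {}
inventory = Sidecar("inventory", lambda sku: f"stock for {sku}: 7", mesh)
orders = Sidecar("orders", lambda item: f"order accepted: {item}", mesh)

# The orders service asks for stock; the request travels sidecar to sidecar.
reply = orders.outbound("inventory", "sku-42")
print(reply)
```

Neither application function knows where the other one lives or how to reach it. That knowledge sits entirely in the sidecars, which is also why they can observe every request.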
Reliability and Security
A Service Mesh improves the reliability of your application, and your visibility into it, via intelligent traffic routing. When a host is not responding, the mesh automatically stops routing traffic to it and resumes routing once the host starts responding again within a given time frame. On top of that, it offers other benefits such as rate limiting, request retries, and timeouts.
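The behavior described above (stop sending traffic to a failing host, try it again after a while, and retry the request elsewhere in the meantime) can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of any specific mesh: the HostEjector class, the call_with_retries helper, and the max_failures and cooldown_s parameters are all hypothetical, and timeouts are left out for brevity.

```python
import time
from typing import Callable, Dict, List, Optional


class HostEjector:
    """Stops routing to a host after repeated failures and re-admits it after a cool-down."""

    def __init__(self, hosts: List[str], max_failures: int = 3,
                 cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.hosts = list(hosts)
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock                      # injectable so tests can control time
        self.failures: Dict[str, int] = {h: 0 for h in self.hosts}
        self.ejected_at: Dict[str, float] = {}  # host -> time it was ejected

    def healthy_hosts(self) -> List[str]:
        """Hosts that were never ejected, or whose cool-down has elapsed."""
        now = self.clock()
        return [h for h in self.hosts
                if h not in self.ejected_at
                or now - self.ejected_at[h] >= self.cooldown_s]

    def report(self, host: str, ok: bool) -> None:
        """Record the outcome of one request to a host."""
        if ok:
            self.failures[host] = 0
            self.ejected_at.pop(host, None)
            return
        self.failures[host] += 1
        if self.failures[host] >= self.max_failures:
            self.ejected_at[host] = self.clock()   # eject, or re-eject after a failed trial


def call_with_retries(ejector: HostEjector, send: Callable[[str, str], str],
                      request: str, attempts: int = 3) -> str:
    """Try the request on healthy hosts, retrying on connection errors."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        candidates = ejector.healthy_hosts()
        if not candidates:
            break
        host = candidates[0]
        try:
            response = send(host, request)
        except ConnectionError as exc:
            ejector.report(host, ok=False)
            last_error = exc
            continue
        ejector.report(host, ok=True)
        return response
    raise RuntimeError("no healthy host answered") from last_error
```

The point of the sketch is that none of this logic lives in the application: the service simply makes a request, and the proxy decides which host receives it and whether to retry.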
The other important feature provided by a Service Mesh is security. Securing your services has always been tricky, and in a microservice world, it’s become even more challenging. Some of the reasons for this are the following:
- No single point of control: In a monolithic application, you have a single point of control at which to put iptables rules or firewalls. But as you break your application into multiple microservices, they talk to each other over a network using a variety of protocols (such as HTTP and gRPC). This increases complexity and takes away the single point of control from a security perspective.
- Using multiple languages (polyglot applications): Microservices encourage developers to write each service in the language that makes it most efficient and easy to maintain. Now, if there is a security patch, you need to wait until it has been applied to every application, each written in a different language, which complicates the situation.
- Increased surface area: With microservices, you are using different containers with different base images, some of which are not even under your control. On top of that, you might be deploying your container in an environment that is also not under your control.
Before we understand how a Service Mesh simplifies security, let’s review two key security terms:
- Authentication is the act of verifying the identity of the end user or client that is making the request.
- Authorization is the act of verifying that the end user or client is allowed to make the request, that is, has the right permissions to access the resource.
How Can Service Mesh Simplify Security?
Typically, to enforce either authentication or authorization, you need to write the logic in your application, which is complicated. By using a Service Mesh, you can offload functionality such as authentication and authorization to the mesh. Most Service Meshes work by adding a proxy that intercepts traffic going into and out of your application. These proxies parse traffic not only at layer four but also at the application-protocol level, which is what allows them to implement complex functionality such as authentication and authorization.
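As a rough illustration of what offloading these checks to a proxy looks like, the Python sketch below authenticates a request from its headers and then authorizes it against a policy before the application ever sees it. Everything here is assumed for the example: the TOKENS and POLICY tables, the function names, and the use of a simple token lookup. A real mesh would rely on cryptographically verified credentials rather than a dictionary.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical credentials: token -> identity of the calling service.
TOKENS: Dict[str, str] = {"token-orders": "orders", "token-billing": "billing"}

# Hypothetical policy: which identity may call which method on which path.
POLICY: Set[Tuple[str, str, str]] = {
    ("orders", "GET", "/inventory"),
    ("billing", "POST", "/invoices"),
}


def authenticate(headers: Dict[str, str]) -> Optional[str]:
    """Authentication: who is calling? Returns the identity, or None if unknown."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer":
        return None
    return TOKENS.get(token)


def authorize(identity: str, method: str, path: str) -> bool:
    """Authorization: is this caller allowed to make this particular request?"""
    return (identity, method, path) in POLICY


def proxy(method: str, path: str, headers: Dict[str, str],
          app: Callable[[str, str], str]) -> Tuple[int, str]:
    """Runs both checks at the application-protocol level, then forwards to the app."""
    identity = authenticate(headers)
    if identity is None:
        return 401, "unauthenticated"
    if not authorize(identity, method, path):
        return 403, "forbidden"
    return 200, app(method, path)
```

Because the proxy understands the request's method, path, and headers, the application function passed as app contains no security logic at all.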
Security Sidecar Proxies, Logs, and Tracing
So far, you’ve seen that most Service Meshes work by adding a proxy (sidecar). As the proxy intercepts traffic, it also emits rich, contextual telemetry data that can be used to understand what is happening in your clusters; this data includes traces, metrics, and logs.
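The following Python sketch shows, in a deliberately simplified form, how a proxy that sits in the request path can produce all three kinds of telemetry without any change to the application. The TelemetryProxy class, its field names, and the record formats are hypothetical; real proxies export this data in standard formats to external backends.

```python
import time
import uuid
from collections import Counter
from typing import Callable, Dict, List, Optional


class TelemetryProxy:
    """Hypothetical proxy that records a metric, a log line, and a span per request."""

    def __init__(self, service: str, app: Callable[[str], str],
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self.service = service
        self.app = app
        self.clock = clock                       # injectable so tests can control time
        self.metrics: Counter = Counter()        # request counts by outcome
        self.logs: List[str] = []                # one line per request
        self.spans: List[Dict[str, object]] = [] # timing data for tracing

    def handle(self, request: str, request_id: Optional[str] = None) -> str:
        # Reuse the caller's request ID if present so the request can be
        # followed across services; otherwise start a new one.
        request_id = request_id or uuid.uuid4().hex
        start = self.clock()
        status = "ok"
        try:
            return self.app(request)
        except Exception:
            status = "error"
            raise
        finally:
            # Runs on success and on failure, so every request is recorded.
            duration = self.clock() - start
            self.metrics[f"requests_{status}"] += 1
            self.logs.append(f"{request_id} {self.service} {status}")
            self.spans.append({"request_id": request_id,
                               "service": self.service,
                               "duration_s": duration})
```

Passing the same request_id through each hop is what lets a tracing backend stitch the individual spans into an end-to-end view of a request.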
There are various solutions that work with such data. For example, App Mesh is a Service Mesh solution offered by AWS that is already integrated with the following services:
- CloudWatch: Using App Mesh, you will get metrics and logs for every container of your service. The critical pieces of information included in this data are request identifiers and service names. CloudWatch enables you to aggregate and filter this data and visualize the end-to-end service communication. You can then analyze some standard metrics via the CloudWatch dashboard, including error codes and error rates between your service and dependent service.
- X-Ray: This gives you a complete view of how a request is traveling from end to end, which helps you determine the root cause of performance issues and errors. X-Ray gives you a visual map of the traffic routed by App Mesh and the ability to use that map to find latencies in your routes.
App Mesh also supports integration with third-party tools that work with Envoy. These include Grafana, Prometheus, and open-tracing solutions such as Zipkin.
Overview of Service Mesh Options
There is a myriad of open-source Service Mesh solutions as well as commercial offerings available on the market. Let’s explore a few of them:
Istio: Initially launched by Google, IBM, and Lyft as a Kubernetes-native solution, Istio later became the Service Mesh of choice for many big companies. It supports protocols such as HTTP and HTTP/2 and uses Envoy, which was originally built at Lyft, as its sidecar proxy.
Linkerd2: Designed with simplicity in mind rather than flexibility, Linkerd2 supports only Kubernetes, unlike other open-source solutions (Envoy and Istio) that support both Kubernetes and virtual machines. Linkerd was initially developed by Buoyant and is now part of the Cloud Native Computing Foundation (CNCF). Version 2 moved to a sidecar approach (instead of the node agent used in version 1), which eliminates the single point of failure.
Consul: One of HashiCorp’s offerings, Consul provides capabilities like service discovery and supports Kubernetes and other container management solutions. It uses a node-agent architecture and comes as a single binary with both server and client capabilities. Consul optionally supports an external system, such as HashiCorp Vault, for secret management.
AWS App Mesh: This is a Service Mesh solution offered by Amazon Web Services that provides functionality to monitor microservice applications on AWS. After a preview phase, it became generally available in March 2019 and is supported for production use.
A Service Mesh is a relatively new technology, so before adopting one in your organization, first consider what benefits you’ll actually get from it. If your organization is building only a handful of services, then a Service Mesh may not be the right solution for you. Even if you are dealing with many microservices, tools like Kubernetes or Docker Swarm may be enough to accomplish your end goal.
A Service Mesh brings its own level of complexity to your architecture; so before using it, make sure your team is fully onboarded and has learned about this new technology. If you decide to use Service Mesh, first try it out on only a subset of microservices before rolling it out to your entire environment.
If you are already using a Service Mesh, you know its advantages, such as the fact that developers can spend more time writing business logic versus coding request logic. If implemented correctly, a Service Mesh can be an invaluable tool. If you have not yet decided whether to use one or not, it’s definitely a technology worth looking into.