Microservices have been on the rise along with an increasing number of containers that are being used in the industry. Because of this, IT professionals must change their approach to microservices health checks. In the world of monolithic architecture, the primary goal is to monitor at the system level. This is still needed for microservices, but you need to go one step further and monitor the service itself. A container can be running from a “Is this thing on?” perspective, but that won’t matter much if the service inside of the container is down.
In this post, we review why microservices health checks are necessary and look at several ways to monitor a microservice architecture so that it is ready for production.
Why Health Checks?
Health checks can give you near real-time alerts when a service stops responding as expected. They aren’t just for when something goes down; they are also useful when there is an upgrade or a modification in the containerized application. In this section, we’ll discuss why health checks are vital for your architecture.
What Are Health Checks?
At their core, health checks are a way to monitor containers at the application level. With containers, and when implementing a microservices approach, monitoring has to change drastically. Monitoring the container at the system level (as with monolithic applications) is still okay, but it doesn’t tell you if the application or service is still running.
Containers by definition are ephemeral. This means that they have just one job: spin up, provide a service, spin down. But in today’s world, we’re seeing containers being used a bit differently: They don’t ever go down and instead stay up to run an application 24/7. Containers running 24/7 are on a system that is being monitored, but are the containers themselves being monitored?
Monitoring at the container level itself is simply not enough. A container can still be running while the service or application running inside the container is down. And if the application or service inside of the container is down, the container itself being up doesn’t do you much good.
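To make the distinction concrete, here is a minimal sketch of an application-level health endpoint that a monitor could poll. The /healthz path, the default port, and the service_is_healthy function are assumptions made for illustration; a real service would replace the placeholder with a check of its own dependencies.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def service_is_healthy() -> bool:
    """Hypothetical application-level check. Replace with a real one,
    for example a database ping or a queue connection test."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":  # the path name is an assumption
            self.send_error(404)
            return
        healthy = service_is_healthy()
        body = json.dumps({"status": "ok" if healthy else "unavailable"}).encode()
        # 200 tells the monitor the service works; 503 tells it the
        # container is running but the service inside it is not.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the application logs


def make_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    """Build the server; call serve_forever() on the result to run it."""
    return HTTPServer((host, port), HealthHandler)
```

Calling make_server().serve_forever() inside the container exposes the endpoint. A monitor that only checks whether the container is running would miss the 503 case entirely.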
How Health Checks for Containers Work
Monitoring an application running in a container is qualitatively different from monitoring the same application running on a server. An application on a server has a service or process that is monitored to see if it is up or down. If it is down, there may be some automation in place to restart the service or start the process. But containers work a bit differently.
With containers, you shouldn’t configure automation to SSH or RDP into a container and restart a service or start a process. If an application inside of a container goes down, the container should be deleted and a new container created. This process obviously isn’t suitable for a monolithic approach with virtual machines or servers. Spinning up a new virtual machine after deleting it can take an hour or more if there isn’t automation in place, whereas creating a new container takes seconds.
An ideal approach to your containerized environment is to have an architecture in place where, if the container’s application or service goes down, a new container gets spun up to replace it. For example, with orchestration platforms like Docker Swarm and Kubernetes, the architecture for auto-creating a new container exists out of the box. So if a service or deployment goes down in Docker Swarm or Kubernetes, the dead container is removed and a new container is automatically created. AWS offers this through Elastic Kubernetes Service (EKS), its managed Kubernetes service, where these health checks take place right out of the box.
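The delete-and-recreate behavior described above can be sketched as a small reconciliation function. This is not how Docker Swarm or Kubernetes implement it internally; the is_healthy, remove, and create callables are hypothetical stand-ins for whatever your platform or scripts provide.

```python
from typing import Callable


def reconcile(
    container_ids: list[str],
    is_healthy: Callable[[str], bool],
    remove: Callable[[str], None],
    create: Callable[[], str],
) -> list[str]:
    """Return the container list after replacing every unhealthy container."""
    result = []
    for cid in container_ids:
        if is_healthy(cid):
            result.append(cid)
        else:
            remove(cid)              # delete the dead container...
            result.append(create())  # ...and start a fresh one in its place
    return result
```

Running this on a schedule gives a very rough version of self-healing: nothing is repaired in place, and an unhealthy container is simply swapped for a new one.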
Of course, this orchestration path may not be for everyone. So what do you do if your organization doesn’t have an orchestration system like Docker Swarm or Kubernetes?
Monitoring at the Application Level and What to Alert On
All architectural designs need to include health checks at the container level. If a design doesn’t include monitoring of the application inside of the container, it is setting up your production environments and deployments for failure: health checks that are missing from the design will not be implemented, and there will be no way of knowing whether the container’s application is up or not.
Once you have decided what will be monitored, you then need to decide what to be alerted on. The two scenarios are:
- Don’t get alerted. Simply allow your orchestration and automation to take care of everything behind the scenes.
- Get an alert if the application goes down.
The first scenario, don’t get alerted, is the trickiest but the most efficient. If the orchestration and monitoring have built-in self-healing functionality, then an alert may not be necessary. However, a notification is still needed. For example, an alert that wakes an engineer up at 1 a.m. just to confirm that the self-healing worked may not be the best use of the engineer’s time. But a notification should still be sent to the team so they know what happened; it serves as a reference for any root cause analysis that management may need. The self-healing process should already be QA-tested and approved, as well as tested in production. Deliberately destroying a container or stopping an application in a container within your production environment is a crucial part of that testing.
The second scenario is getting an alert if the application inside of a container goes down. This is the recommended approach for the first 30 days, simply for peace of mind. It’s good for management and engineers to know that the alerting is working as expected.
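One way to express the two scenarios is as a small routing rule. The sketch below is only an illustration: the Incident fields and the Action names are hypothetical, and the 30-day window is taken from the recommendation above.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PAGE = "page"      # wake the on-call engineer
    NOTIFY = "notify"  # post to a team channel for later review


@dataclass
class Incident:
    service: str
    self_healed: bool        # did orchestration replace the container?
    days_since_rollout: int  # age of the alerting setup, in days


BURN_IN_DAYS = 30  # alert on everything for the first 30 days


def route(incident: Incident) -> Action:
    """Decide whether an incident should page someone or just notify the team."""
    if incident.days_since_rollout < BURN_IN_DAYS:
        return Action.PAGE    # second scenario: always alert during burn-in
    if not incident.self_healed:
        return Action.PAGE    # automation failed, so a human is needed
    return Action.NOTIFY      # first scenario: record it, but let people sleep
```

After the burn-in period, a self-healed failure becomes a notification for root cause analysis rather than a 1 a.m. page.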
When to Implement Health Checks and Monitoring
In the previous section, you saw why health checks, monitoring, and alerting are important. Before getting to the microservices health checks, you need a service to monitor. Depending on the application, there is always a best time to implement a health check, something we discuss further in this section.
Health Checks at Each Software Stage
The stages for developers and DevOps engineers typically go as follows:
Implementing health checks must begin at the development stage. To monitor properly, monitoring has to be built in such a way that the application expects it from the very start. Health checks should be incorporated at the time of creating the development environment, and test-driven development should be incorporated into the software delivery lifecycle.
Figuring Out What to Monitor
Monitoring an application or service needs to be exact in terms of how the service is running. For example, let’s say you have a Java application that’s running using the command java -jar some_application.jar. That means that the resource executing the .jar application needs to be monitored as well as the application or service itself. Any runtime command or script that is kicking off the application or service needs to be monitored.
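As a rough sketch of monitoring both layers, the function below checks the process that launched the application and, only if that process is alive, the service itself. The service_probe callable is a hypothetical placeholder for an application-level check such as an HTTP request to a health endpoint.

```python
import subprocess
from typing import Callable


def check_layers(
    process: subprocess.Popen,
    service_probe: Callable[[], bool],
) -> dict:
    """Report the state of the runtime process and the service it runs."""
    # poll() returns None while the process is still running.
    runtime_up = process.poll() is None
    # The service cannot be up if the runtime that hosts it has exited.
    service_up = runtime_up and bool(service_probe())
    return {"runtime": runtime_up, "service": service_up}
```

For the Java example above, the process would come from subprocess.Popen(["java", "-jar", "some_application.jar"]). A result of runtime True and service False is exactly the case that system-level monitoring alone would miss.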
Besides monitoring the application or service, there should be monitors in place for standard CPU and memory usage. Apart from the health of the application or service itself, it is critical to know whether the container is at risk of crashing.
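A resource check of this kind can be as simple as comparing a sample against thresholds. In the sketch below, the ResourceSample fields and the 80 percent and 90 percent defaults are assumptions chosen for illustration; how the sample is collected depends on your platform.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    cpu_percent: float       # CPU utilization, 0 to 100
    memory_used_bytes: int
    memory_limit_bytes: int  # the container's memory limit


def resource_warnings(
    sample: ResourceSample,
    cpu_threshold: float = 80.0,     # hypothetical default
    memory_threshold: float = 90.0,  # hypothetical default
) -> list[str]:
    """Return a warning for each resource at or above its threshold."""
    warnings = []
    if sample.cpu_percent >= cpu_threshold:
        warnings.append(f"cpu at {sample.cpu_percent:.0f}%")
    memory_percent = 100.0 * sample.memory_used_bytes / sample.memory_limit_bytes
    if memory_percent >= memory_threshold:
        warnings.append(f"memory at {memory_percent:.0f}%")
    return warnings
```

A container that keeps approaching its memory limit is likely to be killed, so a warning here can arrive before the application-level check starts failing.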
As you can see, there are several layers that need to be monitored in a containerized environment.
Health Checks for Containers in AWS
In the two previous sections, you not only saw why it’s important to implement microservices health checks but when to implement them, too.
In this section, we’ll discuss the many ways to implement health checks in AWS.
In March of 2018, AWS announced that health checks were available for ECS. Within an ECS cluster, health checks are available by default for CPU and memory utilization. With ECS health-check monitoring, AWS implemented health checks for Elastic Load Balancing (ELB) inside the ECS cluster. These health checks are configured using CloudWatch, which is AWS’s built-in monitoring solution.
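For reference, an ECS container health check is declared in the healthCheck block of a container definition. The sketch below builds that block as a Python dictionary. The URL is hypothetical, the command assumes curl is installed in the image, and the value ranges reflect the ECS documentation as best I recall, so verify them before relying on the validation.

```python
import json


def ecs_health_check(
    url: str,
    interval: int = 30,
    timeout: int = 5,
    retries: int = 3,
    start_period: int = 0,
) -> dict:
    """Build a healthCheck block for an ECS container definition."""
    if not 5 <= interval <= 300:
        raise ValueError("interval must be between 5 and 300 seconds")
    if not 2 <= timeout <= 60:
        raise ValueError("timeout must be between 2 and 60 seconds")
    if not 1 <= retries <= 10:
        raise ValueError("retries must be between 1 and 10")
    if not 0 <= start_period <= 300:
        raise ValueError("start_period must be between 0 and 300 seconds")
    return {
        # CMD-SHELL runs the string with the container's default shell;
        # a non-zero exit status marks the check as failed.
        "command": ["CMD-SHELL", f"curl -f {url} || exit 1"],
        "interval": interval,
        "timeout": timeout,
        "retries": retries,
        "startPeriod": start_period,
    }


def to_json(health_check: dict) -> str:
    """Render the block for pasting into a task definition."""
    return json.dumps({"healthCheck": health_check}, indent=2)
```

For example, to_json(ecs_health_check("http://localhost:8080/healthz")) produces a fragment that can be placed inside a container definition in the task definition JSON.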
Health checks in Amazon’s Elastic Kubernetes Service (EKS) are a bit different. Kubernetes incorporates self-healing in its deployments out of the box, which will restart an unhealthy container. Kubernetes does this by using probes inside of the Kubernetes cluster to perform a constant check on the containers to see if they are in a healthy state. AWS takes advantage of the self-healing that the orchestration platform provides, and also offers CloudWatch alerting for the EKS cluster itself.
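The probes mentioned above are declared on each container in a pod spec. As a sketch, the function below builds a container entry with liveness and readiness probes as a Python dictionary, which can be serialized to JSON since Kubernetes accepts JSON manifests as well as YAML. The /healthz path, the port, and the timing values are assumptions chosen for illustration.

```python
import json


def container_with_probes(name: str, image: str, port: int = 8080) -> dict:
    """Build a pod-spec container entry with HTTP liveness and readiness probes."""
    http_get = {"path": "/healthz", "port": port}  # hypothetical endpoint
    return {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # A failing liveness probe causes the container to be restarted.
        "livenessProbe": {
            "httpGet": http_get,
            "initialDelaySeconds": 10,
            "periodSeconds": 15,
            "failureThreshold": 3,
        },
        # A failing readiness probe takes the pod out of service traffic
        # without restarting the container.
        "readinessProbe": {
            "httpGet": http_get,
            "periodSeconds": 5,
        },
    }


def to_manifest_fragment(container: dict) -> str:
    """Render the container entry as JSON for use in a manifest."""
    return json.dumps(container, indent=2)
```

The split between the two probes mirrors the distinction in this post: liveness answers whether the container should be replaced, and readiness answers whether the service inside it can take requests right now.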
Creating Container Insights in ECS
Now that you have a clear understanding of health checks and monitoring for your environment, it’s time to put that knowledge to use by creating an ECS cluster with Container Insights enabled.
This section assumes you have:
- An AWS account. New accounts can be created for free for 12 months.
- A container running in ECS. Instructions can be found in the AWS ECS documentation.
First, go to the AWS ECS console.
Click the blue Create Cluster button.
In the Create Cluster section, choose the EC2 Windows + Networking template. It’s not mandatory to choose this template, but the rest of this tutorial will be based on that template.
Once a template is chosen, click the blue Next Step button.
Put in a cluster name. I’m using testCluster, but this is not mandatory.
Scroll down to the bottom of the page, and you’ll see an option to enable Container Insights.
Put a checkmark in the box next to Enable Container Insights, and click the blue Create button.
The ECS cluster will start being created with Container Insights enabled.
Congrats! Your ECS cluster has been created with Container Insights enabled.
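The same setting can also be applied programmatically. The sketch below builds the arguments for the ECS create_cluster API call, where Container Insights is turned on through the containerInsights cluster setting. It assumes you pass in a boto3 ECS client with valid credentials; the cluster name is just the one used in this walkthrough, and the sketch only creates the cluster, without any of the EC2 instances the console template provisions.

```python
def container_insights_cluster_kwargs(cluster_name: str) -> dict:
    """Build create_cluster arguments with Container Insights enabled."""
    return {
        "clusterName": cluster_name,
        "settings": [{"name": "containerInsights", "value": "enabled"}],
    }


def create_cluster_with_insights(ecs_client, cluster_name: str = "testCluster"):
    """Create the cluster using a client made with boto3.client("ecs").

    The client is passed in rather than created here, so this function
    works with any object that exposes a compatible create_cluster method.
    """
    return ecs_client.create_cluster(**container_insights_cluster_kwargs(cluster_name))
```

With boto3 installed and credentials configured, create_cluster_with_insights(boto3.client("ecs")) would create an empty cluster named testCluster with the setting applied.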
In this post, you took a hands-on approach to learning how to perform microservices health checks, how to monitor containers at the service level, and why both matter. You also learned what health checks are and why they are critical for small/medium businesses and enterprises. Finally, you learned how health checks work for containers and when to start monitoring containers. After that, you created a cluster in AWS Elastic Container Service (ECS) with Container Insights enabled. Container Insights provides CloudWatch metrics so that the cluster itself can report on any degradation in the cluster or the containers.