Serverless architecture is the new kid on the block, and according to a recent survey by Serverless, Inc., a vast majority of developers will start using it by the end of the year. The serverless paradigm involves running code in the cloud without managing any servers, allowing you to build business logic and create value without ever thinking about the infrastructure or underlying software. Essentially, it lets you focus on your code.

Serverless covers more than AWS Lambda and other FaaS providers: it includes basically everything you can use to run code, host files, and store images and data. This means that you, as an engineer, don’t need to manage, scale, or operate any servers whatsoever. And here’s the icing on the cake: you only pay for the time your code is running!

Although serverless offers many benefits, there are still some pitfalls, such as latency. This dreaded phenomenon is caused by cold starts, which are, by definition, slower initial responses from your serverless APIs. In this article, we’ll discuss how to minimize that latency in AWS Lambda.

Before we get into how to avoid frequent cold starts, let’s dig deeper into what FaaS is and how it works.

What Is FaaS?

AWS Lambda is the compute service on AWS that lets you run code without caring about servers, and it is one of many FaaS (Function as a Service) offerings. But what is FaaS? It’s a new cloud model that gives engineers a platform to develop, run, and manage applications without having to build and maintain infrastructure. Creating applications with this model is one way to achieve a serverless architecture, which is well suited to building microservices.

FaaS providers support various use cases. One of the most powerful is the ability to run short-running scripts and processes that you can trigger and forget. You can also set up periodic processes that execute at a fixed interval or on a schedule (every morning at 9 AM, for example). See? Not having to care about servers is incredibly convenient in cases like these.
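As a rough illustration of the scheduled case, the sketch below shows a handler that only does its work when it is invoked by a schedule. It assumes the function is wired to a scheduled rule (for example, the AWS schedule expression `cron(0 9 * * ? *)`, which fires at 09:00 UTC every day) and that scheduled events arrive with `source` set to `aws.events`. The names `isScheduledEvent` and `runMorningJob` are hypothetical and exist only for this example.

```javascript
// Minimal sketch, not a definitive implementation.
// Assumption: the trigger is a scheduled rule such as cron(0 9 * * ? *),
// configured outside of this code.

// Returns true when the event looks like a scheduled (cron/rate) event.
const isScheduledEvent = (event) =>
    event != null &&
    event.source === 'aws.events' &&
    event['detail-type'] === 'Scheduled Event'

// Hypothetical placeholder for the fire-and-forget work.
const runMorningJob = async () => 'job finished'

const handler = async (event) => {
    // ignore anything that did not come from the schedule
    if (!isScheduledEvent(event)) return { ran: false }
    const result = await runMorningJob()
    return { ran: true, result }
}

exports.handler = handler
```

Keeping the event check in a small pure function makes it easy to test locally without deploying anything.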

Even though we know FaaS thanks to AWS Lambda, Google Cloud Functions, and Microsoft Azure Functions, it was first made available to the world in late 2014. Having services such as these at our disposal means that we can upload modular pieces of code to the cloud and execute them entirely independently of each other.

Sign up for a free trial of Epsagon today!

How Does AWS Lambda Work?

Let’s dig a bit under the surface, especially since AWS described at re:Invent 2018 how Lambda works under the hood. Here we’ll explain what’s actually going on when an AWS Lambda function is triggered.

Step one is to configure an event trigger to invoke the lambda function. This can be anything from an HTTP request to an image upload to S3 or a message in an SQS queue. When the trigger fires, AWS Lambda first creates an instance of the function. This instance is basically a container, similar to a Docker container, but AWS uses its own proprietary containerization software to manage AWS Lambda function instances.

Once the instance spins up, the code is deployed into the container. The code then runs and returns a value. During this period, the function is considered alive and running. Once it has finished executing the code, the instance sits idle.

The first request will always be a bit slower due to the initialization process (previously a container, now a Firecracker microVM), but subsequent requests will hit the same instance while it is idle, meaning they will be lightning-quick. Unfortunately, there’s a catch. Concurrent requests will trigger the creation of new AWS Lambda instances. For example, if you have 10 concurrent requests to the same lambda function, AWS will create 10 containers to serve the requests. Yan Cui did an amazing job explaining this in his article, “I’m afraid you’re thinking about AWS Lambda cold starts all wrong.” If you’re interested in more information about how Lambda is implemented, you should check out the Lambda Internals blog series.
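One way to see the difference between the first request and later ones is to keep a flag at module scope. Code outside the handler runs once per function instance, during initialization, so the flag is only `true` on the invocation that caused the cold start. This is a minimal sketch: the names `isColdStart` and `invocationCount` are made up for the example.

```javascript
// Module scope: runs once per function instance, i.e. on a cold start.
let isColdStart = true
let invocationCount = 0

const handler = async (event) => {
    // remember whether this invocation is the first one on this instance
    const wasCold = isColdStart
    isColdStart = false
    invocationCount += 1
    // warm invocations on the same instance see the updated module state
    return { coldStart: wasCold, invocationCount }
}

exports.handler = handler
```

Each concurrent instance has its own copy of this module state, so ten concurrent requests that land on ten new instances would each report a cold start.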

What Is a Cold Start?

Does it make sense now? A cold start is caused by the initial process of creating a container which runs your code. Depending on the language you’re using, the added latency can be well over a few seconds. But is it really that big of an issue? According to Chris Munns, Principal Developer Advocate for Serverless at AWS, less than 0.2% of all function invocations are cold starts. It only becomes an issue with synchronous invocations, where a caller is waiting for the response.

Now you’re probably thinking, what’s all this fuss about? Well, be honest. Have you ever left a website because it still hadn’t loaded after a few seconds? This is where engineers play an important role, making sure users never have to face this issue. Here’s how to avoid the problem.

Optimizing Cold Starts

There are two main tactics you can use when battling cold starts: minimizing the duration of the cold start (meaning cutting down the latency of the cold start itself) and minimizing the number of times cold starts occur. The former is done by using common programming patterns and common sense, while the latter is achieved with a technique called function warming. Let’s explain in more detail.

Minimizing the Duration of Cold Starts

You can shave down the time impact of cold starts by writing functions in interpreted languages. Cold start latency with Node.js or Python is well under a second. Go, a compiled language, also has low cold start latency. Another easy step is to choose higher memory settings for your functions, which also gives them more CPU power. However, the most important consideration is to avoid VPCs. Functions in a VPC have to create ENIs (elastic network interfaces), which take well over 10 seconds to initialize. Please, don’t put your functions in a VPC. Just don’t!

Minimizing the Frequency of Cold Starts

You can reduce the number of cold starts that occur in the first place with function warming. This is the act of sending scheduled ping events to your functions to keep them alive and idle, ready to serve requests. With Amazon CloudWatch Events, it is simple to trigger the functions periodically so that a fixed number of AWS Lambda instances always stays alive. Just set up a periodic cron job to trigger your function every 5-15 minutes, and rest assured, it will stay warm.

Once again, this raises the question: How do you handle concurrent cold starts? Luckily, there are a few modules and plugins to use. If you’re a Node.js developer, Lambda Warmer will hit home. Jeremy Daly discussed Lambda Warmer on our serverless and observability webinar. It lets you warm concurrent functions and even choose the level of concurrency you want. It also works with both the Serverless Framework and AWS SAM. The Serverless Framework has another plugin you can use, called Serverless WarmUp Plugin. But sadly, it doesn’t support concurrent function warming.

In order to correctly warm your functions, you should follow a few simple steps:

  • Don’t invoke the function more often than once every five minutes.
  • Invoke the function directly with Amazon CloudWatch Events.
  • Pass in a test payload when running the warming.
  • Write handler logic that skips the function’s real work when the invocation is a warming event.

Here’s what a handler function would look like with warmer logic included:

const warmer = require('lambda-warmer')

exports.handler = async (event) => {
    // if this is a warming event, return early without running the real logic
    if (await warmer(event)) return 'warmed'
    // otherwise proceed with the normal handler logic
    return 'Hello from Lambda'
}

The event trigger configuration with the Serverless Framework would only require you to add a few lines of YAML:

myFunction:
  name: myFunction
  handler: myFunction.handler
  events:
    - schedule:
        name: warmer-schedule-name
        rate: rate(5 minutes)
        enabled: true
        input:
          warmer: true
          concurrency: 1

By using this warmer, you can safely keep as many AWS Lambda instances running as you want, including copies of the same function if you are expecting concurrent requests. So, say goodbye to cold starts.
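To make the idea of warming several copies of the same function more concrete, here is a dependency-free sketch of the fan-out. It is not Lambda Warmer’s actual implementation. The `invoke` argument is a hypothetical function you would supply (in a real deployment it might wrap an AWS SDK call that invokes the function), and the payload fields `index` and `total` as well as the 75 ms default delay are assumptions made for the example.

```javascript
// Minimal sketch of concurrent warming, under the assumptions stated above.

// Sends `concurrency` warming pings in parallel through the injected
// `invoke` function and waits for all of them to finish.
const warmConcurrently = async (concurrency, invoke) => {
    const pings = []
    for (let i = 1; i <= concurrency; i++) {
        pings.push(invoke({ warmer: true, index: i, total: concurrency }))
    }
    return Promise.all(pings)
}

// Handles a warming ping inside the function being warmed.
// Holding the invocation open for a short delay makes the parallel pings
// overlap, so each one has to be served by a separate instance.
const handleWarmPing = async (event, delayMs = 75) => {
    if (!event || event.warmer !== true) return false
    await new Promise((resolve) => setTimeout(resolve, delayMs))
    return true
}
```

The short delay is the important design choice: if every ping returned immediately, a single idle instance could serve all of them one after another and no additional instances would be warmed.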

Wrapping Up

In the end, it all boils down to latency, the dreaded phenomenon all engineers want to reduce in their software. As we discussed, you can optimize cold starts by minimizing their duration and the number of times they occur. You can cut down the duration by increasing memory settings for the function and keeping it out of a VPC, while function warming keeps the overall number of cold starts at bay.

Even though less than 0.2% of all function invocations are cold starts, engineers have a duty to make sure all of their users experience optimal performance at all times. It’s easier said than done, but by following the guidelines above, it’s definitely achievable.
