One of the most important configurations for an AWS Lambda function is the timeout value. The AWS Lambda timeout dictates how long a function invocation can last before it’s forcibly terminated by the Lambda service. In this post, let’s look at the considerations we should make and the best practices for AWS Lambda timeouts and limits.
We will see why the rule of thumb is to use short timeouts for Lambda, typically between 3 and 6 seconds. Before we get to that, let’s start with the hard AWS limits within which we have to work.
AWS Lambda Limits
The most obvious limit is the fact that a Lambda function can run for five minutes at most.
If what you’re doing cannot be comfortably completed within five minutes, you should consider moving that workload elsewhere. AWS Fargate became generally available in 2018 and is a good fit for long-running tasks that don’t fit within Lambda’s execution limit.
Depending on the event source you’re using with Lambda, there are also other limits to consider. For example, Amazon API Gateway has a hard limit of 29 seconds for integration timeout.
This means that even if your function can run for five minutes, API Gateway would have timed out after 29 seconds and returned a 504 error to the caller. Knowing these limits can prevent future issues.
User Experience and Lambda Limits
When you’re building APIs, every additional second that you allow a function to run has an impact. It is another second the user might have to wait before getting a timeout error.
Empirical evidence says that the relationship between response time and load follows an exponential curve. When a system reaches its saturation point, response time skyrockets! So if you haven’t received a response after X seconds, then you’re even less likely to receive one after X+1 seconds.
This is why the rule of thumb in microservices is to use short timeouts on API endpoints. The same wisdom applies when building APIs with API Gateway and Lambda. You could wait for API Gateway to time out the request after 29 seconds. The better practice is to proactively time out the function after a few seconds, well before that limit is reached.
If a function is not able to respond within a few seconds, chances are that it depends on a downstream system that has reached its saturation point. Instead of keeping the user waiting, we should consider the following.
Use short timeouts when communicating with a downstream system
Just as a short Lambda timeout is a good practice, we should use short timeouts on our integration points, too.
If a downstream system is not able to respond within a few seconds, then perhaps it’s struggling. Maybe it won’t respond for a long time. Since we have to give up at some point, it is better to fail fast. Otherwise, our own function might be timed out by the Lambda service.
We can go a step further and set the timeout for these API calls based on how much time is left in the invocation. This way, we strike a good balance between giving the requests the best chance to succeed and protecting ourselves against a slow downstream system.
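As a minimal sketch of that idea, the Python handler below derives the downstream request timeout from the time left in the invocation. The `get_remaining_time_in_millis()` method comes from the Lambda context object; the URL, the safety margin, and the upper and lower bounds are all hypothetical values chosen for illustration, not recommendations.

```python
import urllib.request

# Hypothetical values for illustration; tune them for your own workload.
DOWNSTREAM_URL = "https://downstream.example.com/items"  # placeholder endpoint
SAFETY_MARGIN_MS = 500   # time reserved for handling a failure and responding
MAX_TIMEOUT_MS = 3000    # never wait longer than this, however much time is left
MIN_TIMEOUT_MS = 100     # floor, so we never pass a zero or negative timeout


def downstream_timeout_seconds(remaining_ms: int) -> float:
    """Derive a request timeout from the time left in the invocation."""
    budget_ms = remaining_ms - SAFETY_MARGIN_MS
    clamped_ms = max(MIN_TIMEOUT_MS, min(budget_ms, MAX_TIMEOUT_MS))
    return clamped_ms / 1000.0


def handler(event, context):
    # The Lambda context object reports how long this invocation has left.
    timeout = downstream_timeout_seconds(context.get_remaining_time_in_millis())
    with urllib.request.urlopen(DOWNSTREAM_URL, timeout=timeout) as response:
        return {"statusCode": 200, "body": response.read().decode("utf-8")}
```

With a 6-second function timeout, the first downstream call gets the full 3-second cap, while a call made with only 2 seconds left gets 1.5 seconds.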
Use circuit breakers to stop dog-piling on a struggling downstream system
A circuit breaker is a resilience pattern that is often used in a microservices architecture to prevent cascading failures. When requests to a downstream system consistently time out, the circuit trips. We stop forwarding requests to the downstream system and give it time to recover.
When applying the circuit breaker pattern to prevent AWS Lambda timeouts, there are things to consider. You should avoid its anti-pattern of sharing the state of the circuit across multiple callers.
Sharing these circuit states introduces another integration point that will require protection, require its own circuit breaker, and so on. In the case of Lambda, this means you shouldn’t share the state of the circuit across concurrent executions of any functions.
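To make the pattern concrete, here is a minimal Python sketch of a circuit breaker whose state lives in module-level memory, so each container keeps its own circuit and nothing is shared across concurrent executions. The class name, the thresholds, and the injectable clock are assumptions made for this example rather than part of any library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.clock = clock       # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None    # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.recovery_seconds:
                raise CircuitOpenError("downstream calls are suspended")
            # Recovery window elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


# Created once per container; never shared across concurrent executions.
breaker = CircuitBreaker()
```

Inside a handler, a downstream call would be wrapped as `breaker.call(fetch, url)`, with `CircuitOpenError` treated the same way as a downstream failure.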
Use cached or default responses when downstream cannot respond in a timely fashion
If a downstream system keeps failing and trips the circuit, what can we do? Do we return an error to our user? In many cases, a partial answer or even a wrong answer is better than no answer at all.
Take Netflix’s home screen for example. When you log in, it attempts to load suggestions based on your preferences and watch history. If the API calls to fetch user preference or watch history fail, then Netflix will, if possible, return previously cached suggestions for you. Failing that, it will return a default set of suggestions that are bundled into the suggestion service.
In this case, downstream failures are handled and the user experience degrades gracefully, as opposed to simply surfacing the failures to the user.
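The same fallback chain (fresh response, then cached response, then bundled default) can be sketched in a few lines of Python. This is only an illustration of the idea and not how Netflix implements it; the function names, the in-memory cache, and the default list are all hypothetical.

```python
# Hypothetical defaults bundled with the service, used as the last resort.
DEFAULT_SUGGESTIONS = ["popular-1", "popular-2", "popular-3"]

# Per-container cache of the last good response for each user.
_last_good = {}


def get_suggestions(user_id, fetch_personalized):
    """Return fresh suggestions, else cached ones, else the bundled defaults."""
    try:
        suggestions = fetch_personalized(user_id)
    except Exception:
        # Downstream failed or the circuit is open: degrade instead of erroring.
        return _last_good.get(user_id, DEFAULT_SUGGESTIONS)
    _last_good[user_id] = suggestions
    return suggestions
```

An in-memory cache like this only survives as long as the container does, so a real service would likely back it with an external cache.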
AWS Lambda Performance vs Cost
If the workload is CPU intensive—for example, running simulations or Machine Learning algorithms—then you can improve the execution time by configuring the function with more memory. This is because Lambda gives you a single dial to control the performance of your function through memory allocation. More memory equals more CPU.
If you want to learn more about how CPU allocation works in Lambda, then this post by OpsGenie offers some nice insights. Remember: in serverless, memory equals CPU equals running time, which affects Lambda timeouts.
When you allocate more than 1.8GB of memory, you also unlock a second CPU core. To take advantage of this second core, you have to parallelize your workload. Amdahl’s law tells us that the performance improvement achieved by adding more CPU cores is limited by the portion of the workload that can be parallelized.
This effect is clearly illustrated in a previous post that measures the performance impact of memory size on a single-threaded application. Beyond 1.8GB of memory, the application no longer sees any performance gains.
Also, remember that concurrency is not the same as parallelism. Node.js, for example, is not well suited to taking advantage of this second core because it runs a single-threaded event loop.
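Amdahl’s law, mentioned above, is easy to check with a small calculation. The Python sketch below computes the theoretical speedup for a given parallelizable fraction of the workload; the fractions used in the examples are illustrative, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Theoretical speedup when only part of the work can use extra cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# A fully single-threaded workload gains nothing from a second core.
single_threaded = amdahl_speedup(0.0, 2)

# If half of the work can be parallelized, two cores give about a 1.33x speedup.
half_parallel = amdahl_speedup(0.5, 2)
```

Only a perfectly parallel workload (a fraction of 1.0) would see the full 2x from the second core.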
With Lambda, you’re charged in 100ms blocks of execution time, based on the amount of memory allocated to the function. A 256MB function that runs for 1s would cost exactly twice as much as a 128MB function that runs for 1s.
What Is the Right Choice for CPU?
How much CPU you should allocate for a serverless function depends on a number of factors:
- The portion of code that can be improved by adding more CPU. For example, a function may spend most of its execution time waiting for a DynamoDB operation. In this case, adding more CPU is unlikely to drastically improve performance.
- The portion of code that can be parallelized. This is only relevant when you allocate more than 1.8GB of memory.
- The average execution time of the Lambda. If a function averages 102ms, then it will be charged for 200ms of execution time. In cases like this, you might be able to save money by allocating more memory and bringing the average execution time below 100ms.
- For example, 200ms at 128MB = $0.000000416, but 100ms at 192MB = $0.000000313, so going from 128MB to 192MB yields a 25% cost saving!
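The arithmetic behind that saving can be reproduced with a short Python sketch. It assumes billing in 100ms blocks as described above and a price of $0.00001667 per GB-second, which is an assumption chosen to approximate the figures quoted here, so check current AWS pricing before relying on it. The per-request charge is ignored, and the 95ms duration at 192MB is a hypothetical measurement.

```python
# Assumed price per GB-second; an illustration, not current AWS pricing.
PRICE_PER_GB_SECOND = 0.00001667
BILLING_BLOCK_MS = 100


def billed_duration_ms(duration_ms: int) -> int:
    """Round the duration up to the next billing block."""
    blocks = -(-duration_ms // BILLING_BLOCK_MS)  # ceiling division
    return blocks * BILLING_BLOCK_MS


def invocation_cost(memory_mb: int, duration_ms: int) -> float:
    """Duration cost of a single invocation, excluding the request charge."""
    gigabytes = memory_mb / 1024
    seconds = billed_duration_ms(duration_ms) / 1000
    return gigabytes * seconds * PRICE_PER_GB_SECOND


slow = invocation_cost(128, 102)  # billed as 200ms
fast = invocation_cost(192, 95)   # hypothetical faster run, billed as 100ms
saving = 1 - fast / slow          # fraction saved by allocating more memory
```

The saving only materializes if the extra memory really does bring the duration under the next block boundary, so measure before changing the setting.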
Alex Casalboni proposed a clever approach to work out the best memory size for your function. His solution uses Step Functions to execute your function with different memory settings. Using the actual performance data that this generates, you can work out the memory size that gives you the best balance between performance and cost. You should also be aware of the APIs you are using and their impact on the running time.
These all have a big impact on the serverless timeout and it’s important to consider these best practices for Lambda timeouts.
Another reason for keeping Lambda timeouts short is related to security. With serverless, we find ourselves in a peculiar position regarding Denial-of-Service (DoS) attacks.
On the one hand, we are standing on the shoulders of giants. We get a lot of scalability, so it’s much easier to simply brute-force our way through such an attack. On the other hand, you can end up paying a lot of money for the Lambda invocations and the use of other resources. This is why DoS attacks have been nicknamed Denial-of-Wallet attacks in the serverless community!
The Cost of Lambda Invocations
If a publicly accessible API Gateway endpoint can invoke the function, then you should avoid using long timeouts. In the event of a ReDoS attack, a function with a long timeout (such as 60 seconds) can cost you a lot more than a function with a short timeout (such as 6 seconds).
In addition, an invocation that times out is billed for the full timeout duration, which is the most that invocation can cost!
The same applies to other functions that can be invoked via publicly accessible event sources. For example, you might allow users to upload files to an S3 bucket and then process the uploaded files with Lambda.
While it’s important to use short timeouts, they are not enough on their own to resist DoS attacks. Here are some of the common ways to protect yourself against DoS attacks:
- Monitor and alert regarding suspicious traffic patterns.
- Use WAF to inspect and block suspicious packets.
- Apply IP whitelisting/blacklisting.
- Apply rate limiting at API Gateway level and/or through Lambda’s reserved concurrency setting (which actually means the maximum concurrency of a function).
As I illustrated in this post, memory size has a big impact on cold start time, and this makes sense. The more CPU resources the function has, the faster it’s able to initialize the language runtime, as well as your handler module and its dependencies.
But the timeout value for your functions can also have a subtle impact on how frequently you experience cold starts. This is especially true during times when things are running a little slower than usual.
When API Gateway receives an HTTP request and invokes your function, the Lambda service first checks whether it can reuse an existing container. If all existing containers are busy handling other requests, then a new container is spawned to handle the request. This invocation would, therefore, be a cold start and would take longer to complete.
Example: Two Functions
Consider two configurations of the same Lambda function: a has a 6-second timeout and b has a 12-second timeout. When invocations are running slowly, you’re more likely to see cold starts with b, because each container can stay busy for longer. That creates a bigger time window in which another invocation would require a new container, and therefore a cold start.
Of course, we’re talking about probabilities. What would transpire in reality depends on other factors that are often outside of your control, such as traffic pattern. However, a shorter timeout gives you better protection against the worst-case scenario.
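A rough way to quantify that worst case is Little’s law: the number of in-flight invocations is the arrival rate multiplied by how long each one is held. The Python sketch below applies it to the two configurations under a deliberately pessimistic, simplified model in which requests arrive at a steady rate and every invocation hangs until it times out. The rate of 10 requests per second is an assumption made for illustration.

```python
def worst_case_concurrency(requests_per_second: float,
                           timeout_seconds: float) -> float:
    """Little's law: in-flight requests = arrival rate x time each is held."""
    return requests_per_second * timeout_seconds


# Both configurations receive the same (assumed) steady 10 requests per second.
containers_a = worst_case_concurrency(10, 6)   # configuration a: 6s timeout
containers_b = worst_case_concurrency(10, 12)  # configuration b: 12s timeout
```

Under this model, b needs twice as many containers as a when the downstream hangs, and every extra container is a cold start.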
As we discussed already, once a system reaches its saturation point, the response time goes up exponentially. In some cases, a long timeout would make things worse. For example, if your function depends on a downstream system that has reached its saturation point, a long timeout may cause more cold starts and damage the user experience even further.
I know what you’re thinking. “That is an awful lot to think about just to choose a number between 1 and 300! Don’t you have a few simple rules for me?”
Here are a few general rules that I like to follow:
- For API functions where the latency is user-facing, use a timeout of 3-6 seconds. These functions shouldn’t be doing too much as they’re on the critical path, so 3-6 seconds should be sufficient.
- If your API needs to perform expensive computations, then consider adopting the decoupled invocation pattern so that the initial request will be fast.
- For Kinesis, DynamoDB Streams or SQS functions where you typically process data in batches, adjust timeout based on the batch size.
- Sometimes a Lambda is either doing many things or requires complicated retry logic with exponential delays. In these cases, consider using Step Functions. Break the function up and delegate the retry logic to Step Functions.
In general, apply common sense. If you feel that you have to give a function a much longer timeout, then there is usually some underlying problem. In cases like this, I like to use the 5-whys technique to help me figure out what the real problem is and tackle that problem directly.