One of the most important configurations for a Lambda function is the timeout value. It dictates how long a function invocation can last before it’s forcibly terminated by the Lambda service. In this post, let’s look at the considerations we should make when choosing a timeout value. We will see why the rule of thumb is to use short timeouts for Lambda, typically between 3-6 seconds.
To define the upper boundary of what we can choose from, let’s start with the hard AWS limits within which we have to work.
The most obvious limit is the fact that a Lambda function can run for five minutes at most.
If what you’re doing cannot be comfortably completed within five minutes, you should consider moving that workload elsewhere. AWS Fargate became generally available in 2018, and is a good fit for this type of long-running task that doesn’t quite fit with Lambda’s execution limit.
Depending on the event source you’re using with Lambda, there are also other limits to consider. For example, Amazon API Gateway has a hard limit of 29 seconds for integration timeout.
This means that even if your function can run for five minutes, API Gateway will time out after 29 seconds and return a 504 (Gateway Timeout) error to the caller.
User Experience is King
When you’re building APIs, every additional second that you allow a function to run is another second the user would potentially have to wait before they get a timeout error.
Empirical evidence says that the relationship between response time and load follows an exponential curve. When a system reaches its saturation point, response time skyrockets! This tells us that if you haven’t received a response after X seconds, then you’re exponentially less likely to receive one after X+1 seconds.
This is why the rule of thumb in microservices is to use short timeouts on API endpoints. The same wisdom is still applicable when it comes to building APIs with API Gateway and Lambda. Rather than wait for API Gateway to time out the request after 29 seconds, we should proactively time out our function after a few seconds.
If a function is not able to respond within a few seconds, then chances are good that it depends on a downstream system that has reached its saturation point. Instead of keeping the user waiting, we should consider the following:
- Use short timeouts when communicating with a downstream system.
Just as we should use short timeouts for our functions, we should use short timeouts on our integration points, too. If a downstream system is not able to respond within a few seconds, then chances are it’s struggling and won’t respond for a long time. Since we have to give up at some point (before our function itself is timed out by Lambda), it is better to fail fast.
We can go a step further and set the timeout for these API calls based on how much time is left in the invocation. This way, we strike a good balance between giving the requests the best chance to succeed and protecting ourselves against slow downstreams.
- Use circuit breakers to stop dog-piling on a struggling downstream system.
The circuit breaker is a resilience pattern that is often used in microservices architectures to prevent cascading failures. When requests to a downstream system consistently time out, the circuit trips. We then stop forwarding requests to the downstream system and give it time to recover.
When applying the circuit breaker pattern, you should avoid the anti-pattern of sharing the state of the circuit across multiple callers. Sharing these circuit states introduces another integration point that will require protection, require its own circuit breaker, and so on. In the case of Lambda, this means you shouldn’t share the state of the circuit across concurrent executions of any functions.
- Use cached or default responses when downstream cannot respond in a timely fashion.
If a downstream system fails consistently and trips the circuit, what can we do? Do we return an error to our user? In many cases, a partial answer or even a wrong answer is better than no answer at all.
Take Netflix’s home screen for example. When you log in, it attempts to load suggestions based on your preferences and watch history. If the API calls to fetch user preference or watch history fail, then Netflix will, if possible, return previously cached suggestions for you. Failing that, it will return a default set of suggestions that are bundled into the suggestion service.
In this case, downstream failures are handled and the user experience is downgraded gracefully as opposed to simply surfacing the failures to the user.
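The three ideas above can be sketched together in a few lines of Python. This is a minimal illustration rather than a production implementation: `CircuitBreaker`, `downstream_timeout_s`, `call_with_fallback`, and every threshold value are hypothetical names and numbers chosen for this example. The only Lambda-specific piece it assumes is that the remaining invocation time comes from the Python runtime's `context.get_remaining_time_in_millis()`.

```python
import time


class CircuitBreaker:
    """Minimal in-memory circuit breaker. State lives in this container only."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # hypothetical default
        self.reset_after_s = reset_after_s          # hypothetical default
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True
        # After the cool-down period, let a trial request through.
        return self.clock() - self.opened_at >= self.reset_after_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def downstream_timeout_s(remaining_ms, reserve_ms=500, max_timeout_s=3.0):
    """Timeout for a downstream call, keeping reserve_ms to build a response."""
    budget_s = max(remaining_ms - reserve_ms, 0) / 1000.0
    return min(budget_s, max_timeout_s)


def call_with_fallback(breaker, fetch, remaining_ms, cached=None, default=None):
    """Call fetch(timeout_s); on failure fall back to cached, then default."""
    fallback = cached if cached is not None else default
    timeout_s = downstream_timeout_s(remaining_ms)
    if timeout_s <= 0 or not breaker.allow_request():
        return fallback  # fail fast: no time left, or the circuit is open
    try:
        result = fetch(timeout_s)
    except Exception:
        breaker.record_failure()
        return fallback
    breaker.record_success()
    return result
```

In a handler, the breaker would be created once at module level so that its state survives between invocations in the same container, and `remaining_ms` would be read from the context object at call time. Because the state is never shared across containers, each concurrent execution trips its own circuit, which is in line with the advice above about not sharing circuit state.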
Performance vs Cost
If the workload is CPU intensive—for example, running simulations or Machine Learning algorithms—then you can improve the execution time by configuring the function with more memory. This is because Lambda gives you a single dial to control the performance of your function through memory allocation. More memory equals more CPU.
If you want to learn more about how CPU allocation works in Lambda, then this post by OpsGenie offers some nice insights.
When you allocate more than 1.8GB of memory, you also unlock a second CPU core. To take advantage of this second core, you have to parallelize your workload. Amdahl’s law tells us that the performance improvement achieved by adding more CPU cores is limited by the portion of the workload that can be parallelized.
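To make Amdahl’s law concrete, here is a small sketch of the formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the workload that can be parallelized and n is the number of cores. The function name `amdahl_speedup` and the sample fractions are illustrative choices, not part of any AWS API.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only parallel_fraction of the work uses all cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative fractions only; measure your own workload.
    for fraction in (0.0, 0.5, 0.9, 1.0):
        speedup = amdahl_speedup(fraction, cores=2)
        print(f"{fraction:.0%} parallelizable: at most {speedup:.2f}x with 2 cores")
```

With two cores, a workload that is only half parallelizable can gain at most about 1.33x, and a fully serial workload gains nothing from the second core.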
This effect is clearly illustrated in a previous post that measures the performance impact of memory size on a single-threaded application. Beyond 1.8GB of memory, the application is no longer able to observe any performance gains.
Also, remember that concurrency is not the same as parallelism. Node.js, for example, is not well adapted to taking advantage of this second core because it runs your code on a single-threaded event loop.
With Lambda, you’re charged in 100ms blocks of execution time, based on the amount of memory allocated to the function. A 256MB function that runs for 1s would cost exactly twice as much as a 128MB function that runs for 1s.
How much memory (and therefore CPU) you should allocate for a function depends on a number of factors:
- The portion of code that can be improved by adding more CPU. For example, if a function spends most of its execution time waiting for a DynamoDB operation to complete, then adding more CPU is unlikely to drastically improve performance.
- The portion of code that can be parallelized. This is only relevant when you allocate more than 1.8GB of memory.
- The average execution time of the function. If a function averages 102ms, then it will be charged for 200ms of execution time. In cases like this, you might be able to save money by allocating more memory and bringing the average execution time below 100ms. For example, 200ms at 128MB = $0.000000416, but 100ms at 192MB = $0.000000313, so going from 128MB to 192MB yields a 25% cost saving!
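The billing arithmetic in the last point can be sketched as follows. The price of $0.00001667 per GB-second is an assumption used for illustration (it is consistent with the per-100ms figures quoted above, but check the current pricing page), and `billed_duration_ms` and `invocation_cost` are hypothetical helper names. Request charges are not included.

```python
import math

PRICE_PER_GB_SECOND = 0.00001667  # assumed price, for illustration only
BILLING_BLOCK_MS = 100            # billing granularity described in this post


def billed_duration_ms(duration_ms):
    """Round the execution time up to the next billing block."""
    return math.ceil(duration_ms / BILLING_BLOCK_MS) * BILLING_BLOCK_MS


def invocation_cost(duration_ms, memory_mb, price_per_gb_second=PRICE_PER_GB_SECOND):
    """Duration charge for a single invocation (request charge excluded)."""
    gb_seconds = (memory_mb / 1024) * (billed_duration_ms(duration_ms) / 1000)
    return gb_seconds * price_per_gb_second


if __name__ == "__main__":
    slow = invocation_cost(102, memory_mb=128)  # billed as 200ms
    fast = invocation_cost(98, memory_mb=192)   # billed as 100ms
    print(f"saving from 128MB to 192MB: {1 - fast / slow:.0%}")
```

The saving only materializes if the extra memory really does bring the average below the 100ms boundary, so it is worth measuring before and after the change.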
Alex Casalboni proposed a clever approach to work out the best memory size for your function. His solution uses Step Functions to execute your function with different memory settings. Using the actual performance data that this generates, you can work out the memory size that gives you the best balance between performance and cost.
Another reason for keeping Lambda timeouts short is related to security. With serverless, we find ourselves in a peculiar position regarding Denial-of-Service (DoS) attacks. On the one hand, we get a lot of scalability by standing on the shoulders of giants, and it’s much easier to brute-force your way through such an attack. On the other hand, you can end up paying a lot of money for the Lambda invocations and other resources consumed during the attack, which is why the serverless community has nicknamed DoS attacks Denial-of-Wallet attacks!
Since the cost of Lambda invocations is tied to their durations, attacks such as Regex Denial-of-Service (ReDoS) can have a telling impact on cost. If a function can be invoked via a publicly accessible API Gateway endpoint, then you should avoid using long timeouts. In the event of a ReDoS attack, a long timeout (such as 60 seconds) can cost you a lot more than a function with a short timeout (such as six seconds).
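As a rough illustration of why the timeout matters here, the sketch below computes an upper bound on the duration charges if every malicious request manages to keep the function busy until it times out. The request count, the memory size, and the price per GB-second are all hypothetical numbers picked for this example, and `worst_case_duration_cost` is not an AWS API.

```python
def worst_case_duration_cost(invocations, timeout_s, memory_mb,
                             price_per_gb_second=0.00001667):
    """Upper bound on duration charges if every invocation runs until timeout.

    The default price is an assumption for illustration only.
    """
    gb_seconds = invocations * timeout_s * (memory_mb / 1024)
    return gb_seconds * price_per_gb_second


if __name__ == "__main__":
    attack_requests = 1_000_000  # hypothetical attack size
    for timeout_s in (6, 60):
        cost = worst_case_duration_cost(attack_requests, timeout_s, memory_mb=1024)
        print(f"{timeout_s}s timeout: up to ${cost:,.2f} in duration charges")
```

Whatever the actual numbers are, the bound scales linearly with the timeout, so a 60-second timeout exposes you to ten times the duration charges of a six-second one.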
The same applies to other functions that can be invoked via publicly accessible event sources. For example, you might allow users to upload files to an S3 bucket and then process the uploaded files with Lambda.
While it’s important to use short timeouts, they are not enough on their own to resist DoS attacks. Here are some of the common ways to protect yourself against DoS attacks:
- Monitor and alert regarding suspicious traffic patterns.
- Use WAF to inspect and block suspicious packets.
- Apply IP whitelisting/blacklisting.
- Apply rate limiting at the API Gateway level and/or through Lambda’s reserved concurrency setting (which, despite its name, also acts as the maximum concurrency of a function).
As I illustrated in this post, memory size has a big impact on cold start time, and this makes sense. The more CPU resources the function has, the faster it’s able to initialize the language runtime and initialize your handler module and its dependencies. But the timeout value for your functions can also have a subtle impact on how frequently you experience cold starts, especially during times when things are running a little slower than usual.
When API Gateway receives an HTTP request and invokes your function, the Lambda service first checks whether it can reuse an existing container for the invocation. If all existing containers are busy handling other requests, then a new container is spawned to handle the request. This invocation is therefore a cold start and takes longer to complete.
Consider two configurations of the same Lambda function: a has a 6s timeout and b has a 12s timeout. When invocations run slowly, a container in b can stay busy for up to twice as long as a container in a. You’re therefore more likely to see cold starts with b, because there is a bigger time window in which another incoming request finds no free container and requires a new one.
Of course, we’re talking about probabilities. What would transpire in reality depends on other factors that are often outside of your control, such as traffic pattern. However, a shorter timeout gives you better protection against the worst-case scenario. As we discussed already, once a system reaches its saturation point, the response time goes up exponentially. If your function depends on a downstream system that has reached its saturation, then a long timeout would make things worse by causing more cold starts and damaging user experience even further.
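One way to put rough numbers on this is Little’s law: the average number of busy containers is approximately the request rate multiplied by the average invocation duration. The sketch below applies it to a hypothetical API that receives 10 requests per second and whose invocations all run until the timeout because a downstream system has stalled. It is a steady-state approximation with made-up traffic figures, not a model of how Lambda actually schedules containers.

```python
import math


def containers_needed(requests_per_second, avg_duration_s):
    """Little's law: average concurrency = arrival rate x time spent per request."""
    return math.ceil(requests_per_second * avg_duration_s)


def extra_cold_starts(warm_containers, requests_per_second, avg_duration_s):
    """New containers required beyond those that are already warm."""
    needed = containers_needed(requests_per_second, avg_duration_s)
    return max(needed - warm_containers, 0)


if __name__ == "__main__":
    rate = 10  # hypothetical requests per second
    warm = 2   # hypothetical number of containers already warm
    for timeout_s in (6, 12):
        # Worst case: every invocation hangs until it hits the timeout.
        print(f"{timeout_s}s timeout: {containers_needed(rate, timeout_s)} busy containers, "
              f"{extra_cold_starts(warm, rate, timeout_s)} of them cold starts")
```

Under these assumptions, doubling the timeout doubles the number of containers that have to be running at once, and every container beyond the warm pool is a cold start for some user.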
I know what you’re thinking. “That is an awful lot to think about just to choose a number between 1 and 300! Don’t you have a few simple rules for me?”
Here are a few general rules that I like to follow:
- For API functions where the latency is user-facing, use a timeout of 3-6 seconds. These functions shouldn’t be doing too much as they’re on the critical path, so 3-6 seconds should be sufficient.
- If your API needs to perform expensive computations, then consider adopting the decoupled invocation pattern so that the initial request will be fast.
- For Kinesis, DynamoDB Streams or SQS functions where you typically process data in batches, adjust timeout based on the batch size.
- If a function needs a long timeout because it’s either doing many things or requires complicated retry logic with exponential delays, then consider using Step Functions. Break the function up and delegate the retry logic to Step Functions.
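For the batch-processing rule, one simple way to derive a timeout is from the batch size and a measured per-record processing time. The sketch below is only a starting point: the safety factor, the fixed overhead, and the helper name `batch_timeout_s` are assumptions made for this example, and the 300-second cap reflects the upper limit discussed in this post.

```python
import math

MAX_LAMBDA_TIMEOUT_S = 300  # upper limit discussed in this post


def batch_timeout_s(batch_size, per_record_ms, safety_factor=2.0, overhead_ms=1000):
    """Suggest a timeout in whole seconds for processing one batch.

    safety_factor and overhead_ms are illustrative defaults, not recommendations.
    """
    estimate_ms = overhead_ms + batch_size * per_record_ms * safety_factor
    return min(math.ceil(estimate_ms / 1000), MAX_LAMBDA_TIMEOUT_S)


if __name__ == "__main__":
    # Hypothetical: 100 records per batch at roughly 50ms each.
    print(f"suggested timeout: {batch_timeout_s(100, 50)}s")
```

If the suggested value runs into the cap, that is usually a sign the batch size should come down rather than the timeout go up.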
In general, apply common sense. If you feel that you have to give a function a much longer timeout than should be required to do its job, then there is usually some underlying problem. In cases like this, I like to use the 5-whys technique to help me figure out what the real problem is and tackle that problem directly.