The Hitchhiker’s Guide to Serverless

What you should know about serverless, the universe and everything.

As with any new technology, the serverless ecosystem is evolving rapidly. Cloud providers are releasing new features and services on a monthly basis. As a new user, it can be quite overwhelming. So in this post, I’ll help you figure out who’s who in the serverless zoo — I will give an overview of services you should know when building serverless applications, discuss when you should (or shouldn’t) use them and list some common gotchas when using them.

I will focus on the AWS serverless services since AWS is my cloud of choice. However, the offerings of other cloud providers are pretty similar to the ones we will discuss here, so you can use lists like AWS to GCP or AWS to Azure, which are like Rosetta stones for cloud providers, to get started with the cloud vendor of your choosing.

One more thing before we begin: serverless is advantageous, but the entire ecosystem can be a lot to take in at first. So the best tip I can give you for getting started with serverless, which is also a great tip for traveling the galaxy, is DON’T PANIC. It might seem hard to get a grip on everything at first, but you will pick things up quickly, and the results are worth it. So without further ado, let’s start our serverless journey.

The Serverless Universe

Lambda

Lambda is the obvious choice for embarking on our serverless journey. It is a FaaS (Function-as-a-Service) offering: an event-driven compute service provided by the cloud vendor that lets you execute your code without managing any servers. You zip your code and send it to the cloud provider. Then you configure the events for which the code should execute (such as an HTTP request, a message placed into a queue, etc.) and the cloud provider takes care of the rest.

Lambda (and FaaS in general) is an integral part of most serverless systems. It scales automatically and is pay-per-use (as most serverless services are). I would recommend choosing Lambda in most cases, although there are other serverless compute services that target more specific use cases, which we will cover later on.

However, there are some new gotchas you should note when using Lambda:

  1. Lambda has time, memory and concurrency limits. A function that exceeds its timeout or memory limit is terminated, and invocations beyond the concurrency limit are throttled, so make sure you stay within them! Also, if you are building a user-facing API, or any API where latency matters, mind cold starts, which also depend on the programming language you choose.
  2. Avoid monoliths — the serverless ecosystem is a perfect fit with the microservice architecture. Don’t have a full web app packed into one Lambda. Each of your functions should do exactly one thing.
  3. When using Lambda, make sure your functions transform data, and not transport data. There are better AWS services to use for that.

Lambda is the cornerstone of most large serverless apps, so there are many more gotchas, and it takes time to get in that serverless mindset. But this should get you started on the right track.
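To make the "do exactly one thing, and transform data" advice concrete, here is a minimal sketch of a Python handler. The `(event, context)` signature is what the Python runtime calls; the event shape (an `order` with `items`) and the function name are assumptions made up for this illustration.

```python
import json


def handler(event, context):
    """Compute the total of a single order and return it. Nothing else."""
    # Hypothetical event shape:
    # {"order": {"id": "...", "items": [{"price": 1.0, "qty": 2}, ...]}}
    order = event["order"]
    total = sum(item["price"] * item["qty"] for item in order["items"])
    return {"order_id": order["id"], "total": round(total, 2)}


if __name__ == "__main__":
    # Local invocation with a sample event; Lambda passes a real context object.
    sample = {"order": {"id": "o-1", "items": [{"price": 2.5, "qty": 2}]}}
    print(json.dumps(handler(sample, None)))
```

Because the function only transforms its input, it is easy to test locally and stays well within the time and memory limits.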

SQS

SQS (Simple Queue Service) is a fully serverless queue service. You can create a queue with a click of a button and start sending messages through it. Queues are a common component in distributed systems. They are used to decouple different parts of a system, so each can operate on its own. SQS offers two types of queues: FIFO queues, where the order of the messages is guaranteed, and standard queues, where the order is not guaranteed but the throughput of the queue is almost unlimited.

There are several things you should note when using an SQS queue as a component in a serverless system:

  1. An SQS queue should have only one consumer, since each message is handed to a single receiver. That means that if you want the message to get to multiple destinations, you should use another service (like SNS or Kinesis). Architecturally speaking, the microservice that owns the queue should be the receiving end: everyone can send messages to the queue, but only one service will react to them.
  2. At the time of writing, when using an SQS queue as a trigger to a Lambda function, you can only use standard (non-FIFO) queues. That makes sense, since you want to allow Lambdas to execute concurrently for different queue messages. If you do need to process messages in order, have a look at Kinesis instead.
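As a rough sketch of what the receiving end looks like, here is a handler for a batch of SQS messages. The `Records`, `body` and `messageId` fields are the shape of the event Lambda delivers; the `process` function and the `user_id` field are hypothetical. Returning `batchItemFailures` only has an effect if partial batch responses (`ReportBatchItemFailures`) are enabled on the trigger, which is assumed here.

```python
import json


def process(message):
    """Hypothetical business logic: reject messages without a user_id."""
    if "user_id" not in message:
        raise ValueError("missing user_id")
    return message["user_id"]


def handler(event, context):
    """Handle a batch of SQS messages delivered by the Lambda trigger."""
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except (ValueError, KeyError):
            # Report only this message as failed, so the rest of the
            # batch is not retried along with it.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Messages in a batch may arrive in any order and more than once on a standard queue, so `process` should be safe to run twice on the same message.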

SNS

SNS (Simple Notification Service) is a managed pub/sub service. An SNS “entity” is called a topic. Each topic can have several subscribers (HTTP endpoints, Lambda functions or SQS queues).

A typical use case for SNS is a service broadcasting an event to the rest of the system. Let’s say I have several microservices that need to react to the registration of a new user. I can have my registration service publish a message whenever a user is registered, and have all the other components subscribe to this topic (either via a Lambda trigger, an SQS message or an HTTP hook). Classic pub/sub.
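One of those subscribers could look like the sketch below: a Lambda function that reacts to the "user registered" message. SNS wraps the published payload in an envelope under `Records[].Sns.Message`; the `email` field and the welcome-email behavior are assumptions for this example.

```python
import json


def handler(event, context):
    """React to 'user registered' notifications published to an SNS topic."""
    welcomed = []
    for record in event["Records"]:
        # The published payload arrives as a string inside the SNS envelope.
        message = json.loads(record["Sns"]["Message"])
        welcomed.append(f"welcome email queued for {message['email']}")
    return welcomed
```

Every other subscriber gets its own copy of the same message, so the registration service never needs to know who is listening.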

Here are some things to keep in mind when using SNS:

  1. There are two architectural setups for using SNS: one-to-many and many-to-many. Either the sending end owns the topic, or the topic is a service of its own that has no owner. If your topic has only one subscriber (and always will, not just because you have not developed that part of the system yet) and is owned by that subscriber, maybe SNS isn’t the right choice for this use case. Consider using an SQS queue or Kinesis.
  2. SNS can get quite pricey at scale, compared to the other messaging services. In some cases, consider using Kinesis instead.

Kinesis

Kinesis is a fully managed stream. It allows processing of data records in order, at a very high scale. To enable parallel data processing, each stream is made of several “shards”, and order is guaranteed only within a shard. You can guarantee that the same shard will handle two different messages by using the same partition key when inserting them into the stream.
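The reason a shared partition key lands on the same shard is that Kinesis hashes the key with MD5 into a 128-bit number and routes the record to the shard whose hash range contains it. The sketch below imitates that routing, assuming the hash space is split evenly across the shards; the function name is made up, and a real stream may have uneven ranges after resharding.

```python
import hashlib


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard that would receive this partition key.

    Assumes the 128-bit hash space is split evenly across shard_count shards.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_value = int(digest, 16)  # 0 <= hash_value < 2**128
    return hash_value * shard_count // 2**128
```

Calling `shard_for_key("user-42", 4)` twice always returns the same index, which is why all events for one user stay in order when the user ID is the partition key.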

Kinesis is a family of services which includes the standard Kinesis stream, Kinesis Video Streams, and Kinesis Firehose, which is a service used for data aggregation.

You can read data from Kinesis using the KCL (Kinesis Client Library) or trigger a Lambda with it. Here are some notes on integrating Kinesis into your serverless system:

  1. A Kinesis stream can support multiple consumers and replicate the data to all of them (without affecting each other). Useful!
  2. A Kinesis stream can have many possible different use cases (architecturally speaking). Since its purpose is to act as a stream, it sometimes makes sense to use it as an input to a service, sometimes as an output, and sometimes as a completely independent service.
  3. Kinesis is built for scale, and its pricing scales well. Don’t be scared to use it at the busiest parts of your system.
  4. Kinesis does not support auto-scaling yet. That means you have to provision the number of shards in advance. There are some solutions to this problem, but it is something to take into account when using Kinesis.
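Since shards have to be provisioned up front, a back-of-the-envelope estimate helps. The sketch below assumes the per-shard limits as I understand them from the AWS documentation (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads); treat those numbers as assumptions and check the current limits before relying on them.

```python
import math


def shards_needed(write_mb_per_s: float, records_per_s: float, read_mb_per_s: float) -> int:
    """Estimate how many shards a stream needs for a given peak load."""
    # Assumed per-shard limits; verify against the current documentation.
    write_limit_mb = 1.0
    write_limit_records = 1000.0
    read_limit_mb = 2.0
    return max(
        1,
        math.ceil(write_mb_per_s / write_limit_mb),
        math.ceil(records_per_s / write_limit_records),
        math.ceil(read_mb_per_s / read_limit_mb),
    )
```

For example, a peak of 3.5 MB/s written as 2,500 records/s and 7 MB/s read gives `shards_needed(3.5, 2500, 7) == 4`, because the write bandwidth and the read bandwidth each require four shards.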

Step Functions

Since Lambda functions are stateless, managing state can sometimes be hard. But fear not, for Step Functions is here to the rescue! Step Functions is an AWS service which allows you to manage state as code. It is an orchestration service that lets you model workflows as state machines. You can find some more details in this excellent post by Yan Cui (whose name I ripped off).

Note that AWS announced support for more service integrations for Step Functions at re:Invent 2018, such as DynamoDB, Fargate, SNS and SQS, so you can use Step Functions to orchestrate a lot more than just Lambda. The service is a bit pricey, so there are some use cases where it might be overkill, but for the most part, it is a pretty awesome tool at your disposal.
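To give a feel for "workflows as state machines", here is a small sketch of a definition in the Amazon States Language, written as a Python dictionary that serializes to the JSON Step Functions expects. The workflow, the state names and the ARNs are hypothetical placeholders; `dangling_targets` is a helper I made up to catch transitions that point at undefined states.

```python
import json

# Hypothetical workflow: validate an order, then either charge it or reject it.
# The ARNs are placeholders, not real resources.
definition = {
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
            "Next": "IsValid",
        },
        "IsValid": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.valid", "BooleanEquals": True, "Next": "ChargeCustomer"}
            ],
            "Default": "RejectOrder",
        },
        "ChargeCustomer": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-customer",
            "End": True,
        },
        "RejectOrder": {"Type": "Fail", "Error": "InvalidOrder"},
    },
}


def dangling_targets(machine):
    """Return transition targets that do not name a defined state."""
    states = machine["States"]
    targets = [machine["StartAt"]]
    for state in states.values():
        targets += [state[key] for key in ("Next", "Default") if key in state]
        targets += [choice["Next"] for choice in state.get("Choices", [])]
    return sorted(set(targets) - set(states))


if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```

The branching lives in the definition rather than in any function, so each Lambda stays stateless and single-purpose.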

S3

S3 (Simple Storage Service) is probably the most popular object store service in the world. While it has many possible use cases (e.g. backups, static websites) serverless systems reveal its full potential.

When building a serverless microservice, you usually require some database. In some cases, S3 will be a great fit: it is easy to use, highly available, durable, and massively cheap. It integrates with Lambda, and you can configure it as one of Lambda’s event sources. Also, when combined with other serverless services (like Athena or Glue) it can be quite powerful, despite its simplicity.

However, when choosing S3 as a database for your service, make sure it’s a good fit first: S3 is an object store. It does not have database features like locking mechanisms and transactions, which can be an issue for services which require parallel access to the DB (a common pattern with Lambda), so make sure this is not the case (now or in the foreseeable future for the service). If you don’t have parallel writes, or they don’t interfere with your service, congratulations: S3 might be a great fit.
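To see why the lack of locking matters, consider two function invocations that each read a counter object, increment it, and write it back. The sketch below uses an in-memory stand-in for an object store (the class and its methods are invented for this illustration, not the S3 API) and replays the interleaving by hand.

```python
class FakeObjectStore:
    """In-memory stand-in for an object store: whole-object get and put, no locks."""

    def __init__(self):
        self._objects = {}

    def get(self, key):
        return self._objects.get(key, 0)

    def put(self, key, value):
        self._objects[key] = value


def lost_update_demo():
    store = FakeObjectStore()
    store.put("counter", 10)

    # Two concurrent invocations both read the same version...
    seen_by_a = store.get("counter")
    seen_by_b = store.get("counter")

    # ...and each writes back its own increment, unaware of the other.
    store.put("counter", seen_by_a + 1)
    store.put("counter", seen_by_b + 1)

    return store.get("counter")
```

Here `lost_update_demo()` returns 11 rather than 12: the second write silently overwrites the first. If your service can hit this pattern, a database with conditional writes or transactions is the safer choice.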

DynamoDB

DynamoDB is a serverless key-value and document database. It is a popular database for building serverless applications on AWS. When I say DynamoDB is serverless, it entails a few features:

  1. To get started you only have to create a table — no need to provision a server.
  2. DynamoDB’s API is over HTTP endpoints — there are no DB connections to worry about when working with it, which is excellent when using Lambda since it can have thousands of parallel executions.
  3. Since AWS’s latest announcement, you can use DynamoDB on a pay-per-use model (you can still use provisioned capacity, which is sometimes a bit cheaper).

DynamoDB is a pretty flexible database and works best with a single purpose service. If you build a monolith with DynamoDB and use it as an all-purpose database, you are gonna have a bad time.

I am not saying this because it’s impossible, or because it will be harder than with other databases (it might, but that’s not the point). It is because DynamoDB is easiest to use when you keep your data as simple as possible. Adding many indexes for lots of different purposes (each of which probably needs only a small piece of the data) is an anti-pattern. Additional indexes are valid for some cases, but use them wisely.

DynamoDB has recently announced transaction support, which was the last piece missing for making it the ultimate serverless database. However, there are some cases where DynamoDB is not the best fit: when you have to perform complex searches against your data, or when you hold raw analytics or time-series data, you might find DynamoDB hard to use. For these cases, use one of the other serverless databases.
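As a sketch of what "as simple as possible" can look like, here is a table definition with a single partition key, no secondary indexes, and on-demand billing. The dictionary is shaped like the parameters that boto3's `create_table` accepts; the table name and the `user_id` attribute are hypothetical.

```python
def table_definition(table_name: str) -> dict:
    """Build create-table parameters for a single-purpose table keyed by user_id."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "S"}
        ],
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"}
        ],
        # On-demand (pay-per-use) billing, so no capacity is provisioned.
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With boto3 installed and credentials configured, this could be passed as `boto3.client("dynamodb").create_table(**table_definition("users"))`. Only key attributes are declared up front; everything else on an item is schemaless.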

Other Serverless Databases

AWS recently announced several serverless databases other than DynamoDB. Some are still in preview but will be released this year. I will not elaborate on each of them independently (since DynamoDB is a good fit for most use cases, at least the basic ones), but that does not mean you should not use them. It’s the other way around — you should know all of them well, and use the one that best fits your use case! Always try to have the right tool for the right job.

Serverless databases on AWS: choose from key-value (DynamoDB), graph (Neptune), time-series (Timestream) or ledger (QLDB).

API Gateway

API Gateway is the gateway to your application. It lets you manage your API easily, and integrates with many compute services that can handle the requests (Lambda is one of them!). It is REST-based (meaning you can use the different HTTP verbs with it), and has features that give you great control over your API, such as setting rate limits or using different authorizers, thus separating the authentication logic from your main business logic.

While it is a handy service, note that it adds some latency to your requests (compared to invoking a Lambda directly via the SDK, for example), and it is not very cheap (not too expensive either, but somewhere in the middle). API Gateway is an excellent choice for user-facing REST APIs. For internal synchronous invocations (between microservices), it depends on the use case. In those cases, don’t just use API Gateway as a default: check what the benefits are over, for example, invoking the other service’s Lambda directly. Sometimes there will be some; sometimes there won’t.
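For the user-facing case, a Lambda behind an API Gateway proxy integration receives the HTTP request as an event and returns a dictionary with `statusCode`, `headers` and `body`. The sketch below serves a hypothetical `GET /users/{id}` route; the `USERS` data and the route itself are assumptions for illustration.

```python
import json

USERS = {"42": {"id": "42", "name": "Arthur"}}  # hypothetical in-memory data


def handler(event, context):
    """Serve GET /users/{id} behind an API Gateway Lambda proxy integration."""
    if event.get("httpMethod") != "GET":
        return {"statusCode": 405, "body": json.dumps({"error": "method not allowed"})}

    # pathParameters is None when the route has no path parameters.
    user_id = (event.get("pathParameters") or {}).get("id")
    user = USERS.get(user_id)
    if user is None:
        return {"statusCode": 404, "body": json.dumps({"error": "user not found"})}

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(user),
    }
```

Note that `body` must be a string, which is why the response is serialized with `json.dumps` rather than returned as a dictionary.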

AppSync

This one is exciting. We already mentioned that you should use Lambda to transform, not transport. But (let’s say you are building a user-facing application) how should you send data to the user? Isn’t that precisely what Lambda + API Gateway does? The short version is yes, but you can do it better.

AppSync is a serverless backend for mobile, web, or any other API-consuming application. Unlike API Gateway, it exposes a GraphQL API for your service. To use AppSync you have to define your data schema. Then, you have to set your data source, from which AppSync reads the data. The default data source is DynamoDB, but there are many other options (you can even use it with API Gateway behind the scenes, for legacy APIs for example). In case you need to transform the data before you send it to the user, you can always use a Lambda resolver for the request and have it process the data. Then let AppSync take care of the rest.

Writing APIs which use API Gateway + Lambda to serve user requests can take significant effort that you could spend somewhere else. I highly recommend using AppSync for that. However, it does require some basic knowledge of GraphQL. If you are at the beginning of your serverless transition, you will probably want to take it step by step and not introduce all these new technologies to your stack at once.
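A Lambda resolver can be as small as a dispatch table from GraphQL field names to functions. The sketch below assumes the event carries `info.fieldName` and `arguments`, which is the shape I understand a Lambda resolver receives when invoked directly; with a request mapping template, the payload is whatever the template builds. The `getPost` and `listPosts` fields and the `POSTS` data are hypothetical.

```python
POSTS = {"1": {"id": "1", "title": "Don't Panic"}}  # hypothetical data


def get_post(arguments):
    return POSTS.get(arguments["id"])


def list_posts(arguments):
    return list(POSTS.values())


# Maps a GraphQL field name to the function that resolves it.
RESOLVERS = {"getPost": get_post, "listPosts": list_posts}


def handler(event, context):
    """Dispatch a GraphQL field to the function that resolves it."""
    field = event["info"]["fieldName"]
    resolver = RESOLVERS.get(field)
    if resolver is None:
        raise ValueError(f"no resolver for field {field!r}")
    return resolver(event.get("arguments") or {})
```

The function returns plain data and leaves the GraphQL plumbing (parsing, field selection, response shaping) to AppSync.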

Athena (& Glue)

Love analytics? This one is just for you. Athena is a serverless tool built on Presto which allows you to analyze massive amounts of data stored in S3 quickly and cheaply, using standard SQL queries. The data can be stored in many formats, including CSV, JSON or Parquet. Another thing worth mentioning is that Athena is integrated with Glue, which is a serverless ETL service that is pretty cool on its own. You can use Athena queries on data sources you have in your Glue data catalog.

One thing to keep in mind when using Athena is that you pay not only for Athena (which is priced according to the terabytes of data scanned) but also for the S3 calls that Athena makes. Some things to keep an eye on are:

  1. Use partitions on your S3 data lake. When you execute an Athena query you can specify the partitions you want to search in the WHERE clause of the SQL statement. This way you will scan only the data you need.
  2. Mind that when using Athena, you are paying for the S3 API calls as well. This can get quite expensive if your data lake is made of many small S3 files, and will also hurt performance.
  3. If you need some heavier lifting than SQL queries on your data, you can use Glue’s custom scripts to run a serverless Spark job! Pretty cool.

Here are some more tips for working with Athena, by Manjeet Chayel and Mert Hocanin.
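The effect of partitioning on cost is simple arithmetic. The sketch below builds a Hive-style `year=/month=/day=` key prefix, a common layout for partitioned data in S3, and estimates the cost of a scan. The price per terabyte is an assumed default and the terabyte is taken as 1024^4 bytes; check the current Athena pricing for your region.

```python
from datetime import date


def partition_prefix(day: date) -> str:
    """Hive-style S3 key prefix for one day of data."""
    return f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"


def scan_cost_usd(bytes_scanned: int, usd_per_tb: float = 5.0) -> float:
    """Estimated query cost; the default price per terabyte is an assumption."""
    return bytes_scanned / 1024**4 * usd_per_tb
```

If a year of data totals one terabyte and is spread evenly over daily partitions, a query restricted to one day in its WHERE clause scans roughly 1/365 of the data, and costs about 1/365 of the full scan.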

OK, Now What?

This is just the tip of the iceberg. While writing this post I had to leave so much out, and it is still pretty packed. Services like AWS Batch, CloudFront, Route 53, and the different IoT and machine learning services are just a few others I left out. Whenever you feel like you need to implement something in your application, you should check if a service for that already exists. But now you are at least familiar with the most basic and common services you will use on your serverless journey.

Also, these are just the basic building blocks, and now it’s up to you to start using them. One thing you should note is that the architecture of serverless applications is pretty different, and learning to do it right takes a while. Here is a serverless transformation example (again, by Yan Cui) and a good read about serverless design patterns by Jeremy Daly to give you a sense of what a serverless application might look like. You can also use tools like AWS’s new Well-Architected Tool to assist you along the way. Also, be sure to check out The 5 Best Use Cases for the Serverless Beginner.

As with any new technology, going serverless takes time and effort. But the good news is you are not alone! The serverless community is an ever-growing one and is really inclusive. Join the serverless forum on Slack or visit ServerlessDays and you will surely find people who will help you take your first steps.

So long, and thanks for all the functions!