The term “serverless” is easily misunderstood. A serverless application still runs on servers; you just don’t manage them yourself. When you’re developing an application with the serverless approach in mind, you have two options for the database: host it yourself or use a serverless database. The first route means maintaining the servers (on-premises or in the cloud) and taking care of securing the database, backing it up, scaling it, and so on. If you take the serverless route, the provider handles all of this for you.
More and more organizations are adopting serverless databases these days, and it often makes sense to go with a managed database instead of taking on the chore of maintaining one yourself. But how do you decide whether to host it yourself or go serverless? There are a few things you need to look at before opting for a serverless database.
Considerations when Choosing a DB
The very first thing to consider when choosing a database is what kind of data modeling you require. Two families of databases are well known here: SQL and NoSQL. SQL databases are relational and are typically used in applications that deal with transactions or that require complex joins across tables. NoSQL databases are used when there’s no fixed schema or when high ingestion rates are required. Note that not all NoSQL databases support joins.
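To make the modeling difference concrete, here is a minimal sketch that uses Python’s built-in sqlite3 module for the relational side and plain dictionaries standing in for a schemaless document store. The table names, fields, and sample rows are hypothetical.

```python
import sqlite3

# Relational model: a fixed schema, with related data split across tables
# and combined at query time with a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Document (NoSQL-style) model: no fixed schema, so items in the same
# collection can carry different attributes, and related data is often
# embedded in the item instead of being joined in.
documents = [
    {"id": 1, "name": "Ada",
     "orders": [{"id": 10, "total": 25.0}, {"id": 11, "total": 40.0}]},
    {"id": 2, "name": "Grace",
     "orders": [{"id": 12, "total": 15.5}], "loyalty_tier": "gold"},
]
doc_totals = {d["name"]: sum(o["total"] for o in d["orders"]) for d in documents}

print(rows)        # [('Ada', 65.0), ('Grace', 15.5)]
print(doc_totals)  # {'Ada': 65.0, 'Grace': 15.5}
```

Both produce the same totals; the difference is whether the relationships live in the schema and are resolved by a join, or are baked into each item’s shape.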
Managed databases come in two flavors: serverless and server-full. Amazon Relational Database Service (RDS), for example, is server-full: AWS still deploys an RDS database instance on a virtual machine (VM) in the cloud. If you describe your database in terms of the number of instances to deploy, you’re dealing with a server-full managed database.
The difference between this and the traditional approach of hosting a database on-premises is that the servers are in the cloud, so you never need physical access to them. Also, scaling a server-full managed database is just a configuration change in a console, not a week-long activity.
Amazon DynamoDB, on the other hand, is serverless. There are no VMs or instances for you to manage, so the database is always up and running: you don’t need to add instances ahead of an expected surge in usage or cut them back once the holiday sales season is over. The managed service takes care of all of this by itself.
Serverless databases do autoscale, but you aren’t billed for every second an extra instance is running. With per-request pricing (such as DynamoDB’s on-demand mode), you pay for the requests made to the database. If an hour goes by in which the database isn’t accessed at all, you pay nothing for requests during that hour, although stored data is still billed. This can be a big pricing advantage, and combined with millisecond latency, it makes serverless databases a strong fit for high-traffic applications with variable load.
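A quick back-of-the-envelope calculation shows why per-request billing favors spiky or mostly idle workloads. The prices below are hypothetical placeholders, not actual AWS rates, and the sketch ignores storage, which is typically billed separately either way.

```python
# Hypothetical prices for illustration only; check your provider's pricing page.
PRICE_PER_MILLION_REQUESTS = 1.25  # per-request (serverless) billing
PRICE_PER_INSTANCE_HOUR = 0.20     # per-instance (server-full) billing


def per_request_cost(requests_per_hour: list[int]) -> float:
    """Cost when you pay only for the requests actually made."""
    return sum(requests_per_hour) / 1_000_000 * PRICE_PER_MILLION_REQUESTS


def per_instance_cost(instances_per_hour: list[int]) -> float:
    """Cost when you pay for every instance-hour, busy or idle."""
    return sum(instances_per_hour) * PRICE_PER_INSTANCE_HOUR


# A day with a short evening spike and many idle hours.
traffic = [0] * 20 + [500_000, 1_000_000, 500_000, 0]
instances = [2] * 24  # capacity kept running all day to absorb the spike

print(per_request_cost(traffic))  # 2.5
print(round(per_instance_cost(instances), 2))  # 9.6
```

The crossover depends on volume: at sustained high request rates, always-on capacity can end up cheaper, so it’s worth running the numbers against your own traffic profile.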
When you host a database yourself, you have complete control over the servers and the services running on them. That’s not the case for a serverless database. Monitoring a serverless application is harder because you can’t collect any metric you want: each serverless database and infrastructure provider exposes a fixed set of metrics that you can collect and monitor.
Not having enough metrics to monitor the performance of a database could prove fatal when there is an unexpected (or even expected) rise in traffic. You need to have access to the right metrics to predict when to scale up or down, figure out what’s causing bottlenecks in queries leading to bad user experiences, etc. Therefore, before choosing a database, it’s important to know what metrics a database provides for performance monitoring.
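As a simple illustration of using metrics to decide when to scale, the sketch below averages recent utilization samples and compares the result against thresholds. The function name, the 80%/30% thresholds, and the sample values are hypothetical; real autoscaling policies usually add cooldowns and look at several metrics at once.

```python
from statistics import mean

SCALE_UP_THRESHOLD = 0.80    # hypothetical: average utilization above 80%
SCALE_DOWN_THRESHOLD = 0.30  # hypothetical: average utilization below 30%


def scaling_decision(utilization_samples: list[float], window: int = 5) -> str:
    """Return 'scale_up', 'scale_down', or 'hold' based on recent utilization.

    Samples are fractions (0.0 to 1.0), oldest first. Only the last `window`
    samples are averaged, which smooths out a single brief spike.
    """
    if not utilization_samples:
        return "hold"  # no data, no action
    recent = mean(utilization_samples[-window:])
    if recent > SCALE_UP_THRESHOLD:
        return "scale_up"
    if recent < SCALE_DOWN_THRESHOLD:
        return "scale_down"
    return "hold"


print(scaling_decision([0.5, 0.85, 0.9, 0.88, 0.92, 0.95]))  # scale_up
print(scaling_decision([0.2, 0.1, 0.25, 0.15, 0.2]))         # scale_down
```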
Monitoring a database will tell you that something is wrong, but to fix the problem you need access to tools and services for troubleshooting the database. Troubleshooting options vary greatly from database to database and depend on whether it’s a managed or a self-hosted database.
For example, suppose you host a MongoDB cluster on your on-premises servers or on a few EC2 instances in the cloud. If something goes wrong, you can look at the logs, restart the database services, and so on. But if something similar happens to a DynamoDB table, how do you troubleshoot it? Often the best you can do is raise a support ticket with Amazon or get on a call with their support team.
This does not apply only to fully managed vs. self-hosted databases. Some self-hosted databases may also not provide advanced monitoring and troubleshooting tools. So make sure to do your research before deciding on a database for your application.
Performance Metrics Comparison
The following are a few serverless and server-full databases on the market today and some of the performance metrics they provide.
Amazon DynamoDB is a popular managed database that you can monitor via Amazon CloudWatch and other tools, but the metrics you can monitor are fairly limited. You’ll also notice that the metrics below tell you little about the performance of DynamoDB itself; instead, they’re mostly about your AWS account and usage limits. This is not a complete list, but it should give you an idea of what kind of metrics are available:
Note that because DynamoDB is fully managed and Amazon backs it with a service level agreement, you get few metrics about the internal performance of the database.
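For reference, DynamoDB metrics live in CloudWatch under the `AWS/DynamoDB` namespace, and per-table metrics use a `TableName` dimension. The sketch below builds the keyword arguments for boto3’s CloudWatch `get_metric_statistics` call. The table name is hypothetical, and the actual API call sits in a separate function because it needs the third-party boto3 package and configured AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def dynamodb_metric_request(table_name: str, metric_name: str,
                            hours: int = 1, period_seconds: int = 300) -> dict:
    """Build kwargs for CloudWatch GetMetricStatistics for one DynamoDB table metric."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": metric_name,  # e.g. "ConsumedReadCapacityUnits"
        "Dimensions": [{"Name": "TableName", "Value": table_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,   # one datapoint per 5 minutes by default
        "Statistics": ["Sum"],
    }


def fetch_datapoints(request: dict) -> list:
    """Run the query; requires `pip install boto3` and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**request)
    return sorted(response["Datapoints"], key=lambda p: p["Timestamp"])


# "orders-table" is a hypothetical table name.
request = dynamodb_metric_request("orders-table", "ConsumedReadCapacityUnits")
# fetch_datapoints(request) would return the per-period Sum values.
```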
Monitoring Amazon Athena is similar to monitoring DynamoDB: because it’s a managed service, you don’t get many metrics that tell you about database performance. Below is the complete list of metrics that Athena sends to CloudWatch for monitoring (API activity is recorded separately in AWS CloudTrail):
Amazon Redshift is a fully managed data warehousing service from Amazon. The difference is that Redshift, in its provisioned form, is a server-full database, whereas Athena and DynamoDB are serverless. This means you have to provision a cluster before you can use Redshift, and in exchange you can monitor the cluster’s performance down to the query level. Below is a short list of metrics that Redshift provides for monitoring:
Note how the metrics available for Redshift differ from those for DynamoDB and Athena: the focus is on cluster performance rather than your AWS account or billing.
MongoDB is available as both a fully managed service and a self-hosted database. There are many ways to monitor MongoDB, including some nice built-in monitoring tools with recent versions. Below are some metrics that are available for monitoring a MongoDB installation:
These metrics are not as detailed as those some other databases offer, but they give you enough information to detect bottlenecks in your system.
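Many of these MongoDB numbers come from the `serverStatus` command. As a minimal sketch, assuming you have already captured two `serverStatus` results (for example with pymongo’s `client.admin.command("serverStatus")`), you can turn the cumulative `opcounters` into per-second rates. The function name and snapshot values are hypothetical sample data.

```python
def opcounter_rates(before: dict, after: dict, interval_seconds: float) -> dict:
    """Per-second operation rates from two serverStatus snapshots.

    serverStatus["opcounters"] values are cumulative counts since the mongod
    process started, so the rate is the difference divided by the interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    ops_before, ops_after = before["opcounters"], after["opcounters"]
    return {
        op: (ops_after[op] - ops_before[op]) / interval_seconds
        for op in ("insert", "query", "update", "delete", "getmore", "command")
    }


# Hypothetical snapshots taken 10 seconds apart, trimmed to the fields used here.
snap1 = {"opcounters": {"insert": 1000, "query": 5000, "update": 200,
                        "delete": 10, "getmore": 50, "command": 8000},
         "connections": {"current": 40, "available": 800}}
snap2 = {"opcounters": {"insert": 1150, "query": 5900, "update": 260,
                        "delete": 10, "getmore": 70, "command": 8500},
         "connections": {"current": 42, "available": 798}}

rates = opcounter_rates(snap1, snap2, interval_seconds=10)
print(rates["query"], snap2["connections"]["current"])  # 90.0 42
```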
PostgreSQL is another popular database that can be hosted on-premises, and you have a lot of tools available for monitoring its performance. Below are some of the metrics PostgreSQL provides:
- tup_returned vs. tup_fetched
- Rows inserted, updated, and deleted per database and table
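Together, these can point to query performance issues, at least for reads that don’t use proper indexes: when tup_returned is much larger than tup_fetched, queries are reading far more rows (for example through sequential scans) than they fetch via index scans.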
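In PostgreSQL these counters come from the `pg_stat_database` view (per-table equivalents live in `pg_stat_user_tables`). The sketch below shows the query and a helper that flags databases where far more rows are read than fetched through indexes. The helper name, the 10x threshold, and the sample rows are hypothetical; running the query itself needs a PostgreSQL driver such as psycopg.

```python
# Cumulative per-database counters; run with any PostgreSQL driver.
PG_STAT_QUERY = """
SELECT datname, tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted
FROM pg_stat_database
WHERE datname IS NOT NULL;
"""


def flag_scan_heavy(rows: list[tuple], ratio_threshold: float = 10.0) -> list[str]:
    """Return database names whose tup_returned / tup_fetched ratio exceeds the threshold.

    A high ratio means many rows are read relative to rows fetched through
    index scans, which can point to missing or unused indexes.
    """
    flagged = []
    for datname, returned, fetched, *_ in rows:
        ratio = returned / fetched if fetched else float("inf")
        if returned and ratio > ratio_threshold:
            flagged.append(datname)
    return flagged


# Hypothetical result rows, in the column order of PG_STAT_QUERY.
sample_rows = [
    ("app", 5_000_000, 4_000_000, 10_000, 2_000, 100),
    ("reporting", 90_000_000, 1_000_000, 0, 0, 0),
]
print(flag_scan_heavy(sample_rows))  # ['reporting']
```

Because the counters are cumulative, comparing the difference between two snapshots usually tells you more than the raw totals.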
Monitoring Serverless Infrastructure with Epsagon
No matter what kind of database you choose—SQL or NoSQL, fully managed or self-hosted—monitoring your serverless infrastructure is a challenge because of the distributed nature of the architecture and the often asynchronous communication between services. Dedicated monitoring tools can add a lot of value in such systems. For example, with Epsagon’s auto-generated service maps, you can visualize the flow of data and the various communication channels present between your applications and databases, as seen in Figure 1 below.
Epsagon can also collect logs from Amazon CloudWatch and correlate them with other collected data, such as HTTP request payloads, which makes debugging or monitoring an AWS Lambda function much simpler. In a completely serverless system, digging through logs to debug one particular HTTP request that returned a non-200 response is not easy; Epsagon’s logging and data correlation make it far easier.
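The underlying idea of correlating logs with request data can be sketched generically. This is not Epsagon’s API; it’s a hypothetical illustration that groups structured log records by request ID and keeps every log line belonging to requests that ended with a non-200 status. The record fields (`request_id`, `message`, `status`) are assumptions.

```python
from collections import defaultdict


def failed_request_logs(records: list[dict]) -> dict[str, list[str]]:
    """Group log messages by request ID and keep only requests with a non-200 status.

    Each record is assumed to carry 'request_id' and 'message'; a record that
    also carries 'status' (e.g. the response log line) marks the outcome.
    """
    messages = defaultdict(list)
    statuses = {}
    for record in records:
        rid = record["request_id"]
        messages[rid].append(record["message"])
        if "status" in record:
            statuses[rid] = record["status"]
    return {rid: messages[rid] for rid, status in statuses.items() if status != 200}


# Hypothetical interleaved log records from two concurrent requests.
logs = [
    {"request_id": "a1", "message": "GET /orders/42"},
    {"request_id": "b2", "message": "GET /orders/7"},
    {"request_id": "a1", "message": "DynamoDB GetItem throttled"},
    {"request_id": "b2", "message": "response sent", "status": 200},
    {"request_id": "a1", "message": "response sent", "status": 500},
]
print(failed_request_logs(logs))
# {'a1': ['GET /orders/42', 'DynamoDB GetItem throttled', 'response sent']}
```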
By alerting you when something goes wrong, or is about to, Epsagon helps you get on top of a problem before it affects your end users. You can also customize alerts so you aren’t bombarded unnecessarily.
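One common way to avoid alert fatigue is to fire only after several consecutive threshold breaches and then stay quiet for a cooldown period. The sketch below is a generic illustration of that idea, not how Epsagon configures alerts; the class name, thresholds, and timings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThresholdAlert:
    """Fire when a metric exceeds `threshold` for `consecutive` samples in a row,
    then suppress repeat alerts for `cooldown_seconds`."""
    threshold: float
    consecutive: int = 3
    cooldown_seconds: float = 600.0
    _breaches: int = field(default=0, init=False, repr=False)
    _last_fired: Optional[float] = field(default=None, init=False, repr=False)

    def observe(self, value: float, now: float) -> bool:
        """Record one sample taken at time `now` (seconds); return True if an alert fires."""
        self._breaches = self._breaches + 1 if value > self.threshold else 0
        in_cooldown = (self._last_fired is not None
                       and now - self._last_fired < self.cooldown_seconds)
        if self._breaches >= self.consecutive and not in_cooldown:
            self._last_fired = now
            self._breaches = 0
            return True
        return False


# Hypothetical error-rate samples, one per minute.
alert = ThresholdAlert(threshold=0.05)
samples = [0.01, 0.08, 0.09, 0.07, 0.10, 0.12, 0.11, 0.02]
fired = [alert.observe(v, now=60.0 * i) for i, v in enumerate(samples)]
print(fired)  # [False, False, False, True, False, False, False, False]
```

The second run of three breaches stays silent because it falls inside the 10-minute cooldown.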
If you’re using AWS Lambda functions in your infrastructure, you can integrate Epsagon as a Lambda layer to gather information about your apps, requests, logs, etc., correlate everything in one place, and generate a service map. Getting started is easy: you just add a dependency to your project.
There are many differences between fully managed and self-hosted databases, and choosing one over the other is rarely a straightforward decision. Along with performance differences (including scaling), it’s important to understand the options each database provides for monitoring and troubleshooting. This becomes particularly important when you are leaning toward self-hosting a database on-premises. Comparing the performance metrics each database exposes helps you judge how well you’ll be able to monitor and troubleshoot it, and therefore how quickly you can fix issues.