This post was originally published on Medium by Luther.ai’s Lead Engineer.

At Luther.ai, we run all of our core real-time pipelines on the AWS serverless stack and Kubernetes, with data-driven execution across AWS services: ECS, Lambda, SQS, Fargate, etc.

With hundreds of services and thousands of invocations each day, configuring, monitoring, reviewing logs, and measuring latency all become significantly complex. For configuration and CI/CD, we use the Serverless Framework to package and deploy AWS Lambda functions.

We tried multiple monitoring solutions, including just leveraging AWS-native options. However, the scale brought various issues including:

  1. Multiple programming languages being used for AWS Lambda development.

Along with the specifics above, the development team wanted to focus on the serverless functions themselves rather than on growing their monitoring footprint, which caused many worries for the on-call DevOps engineers.

After many reviews of various services to help us solve the issues listed above, Epsagon was the solution we decided to implement.

Below is the journey of how we saved hours and hours on our serverless implementation. Let us break it into installation, monitoring, latency measurement, and notifications:

Installation / Onboarding: If you develop serverless applications on AWS and are not using Lambda layers, you’re missing a core feature that helps a lot. Auto-tracing for the Lambda functions in your AWS region is enabled with a simple workflow that leverages Lambda layers. This removes the per-language development dependencies otherwise needed to enable monitoring (including custom logic per serverless function).
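To make the layer mechanism concrete, here is a minimal Python sketch of attaching a tracing layer to an existing function. It is not Epsagon’s onboarding workflow, which is handled by the tool and may do more than this. It only shows the generic AWS step of adding a layer while keeping the layers a function already has. The layer ARN and the helper names (`layer_name`, `merge_layers`, `attach_layer`) are hypothetical placeholders. The `client` argument is assumed to be a boto3 Lambda client.

```python
from typing import Iterable, List

# Hypothetical placeholder ARN: substitute the tracing layer published for your region.
TRACING_LAYER_ARN = (
    "arn:aws:lambda:us-east-1:123456789012:layer:example-tracing-layer:1"
)


def layer_name(arn: str) -> str:
    """Return a layer version ARN without its trailing version number."""
    return arn.rsplit(":", 1)[0]


def merge_layers(existing: Iterable[str], new_layer: str) -> List[str]:
    """Keep existing layers, drop any other version of new_layer, append new_layer."""
    kept = [arn for arn in existing if layer_name(arn) != layer_name(new_layer)]
    return kept + [new_layer]


def attach_layer(client, function_name: str, new_layer: str) -> List[str]:
    """Attach new_layer to one function using a boto3 Lambda client."""
    config = client.get_function_configuration(FunctionName=function_name)
    current = [layer["Arn"] for layer in config.get("Layers", [])]
    updated = merge_layers(current, new_layer)
    client.update_function_configuration(
        FunctionName=function_name, Layers=updated
    )
    return updated


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   attach_layer(boto3.client("lambda"), "my-function", TRACING_LAYER_ARN)
```

Because the layer is attached through the function configuration, the same step applies regardless of the language the function is written in.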

Monitoring: Once you have auto-tracing enabled, proactive monitoring will help you with alerts and notifications. We use the native Slack and email integrations to receive notifications. Each notification carries a contextual link to the alert in the CloudWatch log, the start time of the Lambda execution, and a service map of all the services involved (external API calls, inbound and outbound AWS services).

If you’re interested in patterns like I am, you can use the historical view (available for the last 7 days) to understand scenarios such as issues with the last deployment, a specific user action, and/or the scalability issues they caused (I have an interesting issue that we uncovered using historical patterns, but that is for a future blog post).

Epsagon Lambda Tabular view — showing execution count and cost for the month.

Epsagon — Tabular view of all the lambda functions

Latency measurement: With hundreds of services and thousands of executions every day, even a couple of milliseconds of execution time added to one service in the real-time pipeline can result in a bad user experience. With Epsagon, latency issues can be isolated efficiently along multiple facets:

  1. As in the picture above, a unified view across all the functions is available with the average execution duration, which is a great start.

Epsagon Service View example: showing multiple services interacting.

Epsagon Service Map View example
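As a rough illustration of why a per-function average execution duration is a useful starting point, the sketch below computes that average from a list of invocation records. It then flags functions whose average grew between two periods. This is not how Epsagon computes its view. The function names `average_duration_ms` and `regressions`, the record shape, and the 2 ms threshold are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def average_duration_ms(
    invocations: Iterable[Tuple[str, float]]
) -> Dict[str, float]:
    """Average duration per function from (function_name, duration_ms) records."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for name, duration_ms in invocations:
        totals[name] += duration_ms
        counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


def regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    threshold_ms: float = 2.0,  # assumed threshold, tune for your pipeline
) -> List[str]:
    """Functions whose average duration grew by more than threshold_ms."""
    return sorted(
        name
        for name in current
        if name in baseline and current[name] - baseline[name] > threshold_ms
    )


# Example with made-up numbers for two hypothetical functions.
last_week = average_duration_ms([("ingest", 10.0), ("ingest", 20.0), ("index", 5.0)])
this_week = average_duration_ms([("ingest", 18.0), ("index", 5.5)])
slower = regressions(last_week, this_week)
```

In this sketch, a function only counts as a regression if it appears in both periods, so newly deployed functions are not flagged by default.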

Notifications: With all the integrated features, we configured extensible alerts, using the PagerDuty integration for core functions. The native Jira integration also lets us file a bug for each issue from the tool itself. The contextual information captured in each bug is a key feature: no more worrying about log capture, issue timing, etc.

Did I say that we self-configured everything from start to finish in a weekend? Yes, we did, for both our dev and prod environments. Here are the resources that have proven handy:

To conclude, serverless deployments and monitoring of workloads with hundreds of services and millions of invocations are no longer a “needle in a haystack” with Epsagon.

