Function-as-a-Service offerings such as AWS Lambda have played an important role in the serverless revolution. The promise of deploying code in a scalable manner, paying only for what you use, and without the need for any infrastructure management is simply too attractive to ignore.

AWS is capable of picking up the standard output of most of its managed services and automatically storing it in CloudWatch, making it very easy for application code to write logs in a uniform manner, regardless of whether it resides in ECS, AWS Lambda, or some other managed service.

Unfortunately, while CloudWatch is great for ingesting large amounts of application logs in AWS, it’s not particularly friendly for locating and analyzing specific log data. Instead, for this purpose, you have Elasticsearch: a technology that has become a household name in recent years for managing and analyzing huge amounts of data, especially log data.

How Log Streaming Works

Figure 1: A Lambda’s logs are streamed from CloudWatch to Elasticsearch via a dedicated Lambda function.

By setting up a streaming subscription, you can stream logs from CloudWatch to an AWS Elasticsearch Service cluster. In this setup, application code (in this case a Lambda function) first writes its logs to standard output, which is picked up automatically and stored in CloudWatch. Once a streaming subscription is in place, a special Lambda function fires, reading logs from CloudWatch and synchronizing them into Elasticsearch as they arrive.
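When the subscription fires, CloudWatch does not hand the streaming Lambda plain text: the event carries a base64-encoded, gzip-compressed JSON payload containing a batch of log events. A minimal sketch of the decoding step (the payload shape shown follows the CloudWatch Logs subscription format):

```python
import base64
import gzip
import json

def decode_cloudwatch_event(event):
    """Decode the payload CloudWatch delivers to a streaming Lambda:
    base64-encoded, gzip-compressed JSON containing the log group,
    log stream, and a batch of log events."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))
```

Each entry in the decoded payload's logEvents list carries an id, a timestamp, and the original log message, which the streaming Lambda turns into Elasticsearch documents.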

The Lambda function used to stream logs from CloudWatch to Elasticsearch is automatically generated when a streaming subscription is set up. However, just like with any other AWS Lambda function, you can customize it as needed, which is particularly useful, for example, when you want to add custom fields or write to indices that have a particular name. You can reuse the same function as well to stream logs from different sources.
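As a sketch of the kind of customization mentioned above, the hypothetical helper below adds a custom field to each log event and derives a date-based index name. The cwl- prefix mirrors the default index naming; everything else here is an assumption for illustration, not the generated function's actual code:

```python
import datetime

def transform(log_event, log_group):
    """Hypothetical transform step: copy a CloudWatch log event, attach
    a custom field, and derive a date-based Elasticsearch index name."""
    doc = dict(log_event)
    doc["log_group"] = log_group  # custom field (an assumption)
    day = datetime.datetime.fromtimestamp(
        log_event["timestamp"] / 1000, tz=datetime.timezone.utc
    )
    index = "cwl-" + day.strftime("%Y.%m.%d")  # mirrors the default cwl- prefix
    return index, doc
```

A transform like this is also where you would route logs from different sources to differently named indices when reusing one streaming function.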

As with any other AWS service, log streaming comes with its own caveats. Being a push mechanism, it will stream logs from the point in time when it was set up, meaning that older logs will not be sent to Elasticsearch. It can also skip streaming some logs entirely if errors are encountered, e.g., if the log streaming Lambda function is throttled due to excessively high usage.

Setting up Elasticsearch in AWS

To stream AWS Lambda logs to an Elasticsearch instance, the latter must be set up first. From the AWS console, go to Amazon Elasticsearch Service and click on the “Create new domain” button. For an exploratory first setup, you can choose the “Development and testing” deployment type, select a name for the Elasticsearch domain, and go with reasonable defaults for most options.

However, you should pay close attention to some of the security settings. The most important of these is whether your Elasticsearch cluster will be configured with VPC access or Public access. The former means that Elasticsearch can be accessed only from within a VPC, which makes it tricky to access from outside AWS (e.g., directly from the office) but also secures it against unauthorized external access. Public access, on the other hand, means it is open to the world, so access should be restricted based on security policy (e.g., allowing access to specific IP addresses) rather than allowing just anybody to reach the cluster.

Logging via AWS Lambda

The next thing you need is an actual AWS Lambda function. From the AWS console, find the Lambda service and create a new function. Simply give it a name, and choose one of the many supported languages, including .NET Core, Go, Java, Node.js, Python, and Ruby. In this example, we’ll assume that the code will be written using Python. Hit the “Create function” button; now it’s time to write your Lambda function’s code.

The following Python code is an extremely simple example that takes a dictionary of input parameters and writes it to standard output as JSON:

import json

def lambda_handler(event, context):
    print(json.dumps(event))

After applying this code in the editor and saving the Lambda, you can invoke it using a payload such as the following:

{
  "age": 21,
  "country": "Malaysia",
  "gender": "f"
}

After running the Lambda function with this input, you can then observe the same JSON in the function’s CloudWatch logs, together with START, END, and REPORT logs that are automatically generated by the AWS Lambda service.

Overview of Creating a Log Stream

You can create a log streaming subscription using the AWS console, which will set up many things for you, including the log streaming Lambda. AWS recently released a new user interface to create log streaming subscriptions via the console, so you can currently choose whether to use the old interface or the new one.

However you decide to do this, the console will assist you in doing the following:

  1. Create a Lambda function to perform log streaming
  2. Give this Lambda a role with the appropriate permissions
  3. Give CloudWatch permissions to execute this Lambda
  4. Set up the log format and filters (optional)

Creating a Log Stream in Practice

First, locate the log group belonging to your Lambda function in CloudWatch logs, which would normally be called /aws/lambda/name_of_your_function. From the “Actions” drop-down, select the appropriate option to set up the log streaming subscription. In the next screen, choose the name of the Elasticsearch domain you created earlier.

Next, choose a role that the log streaming Lambda (generated as part of this process) will assume when it executes. This role needs access to several Elasticsearch, CloudWatch, and EC2 permissions. This forum thread contains an example policy that you can attach to this role, as well as other instructions on how to set up the log streaming subscription in case you run into issues.
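If you prefer to assemble the policy yourself, the following is an illustrative example of the kind of statements such a role typically needs. The action lists and resources here are assumptions to be tightened for your environment, not the exact policy from that thread (the EC2 permissions matter mainly when the cluster uses VPC access):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["es:ESHttpPost"],
      "Resource": "arn:aws:es:*:*:domain/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DeleteNetworkInterface"
      ],
      "Resource": "*"
    }
  ]
}
```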

The console will normally set up a trigger, which will cause CloudWatch to execute the new Lambda function when new logs are available. If this fails, you can set it up manually by locating the new Lambda function and adding a CloudWatch trigger.

Finally, you can set the log format to JSON and add a subscription filter pattern to control which logs get sent to Elasticsearch. For instance, to exclude the AWS Lambda service's START, END, and REPORT logs, which are not in JSON format, you could simply use the pattern "{" to ensure that any forwarded logs contain at least an opening curly bracket. Go ahead and test this out in the “Test pattern” section before proceeding.

Once all of this is set up, your logs should be in Elasticsearch. However, at this stage, there is not much feedback telling you whether the log streaming is working successfully or not. You can check the logs of the new log streaming Lambda function in CloudWatch; this will tell you if it is being executed and if there are any errors, but ultimately you will want to see your logs in Elasticsearch itself.

Inspecting Logs with Kibana

Elasticsearch usually works in conjunction with Kibana, its companion software for searching and visualizing data. AWS Elasticsearch Service automatically provides Kibana with any deployed Elasticsearch cluster. If you opted to set up Elasticsearch with VPC access, reaching Kibana will take some additional effort: for example, you might have to access it from an EC2 instance in the same VPC, after configuring the Elasticsearch cluster’s security group to allow access from that instance.

By default, logs streamed from CloudWatch are written to an index of the form cwl-timestamp. From the Management page (last item in the left sidebar), you can set up an index pattern of cwl-* allowing you to search through the logs.

The first three items in the left sidebar are Discover, Visualize, and Dashboard. These build on each other, so you can run queries in Discover, use those queries to create visualizations in Visualize, and combine visualizations into dashboards in the Dashboard section.

Once the index pattern is in place, you should begin to see the log data in Discover. You can filter data based on specific fields, e.g., various countries, and also control how to display the results, e.g., a table instead of the default JSON dump. You can also choose the time range in which to search or do a full-text search across all fields.

Elasticsearch uses a flexible schema, so you can log arbitrary data and search for it with ease. While the earlier example showed how you could log object data, it also makes a lot of sense to enrich this with application metadata. Adding things like the name of the Lambda, its executing request ID, or even a global correlation ID can help a great deal when you’re troubleshooting a problem and want to see the sequence of events for a particular request.
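A hypothetical helper along those lines, using the function name and request ID that the Lambda context object exposes (the correlation_id parameter is an assumed application-level field, not something Lambda provides):

```python
import json

def log_with_metadata(event, context, correlation_id=None):
    """Hypothetical helper: wrap the event payload with Lambda metadata
    before printing, so each log line is self-describing in Elasticsearch."""
    record = {
        "function": context.function_name,     # provided by the Lambda context
        "request_id": context.aws_request_id,  # provided by the Lambda context
        "correlation_id": correlation_id,      # assumed application-level field
        "payload": event,
    }
    print(json.dumps(record))
    return record
```

Calling this from your handler instead of printing the raw event keeps every log line tied to the invocation that produced it.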

Additionally, you can save, restore, and share searches, and use any of them to create visualizations, which you can then combine into dashboards.

Figure 2: Example of a dashboard in Kibana showing a pie chart and data table.

Wrapping Up

As we have seen, you can use a Lambda function (or one of several other AWS services) to write logs into CloudWatch and then stream them into Elasticsearch for further analysis. You can do this by setting up a Lambda function that is triggered every time new CloudWatch logs are added, pushing those logs into Elasticsearch.

Kibana allows anyone who needs this log data to easily search through, filter, and visualize it, enabling them to investigate issues, gather statistics, and more.