Online Furnishing Shopping at the Scale of Serverless

Dunelm is the UK’s no. 1 homewares retailer, offering customers over 300,000 products to enhance every room in their home. Dunelm is a multichannel retailer with over 170 superstores, four high street stores, and a market-leading website, Dunelm.com, featuring extended ranges and delivery convenience (Home Delivery, Collect+, Reserve & Collect) via multi-device functionality and its own delivery fleet.

With Tonino Greco, Head of Infrastructure and DevOps at Dunelm

Cloud at Dunelm

Dunelm (Soft Furnishings) Ltd, based in Leicester, runs one of the most popular websites in the UK for purchasing home furnishings online.

To improve service and keep up with demand, Dunelm is going through a major transformation: it is building its e-commerce website on a serverless architecture. The project involves designing, building, and operating an almost 100% serverless application, composed of thousands of AWS Lambda functions, at a very high scale. A group of over 80 developers and DevOps practitioners in two separate locations is working on this large project.

Dunelm’s e-commerce website

The first part of the architecture, which serves the users of the website, is based on API Gateway and AWS Lambda. The system uses many AWS services such as SNS, SQS, DynamoDB, and Aurora, as well as data lake platforms. Transactions connect to existing systems such as ERP and checkout systems, which are built using Docker containers. Overall, 90% of the code is serverless.
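To make the API Gateway and Lambda pairing concrete, here is a minimal sketch of a Lambda handler behind an API Gateway proxy integration. It is not Dunelm's code: the `handler` name, the `sku` path parameter, and the in-memory `PRODUCTS` catalogue (standing in for a DynamoDB table) are all assumptions made for illustration.

```python
import json

# Hypothetical in-memory catalogue standing in for a DynamoDB table.
PRODUCTS = {
    "sku-123": {"name": "Cushion", "price_gbp": 12.0},
}


def _response(status_code, payload):
    """Build a response in the shape API Gateway proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event, context):
    """Return one product, looked up by the (assumed) `sku` path parameter."""
    # pathParameters may be missing or None when the route has no parameters.
    sku = (event.get("pathParameters") or {}).get("sku")
    product = PRODUCTS.get(sku)
    if product is None:
        return _response(404, {"error": "product not found"})
    return _response(200, product)
```

Each function of this kind stays small and single-purpose, which is why a full website can end up as thousands of them.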

The team had several sessions and workshops with solutions architects from AWS, which helped a lot. “The adoption within the team has been amazing,” said Tonino. “They could start working on a new service on Monday and already deploy it by Wednesday.”

Challenges

As the teams moved away from containers to serverless and AWS Lambda, breaking services into dozens of functions proved challenging. It was difficult to correlate the functions and maintain a consolidated view of the many services.
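One common way to correlate many small functions, shown here only as a sketch and not as the approach Dunelm or Epsagon uses, is to pass a correlation ID from function to function and include it in every log line. The header name `x-correlation-id` and both helper functions are hypothetical.

```python
import json
import uuid

# Hypothetical header name; any agreed-upon name would work.
CORRELATION_HEADER = "x-correlation-id"


def get_correlation_id(event):
    """Reuse the caller's correlation ID if present, otherwise create one."""
    headers = event.get("headers") or {}
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get(CORRELATION_HEADER) or str(uuid.uuid4())


def log_line(correlation_id, function_name, message):
    """Format a structured log line that can be searched by correlation ID."""
    return json.dumps(
        {
            "correlation_id": correlation_id,
            "function": function_name,
            "message": message,
        }
    )
```

Searching logs for a single ID then yields every function that took part in one request, which is a manual version of what a tracing tool automates.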

Serverless cost was another concern. While Lambda is very cheap, they needed to know what the expected cost of their service would be.

Visibility into their functions and how well they were executing was critical. They needed a solution that could conveniently handle thousands of functions.

Searching for a Monitoring Solution

Dunelm looked into several monitoring solutions. They wanted a tool they could set up quickly and that would give them visibility into their distributed architecture.

“We also looked at AWS solutions, X-Ray among them. It was useful but required a lot of manual implementation.” They estimated that implementing it and building the required dashboards would take weeks of work.

They also looked into other vendors. The main drawback was that these tools required a significant amount of setup time. In the end, the team couldn’t get them up and running in time.

Impressions of Epsagon

Dunelm heard about Epsagon through AWS. “10 minutes into the conversation it was clear to us that this is the solution we’ve been looking for,” said Tonino.

“We were interested in visibility and traceability.” They started a POC with Epsagon and wondered how long it would take to configure their development and production environments. “We clicked on one button and all the functions were visible right away. We were ecstatic.”

One of Dunelm’s applications, traced by Epsagon

After a week, more and more people on the team were using Epsagon. Tonino demoed the tool to 80 people from additional teams to show them what his team had been working on. The other teams were excited and asked to try Epsagon as well. “The adoption of the tool has been great. Epsagon is high up in everyone’s mind as THE tool to use. The architecture view was very useful to show to other teams and to the management. We had additional sessions with Epsagon about the traceability and quick troubleshooting, which was also helpful,” said Tonino.

Serverless Cost Monitoring

Because AWS Lambda is a pay-per-use service running at a high scale, understanding its cost was very important for Dunelm. They were worried about how much the new platform would cost. Even with a well-structured architecture, it is difficult to predict the actual cost in production.
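A rough sense of why Lambda cost depends on traffic can be had from a back-of-the-envelope estimate: a per-request charge plus a compute charge proportional to memory and execution time. The sketch below is not how Epsagon calculates cost. The function name is hypothetical, and the default rates are illustrative assumptions that should be checked against current AWS pricing for the relevant region and architecture. The free tier is ignored.

```python
def estimate_lambda_cost(
    invocations,
    avg_duration_ms,
    memory_mb,
    price_per_million_requests=0.20,   # assumed rate in USD, verify before use
    price_per_gb_second=0.0000166667,  # assumed rate in USD, verify before use
):
    """Estimate Lambda cost in USD for a given workload (free tier ignored)."""
    request_cost = invocations / 1_000_000 * price_per_million_requests
    # Compute is billed in GB-seconds: memory (GB) x duration (s) x invocations.
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    return request_cost + compute_cost


# Example workload: one million invocations of 200 ms each at 512 MB.
monthly_cost = estimate_lambda_cost(1_000_000, 200, 512)
print(f"Estimated cost: ${monthly_cost:.2f}")
```

The estimate is only as good as the traffic and duration figures fed into it, which is why measured per-function data from production matters.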

The CIO asked the team: “Do we know what the billing for AWS Lambda will be?”

“Epsagon provided us a way to know the cost of Lambda in a clean and easy way,” said Tonino. “We used Epsagon to immediately show the end-to-end cost.” The CIO asked: “Are you sure this is only $8?” Epsagon helped the team make this cost predictable and ensure that it stays low.

They also used the cost as a way to encourage healthy competition between the development teams. Grouping functions by applications showed the total cost of the application. “The teams kept score on who has the most cost-efficient application, which was fun and productive,” said Tonino.
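The grouping idea can be sketched in a few lines: sum the per-function cost under each application and rank the applications from cheapest to most expensive. The function name, application names, and figures below are made up for illustration. They are not Dunelm's data and this is not Epsagon's API.

```python
from collections import defaultdict


def cost_by_application(function_costs):
    """Sum per-function costs per application, cheapest application first.

    `function_costs` is an iterable of (application, function, cost_usd).
    """
    totals = defaultdict(float)
    for application, _function, cost_usd in function_costs:
        totals[application] += cost_usd
    return sorted(totals.items(), key=lambda item: item[1])


# Made-up sample data for illustration only.
sample = [
    ("checkout", "create-order", 3.0),
    ("checkout", "take-payment", 2.5),
    ("search", "query", 1.25),
]
leaderboard = cost_by_application(sample)
```

With the sample above, `leaderboard` lists `search` ahead of `checkout`, the kind of ranking a team scoreboard could be built on.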

Tracing external API calls to the Salesforce SaaS platform

Working with the Epsagon Team

According to Tonino, one of the biggest wins was the support. “Epsagon’s team has been absolutely amazing. We had a Slack channel with several members of Epsagon’s team. Questions were answered quickly; 9 out of 10 were answered right away. We made a bunch of feature requests and most of them were implemented very quickly. The fact that the things we asked for are on a roadmap or just done is refreshing. It has been a really good journey.”

Demonstrating the Effectiveness of Serverless

“The culture at Dunelm is amazing. We demonstrated the capabilities to the CEO and the management of the company and they were blown away,” says Tonino. According to him, the Development Tribes at Dunelm have a great deal of autonomy, and what they built demonstrates that.

Quantified Outcomes of Using Epsagon

“Instead of spending a lot of manual work to set up the manual instrumentation that other vendors require, or implementing multiple dashboards on Grafana, purchasing Epsagon ‘off the shelf’ required little development and provided a lot of visibility very quickly,” says Tonino. “We completely gave up on our original plan to push performance data into a self-implemented dashboard and used Epsagon instead.” He says that all they needed to do was add a couple of lines in the SAM CLI.

“Using Epsagon, tasks take seconds instead of hours or days. We are seeing a 90%-95% decrease in the time spent fixing issues and troubleshooting.”

Once an application is created, all of its functions are automatically associated with it in Epsagon and the architecture visualization appears. This was a major benefit for the team.

The team also used the performance data in Epsagon to optimize the Lambda functions as part of their internal competitions.

“Having monitoring at the beginning can pay off quickly as troubleshooting time decreases significantly.”

Conclusion

With Epsagon, Dunelm is able to execute its vision of a fully serverless system at a high scale. The adoption across the engineering and DevOps teams has been great, and the support that Epsagon provided was a key factor in the success of using the tool.

With troubleshooting time significantly reduced, and architecture visualization and cost predictability now standard at Dunelm, the team is able to implement new cloud services while keeping the highest standards of quality.