Onceit is one of New Zealand’s fastest-growing online fashion sites. It launched in May 2010 with a staff of one and a dream to become a leading destination for online designer sales. Onceit has become a one-stop destination for more than 500 local and international brands that keep the company’s customers on top of the latest styles and trends.

A Serverless Transformation

As part of its serverless transformation, Onceit is developing three different projects, according to Diego Lewin, Head of Development at Onceit.

  1. An application to synchronize products from e-commerce to a marketplace platform
  2. A complete rebuild of a Warehouse Management System
  3. A merchant portal for suppliers

Every new project is built with containers and serverless services on AWS, and Onceit is migrating existing code to serverless and microservices.

As part of the implementation, Onceit is using multiple AWS services: Amazon Elastic Container Service (ECS), AWS Lambda, API Gateway, SNS, SQS, DynamoDB, CloudWatch, RDS, and more. Onceit is also using MailChimp and other APIs.

Better understanding the customer

When it chose a modern cloud architecture, based on ECS and AWS Lambda, Onceit encountered significant challenges in ensuring great customer experiences. Onceit needed to gain end-to-end visibility to track issues in a particular segment of the website, understand how customers might be affected, and analyze how that might affect conversion rates.

CloudWatch logs contained extremely detailed information for debugging and troubleshooting. However, with a complex architecture involving several AWS services, such as API Gateway, Lambdas calling Lambdas, DynamoDB, RDS, SQS, and SNS, Onceit required end-to-end visibility into the whole request with minimal manual effort.

Previously Proposed Solutions

Onceit tried AWS X-Ray. It was useful, but Onceit struggled because it couldn’t show the whole picture, such as Lambda-to-Lambda calls or MySQL requests. Onceit went back to using logs, which was a setback.

Onceit also tried solutions provided by other vendors. Unfortunately, they couldn’t provide visualization of the entire architecture and required manual instrumentation, instead of doing it automatically.

Leveraging Performance Monitoring

The choice of AWS enabled Onceit’s development teams to focus on logic for delivering business applications, features, and services by automating delivery, scaling, and management of compute and storage. In addition, AWS helped reduce the time to market for new applications and features, which is very important for Onceit.

Once Onceit decided to use AWS, it investigated solutions that could help the teams scale quickly.

There were several reasons for choosing Epsagon:

  • Architecture visualization
  • Performance and timings of API calls
  • Getting the CloudWatch log for every request
  • Intelligent cost prediction
  • Identifying unused functions
  • Out-of-the-box experience without manually changing functions
  • Performance monitoring for containerized environments
  • Tracing and logging in a single interface

Monitoring Serverless Workloads

The onboarding was very easy, Diego explained. Onceit used the Serverless Framework plugin, which was easy to get started with. Onceit was finally able to get the complete picture of all the requests and lifetimes and the time being spent in every part of the application. This complete picture was very important, since Onceit wanted to identify bottlenecks in the application.

Onceit has multiple databases. In one particular case, with Epsagon’s aid, Onceit was able to find a problem by measuring how much time each query took and identifying the associated Lambda.
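The idea of measuring each query and tying it back to the calling Lambda can be illustrated with a minimal Python sketch. This is not how Epsagon instruments code; the names `timed_query`, `query_timings`, and `slowest_queries`, as well as the function and query labels in the usage note, are hypothetical and exist only for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory store:
# (function_name, query_label) -> list of durations in seconds.
query_timings = defaultdict(list)


@contextmanager
def timed_query(function_name, query_label, clock=time.perf_counter):
    """Record how long the wrapped block takes, keyed by function and query."""
    start = clock()
    try:
        yield
    finally:
        # Record the duration even if the query raised an exception.
        query_timings[(function_name, query_label)].append(clock() - start)


def slowest_queries(timings, top=3):
    """Return the (function, query) pairs with the highest average duration."""
    averages = {
        key: sum(durations) / len(durations)
        for key, durations in timings.items()
        if durations
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:top]
```

A handler would wrap each database call, for example `with timed_query("sync-products", "select_products"): ...`, and `slowest_queries(query_timings)` would then rank the slowest query and function pairs.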

Improved Performance and Confidence 

A Distributed Transaction, Automatically Captured by Epsagon

Onceit’s mean-time-to-resolution has decreased significantly, Diego noted. Onceit is using Epsagon’s Trace Search capability to easily find specific requests in the application and all the associated information. “Overall, Onceit is seeing a 90% decrease in our troubleshooting time.”

“Onceit finally has visibility into the application. We are able to see all the requests and where they’re coming from, which gives us more confidence when developing and operating our application,” Diego said.

“We can also improve our performance faster. Since we can easily identify where issues are, we can optimize the memory and costs of Lambda functions. Onceit found optimizations that we wouldn’t have known about otherwise.”

Onceit also had some third-party integrations that weren’t working and that Onceit didn’t even know about. “Epsagon helped us find them,” Diego said.

Cost monitoring by Epsagon has been helpful to find potential optimizations of memory and timeouts. Onceit uses it to monitor the health of the system. It also helps to keep the technical debt under control in the infrastructure. “We are using cost as an indicator of issues and optimization opportunities,” Diego explained.
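To make the link between memory, duration, and cost concrete, here is a rough Python sketch of how Lambda compute cost scales with configured memory and execution time. It is not Epsagon's cost model. The default price per GB-second is an assumption that varies by region and architecture, the per-request charge is ignored, and the 80% timeout threshold is an arbitrary illustrative choice.

```python
def estimate_lambda_cost(memory_mb, avg_duration_ms, invocations,
                         price_per_gb_second=0.0000166667):
    """Estimate compute cost as GB-seconds multiplied by a unit price.

    price_per_gb_second is an assumed value; check current AWS pricing
    for the region and architecture in use.
    """
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000) * invocations
    return gb_seconds * price_per_gb_second


def flag_near_timeout(avg_duration_ms, timeout_ms, threshold=0.8):
    """Return True when the average duration reaches a share of the timeout."""
    return avg_duration_ms >= threshold * timeout_ms
```

Comparing the estimate for the same function at two memory settings, using the durations observed at each setting, shows whether extra memory pays for itself through shorter execution times.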

Quality improvement was also a major benefit. “By identifying slow queries and errors in third-party APIs, we are able to identify architecture problems. We decided to split one Lambda into several Lambda functions, for example. By doing so, we were able to improve the quality of the queries and the code. So far, Onceit is seeing a 40% quality improvement,” Diego explained.

Monitoring Containerized Services

Managing ECS clusters with Epsagon

In addition to serverless, a significant part of the Onceit application runs on ECS clusters. Onceit chose ECS because it is well suited to complex, scalable applications. Onceit supports its microservices workloads with ECS to drive greater efficiency. By using ECS, Onceit reduces the operational effort of deploying applications by consolidating an application’s code, configurations, and dependencies into a single object.

The main challenge was limited visibility into container-based workloads. Onceit teams run multiple environments in several stages and clusters, which results in several ECS clusters spread across multiple AZs and multiple accounts. Because understanding the buyers’ experience is crucial for Onceit, the team now uses Epsagon’s distributed tracing engine, which automatically collects infrastructure data from the ECS task metadata endpoint, to understand the application processes running on ECS clusters.
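For readers unfamiliar with the ECS task metadata endpoint, the following Python sketch shows one way a container could read it. It assumes task metadata endpoint version 4, where the ECS agent injects the `ECS_CONTAINER_METADATA_URI_V4` environment variable into each container. The helper names are hypothetical, and this is not a description of Epsagon's internal implementation.

```python
import json
import os
import urllib.request


def task_metadata_url(env=os.environ):
    """Build the task-level metadata URL, or return None outside ECS."""
    base = env.get("ECS_CONTAINER_METADATA_URI_V4")
    if not base:
        return None
    return base.rstrip("/") + "/task"


def fetch_task_metadata(url, timeout=2.0):
    """Fetch and decode the JSON document served by the metadata endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_task(metadata):
    """Keep only a few fields useful for correlating traces with infrastructure."""
    return {
        "cluster": metadata.get("Cluster"),
        "task_arn": metadata.get("TaskARN"),
        "family": metadata.get("Family"),
        "availability_zone": metadata.get("AvailabilityZone"),
        "containers": [c.get("Name") for c in metadata.get("Containers", [])],
    }
```

Inside a running task, `summarize_task(fetch_task_metadata(task_metadata_url()))` would return the cluster, task, and container names that a tracing tool could attach to each trace.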

“Epsagon gave us invaluable visibility into our whole stack. Now, I cannot imagine running serverless and containers in production without Epsagon,” said Diego.

Using Epsagon, Onceit can map cluster performance metrics, such as utilization and reservation, review the services currently running on ECS, and filter down to only the important ones. Its largest service is a Koa API that handles highly concurrent traffic. Epsagon displays the status of the service along with all running tasks. In a single unified platform, Epsagon provides Onceit with metrics based on the service and detects spikes in CPU and memory usage.

Epsagon Metrics based on the Service with CPU and Memory Usage

It also helps them peek into logs of certain tasks to verify they are running properly and detect underperforming services.

Comprehensive visibility with Epsagon dashboards

Once Onceit completed Epsagon’s AWS integration, the business could access an out-of-the-box dashboard that provides detailed information about ECS clusters, including the status of deployments, cluster-level resource utilization, and live status of ECS clusters.

Onceit was able to see how the health and performance of containers correlate with the application running the tasks. “Essentially, by integrating Epsagon’s CloudFormation stack, all clusters were scanned automatically and displayed in a unified dashboard that presents both infrastructure data, and the application level metrics, including logs and traces,” Diego said.

Epsagon pulls tags from Amazon CloudWatch automatically to group and filter metrics. This capability makes it possible to monitor ECS CPU utilization for a single cluster and then drill in to see how each container contributes.

Epsagon provides Onceit engineers with the ability to correlate a problem from the trace to the environment and the infrastructure it is running on, track the disk I/O operations, and perform health checks. Epsagon’s platform automatically generates an architecture map that includes all running services and tasks, allowing Onceit to obtain greater visibility into the application’s health and reduce errors and troubleshooting time.

For a specific task, Epsagon reveals performance metrics such as CPU usage, network statistics, disk operations, and memory usage, alongside the most critical metric, the EC2 status check.

Epsagon Performance Metrics by Task

When a task is unhealthy, Onceit can quickly see related traces from the running application or jump to the AWS console for further configuration. “With Epsagon, Onceit experienced a 90% reduction in troubleshooting time and 75% reduction in error rates. This productivity helps us to scale container workloads quickly and keep pace with business demands,” Diego said.

Onceit wanted to provide a better shopping experience for all customers by reducing downtime and troubleshooting application and infrastructure issues as quickly as possible. “We achieved a 360-degree view of the production application by leveraging Epsagon, enabling us to monitor ECS workloads for each and every task and dive into application layer metrics, such as requests, errors, and payload visibility,” Diego explained.

Understanding Business Impact

“Before Epsagon, we didn’t understand the full impact of our decisions,” Diego noted. “With every addition to our architecture, more issues kept popping up. Epsagon gave us the confidence to use advanced and different design patterns.”

“Today, the breakdown of our developer day is roughly 60% coding, 20% testing, and 20% debugging. Before Epsagon, the time spent debugging was extremely high, and we had very limited visibility or confidence to resolve complex issues. Now, with Epsagon, things are running smoothly and coherently and are easy to use.”

Using Epsagon, Onceit was able to analyze customers’ buying behaviors and ensure a good customer experience. Epsagon’s unified platform, which correlates metrics, logs, and traces, enables Onceit to troubleshoot quickly and accelerate new feature development and deployment.