Bridgecrew, the developer-first cloud security platform, is changing the way companies secure their cloud infrastructure. By providing security-as-code and scanning infrastructure code earlier in the development lifecycle, Bridgecrew enables today’s most tech-forward companies to find, fix, and prevent cloud misconfigurations.

With a product built for Engineers, Bridgecrew needs to use the most efficient technologies and architectures available. As a result, Bridgecrew runs primarily on AWS and builds its microservices on serverless and ECS Fargate. Originally, the team relied on CloudWatch’s out-of-the-box offering for its monitoring needs. However, as they quickly discovered, CloudWatch events with self-built integrations to tools like Slack and PagerDuty simply weren’t enough as they scaled from ten to over one hundred Lambdas, with additional ECS tasks on top of that.

Gaps in Visibility

Bridgecrew develops and deploys at a fast pace, often upgrading production several times per day. With thousands of resources in production, its Engineering teams found it very difficult to get the right notifications and context from CloudWatch. Says Barak Schoster, Co-Founder and CTO, “We didn’t feel confident that we knew when errors were occurring and, furthermore, our Engineers got zero guidance after getting an alert from PagerDuty.” It became clear that Bridgecrew needed a more robust observability strategy so that its teams could automate monitoring, ensure reliability, and deploy new code quickly.

Enter Epsagon

As a security-focused solution, Bridgecrew knows how important application performance and uptime are for its customers. With their initial use of CloudWatch + Slack for monitoring, the team could only solve issues that it knew about and, even then, their mean time to detect (MTTD), root cause analysis time, and mean time to resolve (MTTR) were long. Timeouts, rigid alerting parameters, and a lack of context with identified issues were causing slowdowns and less-than-ideal performance, so the team needed to look to a third-party solution.

Barak and his team tried out Epsagon and were initially impressed with how quick and easy the setup was. Says Schoster, “As a startup, we are all about moving fast, and Epsagon provided the immediate value we need to achieve that. Within our Engineering team, we matured very quickly in the way we looked at issues in production.” Bridgecrew’s teams had routinely seen a 1-2 hour delay between an issue occurring and its detection (MTTD), a delay that dropped to mere seconds after onboarding Epsagon. Additionally, they found long-term value in Epsagon’s ability to reveal issues that would previously have gone unnoticed and, once those issues were detected, to show their root causes immediately.

Along with Epsagon, Bridgecrew leverages integrations with PagerDuty, Jira, and Slack to ensure alerts and insights from the platform reach its Engineers efficiently.

Engineering Efficiency: A Key to Success

Barak was very impressed with the increase in Engineering efficiency after implementing Epsagon: “Before Epsagon, Engineers took a day or two to solve issues. Now, it takes an hour or less. Even junior Engineers can now solve issues in production the same day that they occur.” Epsagon also gives Bridgecrew’s team leads better governance and lets them shift responsibility back to the Devs by splitting detected issues automatically using the out-of-the-box PagerDuty integration.

Beyond decreasing issue detection, troubleshooting, and resolution times and increasing the efficiency of Bridgecrew’s development teams, Epsagon also acts as an interface between their Lambdas and databases, including DynamoDB, Neptune, Amazon RDS, and Elasticsearch. This has allowed for further issue detection and data correlation, making observability more readily available to Bridgecrew’s teams. In addition, Epsagon monitors webhook interfaces between Bridgecrew and services like Stripe, GitHub, GitLab, and Bitbucket.

Lessons Learned

Although hesitant at first to adopt a third-party solution for observability of their microservices, the Bridgecrew team credits Epsagon’s product and support with sparing them the headaches of homegrown and open source alternatives. Additionally, Epsagon has helped the company on its journey toward understanding best practices in the cloud (for example, when to use Lambda vs ECS) and how to make fast optimizations to their architecture while making sure they don’t “over-optimize” for use cases they don’t need.

Asked why he and his team chose Epsagon over other third-party vendors, Barak notes, “Epsagon has the best context and largest support to services with easy onboarding and extremely strong user experience.” Additionally, Epsagon helps Bridgecrew’s Engineering teams be transparent about production issues when they arise, which in turn enables them to be transparent with their customers.

Barak and his team are proud that their system health indicator within Epsagon consistently remains at about 99.97%, and they plan to continue scaling their usage of the platform as their deployment of microservices grows.

The Future for Bridgecrew

As Bridgecrew continues its impressive growth as a company, the Engineering and DevOps functions will continue to scale. Given the significant value they’ve experienced with serverless + Epsagon, Barak says: “Our main strategy is to stick to serverless and microservices, and couldn’t imagine doing it without Epsagon.”

Why Epsagon

Epsagon enables Dev and Ops teams to instantly visualize, understand, and optimize what’s happening within complex microservice architectures. Teams are able to eliminate gaps in data and manual work, providing significant reductions in issue detection, troubleshooting, and resolution times. With a centralized location for understanding containers, Kubernetes, serverless and more, Engineers now know when something is wrong and can immediately trace issues to root cause before they affect production.

Increase development efficiency and reduce modern application downtime with Epsagon.