Via Revolutionizes Public Transit
Via is re-engineering public transit, from a regulated system of rigid routes and schedules to a fully dynamic, on-demand network. Via’s mobile app connects multiple passengers who are headed the same way, allowing riders to seamlessly share a premium vehicle. First launched in New York City in 2013, the Via platform operates across the globe in over 50 cities. Via has partnered with public transportation agencies, private transit operators, taxi fleets, private companies, and universities, seamlessly integrating with public transit infrastructure to power cutting-edge on-demand mobility.
Billions of Events
Via’s users, both riders and drivers, send billions of events to its backend engine every month. Via’s application tracks and routes cars simultaneously and matches them to riders using sophisticated algorithms that take many parameters into account, such as traffic in the city, pickup location and drop-off time, other riders in the city, and more.
In terms of scalability, these events create an extremely high and spiky load on the backend systems. For example, New York City, one of Via’s major operational regions, is particularly busy from 6 AM to 9 AM and from 4 PM to 7 PM, when people are commuting to and from work. Demand during rush hours can grow to as much as 20 times the level of low-demand periods, which requires rapid scaling on a massive scale.
A Cloud-First Architecture
As a cloud-first company, Via had already built its traditional architecture on AWS. It was based on AWS Elastic Load Balancing (ELB) and standard EC2 servers. To scale up, Via had to add servers to handle the load, growing from tens to hundreds.
Since Via also used microservices that were connected to these ELBs and to one another, the entire system had to scale together: whenever one ELB scaled, the team was forced to scale several others as well.
Via wanted a way to be infinitely scalable and to do it fast—“to go from 1 time to 200 times in no-time,” according to Ynon Cohen, Engineering Team Leader at Via.
They also wanted a way to split the traditional server into small pieces, each highly scalable and without strong dependencies on the others; that is, to remove the coupling between different services.
Moving to a Modern Architecture
Via’s backend runs entirely on Kubernetes clusters. For new services, Via weighed building them on Kubernetes against building them on AWS Lambda as serverless infrastructure. Eventually, Via decided to go with Lambda for all new services.
“Less work on DevOps – more work on our business,” said Ynon.
Via started developing new services using AWS Lambda. They also re-wrote some of the existing services using Lambda, breaking large code pieces into Lambda functions.
Of the Lambda code written in the last year, 80% is new. Via uses the Serverless Framework to manage and deploy its applications.
In terms of cost, Via saw a substantial reduction in its cloud spend. “Lambda is significantly cheaper than anything else out there. Our total cost dropped drastically,” said Ynon.
Challenges with AWS Lambda
Going serverless wasn’t free of problems. The team encountered a range of challenges, from cold starts in a VPC and concurrency and parameter limits to the concept of stateless code, which required a mindset shift.
The most significant challenge that Via’s team encountered with serverless infrastructure was troubleshooting applications.
The team put effort into unit-testing their applications, but soon realized that many issues arose not because of a single Lambda function, but rather due to the wiring between the Lambdas, such as an SNS topic that triggers another Lambda.
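One way to catch this kind of wiring problem early is a test that runs the publisher's message through the subscriber's handler. The sketch below is an illustration, not Via's code: the function names, the `ride_id` and `city` fields, and the message contract are all hypothetical. Only the event envelope (`Records[].Sns.Message`) follows the shape Lambda receives from an SNS trigger.

```python
import json


def build_ride_requested_message(ride_id: str, city: str) -> str:
    """Hypothetical publisher side: serialize the payload sent to the SNS topic."""
    return json.dumps({"ride_id": ride_id, "city": city})


def wrap_in_sns_event(message: str) -> dict:
    """Mimic the envelope a Lambda function receives from an SNS trigger."""
    return {"Records": [{"EventSource": "aws:sns", "Sns": {"Message": message}}]}


def match_driver_handler(event: dict, context=None) -> list:
    """Hypothetical subscriber: parse each SNS record and collect ride IDs."""
    accepted = []
    for record in event["Records"]:
        payload = json.loads(record["Sns"]["Message"])
        # A renamed or missing field raises KeyError here, which is the kind
        # of failure that unit tests of each function alone do not exercise.
        accepted.append(payload["ride_id"])
    return accepted


def test_publisher_and_subscriber_agree() -> None:
    """Wire the two sides together the way the SNS topic would."""
    event = wrap_in_sns_event(build_ride_requested_message("r-1", "nyc"))
    assert match_driver_handler(event) == ["r-1"]
```

A test like this covers only the message contract. It does not verify that the topic subscription itself exists, which still has to be checked against the deployed environment.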
One of the main challenges of a serverless microservice architecture is knowing “who is talking to whom and why.”
Knowing what its backend architecture looked like would let the team:
- Onboard new employees quickly
- Gain confidence in their application
- Discover hidden issues, such as a wrong connection between services or an unexplained call (“why is this call being made?”) to an external service such as Mandrill or Twilio
Epsagon @ Via
Via started using Epsagon after already running production workloads with billions of events and thousands of Lambda functions involved.
Once Epsagon’s instrumentation was enabled on Via’s functions, Epsagon immediately discovered Via’s production architecture. Epsagon gave the team the confidence to monitor the connectivity mesh of its serverless microservices. Via uses Epsagon regularly to find common issues such as Lambda functions that are not connected to anything, or a Lambda that calls itself for no apparent reason.
Monitoring is done primarily through Epsagon’s Slack integration. Since every service has its own Slack channel, issues surface clearly and are easier to handle quickly. From an alert, engineers can go straight to the exact transaction, including all the relevant traces and logs, which lets them troubleshoot issues faster than ever. “When it comes to monitoring and troubleshooting our serverless mesh, we rely on Epsagon,” Ynon said.
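The two issues named above, functions connected to nothing and functions that call themselves, can be pictured as simple checks on a call graph. The following is a minimal sketch under stated assumptions: it is not Epsagon's implementation or API, and the function names and the `(caller, callee)` edge list are hypothetical stand-ins for whatever a tracing tool would collect.

```python
def find_wiring_issues(functions, calls):
    """Report isolated and self-calling functions in a call graph.

    functions: iterable of function names known to be deployed.
    calls: iterable of (caller, callee) pairs observed in traces.
    """
    connected = set()     # every name that appears in at least one edge
    self_calling = set()  # names that appear as both ends of the same edge
    for caller, callee in calls:
        connected.update((caller, callee))
        if caller == callee:
            self_calling.add(caller)
    isolated = sorted(name for name in functions if name not in connected)
    return {"isolated": isolated, "self_calling": sorted(self_calling)}


# Hypothetical inventory and observed calls.
deployed = ["request-ride", "match-driver", "send-receipt", "retry-payment"]
observed = [
    ("request-ride", "match-driver"),
    ("retry-payment", "retry-payment"),
]
report = find_wiring_issues(deployed, observed)
```

With this input, `report["isolated"]` is `["send-receipt"]` and `report["self_calling"]` is `["retry-payment"]`. A self-call is not always a bug (deliberate recursion exists), so the output is a list of candidates to review rather than a list of errors.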
“When it comes to monitoring and troubleshooting, we completely rely on Epsagon and its automated approach.”
According to Ynon, 50 engineers use Epsagon across different teams. On average, it saves every engineer that uses Epsagon up to half a day every week.
“On average, Epsagon saves 50 Via engineers up to half a day every week.”
Mostly, Epsagon saves them a lot of frustration, which enables them to keep building and innovating.
Epsagon cuts the troubleshooting time of Via’s teams by 50%, which has a significant impact on their business and customer experience.
Best Practices Guidance
Via recommends the following practices for building a modern application:
- Know the limitations of AWS Lambda
- Serverless requires a mindset change when it comes to the management of massive concurrency.
- Many things happen in parallel: how will Lambdas behave when triggered by an SQS queue that holds 50K messages at once?
- It’s difficult to know which events were handled correctly and which were not, and how to fix them. You need to think about it in advance, and a tool like Epsagon greatly helps.
- “Think nano-services” – it’s easy to take an existing Lambda cluster and just put more functionality on top, but Via found that the right way is usually to write a new service. If the word “and” appears in the service’s name, it’s a good sign it needs to be broken down into two separate services.
- Define a process for building a new service: 1) write the code, 2) test, 3) deploy, 4) monitor. This way, you won’t get lost every time you deploy a new service.
- “Start with application monitoring in mind” – both technical and business monitoring are critical. Think about post-production from day one.
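The questions above about SQS-triggered Lambdas and about knowing which events were handled correctly can be made concrete with a handler that reports failures per message. This is a sketch, not Via's implementation: `process_message` and the `ride_id` field are hypothetical, and the return value assumes the SQS event source mapping has `ReportBatchItemFailures` enabled, in which case Lambda retries only the messages listed in `batchItemFailures`.

```python
import json


def process_message(body: dict) -> None:
    """Hypothetical business logic; raises on a payload it cannot handle."""
    if "ride_id" not in body:
        raise ValueError("missing ride_id")


def handler(event: dict, context=None) -> dict:
    """Process an SQS batch and report which messages failed."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_message(json.loads(record["body"]))
        except Exception:
            # Record the failure instead of failing the whole batch, so the
            # messages that succeeded are not delivered and processed again.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


# Hypothetical batch: one valid message and one the logic rejects.
sample_event = {
    "Records": [
        {"messageId": "m-1", "body": json.dumps({"ride_id": "r-1"})},
        {"messageId": "m-2", "body": json.dumps({"city": "nyc"})},
    ]
}
result = handler(sample_event)
```

Here `result` lists only `m-2` as failed. Because a message can still be delivered more than once, the processing logic should also be idempotent; the sketch leaves that part out.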
Using Epsagon, Via continues to grow its modern cloud application to expand its business. Epsagon’s monitoring, troubleshooting and visualization capabilities are key to Via’s rapid application development for its riders.
“It was difficult originally to know which events were handled correctly and which were not and how to fix them. You need to think about it in advance, and a tool like Epsagon greatly helps.”