In 2019, Lyft’s frontend architecture needed a reckoning. We were growing quickly as a company, and new teams were creating new software systems daily. At the time, we generated new frontend services from a service generator template, each complete with a copy of our bespoke, zero-config frontend build platform. Such easy service creation led to an explosion of new services with heterogeneous code built upon our React-based frontend architecture.
At the same time, we were running into headwinds maintaining our own frontend platform, an internal set of Webpack configurations, ESLint libraries, and framework code. We were bogged down troubleshooting cryptic build errors, and support requests sapped our productivity. Because codebases began to diverge (as they do in microservice architectures), our developers found upgrading to new versions of our frontend platform time-intensive and frustrating.
With over 100 frontend services and nearly as many frontend engineers, it was clear that something had to change for our platform to remain maintainable as Lyft grew.
What problems did we have?
We sat down and named some of the core issues we were facing:
- Drifting infrastructure: New platform releases did not see uniform uptake, leaving us with a long tail of services that were left on older platform versions. With time, our frontend infrastructure began to drift, leading to maintainability and code complexity issues.
- Keeping the entire service fleet up to date is hard: The responsibility of upgrading each service fell upon our product engineering teams, who were often busy and overbooked. This led to services falling behind on security and performance updates.
- Proliferation (and divergence) of infrastructure code: Each service implemented frontend infrastructure (like Redux, or server-side rendering) in its own special way according to its own needs and team preferences, leading to heterogeneous implementation of common app patterns.
- Performance bottlenecks: As new technologies like dynamic imports and other bundle size optimizations became available, frontend services that had not been upgraded to our latest platform began to lose out on performance wins offered by newer platform updates.
- Common tasks are hard to apply at scale: Changes that would be simple in a single codebase were difficult to roll out across the fleet. For example, if we wanted to introduce styled-components to our service bundles, we would need to go into each service manually and add it in a way that fit that service’s implementation approach.
- Lack of standardization: Sharing code was very difficult because our codebases were so heterogeneous. Our engineers often had to reinvent the wheel when implementing patterns and modules instead of leveraging shared code and libraries.
We made the decision to turn to the open-source community to find a batteries-included framework that would solve these headaches for us. After evaluating different platforms, we landed on Next.js! We liked:
- Its batteries-included, opinionated philosophy, which would help unify the divergent architectures across our services.
- Its executable wrapper, which allowed us to move all central application concerns behind a module interface and stop maintaining our own build system architecture.
- Its strong open-source ecosystem, friendly community, and solid documentation, which sold us on the future growth and trajectory of the platform.
We could have just pulled Next.js off the shelf and asked everyone to use it as-is, but we still needed to solve a couple more problems.
Plus a little bit of Lyft special sauce…
Two problems remained unsolved with Next.js out of the box.
First, we needed to automate platform migrations for the future.
We needed the ability to write easy-to-run and bulletproof service upgrades, and to be able to apply these at scale. To solve this, we designed a migration service with jscodeshift that allows us to ship and run migrations that automatically update service code when upgrades are run.
This means that any future breaking changes in our platform will come with automatic codemods that upgrade code in the host app. This also means that we can open pull requests to upgrade services across the fleet without product engineering intervention.
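To make the idea concrete, here is a minimal sketch of a versioned migration runner. It is an illustration under assumptions, not the @lyft/service implementation: the `Migration` shape, the `upgrade` function, the version numbers, and the `legacy-platform` and `createApp` names are all hypothetical. The transforms also use plain string replacement as a stand-in; a real jscodeshift codemod would rewrite the syntax tree instead.

```typescript
// Hypothetical sketch: none of these names come from @lyft/service.
interface Migration {
  // Platform version that introduces the breaking change.
  version: number;
  description: string;
  // Rewrites one source file and returns the new contents.
  transform(source: string): string;
}

const migrations: Migration[] = [
  {
    version: 2,
    description: "Point imports at the new platform package",
    transform: (source) =>
      source.replace(/from "legacy-platform"/g, 'from "@lyft/service"'),
  },
  {
    version: 3,
    description: "Rename createApp to createService",
    transform: (source) => source.replace(/\bcreateApp\b/g, "createService"),
  },
];

// Apply, in ascending version order, every migration newer than the
// version the service is currently on, up to the target version.
function upgrade(source: string, fromVersion: number, toVersion: number): string {
  return migrations
    .filter((m) => m.version > fromVersion && m.version <= toVersion)
    .sort((a, b) => a.version - b.version)
    .reduce((code, m) => m.transform(code), source);
}

const before = 'import { createApp } from "legacy-platform";\ncreateApp();';
const after = upgrade(before, 1, 3);
console.log(after);
```

Because each migration is keyed to the version that introduced the breaking change, a service several releases behind picks up every intermediate step in order, while a service already on the target version is left untouched.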
Second, we needed a way to share code.
We wanted to build an extensible application architecture that would allow developers to write plugins to introduce different state managers and packages with as little configuration or glue code as possible. We designed a plugin service around Webpack Tapable that would allow any of our developers to inject shared Lyft packages into server middleware and the client React app to accomplish different tasks in our ecosystem. These range from GraphQL clients, Mirage mocking support, and UI component libraries to shared libraries for metrics and logging.
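The plugin idea can be sketched roughly as follows. This is an assumption-laden illustration, not the actual plugin service: `WaterfallHook` is a tiny stand-in written for this example that only mimics the `tap` and `call` pattern Tapable uses, and `PlatformHooks`, `Plugin`, `buildApp`, and the plugin names are hypothetical. Strings stand in for real middleware functions and React providers to keep the sketch short.

```typescript
// Minimal stand-in for a Tapable-style waterfall hook (not the real tapable package).
class WaterfallHook<T> {
  private taps: { name: string; fn: (value: T) => T }[] = [];

  // Register a named function on this hook.
  tap(name: string, fn: (value: T) => T): void {
    this.taps.push({ name, fn });
  }

  // Pass the value through every tapped function, in registration order.
  call(initial: T): T {
    return this.taps.reduce((value, t) => t.fn(value), initial);
  }
}

// Hypothetical extension points: one for the server, one for the client app.
interface PlatformHooks {
  serverMiddleware: WaterfallHook<string[]>;
  clientProviders: WaterfallHook<string[]>;
}

interface Plugin {
  name: string;
  apply(hooks: PlatformHooks): void;
}

// A plugin that contributes to both the server and the client.
const graphqlPlugin: Plugin = {
  name: "graphql",
  apply(hooks) {
    hooks.serverMiddleware.tap("graphql", (list) => [...list, "graphqlProxy"]);
    hooks.clientProviders.tap("graphql", (list) => [...list, "GraphQLProvider"]);
  },
};

// A plugin that only touches server middleware.
const metricsPlugin: Plugin = {
  name: "metrics",
  apply(hooks) {
    hooks.serverMiddleware.tap("metrics", (list) => [...list, "requestMetrics"]);
  },
};

// Let every plugin tap the hooks, then run the hooks to assemble the app.
function buildApp(plugins: Plugin[]) {
  const hooks: PlatformHooks = {
    serverMiddleware: new WaterfallHook<string[]>(),
    clientProviders: new WaterfallHook<string[]>(),
  };
  for (const plugin of plugins) plugin.apply(hooks);
  return {
    middleware: hooks.serverMiddleware.call([]),
    providers: hooks.clientProviders.call([]),
  };
}

console.log(buildApp([graphqlPlugin, metricsPlugin]));
```

The host service only lists the plugins it wants; each plugin decides which extension points it needs, so no per-service glue code is required.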
Developer communication is key
One doesn’t just go and upgrade 100 services on their own — we needed to validate and understand the pain points from our product engineering teams before we committed to our design. We interviewed engineers from around the company to learn about their challenges and pain points with the current platform and collect feedback about whether our new design would solve them. We kept teams updated with the progress of the new stack in our internal frontend guild all-hands meetings. The entire process was transparent and developer-centric from start to finish.
We named our new platform @lyft/service and identified a small group of developers who would alpha test platform milestone releases. As our platform continued to mature, we expanded this audience to larger groups of teams that gathered for half-day migration workshops. These sessions helped us build a community where teams worked together to learn the Next.js architecture, help each other fix issues, and understand more of the context behind our design decisions.
Beta testing & some issues with React Router
We ran into a few roadblocks that emerged only after we began migrating services in beta sessions. For example, we had assumed that we could migrate all our applications from React Router to the default Next.js filesystem-based router. However, because of the very specific ways React Router was implemented in our services, we found it nearly impossible to codemod these routes. Instead of forcing a migration away from React Router, we built a feature that allowed our engineers to preserve their existing React Router setup and migrate to the Next.js router one route at a time.
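One way to picture route-by-route migration is a resolver that sends already-migrated paths to Next.js pages and lets everything else fall through to the existing React Router tree. The sketch below is purely illustrative and does not describe how the @lyft/service feature works internally: `migratedRoutes`, `toRegExp`, and `resolveHandler` are hypothetical names, and the route patterns are made up.

```typescript
type Handler = "next" | "react-router";

// Hypothetical: routes a team has already moved to Next.js filesystem pages.
const migratedRoutes = ["/settings", "/rides/[id]"];

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Convert a Next.js-style pattern such as /rides/[id] into a regular
// expression, treating each [param] segment as a single path segment.
function toRegExp(pattern: string): RegExp {
  const body = pattern
    .split("/")
    .map((segment) =>
      /^\[[^\]]+\]$/.test(segment) ? "[^/]+" : escapeRegExp(segment)
    )
    .join("/");
  return new RegExp(`^${body}/?$`);
}

// Migrated paths go to Next.js; everything else stays on React Router.
function resolveHandler(path: string, migrated: string[]): Handler {
  return migrated.some((pattern) => toRegExp(pattern).test(path))
    ? "next"
    : "react-router";
}

console.log(resolveHandler("/rides/42", migratedRoutes));
console.log(resolveHandler("/profile", migratedRoutes));
```

With this kind of fallback, a team can move a single route, ship it, and leave the rest of the app untouched until it is ready to migrate the next one.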
How does a migration work?
A migration to @lyft/service is incredibly easy to run. A service owner simply invokes one command:
$ npx lyftsrv upgrade
And our codemods go to work and safely upgrade code in-place. Once that’s done, most of the heavy lifting is complete!
Of course, loose ends must also be addressed by each service owner, like:
- Fixing unit tests
- Integrating with the new Next.js router (or using our implementation of React Router)
- Upgrading related packages that may need manual intervention (such as MobX or Redux)
On average, the full process, from running the migration scripts to tying up loose ends, takes a matter of days.
Today, @lyft/service runs nearly 40% of our frontend fleet, and adoption is accelerating.
We have seen incredible feedback for this new platform, including the following wins:
- Reduced the dev feedback loop (time from code change to browser update) by 350 ms.
- Removed 845 KB of bundle size (in our boilerplate app).
- Removed 10,000 lines of infrastructure code from each service.
Migrating to this new platform will continue paying off in the future, as:
- Upgrades are as simple as bumping the @lyft/service npm package and running a migration CLI command. Because infrastructure code is fully encapsulated behind a package (and a suite of plugins), migrations touch far less surface area than they did in the past.
- Migrations can be rolled out through automatically opened PRs across the fleet, requiring far less product team intervention, so all services can receive the latest and greatest that the Next.js community has to offer.
For more details about our migration, watch Josh’s talk on our migration process at Next.js Conf 2020!
Want to get behind the wheel?
If you’re interested in working with us on the next-generation stack (pun intended) we’ve outlined here, solving complex transportation challenges to create the world’s best transportation service, we’d love to hear from you! Visit www.lyft.com/careers to see our openings.
We would like to thank the following teammates for their work and contributions to our new platform:
Daniel Kempner, Elad Ossadon, Derek Louie, Dustin Savery, Guy Thomas, Kim Truong, Jordan Patton, Jose Padilla, Martín Conte Mac Donell, Shekhar Khedekar, Adam Derewecki, Andrew Oh, Ryan Jadhav, Evan Madow, Derek Schaller, David Andrus, Beto Dealmeida, Ashley Yiu, Moon Jang, Alex Ung, Alexandre Smirnov, and Marcos Iglesias
Changing Lanes: How Lyft is Migrating 100+ Frontend Microservices to Next.js was originally published in Lyft Engineering on Medium.