So where are we jumping into this story?
This post is part two of our story on how we dove into Ruby web server startup sequences to fix a strange but critical issue that a few of our customers were having. Part one is here. Read that first!
In part one, we set the context for the issue, explored what was happening, and dove into Rack’s design (and its role in the issue). At the end of the post, we shared our fix, which was essentially to gain true access to the final configuration of the running Ruby web server and to use the dedicated web server hooks to delegate the actual boot to workers. This worked out great in our testing environment, but we hit an unexpected stumbling block as soon as we ran our full integration test suite on CI. That’s where we rejoin the story here in part two. Without further ado…
Wherein the plot thickens
Everything was working on our integration test app, across all possible versions of Puma, Unicorn, Passenger, Ruby, and Rails. So what was the problem then? Why did we smack into a wall? As it turns out, our integration test app’s Gemfile orders things differently from what we normally use, as we do hallway testing across varying scenarios and usually follow our own setup instructions for the agent. Those instructions give a simple shell command to append to the Gemfile, so it’s usually last…
With gem 'sqreen' listed before puma and the app started with rails server, the gem is indeed required, but nothing happens, as there’s no web server to detect.
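The ordering problem can be illustrated with a small simulation. This is only a sketch: LOADED, GEMS, and require_in_order are hypothetical names that stand in for what happens when gems are required in Gemfile order, and none of this is the agent's actual code.

```ruby
# Hypothetical registry standing in for constants defined by required gems.
LOADED = []

# Each lambda mimics what requiring a gem does at load time (assumption:
# the agent tries to detect the web server as soon as it is required).
GEMS = {
  'puma'   => -> { LOADED << :Puma },
  'sqreen' => -> { LOADED.include?(:Puma) ? :server_detected : :no_server_yet }
}

# Mimics walking the Gemfile from top to bottom and requiring each gem.
def require_in_order(names)
  LOADED.clear
  names.map { |name| [name, GEMS.fetch(name).call] }.to_h
end

agent_last  = require_in_order(%w[puma sqreen])['sqreen']
agent_first = require_in_order(%w[sqreen puma])['sqreen']
```

With the agent listed last, the server is already there to be detected; with the agent listed first, detection at require time finds nothing.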
Out in the wild, we really can’t be sure of the final order in the Gemfile: we may have customers with that order already, or people may add web servers later on, and anyway we really want to be as easy to use as possible. So we won’t settle for imposing the order.
Thinking it through, there may be several of those web servers required in the Gemfile, but only one actively running, so we can’t rely on either constant presence or
Also, there are multiple ways to start web servers: rackup -s, or the web server command itself: passenger… In essence, this fix was not enough.
Gloves on. Investigation intensifies.
Investigating the Rails server startup sequence
The issue was most apparent on Rails with Puma, so let’s work from that.
By setting a binding.pry in the object that accesses the web server configuration to evaluate preload? we could immediately see the issue: the ::Puma::Configuration constant is not even there!
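A check like the one we ran by hand in the debugger can be written as a small helper. This is a minimal sketch, and constant_loaded? is a hypothetical name rather than part of the agent; it walks a constant path one segment at a time and stops at the first missing piece.

```ruby
# Returns true only if every segment of the constant path is already defined.
# Hypothetical helper for illustration; it does not try to load anything.
def constant_loaded?(path)
  path.split('::').reject(&:empty?).inject(Object) do |scope, name|
    # Stop at the first segment that has not been defined yet.
    return false unless scope.const_defined?(name, false)
    scope.const_get(name, false)
  end
  true
end

core_constant_found = constant_loaded?('::File::Stat')
missing_constant_found = constant_loaded?('::NoSuchServer::Configuration')
```

Run at the point where we set the breakpoint, such a check against ::Puma::Configuration would have come back false.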
While the Rails guides describe the typical startup sequence, they’re short on details and focus on upper-level concepts, geared towards application development. What we needed was a detailed timeline of what happens behind the scenes, including how the various gem parts are required, how the web server to run is selected, and when the web server configuration is finally available.
Let’s build a map of what happens (I removed the steps that don’t matter). The first part is mostly about getting our bearings:
I drew a line here, as we’re leaving the early command part. Starting with this, we’ll add some information about whether the code resides in Rails or Rack because there’s a lot of back and forth, and indent based on call depth.
At this point, the web server will start and this call will be blocking until the end of the web server life. We picked Puma, so we’ll add some additional quick notes here:
- The Puma gem gets required through Bundler.require, but only minimally as we saw before. This situation lasts all the way to
- Midway through require config/environment, some more of Puma’s parts are loaded, but still not everything, and still no configuration, which is why we can’t make a decision at
So let’s proceed where we left off:
Puma requires its own files very lazily, so we can’t hook Puma::Launcher#run directly because it’s only available very late.
I was expecting options to be filled with a default value, but instead it’s the Rack::Handler.default method that makes a fallback selection. To know which web server will be used, I can hook the Rack::Server#server accessor and get its return value.
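Hooking an accessor to observe its return value can be sketched with Module#prepend. The example below is an illustration under assumptions, not our agent's code: FakeServer is a hypothetical stand-in for Rack::Server so the snippet runs without Rack, and ServerDetection is a made-up name for the hook.

```ruby
# Hypothetical stand-in for Rack::Server, kept minimal so the sketch runs alone.
class FakeServer
  def initialize(options = {})
    @options = options
  end

  # Mimics an accessor that falls back to a default when no server is given.
  def server
    @server ||= @options.fetch(:server, :webrick)
  end
end

# Hypothetical hook: records whatever the accessor returns, then passes it on.
module ServerDetection
  def self.detected
    @detected ||= []
  end

  def server
    super.tap { |selected| ServerDetection.detected << selected }
  end
end

# Prepending puts the hook in front of the original method.
FakeServer.prepend(ServerDetection)

explicit = FakeServer.new(server: :puma).server
fallback = FakeServer.new.server
```

Because the hook sits on the accessor rather than on the options, it sees the final selection whether it came from an explicit choice or from a fallback.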
Investigating the rackup command startup sequence
Compared to Rails, Rack’s startup is delightfully simple. A notable difference though is that Rack does not concern itself with Bundler so we’ll include that.
Again, let’s draw a line here as we’re leaving the early command part. Here’s the
Those mere three lines are actually Rack, and the last line behaves like the Rails::Server case above, only without the (Rails) parts. This has subtle but rippling consequences on the order of some things though: see how wrapped_app is touched at a slightly different time, and Bundler.require is explicitly called in the Rails sequence, whereas it will be done as part of the require config/environment step in Rack.
Investigating the web server commands startup sequences
As it turns out, each Ruby web server also has a command to start the server by itself. All of them leverage Rack in some way and have similarly simple startup sequences, but the timeline is again important. These commands should be started with bundle exec, so I’ll generally skip that part as we get into it below.
Puma
As mentioned before, Puma has both a cluster mode, which forks workers, and a single mode, which relies on threads for concurrency.
Puma doesn’t use build_app_and_options_from_config, instead reimplementing a part of it to decide whether to use Puma::Rack::Builder. The boot notably differs because Puma::Launcher#run is called very early, so it cannot be hooked: the app will only be required (and thus go through Bundler.require) at the new_from_string stage. Also, since Puma::Launcher requires the configuration, we have everything needed super early.
Puma also supports graceful restarts for updates, for which it will fork and exec to a new, clean Ruby process, and for that it will need to set up Bundler again. This is done in Puma::Launcher#run by manipulating the RUBYOPT environment variable.
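The general idea of using RUBYOPT to get Bundler set up in a freshly exec'd process can be sketched as follows. This is not Puma's code: env_with_bundler_setup and BUNDLER_SETUP_FLAG are hypothetical names, and the sketch assumes the goal is simply to make sure -rbundler/setup appears once in the variable.

```ruby
# Flag asking Ruby to require bundler/setup before running anything else.
BUNDLER_SETUP_FLAG = '-rbundler/setup'

# Returns a copy of the environment whose RUBYOPT includes the flag once.
# Hypothetical helper for illustration; the input hash is left untouched.
def env_with_bundler_setup(env)
  updated = env.to_h.dup
  rubyopt = updated['RUBYOPT'].to_s
  unless rubyopt.split.include?(BUNDLER_SETUP_FLAG)
    updated['RUBYOPT'] = [rubyopt, BUNDLER_SETUP_FLAG].reject(&:empty?).join(' ')
  end
  updated
end

from_empty    = env_with_bundler_setup({})
from_existing = env_with_bundler_setup('RUBYOPT' => '-w')
```

A hash built this way could then be passed as the first argument to Kernel.exec, so the new process loads Bundler before any application code.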
Unicorn and Rainbows
Rainbows extends Unicorn, so their starting process is basically the same. Unicorn also embeds no Rack::Handler by itself, so it’s usually started by its unicorn command. There is also a unicorn_rails command, but it’s mostly inconsequential regarding startup with modern Rails versions.
Unicorn can only fork, by design. Its startup is superbly small.
So basically Unicorn reimplements Rack::Builder.parse_file, but otherwise nothing to see here, please move along.
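For readers who want a feel for what reimplementing this step involves, here is a toy version of the idea: evaluate rackup-style source inside a builder object and return the assembled app. It is a sketch under assumptions, with MiniBuilder, parse_string, and Upcase all hypothetical names; it is neither Rack's nor Unicorn's implementation.

```ruby
# Toy builder: collects middleware and an endpoint from rackup-style source.
class MiniBuilder
  def initialize
    @middlewares = []
    @app = nil
  end

  def use(middleware, *args)
    @middlewares << [middleware, args]
  end

  def run(app)
    @app = app
  end

  # Wraps the endpoint so the first middleware declared ends up outermost.
  def to_app
    @middlewares.reverse.inject(@app) do |inner, (middleware, args)|
      middleware.new(inner, *args)
    end
  end

  # Evaluates the source in the context of a fresh builder.
  def self.parse_string(source)
    builder = new
    builder.instance_eval(source)
    builder.to_app
  end
end

# Hypothetical middleware used only to show that wrapping happened.
class Upcase
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers, body.map(&:upcase)]
  end
end

rackup_source = <<~RUBY
  use Upcase
  run ->(env) { [200, {}, ['hello']] }
RUBY

app = MiniBuilder.parse_string(rackup_source)
response = app.call({})
```

The point that matters for the timeline is that the application code only runs when this evaluation happens, wherever the server chooses to do it.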
Passenger
Passenger has a particular behavior: the passenger start command is merely a remote control for a background process started completely independently, and runs tail -f on the log files to fake the output. A consequence of that is that you can’t use binding.pry since the Ruby part is not actually attached to a TTY, making interactive exploration a little bit more challenging.
This background process will start the application in all cases. There are two operation modes, called spawning methods: direct (i.e. forking without preloading) and smart (i.e. forking with preloading). This is how Passenger enables scaling, dynamically ramping processes up and down.
To this end, the Passenger background process will exec a detached Nginx as well as a Passenger Agent and a Passenger Watchdog. This design allows Passenger to present a single frontend port for all applications, independently of the language since Passenger supports more than just Ruby. The Passenger Agent then forks into a Passenger Core process, itself forking into the various applications.
At this stage, the spawning method matters and will produce either an AppLoader process in direct mode (which will load the app upon spawn, then run the app) or an AppPreloader one in smart mode (which will load the app, then fork and run the app).
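The difference between the two spawning methods comes down to whether the app is loaded before or after the fork. The sketch below illustrates that with plain fork and a pipe; load_app, in_child, spawn_direct, and spawn_smart are hypothetical names, and this is not how Passenger itself is implemented. It assumes a platform where fork is available.

```ruby
# Hypothetical stand-in for an expensive application load.
# It remembers which process performed the load.
def load_app
  { loaded_in: Process.pid }
end

# Runs the block in a forked child and returns what the child wrote back.
def in_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(yield.to_s)
    writer.close
    exit!(0)
  end
  writer.close
  result = reader.read
  reader.close
  Process.wait(pid)
  result
end

# Direct: fork first, then each child loads the app by itself.
def spawn_direct(count)
  Array.new(count) { in_child { load_app[:loaded_in] } }
end

# Smart: load once in the preloader, then fork children that reuse the result.
def spawn_smart(count)
  app = load_app
  Array.new(count) { in_child { app[:loaded_in] } }
end

direct_loaders = spawn_direct(2)
smart_loaders  = spawn_smart(2)
```

In direct mode every child reports that it did the loading itself, while in smart mode every child reports the parent as the loader, which is exactly why hooks have to account for where the app gets loaded.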
AppLoader is implemented in
AppPreloader is implemented in
In both cases, Passenger basically inlines the implementation of Rack::Builder.parse_file. Also, since the background process is completely independent of the remote control one, Passenger has to call Bundler’s setup, which it does through
Another peculiar point is that Passenger definitely monkey-patches
This comes into play when starting with rails server. This Rack::Handler has only one role: calling system('passenger start'). The consequence is that there is no ultimate difference between the three commands to start Passenger.
A cheerful conclusion
Armed with this detailed behavior, we were able to solve the issue at its root for all mentioned web servers, and more, including Thin and WEBrick. By hooking onto the Rack::Server#server accessor, we were able to reliably detect the web server to be run in a simple, generic way, and by hooking this web server’s Rack::Handler::<server>#run method, we were able to make special case decisions according to each one’s specific implementation and configuration in a simple, specific way for that web server. The detailed timeline of events guaranteed that those hooks and the Rack::Builder#to_app one will operate properly in every startup situation. We also improved our integration tests to cover the whole test matrix of web servers, configurations, and startup commands. We therefore properly patched our customers’ issues this time around, and hardened our Ruby agent against further startup issues down the line. And everyone lived happily ever after.
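As an illustration of the second kind of hook, here is a minimal sketch of intercepting a handler's run method, which is defined on the handler itself rather than on instances. FakeHandler is a hypothetical stand-in for a Rack::Handler::<server> module and RunHook is a made-up name; the real agent code is not shown here.

```ruby
# Hypothetical stand-in for a Rack::Handler::<server> module.
module FakeHandler
  def self.run(app, options = {})
    "serving #{app} on port #{options.fetch(:Port, 9292)}"
  end
end

# Hypothetical hook: inspects the options before the server starts.
module RunHook
  def self.calls
    @calls ||= []
  end

  def run(app, options = {})
    # This is where server-specific decisions could be made.
    RunHook.calls << options
    super
  end
end

# run is a singleton method, so the hook is prepended to the singleton class.
FakeHandler.singleton_class.prepend(RunHook)

result = FakeHandler.run(:app, Port: 3000)
```

Since each web server gets its own handler, a hook of this shape can stay small and specific to that server's options.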
The takeaway of this wondrous adventure into Ruby web servers is that even though Rack has a very simple design, the ecosystem around it can be surprisingly involved! The changes across versions of Rack, Rails, and each web server are quite limited, but the variety of implementations is a testament to the power of Rack’s design and the ingenuity of the community.
We hope you enjoyed this look at the journey we took into Ruby web server startup sequences in order to solve a critical customer issue. If this sounds like the kind of thing you want to work on and be a part of, we’re hiring!
The post Fixing a critical issue: a journey into Ruby web server startup sequences, part two appeared first on Sqreen Blog.