Instagram plays a critical part in forming meaningful communities where people can connect with each other and share what matters most to them. To facilitate these connections, we craft our app with high-quality sharing experiences that we can take pride in. One way we improve the Instagram experience is by improving audio quality.
Instagram’s Music Sticker song suggestions for the pop music genre
What is Audio Quality?
Audio quality is a measure of how closely the audio we deliver to Instagram apps matches the original uncompressed audio file. Instagram delivers compressed audio to enable smooth video playback with fewer stalls caused by rebuffers.
However, in exchange for smoother playback, compression introduces the risk of artifacts. Some examples of compression artifacts are reduced clarity in high frequency sounds, weaker bass, and noise. These differences collectively lower the audio quality perceived by listeners.
Improving Audio Quality
Instagram’s video system has access to multiple levers that affect audio quality. The audio codec selection, sample rate, and bitrate all contribute to the quality of the audio encoding.
Different audio codecs apply different levels of lossy compression, and they perform differently on different types of content. With the scale and range of Instagram’s content, it’s important to rigorously evaluate which codecs best fit the content and to put metrics in place to track audio quality. Rather than spending significant engineering time building an audio quality metric up front, we pursued the simplest solution first and aimed to demonstrate, via existing engagement metrics, that Instagram listeners care about audio quality. Changing the audio codec was not the simplest solution, so we kept AAC as our audio codec for the audio quality improvement experiment.
Sample rate affects the upper bound of frequencies that our audio encodings can represent correctly. The Nyquist-Shannon Sampling Theorem says that: “A band limited continuous-time signal can be sampled and perfectly reconstructed from its samples if the waveform is sampled over twice as fast as its highest frequency component.” Instagram uses an industry standard 44.1kHz sample rate, which can represent frequencies up to 22.05kHz. That is more than enough to convey the 20kHz max that most people can hear, so we ruled out sample rate as a variable worth changing.
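The sampling theorem check above can be written out as a quick calculation. This is a minimal sketch; the function names are illustrative and not part of Instagram’s system.

```python
def nyquist_frequency_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent without aliasing."""
    return sample_rate_hz / 2.0


def can_represent(sample_rate_hz: float, highest_frequency_hz: float) -> bool:
    """True when the signal is sampled more than twice as fast as its highest frequency."""
    return sample_rate_hz > 2.0 * highest_frequency_hz


SAMPLE_RATE_HZ = 44_100    # sample rate stated in the text
HEARING_LIMIT_HZ = 20_000  # approximate upper limit of human hearing

print(nyquist_frequency_hz(SAMPLE_RATE_HZ))             # 22050.0
print(can_represent(SAMPLE_RATE_HZ, HEARING_LIMIT_HZ))  # True
```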
Bitrate, measured in kilobits per second (kbps), is the number of bits used to encode each second of audio, so for a given duration the size of the audio file scales linearly with bitrate. In other words, a higher bitrate means more data and less compression in the audio encoding. This allows the compressed audio encoding to retain more features of the original audio file with fewer compression artifacts. When the bitrate is too low, the encoder removes audio details that it considers less important. Since we kept the audio codec and sample rate constant, and bitrate was simple to change, we chose to vary the bitrate in our audio quality improvement experiment.
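To make the relationship between bitrate and data size concrete, here is a rough sketch. It assumes a constant bitrate and ignores container overhead, and the 15-second clip length is a hypothetical value chosen only for illustration.

```python
def encoded_audio_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate size of a constant-bitrate audio stream, ignoring container overhead."""
    bits = bitrate_kbps * 1000 * duration_s  # 1 kbps = 1000 bits per second
    return bits / 8                          # 8 bits per byte


CLIP_DURATION_S = 15  # hypothetical clip length, not a figure from the article

for bitrate_kbps in (64, 128):
    size_kb = encoded_audio_bytes(bitrate_kbps, CLIP_DURATION_S) / 1000
    print(f"{bitrate_kbps} kbps for {CLIP_DURATION_S} s is about {size_kb:.0f} kB")
```

Doubling the bitrate doubles the amount of audio data that has to be delivered for the same clip.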
The Bitrate Experiment
Prior to our audio quality improvement efforts, Instagram’s default bitrate for audio in videos was 64kbps. The microphone on a phone doesn’t produce a rich audio signal, so despite the low bitrate, Instagram’s audio compression performed well for most content. However, as Instagram creators started posting studio-produced audio content (e.g. music recordings), it became clear that 64kbps was not sufficient for delivering high quality audio.
We received reports that Instagram’s audio sounded “blown out” or too low quality for artists to want to share certain songs on Instagram. When we tested the Instagram app, we observed common compression artifacts. For example, in Instagram’s Music Sticker Stories, we noticed that snare drums, cymbals, voice, and reverb sounded drier and thinner in the compressed audio than they did in the original recordings.
Unfortunately, we can’t simply increase the audio bitrate for all content. Overall bandwidth is limited and has to be split between audio and video, so this is a zero-sum game. High quality video has a bitrate so high that the difference between 64kbps and 128kbps audio has a negligible impact on playback rebuffers. However, in low bandwidth situations we serve video at much lower bitrates. In these situations, a difference of 64kbps can be substantial in the playback experience.
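The zero-sum tradeoff can be illustrated with a small calculation. This is only a sketch: the total bandwidth figures (3000kbps and 300kbps) are hypothetical values chosen for illustration, not numbers from Instagram’s delivery system, and the function names are made up for this example.

```python
def video_budget_kbps(total_kbps: float, audio_kbps: float) -> float:
    """Bandwidth left for video once audio has taken its share of a fixed total."""
    return max(total_kbps - audio_kbps, 0.0)


def video_budget_loss_percent(total_kbps: float, old_audio_kbps: float,
                              new_audio_kbps: float) -> float:
    """Percent of the video budget given up when the audio bitrate is raised."""
    before = video_budget_kbps(total_kbps, old_audio_kbps)
    after = video_budget_kbps(total_kbps, new_audio_kbps)
    return 100.0 * (before - after) / before


# Hypothetical total bandwidth values, for illustration only.
for total_kbps in (3000, 300):
    loss = video_budget_loss_percent(total_kbps, 64, 128)
    print(f"total {total_kbps} kbps: video budget shrinks by {loss:.1f}%")
```

Under these assumed totals, moving from 64kbps to 128kbps audio costs the video stream about 2% of its budget on the fast connection and about 27% on the slow one, which is why the same change can be negligible in one case and substantial in the other.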
While we can increase the audio bitrate, we must weigh the tradeoffs between audio quality and video quality. Increasing this bitrate for all content is particularly risky, since we know that most content has simple audio and will not benefit from the audio side of the tradeoff. In our experiment, we aimed to make the right quality tradeoff for the right content.
Content and Community Specific Quality Preferences
To find the strongest signal on Instagram listeners’ preferences for audio quality, we considered ways to focus our audio quality improvements. From our previous experiments on visual quality, we knew that quality of experience is subjective and unique to content type and community type.
Audio quality sensitivity depends on each listener’s attention to audio details and the quality of the playback speaker (e.g. the device’s default external speaker or headphones). We worried that some Instagram listeners with low-end mobile phone speakers may not focus on general audio quality. Musicians, on the other hand, know Instagram as a platform where they can create music communities, so we suspected that many Instagram listeners would be sensitive to music audio quality.
We expected to see the strongest correlations between audio quality and engagement in Instagram’s music content where the audio frequency range is wide and full. To obtain this signal, we ran a targeted audio quality improvement test on the product where we expected audio quality to make the biggest impact: Music Sticker Stories.
A music sticker that plays a song by Relient K
Music Sticker Stories Experiment
To avoid diluted results from non-music content, we leveraged Instagram’s video and audio encoding tag system to zoom in on Stories audio encodings in the A/B test. All audio encodings in the control group used our default 64kbps bitrate. We ran two test groups: one group where the audio encodings used a 96kbps bitrate and another group where the audio encodings used a 128kbps bitrate.
In the experiment results, we saw clear engagement wins from improved audio quality in Music Sticker Stories. The 128kbps test group delivered the best results. We measure video engagement by watch time (i.e., time spent watching videos) and view percent (i.e., the percentage of a video a viewer finishes watching). Both watch time and view percent improved despite regressions in visual quality and rebuffers.
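The two engagement metrics defined above can be sketched as follows. This is an illustrative computation rather than Instagram’s metrics pipeline: the `View` record, the averaging of view percent across views, the clamping at 100%, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class View:
    """One playback of a video by one viewer (hypothetical record shape)."""
    watched_s: float         # seconds the viewer spent watching
    video_duration_s: float  # total length of the video in seconds


def watch_time_s(views: list[View]) -> float:
    """Total time spent watching videos."""
    return sum(v.watched_s for v in views)


def mean_view_percent(views: list[View]) -> float:
    """Average percentage of each video that viewers watched, capped at 100%."""
    if not views:
        return 0.0
    percents = [100.0 * min(v.watched_s / v.video_duration_s, 1.0) for v in views]
    return sum(percents) / len(percents)


# Made-up sample data: one full view, one half view, one quarter view.
views = [View(15.0, 15.0), View(7.5, 15.0), View(3.0, 12.0)]
print(watch_time_s(views))                 # 25.5
print(round(mean_view_percent(views), 1))  # 58.3
```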
We expected the regressions in visual quality and rebuffers because we shifted our bandwidth usage from video to audio. However, the engagement metric wins exceeded our expectations. These metrics demonstrated that Instagram viewers are more willing to watch complete Music Sticker Stories videos even with playback performance regressions because the audio quality is better.
Increasing the audio bitrate for Music Sticker Stories is only the beginning of delivering a personalized video quality of experience to the Instagram community. To help us make the right tradeoffs between audio quality, visual quality, and smooth playback, we are considering building bandwidth-aware audio ABR (i.e., adaptive bitrate) and content identification (i.e., identifying which video content has music).
Many thanks to my great team members: Donald Chen, Haixia Shi, Chris Ellsworth, Bill Phillips, Mackenzie Pearson, who helped to make this happen.
Donald Chen (Android) and Chris Hsu (Server) are software engineers on the Instagram Media Infrastructure team.