Egnyte Architecture: Lessons learned in building and scaling a multi-petabyte content platform


This is a guest post by Kalpesh Patel, an engineer who works for Egnyte from home. Egnyte is a Secure Content Platform built specifically for businesses. He and his colleagues spend their productive hours scaling large distributed file systems. You can reach him at @kpatelwork.


Your laptop has a filesystem used by hundreds of processes. It has a couple of downsides, though, if you want to use it to support tens of thousands of users working simultaneously on hundreds of millions of files containing petabytes of data. It is limited by disk space; it can’t expand storage elastically; it chokes if you run a few I/O-intensive processes or try collaborating with 100 other users. Take this problem and transform it into a cloud-native file system used by millions of paid users spread across the globe, and you get an idea of our roller coaster ride of scaling this system to meet monthly growth and SLA requirements while providing the stringent consistency and durability characteristics we have all come to expect from our laptops.

Egnyte is a secure Content Collaboration and Data Governance platform, founded in 2007, when Google Drive wasn’t born yet and AWS S3 was cost-prohibitive. Our only option was to roll up our sleeves and build basic cloud file system components, such as an object store, ourselves. Over time, costs for S3 and GCS became reasonable, and with Egnyte’s storage plugin architecture our customers can now bring in any storage backend of their choice. To help our customers manage the ongoing data explosion, we have designed many of the core components over the last few years. In this article, I will share the current architecture, some of the lessons we learned scaling it, and some of the things we are looking to improve in the near future.

Egnyte Connect Platform

Source: High Scalability