Building Personalisation at Scale

Building Personalisation at Scale: One user at a time and all at once!

15 billion user events a quarter and counting

In my previous post, I talked about how we have been building a technology-first company. Taking this further, I am back with the next level of innovation we have been working on to make our customers’ journey as satisfying as it is for us to serve them!

At Grofers, our marketing and product teams have been working hard at delivering user experiences and propositions that are valuable to our customers and at the same time viable for our business. Balancing the two is extremely delicate. One obvious way to achieve this is to personalise a user’s experience on and off the app through tailored content, recommendations, messages and notifications.

While our marketing and product teams had been doing this in a semi-automated way, we soon realised that in order to truly achieve personalisation at scale, we would have to solve two problems:

  1. Engineering: create an end-to-end integrated marketing technology (martech) or growth stack.
  2. Organisational: create a shared view of personalisation, with everyone speaking the same language when it comes to it.

The ways to personalise a user’s experience are endless. There is indeed an amazing diversity of customers in terms of needs, preferences, expectations and past experiences. Taking this diversity into account is essential to send relevant communications, deliver valuable experiences and improve customer retention.

In this blog post, we will focus on engineering aspects and try to share our experience on this journey, our martech pipeline and what the future looks like for us.

Tenets of Personalisation

For any organisation planning to embark on this journey, there are 5 key tenets to consider and strengthen before setting out to solve this problem.

  1. Data
  2. Analytics & ML
  3. Content Pipelines
  4. Automation
  5. Experimentation


Data

Very early in our startup journey we realised that data would be the most powerful weapon in our arsenal, right behind our people. What started as simply trying to understand what users were doing on the platform by tracking user actions as events soon ballooned into millions of events. We are right now at 11+ billion events a quarter and counting.

Our customer data was distributed across separate, disconnected systems, typically managed by different stakeholders. Marketing, Customer Success and Product all viewed customers differently (and rightly so), because each of them saw only one side of the customer. We soon realised that this was a huge problem: who the user was and what that user wanted looked very different depending on which team you talked to.

Single View of the Customer (SVC) is the first step to personalisation. Data consolidation using a customer data platform (CDP) becomes key for this.
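To make the idea of a single view concrete, here is a minimal sketch in plain Python. It is not how a CDP is implemented; the source names, field names and the `build_single_view` helper are all hypothetical, and it assumes every system already shares a common `user_id`.

```python
from collections import defaultdict


def build_single_view(*sources):
    """Merge per-team customer records into one profile per user_id.

    Each source is a (source_name, records) pair; all names are illustrative.
    """
    profiles = defaultdict(dict)
    for source_name, records in sources:
        for record in records:
            user_id = record["user_id"]
            for key, value in record.items():
                if key == "user_id":
                    continue
                # Prefix each field with its source so one team's view
                # never silently overwrites another's.
                profiles[user_id][f"{source_name}.{key}"] = value
    return dict(profiles)


# Hypothetical records held by three different teams.
marketing = [{"user_id": "u1", "last_campaign": "monsoon_sale"}]
support = [
    {"user_id": "u1", "open_tickets": 2},
    {"user_id": "u2", "open_tickets": 0},
]
product = [{"user_id": "u2", "favourite_category": "dairy"}]

single_view = build_single_view(
    ("marketing", marketing), ("support", support), ("product", product)
)
```

In practice the hard part is identity resolution (agreeing on what `user_id` is across systems), which this sketch takes for granted.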

In 2019, we undertook a long, hard journey to rebuild our app analytics platforms and put a robust CDP into our data ecosystem. We have been using it to collect, unify and manage all user-generated event data. The CDP becomes the source of user information for all our other systems, which use this information to personalise user experiences. Typical use cases include creating segments or audiences of users based on their traits or properties, and curating on- and off-app journeys for an immersive shopping experience.
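Segment creation from traits can be sketched as a simple filter over user profiles. This is an illustration only: the trait names (`city`, `orders_last_30d`) and the `build_segment` helper are assumptions, not the API of any particular CDP.

```python
def build_segment(profiles, predicate):
    """Return the sorted user ids whose traits satisfy the predicate."""
    return sorted(uid for uid, traits in profiles.items() if predicate(traits))


# Hypothetical user traits as they might be stored against each profile.
profiles = {
    "u1": {"city": "Delhi", "orders_last_30d": 5},
    "u2": {"city": "Mumbai", "orders_last_30d": 0},
    "u3": {"city": "Delhi", "orders_last_30d": 0},
}

# Audience: users in Delhi who have not ordered in the last 30 days.
lapsed_delhi = build_segment(
    profiles,
    lambda t: t.get("city") == "Delhi" and t.get("orders_last_30d", 0) == 0,
)
```

An audience like `lapsed_delhi` is what would then flow into downstream campaign or re-engagement systems.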

Along with our customer data, which now lives in the CDP, we have a lot of other data sources that augment our view of the user, the order and the overall end-to-end order fulfilment experience, such as NPS data from vendors, Customer Center data, source data from production databases and so on.

Once the CDP was done, the next big problem to solve was bringing all of the data from every possible source into one place. This is a fairly common problem across all organisations that deal with a huge number of customers. While having vendors makes disseminating the data and using the information very easy, having different tools like CDPs and CRMs further adds to the problem when it comes to any kind of analysis.

Welcome, Source Replication Pipeline! Our in-house replication pipeline brings all of this supermassive data (source databases, events and other systems) into one place: our Hudi data lake and our Amazon Redshift warehouse.

Analytics & ML

Having data in one single source is extremely powerful, but unless you can use that data to draw actionable insights and patterns, it doesn’t serve much purpose. In order to utilise the abundance of data we have, we built equally strong analytical tools and machine learning models to act on it. Our data science teams use the consolidated data to create insights which are used to power different products.

Models are built using either SparkML or TensorFlow and deployed to production through automated pipelines. ML jobs identify product associations as well as the buying patterns of our customers, and recommend products to them accordingly. Some of the most-used widgets on the main app feed, like Recommend For You, are powered by these jobs.

The best part of the process: aggregated data from these jobs is sent back via reverse ETL, cascading into the systems that put together on- and off-app experiences for our customers. The beautiful circle of constant feedback and learning is thus complete!
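As a toy illustration of what "product associations" means, the sketch below counts how often two products appear in the same order and recommends the most frequent co-purchases. It deliberately swaps in plain Python co-occurrence counting in place of the SparkML or TensorFlow models mentioned above; the order data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(orders):
    """Count how many orders contain each unordered pair of products."""
    counts = Counter()
    for order in orders:
        # De-duplicate and sort so each pair has one canonical form.
        for a, b in combinations(sorted(set(order)), 2):
            counts[(a, b)] += 1
    return counts


def recommend(product, counts, top_n=3):
    """Return up to top_n products most often bought with `product`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    # Highest count first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical order history.
orders = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
counts = co_purchase_counts(orders)
milk_recs = recommend("milk", counts)
```

Real association mining would also normalise for product popularity (for example with lift or confidence), which raw counts ignore.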


Content Pipelines

The next, and more or less obvious, tenet in the personalisation journey is pipelines.

An easy way to explain pipelines: they are the plumbing required to move data and insights from sources to destinations.

For our marketing and product teams, the source is the CDP, and our job was to build pipelines that disseminate data into the systems which put together experiences for the end user. We power our promotions engine, our content management system, and our campaign and re-engagement systems through these pipelines.

Pipelines can be built in different ways, ranging from out-of-the-box vendor integrations to custom development through data dumps and API integrations, to name a few. Teams can easily flow the audiences created in the CDP into these systems and ensure all communication is personalised.

While we have built source-to-destination pipelines which ensure all systems see the exact same user data, the next step for us is building interconnected pipelines.

More about our pipelines has been explained in depth here


Automation

The penultimate step in scaling personalisation is the automated production of experiences based on data-informed content. Our content and experience production systems are currently in a semi-automated state. Our marketing and product teams use the templates, layouts and widgets to personalise the content for our users based on the data we have. While this works well at a small scale, the scale that we aspire to now requires automated content production, short real-time feedback cycles and run-time experience optimisation.

We currently automate content eviction and sorting based on user preference, impressions and click-through rates at different points in the user’s buying journey. This needs to evolve into end-to-end journey optimisation next.
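A minimal sketch of eviction and sorting by click-through rate might look like the following. The thresholds, widget names and the `rank_widgets` helper are assumptions for illustration, and the sketch ignores user preference, which a real system would also weigh in.

```python
def rank_widgets(stats, min_impressions=100, min_ctr=0.01):
    """Sort widgets by click-through rate and evict proven poor performers.

    `stats` maps widget name -> (impressions, clicks). A widget is evicted
    only once it has enough impressions for its CTR to be meaningful.
    """
    ranked = []
    for name, (impressions, clicks) in stats.items():
        ctr = clicks / impressions if impressions else 0.0
        if impressions >= min_impressions and ctr < min_ctr:
            continue  # evicted: enough exposure, too few clicks
        ranked.append((name, ctr))
    # Highest CTR first; ties broken by name for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked]


# Hypothetical per-widget counters: (impressions, clicks).
stats = {
    "deals": (1000, 80),
    "new_arrivals": (1000, 5),
    "reorder": (500, 60),
    "fresh_banner": (20, 0),
}
feed_order = rank_widgets(stats)
```

Here `new_arrivals` is dropped for a low click-through rate, while `fresh_banner` stays because it has not yet been shown often enough to judge.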


Experimentation

The final and most important step in this personalisation journey is experimentation. For Grofers, it means rapid learning. What we learn from our data, the patterns we see and the insights we get always lead to an experiment. We experiment a lot and run multiple A/B tests, at times many in a single day. While this helps us learn a lot about our customers and what they want and like, this process is semi-automated right now.

The final step in scaling our personalisation journey is building an experimentation platform which empowers our teams to experiment, and to measure and visualise the impact on metrics easily.
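Two building blocks that most experimentation platforms share are stable variant assignment and a way to compare outcomes. The sketch below shows one common approach, hash-based bucketing plus a relative lift calculation; it is an assumption about how such a platform could work, not a description of ours, and all names are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to a variant for a given experiment."""
    # Hashing experiment and user together keeps assignments stable per user
    # but independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def conversion_rate(conversions, users):
    """Fraction of users who converted; 0.0 when there are no users."""
    return conversions / users if users else 0.0


def relative_lift(control_rate, treatment_rate):
    """Relative change of treatment over control, e.g. 0.2 means +20%."""
    return (treatment_rate - control_rate) / control_rate


# Hypothetical experiment results.
variant = assign_variant("u1", "homepage_widget_order")
control = conversion_rate(100, 1000)
treatment = conversion_rate(120, 1000)
lift = relative_lift(control, treatment)
```

A lift figure on its own says nothing about statistical significance, so a real platform would pair it with a significance test before declaring a winner.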

Bringing it all together

All these tenets form the foundation for a strong, scalable personalisation platform. While we experiment with many plug-and-play personalisation services (the build vs buy game) along with our ML models, we are now building our own personalisation service, which utilises all of this groundwork and these connected pipelines to curate cross-journey experiences for our customers. Stay tuned for learnings as they come.

Meanwhile here is a little something around our systems, pipelines and channels to munch on while we come back with more!

Figure: Data Sources, Pipelines and Channels

Devika Razdan is Director of Product Engineering at Grofers. Follow her on Twitter.

I want to thank Satyam Krishna for partnering with me on this blog and being the kickass Data Engineering leader that he is!

Satyam Krishna is Engineering Manager in Data Engineering at Grofers. Follow him on Twitter.

Thanks for reading Lambda.

Say hello on Twitter or follow us on LinkedIn.

We’re hiring!

We are hiring across various roles! If you are interested in exploring working at Grofers, we’d love to hear from you. You can either apply on LinkedIn or directly reach out to the author on Twitter or LinkedIn.

Building Personalisation at Scale was originally published in Lambda on Medium, where people are continuing the conversation by highlighting and responding to this story.

Source: Grofers
