Data Science at OLX

Improving customer experience for millions of OLX users

Data science is an important topic for OLX. It helps to grow our business, makes our customers happier, and makes the user experience better and safer. We started to use it more than five years ago, and now more than 40 machine learning services are running in production and positively affecting our customers.

In this article, we’ll talk about different areas where we use data science at OLX. We’ll also cover some of these areas in more detail:

  • Moderation (trust and safety)
  • Search and recommendations
  • Seller experience

Data science at OLX

OLX is a two-sided marketplace: on one side we have sellers, and on the other — buyers. We want to make it easier for them to find each other, and to ensure that the interaction between them is safe and convenient.

Machine learning plays an important role here. We use it across the organization in many strategically important areas:

  • Trust and safety to make the interaction between our users safe
  • Search and recommendations to help our buyers find what they are looking for
  • Seller experience to make it easier to sell on OLX
  • Verticals to cater to the specific needs of our users in cars, real estate and jobs categories
  • Monetization to make it more profitable for our users and for us

We use it in other areas as well.

More than half of the models we use in production are in two areas: trust and safety, and search and recommendations.

To identify projects and areas where we want to invest our time, we consider the following factors:

  • Strategic importance. We want to prepare the stage for what will be important tomorrow. For example, online payments and deliveries are important areas, so we want to work closely with these departments.
  • Estimated business impact. If we know that some applications were successful for other companies in the same or related industry, we also want to evaluate them.
  • Availability of data. Without good data, we won’t be able to have good models. But we also want to set the ML flywheel in motion for as many use cases as possible: we use data to create ML solutions that solve real user problems, which attracts more users to our products, and those users generate more data to train even better models. One way of doing that in practice is to start with use cases where “good enough” accuracy is already acceptable.

In the next section, we’ll talk about the major areas where we use data science. The first applications of machine learning at OLX were in moderation, so we’ll start with it.

Trust and safety: moderation

Every day millions of new ads are created at OLX. Unfortunately, some of this content is offensive or harmful, and we cannot allow it to go live. To stop it before anyone sees it, we have a moderation system. We heavily rely on machine learning to help us identify these problematic listings and remove them before they do any harm.

As a part of the moderation system, we have many models, including:

  • NSFW model — to detect explicit nudity
  • Forbidden items model — to detect weapons and other items that are forbidden to sell
  • Duplicate detection system — to fight duplicated listings
  • Chat moderation system — to make sure the communication between buyers and sellers is safe and pleasant
  • Fraud detection model — to detect rings of fraudsters and ban them

Our automatic moderation system contains many components with machine learning models inside.

The moderation system is quite complex, and it includes many components. You can read more about moderation in the article about our duplicate detection system.

Among the most recent projects in moderation, our fraud detection system is one of the most impactful. It detects organized crime rings that perform fraudulent activities at scale. You can learn about it here: detecting fraud rings with unsupervised learning.

Moderation is not the only area at OLX where data science makes a significant impact. Next, we’ll talk about two other areas where it makes a difference: search and recommendations.

Search and recommendations

At the moment, there are 19 million active listings at Even with the best possible categorization of items, it would still be difficult to find interesting items without a good search engine. Search is a very important part of our product.

Search on

Each buyer at OLX is unique and has their own interests. To make our search more effective, we need to tailor it to each user individually and take into account their preferences. That’s what our data scientists do in the search team. We call this project “personalized ranking”.

There are more projects where we use data science to improve our search:

  • Search2vec: reducing the number of null searches. For queries that return no results, we want to show something relevant from similar queries
  • Query categorization: understanding the intent of the user and showing a category
  • Spell checking: dealing with typos
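
To make the null-search idea more concrete, here is a minimal sketch of falling back to a similar query. It assumes that queries are represented as vectors and compared with cosine similarity, which the name Search2vec suggests but this article does not describe. The vectors, the query strings and the function names are made up for illustration and do not come from our production system.

```python
import math

# Toy 3-dimensional query embeddings for queries that are known to return
# results. The numbers are invented for this example.
QUERY_VECTORS = {
    "iphone 12": [0.9, 0.1, 0.0],
    "samsung galaxy": [0.8, 0.3, 0.1],
    "mountain bike": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fallback_query(null_query_vector, candidates=QUERY_VECTORS):
    """Return the known query whose vector is closest to the null query."""
    return max(candidates, key=lambda q: cosine(null_query_vector, candidates[q]))


# A query with no results whose (hypothetical) embedding is close to "iphone 12".
print(fallback_query([0.85, 0.15, 0.0]))
```

With real data, the exhaustive comparison above would typically be replaced by an approximate nearest-neighbour index, since comparing against every known query does not scale.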

You can read more about our models for search here.

In addition to search, there’s another way to make the experience of buyers more personalized: a recommendation system. If you want to know more about our recommender service for jobs, check here.

Search and recommendations help our buyers discover items they will love. But the experience of our sellers is also very important for us, and that’s another area where machine learning helps.

Seller experience

We want to make it easy for our users to sell on OLX. To simplify the process of creating a listing, we use the category prediction model: it looks at the title of a listing and determines the best category.
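
As an illustration of predicting a category from a title, below is a minimal sketch that uses a multinomial naive Bayes classifier with add-one smoothing. This is a simple stand-in chosen for readability, not the model we run in production; the training titles, the category names and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy labeled titles, invented for this example.
TRAINING = [
    ("iphone 12 64gb black", "electronics"),
    ("samsung tv 55 inch", "electronics"),
    ("wooden dining table", "furniture"),
    ("leather sofa three seats", "furniture"),
    ("mountain bike 26 inch", "sports"),
    ("road bike carbon frame", "sports"),
]


def train(examples):
    """Count words per category and listings per category."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocabulary = set()
    for title, category in examples:
        tokens = title.lower().split()
        word_counts[category].update(tokens)
        class_counts[category] += 1
        vocabulary.update(tokens)
    return word_counts, class_counts, vocabulary


def predict(title, model):
    """Return the category with the highest log-probability for the title."""
    word_counts, class_counts, vocabulary = model
    total = sum(class_counts.values())
    best_category, best_score = None, -math.inf
    for category, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denominator = sum(word_counts[category].values()) + len(vocabulary)
        for token in title.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[category][token] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


model = train(TRAINING)
print(predict("black leather sofa", model))
```

In the listing flow, the predicted category would be offered as a suggestion that the seller can accept or change.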

We also want to help our sellers create great ads. For that, we analyze each listing and determine if there are any areas for improvement. We look at:

  • The number of images it has. The more images, the better.
  • The quality of each image. If images are blurry or too dark, it makes the listing less attractive.
  • The length of the title and the description. We want to make sure that they contain enough information.

If we think that a listing could be improved, we give actionable recommendations: add more images, replace blurry images, and other suggestions.
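
The checks above can be sketched as a small set of rules. In this sketch, the sharpness and brightness scores are assumed to come from an image quality model, and every threshold is a hypothetical value picked for illustration rather than a setting we actually use.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical thresholds, chosen only for this example.
MIN_IMAGES = 3
MIN_SHARPNESS = 0.5
MIN_BRIGHTNESS = 0.3
MIN_TITLE_CHARS = 15
MIN_DESCRIPTION_CHARS = 80


@dataclass
class ImageQuality:
    sharpness: float   # 0..1, assumed output of an image quality model
    brightness: float  # 0..1, assumed output of an image quality model


@dataclass
class Listing:
    title: str
    description: str
    images: List[ImageQuality] = field(default_factory=list)


def recommend_improvements(listing: Listing) -> List[str]:
    """Return actionable tips for a listing, or an empty list if it looks fine."""
    tips = []
    if len(listing.images) < MIN_IMAGES:
        tips.append("add more images")
    if any(img.sharpness < MIN_SHARPNESS for img in listing.images):
        tips.append("replace blurry images")
    if any(img.brightness < MIN_BRIGHTNESS for img in listing.images):
        tips.append("replace dark images")
    if len(listing.title) < MIN_TITLE_CHARS:
        tips.append("write a longer title")
    if len(listing.description) < MIN_DESCRIPTION_CHARS:
        tips.append("add more details to the description")
    return tips


weak = Listing("Bike", "Good bike", [ImageQuality(sharpness=0.2, brightness=0.8)])
print(recommend_improvements(weak))
```

Keeping the rules separate from the models that score the images makes it possible to tune the thresholds without retraining anything.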

You can read more about image quality models in our article about infrastructure for serving deep learning models.

We also help sellers of by suggesting the optimal price. Determining the right price is quite complex: users need to do a lot of research and analysis to come up with the right price. We make it easier for them with a machine learning model.

The output of our price evaluation model on
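
To give a feel for what a price suggestion looks like, here is a deliberately simple baseline: it takes the prices of comparable listings and returns the range between the first and third quartiles. This is a heuristic written for this article, not the machine learning model mentioned above, and the prices and the function name are hypothetical.

```python
import statistics
from typing import List, Tuple


def suggest_price_range(comparable_prices: List[float]) -> Tuple[float, float]:
    """Return the (first quartile, third quartile) of comparable prices."""
    if len(comparable_prices) < 2:
        raise ValueError("need at least two comparable listings")
    # quantiles(..., n=4) returns the three cut points that split the data
    # into four groups: first quartile, median, third quartile.
    first_quartile, _median, third_quartile = statistics.quantiles(
        comparable_prices, n=4
    )
    return first_quartile, third_quartile


# Invented prices of similar cars (same make, model and a similar year).
low, high = suggest_price_range([9000, 10000, 11000, 12000, 13000])
print(low, high)
```

A learned model can go further than this baseline by accounting for mileage, equipment and condition instead of relying only on a hand-picked set of comparables.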

There are many other areas where we use machine learning to improve the experience of our users. To learn more about the work we do at OLX, check other posts from our engineering blog.


Summary

  • At OLX, we use data science extensively: we started using it more than five years ago and now we have more than 40 machine learning services in production.
  • To identify potential machine learning projects, we take into account three aspects: strategic importance, estimated business impact, and availability of data.
  • In moderation, we use machine learning for detecting explicit nudity, fraud, forbidden items, duplicates, and spam. Also, we use it for chat moderation.
  • Search and recommendations help buyers find what they want. We do it by making search more personalized, reducing the number of queries with no results, and suggesting items similar to what you like.
  • To make it easier to sell at OLX, we suggest the best category for a new ad and analyze its quality. We also help the sellers at to find the best price range for their cars.

If you liked what you read and want to become a part of our team — check our open positions. Find out more at

This post is based on a presentation “Data Science at OLX”

This article is written by Alexey Grigorev and Andreas Merentitis.

Data Science at OLX was originally published in OLX Group Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.

Source: OLX
