Dateparser: A small but powerful date parsing library

Six years ago, Zyte (formerly Scrapinghub) released Dateparser, an open source library that parses human-readable dates. In October 2020 we released version 1.0.0, a very important milestone. In this article, I’d like to introduce the library and share some insights into why it’s so popular.

Dateparser was developed to make date extraction from HTML pages easier. Initially it was used only by web scraping developers, but the wider community adopted it quickly. It has since been used in many kinds of applications, such as command-line tools and chatbots.

Key features of Dateparser

  • Support for almost every existing date format: absolute dates, relative dates (“two weeks ago” or “tomorrow”), timestamps, etc.
  • Support for more than 200 language locales.
  • Language autodetection.
  • Customizable behavior through settings.
  • Support for non-Gregorian calendar systems.
  • Support for dates with timezone abbreviations or UTC offsets (“August 14, 2015 EST”, “21 July 2013 10:15 pm +0500”…)
  • Search for dates within longer texts.

Read more about Dateparser on GitHub.

Dateparser usage in other projects

More than 5.7K projects on GitHub list Dateparser as a dependency.

According to PyPI Download Stats, last month Dateparser was downloaded 1.65M times. That’s 2200+ downloads every hour!

Dateparser also has 1.8K stars on GitHub, and this number keeps growing.

Why Dateparser is so popular

No need to reinvent the wheel

When you are building software, parsing date strings can become difficult and tedious: it is very hard to foresee every case and build something that works efficiently. Rather than reinventing the wheel, it makes sense to look for an existing solution, and this is where Dateparser comes in handy: it can convert dates in all sorts of formats and locales to `datetime` objects.

It would take a lot of time to create something so powerful from scratch. Instead, all you need to do is import Dateparser and use it.

Simple interface

Using Dateparser is as simple as importing it and calling the `parse()` function:

>>> import dateparser
>>> dateparser.parse('20/11/2021')
datetime.datetime(2021, 11, 20, 0, 0)

Of course, you can do a lot more than this. But you can start using it without any heavy setup and without reading a lot of documentation.

Customization

Although the interface is really simple, you can handle a lot of nuances through the settings. You can pick a reference date, select the preferred date order, adjust the timezone configuration, decide how to handle incomplete dates, and more.

Is there something you want to achieve? Take a look at the settings and you will probably find what you were looking for.

Test coverage

Good code comes with automated tests that provide good code coverage. Dateparser has a code coverage of >98%, with examples in many languages.

Other reasons

  1. Few dependencies: Installing a package should never be a pain. Keeping a small set of well-maintained dependencies helps improve the user experience.
  2. Accessible (and complete) documentation: If you need to achieve anything more complex than the common examples, or you need to understand something in depth, you only need to visit: https://dateparser.readthedocs.io/en/latest/
  3. Support: Is something not working? Are you looking for a new feature? You can open an issue in the GitHub issue tracker and get support directly from the maintainers.

Building and maintaining open source projects is in our DNA, and we are really proud to have seen this little project grow over the years!


Source: Scrapinghub
