Zyte Developers Community newsletter issue #3

Hi there,

If you have not yet signed up for the Zyte Developers Community newsletter, you can sign up here.

In this issue:

  • Scrapy 2.5.0 is out
  • Recipe scraping app (with source code)
  • Web scraping in Elixir
  • Easy table scraping with R

Scrapy 2.5.0 is out

The first new Scrapy release of the year is here!

Highlights:
– Official Python 3.9 support
– Experimental HTTP/2 support
– New get_retry_request() function to retry requests from spider callbacks
– New headers_received signal that allows stopping downloads early
– New Response.protocol attribute

Release notes here.

Recipe scraping app

As part of the #100DaysOfCode challenge, @mango_mero created a Django demo app that scrapes recipe information in real time using Beautiful Soup. The source code is available on GitHub.

Web scraping in Elixir

If you use Elixir for web development and are considering a web scraping project, take a look at Crawly, a high-level web crawling and scraping framework for Elixir. Check out the documentation and the quickstart guide.

Easy table scraping with R

Extracting data from HTML tables can be messy. For one-off jobs, though, there’s an easy alternative. If you’re using RStudio, there’s an addin that makes it easy to scrape tables: datapasta. You just copy the table from the page, paste it into the tool, and you get the data in structured form. Here’s a tutorial video.
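For readers who work in Python rather than R, the copy-and-paste idea can be sketched with the standard library alone. This is not how datapasta is implemented. It assumes the copied table arrives as tab-separated text with a header row, which is common when copying from a browser but not guaranteed. The function name and the sample data are made up for illustration.

```python
import csv
import io


def parse_pasted_table(text):
    """Turn tab-separated text (header row first) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text.strip("\n")), delimiter="\t")
    return [dict(row) for row in reader]


# Hypothetical clipboard contents: two columns, two data rows.
pasted = "name\tlanguage\nScrapy\tPython\nCrawly\tElixir\n"
rows = parse_pasted_table(pasted)
print(rows)
```

Each row comes back as a dictionary keyed by the header cells, so the result can be handed straight to whatever does the analysis.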

Source: Scrapinghub
