Extracting data from the Internet with Scrapy

While exposing data to developers through APIs is becoming more common, most of the data found on the Internet is still only available as raw HTML, often buried in seemingly chaotic markup. This talk is a quick introduction for data scientists on how to politely extract data from a website and store it in a structured database with the Python library Scrapy, and on how one might extend Scrapy to fit specific needs.

Israël Hallé

Flare Systems Inc.

Israël Hallé has a B.Eng. from the École de Technologie Supérieure (E.T.S.). He worked as a developer on the Merchant Protection and Checkout teams at Shopify. He also did contract work in malware analysis and reverse engineering for Google's Safe Browsing team. He now works full-time on the technology that powers Flare Systems. Israël has organized exploitation workshops at E.T.S. and at the NorthSec conference.