Distributed Crawling Engineer at Priceonomics

We look through tons of data to build price reports and want you to help us look through even more.

We index 2 million pages per day and use the results to help inform smart purchase decisions. We're looking for someone who wants to own this process, scale it up, and make it harder better faster stronger.

Responsibilities
Manage our existing distributed web crawling infrastructure
Monitor performance using statsd and visualize using graphite
Discover and eliminate bottlenecks to make the system go as fast as possible
Build web crawlers to discover and index websites
Deploy crawlers across many (20+) servers in an automated fashion
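The monitoring responsibility above centers on statsd, which speaks a simple text protocol over UDP. As a rough illustration (not Priceonomics' actual code; the `StatsdEmitter` class and metric names are made up for this sketch), a crawler can fire counters and timers like so using only the standard library:

```python
import socket
import time
from contextlib import contextmanager

def format_metric(name, value, metric_type):
    """Render a metric in the statsd wire format, e.g. 'crawler.pages:1|c'."""
    return f"{name}:{value}|{metric_type}"

class StatsdEmitter:
    """Fire-and-forget UDP emitter speaking the statsd text protocol.

    Hypothetical sketch: real deployments typically use the `statsd`
    client library rather than raw sockets.
    """

    def __init__(self, host="localhost", port=8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def incr(self, name, count=1):
        # 'c' = counter metric type in statsd
        self.sock.sendto(format_metric(name, count, "c").encode(), self.addr)

    def timing(self, name, ms):
        # 'ms' = timer metric type in statsd
        self.sock.sendto(format_metric(name, int(ms), "ms").encode(), self.addr)

    @contextmanager
    def timer(self, name):
        """Time a block of work and report it, e.g. one page fetch."""
        start = time.monotonic()
        try:
            yield
        finally:
            self.timing(name, (time.monotonic() - start) * 1000)
```

Because statsd is UDP and fire-and-forget, instrumenting a hot crawl loop this way adds negligible overhead, and Graphite can then chart counters like pages indexed per second.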

Technologies
Python: Tornado and Celery for our backend infrastructure
lxml for DOM parsing. We used BeautifulSoup for a while, but it became a bottleneck :(
Amazon Web Services (EC2, S3)
Chef or Puppet for configuration management; Fabric for automated deploys
JS / CoffeeScript
Statsd & Graphite experience is a plus.
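To give a flavor of the lxml work involved: a crawler's core parsing step is typically pulling links out of a fetched page with an XPath query. A minimal sketch (the function name and URLs are illustrative, not from Priceonomics' codebase):

```python
from lxml import html

def extract_links(page_html, base_url=None):
    """Parse an HTML document with lxml and return all anchor hrefs.

    lxml's C-backed parser and XPath engine make this much faster than
    an equivalent BeautifulSoup pass over millions of pages per day.
    """
    doc = html.fromstring(page_html)
    if base_url:
        # Resolve relative hrefs like "/about" against the page's URL
        doc.make_links_absolute(base_url)
    return doc.xpath("//a/@href")
```

The returned hrefs feed straight back into the crawl frontier, which is how a crawler discovers new pages to index.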

Apply
Email omar+jobs-crawling@priceonomics.com. In your message:

Tell us about the most impressive Python project you've built. Links to code would be great, but we understand if you can't share.
Include a link to your GitHub profile. If you don't have any open projects listed, include a Python project you can share, along with an explanation of what it does and why it's significant.