Notice how the strings we’re searching for contain %: this is a wildcard character that matches any sequence of characters. The query is essentially saying “find deals where the title contains ‘Yoga’”. Learn more about querying Postgres.

Hook up to a Cron job

It’d be pretty annoying if we had to manually run this script regularly. This is where cron jobs come in.

We’ll first create a bash script that simulates what we would do if we were to run scrapy manually. There is a sample one in the new-coder/scrape/living_social/ directory called scrape.sh. Edit the bash script so that it points to where your (ScrapeProj) virtualenv lives, as well as to your scraper’s root directory (where the scrapy.cfg file lies).
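For illustration, here is a minimal sketch of what such a wrapper script could look like. It is not the sample script from the repository: the virtualenv location, the project path, and the spider name are all assumptions, and `run_scrape` is a hypothetical helper name. Rather than activating the virtualenv, this sketch calls the `scrapy` executable inside it directly, which has the same effect and does not depend on cron's minimal `PATH`.

```shell
#!/bin/sh
# Sketch of a cron-friendly wrapper script. Every path and the spider name
# below are placeholders (assumptions); replace them with your own values.

VENV_DIR="${VENV_DIR:-$HOME/.virtualenvs/ScrapeProj}"   # assumed virtualenv location
SCRAPER_ROOT="${SCRAPER_ROOT:-$HOME/Projects/new-coder/scrape/living_social}"  # assumed directory containing scrapy.cfg
SPIDER_NAME="${SPIDER_NAME:-livingsocial}"              # hypothetical spider name

run_scrape() {
    # Check both locations first so a bad path produces a readable message.
    if [ ! -x "$VENV_DIR/bin/scrapy" ]; then
        echo "scrapy executable not found in $VENV_DIR/bin" >&2
        return 1
    fi
    if [ ! -f "$SCRAPER_ROOT/scrapy.cfg" ]; then
        echo "scrapy.cfg not found in $SCRAPER_ROOT" >&2
        return 1
    fi
    # Run from the project root in a subshell so the caller's directory is unchanged.
    ( cd "$SCRAPER_ROOT" && "$VENV_DIR/bin/scrapy" crawl "$SPIDER_NAME" )
}

# Report a failure on stderr; cron usually tries to mail any output to the crontab owner.
run_scrape || echo "scrape did not complete" >&2
```

Running the script once by hand from your terminal is a quick way to confirm the paths are right before handing it over to cron.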

Next, within your terminal, type:

$ crontab -e

to edit your crontab file. This opens your crontab in an editor. Add a line:

0 13 * * * sh ~/Projects/new-coder/scrape/living_social/scrapy.sh

This says: every day at hour 13 (1pm, in your machine’s local time), run the scrapy.sh script. To schedule your cron job at a different time, check out Wikipedia’s overview of cron.
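As a quick reference, a crontab entry has five time fields followed by the command to run. The fragment below is a sketch that repeats the daily entry and adds a few illustrative variations; the alternative schedules and the log file path are assumptions made for the example, and you would keep only the one line you actually want.

```shell
# Field order: minute  hour  day-of-month  month  day-of-week  command

# Every day at 13:00 (the entry used in this tutorial):
0 13 * * * sh ~/Projects/new-coder/scrape/living_social/scrapy.sh

# Every Monday at 08:30 (day-of-week 1 is Monday):
30 8 * * 1 sh ~/Projects/new-coder/scrape/living_social/scrapy.sh

# Every 6 hours, on the hour (step syntax; an extension most common cron implementations support):
0 */6 * * * sh ~/Projects/new-coder/scrape/living_social/scrapy.sh

# Daily at 13:00, appending output and errors to a log file (hypothetical path):
0 13 * * * sh ~/Projects/new-coder/scrape/living_social/scrapy.sh >> $HOME/scrape.log 2>&1
```

After saving, `crontab -l` lists the entries currently installed, which is an easy way to check that your edit took effect.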

NOTE

The cron job will run automatically at whatever time you schedule it for (in this example, daily at 1pm).

But! It will only run when your computer is on (not hibernating, asleep, or powered off) and, for this script in particular, connected to the internet.

To run the cron job regardless of the state your computer is in, you would host the scraper code, the bash script, and the cron job on a separate server (either your own, or a company’s) that is always ‘on’. You can host your own cron jobs on OpenShift.