Data Scientist at Yahoo!

Scheduling R scripts to run on a regular basis

Recently I was working on a project with a friend of mine to scrape some data from a website. However, we needed to scrape the data on a daily basis. Obviously, we wouldn’t run the script manually every day. I was aware that cron could do the job, although I had never used it before.

cron is a time-based job scheduler in Unix-like operating systems. You can use it to run jobs, including R scripts, on a regular schedule, and it turns out to be incredibly easy to set up. By coincidence, the day after I realized I had to use cron for my task, I came across a nice post called Scheduling R Tasks with Crontabs to Conserve Memory.

That post explains that scheduling R tasks with cron can help you conserve memory, since each scheduled run is equivalent to opening and closing an R session. It also gives a nice overview of how to set things up, which I summarize below:

sudo apt-get install gnome-schedule # install (gnome-schedule is a graphical front end to cron)
sudo crontab -e # if you have root powers: edit root's crontab
crontab -u yourusername -e # if you want to run the job as a specific user

After that, a crontab file will open, to which you can add a line of the following form:

MIN HOUR DOM MON DOW CMD

where the meaning of each field is given in the table below, which I have borrowed from the useful blog post 15 Awesome Cron Job Examples.

Table: Crontab Fields and Allowed Ranges (Linux Crontab Syntax)

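| Field | Description  | Allowed Value               |
|-------|--------------|-----------------------------|
| MIN   | Minute field | 0 to 59                     |
| HOUR  | Hour field   | 0 to 23                     |
| DOM   | Day of Month | 1-31                        |
| MON   | Month field  | 1-12                        |
| DOW   | Day Of Week  | 0-6                         |
| CMD   | Command      | Any command to be executed. |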
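To make the field layout concrete, here is a minimal sketch of a shell helper that splits a crontab line into the six parts from the table and checks plain numeric values against the allowed ranges. The function names `check_field` and `describe_cron_line` are hypothetical and not part of cron. The sketch only understands single numbers and `*`, not lists, ranges or steps.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of cron): label the fields of a crontab
# line and check simple numeric values against the ranges in the table.

check_field() {
  local name="$1" value="$2" low="$3" high="$4"
  # "*" means "every value"; lists, ranges and steps are not handled here.
  [[ "$value" == "*" ]] && return 0
  # 10# forces base 10 so that values like "08" are not read as octal.
  if [[ "$value" =~ ^[0-9]+$ ]] && (( 10#$value >= low && 10#$value <= high )); then
    return 0
  fi
  echo "invalid $name field: $value (expected $low-$high or *)" >&2
  return 1
}

describe_cron_line() {
  local min hour dom mon dow cmd
  # read splits on whitespace; everything after the fifth field is the command.
  read -r min hour dom mon dow cmd <<< "$1"

  check_field MIN  "$min"  0 59 || return 1
  check_field HOUR "$hour" 0 23 || return 1
  check_field DOM  "$dom"  1 31 || return 1
  check_field MON  "$mon"  1 12 || return 1
  check_field DOW  "$dow"  0 6  || return 1

  printf 'MIN=%s HOUR=%s DOM=%s MON=%s DOW=%s CMD=%s\n' \
    "$min" "$hour" "$dom" "$mon" "$dow" "$cmd"
}

describe_cron_line "15 23 * * * Rscript filePath.R"
# prints: MIN=15 HOUR=23 DOM=* MON=* DOW=* CMD=Rscript filePath.R
```

A line such as `61 23 * * * Rscript filePath.R` would be rejected by this sketch, because 61 is outside the 0 to 59 range for minutes.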

So, to run the R script filePath.R at 23:15 every day of the year, we add the following line to the crontab file:

15 23 * * * Rscript filePath.R
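One practical caveat: cron usually runs jobs with a minimal environment and from the user's home directory, so a bare `Rscript` or a relative `filePath.R` may not resolve the way it does in your interactive shell. The sketch below prints a crontab entry that uses absolute paths and appends the script's output to a log file. The function name `make_cron_entry` and all paths are hypothetical placeholders; substitute the output of `which Rscript` and the real location of your script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a daily crontab entry that uses absolute paths
# and redirects both stdout and stderr to a log file.

make_cron_entry() {
  local minute="$1" hour="$2" rscript_bin="$3" script="$4" log="$5"
  printf '%s %s * * * %s %s >> %s 2>&1\n' \
    "$minute" "$hour" "$rscript_bin" "$script" "$log"
}

# Example paths are assumptions, not taken from the original post.
make_cron_entry 15 23 /usr/bin/Rscript \
  /home/yourusername/filePath.R /home/yourusername/filePath.log
# prints: 15 23 * * * /usr/bin/Rscript /home/yourusername/filePath.R >> /home/yourusername/filePath.log 2>&1
```

The printed line can then be pasted into the file opened by `crontab -e`. The log file is handy because a cron job has no terminal, so otherwise errors from the R script are easy to miss.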

Check out 15 Awesome Cron Job Examples if you need more elaborate scheduling, such as every weekday during working hours or every 5 minutes.
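As a rough illustration of those two cases, the crontab entries below use the range and step syntax supported by the cron implementations commonly shipped with Linux. They reuse `filePath.R` as a placeholder, and "working hours" is assumed here to mean 09:00 to 17:00, Monday to Friday.

```shell
# Every 5 minutes
*/5 * * * * Rscript filePath.R

# At the start of every hour from 09:00 to 17:00, Monday to Friday
0 9-17 * * 1-5 Rscript filePath.R
```

These lines go into the file opened by `crontab -e`, one entry per line. They are crontab entries, not commands to run in a shell.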