Some relevant O’Reilly books (obviously, I work for O’Reilly, but I learned to program with the famous “animal books”). These are great introductions, and you can download them as DRM-free e-books:

Introducing Regular Expressions—Regex, which lets you extract data from formatted text, is one of the most powerful tools in data mining. Google Refine offers plenty of ways to use regular expressions, and it’s also a key feature of scripting languages like Python.

Learning Python—Python is my favorite scripting language: once you’re familiar with it, you’ll find it easy to build small bits of software that can, say, scrape thousands of web pages in a few minutes or turn a spreadsheet into thousands of graphs. And it scales up from there: Python is one of the languages that Google uses to build some huge systems.

R in a Nutshell—R is the most widely used open-source statistical package, with free add-on libraries for every discipline. I got my start with Stata and found R difficult to learn, but once you’ve learned it, you’ll find it a very powerful tool that can stand in for Python and full-scale database applications in many data-journalism projects.
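
To give a flavor of what extracting data from formatted text with regular expressions can look like in Python, here is a minimal sketch. The sample text, the field names and the pattern are all made up for illustration; real data will almost certainly need a different pattern.

```python
import re

# Hypothetical sample text; the names and figures are invented for illustration.
text = """Smith, John: $1,200 on 2011-03-04
Doe, Jane: $950 on 2011-03-07"""

# One pattern with named groups: surname, first name, dollar amount, ISO date.
pattern = re.compile(
    r"(?P<last>\w+), (?P<first>\w+): \$(?P<amount>[\d,]+) on (?P<date>\d{4}-\d{2}-\d{2})"
)

# Turn every match into a small dictionary, converting the amount to an integer.
records = [
    {
        "name": f"{m['first']} {m['last']}",
        "amount": int(m["amount"].replace(",", "")),
        "date": m["date"],
    }
    for m in pattern.finditer(text)
]

for record in records:
    print(record)
```

The named groups keep the pattern readable, and the same approach works line by line on a file that is too large to load at once.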

Advanced resources: at some point you’ll want to use a database to store and analyze data. MySQL is the reigning king of open-source database software, and you can learn it with the aptly named Learning MySQL. For certain applications, MongoDB’s looser, schema-free document model is more useful, and you might want to take a look at MongoDB and Python, which outlines ways to use Python to fill and analyze a Mongo database.