Use Bash to quickly sort, search, match, replace, clean and optimise various aspects of your data, without going through any tough learning curves.

Updates
————
16/03/2017 – This course will be available for FREE until 19 Mar 2017. Take the chance to enroll now and get the updated course contents for free.
This is an entry-level short course specifically designed to show you how to use Bash commands and shell programming to handle small to large textual data at the command line. The course is based on my book “Hello Big Data @ Bash” (published at leanpub/hellobigdata), which is also offered for FREE in the supporting materials section.
This is a course where you learn by doing projects.
However, keep in mind that Bash may not be the best way to handle every kind of data! But there often comes a time when you are given a pure Bash environment, such as on a common Linux-based supercomputer, and you just want an early result or view of the data before you dive into the real programming in Python, R, SQL, SPSS, and so on. Expertise in those data-intensive languages also comes at the price of spending a lot of time on them.
In contrast, Bash scripting is simple, easy to learn and perfect for mining textual data, particularly if you deal with genomics, microarrays, social networks, life sciences, and so on. It can help you to quickly sort, search, match, replace, clean and optimise various aspects of your data, without going through any tough learning curves. We strongly believe that learning and using Bash shell scripting should be the first step if you want to say, Hello Big Data!
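To give a flavour of what “sort, search, replace and clean” looks like at the command line, here is a minimal sketch using a made-up sample file (`data.csv` and its contents are illustrative, not from the course):

```shell
# Create a tiny sample file to work on (hypothetical data).
printf 'name,score\nbob,7\nalice,9\nbob,7\n' > data.csv

grep 'bob' data.csv             # search: print lines matching a pattern
sed 's/bob/robert/' data.csv    # replace: substitute text on each line
sort -t, -k2 -n -r data.csv     # sort: numerically, descending, by the score column
sort data.csv | uniq -d         # clean: reveal duplicate records
```

Each step is an ordinary pipeline of standard tools, which is exactly why no steep learning curve is needed to get started.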
This course starts with some practical bash-based flat file data mining projects involving:

If you haven’t used Bash before, feel free to skip the projects and go to the tutorials (supporting materials: eBook). Read the tutorials, then come back to the projects. The tutorial section introduces Bash scripting, regular expressions, AWK, sed, grep and so on. Finally, it gives you a concise, beginner-friendly guide to the Big Data landscape, including an overview of critical Big Data tools such as HDFS, MapReduce, YARN, Flume, Hive and more. The course finishes with a near-complete list of references to all the relevant command-line and Big Data tools.
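As a small taste of the tools the tutorials cover (the sample log file below is an assumption for illustration, not course material): grep filters lines, sed rewrites them, and AWK aggregates across them.

```shell
# Hypothetical log file used only for this example.
printf 'INFO ok\nERROR disk full\nINFO ok\nERROR net down\n' > app.log

grep '^ERROR' app.log                # grep: keep only error lines
sed 's/^ERROR /error: /' app.log     # sed: rewrite a line prefix

# awk: count how many lines there are per log level.
awk '{count[$1]++} END {for (k in count) print k, count[k]}' app.log
```

The AWK one-liner is the typical pattern for quick tallies over columns, which the tutorials build on for larger data-mining tasks.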

Ahmed Arefin, PhD is an enthusiastic computer programmer with more than a decade of well-rounded computational experience. He likes to code, but loves to write, research and teach. Following a PhD and postdoctoral research in the area of data parallelism, he moved on to become a Scientific Computing professional, maintaining his research interests in parallel, distributed and accelerated computing.
In his day job, he pets a few of the world’s fastest Top500 supercomputers at a large Australian agency for scientific research.