You can press Enter to browse the file line by line, or Space to go through it a screen at a time. less is similar, but you can also use the Up and Down arrow keys to scroll up and down, and the Page Up and Page Down keys to move a whole screen at a time. Once you reach the end of the file, you have to press Q to quit.

You can also view the first or last several lines of a file with the head and tail commands. Say you just want to get the first four lines of a file.

me@blogclub:~/sandbox/tworkshop/data$ head -4 prettyExample.json
{
"contributors": null,
"truncated": false,
"text": "TeeMinus24's Shirt of the Day is Palpatine/Vader '12. Support the Sith. Change you can't stop. http://t.co/wFh1cCep",

What if you want to display the file all at once without stopping? You can use the cat command. Let’s try it with the example.json file.

me@blogclub:~/sandbox/tworkshop/data$ cat example.json
{"possibly_sensitive_editable":true,"text":"TeeMinus24's Shirt of the Day is Palpatine\/Vader '12. Support the Sith. Change you can't stop. http:\/\/t.co\/wFh1cCep","id_str":"175090352598945794","entities":{"urls":[{"indices":[95,115],"expanded_url":"http:\/\/fb.me\/1isEdQJSq","display_url":"fb.me\/1isEdQJSq","url":"http:\/\/t.co\/wFh1cCep"}],"hashtags":[],"user_mentions":[]},"retweeted":false,"place":null,"retweet_count":0,"in_reply_to_status_id_str":null,"coordinates":null,"source":"\u003Ca href=\"http:\/\/www.facebook.com\/twitter\" rel=\"nofollow\"\u003EFacebook\u003C\/a\u003E","in_reply_to_user_id_str":null,"in_reply_to_status_id":null,"favorited":false,"geo":null,"in_reply_to_screen_name":null,"in_reply_to_user_id":null,"truncated":false,"created_at":"Thu Mar 01 05:29:27 +0000 2012","possibly_sensitive":false,"contributors":null,"user":{"geo_enabled":false,"profile_link_color":"009999","id_str":"281077639","listed_count":1,"lang":"en","notifications":null,"location":"","is_translator":false,"follow_request_sent":null,"statuses_count":461,"profile_background_color":"131516","followers_count":43,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/1428484273\/TeeMinus24_logo_normal.jpg","default_profile":false,"profile_background_tile":true,"description":"We are a limited edition t-shirt company. We make tees that are designed for the fan; movies, television shows, video games, sci-fi, web, and tech. 
We have it!","following":null,"profile_sidebar_fill_color":"efefef","contributors_enabled":false,"profile_background_image_url_https":"https:\/\/si0.twimg.com\/images\/themes\/theme14\/bg.gif","verified":false,"profile_sidebar_border_color":"eeeeee","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/1428484273\/TeeMinus24_logo_normal.jpg","default_profile_image":false,"protected":false,"show_all_inline_media":false,"profile_use_background_image":true,"favourites_count":0,"created_at":"Tue Apr 12 15:48:23 +0000 2011","name":"Vincent Genovese","friends_count":52,"profile_text_color":"333333","url":"http:\/\/www.teeminus24.com","id":281077639,"profile_background_image_url":"http:\/\/a0.twimg.com\/images\/themes\/theme14\/bg.gif","time_zone":"Eastern Time (US & Canada)","utc_offset":-18000,"screen_name":"TeeMinus24"},"id":175090352598945794}

It’ll look different on your screen — since the file is just one long line, it’ll wrap to the width of your window.

You may ask why anyone would ever want to dump a whole file to the Terminal. Well, sometimes you know the file is small. But sometimes you want to use the contents of a file as the input to another command. I’ll get to that below when we look at I/O redirection and pipes. For now, let’s move on to searching through files.

Searching through files

Say you have a big file and you want to know some basic characteristics about it. Does it contain a specific word? How many lines and words are in it?

There are a number of UNIX commands that can accomplish this. For this lesson we’re just going to focus on the grep and wc commands. grep is a very powerful program that searches a file for a pattern and prints every line that matches it.

Here’s a basic example. Since we know from the last lesson that the field holding the actual tweet is called “text”, we can get the tweet with grep.

me@blogclub:~/sandbox/tworkshop/data$ grep text prettyExample.json
"text": "TeeMinus24's Shirt of the Day is Palpatine/Vader '12. Support the Sith. Change you can't stop. http://t.co/wFh1cCep",
"profile_text_color": "333333",

Note that it got all of the lines that had “text” in them, not just the “text” field. So if the user’s name was “UNIXtextwizard”, you would pick up that line as well.
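One way to narrow the match is to put the surrounding quotes and colon into the pattern. Here’s a minimal sketch. The file sample.json is a made-up stand-in that the snippet creates on the spot (it is not one of the workshop files), so the commands run anywhere.

```shell
# Hypothetical stand-in for prettyExample.json, using a few fields from the tweet above.
cat > sample.json <<'EOF'
{
"truncated": false,
"text": "Support the Sith. Change you can't stop.",
"profile_text_color": "333333",
"profile_background_color": "131516",
"name": "Vincent Genovese",
"screen_name": "TeeMinus24"
}
EOF

# Plain pattern: matches both the "text" line and the "profile_text_color" line.
grep text sample.json

# Pattern with the quotes and colon: matches only the "text" field.
grep '"text":' sample.json
```

The single quotes around the second pattern keep the shell from swallowing the double quotes before grep sees them.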

How about the inverse, i.e. we want everything BUT the lines with “text” in them? We would just use grep with the -v option. Try this on your own.
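If you get stuck, here is a sketch of the idea on a small stand-in file. The name sample.json and its contents are assumptions for illustration. On the real prettyExample.json you will get many more lines back.

```shell
# Hypothetical stand-in for prettyExample.json.
cat > sample.json <<'EOF'
{
"truncated": false,
"text": "Support the Sith. Change you can't stop.",
"profile_text_color": "333333",
"profile_background_color": "131516",
"name": "Vincent Genovese",
"screen_name": "TeeMinus24"
}
EOF

# -v inverts the match: print every line that does NOT contain "text".
grep -v text sample.json
```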

The next command is wc. By default it tells us some descriptives about the file: the number of lines, words, and bytes.
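Here is a rough sketch of wc in action. Again, sample.json is a made-up stand-in file, so the counts you see for the real workshop files will be different.

```shell
# Hypothetical stand-in for prettyExample.json.
cat > sample.json <<'EOF'
{
"truncated": false,
"text": "Support the Sith. Change you can't stop.",
"profile_text_color": "333333",
"profile_background_color": "131516",
"name": "Vincent Genovese",
"screen_name": "TeeMinus24"
}
EOF

wc sample.json       # lines, words, and bytes, followed by the file name
wc -l sample.json    # only the line count
wc -w sample.json    # only the word count
```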

You may have noticed that there are a lot of options for each of these commands. I’m just showing you the bare minimum of what you can do with all of this stuff. If you ever want to know more about what these commands can do, you can read the manual page for the command. Type man [command], where [command] is the command you want more information about. Try man grep. The man command typically uses less to display the manual page, so you can browse it in exactly the same way you used less above. If you browse forums enough for information on how to do something technical, you’ll often run into the phrase “RTFM”, which means “Read The F%&* Manual.” It’s good advice that I’d highly recommend. 🙂

I/O Redirection

I/O redirection is a way to change where the input of a command comes from or where its output goes. In UNIX, there are three I/O “streams” that basically act like files: stdin (pronounced “standard in”), stdout (“standard out”), and stderr (“standard error”, not to be confused with the statistical concept). That’s probably too much information for now, but it helps to contextualize what is happening when we use certain commands.
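To make the difference between stdout and stderr a little more concrete, here is a small sketch. The directory name no_such_directory and the two output file names are made up for illustration. It uses 2>, which redirects stderr the same way > redirects stdout.

```shell
# ls prints the listing of the current directory to stdout and
# an error message about the missing directory to stderr.
# "|| true" only keeps the non-zero exit status of ls from stopping a script.
ls . no_such_directory > listing.txt 2> errors.txt || true

echo "stdout went to listing.txt:"
cat listing.txt
echo "stderr went to errors.txt:"
cat errors.txt
```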

The easier one to understand conceptually is output redirection. Say you ran a really cool search and you want to keep the results somewhere. Let’s go with the grep from above.
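Redirecting that search into a file might look like the sketch below. It runs against a small stand-in file called sample.json (a made-up name, created by the snippet itself) and writes the matches to textMentions.txt.

```shell
# Hypothetical stand-in for prettyExample.json.
cat > sample.json <<'EOF'
{
"truncated": false,
"text": "Support the Sith. Change you can't stop.",
"profile_text_color": "333333",
"profile_background_color": "131516",
"name": "Vincent Genovese",
"screen_name": "TeeMinus24"
}
EOF

# Nothing is printed to the Terminal: > sends grep's output to the file instead.
grep text sample.json > textMentions.txt

# Look at what was saved.
cat textMentions.txt
```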

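You can also redirect input: the < character feeds a file to a command’s standard input. You may be asking how that’s different from just giving the command a file name. Well, grep expects a file name by default and opens the file itself. But there are commands that just read whatever arrives on the standard input stream (normally the keyboard) and don’t necessarily take a file.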
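The difference is easier to see side by side. This is only a sketch on a made-up stand-in file, sample.json. I picked tr as an example of a command that only reads standard input and has no file name argument at all.

```shell
# Hypothetical stand-in for prettyExample.json.
cat > sample.json <<'EOF'
{
"truncated": false,
"text": "Support the Sith. Change you can't stop.",
"profile_text_color": "333333",
"profile_background_color": "131516",
"name": "Vincent Genovese",
"screen_name": "TeeMinus24"
}
EOF

# grep is handed a file name and opens the file itself.
grep text sample.json

# The shell opens the file and feeds it to grep's stdin; same result.
grep text < sample.json

# tr never takes a file name, so redirection (or a pipe) is the only way in.
tr 'a-z' 'A-Z' < sample.json
```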

Finally, we can chain these two together. Let’s look for a different thing in grep, like “name”.
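Chained together, input and output redirection could look like this sketch. As before, sample.json is a made-up stand-in file. The matches land in nameMentions.txt.

```shell
# Hypothetical stand-in for prettyExample.json.
cat > sample.json <<'EOF'
{
"truncated": false,
"text": "Support the Sith. Change you can't stop.",
"profile_text_color": "333333",
"profile_background_color": "131516",
"name": "Vincent Genovese",
"screen_name": "TeeMinus24"
}
EOF

# Read from sample.json on stdin, write the matching lines to nameMentions.txt.
grep name < sample.json > nameMentions.txt

cat nameMentions.txt
```

Note that “screen_name” is picked up too, for the same reason “profile_text_color” showed up in the search for “text”.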

Note that I/O redirection with > will overwrite the file if it already exists. If you want to append to an existing file, use >>.
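A quick way to convince yourself of the difference is a throwaway file. The name runs.txt below is made up for this sketch.

```shell
echo "first run" > runs.txt     # creates runs.txt (or overwrites it)
echo "second run" > runs.txt    # overwrites: "first run" is gone
echo "third run" >> runs.txt    # appends: the file now has two lines

cat runs.txt
```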

Finally, the coolest kind of I/O redirection is with what are called pipes. Pipes take the output of one command and use it as the input for another. They are represented by the | character, which is probably on the same key as \ on your keyboard.

Say you want to get the number of lines that contain the word “profile” in prettyExample.json. You can use a pipe to connect grep and wc. Check it.
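Here is a sketch of that pipeline on a small stand-in file. The file sample.json is made up and has only two “profile” lines, so the count for the real prettyExample.json will be different.

```shell
# Hypothetical stand-in for prettyExample.json.
cat > sample.json <<'EOF'
{
"truncated": false,
"text": "Support the Sith. Change you can't stop.",
"profile_text_color": "333333",
"profile_background_color": "131516",
"name": "Vincent Genovese",
"screen_name": "TeeMinus24"
}
EOF

# grep prints the matching lines; the pipe hands them to wc -l, which counts them.
grep profile sample.json | wc -l
```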

Whoa! Awesome. We can get more complicated and chain a number of them together. For example, let’s find out how many user directories have the character “j” in them.

me@blogclub:~/sandbox/tworkshop/data$ ls -l /home | grep j | wc -l
3

If you recall, ls -l prints the directory listing one entry per line. So we can treat it as line-by-line input for grep.

For an exercise, how would you merge the two files you created above (nameMentions.txt and textMentions.txt) into one file called nameTextMentions.txt?

Once you get the hang of pipes and I/O redirection, you will be using them all the time.

Editing and Writing Files

Now we get to the exciting part. This is where we start editing code.

First things first: get a text editor. Go to http://www.jedit.org/ and download jEdit. jEdit is a free, open-source text editor written in Java. It has a ton of cool features and makes editing files on remote servers a snap, which is why we are using it.

Once you have downloaded and installed it, go to Plugins->Plugin Manager in the menu. You should get a screen that looks like this.

Click the “Install” tab and find the “FTP” plugin. Select the checkbox and click Install. Once it installs, close the window. Now, select Plugins->FTP->Open from Secure FTP Server… from the menu. Type in all your information so it looks like the screenshot below.

Now, once it loads, navigate to the “data” directory like you would with a GUI file manager. Open up prettyExample.json.

Once you’ve opened the file, it should be displayed just like this.

Pretty cool, huh? Now leave this file alone — we don’t want to mess with it. Close the file. Now you have a blank file. We’re going to start coding. In Python.

However, before that, I need to give a little background on Python. Python is an interpreted scripting language. This is different from a language like C, which is traditionally compiled ahead of time into machine code that the computer can run directly. (Java is compiled too, though into bytecode that runs on a virtual machine.) Python instead reads a file like a script, executing its statements one at a time from top to bottom.

You can also use Python interactively. Type python in the Terminal and you get a prompt, just like you do with the UNIX Terminal. Let’s do the first thing you learn in every programming language: print “Hello World”.

>>> print("Hello World")
Hello World

It worked! Hopefully. Let’s get out of here. Press Ctrl+D to exit. Let’s go back to jEdit. Now type the same thing into jEdit. Now, in the menu go to Plugins->FTP->Save to FTP Server…. Type in your information like before, then navigate to ~/sandbox/tworkshop/bin. In the file name box at the bottom of the dialog, call the file hello.py. Click Save.

Finally, go back to the Terminal and get to the ~/sandbox/tworkshop/bin directory. (Protip: if you’re in the ~/sandbox/tworkshop/data directory, you can just type cd ../bin.)

Now we get to run the program from the file.

me@blogclub:~/sandbox/tworkshop/bin$ python hello.py
Hello World

I’ll leave it there for now. If you want to jump ahead and try stuff on your own (you know, RTFM), the Python documentation is at http://docs.python.org/. We’ll get into some of the finer points of Python in the next lesson.