Tuesday, June 24, 2008

Tokenization, Part 3: Functions

Functions

In the last post, we saved the regular expression that we used to tokenize a string to a variable. But it would be more convenient to be able to save the entire tokenization procedure to a variable. Pretty much all programming languages let us save a series of statements or expressions—a function—to evaluate later. How does Clojure do this?

In fact, creating a function looks a lot like creating a variable. First, start Clojure and make sure that token-regex is still defined:
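As a rough sketch, the check and the function definition discussed in the next few paragraphs might look like the following. The pattern #"\w+" is an assumption that matches the output shown later in this post; if your token-regex from the last post differs, use that instead.

```clojure
;; Assumed definition carried over from the last post; substitute your own
;; pattern if it differs.
(def token-regex #"\w+")

;; Evaluating the name on its own confirms that it is still defined.
token-regex

;; The function definition, entered as two lines at the REPL.
(defn tokenize-str [input-string]
  (re-seq token-regex input-string))
```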

tokenize-str is the name of the function. Functions and variables use the same set of names, so naming a variable tokenize-str will get rid of the function named tokenize-str, and vice versa.
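To see the shared names in action without clobbering tokenize-str, here is a small sketch using a throwaway name, sample-name, which is hypothetical and not part of the tokenizer. The names was-fn and still-fn are likewise just for illustration.

```clojure
;; Define a function under a throwaway name (hypothetical, for illustration).
(defn sample-name [s]
  (str "called with " s))

;; At this point the name refers to a function.
(def was-fn (fn? sample-name))

;; Defining a variable with the same name replaces the function.
(def sample-name "now just a string")

;; The name now refers to the string, not to a function.
(def still-fn (fn? sample-name))
```

After the second def, was-fn holds true and still-fn holds false, and trying to call (sample-name "x") would fail because a string is not callable.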

[input-string] is a square-bracket-delimited list of the parameters that this function accepts. In the case of tokenize-str, it takes one argument, named input-string. Expressions inside the function can refer to the value passed into the function using that name.

After you type in that line and hit enter, nothing will happen. The first parenthesis before defn is still open, so the Clojure REPL knows you’re not finished yet. You’ll need to enter the second line to continue.

The second line is just the re-seq function with both arguments as variables, like we used in the last posting. One variable is the regular expression from the previous def, and one is input-string from the function definition.

Functions return the value of their last expression. In this case, that is the function call to re-seq.
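The last-expression rule is easier to see in a function with more than one expression in its body. This is only an illustrative sketch; first-and-last is a made-up name and is not used elsewhere in this series.

```clojure
;; A hypothetical function with two expressions in its body.
(defn first-and-last [words]
  ;; This expression is evaluated, but its value is thrown away.
  (count words)
  ;; This is the last expression, so its value is what the function returns.
  [(first words) (last words)])

;; The result is the vector from the last expression, not the count.
(def result (first-and-last ["This" "is" "a" "test"]))
```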

Now let’s give it a try:

user=> (tokenize-str "This is a new input string with different tokens.")
("This" "is" "a" "new" "input" "string" "with" "different" "tokens")

Sure enough. Now calling (tokenize-str ...) is the same as calling (re-seq token-regex ...).

Saving Your Work

We’re starting to get enough code that typing it in every time we want to use it would be painful, inefficient, and worst of all, boring. Fortunately, like most other programming languages, Clojure lets us save expressions to a file to execute all at once.

To do this, open your text editor and create a new file. Let’s call it word.clj and save it in whatever directory you’re currently working in. Next, enter all the code we’ve typed in so far:
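Going by what this post has covered, word.clj would hold something like the sketch below. The regular expression is again an assumption (#"\w+"); use whatever pattern you defined in the last post.

```clojure
;; word.clj
;; Assumed pattern from the last post: one or more word characters.
(def token-regex #"\w+")

;; Split input-string into a sequence of tokens matching token-regex.
(defn tokenize-str [input-string]
  (re-seq token-regex input-string))
```

One way to pull the file into a running REPL, assuming it sits in the current working directory, is (load-file "word.clj"); after that, tokenize-str can be called as before.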

The problem is that you're not defining a function. You're defining a set (which can be used as a function, but that's beside the point here). "defn" is used to define functions. "def" is used to define everything else. Change that line to:

Being a bit pedantic here, but if it's immutable, it's not a variable, it's a value. (I think other Clojure documentation uses binding for any form of value-to-symbol assignment, some of which are mutable).