2013-12-29

One of the classical problems in Natural Language Processing is
part-of-speech tagging: given a sentence of length n, find the
part-of-speech tag for each word in the sentence. For
example,

$$\text{POS(“the”,“man”,“walks”) = (“article”,“noun”,“verb”)}$$

There are several approaches to parts-of-speech tagging, and I’ve
managed to build a fairly simple tagger based on something called
Hidden Markov Models.
There are several resources explaining HMMs out there already so I
won’t try explaining that in detail here.

Basically what you need to know is that we model the tag sequence to
be a first-order Markov process. This means that the likelihood of a
tag occurring is determined by what the previous tag was. So we’re
going to have to obtain a map that looks like this:

The probability of a verb following a noun is 0.7, an article
following a noun is 0.1, and so on. These are called the transition
probabilities.
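As a rough sketch, such a map could be written as a nested Clojure map keyed first by the previous tag and then by the next tag. Only the 0.7 and 0.1 figures come from the text above; every other number, and the name transition-prob, is made up for illustration.

```clojure
;; Hypothetical transition probabilities.
;; Outer key: the previous tag. Inner key: the tag that follows it.
;; Only the :noun -> :verb (0.7) and :noun -> :article (0.1) values
;; are taken from the post; the rest are invented.
(def transitions
  {:noun    {:verb 0.7 :article 0.1 :noun 0.2}
   :article {:noun 0.9 :verb 0.05 :article 0.05}
   :verb    {:article 0.5 :noun 0.4 :verb 0.1}})

(defn transition-prob
  "Probability of next-tag following prev-tag; 0.0 if the pair is unknown."
  [prev-tag next-tag]
  (get-in transitions [prev-tag next-tag] 0.0))

(transition-prob :noun :verb) ;; => 0.7
```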

The next thing we need to obtain is called the emission
probabilities, which is the probability of a tag emitting a
word. When represented in a Clojure map, this looks something like:

{:noun {"man" 0.04 "samrat" 0.001 ...} :verb {"walks" 0.002} ...} ;; and so on

The Brown tagged corpus

To obtain these two maps, we will use the Brown corpus which is a
pretty substantial collection of sentences tagged painstakingly by
humans way back in the 60s. A sentence in the Brown corpus looks like
this:

tag-space is the set of all possible tags. The "START" tag is
something the code producing brown.counts adds to the beginning of
sentences. It is useful to have that because it makes the job of
calculating the transition probability of the first
tag ($t(\text{START}\Rightarrow{t_1})$, which is really called the initial state
probability) easier.

With this we can obtain the transition and emission probabilities we
wanted:
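A minimal sketch of how both probabilities can be estimated from counts, assuming we already have three count maps. The layout of these maps and all the numbers in them are hypothetical, not the actual format of brown.counts.

```clojure
;; Hypothetical counts; real values would come from the tagged corpus.
(def tag-counts        {"START" 2 "article" 2 "noun" 3 "verb" 2})
(def transition-counts {["START" "article"] 2
                        ["article" "noun"]  2
                        ["noun" "verb"]     2
                        ["verb" "noun"]     1})
(def emission-counts   {["article" "the"] 2
                        ["noun" "man"]    2
                        ["noun" "dog"]    1
                        ["verb" "walks"]  2})

(defn- ratio
  "count / total, or 0 when the total is zero."
  [cnt total]
  (if (zero? total) 0 (/ cnt total)))

(defn transition-probability
  "Estimate of P(tag | prev-tag): how often tag followed prev-tag,
  divided by how often prev-tag occurred."
  [prev-tag tag]
  (ratio (get transition-counts [prev-tag tag] 0)
         (get tag-counts prev-tag 0)))

(defn emission-probability
  "Estimate of P(word | tag): how often word was tagged with tag,
  divided by how often tag occurred."
  [tag word]
  (ratio (get emission-counts [tag word] 0)
         (get tag-counts tag 0)))

(transition-probability "noun" "verb") ;; => 2/3
(emission-probability "noun" "man")    ;; => 2/3
```

These are plain relative frequencies; unseen pairs simply get 0 here, which a real tagger would probably want to smooth.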

However, what we want to find is the most likely tag sequence, which
is simply the tag sequence for which
$P(t_1,t_2,\ldots,t_n\vert{x_1,x_2,\ldots,x_n})$ is the maximum. Now, given a
sentence, an index (posn) and the tag at posn, the following
function gives the probability of the most likely tag sequence up to
posn:
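A minimal sketch of such a function, using the standard Viterbi recurrence on a toy model. The names best-prob, best-final-tag, t and e, the three-tag tag-space and all the probabilities are assumptions for illustration, not the values obtained from the Brown corpus.

```clojure
;; Toy model: every number here is invented.
(def tag-space ["article" "noun" "verb"])

(def transition
  {["START" "article"] 0.8 ["START" "noun"] 0.2
   ["article" "noun"] 0.9  ["article" "verb"] 0.1
   ["noun" "verb"] 0.7     ["noun" "noun"] 0.2  ["noun" "article"] 0.1
   ["verb" "article"] 0.5  ["verb" "noun"] 0.5})

(def emission
  {["article" "the"] 0.5
   ["noun" "man"] 0.04
   ["noun" "walks"] 0.001
   ["verb" "walks"] 0.002})

(defn t [prev-tag tag] (get transition [prev-tag tag] 0.0))
(defn e [tag word]     (get emission [tag word] 0.0))

(defn best-prob
  "Probability of the most likely tag sequence for words 0..posn of
  sentence, given that the word at posn is tagged with tag."
  [sentence posn tag]
  (let [emit (e tag (nth sentence posn))]
    (if (zero? posn)
      ;; First word: transition out of the artificial START tag.
      (* (t "START" tag) emit)
      ;; Otherwise: pick the best previous tag, then extend by one step.
      (* emit
         (apply max
                (for [prev tag-space]
                  (* (best-prob sentence (dec posn) prev)
                     (t prev tag))))))))

(defn best-final-tag
  "The tag for the last word that maximises best-prob."
  [sentence]
  (let [last-posn (dec (count sentence))]
    (apply max-key #(best-prob sentence last-posn %) tag-space)))

(best-final-tag ["the" "man" "walks"]) ;; => "verb"
```

The plain recursion recomputes the same sub-problems many times, so for real sentences you would want to memoize best-prob or fill in a table from left to right.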

2013-08-14

I recently found myself having to play around with some stock
exchange data. The stock exchange in Nepal, unsurprisingly, doesn't
provide a data API so I had to scrape their website. The non-realtime
data isn't very interesting, just regular old scraping made a little
more tedious by the fact that whoever designed the website didn't
know how to use HTML id attributes.

Now, to the live trading data. For the live data, the website shows a
ticker of stock prices, which I think is a really bad representation
of the data. If you want to know what price ZXY was traded at, you
have to wait till the end of the ticker. If the ZXY stock was all you
were interested in, you'd still have to bear with the rest of the
ticker. And to get the actual live data, you have to hit refresh.
This is kind of okay on TV, but having to do this on a computer is
terrible. Computers are more interactive than TV sets and should be
treated as such. Bret Victor has given a great talk titled "Stop
Drawing Dead Fish" that conveys this much more articulately.
The talk is about art, but I think having data represented on a
ticker is like drawing dead fish.

So, I got around to thinking about how to build a better interface for
the live trading data. To do that, I first had to build a streaming
API which pushes stock prices as the trades happen. And doing that
wasn't all that complicated, thanks to clojure.data/diff, watches
and http-kit.

The first step is to pull in the page and scrape out the ticker data
to get a map of the latest trades for each company like this:

Since we called our atom current-prices, it would be sensible to
reset! it now to hold the second, more recent map of trading data. It's
nice that we now have the trading data in a Clojure data structure,
but note that reset!-ing our atom is really just the equivalent of
refreshing our browser; we aren't done yet.

Now, Clojure comes with a handy function called diff which is in
the clojure.data namespace. Here's how it works:

The diff function tells how one data structure varies from another.
It returns three items. The first shows the key-value pairs that exist
in the first map but not in the second; the second shows the pairs that
exist only in the second map. And the third shows the pairs that exist in
both of the maps.
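A small example with two made-up maps shows the three parts of the result:

```clojure
(require '[clojure.data :as data])

;; Two hypothetical maps that agree on :a, disagree on :b,
;; and each have one key the other lacks.
(def first-map  {:a 1 :b 2 :c 3})
(def second-map {:a 1 :b 5 :d 4})

(let [[only-in-first only-in-second in-both] (data/diff first-map second-map)]
  (println "only in first: " only-in-first)   ; the pairs :b 2 and :c 3
  (println "only in second:" only-in-second)  ; the pairs :b 5 and :d 4
  (println "in both:       " in-both))        ; the pair :a 1
```

Note that a key whose value differs, like :b here, shows up in both of the first two parts, once with each value.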

diff works on seqs too, but we won't bother with that right now.

Let's see what we get when we diff the older and newer versions of
our current-prices atom:

Great. This is telling us that no trade happened for ABC. For FOO and
BAR this is showing the older and newer trading data.

Now, let's add a watch to our current-prices atom, so that whenever
we pull in new data, the watch function finds out the stocks for
which new trades happened and pushes their prices to the appropriate
clients.

Every time the current-prices atom is reset! or swap!-ed, the
function above gets called.
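A self-contained sketch of this idea, without http-kit: instead of sending to real clients, the watch function here just collects the messages it would push into a second atom. The prices, the outbox atom and the name push-changes are all hypothetical.

```clojure
(require '[clojure.data :as data])

;; Hypothetical trading data: company symbol -> latest traded price.
(def current-prices (atom {"ABC" 100 "FOO" 250 "BAR" 75}))

;; Stand-in for the connected clients: the messages that would be pushed.
(def outbox (atom []))

(defn push-changes
  "Watch function. Diffs the old and new price maps and records one
  message for every company whose entry is new or changed."
  [_key _ref old-prices new-prices]
  (let [[_ only-in-new _] (data/diff old-prices new-prices)]
    (doseq [[company price] only-in-new]
      (swap! outbox conj (str company " traded at " price)))))

(add-watch current-prices :price-watch push-changes)

;; New data comes in: FOO and BAR traded, ABC did not.
(reset! current-prices {"ABC" 100 "FOO" 260 "BAR" 70})

@outbox ;; now holds one message for FOO and one for BAR, none for ABC
```

In the real thing, the swap! on outbox would be replaced by a call that sends the message to each interested client.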

Here we're simply sending all our clients a string. In practice, you'd
probably pass JSON or EDN to only those clients who are interested in
a specific company. The send! function is from http-kit which has a
unified API for WebSockets, HTTP long polling and streaming. I wrote
about using Websockets with http-kit in a previous post.

And that's it. We have now built a streaming API using just a watch
function and clojure.data/diff. I think that's pretty cool.

2013-07-04

Websocket is a relatively new network protocol that lets a client
and a server keep a long-lived connection open between them.
What this means is that servers can push things to clients and
vice-versa through the same connection.

In this post, I'll provide a brief walkthrough to setting up a small
dashboard web app using Clojure and http-kit. I am assuming that you
are familiar with Clojure and already have Leiningen installed. You
can find the final codebase in this Github repo.

1 A (fake) realtime happiness gauge

Let's say that one of your main goals in life is to maximize happiness
in this world. Well, you'd want a way to measure what the happiness
level in the world is right now so that you can go save the day by
making some pissed people happy. Which is why we'll build a happiness
meter of sorts.

But this post isn't really about how to go about measuring happiness
so we'll just use Clojure's handy rand function to create some
random happiness data.

2 Project setup

We'll be sending our data to the browser using JSON which will be
parsed using Javascript and drawn into a graph. The first thing you
need to do is create a new project:

You are probably already familiar with compojure and cheshire.
http-kit might be new to you. http-kit is an alternative to the
Ring Jetty adapter (this is what you probably use if you create your
web apps using lein new compojure myapp). The main reason I'm using http-kit
here is that it provides an easy interface to Websocket.

ring-devel is required for hot code reloading, so that you won't
have to restart the server each time you make a change. ring-cors
is required to enable CORS, so that the whole world has open access
to our happiness data.

3 Websocket server

Because we decided to delegate our happiness-measuring to Clojure's
rand function, our program actually turns out to be quite small so
we'll just use one namespace; here is our program in its entirety:

At this point if you start the server using lein run and point your
browser to http://localhost:8080/happiness, you'll see the pushing
going on. But note that this isn't Websocket. Because you opened
that page in your browser with an http:// URL,
http-kit magically used HTTP long-polling instead. It's a similar technology to
Websocket that was common before Websocket came along. To use
Websocket you have to use the ws:// URI scheme, which usually
won't work in your browser's address bar. We'll get to that in just a
minute.

The most interesting function is the ws function. When it gets a
request it assoc's it into the clients atom and tells us that
someone connected. You'll notice it also has an (on-close …) form in
which we tell it to dissoc that entry again when our user closes their
browser tab.

Besides that, the future form simply sends a small piece of JSON
every 5 seconds to all connected clients. I think the call to
(send! …) is pretty obvious except for the false part, which
tells the server to keep the connection open after sending our
message. For a plain HTTP connection like the long-polling one above,
send! by default closes the connection after it has sent a message.

Note that we are able to send! messages any time, as long as the
connection hasn't closed.
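The bookkeeping described in this section can be sketched without http-kit. Here keywords stand in for channel objects, and send-fn is a placeholder for whatever actually delivers a message; the names connect!, disconnect! and broadcast! are made up for this sketch.

```clojure
;; Connected clients, keyed by channel. Keywords stand in for
;; real channel objects in this sketch.
(def clients (atom {}))

(defn connect!
  "Remember a newly connected channel."
  [channel]
  (swap! clients assoc channel true))

(defn disconnect!
  "Forget a channel, e.g. from an on-close callback."
  [channel]
  (swap! clients dissoc channel))

(defn broadcast!
  "Call send-fn with every connected channel and the message."
  [send-fn message]
  (doseq [channel (keys @clients)]
    (send-fn channel message)))

;; Usage: two clients connect, one leaves, then we push a message.
(connect! :alice)
(connect! :bob)
(disconnect! :alice)
(broadcast! (fn [channel message] (println channel "<-" message))
            "{\"happiness\": 7}")
```

Keeping the clients in an atom means connects, disconnects and the periodic broadcast can all happen from different threads without extra locking.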

4 Front end

Now that we are successfully pushing all of that happiness data
around, we can finally represent it in a neat little chart. In the
last section, we found that it's not possible to open a Websocket
connection like we usually open up HTTP connections. The way we
usually open a Websocket connection from inside a browser is using the
Websocket Javascript API, like this:

var socket = new WebSocket("ws://localhost:8080/happiness");

And that will open a Websocket connection instead of an HTTP one.
Then, you can tell Javascript what to do with the messages it
receives:

2012-10-17

In my previous post, I gave a pretty quick introduction to Clojurescript. If you haven’t already, I recommend you read through that post. This post assumes that you have some Clojure knowledge and already have Leiningen running.

In this post, I’ll show how to create a SQL database-backed Clojurescript app (you were expecting NoSQL, weren’t you?). For lack of a better idea, I’m going to walk you through building a trivial app that helps keep track of books you’ve read. You can view the source code for the app on Github.

The Setup

We’ll use Noir as the back-end (with Hiccup generating the HTML). On the front-end, besides Clojurescript itself, we’ll use a Clojurescript library called Fetch, which makes client-server communication (as in AJAX) really easy, and another one called enfocus for DOM manipulation (mainly stuff like event-handling). For dealing with the database we’ll use clojure.java.jdbc. To compile our Clojurescript we’ll use a Leiningen plugin called lein-cljsbuild.

So, first create a Noir project called books (I’m assuming you’re using Leiningen 2):

lein new noir books

Now, let’s add some dependencies and some Clojurescript-specific settings to our project.clj:

If you’ve gone through the first post, this should be pretty self-explanatory.

Adding a database

The first thing we’re going to do is set up our database. For the sake of simplicity, in this post I’ll use SQLite; however, I think it’s safe to advise you guys not to use SQLite in production. Anyway, you’ll also need to add [org.xerial/sqlite-jdbc "3.7.2"] to the list of dependencies.

Pull in the newly added dependency using lein deps, then create a file in src/books/models called db.clj. To that file add:

The add-book function does exactly what you’d expect and the code should be pretty easy to understand. The argument to the function should be a Clojure hash-map, so a call to that function would look like:

(add-book {:title "Clojure Programming" :review "Great book. I really need to work on completing this one, though."})

The db-read-all function pulls all entries from the :books table and returns a vector of the entries.

Views

Now, we’ll work on our views. Open src/books/views/welcome.clj to edit it. This is what it should look like:

The most important part of this is the defremote definition. It’s defining a fetch remote, which simply calls the add-book function from the books.models.db namespace that we defined above. The little println call is simply there to help us see in a short while whether our program is working.

Client-side

Now, we finally get to writing some Clojurescript code. Create a new file at src/cljs/main.cljs and type the following into it:

In the namespace declaration you’ll notice that we’re bringing stuff into our namespace from the Clojurescript libraries that we talked about in the beginning: Fetch and Enfocus. You’ve already seen how the server side of our Fetch remote works; now you’ll see how the other half of it, the client side, works.

Starting from the top, the two functions get-book-title and get-book-review use enfocus to extract the value of the “title” and “review” fields in the browser. Read the enfocus docs to find out exactly how that works.

The function get-book-data simply puts the title and review into a Clojure map and returns it. push-book then pushes this map to the remote function we defined in our welcome.clj file.

The next block of code sets up a listener that calls the push-book function if the submit button is clicked. And the last line loads this listener when the web page loads.

Compile the Javascript using lein cljsbuild once and make sure you’ve added the Javascript file to your template (in common.clj). If you visit the page in your browser now, you should see the form as expected. Fill in the title and review and hit “Submit”. And what happens? Nothing! Well, actually something does happen. If everything worked fine, the little println call in our remote function should have printed out some text in the process where you’re running the Noir server. Also, if you try running the db-read-all function we defined, you should see that a book was in fact added when you hit “Submit”.

Congratulations! You’ve created a Clojurescript application backed by a database. I know it’s a really trivial app, silly even, but I do hope this post helped at least a few people get started with Clojurescript. And if you are interested in moving forward with this app, here are a few thoughts:

Show a list of the books already added. Should be quite trivial to add using the db-read-all function.

2012-10-14

There doesn’t seem to be much written about running Clojurescript, especially considering how great a tool it really is. I know there is a book that’s coming out soon, but I had some trouble getting started with Clojurescript so I decided to put together this post, which hopefully at least some of you will find useful. This post does assume that you have some knowledge of Clojure and that you’ve got Leiningen already running.

To those not familiar with Clojurescript, it’s a Clojure compiler that targets Javascript. This simply means that it turns Clojure code into Javascript. In that respect it’s like Coffeescript, which also compiles to Javascript. To find out why you might want to use Clojurescript (and Clojure) check out this talk.

Getting started

As I said, you need to have Leiningen installed. For this post, I’ll use Noir as the backend for a really simple app that doesn’t do much. However, I’ll show how you can have the app’s client and server side communicate with each other, which’ll make use of Noir. So, we’ll just start off with a Noir project:

If you’re using Lein 1:

lein plugin install lein-noir 1.3.0-beta3
lein noir new cljsintro

And if you’re running Lein 2:

lein new noir cljsintro

Great! Now if you cd into your Noir project and do lein run, your app should run and you should be able to see the default Noir page when you visit http://localhost:8080 in your browser. Nothing special there. To compile our Clojurescript, we’ll use the lein-cljsbuild plugin. To do that, you need to add a couple of things to your project.clj:

We’ve added 2 main things to the default project.clj: :plugins and :cljsbuild. The :plugins part is pretty self-explanatory: we just added the lein-cljsbuild plugin to our project. The second thing that we added, :cljsbuild, gives the plugin the configuration necessary to compile our Clojurescript code. Let’s take a look at the configuration. Our :builds sequence contains only one map, which means that we want all our code to compile with the same settings. Inside :builds, the :source-path tells the compiler where to look for the Clojurescript source files. And the :output-to tells the compiler where to put the compiled Javascript file.

Before talking about optimizations, let’s get :pretty-print out of the way. It’s pretty simple: setting it to true will cause the resulting JS file to have pretty-printed code, and setting it to false will not. Now, to talk about optimizations: Clojurescript is compatible with something called Google Closure (don’t confuse Closure with Clojure), which optimizes Javascript code. I’m really not familiar with Google Closure, but apparently it’s really powerful and will help your code load and run faster. You can set :optimizations to three possible values: :whitespace, :simple and :advanced. Here, we have set it to :whitespace, which is the most basic level of optimization, but you can set it to :simple or :advanced when pushing code to production.
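To make the shape of that configuration concrete, here is a sketch of it written as plain Clojure data. The keys are the ones discussed above; the two paths, the :compiler nesting and the name cljsbuild-config are assumptions for illustration, and newer versions of lein-cljsbuild may expect different keys, so check the plugin’s README for the version you use.

```clojure
;; Sketch of the value that would sit under :cljsbuild in project.clj.
;; Both paths are assumed, not copied from the actual project.
(def cljsbuild-config
  {:builds [{;; where the compiler looks for .cljs source files
             :source-path "src/cljs"
             :compiler {;; where the compiled Javascript file is written
                        :output-to "resources/public/js/main.js"
                        ;; most basic optimization level
                        :optimizations :whitespace
                        ;; keep the generated Javascript readable
                        :pretty-print true}}]})

;; Only one build map, so all code compiles with the same settings.
(count (:builds cljsbuild-config)) ;; => 1
```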

Clojurescript-ing

We’ve told the compiler that all our Clojurescript is to be found at src/cljs, so you’ll need to make that directory. Also, before writing any Clojurescript, let’s make a few changes to the Noir app. Open common.clj inside /src/cljsintro/views and make a few edits:

I’ve made two changes to the default template: on line 3 I’ve added include-js, which we use on the last line to include main.js in our HTML pages. Note that you didn’t have to type in the resources/public part of the path to the js folder because Noir is already looking there for static files.

Now, finally, let’s create a file inside the cljs directory called main.cljs and add the following:

(ns cljs.main)
(js/alert "Hey, there")

That’s the Clojurescript equivalent of Javascript’s alert("Hey, there");. To compile it run

lein cljsbuild once

which will compile the code just once. Alternatively if you do lein cljsbuild auto, the compiler will watch for changes in the source-path and re-compile when a change is made.

Run the Noir app with lein run and if you visit http://localhost:8080/welcome you should see an alert box. Cool.

DOM

A lot of people use Javascript for manipulating the DOM, that is, for things like making something happen when a button gets clicked. You can do all of that stuff with Clojurescript. There are a couple of libraries available, like jayq (which is a jQuery wrapper), domina and enfocus. I’ve personally used enfocus because it’s better documented than the other two. These are pretty easy to use.

Go, fetch

At the beginning I talked about making the client and server sides of our app talk to each other. Now, let’s do that using a neat library called fetch.

The first thing we’ll need to do is add fetch as a dependency. Strangely enough, fetch’s Github Readme page doesn’t say what the latest version is, and I had to go to project.clj to find out. At the time of this writing it’s “0.1.0-alpha2”, so add [fetch "0.1.0-alpha2"] to :dependencies.

Now recompile the Clojurescript code and refresh your browser, and you should be able to see the result of adder applied to the numbers we provided in a JS alert box. This is nothing special, as we could have defined adder in the Clojurescript code itself, but the same principle can be applied to functions that need to run on the server.

Hope you found this post useful; you can shoot out any questions on Twitter @samratmansingh or email me. Some resources that you might want to check out: