Tuesday, November 15, 2011

Clojure/conj: The Talks

The previous mammoth post was about the questions and answers I found at the Clojure/conj conference and training. This mammoth post is my notes from the talks.

Thanks to my employer, Canonical, for the opportunity to go to the conference. When Canonical gives employees time to go to a conference, we have to summarize it. The summaries are often company-internal emails, but I like blogging about them. So here we are.

I went to all of the officially scheduled talks. With the exception of one presentation that I simply didn't follow largely because of my own limitations, I enjoyed them all. For most talks, the slides are available in a git repository and the videos will be up soon (presumably on the conference website?), so you can go and get the information you want yourself if you like.

After reading books, doing Clojure koans, and solving on-line Clojure exercises, what should you tackle next? A project! Stuart gave some ideas of interesting Clojure corners that might lead to cool projects. I took a number of notes, but I think the slides are the best summary if you are interested.

Among many other ideas, Stuart talked about how Clojure's reader could be used to make a data serialization format. He described the parts of Clojure that would be used to do this (the reader and the printer).

The abstract serialization in the reader that he mentioned ("#=(...)") also struck me as the tool necessary to potentially serialize anything, including references to functions. That would then give the building block equivalent to Python's pickle, from which an object database could be built....
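The plain-data half of that round trip needs nothing beyond the printer and reader; a minimal sketch (the `#=(...)` form would only come into play for evaluatable things like function references):

```clojure
;; Print a data structure to a string, then read it back.
(def original {:name "conj" :year 2011 :talks [:clojurescript :cascalog]})

(def serialized (pr-str original))   ; e.g. "{:name \"conj\", :year 2011, ...}"
(def restored   (read-string serialized))

(= original restored)                ; => true
```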

David's project was interesting--he's dividing up a SPARQL query across databases, and reassembling the results as they stream back. The stream processing is via a complex tree pipeline that can split and join; the assembly is all done in a single process using a fork-join pool, and he was presenting a DSL and tool for describing and building that tree.

Interestingly, he can interpret the same code tree describing a given processing pipeline in one of three different ways. He can process it to define the node structure, he can process it to convert the description into code for each node to run, and he can process it to define some kind of records that are generated.

He accomplishes this either via a macro namespace trick, or a more straightforward use of a zipper to walk around the code tree and build the desired result. The macro trick is cool: he has different namespaces for the different interpretations of the code tree, each providing the same function names; and his code switches the namespace of the code to be processed depending on what result he wants. It was the kind of trick that he said he couldn't decide if he preferred or not, but it's a neat idea.

Anthony is among the more impressive 17-year-olds that I've seen. He talked about the code he uses to provide the sandbox that lets anonymous web users more-or-less safely evaluate Clojure on his JVM. You can see this by playing around with Clojure on the web at Try Clojure, or by solving some Clojure puzzles on the web at 4Clojure, or by interacting with Clojure on IRC via lazybot.

This was an interesting talk about the internals of ClojureScript. In my notes I have a couple of random points that struck me.

[As with adapters,] using Clojure protocols is probably safe from conflicts with other library code if you own the data structure or the protocol. If you are a middle-man, gluing one person's data structure with another person's protocol, the likelihood is somewhat higher that conflicts will occur. [This observation struck me as profound when I heard it, but in retrospect I'm a bit less convinced. :-)]
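A sketch of the "safe" case, with hypothetical protocol and method names: we own the protocol, so extending it to someone else's type can't collide with anyone else's extension of their own protocols.

```clojure
;; Hypothetical names, for illustration only.
(defprotocol Renderable
  (render [this]))

;; Safe: we own Renderable, so extending it to String (someone
;; else's type) cannot conflict with other libraries' code.
;; The risky middle-man case is extending a protocol we don't own
;; to a type we don't own.
(extend-type String
  Renderable
  (render [s] (str "<span>" s "</span>")))

(render "hello")  ; => "<span>hello</span>"
```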

The data structures in Clojure copy on write up to a certain size (32 elements, I heard later), and then begin to use the clever code to share structure. The JavaScript code always copies on write, at this time. That might adversely affect the performance of certain applications.
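Whichever strategy the implementation uses underneath, the user-visible semantics are the same: updates return new values and never disturb the original.

```clojure
;; Updates return new structures; the original is untouched.
;; On the JVM, large vectors share most of their 32-element
;; chunks with the updated version rather than copying.
(def v1 (vec (range 100)))
(def v2 (assoc v1 0 :changed))

(first v1)  ; => 0        (v1 is unchanged)
(first v2)  ; => :changed
```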

Lunch

I had a nice conversation with Frank Gerhardt. A couple of interesting points came up.

The nice tools Clojure provides for thread-safe code are not inherently useful tools for using the resources of multiple machines. For applications that must spread across multiple machines anyway, would it always be simpler/better to have multiple single-threaded processes all communicating in the same way, ignoring whether they share a machine or not? It seemed so to me, on the face of it. If so, the limitations of the Python GIL are unimportant for these sorts of use cases.

Clojure's STM does not have any conflict resolution tools, as far as we could tell, beyond commute. That, like the inability of other code to participate in the transaction that I mentioned in the previous post, limits the utility.
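For reference, here is what that one tool looks like: `commute` declares an update order-independent, so concurrent transactions touching the same ref don't force each other to retry.

```clojure
;; `commute` tells the STM the update is order-independent, so two
;; transactions incrementing the same ref need not retry each other.
(def counter (ref 0))

(dosync (commute counter inc))
(dosync (alter counter inc))   ; `alter`, by contrast, retries on conflict

@counter  ; => 2
```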

Something I learned at the conference is that Scala and Clojure, as two functional languages on the JVM, share several goals, and are able to share a number of ideas for their persistent (read "persistent" as immutable and structural-sharing-on-write) data structures. Phil Bagwell works with the team developing Scala. He wrote a paper years ago that was the basis for Rich Hickey's implementation of persistent vectors. At this year's conj, he mentioned some other cool persistent data structures.

The one he spent the most time talking about was one that could concatenate two vectors at essentially constant time, which is not much of a trick for linked lists but very nice for an array deque. See the paper I linked to for information on it. Other goodies he mentioned in passing were collections for parallel work (as in Java's fork-join) that allowed work to be stolen from other parts of the tree without conflict; and, if I took my notes correctly, non-blocking resizing concurrent hash tries, which will shrink memory usage as they shrink in size, unlike other similar data structures.

He also spent a bit of time at the beginning of the talk talking about Clojure advocacy. He emphasized that it is important to reduce the perceived risk of a new technology. He showed an (as far as I could see, data-less) graph indicating that people generally consider new technologies to be riskier than they really are before they have been integrated into a company or community; and that people generally consider formerly new technologies to be more valuable than they really are after they have been accepted and integrated. I may be misrepresenting that, but that's how I read it. It appeals to my cynical side.

A lot of people in the Clojure community are excited by core.logic, which is essentially Prolog reimplemented in Clojure. It is based on work described in The Reasoned Schemer, the authors of which were in the audience. They gave what was reported to have been an amazing after-hours talk. I understand that it was videotaped; I intend to watch it when it comes out.

Ambrose did a very nice job. I don't have much to add that you can't get better by following the link I gave above. I've played with stuff like this years ago and I'm a bit skeptical of its practical utility, but I'm interested enough that I plan to set aside some time and give it a whirl.

I was blown away with this presentation. I learned more practical information from other talks, but this was one of the highlights of the conference for me.

In part, it was because he explained the scope of the problem that genome researchers are trying to solve in a way that seemed clear to me, and it was so grand, and so promising. He described the problem as trying to solve a substitution cipher in a message written in a language for which you have a dictionary of only 10% of the words, no grammar, and a majority of irrelevant, garbage characters mixed in between the words. If we get it right, we learn how to grow hearts and lungs.

The presentation was also a highlight because the presentation was so visually rich (this was among the more compelling Prezi presentations I've seen).

And finally, the approach he took to solving the gigantic problem he had was interesting, and reportedly worked well. I linked above to the description he gave me of the "quality threshold clustering algorithm." He's starting a company now with the belief that his process can be usefully applied to a number of problem sets. I don't think I work on those problems right now, but I bet more of them exist.

I was going to dig into the analysis steps he described, but as I started to, I realized I simply didn't really understand them well enough for it to be of value. He gathered possible results, filtered and refined them, ranked them, removed overlapping results, and had his answer. Said at that high a level, it sounds like good advice for many generic data-informed activities, such as buying a car, so I have to assume I'm missing some secret sauce in there. :-)

Stuart Halloway

Ousterhout's Dichotomy Isn't

Slides are not up yet as of this writing.

Clojure's community sometimes represents Clojure as an optimal combination of simplicity, power, and focus. This talk was about the "power" part of the triumvirate.

Clojure is good at "scripting" and "system" tasks, despite being a dynamic, interpreted language, and so it is (another) example showing that Ousterhout's Dichotomy is arbitrary.

What is a "powerful" language? He argues that power is work per unit time, and so a language that gives you concision and simplicity to produce good things quickly is a powerful language.

The W3C has some advice to choose the language with the "least power." Given the definition of power as work per unit time, this seems silly. [That said, I looked at the document, and it actually seems like it's the start of a reasonable idea to me. Their definition of "power" is something akin to "complexity" though, because they regard imperative languages as more "powerful" than functional languages, and prefer functional languages for this reason.]

Making Clojure development a bit easier is a way to bring more power, because it can make people work faster. He'd love to see better error messages, more sample code, findable libraries, automated deployment, and a beginner-friendly environment. Much of this can be done in libraries, and he encourages developers to work on it. [+1000! And yes, I know I am an encouraged developer.]

Neal Ford

Neal's Master Plan for Clojure World Domination

Slides are not up yet as of this writing.

Neal's talk centered around how to get Clojure into the "enterprise"--big companies with lots of money.

How do things become popular in enterprises? "Rain down or sprout up." Mollify CTOs' fears by becoming pervasive and a politically acceptable alternative, or make it easy for developers to bring you into the company by having a vehicle that carries the technology in on the back of some desired feature.

How do you make something popular? Propagandize, decide on a message and stay on it, and stay relentlessly positive.

How do you build a bridge to popularity? Fight the status quo, and recognize that people who are also disrupting the status quo are competitors, not enemies.

David Nolen

Predicate Dispatch

Slides not up yet as of this writing.

We have core.logic (Prolog-in-Clojure) and core.match (pattern matching), so we ought to be able to have a very efficient predicate dispatch instead of Clojure multimethods. The principal advantage is openness: users will be able to adjust predicate dispatch outcomes more easily than they can with multimethods.
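For contrast, here is a sketch of today's multimethod dispatch, where the author fixes one dispatch function up front; predicate dispatch would let dispatch depend on arbitrary predicates over the arguments instead.

```clojure
;; Today: one dispatch function, chosen in advance by the multimethod's author.
(defmulti area :shape)
(defmethod area :circle [{:keys [r]}]    (* Math/PI r r))
(defmethod area :square [{:keys [side]}] (* side side))

(area {:shape :square :side 3})  ; => 9

;; Predicate dispatch would generalize this: methods could match on
;; arbitrary predicates over the arguments, not just this one keyword.
```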

David outlined some of the challenges and some possible solutions. Rich Hickey (Clojure creator) replied that he is in favor of some of the solutions, and that we ought to do it. People are happy. Sounds good to me!

Kevin appears to have been the only person at the conj with production code using ClojureScript (Clojure compiled to JavaScript). He reported on how it was to use the tool.

Using ClojureScript instead of JavaScript is like using the metric system instead of the imperial measurement system: it's simpler, easier, and more fun.

He objects to the characterization that "only geniuses use functional languages": there are simple rules to follow, and he found them easier.

Interop with JavaScript is interesting because sometimes you want to get references to methods. (.aMethod js/an_object) gives you the reference. (. js/an_object (aMethod)) calls the method.

A JavaScript literal is coming as #js(...).

The "thread-first" macro ("->") works well with JavaScript APIs that chain together. "item.select(...).append(...).text(...)" becomes "(-> item (.select ...) (.append ...) (.text ...))". Moreover, the "thread-first" syntax lets you feed in other functions nicely, unlike the native JavaScript spelling. For example, "(-> item (.select ...) (myfunc) (.append ...) (.text ...))" has inserted "myfunc" to operate on the item (or even create a new one) and then the rest of the call proceeds as before.
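Laid out as a block (method arguments are hypothetical), the chain-plus-ordinary-function pattern looks like this:

```clojure
;; JavaScript: item.select(...).append(...).text(...)
;; The same chain via the threading macro, with a plain Clojure
;; function (hypothetical `myfunc`) spliced into the middle:
(-> item
    (.select "g")
    (myfunc)          ; ordinary function in mid-chain
    (.append "rect")
    (.text "label"))
```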

Using the same language on client and server is very nice, but it's a lot nicer when that language is Clojure rather than JavaScript (e.g. Node.js).

ClojureScript + Google Closure compiler + a JavaScript library + your code makes for a lot of steps, and when something goes wrong inside, it can be hard to debug. The "this" binding problem bit them.

Final thought: Tools don't make the artist. We sent people to the moon with sliderules and the imperial measurement system. Do good stuff with what you have.

I know from having looked at blog posts of his that Christophe is a programmer to learn from. However, I had a lot of trouble understanding him. This was the one talk that I really simply didn't follow. I intend to look at his slides and try again.

I think Launchpad's Technical Architect, Robert Collins, would agree violently with the main thrust of this presentation. He's been working in this direction quite a bit lately.

Mark pointed out that one of Clojure's strengths is viewing everything as data, so that everything is simple abstractions with a general API. Code is data, HTTP requests are data, HTML structure is data, SQL queries are data, and so on.

Our logs are data too, and yet logging is usually synonymous with writing to a file. We should instead focus on events, data, generality, and openness. Log crunching should become event processing, with events aggregated and consumed by many possible services.
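A sketch of what "logs as data" might look like in Clojure terms (field names are hypothetical): an event is just a map, so any consumer can aggregate it with ordinary sequence functions rather than regex-parsing a file.

```clojure
;; A log "line" as data: a plain map any consumer can process.
(def event {:at      (System/currentTimeMillis)
            :service :web
            :level   :info
            :msg     "request served"
            :ms      42})

;; Consumers filter and aggregate with ordinary sequence functions:
(->> [event]
     (filter #(= :info (:level %)))
     (map :ms))   ; => (42)
```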

Heroku's pulse is an example. It is a real-time metrics service. The demo was very cool, with lots of tiny live graphs. :-)

I asked if they were using a message queue for their event aggregator (which is what Robert is doing). They wrote their own, because they said they did not need or want routing. That seems questionable on the face of it, but I have no idea what the problem really looks like from the inside.

Chas Emerick

Modeling the World Probabilistically Using Bayesian Networks in Clojure

Bayesian networks are a nice kind of AI because there are no black boxes: the simplicity means that you can determine the cause of inferences, and affect them. The speaker explained how Bayesian networks work. He has been developing a Clojure-native Bayesian model named Raposo, though it is not released yet.

Daniel has done some amazing work to get Clojure working on Dalvik, the Android VM. The differences between Dalvik and the JVM do not sound trivial. Clojure can work on Android, thanks to his work, but Scala and Java are faster to start and lower in memory. He has improved this, but more work needs to be done.

Clojure has the potential to be a first-class Android language. Dynamic development on Android is a killer feature. It needs better community and tools.

Daniel is working on an O'Reilly book called "Decaffeinated Android," due out next year, about Scala and Clojure on Android.

From the Q & A, startup time of Clojure applications on Google App Engine is also an issue; it's possible that the optimizations that Daniel made for Clojure might be of interest for that purpose as well.

(I took a lot more notes on this one, but I suspect the slides are sufficient if you want more.)

Rich Hickey

Keynote (Areas of Interest for Clojure's Core)

Slides not yet available.

I took a lot of notes on this talk, but I'm just going to include some things that particularly interested me. If you want a lot of details, the video will be up soon enough, hopefully.

Rich is interested in giving Clojure multiple build targets, some providing leaner performance characteristics, some providing better debugging, and so on. I hope these are mix and match, at least to some degree.

He has been thinking for some time about a way to rethink transients, which are a way to have temporary mutables within a function for performance. They combine policy with basic behavior, and he wants to separate the policy out. He's considered "pods," which would handle threads and locks for you. That might still conflate too much in one concept. He discussed making "accumulators" but I didn't take good enough notes on how they might work.
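For context, here is the current transient idiom he wants to rethink: the mutation stays local to the function and the policy (single thread, freeze before escape) is tangled up with the mechanism.

```clojure
;; Transients today: local mutation for speed, frozen back into a
;; persistent value before it escapes the function.
(defn build-squares [n]
  (persistent!
    (reduce (fn [acc i] (conj! acc (* i i)))
            (transient [])
            (range n))))

(build-squares 5)  ; => [0 1 4 9 16]
```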

He agrees that Clojure's reader might provide a data format that could be superior to JSON and XML. To do that, we need to have the "X" in XML: extensible. We could maybe do that with a metadata tag describing how to read normal data structures. An example might be #instant "1985-04-12T23:20:50.52Z", which tags a string as something that would be parsed as an instant in time. Custom tags could be namespace-delimited. Configuration could describe how to map a particular tag to a particular data structure within a given application. Maybe this is a grass-roots way of getting Clojure in to some areas?
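A sketch of how such tags could work; this is my reading of the proposal, not a released API. The reader would map a tag to a parse function, so plain data gains typed extensions without new syntax.

```clojure
;; Sketch only: a tag plus an ordinary data literal.
#instant "1985-04-12T23:20:50.52Z"   ; reads as a point-in-time value

;; A namespaced custom tag might be mapped in configuration, e.g.
;;   my.ns/point -> (fn [[x y]] {:x x :y y})
;; so that
#my.ns/point [1 2]                   ; would read as {:x 1 :y 2}
```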

Cascalog (code) is a high-level abstraction above Hadoop map/reduce. It is tedious and verbose to express a problem in map/reduce. Cascalog's abstractions provide an opportunity for composition.

This looks like a very nice project. I like his emphasis on composition: you can define functions that compose parts of your query, to make a collection of helpers for your domain and needs.

Here's an example of a query to make a Hadoop map/reduce cluster print people younger than 30 to stdout (from some data set that I didn't clarify in my notes): (?<- (stdout) [?person] (age ?person ?age) (< ?age 30)). If I remember correctly, the (age ?person ?age) represents an implicit join. You can build your queries with predicates, filters, aggregators, generators, and sub-queries.
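The same query from my notes, laid out with the role of each clause labeled (it assumes an `age` generator defined over the data set):

```clojure
(?<- (stdout)             ; sink: where results go
     [?person]            ; output fields of the query
     (age ?person ?age)   ; generator: binds ?person and ?age from the data
     (< ?age 30))         ; filter predicate on the bound variables
```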

Nathan works at Twitter and recently made another splash with his Storm project: a Hadoop for real-time data, that allows for distributed and fault-tolerant real-time computation. For example, if you want to process tweets coming from Twitter and generate constant statistics from a flow of data, that's one of Storm's use cases. I found the introductory slides to be helpful in understanding the applicability. It is written in Clojure, but has the ability to connect to many languages, including Python.

In large part, this was a bit of a remedial review of basic functional data structures shared by Scala and Clojure. I had studied much of this before, but forgotten a lot of it, so I appreciated it. He covered linked lists, banker's deques, 2-3 finger trees, red-black trees, and the bitmapped vector trie that Phil Bagwell (Scala) and Rich Hickey (Clojure) worked on.

The most interesting thing to me was his discussion of how modern computer architectures affect the performance of these data structures. He used the phrase "locality of reference" to describe the idea that architectures optimize for arrays and loops by getting chunks of memory together.

Practically, this means that the bitmapped vector trie performs very well, because information is divided in memory into sequential chunks of 32 elements. The JVM or Java (unclear to me) does special tricks so that linked lists can take advantage of the same memory chunk optimization. 2-3 finger trees, while a very cool data structure and with good performance characteristics from a big-O perspective, are comparatively very slow practically because they cannot take advantage of this. I suspect the same design characteristics would lead to similar performance characteristics for Chris Houser's finger trees.

This talk focused on the "why" of Lisp macros. He began with this lesson (paraphrased, but hopefully in the right ballpark).

The apprentice came before the Macro master. The master asked, "what are macros good for?" The apprentice put his finger to his lips in thought. The master drew his sword and severed the finger. Before the apprentice could run away in terror, the master commanded, "Stop! I ask again, what are macros good for?" The apprentice controlled his shock and fear. Thinking, he tried to put his finger to his lips...and he was enlightened.

Nothing like a good finger-severing to catch my attention.

In Q & A, the speaker refused to explain the story. My interpretation is that macros let you do something that you could not otherwise do in any way. In any case...

He discussed possible use cases for Macros.

Binding forms and control flow are cited by others as use cases, but macros are not necessary (he cited a Ruby example).

Icing? For amusement, he showed a SchemeScala macro in use that interpreted code inside it similarly to Commodore-64-era Basic, complete with line numbers and gotos.

Anything in a language that is semantically useful can eventually become syntax.

Macros are good at giving first class control of second class data (e.g. Clojure metadata).
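One illustration of that point (my own example, not the speaker's): a macro sees the form itself at compile time, so it can reach data that a function, which only sees runtime values, cannot.

```clojure
;; A function can't accept a bare symbol and look up its var's
;; metadata; a macro can, because it operates on the form itself.
(defmacro with-doc [sym]
  `(:doc (meta (var ~sym))))

(defn greet "Says hello." [] "hello")

(with-doc greet)  ; => "Says hello."
```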

Lightning talks

Alex Redington talked about scheduled events and his Monotony project.

A Red Hat employee talked about Immutant, an application server for Clojure that they are developing. I captured these soundbites: "...Ruby in front, Clojure in back..." "inter-language messaging" "uniform deployment" "simple horizontal scaling"

Alan Dipert and John Distad have a (another?!) Lisp on an Arduino! They gave a quick demo of flashing lights, which was cool. AFAICT they actually had a REPL running on the Arduino, or at least controlling it.

Chris Granger talked about his Korma SQL query tool. I've seen it before and it looks interesting. Like Nathan Marz in his discussion of Cascalog (above), he emphasized composability as a key feature of why you would want to use Korma rather than, say, raw SQL, or some other tool. He also mentioned that Korma is divided up into two parts: code that assembles a data structure that represents the SQL, and code that converts the data structure to SQL. The clear delineation means that you can easily control the output by controlling the data structure, and you can use as much or as little of his actual syntax as makes sense for your purpose.
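A rough sketch of that two-part split, from my memory of Korma's API (the `users` entity is hypothetical): first build a query as a data structure, then compose further, and only render/execute it at the end.

```clojure
;; Build the query *data structure* first...
(def base (-> (select* users)
              (fields :id :name)))

;; ...compose further, and only turn it into SQL when executing:
(-> base
    (where {:active true})
    (exec))
```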

Craig Andera

Performance in the Wild: Faster, Baby

Slides are not up as of this writing.

This talk was the one that hit closest to home for me practically: it was about optimizing database-driven transactional web-based systems, and didn't have a lot to do with Clojure specifically. I thought he had a lot of good points, and I took a lot of notes.

Craig was describing a way of approaching performance engineering: a model-based optimization loop.

Models of your traffic are important, and always wrong. Performance is not a scalar quantity, and the world is not deterministic but stochastic. How wrong does your model have to be to not be useful?

When generating models, consider a distribution of latencies at a given load for a given transaction mix (reads, writes, tables accessed, etc.) Consider other constraints, such as how old cached content is allowed to be, under what conditions. Perform load testing with simulated users based on your models, and determine the slowest speed of your application for, say, 80% of requests, 90% of requests, and 98%.

Work within an optimization loop: benchmark, analyze, recommend, optimize, and benchmark again. Analysis may lead you to say you are done because you have met your goals; or coming up with recommendations may lead you to say you are done, because the remaining available optimizations are not cost-effective.

When benchmarking, understand what you are measuring, because your assumptions are often wrong. Start benchmarking early in the project, not late [this is my favorite piece of advice]. This can give you immediate feedback as to whether a particular change causes problems. Watch for errors--count them. They have experience creating load for benchmarks with jmeter, ab, and httperf.

When analyzing, empirically identify the biggest factor in the performance of your system using a profiler.

When optimizing, fix just one thing. Remember and recognize that a redesign may be necessary.

They have had success with always having two cards on the board, with one active at a time. One is for "Benchmark, Analyze, and Recommend" and one is for "Optimize."

This was a really fun presentation, and seemed to be a crowd favorite. He showed how he and other researchers are using Clojure to build Overtone. This project is trying to help make new ways for people and computers to collaborate on music, and in particular on live music performance. There's a parallel between this effort and computer-assisted chess games. He demonstrated a variety of live music making with tracks and synthesis, using a monome and using the Clojure REPL. It was really cool, and it sounds like a fascinating effort.

Things I missed

More lightning talks after-hours.

An after-hours presentation by Byrd and Friedman about logic programming. This was supposed to be fantastic; I hope to watch the video when it is released.

Some other after-hours unconference bits.

In summary...

This was a fantastic conference. I'm really glad I went, and I'd love to be able to go again next year.