Of late, I have been reading the Plan 9 papers from Bell Labs. The papers prompted me to go back and read some of the books written by the same folks at Bell Labs.[1] What is so astonishing is that the books written by these fine engineers are the clearest writing about programming that has ever been produced. These books almost never exceed 300 pages. Let me list some of the books here:

The C Programming Language, Kernighan and Ritchie.

The Unix Programming Environment, Kernighan and Pike.

The Practice of Programming, Kernighan and Pike.

Software Tools, Kernighan and Plauger.

C Traps and Pitfalls, Andrew Koenig.

The AWK Programming Language, Aho, Kernighan and Weinberger.

The Elements of Programming Style, Kernighan and Plauger.

Programming Pearls, Jon Bentley.

More Programming Pearls, Jon Bentley.

Compilers: Principles, Techniques, and Tools, Aho, Sethi and Ullman.

The Design of the UNIX Operating System, Maurice J. Bach.

I believe this covers the list related to Unix/C and other related tools. There could be more that I don’t know about.

At the risk of repeating myself, here are the salient features of these writings (and also of the Plan 9/Unix papers).

concise.

clear and unambiguous.

lots of clean code examples to illustrate the points.

wonderful exercises.

lots of real life examples.

The AWK book has an example showing a simple implementation of make, in ~5 pages. Tell me any other book that does that.

Why is it that these authors consistently wrote such high-quality books and papers? I believe it is because they built the things they wrote about first and were genuinely interested in sharing their work with others. They also strove for the highest quality in all their work, be it writing software or writing documentation.

Contrast this with the modern world. Within a few months of a new language, library or, better, a “framework” appearing on the Internet, a bunch of books get announced on Twitter, and Twitter handles are set up to announce updates to the books. Most of these are written by people whose sole intention is to get more search hits for their names in the popular search engines and who have not built anything big and “real world”. Some of these books discuss only very superficial examples and lack exercises and real examples. Some present-day authors like to fill their books with footnotes. If I were interested in history rather than content, I would rather look it up elsewhere on the web.

The Bell Labs books were all “Real World” books without attaching “Real World”[2] to their titles, as some current-generation books do in order to differentiate themselves from the non-real-world ones. This is a pity and is mostly the authors’ fault. In their rush to fame, the poor reader and her money and time have no place!

If you are a potential author, please read at least some of the above listed books and try to emulate them, please, for the sake of computer science!

[1] Reading these papers also prompted me to think about how much of an antithesis of Unix the “modern unices” have become.

[2] I didn’t mean anything bad about “Real World Haskell” or “Real World OCaml”. They are both fine books written by great authors with lots of experience on the topic, and it shows.

There are lots of books on the programming language Haskell, which, folklore says, has a steep learning curve. I am no Haskell expert, having embarked on the journey to learn Haskell myself two years ago and still very much learning. A friend of mine recently asked me for recommendations on Haskell books, which inspired me to write this post. Again, I am no Haskell expert, still very much a journeyman Haskeller.

Usually two books get recommended by almost everyone: Real World Haskell (RWH) and Learn You a Haskell for Great Good! (LYH).

There is no question that these two are great books. I have paper copies of both the books and still use them. I think everyone aspiring to learn Haskell should read these books, especially RWH. I am not a big fan of LYH though. I feel that the Haskell Tutorial or YAHT already covers everything in LYH in the same or better way.

There are some other books that I like along with RWH. One of the things I look for in any programming book that I intend to buy (investing my money) and read (investing my time, because I am serious about learning the material) is exercises. I feel that testing one’s understanding is extremely important, and good exercises of varied difficulty (good books indicate the level of difficulty of each exercise) matter a great deal to me. So, with that in mind, here are my recommended books on Haskell:

Actually I read the previous edition of this book, co-authored by Phil Wadler, but that book had its examples in Haskell’s precursor language, Miranda. I think just about every programmer should read this wonderful book for the clarity of its presentation.

Hutton’s book is probably what one should read (and work through) to get deep into Haskell. It has some great exercises as well. Some people compare this book to the K&R C book. For those experienced in other functional programming languages, this is a great book.

Writing real world programs

I found myself staring at my editor sometimes with just the below lines on it:

main =

The problem is that most of these books teach the purely functional part of Haskell beautifully well. While that is very important and requires a different mindset, especially if one has had prolonged exposure to imperative programming, a purely functional program can only warm the room by heating up the CPU. One needs to interact with the real world to get input in and output out. It is extremely easy to do I/O in Haskell. Just start using it without worrying about monads.

And that brings me to the next topic:

Do not learn monads by analogies. Do not read any monad tutorial that compares monads with anything else. Just start using them. Take some time to learn the type signatures of monads and start building programs with them.

If you really want to read a monad tutorial, I highly recommend these two papers.

Other misc resources

Other great resources are Don Stewart’s StackOverflow answers on Haskell and the #haskell IRC channel on freenode. I am not big into IRC, but I join the channel occasionally; it is one of the friendliest places to hang out with other Haskell programmers.

2013, like other years, was a mixed bag. For the first time in years, I had to witness the death of someone very close to me. I spent a lot of time in hospitals, in front of Intensive Care Units, slept on the benches on the hospital corridors, talked to many others like me who were anxiously hanging out at hospitals waiting to hear from the Doctors who made lightning visits, uttered a few words and left.

Some of the people went home alive, some unfortunate old and young didn’t.

On the positive side, I completed two courses on Coursera, Algorithms Part-1 and Algorithms Part-2. I really felt after completing these courses that I made a step forward in my own quest to become a better programmer.

I have no big ambitions for 2014. I just want to be a better human being, spend more time with family, do more of what I like and be with people I love.

I also hope that the world would judge people for what they are, rather than by the tags they carry (age, sex, qualifications, job…).

Ever noticed a piece of apparel that you looked up on one website showing up as an ad when you are browsing another website? What is going on here? How did a web page show you ads for products you looked at on a totally different website?

Partly this is the work of those Facebook “like” buttons and Google’s +1 buttons. Let us say you are logged into Facebook in a browser tab. Now you visit many other pages in other tabs. Some of these pages may have “like” buttons. Now, here is the deal: every time you visit a page, the browser makes a series of HTTP GET requests to fetch the elements (images etc.) on the page. Facebook knows from its cookies who you are. It also receives an HTTP GET request for the button along with this cookie, so it knows which website the button appears on and therefore that you visited that page.

Advertisers and their partners sometimes use cookies or other similar technologies
in order to serve and measure ads and to make their ads more effective. Learn more
about cookies, pixels and similar technologies.

Here are a few plug-ins I use with Iceweasel (that’s the name of the popular Firefox browser on the Debian GNU/Linux system) that help make web browsing a pleasant experience.

1. Adblock Edge

Adblock Edge (ABE) is a fork of the excellent Adblock Plus (ABP). Adblock Plus sold out to ad companies like Google and included a bunch of ads in its whitelist. ABE is a fork from before they made the change. I guess we are indebted to the ABP author for the great contribution. ABE with the “EasyPrivacy” and “EasyList” filters can make the web browsing experience a lot nicer! To see the difference, try browsing a few popular websites with and without ABE for a day.

2. HTTPS Everywhere

HTTPS Everywhere is a plugin that forces the HTTPS protocol wherever it is available, for safe and secure browsing. Most websites that require one to log in (email, banking etc.) implement HTTPS. But some still don’t, and others offer both HTTP and HTTPS. In the latter case, this plugin forces the use of HTTPS.

3. Duck Duck Go search widget

I have been trying to move away from Google for most of my daily browsing needs, including search. Duck Duck Go’s search quality has been improving steadily and is very much usable for most purposes. DDG explicitly has the privacy of its users as one of its goals. It is a company like Google, so it can change its policies (the way Google did with the “don’t be evil” goal). So, watch out. Until then, enjoy DDG. Unlike Google, DDG does not wrap URLs in the search results with a redirector to track clicks.

4. Greasemonkey + NoScript

It is interesting to see the amount of code we execute on our machines without explicitly invoking a program. Every webpage includes a number of JavaScript files which get downloaded and executed when we visit it. What do those JavaScript files do? Some of them are libraries like jQuery. Some are there explicitly to track users (like the Google Analytics scripts). We, the users, should have control over what runs on our machines, and tracking should be opt-in rather than opt-out.

It is also well known that a user can often be uniquely identified from the browser’s fingerprint, of which the user agent string is one part.

A number of websites work quite nicely without any JavaScript at all. GMail has a mode which works well without JavaScript. But unfortunately many don’t work well (like Amazon.com, for instance). But with NoScript, one could make this experience less painful.

5. RefControl

Every time one clicks a link on a webpage, which takes us to another page on the same website or to a different website altogether, the HTTP request also carries a Referer header (the header name really is spelled that way) which tells the website where the request came from. This is a crucial piece of the puzzle in constructing a graph of anyone’s web browsing habits. We can turn off those Referer headers with the RefControl plugin.

6. Disconnect

There is yet another privacy plugin called “Disconnect” that promises to keep trackers (Twitter, Facebook and G+ buttons, cookies etc.) away. Since I use it in conjunction with other plugins, I don’t know how well it works. It looks like Disconnect is some kind of well-funded company.

Apparently there are many in this category being developed by funded companies like Ghostery, DoNotTrackMe and so on. I used Ghostery and DoNotTrackMe in the past. But currently I use Disconnect as its code is freely available.

7. Other Misc settings

A few other tips:

Turn on the private browsing mode in the browser if you don’t want to store history. Some people like to keep the history to make browsing easier; it has its own merits and demerits. I visit Facebook only in a browser in private browsing mode. This is not enough: one also needs to make sure that no other websites are visited while the Facebook page is open in a tab. One need not worry about logging off. When a browser in private browsing mode is closed, no cookies are kept, so the “like” buttons on other websites cannot track one’s identity afterwards. (Remember, they can still profile a user based on the user agent string.)

I also clear history and cookies when I quit the browser. This can be set up on Firefox preferences.

Turn on the “Do Not Track” option. Both Firefox and Chrome have this option, but it may not be on by default, so make sure that you turn it on.

Use a browser that has its source code published as Free Software. This means, Firefox or variants, Chromium, or one of those webkit derivatives like Epiphany. Note that Google Chrome is not Free Software but Chromium is. Mozilla is a non-profit corporation and I trust them more with protecting the web users than a for-profit corporation that explicitly wants to know everything about everyone.

Google has access to your emails (isn’t it ironic that they filter out email spam and then show you spam in the form of ads on the side?), your likes, dislikes and opinions, your location and also your DNA. They also want to know what you see, and to track your eye movements on the screen and elsewhere. The Moto X phone from Motorola/Google reportedly has its microphones on all the time to take voice commands. This is the stark new reality: in the name of convenience, people are enticed to give up their privacy.

The Tor onion router is one of the best guards against censorship and tracking. There are many ways to use Tor along with Firefox, at the cost of a bit of latency. I like to use the OS distribution called Tails on a USB stick when browsing from an internet cafe. Tails is a special GNU/Linux based distribution that can be installed on a USB stick and has a bunch of privacy tools built in, including a special version of Firefox with the Tor button enabled.

Turn on the “Block pop-up” windows option to block the annoying popups.

Install only those extensions that have their source code published. It is a bit hard to find that out from the Firefox add-on page. One has to go to the specific page for an add-on and look under “Version Information”. Choose only those extensions that are made available under a Free Software license. Remember that the browser is a very critical piece of software used by everyone in their daily workflow, and it is extremely important that we don’t leave it to others to decide on issues related to privacy.

YouTube has become as annoying as the regular idiot box these days, with a lot of ads before and in between the videos. I use YouTube Center to get rid of them; it also gives me a few other features, like downloading videos for offline viewing. It is not related to privacy per se, but it helps make watching YouTube videos a better experience. It is highly likely that YouTube will break this extension at some point by changing its protocol so that the ads show again, and the developer will have to play catch-up.

There is another Firefox plugin called RequestPolicy that can catch cross site requests. It is recommended for security paranoids. It gives information on the connections made by a website into other domain names (eg: http://foobar.org making connections to Google Analytics website). These connections are reported and can be blocked as well.

If you are conscious about your privacy on the Internet (which every Internet user should be), you should read the articles on the Electronic Frontier Foundation website.

The powerset of a set S is the set of all subsets of S. For example, the powerset of {a, b, c} is { {}, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c} }. For any set of size n, the powerset has size 2^n.

A program to find the powerset is easy to write if we can visualize the powerset. A simple inductive way to think about it is that each subset of S either contains a given element of S or does not. We apply this rule recursively, one element at a time, the base case being the empty set, whose only subset is itself. This can be visualized as a binary tree as shown below.

Members of the powersets are the leaves of this tree. You can now easily come up with a relation:

In other words, take the powerset of S without its first element; let us call this set S’. Now, for each element x in S’, find the union of the first element with x. This is the first part of the powerset. The other part consists of the subsets that do not contain the first element, and this is just S’, which we have already computed. The final answer is the union of the first part and the second part.

Compiling a C program on a GNU/Linux system involves a lot of magic under the hood. One of them, which is taken for granted is that the kernel version running on a system can be different from the version of kernel header files used to compile a program. The Linux kernel developers work really hard to give this guarantee to the userspace programs. Read on for a case where that guarantee got broken.

ioctl

ioctl(2) is the standard Unix way of controlling a device file from userspace. For example, let us say, for debugging, we want to read and write some registers from an i2c device. One of the ways to do this is to provide an experimental ioctl command to read/write the registers.

The ioctl call in the userspace has the following prototype:

int ioctl(int fd, int cmd, ...);

The driver API is usually implemented using a table of function pointers. The ioctl function pointer API is a little different from that of the userspace API but for this discussion, that doesn’t matter. The key point is that the second parameter cmd is passed unchanged into the kernel ioctl function call.

What is cmd?

cmd is the ioctl command code. cmd can be thought of as a 32-bit bit-field derived from a few other things to make it unique. Here are some things used to define command codes:

a magic number (defined by the kernel for each subsystem in Documentation/ioctl/ioctl-number.txt).

a sequential number that the programmer assigns for the code.

type of the command (is it a read or a write or a read-write command?)

size of the data being read/written.

These four pieces of information are combined into the bit-field by the macro _IOC.

The bug

I am writing a video4linux driver for an HDMI input device. Unfortunately, this is supposed to work with a two-year-old kernel (v3.0) shipped with the Android Jelly Bean release running on a TI OMAP4 device. For some reason, the kernel headers shipped with AOSP are a bit different from those in kernel version 3.0.

The particular control code of interest to me is the VIDIOC_DQEVENT, which is defined as follows:

#define VIDIOC_DQEVENT _IOR('V', 89, struct v4l2_event)

I have the following code snippet in a simple userspace application (not showing the entire code here):

I observed that the select is succeeding but the ioctl call with the command VIDIOC_DQEVENT was failing with an errno ENOTTY. A bit of grepping in the driver source revealed that the ENOTTY is coming from my own driver’s default handler. This means that the switch statement didn’t succeed with the command code we passed. That was strange! This clearly showed that VIDIOC_DQEVENT has different values in kernel and userspace! Printing its value made it clear that this was indeed the case.

A bit more printing revealed that struct v4l2_event which is used to calculate the control code VIDIOC_DQEVENT has a size different by exactly 8 bytes in userspace vs that in the kernel. This was very strange because this indeed means that kernel ABI guarantee is broken.

The kernel header file include/linux/videodev2.h has the struct v4l2_event defined as follows:

Now comes the interesting part. Notice the union u in struct v4l2_event? The largest element in the union is a 64-byte array. If you do the math, you can see that no other element in the union should exceed this size, so even though userspace has some extra structures in the union, in theory we are not going to exceed 64 bytes. But struct v4l2_event_ctrl has another union inside it which holds a 64-bit value.

The compiler decided to align this value at a 64 bit boundary and also align the reserved array by another 4 bytes, resulting in a struct v4l2_event_ctrl with size increase of 8 bytes and this exceeds 64 bytes, making it the largest element in the union.

The fix

I fixed it in my system by copying the relevant portion of the userspace header into the kernel header so that the struct v4l2_event definitions match. I could do that because I know that there is no other user of the Video4Linux events in my system.

C, being a portable assembler, arranges these data sequentially in the memory and aligns them appropriately. What if you want to find the offsets from the base of each of the element?

This macro does the trick. There are other more explicit ways to calculate it. But I found this macro very neat.

#define OFFSET(x, y) ((size_t)&((x *)0)->y)

We use it this way:

size_t offset;
offset = OFFSET(struct T, bar);

How does this work? The idea is based on the fact that if the structure is put in memory starting from address 0, a pointer to the element inside the structure is also the same as the offset, since the base address is 0. So, we cast an integer (0 in this case) to a pointer pointing to the structure and just find the offset to the element whose offset we are interested in. The address of that element should be the same as offset, as the structure is assumed to be laid in the memory starting at address 0. That’s it.

The Linux kernel defines a similar macro called offsetof in include/linux/stddef.h:

I have been doing a few MOOC courses in the past few weeks. Given that I have to juggle family, work, commute and the courses, I decided to put the brakes on my information intake.

I turned myself off from email, news etc (well, not completely but perhaps ~90%) in the past few weeks by unsubscribing from mailing lists and also by closing browser tabs that I am unlikely to read or benefit from in short term. I consistently kept number of open tabs in the browser to <= 5. I also installed browser plugins to warn myself and block twitter/hackernews etc after 10 minutes of usage per day from 6 AM to 10 PM. (I didn’t put restrictions after 10pm). Overall my information intake was much less.

The result is that I was much happier and got a lot of things done, which made me happier still. I didn’t miss anything (that would have helped me). I had a lot of withdrawal symptoms initially when I unsubscribed from some of the mailing lists which I had been reading for the past ~4 years. But I really didn’t miss anything at all that was of immediate use to me.

I felt a big void when the course ended. For a day or two, I didn’t know what to do with the new found free time. But after that I quickly filled it with trivia. But I learnt a bit from those intense periods of learning. I now have closed down all those pdfs and tabs that I had open and am getting back to work. I had temptations to re-join mailing lists. But I deliberately decided not to do so.

Working on something every day and getting into the “flow” helped me greatly in getting things done and contributed to some happiness. There are many sources of unhappiness in my life which I cannot do much about. But there are a handful that I can do something about, and I felt that working on some interesting and mind-bending problem was certainly worth it.

About a year ago, while casually browsing, I wandered onto the Oracle Labs website and found Guy Steele’s page. If you don’t know who Guy Steele is, perhaps watching this video is a good start. In summary, Guy Steele was a student of Prof. Gerald Sussman, and together they invented the legendary language Scheme. I am a big fan of Scheme and its simplicity. I think I have watched almost every Guy Steele lecture video freely available out there. In particular, I am a big fan of the Dan Friedman 60th birthday lecture, the “Growing a Language” lecture and so on. YouTube is your friend.

Coming back to the topic, I couldn’t resist emailing him just to say that I am a big fan of Scheme. With his permission, I am reproducing the email conversation we had and his great advice.

> On Jul 6, 2012, at 6:53 AM, Ramakrishnan Muthukrishnan wrote:
>
> Thank you for the reply. Delighted to see a reply from you. I think I
> have watched all the public videos of talks you have given on various
> things Scheme related (Growing a language, Dan Friedman 60th birthday
> lecture etc) and am a big fan! Scheme has totally changed my
> perception about programming. I still have to learn a lot of things
> more deeply but I think I finally found something that I seem to
> really like -- Programming languages and thanks to you and your work
> for that. I also plan to read the "Lambda, the Ultimate" AI memos too.
> Enough to keep me busy for the next few years! Any advice from you in
> my endeavour in Programming language research will be highly
> appreciated.
>
> Thanks again
> Ramakrishnan
The main advice I have is what you are already doing: keep reading!
The "Lambda, the Ultimate" papers have held up pretty well over the years,
I think, but there is much, much more. I recommend any paper that has
Phil Wadler, Simon Peyton Jones, or Charles Leiserson as a co-author.
Also, study more programming languages. Any will do, but there seem
to be a lot of good ideas nowadays in Haskell, Clojure, Python, and Scala.
You are much better off, I think, knowing three good programming languages
than one really great one. And read code in each of these languages, maybe the
code for their standard libraries as well as an application. Good luck!
Yours,
Guy Steele

Okay, I decided to throw away my previous blog posts and start afresh. I plan to write more frequently (let us see how that goes). Last time I put a lot of restrictions on myself on what to write about. I think I wasn’t very successful with that.

I decided to try Hakyll, a static webpage generator written in Haskell. I have always been very bad at creating “eye-candy” web pages. So this is a bare bones first version. I haven’t bothered to even change the default CSS file. Instead I want to spend my energy in writing some content.

The nice thing about Hakyll is that it is so easy to build and install using cabal. The website can be compiled into a binary, which is so convenient.