I have been getting emails suggesting that it would be a wise idea to launch a Digg media website. Yeah! Why not?
Since Digg already has a video section, there is not much point in duplicating it. The new site could just be digg for pictures.

Update 2008.07.30: I received this PDF, which said that I was abusing Digg's trademarks! So I closed the site. You may visit http://digg.picurls.com to see what it looked like. I also zipped up the contents of the site, and you may download the whole site as digpicz-2008-07-30.zip!

I don't want to use the word 'digg' in the domain name because people warned me that the trademark owner could take the domain away from me. I'll just spell "dig" with a single g, and shorten "pictures" to "picz". So the domain I bought is digpicz.com.

A new extractor (data miner) has to be written which goes through all the stories on Digg and finds the ones with pic/pics/images/etc. words in their titles or descriptions. (In the reddit generator suite this was the reddit_extractor.pl program, found in the /scripts directory of the .zip file.) Digg, unlike Reddit, provides a public API to access its stories. I will use this API to go through all the stories, create the initial database of pictures, and then keep monitoring Digg's front page. This program will be called digg_extractor.pl.
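The keyword match is the heart of the extractor. Here is a minimal Python sketch of it (the real digg_extractor.pl is Perl). The keyword list and the is_picture_story name are my assumptions, not taken from the original script. Word boundaries keep words like "topic" or "epic" from matching "pic".

```python
import re

# Hypothetical keyword list: the post only says "pic/pics/images/etc."
PICTURE_WORDS = ["pic", "pics", "picture", "pictures",
                 "image", "images", "photo", "photos"]

# \b word boundaries keep "topic" or "epic" from matching "pic";
# IGNORECASE also catches "[PICS]" or "Images".
PICTURE_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, PICTURE_WORDS)) + r")\b",
    re.IGNORECASE,
)


def is_picture_story(title: str, description: str) -> bool:
    """Return True if the title or the description mentions pictures."""
    return bool(PICTURE_RE.search(title) or PICTURE_RE.search(description))
```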

The SQLite database structure has to be changed to include a link to the Digg story, the story's description and a link to the user's avatar.
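A sketch of what the new table could look like, using Python's built-in sqlite3 module. The table and column names are hypothetical. Only the three new fields (Digg link, description, avatar link) come from the text above.

```python
import sqlite3

# Hypothetical schema; column names are illustrative, not digpicz's real ones.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pictures (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    url         TEXT NOT NULL UNIQUE,  -- link to the picture page itself
    digg_url    TEXT NOT NULL,         -- new: link to the story on Digg
    description TEXT,                  -- new: story description
    user        TEXT,
    user_pic    TEXT,                  -- new: link to the user's avatar
    category    TEXT,
    date_added  TEXT                   -- 'YYYY-MM-DD HH:MM:SS'
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_story(conn: sqlite3.Connection, story: dict) -> None:
    # INSERT OR IGNORE silently skips a story whose url is already stored.
    conn.execute(
        "INSERT OR IGNORE INTO pictures "
        "(title, url, digg_url, description, user, user_pic, category, date_added) "
        "VALUES (:title, :url, :digg_url, :desc, :user, :user_pic, :category, :date)",
        story,
    )
    conn.commit()
```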

The generate_feed function in the static HTML page generator (page_gen.pl) has to be updated to create a digpicz RSS feed.
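For the feed, here is a minimal Python sketch of an RSS 2.0 generator built with xml.etree.ElementTree. It is not the page_gen.pl code. The function signature and the story keys (title, url, digg_url, desc) are assumptions. It uses the standard RSS <comments> element to point each item at its Digg discussion.

```python
import xml.etree.ElementTree as ET


def generate_feed(stories, site_url="http://digpicz.com", title="digpicz"):
    """Build an RSS 2.0 document (as a string) from a list of story dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "digg for pictures"
    for story in stories:
        item = ET.SubElement(channel, "item")
        # ElementTree escapes &, < and > in element text automatically.
        ET.SubElement(item, "title").text = story["title"]
        ET.SubElement(item, "link").text = story["url"]
        ET.SubElement(item, "description").text = story["desc"]
        # RSS 2.0 <comments>: URL of a page with comments about the item.
        ET.SubElement(item, "comments").text = story["digg_url"]
    return ET.tostring(rss, encoding="unicode")
```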

HTML template files in the /templates directory (in the .zip file) need to be updated to give the site a more digg-like look.

That's it! A few hours of work and we have a digg for pictures website running!

Digpicz Technical Design

Let's create the data miner first. As I mentioned, it's called digg_extractor.pl, and it is a Perl script which uses Digg's public API.

First, we need to get familiar with the Digg API. Skimming over the Basic API Concepts page, we find just a few important points:

All requests must include an Application Key which is any valid absolute URI (used just for statistics).

The documentation also lists count and offset arguments, which control the number of stories to retrieve and the offset into the complete story list.
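Putting these together, a request URL could be built like this. The endpoint path and the parameter name appkey are assumptions on my part. Check them against the Digg API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumption: endpoint for front-page ("popular") stories.
API_BASE = "http://services.digg.com/stories/popular"
# Any valid absolute URI works as the application key (used only for statistics).
APP_KEY = "http://digpicz.com"


def story_request_url(offset: int, count: int = 15) -> str:
    """URL for one batch of `count` stories starting at `offset`."""
    params = {"appkey": APP_KEY, "count": count, "offset": offset}
    # urlencode percent-encodes the URI-valued key (':' and '/').
    return API_BASE + "?" + urlencode(params)
```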

So the general algorithm is clear: start at offset=0, loop until we have gone through all the stories, parse each batch of stories and extract the stories with pics in them.
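The loop itself can be sketched as a Python generator. fetch_batch is a hypothetical helper that makes one API request and returns the parsed stories. The loop stops at the first empty batch.

```python
from typing import Callable, Iterator, List


def iterate_stories(fetch_batch: Callable[[int, int], List[dict]],
                    count: int = 15) -> Iterator[dict]:
    """Yield every story, one API request (batch) at a time, from offset 0."""
    offset = 0
    while True:
        batch = fetch_batch(offset, count)
        if not batch:  # empty batch: we have gone through all the stories
            return
        yield from batch
        offset += count
```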

We want to use the simplest Perl library possible to parse XML. There is a great one on CPAN which is perfect for this job. It's called XML::Simple. It provides an XMLin function which, given an XML string, returns a reference to a parsed hash data structure. Easy as 3.141592!
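In Python, xml.etree.ElementTree plays roughly the role that XML::Simple plays in Perl. The sketch below assumes each <story> element carries link, href and submit_date attributes plus <title>, <description>, <user> and <topic> children. That layout is my assumption about the API response, not something verified here.

```python
import xml.etree.ElementTree as ET


def parse_stories(xml_text: str) -> list:
    """Turn one API response into a list of plain dicts (like XMLin's hash)."""
    root = ET.fromstring(xml_text)
    stories = []
    for s in root.iter("story"):
        user = s.find("user")
        topic = s.find("topic")
        stories.append({
            "title": s.findtext("title", default=""),
            "desc": s.findtext("description", default=""),
            "url": s.get("link"),        # the story's own URL
            "digg_url": s.get("href"),   # the story's page on Digg
            "user": user.get("name") if user is not None else None,
            "user_pic": user.get("icon") if user is not None else None,
            "category": topic.get("name") if topic is not None else None,
            "short_category": topic.get("short_name") if topic is not None else None,
            "submit_date": int(s.get("submit_date", "0")),  # Unix timestamp
        })
    return stories
```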

This script prints out picture stories which made it to the front page in human readable format. Each story is printed as a paragraph:

title: story title
type: story type
desc: story description
url: story url
digg_url: url to original story on digg
category: digg category of the story
short_category: short digg category name
user: name of the user who posted the story
user_pic: url to user pic
date: date story appeared on digg YYYY-MM-DD HH:MM:SS
<new line>
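A sketch of a formatter for this layout. It assumes the story date arrives as a Unix timestamp and renders it in UTC. The original script may well use local time.

```python
from datetime import datetime, timezone

FIELDS = ["title", "type", "desc", "url", "digg_url", "category",
          "short_category", "user", "user_pic"]


def format_story(story: dict) -> str:
    """Render one story as the 'key: value' paragraph described above."""
    lines = [f"{field}: {story.get(field, '')}" for field in FIELDS]
    # Assumption: submit_date is a Unix timestamp; shown here in UTC.
    date = datetime.fromtimestamp(story["submit_date"], tz=timezone.utc)
    lines.append("date: " + date.strftime("%Y-%m-%d %H:%M:%S"))
    # Trailing newline plus print()'s own newline gives the blank separator line.
    return "\n".join(lines) + "\n"
```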

The script has one constant, ITEMS_PER_REQUEST, which defines how many stories (items) to get per API request. Currently it's set to 15, the number of stories on one Digg page.

The script takes an optional argument which specifies how many requests to make. On each request, the story offset is advanced by ITEMS_PER_REQUEST. Specifying no argument goes through all the stories which have appeared on Digg.
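The offset arithmetic and the optional argument can be sketched with argparse. request_offsets and parse_args are hypothetical names. With no argument the offsets run forever, and the fetch loop is expected to stop once the API returns no stories.

```python
import argparse
import itertools

ITEMS_PER_REQUEST = 15  # stories per Digg page


def request_offsets(num_requests=None):
    """Offsets 0, 15, 30, ... for each request; unbounded when None."""
    offsets = itertools.count(0, ITEMS_PER_REQUEST)
    if num_requests is None:
        return offsets
    return itertools.islice(offsets, num_requests)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Print Digg picture stories.")
    parser.add_argument("requests", nargs="?", type=int, default=None,
                        help="number of API requests to make (default: all stories)")
    return parser.parse_args(argv)
```

For instance, passing 1 requests only offset 0, which is just the first page of ITEMS_PER_REQUEST stories.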

For example, to print out the picture posts which are currently on the front page of Digg, we could use the command:

Digg is a place for people to discover and share content from anywhere on the web. From the biggest online destinations to the most obscure blog, Digg surfaces the best stuff as voted on by our users. You won't find editors at Digg; we're here to provide a place where people can collectively determine the value of content and we're changing the way people consume information online.

How do we do this? Everything on Digg, from news to videos to images to Podcasts, is submitted by our community (that would be you). Once something is submitted, other people see it and Digg what they like best. If your submission rocks and receives enough Diggs, it is promoted to the front page for the millions of our visitors to see.

One of my upcoming web projects uses AJAX and jQuery. I had watched a dozen video lectures on JavaScript before and thought that I had seen them all. Today I came across John Resig's website and found that he had just been at Google and given a video lecture on Best Practices in JavaScript Library Design.

John Resig is a JavaScript Evangelist, working for the Mozilla Corporation, and the author of the book 'Pro JavaScript Techniques.' He's also the creator and lead developer of the jQuery JavaScript library and the co-designer of the FUEL JavaScript library (included in Firefox 3).

This talk explores all the techniques used to build a robust, reusable, cross-platform JavaScript Library. We'll look at how to write a solid JavaScript API, show you how to use functional programming to create contained, concise code, and delve deep into common cross browser issues that you'll have to solve in order to have a successful library.

Here is the video:

And here are the slides:

Things that caught my attention in the video:

(01:49) jQuery was released in January 2006, and its main focus is DOM traversal.

(05:39) Similar objects should have the same method and property names so that the learning curve is minimal.

(07:16) Fear adding methods to the API. You should keep the API as small as possible.

(11:38) jQuery 1.1 reduced its size by 47% by removing unnecessary methods, breaking compatibility with jQuery 1.0. A plugin for 1.1 was released as a separate package to provide the old 1.0 interface.

(14:10) Look for common patterns in the API and reduce it to its core.

(15:33) Be consistent within your API, stick to a naming scheme and argument positioning.

(17:11) Evolution of a JavaScript coder:

Everything is a reference!

You can do OO code!

Huh, so that's how Object Prototypes work!

Thank God for closures!

(21:20) In JavaScript 1.7 there is a let statement which declares variables local to a block.

(22:43) If you wrap your entire library in (function() { ... library code ... })(), its code will never interfere with other libraries' code.

(32:09) You can tweak your Object constructor to make Constructor() work the same as new Constructor().

(34:37) There are three ways to extend jQuery, you can add methods, selectors and animations.

(36:37) Message passing from one component to another is best done via custom events.

(37:45) Quirksmode is a fantastic resource which explains where specific bugs exist in the browsers.

(38:53) DOM Events and DOM Traversal problems have been solved in depth; many others, such as getting an attribute and getting the computed style, still require hard work.

(41:09) In Safari 2, getComputedStyle returns null if it's called on an element with display: none or on an element inside an element with display: none. Safari 3 implemented the interface, but the calls just return undefined.

(45:08) Use a structured format for documentation. What's nice about it is that it can be converted to other formats and given to users.

(48:16) Let your users help you by putting your documentation in a wiki.

(49:36) Don't trust any library that doesn't have a test suite.
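The custom-events note (36:37) is a general design idea, so here is a small illustration in Python: a minimal publish/subscribe hub where components communicate through named events instead of calling each other directly. The EventBus class and its on/trigger method names are my own, not jQuery's implementation.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class EventBus:
    """Minimal publish/subscribe hub: components talk via event names."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[..., Any]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[..., Any]) -> None:
        """Subscribe `handler` to `event`."""
        self._handlers[event].append(handler)

    def trigger(self, event: str, *args: Any) -> None:
        """Call every handler subscribed to `event` with `args`."""
        # Iterate over a copy so a handler may subscribe others while running.
        for handler in list(self._handlers.get(event, [])):
            handler(*args)
```

The sender never needs a reference to the receivers, which is what keeps the components decoupled.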

Here is the Q&A:

(52:40) Is jQuery more or less targeted at FireFox, so that it wouldn't actually be reasonable to use it, say, on a cellphone?

(53:29) How do you filter noise in the community?

(54:30) Is jQuery going to get multiple build sets?

(55:05) Will there ever be a time when library development like this is no longer necessary? Or do you think that the ecosystem of libraries is good for advancing the state of the art?

I remembered that reddit had decided not to display the scores of posts submitted less than two hours ago.

This left me thinking: if the scores are not displayed for new posts, what's the point of having vote boxes on the page of a just-posted article? I figured the button wouldn't make sense if the score weren't available through it. I quickly found a link on reddit's "new" page which seemed to have received a few votes, and added reddit's voting button for it to an empty HTML document.

A reddit voting button/widget can be embedded on a site by putting the following JavaScript code fragment anywhere in the HTML source:

Greasemonkey is a Firefox extension that allows you to write scripts that alter the web pages you visit. You can use it to make a web site more readable or more usable. You can fix rendering bugs that the site owner can't be bothered to fix themselves. You can alter pages so they work better with assistive technologies that speak a web page out loud or convert it to Braille. You can even automatically retrieve data from other sites to make two sites more interconnected.

Greasemonkey by itself does none of these things. In fact, after you install it, you won't notice any change at all... until you start installing what are called "user scripts". A user script is just a chunk of Javascript code, with some additional information that tells Greasemonkey where and when it should be run. Each user script can target a specific page, a specific site, or a group of sites. A user script can do anything you can do in Javascript. In fact, it can do even more than that, because Greasemonkey provides special functions that are only available to user scripts.

Do you see where I am aiming? I will write a "user script" in JavaScript, which I have just learned in more detail, that finds the "just posted" links on a reddit page and replaces the original up/down vote box with the vote box widget, which reveals the current vote count!

There is a great free book available on GreaseMonkey which explains it through code examples. It's called "Dive into GreaseMonkey." It's only 99 pages long and can be read in an hour if you know JavaScript already!

Writing the User Script

The basic idea of the script is to parse the DOM of reddit's page, extract all the posted links, find the links which do not have a score displayed (those newer than 2 hours), and then replace the HTML of the original up/down vote box with the widget's HTML.
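Deciding which entries are "just posted" boils down to reading each entry's age. Below is a Python sketch that parses a status line of the form "published NN minutes/hours ago" (the wording quoted at the end of this post; the exact text reddit uses is an assumption) and flags entries younger than two hours.

```python
import re

# Assumed status-line format, e.g. "published 35 minutes ago".
AGE_RE = re.compile(r"published\s+(\d+)\s+(minute|hour|day)s?\s+ago")
SECONDS = {"minute": 60, "hour": 3600, "day": 86400}
SCORE_HIDDEN_FOR = 2 * 3600  # scores are hidden for the first two hours


def age_in_seconds(status: str):
    """Age of an entry in seconds, or None if the status line doesn't match."""
    m = AGE_RE.search(status)
    if m is None:
        return None
    return int(m.group(1)) * SECONDS[m.group(2)]


def score_is_hidden(status: str) -> bool:
    age = age_in_seconds(status)
    return age is not None and age < SCORE_HIDDEN_FOR
```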

First we need to understand how reddit's entries are laid out on the page. We could view the HTML source of the page, but that takes too much effort because we'd have to parse HTML in our heads. Let's use something more visual. There is a FireFox extension called FireBug which lets you explore the HTML of a page in a much nicer manner.

We see that each entry on the page is wrapped in two <tr> elements, each of which has the class name "oddRow" or "evenRow". Our GreaseMonkey user script will have to find these rows, extract the title information from the first row and the date information from the second row.
To do this, we use the DOM's getElementsByTagName function to retrieve all <tr> elements on the page. Next, we loop over these elements, matching those with the class name "oddRow" or "evenRow", keep track of whether we are on the first or the second row of an entry, and call the corresponding extraction function for each row.
Look at the find_entries function in the final script to see how it extracts all the entries from the page.
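The two-row state machine described above can be sketched in Python with the standard html.parser module. This illustrates the idea and is not the GreaseMonkey script's find_entries. RowCollector and pair_entries are hypothetical names; only the oddRow/evenRow class names come from the text.

```python
from html.parser import HTMLParser

ENTRY_CLASSES = {"oddRow", "evenRow"}


class RowCollector(HTMLParser):
    """Collect the text of every <tr class="oddRow|evenRow"> in page order."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_row = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            classes = (dict(attrs).get("class") or "").split()
            self._in_row = bool(ENTRY_CLASSES.intersection(classes))
            if self._in_row:
                self.rows.append("")

    def handle_endtag(self, tag):
        if tag == "tr":
            self._in_row = False

    def handle_data(self, data):
        if self._in_row and data.strip():
            self.rows[-1] = (self.rows[-1] + " " + data.strip()).strip()


def pair_entries(html: str):
    """Pair consecutive rows: first row = title info, second row = date info."""
    collector = RowCollector()
    collector.feed(html)
    collector.close()
    entries, pending_title = [], None
    for text in collector.rows:
        if pending_title is None:   # state 1: expecting the title row
            pending_title = text
        else:                       # state 2: expecting the date row
            entries.append({"title": pending_title, "info": text})
            pending_title = None
    return entries
```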

Once we have extracted the entries, all we have to do is replace the HTML of the original up/down vote box with the HTML of an up/down vote widget. (See the display_votes function in the final script.)

And we are done!

Here is what the reddit page looks like after the GreaseMonkey script has run:

Note: this script works only with the wonderful FireFox browser. To run it, you will also need the GreaseMonkey extension.

Once you click the link, FireFox will automatically ask you if you want to install this script. Select "Install" and visit reddit.com/new to have infinite power over the regular users!

Here is a screenshot of what the GreaseMonkey user script installation dialog looks like:

P.S. Has anyone successfully debugged GreaseMonkey scripts? I could not find a way to set breakpoints or even load the user script in any of the debuggers which come with FireFox. Any suggestions?

P.P.S. The current implementation of the script replaces the original up/down vote box with an iframe. This is kind of ugly. I'll leave it as an exercise for the curious reader to change the script to retrieve the score via the XMLHttpRequest interface and replace the "published NN minutes/hours ago" status line with one that includes the score. :)

I have a great interest in the C and C++ family of programming languages and their history, and I have read two of Bjarne's books: The C++ Programming Language and The Design and Evolution of C++. I enjoyed every page of these books. They made me not only a decent C++ programmer but also helped me understand how the language was formed, what its goals were, where it was headed, and how it got the various constructs it has now. If you are considering becoming a great C++ programmer, these books are a must-read.

The most fundamental thing these books taught me was to think at various levels of abstraction and to approach a given programming problem from various programming paradigms.

When I found the link to Bjarne's lecture, I put aside everything I was working on and started watching it! I love C++ that much!

A note aside for people wanting to learn C++: I see people argue on programming.reddit.com and other sites that C++ is not worth learning, that it's a dead language, that X is better than C++, etc. Don't listen to this crap! If you have ever watched Guy Kawasaki's "The Art of the Start" video presentation, the 11th point of success is "Don't let the bozos grind you down." That's what they will do if you listen to them! Just start learning C++ and you will succeed with it!

Now, back to the lecture. Here is what the lecturer has to say about it:

A good programming language is far more than a simple collection of features. My ideal is to provide a set of facilities that smoothly work together to support design and programming styles of a generality beyond my imagination. Here, I briefly outline rules of thumb (guidelines, principles) that are being applied in the design of C++0x. Then, I present the state of the standards process (we are aiming for C++09) and give examples of a few of the proposals such as concepts, generalized initialization, being considered in the ISO C++ standards committee. Since there are far more proposals than could be presented in an hour, I'll take questions.

Just as I did while learning JavaScript from video lectures, I am going to blog the most interesting things that caught my attention, with timestamps!

Here I list the things that caught my attention in Bjarne's C++ video presentation. The time in brackets is when it appeared in the video. A '+' before the brackets indicates that I knew it already, a '-' that I didn't (just for personal notes). I will write down some obvious facts about the language even though I know them, so you get an idea of what the lecture was about.

+(05:30) C++ is a better C in the sense that it can do roughly the same things as C but also has many new features.

+/-(06:55) The highest-level goals of C++ are to make it a better language for systems programming and library building, and to make it easier to teach and learn.

(08:46) Joke: The next Intels will execute an infinite loop in five minutes, and that's why you don't need performance. :)

+/-(10:00) The main problem for a new revision of the standard is the popularity of C++. Existing and new users want countless improvements. Adding a new feature needs to keep the existing code absolutely stable. Each new feature makes the language harder to learn.

-(43:43) There are too many ways to initialize things in C++, and different ones work in different contexts. C++0x introduces a uniform initialization syntax which can be used in any initialization.

-(46:50) Fundamental cause of lots of problems in C++ with generic programming is that the compiler doesn't know what template argument types are supposed to do.

+/-(49:30) C++98 got templates right in that parametrization didn't require hierarchies, parametrization could be done with non-types, the generated code had uncompromising efficiency, and template instantiation turned out to be Turing complete!

-(54:42) The aims of concepts in C++0x are direct expression of intent (the recording got cut here, they ran out of tape or something :( and the next moment was somewhere in the future), no performance degradation compared to current code, relatively easy implementation within current compilers, and that current template code must remain valid.

-(55:33) The lecture continues here from where it was cut. It's about how the type system ensures correct data types using just declarations, and about compile-time type contracts through templates.

-(1:06:14) Quick summary: template aliases, initializer lists, overloading based on concepts, type deduction from initializers, a new for loop for ranges.

After the lecture the following questions were asked:

(01:09:40) What's your opinion about the Microsoft implementation of C++?

A: Microsoft's implementation is the best out there; they conform to the standards pretty well and the code generated is also good. GNU gcc is also good. Though, they want you to use their "Managed C++" called C++/CLI, which is totally unportable. Apple does the same with their version of C++, which is Objective-C/C++, and so does GNU. They all play this game of trying to get users to use just their product and not switch to their competitors' products.

(01:11:56) Do you think you'll ever design a new language from scratch?

A: Certainly not from scratch. You have to answer the question: why are you designing a language? You design a language to solve a certain problem. If I ever design a new language, it will be because I feel that some problem needs a solution.

(01:13:39) You mentioned threads. Are there other things, like transactions and cache management?

A: Concurrency is becoming very important. The question is how do you do it? My solution is to provide language primitives out of which you build libraries that use these primitives and provide various models of concurrency. Doing it directly with language primitives is too hard.

(01:16:25) How long after the standard is out do you expect to see a production compiler?

A: After the C++0x standard is approved and released the vendors will start releasing compilers right away. Some of them have already built in some of the upcoming features.

A: Yes, it is possible to do GC in C++. An implementation already exists, and I will have a discussion tomorrow on whether to put it in the standard. There are two problems, though. One is that people would start writing poor code, never caring to free the memory they use, which would lead to poor performance. The other is that GC can be a performance virus.

(01:24:39) A lot of academic institutions have dropped teaching C++. As a result there are a lot of poor coding practices and poor coding solutions coming in from people. Are there any plans to have some documentation on how it would be more teachable?

A: I have been an academic for the last couple of years, and someone talked me into teaching undergrads. I am more used to serious Ph.D.s from good universities with 10 years of experience, and it's not quite the same! :) I tried out ideas and wrote a textbook which will come out some time next year.

(01:26:24) How soon after you created C++ did you see it start to take over the industry?

A: The first commercial release was in 1985. I had access to data on how many C++ users there were and kept track of it during the 80s. From 1979 till 1991 the number of users doubled every 7.5 months. And now we are at 3 million users.

(01:28:25) A lot of template classes at the moment use template hoisting to make them more efficient in terms of code size at compilation time. Is there anything being done to address the issues that make it necessary?

A: There is a trick for avoiding a lot of separate template instantiations, based on void pointers. I don't see any changes to that. That's a portable way of doing it.

A: It would be a good idea, but what's mostly called generic programming at the runtime level either has so many indirections that it runs at a tenth of the speed of non-generic code, or it's not very generic and you can't do any of the interesting stuff.

(01:31:33) You talked about having user defined types act the same way as built in types. Pointers are used for various optimizations like function overloading and smart pointers. Do you see a problem here? Is it being solved?

A: First of all, I think smart pointers are overused. Secondly, we can emulate inheritance with smart pointers. You can basically build a perfect smart pointer now. I worry about smart pointers because if you use a smart pointer and I give you one and we have no agreement on how mine works, we have a race condition. We have no lock on this code, and we have two pieces of code which poke at the same area. You have to be very careful about the semantics of this smart pointer.

(01:33:38) There are interesting parallels between templates and duck typing used in dynamic languages. Will templates overtake classes for writing code and filing contracts?

A: Yes, templates give you roughly what duck typing gives you dynamically in scripting languages. I think interfaces will be much better specified with concepts, and there is still a large component of duck typing. Templates are becoming more important. Please remember that templates by themselves are nothing! They help you abstract over something.

(01:37:14) Have you ever gotten any death threats because of the changes in the language?

A: I have never gotten any death threats for any reason. And let's keep it that way!

A: Yes, I like underscores. I do not like the camel stuff, it's less readable.

(01:37:57) There are so many new languages coming up right now. Do you try to reuse any of the things they have done in the new language features you come up with?

A: I try to learn from new languages, particularly from the users of new languages. But grafting from one language to another is much harder than most people think. When you see something work in one language, you look at what problem it is solving and ask: can we solve it as elegantly in C++? If the answer is no, then we see how it can be solved and look at the way it was done in the other language. But simple grafting is a very hard exercise.

(01:38:52) When you initially designed the language, did you start from rigorous specifications or how did it start?

A: I am trying to be rigorous, but it's still informal in the sense that it is written in English. I started out with the C specification written by Dennis Ritchie. Some things have improved; some have become more obscure because people have used more words. We have tried several times to see if we can also make it formal. It would be nice to have formal semantics for either all of it or parts of it. It has not been that successful over the years. But I am very happy to report that a group from IBM this year managed to prove that the C++ inheritance system was formally sound. It's proven. So 20 years later they proved that I didn't screw up.

(01:40:18) With Sun releasing some hardware which runs Java bytecode, are you afraid that it could take away C++'s embedded position?

A: Java would kill C++ totally in 2 years, Sun said in 1996. They've sort of been repeating this story over and over again. There is a lot of Java, and there is a lot of C++, and it's a big world.

(01:41:05) How do you balance things at compile time and runtime, for example exceptions?

A: If you need something at runtime, then you have to have it at runtime. For example, most of the good uses of virtual functions can't be done at compile time because you don't have the information. Talking about exceptions, there are compilers which add no overhead if no exceptions are thrown. There are trade-offs, and some things you just need to do at runtime.

You can watch the lecture right here as an embedded flash video, or you can download this lecture:

Ever since I published my personal sed, ed and awk cheat sheets in .pdf and .doc formats, I have been receiving suggestions that I should also create plain text versions of them. People said that it was ridiculous to have UNIX tool cheat sheets in .pdf or Microsoft Word (.doc) formats and not to have them in plain text.

I agreed, converted the UNIX tool cheat sheets to plain text format, and did some ASCII art formatting so they looked neat.

Enjoy!

(If you also want to download printable .pdf or .doc of these cheat sheets, follow the three links at the beginning of this post!)