Monday, May 30, 2005

I watched all of Doug's 1968 demo today. I cannot help feeling we believe progress was made just because some eye candy has changed. Even today, some of the things Doug et al showed back then are still sci-fi for most of us. Sigh...

One of the projects I'm working on is my compression stuff. I've neglected it long enough already...

Something I am working on right now is the ability to cascade codecs. The main problem is that I do not want 500mb to be processed by a codec, then the resulting 400mb by another, then the resulting 300mb by another and so on.

This is problematic because codecs will typically output bits instead of bytes so you can't simply have a byte-sized symbol propagate through the stack just like that --- you'd have to implement tons of boundary checks, and ifTrue:/ifFalse: will kill your performance on top of being ugly. So what to do...

The other day I found a very nice solution. Typically, a simple codec will have source/destination streams. Initially, complex codecs will look that way. When you add another simple codec to the complex codec, it goes from

sourceStreamcodec 1destinationStream

to

sourceStreamcodec 1forwardingBuffer 1codec 2destinationStream

Forwarding buffers are growth-limited "write streams". Upon reaching their size limit, they will #flush by sending #runOnSourceStream: self to the recipient and then reinitialize themselves. Thus, flushing is performed only when it's necessary.

Flushing on demand to finalize the destination stream is accomplished by cascading the message #flush across codecs, forwarding buffers, and streams. No special case logic needed.

Sunday, May 29, 2005

So how come ten is a good number and I haven't written a word about soccer star Diego Maradona yet?

Maradona was the best in a time when soccer was much more geared towards destruction of play than ever before. And he did the most astonishing things while having fun. Most importantly, he is an artist. I find great pleasure in watching him perform.

But, unlike Pelé, Maradona is quite unheard of here in the US. Maybe it's because of his trouble with addictions --- but wait, isn't there a lot of noise about widespread doping in US flagship sports, haven't popular US track and field athletes been banned for life? And the reactions look too much like CYA and business as usual. So Maradona's lack of popularity can't be a chemical problem and there has to be another reason.

Maradona was born extremely poor. His father worked 14 hours a day to feed a numerous family. He had something like 8 brothers and they all shared one of the two bedrooms in his house. People in his neighborhood often went without eating. There was only one drinkable water faucet in his block.

From this place he was thrown on the throne of soccer by means of a powerful kick in the rear end, as he says. With his background, he had to deal with his country believing they were the best because of him, and with being the best in Europe with the expectation that he'd behave with Victorian gentleman manners. Somehow that didn't quite work.

For example, when he met John Paul II, the pope gave rosaries to his relatives. When it was Maradona's turn, the pope told him he had "something special for him" --- and gave him another rosary. When Maradona checked, it was identical to the other rosaries. So he went and asked the pope "this one is the same as the others, how is mine special?". No answer.

Maradona also saw all the wealth in the Vatican, and how its residents don't go through hardship. But especially after being criticized about spending too much money in his own catholic wedding party, he answered back commenting on all the money the Vatican would spend on Africa trips where the pope would kiss the ground, avoid leaving any money for kids starving to death, and still keep the gold hanging from the roof back at home.

Maradona is very direct and he usually expresses himself without political correctness, which is just another euphemism for being hypocritical and exhibiting amoral behavior.

Since Maradona is a human being, he has the right to express himself in any way he wants. It's amazing that I have to write the previous sentence, as if it wasn't obvious. But it may make you realize how much public figures don't do that because they'd rather shy away from the enormous responsibility that comes with standing for your beliefs.

Even though he was not specifically prepared to do so, Maradona had the presence of mind to take a huge responsibility on his shoulders. In my opinion, he outperformed the vast majority of us at that.

I wish we had more courage to ask difficult questions. But in this country that kind of behavior is not fashionable. That is probably why Maradona, like so many others, is unheard of in the US.

Saturday, May 28, 2005

I realized how much of what I do when I write stuff for myself is like painting, or composing music... creating something in an artful way.

I don't understand why people would prefer consuming instead of creating. Because see, instead of creating new things, that darned grass is always tall and you need money to buy a tractor to cut those 50 acres of grass you bought on credit. Your 5000 sq ft home is not full enough of things craving your attention, so you need to spend your weekends shopping and doing forklift work. Cars are not enough, SUVs are not enough, you need to commute using a moving truck just so you can carry 100 of your coworkers plus all their shopping plus the new team equipment for that soccer match...

I don't buy any of that. I want to create. And today is an excellent day to implement Life in a way I've never seen it done before.

Sunday, May 15, 2005

Today I read a bunch of stuff that I found extremely disturbing. Deep packet inspection is helping ISPs turn the Internet into a dumb-terminal world. In short, you pay tons of money to have the "privilege" to pay more tons of money to get "stuff". And that's it. Nothing can be shared, and nothing is for free in this "golden" environment. Plus you get zero privacy and all your communications are spied on.

What really pissed me off was that a company offering deep packet inspection hardware targeted free VoIP software Skype in its marketing literature. In other words, bandwidth spent on Skype is a lost revenue opportunity. So what you as an ISP should do would be to purchase deep packet inspection hardware to block Skype traffic, then turn around and sell VoIP services to your customers at a higher price.

This is utter control freak bullshit and I am at a loss to explain the indifferent behavior of conslamers (a mixture of conned consumer slaves).

Clearly the problem is the greedy ISP serving big corporation interests to turn your life into a Truman Show. So all of that needs to be removed from the picture. And before you say "where am I going to get my email address and stuff then?"... remember how the Internet started? Few machines connected via TCP/IP, a resilient transport protocol? New York is a huge city with over 10 million people in it. There should be at least 1 million computers here. And there's this thing called wireless networking.

If you didn't have to deal with your ISP, wouldn't you encapsulate your home network, provide an interface to your sorroundings, and let TCP/IP do the dirty work of properly routing traffic in the same way your ISP does? Sure it may not be as efficient at first, but it would be free. And for privacy purposes, you could simply run something like Freenet on top of this free network so you get your privacy back in a medium that would be harder to contaminate with unsustainable dreams of endless exponentially growing profits.

P2P applications such as VoIP could easily run on this, without a single telemarketing call because of RSA keys.

An issue would be how to connect large geographically disjoint parts of the network. I think ISPs would like to have any traffic whatsoever by then. But maybe that could also be unnecessary --- you could run your wireless hubs in the car, in the truck, in the RV park, in your privately owned land thanks to donations signed and processed thanks again to PGP, etc...

So you'd keep your ISP for accessing dumb-terminal stuff. And for the really interesting stuff, just go wireless and remove the middlemen completely from the picture.

There has to be something wrong with this, because if it could work I am sure it would have been done already. Maybe it exists, and because I am still connected to the dumb-terminal internet I do not know about it! If it isn't, I wonder what is the limiting factor, besides bandwidth and FUD.

I hope that somehow we will learn to dismiss our conditioned fear reflexes soon. I think life is too short and extraordinary to waste on a quest to amass a pile of papers other people judge as valuable on grounds as weak as they get.

Saturday, May 14, 2005

Why is it that factorial is usually implemented in terms of self * (self - 1) factorial? This illustration of the call stack hides better ways to perform the same task.

No, the iterative version is just as bad. The fact is that multiplying two large integers is most efficient in the n^2 case. Assuming the complexity function depending on the two arguments is more or less continuous, multiplication of large numbers is most efficient when the two numbers are of similar size.

So what's this deal of n*(n-1)*...*2*1? In fact this is the worst way to calculate factorials!

A better way is to first multiply pairs of numbers, 1*2, 3*4, 5*6, ..., (n-1)*n, then multiply those results in pairs, and so on until you have the answer. Now you'd say "ah, the setup time kills you". Not so. In fact, even for relatively small numbers, the difference is huge.

In VisualWorks, large integer performance is very good. In this old machine, calculating 10,000 factorial takes 1.6 seconds. That number is 14,808 bytes long. Not bad at all!

Enter (1 to: 10000) aggregateInPairsUsing: #*. It takes less than 0.3 seconds to calculate the same number. Even with setup time and recursion of indices inside (1 to: 10000), it is five times faster! 1000 factorial is three times faster, 500 factorial is twice as fast. The break even point is near 190 factorial.

This reasoning also applies to other similar problems. For example, string concatenation is also most efficient when joining strings of similar size.

I had been wondering how come Smalltalk blogs don't talk a lot about Smalltalk. Today I got an answer: the good thing is that if you use Smalltalk, you don't need to talk much about it. And it's totally right because Smalltalk really stays out of your way. Can you imagine the commercial?...

You had it writing syntax sugar for your coffee. You have a chance: it's you and the problem, one on one. Do you have it in you to beat it?

A while back I found a deficient implementation of #match:. Blaine and I decided to write a nicer one. At first, we went over the basic idea of why #match: works. We found that the pattern can be broken into pieces, like this:

[*][...][*][...][*]

Then we decided to do something that sounded loudly inefficient. We decided we'd parse the pattern, break it up into two kinds of tokens, and then let another object match tokens to the string. The tokens would be either the star, or stuff that needs to be matched exactly.

Once we were done, we were surprised to see that on slower hardware, parsing and then matching was ~3 times faster than the deficient version. What was killer was that since each token knew its identity there was no need for, you know, aCharacter == $* and that kind of stuff. No ifTrue:/ifFalse:.

Then we ran the profiler and found that because we supported using ? inside the textual tokens to match any single character... read this: about 50% of the time was spent in Character>>==.

Even though this could seem like a good thing, we were not happy. So we subclassed our abstract token class and created PoundToken (the others were StarToken and MatchToken). We changed the parser, blablabla, and even though the parser did more work, the thing went twice as fast!

Insanely fast, actually. It was only 3 times slower than VW's #match: implementation.

After we wrote the parsing version of #match:, I inlined the whole thing into a single method to see how it would compare to VW's code. It was about 2% slower, but the code was much more readable.

Lesson 1: it is much faster to avoid checking + ifTrue:/ifFalse: to begin with by creating a class --- even if the check is for identity.

Lesson 2: don't inline your algorithm 9 times before you get it right, and then leave the non-inlined version around.

Monday, May 09, 2005

Well, that's the plan. I will attack the Minesweeper problem with Laws of Form and the game of Life.

I have a feeling that a game of Life approach will do better than my previous linear algebra attempts because of its inherent ability to encode multiple interacting signals in the same space, as opposed to a single signal trying to become closed and distinguish the solution from everything else.

Blaine lent me a book on the game of Life, and as I was reading I stumbled on this paragraph:

The best way to get the feel of Life is to use a checkerboard or a sheet of graph paper. Conway and colleagues played with black and white checkers. The black checkers marked on cells. Conway identified cells due for a birth and placed a white checker on each. A second black checker marked on cells due to die. When all births and deaths had been identified, the double black checkers were removed from the board and the white checkers were replaced with black checkers.

Sunday, May 08, 2005

The reference finder was a good exercise to put forms and distinctions to use. I was wondering what's my next project going to be though. I think I settled on doing more work on Minesweeper, about which I see now that forms/distinctions could play an important part in.

Ok, the first offender was growing the visited form storage set. Adding a pregrow configuration for the ObjectRegistry took care it --- I used 524288, which of course is a power of two.

Second offender, the empty object check. Adding a special check for nil helped. Then, rewriting it as fast-outs instead of using #or: logic made it even faster. It would be even faster if the code didn't have to check if objects have instance variables or indexed fields each time, but a complete solution is outside the scope of this optimization session.

Third offender, using a method which had boundary checks when they were not necessary.

By this time, the worst offender is the empty object check --- because it's full of ifTrue:/ifFalse:. Hmmm... I will have to resort to heavy artillery to fix this!

My previous approach had just a few ifTrue:/ifFalse: in it, and was relatively fast for what it was doing. But the form/distinction approach takes less than half the time!

I haven't even tried to make it go fast yet. What I learned in the previous version is that one of the issues is instance creation --- the form/distinction scanning model relies heavily on it. If it is also an issue with this version, a discarded/free list of forms should help curb that.

As you traverse the object graph, you need to avoid scanning the same object twice because otherwise you're doing extra work. In other words:

Calling a name twice is the same as calling a name once

That's one of the axioms of Laws of Form! Argh! This stuff is everywhere!

So the typical approach is to keep all the objects you visited in some sort of identity set. The issue though is that you typically need to send two messages to it per object you visit. The first one is #includes: (to see if you called a name once already), and the other one is #add: (to remember a name you just used).

But if you look at how that is implemented, you'll see the collection is doing the same work twice! I had enough of that and implemented #ifExcludedAdd:. The collection will answer whether it added the object or not, which is the two pieces of work I need for the price of one.

I wonder what other things the profiler will tell me now. But first, time to package & publish to StORE...

Aha... the code in my previous post was very nice but on second inspection, breadth first traversal proved to be more complicated than that. In fact, the complication was so great that I deleted two classes that were getting in the way.

I had separated traversing a form from the stream on the form, so I had a 3-layer stack: traversal on a stream on a form.

But the traversal is itself a type of stream, so it's silly so have it stream a stream. So FormStreams are gone.

I realized how limited is the scope of traversal in typical collection streams. So for form traversal (which is pretty much graph traversal), stream feels like a bad name because it brings with it connotations of streaming arrays or serial ports.

So now the design is a 2-layer stack: traversal on a form. The cost of this is that the traversing methods are longer. On the other hand, there's no position keeping necessary since forms already provide linkage. Not bad... let's see what some refactoring can do.

So I'm here writing form traversal stuff for my reference finder, and I think I wrote a piece that deserves to be polished further. Take a look.

(self passenger shouldCrossInto: self stream peek) ifTrue:

[

self stream cross.self takePassengerOnTraversal.self stream uncross

]

Isn't that absolutely beautiful? It definitely needs to get better though. I still don't like how obvious the control loop is, and I think there has to be a way to get rid of it completely and let diving through be a matter of stream position and such. More to come!

Monday, May 02, 2005

So why is it that, tonight, I come back from work and I find that channel 77 is still displaying the win95 blue screen of death?

I am feeling more and more unwilling to ask difficult questions, such as the ones below, because I am afraid of what the answer will be.

Why should I believe this elevator is safe?

Why should I trust this tunnel to hold the river outside while I am inside?

Why should I think it's ok for nuclear reactors in this country to leak?

Why should I pay attention only to the loss of 7 astronauts when it took more than 50 shuttle flights in which foam damaged the tiles for the foam and the out of focus cameras to become an issue?

Why should I justify 300 people to be worth more than 3,000,000,000 others?

The other day I found that, roughly speaking, 40% of kids do not finish high school. I demand to know how all these future worksumers will be able to determine what is good for us when we are old and don't have a job anymore --- and how will they be able to maintain what we have built without blowing this planet apart.

Incidentally, did you see anybody being held publicly accountable for the foam decisions? If that runaway bride is to be sued for causing such a waste, why didn't those negligent managers follow the same path of public ridicule, exposure and court rooms? What about other people who have done even more damage?