making hard things easy, the impossible, possible

As computer programmers, software developers and human beings working with numbers, we often take our base 10 decimal system for granted. But what if, as a society, we were starting over? If we looked at all the choices and all the advantages and disadvantages of each, would base 10 still be the choice?

Past cultures have indeed used other systems. We likely settled on base 10 for a practical reason: we have 10 fingers to help us understand and manipulate numbers.

In this age of computers and computation, it might seem that a computer-friendly base such as 8 or 16 could be a good choice too. But the base 12 system seems to have a lot to offer.

Base 12.

The base 12 system, also known as the duodecimal or dozenal system, has many supporters who would favor a move to it.

For familiarity, we already have some systems that use 12: 12 hours on a clock face, 12 months in a year, 12 in a dozen, 12×12 (144) in a gross, 12 inches in a foot, and so on. For the musically inclined, there is 12-TET, twelve-tone equal temperament.

What is attractive about the dozenal system? To start with, the number twelve has six factors: 1, 2, 3, 4, 6, and 12. Ten has only four: 1, 2, 5, and 10. So what does that do for us? It makes many fractions and divisions simpler. For example, 1/3 in base 10 has the repeating representation 0.33333…

In base 12, 1/3 is 0.4 and 1/4 is 0.3. Of course, it is not better for every fraction: 1/5 becomes 0.24972497… (repeating), whereas in base 10 it is simply 0.2. So base 12 is not better in every case, just in more of them.
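To check expansions like these, here is a small Python sketch of long division in an arbitrary base that shows any repeating block in parentheses. The function name to_base and the use of X and E for ten and eleven (one common dozenal convention) are assumptions for illustration. The pattern it reveals is a general fact: a fraction in lowest terms terminates in base b exactly when every prime factor of its denominator divides b, which is why thirds and quarters terminate in base 12 while fifths do not.

```python
# Long division of numerator/denominator (0 <= value < 1) in a given base.
DIGITS = "0123456789XE"  # assumed dozenal digits: X = ten, E = eleven


def to_base(numerator: int, denominator: int, base: int = 12) -> str:
    """Return the expansion as text, with any repeating block in parentheses."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 12")
    digits = []
    seen = {}  # remainder -> index of the digit it produced
    remainder = numerator % denominator
    while remainder != 0 and remainder not in seen:
        seen[remainder] = len(digits)
        remainder *= base
        digits.append(DIGITS[remainder // denominator])
        remainder %= denominator
    if remainder == 0:  # the expansion terminates
        return "0." + "".join(digits) if digits else "0"
    start = seen[remainder]  # a remainder came back, so the digits repeat from here
    return "0." + "".join(digits[:start]) + "(" + "".join(digits[start:]) + ")"


print(to_base(1, 3))       # 0.4
print(to_base(1, 4))       # 0.3
print(to_base(1, 5))       # 0.(2497)
print(to_base(1, 3, 10))   # 0.(3)
print(to_base(1, 5, 10))   # 0.2
```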

Hold on, you might think: base 10 lets us do great things, like multiplying or dividing by ten just by moving the decimal point!

The good news for base 12 is that this benefit is not unique to base 10. It works just as well in any positional system, including dozenal, where moving the point one place multiplies or divides by twelve.
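A quick Python check of this claim: appending a zero to a dozenal numeral (moving the point one place) multiplies its value by twelve, just as appending a zero to a decimal numeral multiplies by ten. Python's built-in int accepts a base argument, so no extra code is needed.

```python
# Python's int() parses a digit string in any base from 2 to 36.
value = int("123", 12)      # 1*144 + 2*12 + 3 = 171
shifted = int("1230", 12)   # 1*1728 + 2*144 + 3*12 + 0 = 2052

# Shifting every digit one place multiplies by the base, twelve here.
assert shifted == value * 12
print(value, shifted)       # 171 2052
```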

On the practicality of having 10 fingers, dozenal proponents point out that (looking at your palm) your four fingers have 12 segments (phalanges) to work with, which you can count off with your thumb.

MapReduce is a popular and effective technique that’s used to apply concurrency to problems that often involve large amounts of data, in order to improve performance.

Hadoop is a popular implementation of the MapReduce model or technique.

MapReduce is named after the functional programming operations map and reduce. The map function applies a function to each element in a list, and reduce aggregates or combines the results. MapReduce can distribute the map work to many machines, and then the reduce step summarizes that work into a final answer.
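For readers who have not met these two functions, here is a tiny illustration using Python's built-in map and functools.reduce. The word list is made up for the example.

```python
from functools import reduce

words = ["map", "reduce", "smalltalk"]
lengths = list(map(len, words))                      # map: apply len to each element
total = reduce(lambda acc, n: acc + n, lengths, 0)   # reduce: combine into one result
print(lengths, total)                                # [3, 6, 9] 18
```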

MapReduce and Smalltalk

So how would this work in Smalltalk? To start, let’s determine what the Smalltalk equivalents to map and reduce are.

The collect: method can be used as a Smalltalk equivalent of map, since it can collect the result of a block applied to every element in a collection.
The fold: method (or inject:into:) can be used as an equivalent of reduce, since it reduces the results to a single object (simple examples: finding the maximum, minimum, or sum).

Pragmatically, though, you might also think of map as mapping out the work (to be performed concurrently) to multiple cores or machines, and reduce as combining or summarizing the results of the map work. If you follow the pattern, it doesn’t matter whether you use collect: or fold: specifically.

The purpose of Cincom’s MatriX framework is to simplify concurrency. It lets you make many sequential (linear) solutions concurrent with little effort.

The example below shows how to create a solution to a problem, and then use MatriX to turn the same code into a MapReduce-style solution with minimal alterations.

A Simple Example

Let’s say that we had a long list of documents (files) and we wanted to get a count of how many times each word occurs in the set of documents. In Smalltalk, we would want to collect the word counts for each file and then combine or fold the results into an aggregated summary. So how might we do this in Smalltalk?

Let’s start with some basics.

A method to return a list of filenames to use for counting word occurrences

A method that parses the file into tokens (words)

A method that, given a file string, returns a count of the words found in the file

A method that summarizes (reduces) the word counts into one set

A method that provides a local solution using the above methods

We can test and debug by running it locally first, and then move on to distributing the work.

Below are the methods for the above basics, respectively:

Note: Be sure to change the dir in the myFiles method to a location on your machine.
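As a language-neutral illustration of the five basics listed above, here is a minimal Python sketch. It is not the Smalltalk code from this example. The function names only mirror the list, and the *.txt file pattern and the simple regular-expression tokenizer are assumptions. Counter objects add together key by key, which makes them a convenient result type for the reduce step.

```python
import re
from collections import Counter
from functools import reduce
from pathlib import Path


def my_files(directory):
    """List the documents to count; the *.txt pattern is an assumption."""
    return sorted(Path(directory).glob("*.txt"))


def tokenize(text):
    """Split text into lower-case words (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def count_words(text):
    """Map step: word counts for a single document."""
    return Counter(tokenize(text))


def summarize(counts):
    """Reduce step: fold the per-document counts into one total."""
    return reduce(lambda total, each: total + each, counts, Counter())


def local_solution(texts):
    """Run the map and reduce steps sequentially, in one process."""
    return summarize(count_words(text) for text in texts)


# Against real files (change the directory):
# totals = local_solution(p.read_text() for p in my_files("/path/to/documents"))
totals = local_solution(["the cat sat", "The dog sat down"])
print(totals["the"], totals["sat"], totals["dog"])   # 2 2 1
```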

Now that we have this running, we want to distribute the workload so that the files are processed and the words counted concurrently. The word counts will come back to a central place (our main image), where they will be summarized.
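MatriX distributes the work to other Smalltalk images; as a rough analogy only, here is the same shape in Python using the standard concurrent.futures module. The map step runs on workers and the reduce step runs centrally, while the counting and summarizing functions stay exactly as in the sequential version. Threads keep the sketch simple; ProcessPoolExecutor offers the same map method when the map work is CPU-bound and should run in separate processes.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(text):
    """Map step: runs on a worker."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def summarize(counts):
    """Reduce step: runs in the coordinating process (the 'main image')."""
    return reduce(lambda total, each: total + each, counts, Counter())


def concurrent_solution(texts, workers=4):
    """Same map and reduce functions; only the map work is spread out."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_document = pool.map(count_words, texts)  # results arrive in input order
        return summarize(per_document)


totals = concurrent_solution(["the cat sat", "The dog sat down"])
print(totals["the"], totals["sat"])   # 2 2
```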

We have a new type of collection, available in both our Cincom Smalltalk products, ObjectStudio and VisualWorks.

The name of it is …. (drum-roll please) “Treap” ….. What?!

I know, it doesn’t exactly roll off the tongue.

So what exactly is a “Treap”?

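The name Treap is a contraction of “Tree” and “Heap”, and a Treap is a type of balanced binary tree: nodes are ordered by key like a binary search tree and by a (typically random) priority like a heap, which keeps the tree balanced on average. It is not new as a computer science data structure, but as a new addition to our products, you may want to know where and how it can be used effectively.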

What is it, and how does it compare with more traditional Smalltalk collections?

Treap is more of a hybrid collection, which makes it very versatile and useful, in the right context.

It has fast keyed lookup (like a Dictionary).

It can use ordered access (like an Array or an OrderedCollection) with enumeration, but the objects are ordered or sorted based on the key.

It also allows bidirectional access to the ordered list.

Treap is structured as a balanced binary tree of nodes. Each node holds a key and a value object, and the nodes are also linked together as a bidirectional (doubly) linked list in key order.
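To make the structure concrete, here is a minimal Python sketch of a treap with the same three capabilities: keyed lookup, iteration in key order, and prev/next links between neighbouring nodes. It is an illustration of the data structure, not Cincom's Text2.Treap; the class and method names are hypothetical, and deletion is omitted. The linked list stays valid through rebalancing because rotations never change the in-order sequence of keys.

```python
import random


class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.priority = random.random()  # random priority, kept in max-heap order
        self.left = self.right = None    # search-tree links, ordered by key
        self.prev = self.next = None     # doubly linked list, in key order


class Treap:
    """Minimal treap sketch: insert and lookup only (no deletion)."""

    def __init__(self):
        self.root = None
        self.first = None  # node with the smallest key (head of the list)

    def find(self, key):
        """Ordinary binary-search-tree lookup; returns the node or None."""
        node = self.root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node

    def __getitem__(self, key):
        node = self.find(key)
        if node is None:
            raise KeyError(key)
        return node.value

    def __setitem__(self, key, value):
        node = self.find(key)
        if node is not None:
            node.value = value  # existing key: replace the value
        else:
            self.root = self._insert(self.root, Node(key, value), None, None)

    def _insert(self, node, new, pred, succ):
        # pred/succ are the nearest smaller and larger keys met on the way down.
        if node is None:
            new.prev, new.next = pred, succ  # splice into the linked list
            if pred is not None:
                pred.next = new
            else:
                self.first = new
            if succ is not None:
                succ.prev = new
            return new
        if new.key < node.key:
            node.left = self._insert(node.left, new, pred, node)
            if node.left.priority > node.priority:
                node = self._rotate_right(node)  # restore heap order
        else:
            node.right = self._insert(node.right, new, node, succ)
            if node.right.priority > node.priority:
                node = self._rotate_left(node)
        return node

    @staticmethod
    def _rotate_right(node):
        child = node.left
        node.left, child.right = child.right, node
        return child

    @staticmethod
    def _rotate_left(node):
        child = node.right
        node.right, child.left = child.left, node
        return child

    def items(self):
        """Walk the linked list: already sorted by key, so no sort step."""
        node = self.first
        while node is not None:
            yield node.key, node.value
            node = node.next


t = Treap()
for k in [5, 1, 9, 3, 7]:
    t[k] = str(k)
print([k for k, _ in t.items()])      # [1, 3, 5, 7, 9]
node = t.find(5)
print(node.prev.key, node.next.key)   # 3 7
```

The last two lines show the lookup-then-navigate pattern: find a node by key, then step to its neighbours without searching again.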

In what circumstances might a Treap be useful?

If you are using a Dictionary for lookup and also enumerate through the sorted keys of the dictionary, a Treap might be a better solution, since it already maintains a sorted order based on the keys.

Alternatively, if you are using a SortedCollection but need faster keyed lookup, a Treap might be the best choice.

Another use is when you have to find an object quickly and then access the objects just before or after it. In this scenario you would look up the node, and then use #previous or #next to get the preceding or following node, navigating the linked list backward or forward.

Are there any drawbacks?

As with any collection or data structure, there are advantages and trade-offs. For Treap, the main drawback is probably the space and overhead needed to hold and maintain the nodes and their links. My suggestion is to consider it only where you benefit from the ordering and lookup speed, and possibly the navigation.

Finding Treap: Treap came into the products to support the performance of Text2, so you access it as Text2.Treap. In the future, Treap may move into the Collection hierarchy, where you would expect to find it.

Performance: in some simple benchmarking, I found lookup speed with Treap comparable to a Dictionary. If you then have to enumerate the elements in sorted key order, Treap is significantly faster, since it effectively skips the sort by already maintaining that order.

In the meantime, I have made some additions and tweaks to Treap that you may find useful and interesting. The methods add some of the collection enumerators (#do:, #select:, #collect:, #keysDo:) that were either missing or could perform better. See the methods (for the instance side of Treap) below.

I look forward to any feedback, particularly if you have a good use for Treap in your applications!

We have some very exciting new work in our development builds: a new source code editor, which I wanted to exercise.

Around the same time, I saw the article linked above about the game 2048. It is a simple four-by-four grid of numbers that you slide around, combining them to add up to… 2048!

I downloaded a free version of 2048 onto my iPhone, and there is also a browser-based version here. I checked it out and it looked pretty interesting, so I decided to implement a simple version of it as an “afternoon app”: a simple, time-boxed implementation. Since Smalltalk lets you do a lot in a short amount of time, it is a great choice for this.

I built it and published it on the public repository. You can find it as TwentyFortyEight.

It is an MVC-designed app with these three classes:

TwentyFortyEightApp

TwentyFortyEightModel

TwentyFortyEightView

I reverse engineered it by observation. I tried a couple of simple ways to process the moves, then refined them. My #process: method takes an array of the four items in a row or column, and looks like this:

process: array
	"process an array of the grid"
	self compress: array.
	self combine: array.
	self compress: array.
	^array

#compress: slides the elements down to the far end of the array.

#combine: adds adjacent cells holding the same number, in a single pass.

The second compress is required in situations where there are two combines in a single pass.

Since you can move in four directions (up, down, left, and right), #process: is given the row or column as-is for right and down moves, respectively. For left and up, the array is reversed before #process: and then flipped back for integration into the grid.
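Here is a Python sketch of the compress, combine, compress idea and the reverse-and-flip-back trick described above. It is an independent illustration, not a translation of the TwentyFortyEight code: it returns new lists instead of modifying the array in place, it assumes empty cells hold 0, and the function names are hypothetical. Each combine pass scans from the far end and skips past a merged pair, so a tile merges at most once per move.

```python
def compress(row):
    """Slide the non-zero tiles to the far (right-hand) end of the row."""
    tiles = [v for v in row if v != 0]
    return [0] * (len(row) - len(tiles)) + tiles


def combine(row):
    """One pass from the far end: merge equal neighbours, each tile at most once."""
    row = list(row)
    i = len(row) - 1
    while i > 0:
        if row[i] != 0 and row[i] == row[i - 1]:
            row[i] *= 2       # the merged tile stays at the far end
            row[i - 1] = 0    # leaving a gap for the second compress to close
            i -= 2            # skip past the pair so it cannot merge again
        else:
            i -= 1
    return row


def process(row):
    """compress, combine, compress, as in the #process: method above."""
    return compress(combine(compress(row)))


def transpose(grid):
    return [list(column) for column in zip(*grid)]


def move_right(grid):
    return [process(row) for row in grid]


def move_left(grid):
    # Reverse each row, process it toward the right, then flip it back.
    return [process(row[::-1])[::-1] for row in grid]


def move_down(grid):
    # Columns become rows, so "down" becomes "right".
    return transpose(move_right(transpose(grid)))


def move_up(grid):
    return transpose(move_left(transpose(grid)))


print(process([2, 2, 2, 2]))        # [0, 0, 4, 4]
print(move_left([[2, 2, 0, 4]]))    # [[4, 4, 0, 0]]
```

The [2, 2, 2, 2] case shows why the second compress matters: after one combine pass the row is [0, 4, 0, 4], and only the final compress closes the gap.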

I made an assumption that turned out to process moves incorrectly in some circumstances. Rather than fix it, I left it as a challenge (not too hard, not too trivial) to find and fix for those who want to tinker with the application. There is another needed fix (hint: the new value that appears), which is very easy.

How can the implementation be improved? Other than the needed fixes mentioned above, the application does not detect when the game is over, or allow you to restart it. These are easy fixes.

Another relatively easy improvement would be to add a running score: when numbers are combined, their combined value can be added to the score tally.
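One way to keep that tally, sketched in Python under the assumption that the combine pass scans from the far end and empty cells hold 0: have the pass return the points it earned alongside the new row. The name combine_with_score is hypothetical.

```python
def combine_with_score(row):
    """One combine pass from the far end that also returns the points earned.
    Each merge adds the value of the new tile (the sum of the two merged tiles)."""
    row = list(row)
    score = 0
    i = len(row) - 1
    while i > 0:
        if row[i] != 0 and row[i] == row[i - 1]:
            row[i] *= 2
            row[i - 1] = 0
            score += row[i]   # running tally: add the combined value
            i -= 2
        else:
            i -= 1
    return row, score


total = 0
row, points = combine_with_score([2, 2, 2, 2])
total += points
print(row, total)   # [0, 4, 0, 4] 8
```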

A big improvement that would take some more time would be to add animation. This would probably require model restructuring or changes to communicate in terms of the moves in order for the view to animate them. Visually, adding animation would be a big improvement. I welcome any developers to give it a try and share their results. I may give it a shot myself.

What else could you do? You could try different algorithms for auto-playing the game, to discover or test heuristics that maximize your score.

If you have any more ideas or any questions, I would be happy to hear from you. Reach me at the email below.

One of the biggest advantages of using Smalltalk is clear expression. Because of the syntax and message passing, a non-programming domain expert would likely understand the intent of well written Smalltalk code that provides a solution in their area of expertise.

Over time Smalltalkers have found ways of integrating methods into the base libraries with the primary intention of using them to create clear, terse code. Often a developer can peruse a Smalltalk class and discover methods to use in talking to their objects. Occasionally, there are clever enhancements that are not obvious without a good example.

Some recent enhancements in enumeration and sorting are good examples of this.

Enumeration

Some enumerators, such as select:, collect:, and reject:, are widely used and fundamental:

evenNumbers := (1 to: 100) select:[:ea | ea even].

This can be written more briefly as:

evenNumbers := (1 to: 100) select: #even.

squared := (1 to: 100) collect:[:ea | ea squared].

This can now be written as:

squared := (1 to: 100) collect: #squared.

Under the hood

How was this done? You might think at first that the enumeration methods were modified to accommodate this. They were not, which is part of the beauty of this solution. The enumeration methods evaluate their argument by sending it #value:, and instances of Symbol respond to #value: by sending the message the symbol names to the object passed in. For example, #even value: 4 sends #even to 4.
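The mechanism is plain polymorphism, which can be modelled in a few lines of Python. In this sketch, Block and Symbol are hypothetical stand-ins for the Smalltalk classes, and select and collect are written once against value() only. Because both stand-ins answer value(), the enumerators never need to know which one they were given.

```python
class Block:
    """Stand-in for a Smalltalk block: evaluated by sending it value(arg)."""
    def __init__(self, function):
        self.function = function

    def value(self, argument):
        return self.function(argument)


class Symbol:
    """Stand-in for a Smalltalk Symbol: value(receiver) sends the named message."""
    def __init__(self, selector):
        self.selector = selector

    def value(self, receiver):
        return getattr(receiver, self.selector)()   # e.g. receiver.isupper()


def select(collection, criterion):
    """The enumerator is unchanged: it only ever asks criterion for value(each)."""
    return [each for each in collection if criterion.value(each)]


def collect(collection, transform):
    return [transform.value(each) for each in collection]


words = ["Ada", "BASIC", "Lisp", "ML"]
print(select(words, Block(lambda w: w.isupper())))   # ['BASIC', 'ML']
print(select(words, Symbol("isupper")))              # ['BASIC', 'ML']
print(collect(words, Symbol("lower")))               # ['ada', 'basic', 'lisp', 'ml']
```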

Sorting

Sorting is something fundamental to many algorithms and applications.

Sorting can often be as simple as:

sortedNumbers := myNumbers asSortedCollection.

This works when the objects in the collection know how to compare themselves.

For more specific sorts:

employees asSortedCollection:[:a :b | a lastname <= b lastname].

And even sub-sorts (by first name within the same last name):

employees asSortedCollection: [:a :b |
	a lastname = b lastname
		ifTrue: [a firstname <= b firstname]
		ifFalse: [a lastname <= b lastname]].

This can be done in a new, simple and clear manner:

employees sorted: #lastname ascending.

employees sorted: #lastname ascending, #firstname ascending.

What if you need a calculation in a block?

employees sorted: [:ea | ea …. ] ascending
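As a rough Python model of how composable sort criteria work, each criterion can be a comparator, and chaining consults the next comparator only when the earlier ones tie. The helper names ascending and then are hypothetical; they only loosely mirror #ascending and the comma, and this is not the VisualWorks implementation. The last lines show the idiomatic Python equivalent, where a lambda key plays the role of the block.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass
class Employee:
    firstname: str
    lastname: str


def ascending(attribute):
    """Comparator on one attribute, loosely like `#lastname ascending`."""
    def compare(a, b):
        x, y = getattr(a, attribute), getattr(b, attribute)
        return (x > y) - (x < y)   # -1, 0 or 1
    return compare


def then(*comparators):
    """Chain comparators, loosely like the comma between sort criteria:
    a later comparator is consulted only when the earlier ones tie."""
    def compare(a, b):
        for comparator in comparators:
            result = comparator(a, b)
            if result != 0:
                return result
        return 0
    return compare


employees = [Employee("Ann", "Smith"), Employee("Bob", "Jones"), Employee("Al", "Smith")]
by_name = sorted(employees, key=cmp_to_key(then(ascending("lastname"), ascending("firstname"))))
print([e.firstname for e in by_name])   # ['Bob', 'Al', 'Ann']

# Idiomatic Python gets the same order from a key function (the "block").
same = sorted(employees, key=lambda e: (e.lastname, e.firstname))
assert same == by_name
```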

What I like about this technique is that it is far terser, and the intent of the code is clear. Any drawbacks? You need to learn it by example; you will not get the big picture simply by browsing the #ascending method in class Symbol.

Bottom line:

Add this to your developer techniques list and your sorting code will be shorter, simpler, clearer, and more easily understood …. and isn’t that what Smalltalk is all about?

For developers, finding and solving performance bottlenecks can be highly productive with the right tools and knowledge, and it can be a very rewarding part of application development.

Most developers find the performance of Cincom Smalltalk more than adequate, especially compared to other dynamic languages. We have a high-performance JIT (just-in-time compiling) VM. But what if you need more?

We take the performance needs of our customers seriously, and address them on a number of fronts. Here are some notes on approaches to finding performance in Cincom Smalltalk (ObjectStudio & VisualWorks):

1) The biggest performance gains come from changing the algorithm and approach. Smalltalk is excellent at letting you see the big picture and restructure code to change algorithms (seeing the forest for the trees, if you will). This is far more difficult in lower-level languages, where you are much more locked in to an approach.

2) If there is a small time critical section, you can write it in C and call it from Smalltalk.

Many developers expect to do this, but most end up not needing to, because performance turns out better than expected.

3) We have Polycephaly, a framework that lets you easily leverage multi-core processors. Many customers have adopted it and have gotten 2-5x throughput improvements. Polycephaly gives you 80% of the benefits with 20% of the difficulty. Polycephaly II, being introduced as a preview in the upcoming release, also lets you include remote machines.

4) We have 64-bit VMs, which let you use a very large object space. This allows some applications to keep all their data cached in object memory, boosting performance significantly.

5) It is possible to use VW with CUDA GPU acceleration for number crunching. Modern GPUs can give supercomputer-like speed for that kind of work.

6) We continue to improve the performance of our VMs incrementally. Most recently, we have improved garbage collection performance. This is a mature, well-studied area of the VM, yet we continue to find ways to improve it.

7) We have performance profiling tools to pinpoint where time is being spent, so you can focus on the areas that will give the most reward. Research has shown that developers’ guesses about where time is spent are usually inaccurate, which is why these tools are so valuable. They let you focus your efforts where they will make the most difference, or get that last increment of performance that gives you the edge.
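As an illustration of point 1 above, here is a small Python example (the problem is chosen only for illustration) of how a change of approach beats any amount of tuning. Checking a list for duplicates by comparing every pair takes roughly n²/2 comparisons, while a single pass with a hash set takes roughly linear time, assuming the items are hashable. Both functions give the same answers; only the algorithm differs.

```python
def has_duplicates_pairwise(items):
    """Compare every pair: about n*n/2 comparisons."""
    return any(items[i] == items[j]
               for i in range(len(items))
               for j in range(i + 1, len(items)))


def has_duplicates_hashed(items):
    """Change of approach: one pass with a set (items must be hashable)."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


data = list(range(2000)) + [7]
print(has_duplicates_pairwise(data), has_duplicates_hashed(data))   # True True
```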

Looking back in history, Xerox PARC had bright minds and lots of money, which they used to invent many aspects of modern computing. The VM technology they created is very sophisticated, and is still a significant barrier to entry in the dynamic language field. The technology has been around for quite a while, but typically only companies with strong resources (think Google’s V8) have been able to build something as sophisticated as our VM technology. In the meantime, we have not rested on our laurels, but have continued to refine and improve the technology.

Last month I gave a presentation at the STIC conference with one of our engineers (Dirk) on how to build software to support a business using Cincom Smalltalk. It went very well and we got some very good feedback. I think the demo shows pragmatically how CST can be used to build robust software. I’ll post links when the video is available. For now, the slides are available: