Sikuli: Create “smart macros” based on screenshots

dotTech has discussed many macro programs, such as Easy Macro Recorder[1], Do It Again[2], etc. While they all have their particular advantages and disadvantages, they share one common problem: the inability to create macros based on program/window recognition. All of their macros are based on screen positioning, so they are very susceptible to error if the environment changes. For example, if an icon happens to be in the wrong place or a program window happens to be resized, the macro is completely thrown off. In other words, none of the programs discussed in the past have been able to create what I am dubbing “smart macros”. Sikuli, on the other hand, is a different story.

Sikuli is the product of a research project undertaken by a bunch of smart people at MIT (Massachusetts Institute of Technology). It can create “smart macros” by leveraging the power of image recognition. Confused? Maybe this example will clear things up. Let’s say you want to create a macro that launches Internet Explorer and goes to dotTech. To do this, you would tell Sikuli to launch Internet Explorer, take a screenshot of the Internet Explorer address bar, and tell Sikuli to type dotTech’s address into that address bar:

[3]

The cool thing is that you do not need to worry about whether Internet Explorer is already running, where its window is positioned, or how big the window is. You just need to make sure you take a proper screenshot of the address bar and enter the right commands in the Sikuli script; Sikuli handles the rest for you. Look:

As you can see in the video, after running the macro the first time, I made the Internet Explorer window larger and launched the macro again. Where most other macro programs would fail, Sikuli was able to load dotTech in Internet Explorer even though the window size was different.
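For the curious, here is a rough sketch of why window position and size don’t matter to this kind of macro. It is a toy model, not Sikuli’s actual code: the screen and the address-bar screenshot are small grids of grayscale values, and the search slides the screenshot over every position on the screen, scoring how closely the pixels agree. The `similarity`, `find`, and `make_screen` functions, the scoring formula, and the 0.7 cutoff are all assumptions made up for illustration; real image recognition is considerably more sophisticated.

```python
# Toy model of screenshot matching (NOT Sikuli's real algorithm).
# Screens and screenshots are 2D lists of grayscale pixel values (0-255).


def similarity(screen, template, top, left):
    """Score in [0, 1] for the template placed at (top, left); 1.0 is a perfect match."""
    h, w = len(template), len(template[0])
    total_diff = sum(
        abs(screen[top + r][left + c] - template[r][c])
        for r in range(h)
        for c in range(w)
    )
    return 1.0 - total_diff / (255 * h * w)


def find(screen, template, min_similarity=0.7):
    """Slide the template over every position and return (row, col) of the best
    match, or None if even the best score is below the (assumed) cutoff."""
    h, w = len(template), len(template[0])
    best_score, best_pos = -1.0, None
    for top in range(len(screen) - h + 1):
        for left in range(len(screen[0]) - w + 1):
            score = similarity(screen, template, top, left)
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_pos if best_score >= min_similarity else None


def make_screen(width, height, pattern, top, left, background=200):
    """Hypothetical helper: a plain screen with `pattern` drawn at (top, left)."""
    screen = [[background] * width for _ in range(height)]
    for r, row in enumerate(pattern):
        for c, value in enumerate(row):
            screen[top + r][left + c] = value
    return screen


address_bar = [[0, 255, 0, 255], [255, 0, 255, 0]]  # stand-in "screenshot"

small_window = make_screen(10, 6, address_bar, top=1, left=2)
large_window = make_screen(20, 12, address_bar, top=5, left=9)  # moved/resized window

print(find(small_window, address_bar))  # (1, 2)
print(find(large_window, address_bar))  # (5, 9)
```

Moving or resizing the window only changes where the match turns up, not whether it turns up, which is the gist of why the macro kept working after the window got bigger.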

With Sikuli you can pretty much automate anything that has a graphical user interface; your macros can be as simple as the one I demonstrated above, or as complex as playing a video game (I did not create the following video):

However, this intelligence comes with complexity: Sikuli may be semi-intelligent compared to other macro programs, but it is also harder to use. Other programs, like the previously mentioned Do It Again, let you record a macro and play it back whenever you want. With Sikuli, on the other hand, you have to “program” the macro. I put “program” in quotes because the programming involved is fairly basic; you really only need to know a few functions, and there is a help guide you can refer to if you get lost. Plus, it isn’t all “programming”: a lot of it simply involves taking screenshots.

In that regard, Sikuli’s “Command List” (located on the left side of the Sikuli IDE’s main window; see the very first screenshot above) lists all the commands you can couple with screenshots. You simply click on the command you want to use and take the screenshot (Sikuli has a built-in screenshot function), and Sikuli inserts the command plus screenshot into your macro. Commands that don’t use screenshots you need to type in yourself; a full list of all supported functions is available here[4].
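To make the “command plus screenshot” idea concrete, here is a hypothetical mini macro runner in Python (Sikuli scripts themselves are Python, run through Jython). In Sikuli, a screenshot step looks like a function call such as `click()` with the screenshot as its argument; the `find_exact` and `run_macro` functions below are made up for illustration, are not Sikuli’s API, and only simulate clicks and typing by recording them in a log. The address typed is just an example.

```python
# Hypothetical mini "macro runner" illustrating command + screenshot pairs.
# Not Sikuli's API: screens and screenshots are 2D lists of grayscale values.


def find_exact(screen, template):
    """Return (row, col) of the first exact occurrence of template, or None."""
    h, w = len(template), len(template[0])
    for top in range(len(screen) - h + 1):
        for left in range(len(screen[0]) - w + 1):
            if all(screen[top + r][left:left + w] == template[r] for r in range(h)):
                return (top, left)
    return None


def run_macro(screen, steps):
    """Execute (command, argument) steps and return a log of simulated actions."""
    log = []
    for command, argument in steps:
        if command == "click":
            # Screenshot command: locate the image first, wherever it is.
            pos = find_exact(screen, argument)
            if pos is None:
                raise LookupError("screenshot not found on screen")
            top, left = pos
            # Click the centre of the matched region.
            centre = (top + len(argument) // 2, left + len(argument[0]) // 2)
            log.append(("click", centre))
        elif command == "type":
            # Non-screenshot command: the text is written by the macro author.
            log.append(("type", argument))
        else:
            raise ValueError(f"unknown command: {command}")
    return log


address_bar = [[0, 255], [255, 0]]  # stand-in screenshot of the address bar
screen = [[200] * 8 for _ in range(5)]
screen[2][3:5] = address_bar[0]
screen[3][3:5] = address_bar[1]

macro = [("click", address_bar), ("type", "dottech.org")]
print(run_macro(screen, macro))
# [('click', (3, 4)), ('type', 'dottech.org')]
```

If the screenshot can’t be found at all, the runner raises an error instead of clicking blindly, which is roughly the behavior you want from a “smart macro”.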

If you feel intimidated by having to program with Sikuli, watch the following demo video, which illustrates step by step how to use Sikuli, before you make up your mind (I did not create this video):

As you can see, yes, it is harder to create macros with Sikuli than with other “dumb” macro creators, but it isn’t all that hard, and the time spent learning Sikuli is well worth the “smart macros” it can create.

Now, as good as Sikuli is, it isn’t perfect. It is still in beta, so there are bound to be bugs. And because Sikuli depends heavily on screenshots, if the look of a program or window changes, Sikuli will fail to find what it is looking for and show an error. In fact, a program interface doesn’t even need to change to throw off Sikuli; it just needs to be transparent. While testing Sikuli, I learned that transparency is its Achilles’ heel. If the background behind a transparent window changes color drastically, Sikuli has trouble matching the screenshot you took while creating the macro against what the program interface looks like when the macro runs. For example, when I created the Internet Explorer macro, there was a bluish background behind the Internet Explorer window, so that is what showed up in the screenshot. If there is a black object behind the Internet Explorer window when I run the macro, the black shows through the transparent parts of the window, and Sikuli has trouble finding the address bar.
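Here is a rough model of that transparency problem, under clearly made-up assumptions: pixels are grayscale, the transparent parts of the window are drawn with standard alpha compositing (shown = alpha × window + (1 − alpha) × background), and matching uses a simple pixel-difference score rather than whatever Sikuli actually does internally. The opacity of 0.4, the backdrop values, and the 0.7 cutoff are all hypothetical numbers chosen for illustration.

```python
# Sketch: why transparency can break screenshot matching.
# Hypothetical model: grayscale pixels, standard alpha compositing,
# and a simple pixel-difference similarity score (not Sikuli's matcher).

ALPHA = 0.4           # assumed opacity of the window's transparent parts
MIN_SIMILARITY = 0.7  # assumed match threshold


def blend(foreground, background, alpha):
    """Composite one semi-transparent pixel over whatever is behind it."""
    return round(alpha * foreground + (1 - alpha) * background)


def render(glyphs, backdrop):
    """What the screen shows for the address bar over a given backdrop colour."""
    return [[blend(v, backdrop, ALPHA) for v in row] for row in glyphs]


def similarity(a, b):
    """1.0 for identical images, lower as pixel differences grow."""
    h, w = len(a), len(a[0])
    diff = sum(abs(a[r][c] - b[r][c]) for r in range(h) for c in range(w))
    return 1.0 - diff / (255 * h * w)


glyphs = [[0, 255, 0, 255], [255, 0, 255, 0]]  # the address bar's own pixels
BLUISH, SLIGHTLY_DARKER, BLACK = 180, 160, 0

template = render(glyphs, BLUISH)  # screenshot taken while creating the macro
for name, backdrop in [("bluish", BLUISH), ("slightly darker", SLIGHTLY_DARKER), ("black", BLACK)]:
    score = similarity(template, render(glyphs, backdrop))
    print(f"{name}: similarity {score:.2f}, match={score >= MIN_SIMILARITY}")
# bluish: similarity 1.00, match=True
# slightly darker: similarity 0.95, match=True
# black: similarity 0.58, match=False
```

A mild change in the background barely moves the score, but a drastic one (bluish to black) drags it under the cutoff, mirroring the address-bar failure described above.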

Furthermore, Sikuli is currently an academic project, so unless its developers think they can turn it into a business, break away from academia, and start selling it, don’t expect very quick updates to the program.

In the end, though, despite its problems and downsides, I have one phrase to describe this program: Bloody brilliant, mate! You can grab Sikuli from the following links: