Posted
by
timothy
on Thursday January 21, 2010 @04:21PM
from the mind's-eye dept.

coondoggie writes "Computer users with rudimentary skills will be able to program via screen shots rather than lines of code with a new graphical scripting language called Sikuli that was devised at the Massachusetts Institute of Technology. With a basic understanding of Python, people can write programs that incorporate screen shots of graphical user interface (GUI) elements to automate computer work. One example given by the authors of a paper about Sikuli is a script that notifies a person when his bus is rounding the corner so he can leave in time to catch it."
Here's a video demo of the technology, and a paper explaining the concept (PDF).

Sounds like the Microsoft FrontPage of coding software. Why do with text what you can do with pictures? And we all know FrontPge went on to become the defacto standard for web development....that had to be fixed by an real web developer later.

But on the upside, dedicated FTE's for "reinstalling corrupted FrontPage extensions" did skyrocket during the FrontPage era.

Actually I think this is more interesting than either FrontPage or LabView, because it allows you to script GUI apps that were not designed to be scriptable. Even for apps that are scriptable, it provides an increase in user efficiency as you don't have to learn the API commands to do things that you already know how to do in the GUI.

How useful it is will depend on how well the image pattern matching deals with corner cases. Consider you need to click on a text field, however there are many identically loo

>And we all know FrontPge went on to become the defacto standard for web development....that had to be fixed by an real web developer later.

Do you want to democratize technology or just have it controlled by elites? Non-techies want to do things like scripting and web design without paying a professional, the same way they want to fix things around the house or fix the car. When it comes to small or easy jobs, a non-expert can do just fine. Why should we piss on the DIY'ers because they dont have a Master's degree in CS? Frankly, a lot of computer stuff is pretty easy and paying someone is ridiculous.

While Im certainly no fan of Frontpage, I feel that it wasnt much worse than Mozilla Composer or other WSIWYG html composers.

Yeah, that's real easy for a programmer to say. Ever used a brownie mix? I'll bet a pastry chef would say, "I'd like to see people who wish to bake brownies actually learn how to bake brownies properly." Tools like Sikuli are the programming equivalent to brownie mix. It's easy gratification. (... or at least easier than learning to capture part of the screen and then do fuzzy image pattern matching on it.) If I were a very casual, light duty programmer, this would be pretty helpful sometimes.

Actually brownies (or a lot of food) isn't that hard to cook if you follow the instructions and people would probably be healthier if they would actually learn how to cook rather than rely on chemically pre-made foods. It's no surprise some of the fattest nations are full of people who can't cook.

Elite or competent? I'm all for people tinkering with software in their spare time the problem is people who arent qualified start thinking *everything* in software development is as simple as the tiny little things they are doing. Then we end up with Visual Basic(the birth of Visual Basic came with the motto "its so easy you know longer need programmers... managers can write the code"... that worked out well).

Then we end up with Visual Basic(the birth of Visual Basic came with the motto "its so easy you know longer need programmers... managers can write the code"... that worked out well).

From a business point of view, it actually did. People used VB, and particularly VB macros in Office, to do things that resulted in a lot of dollars flowing through a lot of organizations. Yes it did eventually need to be changed out, but in it's time, for it's purpose, you can't really fault it. It truly did work.

Why should we piss on the DIY'ers because they dont have a Master's degree in CS? Frankly, a lot of computer stuff is pretty easy and paying someone is ridiculous.

Thousands of cars on cinderblocks and dozens of houses with flooded basements are testimony that sometimes, paying someone is the only thing that isn't ridiculous. There's DIY, and there's "OMG you are SO in over your head." Anyone whose software development abilities are so stunted that the "advancement" outlined in TFA would help them is absol

There is a minimum level of skill and talent required to do anything. The only thing that happens when you make something "so simple anyone can do it", is a minefield of crap software. Instructing a computer to do something requires the ability to think abstractly, and organize/plan with an orders of magnitude more sophistication than "Do I want eggs or pancakes for breakfast?". Arguing that the 'elites' are pushing down the 'DIYers' is disingenuous. A real DIYer will overcome the learning curve of what

Do you want to democratize technology or just have it controlled by elites?

If I have to be in charge of cleaning up their mess then I would prefer that they leave development to the professionals. I think that is what the parent is getting at. We professionals are tired of rescuing dabblers who get in over their heads because their "easy to use" tools are just powerful enough to get them into trouble, but not powerful enough to get them out. If people agree to be responsible for their own results, good or bad, then I say let them do as they wish. Unfortunately, it never seems to w

HTML and CSS is pretty easy to learn, at least enough to produce the shit people produce with front-page.

Sikuli is good though it would appear it takes control of your computer so really it's pretty useless aside from dong batch repetitive jobs. You can probably get around that by learning more Python but the demo didn't interest me but I'm sure loads of people will love it for automating tasks.

Front-page on the other hand was awful and produced loads of awful sites that unfortunately affect more peop

Just because code is text (and can be readily generated) doesn't mean that anyone with notepad should be able to write it.

Wrong. It means exactly that. BASIC was a language meant to make it easier for people to program, and it was THE introduction to programming for many people. Now, am I arguing that BASIC is the right language for any given task? no. I haven't coded in BASIC for 15 years. But making the barrier to entry easier can only be a good thing for attracting new blood.

Watching the YouTube demo, I immediately thought of how basic this is compared to AutoIT's functions, and even the quick record function is faster to "program" with than this screenshot function.

It says it can tolerate some changes, but what if there is a completely different visual theme installed? What if a drop down is not on the same item it was when you made the script? AutoIT can take care of this by reading the underlying GUI code to allow for these kind of things. As someone who has been automating

I wonder how Sikuli copes with "click page down till you find the icon you need to actually click on".How about if the stuff you click on might look rather different each time? e.g. the IP address might not be 0.0.0.0 but something else the second or third time around.

And what if the stuff you need to click on can only be identified by text or an icon that you don't click on - you actually click on the stuff to the right (or left or whatever) of it. This one isn't a biggie - it shouldn't be too difficult to

The MS test crap in the latest versions of VisualStudio do it as well, and they'll be happy to find a button (if its a standard control) to click on using other data rather than mouse coordinates as well.

As a professional test automator, I'd like to point out that automation by image recognition is the method of last resort. The #1 concern in GUI automation is maintainability, and image recognition is the least maintainable method of automation there is short of recording mouse coordinates and keypresses. If you change your theme, if the developer rearranges the controls, if any text is changed, the script is broken. The idea of using image recognition for web page automation is right out.

I am currently working on automated GUI tests for an application, and Sikuli looks pretty great -- even when compared to enterprise level automated GUI testing tools costing in the order of thousands of dollars per user licence.

Some of the commenting below on maintainability problems seem pretty superficial. For example, to ease maintainability you could build a framework abstracting GUI component images from regression test scripts. For example, you could assign a screenshot as a variable and then refer

I just tried it out for an hour or so with our web application, and it seems to be doing it's job. One thing that it didn't manage to do is click somewhere relative to the matched image. It always seems to click in the middle of the image, which is annoying when you want it to click one checkbox out of many based on it's preceding label.

Perhaps it's possible to use some kind of nesting, so you could try to find the image of the checkbox inside a previous match that includes the label, but I didn't find out

Yes, but that kind of defeats the purpose of using image recognition so you don't have to care about the exact layout of your application. Inserting a new control on the page could break the test if you used tabs

I was thinking the same thing. Where this would be really handy would be in applications that paint their own windows and don't expose the gui handles for AutoIt to latch on to. Specifically, this would work great for Great Plains or online poker clients.:)

- selecting and clicking on see-through buttons (the background will change too much)
- the program access to the actual game for seeing, clicking and typing- the game's anti-hack detection / counter-measures- macro playing lag (see video)

I've done a fair bit of mindless semi-afk mining during my time playing eve, and never had much trouble with suicide attackers, can flippers, or other such stuff. I'd imagine that taking the usual minimal precautions like parking in a dead end, low traffic system would work relatively well.

Depending on how robust sikuli is, it might be possible to make a mission running macro, which could be even safer than blasting rocks (with the right ship setup, and such). Barring that I'd likely use sikuli on a secon

Two groups that are so utterly disconnected from the real world that they both have no idea why their favorite toy hasn't taken over the world even those its the simplest, most efficient, easiest to use, most feature rich (insert whatever here) on the planet.

Computer users with a rudimentary skill who do not have a basic understanding of Python can always build a Python programming AI in Lisp (or at least that's what I gathered from the MIT docs I browsed) and thus save themselves the trouble.

If a friend wanted to learn just enough programming to do a few light chores, what would you recommend? Python is arguably one of the easiest languages to learn. Randy Pausch used it for Alice [alice.org], which has been successful for teaching middle school girls how to program. So if "computer users with rudimentary skills" means rudimentary programming, then that works for me.

FTFA: "Sikuli -- which means God's eye in the language of the Huichol Indians in Mexico". Mexican Indians love their hallucinogenic Peyote [wikipedia.org]. On the other hand, MIT researchers want the masses to program with the mouse. Well, I know about "correlation is not causation", but MIT sure is an interesting place to be.

Yea- this might work until the icons change. I don't see this working too well in practice. I don't know about Mac- but on my Ubuntu system the icons got updated last week. And it happens often enough that these scripts would need updating to be a serious pain and expense. It isn't like an ordinary user could figure this stuff out either. Despite it being so simple your still going to need an IT person to create these scripts. Now you just have dumber IT people. Probably people who COST you more money in p

From what I seen is this a macro program that can use screenshots rather then key/mouse data to automate tasks. So you PROGRAM your PC in the same way you PROGRAM a VCR to record a show. It is NOT the same as writing an application.

But it seems very intresting once you got past this difference. Macro's are very handy for testing in my experience but often have a problem because a tiny mis-alignment can ruin it all. If this program is smarter because it can regonize where data is supposed to go... well that would certainly make automated tests a bit easier.

Interesting stuff. Just don't think you will be writing software with this.

Yes, you could use Sikuli to fire up a text editor, individually press the keys to write all the lines of code, launch the compiler/linker/whatever. So it meets your weird definition of completeness. However, I suspect you could not use Sikuli to write a program that writes a Sikuli program to write Sikuli. I could be wrong, though.

This is the same sort of scripting you can do with many already existing languages. Autohotkey for example. The only new feature would be the ability to copy the screenshot directly into the program as apposed to taking it outside the program and referencing the file directly. I'd say that this scripting language is actually weaker because of it.
As far as using this inside a game... they are already hardened against this sort of thing. For example, next time you're in EVE look at the buttons you use. They

Come on, let's cut through the default Slashdot snark. The image capture aspect of Sikuli is brilliant! I don't like the tagline "program anything with Sikuli" because 99% of software should be written in something else. But think of writing test scripts that can use the image matching features. If the software works as advertised, then you could throw together UI test cases way faster than anything else I've seen. System administration tasks should be a good match too. The resulting code would be brittle and hard to maintain, but for quick one-off scripts, sure... I can see it.

The script may not work if the UI style is different from the one recorded or if the UI language is different from the one recorded. Generally, any option that can change the UI from computer to computer will create a problem for Sikuli.

I'd be curious to see how they handle the back end, especially as some others pointed out it does make calls that seemingly require some hook into the OS. As for its usefulness, I doubt it will really take off beyond being a decent prototype. It relies on image matching so if you use and change a custom icon set all your scripts would be kinda worthless. Same goes if the programs you are "screenshot scripting" receive a major overhaul in the GUI department. Until it can address those issues, I doubt it wil

Sikuli is certainly not commercial-grade UI testing software. It was never intended to be, this is academic software written to explore ideas, rather than to polish them to perfection. Also, it is not a "general" programming language. The previous posters that compared it to video-programming are right: not all programs have to target complicated algorithms and data-structures, there is plenty of space for automating "simple stuff".

As an idea, I find the readability of the code particularly interesting. Sikuli code is about the closest you can come to self-explanatory, step-by-step instructions on how to achieve whatever a particular program does. Add a few comments to the most arcane steps, publish those programs to an online repository, and presto! executable step-by-step tutorials.

Yes, the developers may have to address the variability of themes on people's desktops. It is certainly possible to do so (for instance, by keeping a list of mappings from any of a set of "supported" themes to a "canonical" theme, which would be used in all examples), but, as far as ideas go, I really think that Sikuli is a very refreshing idea.

I totally agree. I watched the youtube video (is WTFYV the equivalent of RTFA?), and I was kind of impressed. Although the demo shows an interaction with a bunch of buttons, the real power is the image recognition. She showed how with one command each you can script the two of the fundamental interactions you have with images on the screen: click it, or wait for it to appear. The fuzzy visual recognition algorithms are a huge plus. If you wanted to script something in your room using a web-cam, this i

Some accountants seem to think everyone needs to learn accounting in order to function in society. But people have other jobs. Some of us like our dumbed down tools because they fill a need. My tax software lets me do my taxes without learning "proper" accounting. Similarly, I know some people who benefit greatly from a little passing knowledge of high-level scripting languages like VB, JavaScript, or even Python.

For those kinds of people, Sikuli looks pretty cool because they can do things that would b

Wow they just created the old VB SendKeys command. I was actually doing stuff like this 12-14 years ago with SendKeys command in VB. In "practical" use back thenit sucked and I am certain that has not changed.

I did this exact same thing in AutoIt [autoitscript.com], except that it needs exact matches of images instead of a fuzzy recognizer. (Plus, I also had rule triggers and state vs just a single list of imperative commands)

The fuzzy match is a nice addition, but this automation concept has been available for years.

What AutoIt does is take a hash of the pixels in a rectangular area. If you interactively capture an area's hash when the screen is in the desired state, then that area can be scanned during the script run to see when/if it matches the desired hash again. The area's location can be relative to a window, control, screen, etc, and the software can scan around various locations in case it moved.

There's no lossiness in any of the image manipulation, but the same pixels need to show up.

How you sanitize your inputs in a language that checks what is displayed on the screen? Instead of xss or sql injection you could end being hacked by watching a mail attached normal picture if that kind of programming becomes popular.

The idea is cool and innovative, and makes automating a point-and-click interface a breeze. It certainly has applications.

But overall, it just seems like a Bad Idea. It will be as reliable as screen-scraping in browsers and would therefore be wise to be avoided, and for the same reasons.

Even just changing the theme of your OS or the icon sizes could well be enough to confuse the image processing. The code won't be portable, and in the end, for anything but the most simple tasks, the person using it would still require some programming skills. Because of this, I think between Sikuli and command-line scripting, command-line scripting has more staying power.

Okay, I have done a fair amount of programming and yet with a new Mac I have not yet dived into the SDKs, etc. I once wanted to do some batch resizing of photos and yet couldn't get it done in Automator easily without being scared of losing the original photos, on my first dive into it. Yes, I actually wrote a great auto-compositing and resizing program once driving the Gimp on linux. It was awesome. But that was years ago and now I have a nice new computer. And where did that code go. Yes I'm sure Automato

The subtitles were a bit of a surprise. Can MIT not afford better than built in microphones on cheap laptops? Between her vaugely asian accent, the poor quality of the audio (seriously, you're TELLING people how to do something, the audio is important here - did they record this in a shower stall or something? my netbook's audio sounds 100x better than this), and then apparently some sort of wacky audio encoding basically makes her impossible to understand. People who speak english as a second language aren

On the contrary, my experience has been that non-native speakers of English are actually better at understanding other non-native speakers. I don't know why that is, but intuitively it makes sense -- non-native speakers probably learned from a diversity of other non-native speakers.

I was at a WinHEC panel session in 2008 and the panel leader had absolutely horrible English (I'm sure he was intelligent, but he wasn't intelligible). Somebody else, clearly of another racial background (the specific ethnicities

There are far easier ways to commit click fraud than actually looking at the screen to do it. The ad companies tend to ignore the same request multiple times from the same IP so this changes nothing.

People who commit 'click fraud' aren't writing crappy little screen scrapers to do it, its far easier and faster to write a plugin for firefox to do what you're say and just find the text of your ad on the page and trigger the link. No need to futz with whats displayed or 'moving the mouse' to the right spot,

I mostly agree with you, it's always silly to automate a sequence of GUI actions.

However I can see where they're going here; the program examines your screen and finds the widget to click on or enter data into, much like a human looking at the screen and deciding what to do next. Extend that to the real world, a robot that looks around your room for the remote control and turns on the TV, then surfs through the channels until it recognizes something you like to watch. By then it will also be capable of unde