Each toplevel window is mapped to a canvas element, and the content in the windows is updated by streaming commands over a multipart/x-mixed-replace XMLHttpRequest that uses gzip Content-Encoding to compress the data. Window data is pushed as region copies (for scrolling) and image diffs. Images are sent as data: uris of uncompressed png data.
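
As a rough illustration of that transport, here is a minimal Python sketch. It is not the actual wire format: the boundary string, the command payloads and the helper names are all made up for the example. Each update is wrapped as one multipart/x-mixed-replace part, and all parts go through a single gzip stream that is sync-flushed after every part, which is one way to let the receiver decode an update as soon as it arrives.

```python
import zlib

BOUNDARY = b"frame"  # hypothetical boundary string, not the real one


def make_part(payload: bytes, content_type: bytes = b"text/plain") -> bytes:
    """Wrap one payload as a single multipart/x-mixed-replace part."""
    return (
        b"--" + BOUNDARY + b"\r\n"
        + b"Content-Type: " + content_type + b"\r\n\r\n"
        + payload + b"\r\n"
    )


class GzipStream:
    """Compress a sequence of parts as one continuous gzip stream."""

    def __init__(self) -> None:
        # wbits=31 selects the gzip container instead of the zlib one.
        self._z = zlib.compressobj(-1, zlib.DEFLATED, 31)

    def send(self, part: bytes) -> bytes:
        # Z_SYNC_FLUSH pushes out everything compressed so far, so the
        # receiver can decode this part without waiting for the next one.
        return self._z.compress(part) + self._z.flush(zlib.Z_SYNC_FLUSH)


# Made-up command strings, only here to have something to send.
commands = [b"copy 0 10 200 90 0 0", b"put 0 90 <image data>"]

stream = GzipStream()
chunks = [stream.send(make_part(cmd)) for cmd in commands]

# Receiving side: every chunk is decodable on arrival.
decoder = zlib.decompressobj(31)
decoded = [decoder.decompress(chunk) for chunk in chunks]
```

Keeping one compressor for the whole connection means later updates can reuse the compression history of earlier ones, which should help when similar commands repeat.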

Input is gathered via dom events and sent to the server using websockets.

Right now this is Firefox 4 only, but it could be made to work in any browser with websockets.

Now, I want to know: is this useful?

There are two basic ways to use this: you can either run your own apps on your own server and access them from anywhere (kinda like screen), or you can put it on a public server that spawns a new instance of the app for every user (gimp on a webpage!).

If you had this technology, what cool stuff would you do with it? What apps would you run, and how would you use them?

175 thoughts on “Gtk3 vs HTML5”

This is amazing. It means we can use the existing GTK applications and make them available in the browser. And it also means that the scope of the GTK framework has just increased a lot – we can use the same API for desktop and web!

Could you clarify the technical background? Is every screen update sent as an image, or is stuff like GtkLabel text actually rendered by the browser? How does it compare bandwidth-wise with VNC and X11?

This could be used by translators of an application to preview their result and make sure the screen looks OK with the translated messages. In fact, a tool like Transifex (www.transifex.net) could show the preview for them.

I’ve been pondering if this is possible and you just proved that it is very much possible. Being able to write applications for the desktop that work out of the box on the web would be the wet dream of many web developers out there who don’t really want to work on the web but have to do it.

Nice! Awesome! Incredible! GTK+ just gained network transparency. That would be a boost for remote management, especially if the various distributions really start dropping X. And it doesn’t even require a specialized client application, just a browser. It could even be useful for desktop-sharing and the remote help desk!

This is incredible stuff – really shows the capabilities of HTML5 (not to mention your own coding prowess!).

We’re getting to the point of having fully-featured apps, running on a server, accessed via a naked web browser with no plugins. That’s huge. Add some kind of decentralized file store, and you’ve got a properly decentralized networked application framework with almost limitless possibilities.

It can handle anything that draws only using cairo. That includes WebKitGTK,
but not clutter (that uses OpenGL).

@jono chang:
I have not really measured the bandwidth yet, but my guess is that it does
pretty well on typical UIs, but it’s clearly not good for displaying video
or large fancy animations.

@oliver:

Technical details on rendering:
Not every drawing operation is sent over the wire; instead we send updates
after any expose event handling is finished. We also keep track of the
last image we sent to the client. So, when we get to update the image
we can compare the two images and send only the rectangles that were changed,
and additionally we send only the pixels that changed, the rest are sent as
black alpha=0 pixels, so these compress well. Additionally we do catch window
scrolling and send these as rect-list copies, so scrolling involves just a
single bitblit + the image for the newly scrolled in data.
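
The diffing step can be sketched roughly like this in Python. This is a simplified illustration and not the actual backend code: frames are plain lists of (r, g, b, a) tuples, only a single bounding rectangle is produced instead of a list of rectangles, and the function names are made up. It also assumes window content is opaque, so alpha=0 can safely mean "unchanged".

```python
TRANSPARENT = (0, 0, 0, 0)  # black, alpha=0: marks "pixel did not change"


def diff_frames(old, new):
    """Return (x, y, patch) covering the changed pixels, or None if equal.

    Frames are equally sized lists of rows of (r, g, b, a) tuples.
    Inside the bounding rectangle, unchanged pixels become TRANSPARENT.
    """
    changed = [
        (x, y)
        for y, (old_row, new_row) in enumerate(zip(old, new))
        for x, (o, n) in enumerate(zip(old_row, new_row))
        if o != n
    ]
    if not changed:
        return None
    x0 = min(x for x, _ in changed)
    x1 = max(x for x, _ in changed)
    y0 = min(y for _, y in changed)
    y1 = max(y for _, y in changed)
    patch = [
        [new[y][x] if new[y][x] != old[y][x] else TRANSPARENT
         for x in range(x0, x1 + 1)]
        for y in range(y0, y1 + 1)
    ]
    return x0, y0, patch


def apply_patch(frame, x0, y0, patch):
    """Client side: draw the patch over the frame, skipping alpha=0 pixels."""
    result = [row[:] for row in frame]
    for dy, row in enumerate(patch):
        for dx, pixel in enumerate(row):
            if pixel != TRANSPARENT:
                result[y0 + dy][x0 + dx] = pixel
    return result


WHITE = (255, 255, 255, 255)
RED = (255, 0, 0, 255)

old = [[WHITE] * 8 for _ in range(8)]
new = [row[:] for row in old]
new[2][3] = RED
new[4][5] = RED

x0, y0, patch = diff_frames(old, new)
rebuilt = apply_patch(old, x0, y0, patch)
```

In this example only a 3x3 patch is sent for an 8x8 window, and seven of its nine pixels are the same transparent value, which is what makes the result compress well.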

roc:

What would help is a way to transfer image data that is more efficient than
base64 encoded data uris.
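
To put a number on the base64 cost, here is a small Python sketch. The payload is placeholder bytes rather than a real PNG, and the helper names are made up for the example. Base64 turns every 3 input bytes into 4 output characters, so the data: uri is about a third larger than the raw image data before any transport compression.

```python
import base64
import math

PREFIX = "data:image/png;base64,"


def to_data_uri(png_bytes: bytes) -> str:
    """Encode raw bytes as a base64 data: uri with the PNG media type."""
    return PREFIX + base64.b64encode(png_bytes).decode("ascii")


def base64_length(n: int) -> int:
    """Length of the padded base64 text for n input bytes."""
    return 4 * math.ceil(n / 3)


payload = bytes(range(256)) * 12  # 3072 placeholder bytes, not a real PNG
uri = to_data_uri(payload)
overhead = base64_length(len(payload)) / len(payload)  # 4096 / 3072
```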

Jon Smirl:

I don’t see any reason for the primary way to render the buffer being html.
However, the same binary could easily allow both access via html and some more
efficient protocol for wayland buffer rendering.

@Havoc:

Firefox uses a fair chunk of X specific hacks in its gtk code… Would be
easier to run GtkWebKit in firefox!

@Vladimir:

While sending cairo commands (or some other rendering commandset) over the
link seems like a good idea I don’t think it’s actually the right approach.
Much of the UI in apps is rendered as pixmaps anyway, and there are several
layers of overdraw when rendering a full window, so I’m not sure there is less
data to send. And, with the image data version it’s very easy to compare the
last and new frame and only send the difference, something which is very
hard to do in a full rendering api style protocol.

@Michael:

Good spotting, yes, keyboard input is one of the things left to do. Trivial
keyboard input should be easy, however there will probably be issues with
more complex input, as the browser ui steals many key combinations.

@Martin Sevior:

It’s gtk+ 3.0 only, so you need to port abiword to do all rendering via cairo.

@Reeks:
Technically it might be possible to do OpenGL via webgl. However, it’s gonna
be hard to catch the gl calls and forward them. You’ll need a custom libgl.
Also, it’s unlikely to perform well if you’re using lots of textures. Could
work well for simple cases though.

@rms:

Well, emacs gtk+ port uses a lot of direct X calls, so it won’t work as is.
You’d have two alternatives to make it work:
* port all emacs rendering to “pure” gtk+/cairo use
or
* Duplicate the work I’ve done for the backend as a separate emacs rendering
mode. It’s really not that hard.

I think the second would be the easiest.

@lucasshrew:
@jyf:

It’s not very unlike a vnc or an xserver, yeah. In fact I originally thought
to do an xserver implementation in the same way. However, in practice I think
it makes more sense to export an app, rather than a full desktop. The user
already has a desktop.

@Luke Leighton:

Well, pygtkweb is “like gtk+”, which is only useful if you’re writing new
code for web use. This *is* gtk+, so any existing app runs with just a rebuild.

@murrayy:

Sure, it’s themable, but the theming happens server side. We could have the
client tell the server what theme to pick though.

@Yann:

Sure, you can use any gtk+ binding. It’s just a normal gtk+ app after all.

@Pjvandehaar:

There are many ways to display html in a gtk app, the best one atm is GtkWebKit.

@Wingo:

No keyboard input yet, nor clipboard. However, simple keyboard input should
be easy to add (accelerators stolen by the browser window are tricky though).
I’m not sure if clipboard is doable. Need to look into how js+dom can modify
the clipboard.

@mikeC:

I don’t think a BOSH like approach will work. There are far too many events
going from the client to the server on e.g. mouse move for it to work with a
new http connection per message sent.

@Stu:

As I said above, I’m not sure forwarding cairo rendering commands is actually
more efficient.

Sounds very cool indeed. At least at http://cofundos.org/project.php?id=67 people have been waiting for something like that. How invasive were the changes you needed to make? (I’m assuming small changes all over GTK and then some thread to handle http?)

I’m split on this. As a geek, the elegance and ingenuity is something that puts a big smile on my face.

As a web developer, it slightly concerns me. For certain specialised purposes I can see this being somewhat useful – esarbe mentions using it as a replacement for X remoting, which would be awesome. I like the idea of being able to point a web browser at my workstation.

My concern is Zeeshan’s comment “web developers out there who don’t really want to work on the web but have to do it”. There are specific reasons why writing software for the web is as it is, not least accessibility, and replacing HTML, CSS and Javascript with a big canvas tag is not a good thing.

Keep improving this. There are lots of ways to make it faster. For example cache images in the client, compute deltas to the screen DOM and send them as JSON, build client side widgets (text editor, dialogs, menus) and replace GTK at the widget level instead of drawing level. Browsers have WebGL now so you can remote GL apps. Search for the Google Quake in Chrome demo.

We can start with two paint functions, normal and this one. But over time you’d like to move to only the HTML5 one and optimize the app to use it. The high level concept here is that HTML5 is the new toolkit, GTK/HTML5 is a transition tool to this end. As apps undergo this transition they will evolve their UI to become better optimized on the HTML5 toolkit.

Making this transition has major impacts. It makes every Linux app network transparent to any OS. It replaces X transparency. It is a great solution for people using VMs. HTML5 standardizes everything. HTML5 is themable via CSS.

Don’t worry about using a socket to the local toolkit. That process is very fast. You are using it today in the X server. The X server follows the same model as using HTML5. xlib packages your drawing requests and sends them over a socket to the xserver which draws them. You are generating HTML5 requests and sending them over a socket to an HTML5 toolkit which will render them into Wayland. But the HTML5 is much more efficient since it is higher level and it is standardized.

Video can be handled by setting up a separate stream from the HTML5 front end. App asks for a video widget, you make an HTML5 widget, HTML5 widget asks the server for a video stream, server redirects the video stream from the client to the HTML5 engine. You will have to assume that the HTML5 client has a codec that can decode the stream.

This is part of changing the apps. You don’t want the apps doing their own audio and video decompression. You need to send the compressed stream to the front end.

Another example: think of Google Docs if they gave us the source code and let us run the server on our local machines. Google Docs UI is going to greatly improve once it is converted to HTML5.

GL textures can be sent across the wire and be cached in the remote GL engine. People should be storing these in GPU memory instead of system RAM anyway. I am always running out of system RAM while my GPU memory sits there empty.

This has existed for years for Qt; it is known as Vedga (formerly Glan), see the English part of http://kalpa.ru. It is just not in a browser: a thin universal client speaks a gzip-compressed protocol, and the Qt app runs on the server while its screen is local. Very fast, works even on a modem connection.

A remote Qt app is totally different from a gtk app run on the web! You cannot integrate a legacy Qt app into next-generation web apps. Alex, however, found a way to get gtk working there. I guess that’s why people like it.

I will state something key. Don’t depend on webgl. GLX with X11 fails across network due to the huge size and amount of traffic opengl can send backwards and forwards. virtualgl would be a good place to start as well as the method wayland uses for rendering.

Really, GTK needs a way to tag which OpenGL can be sent to the client and which OpenGL needs to be processed server side.

It would be interesting to see integration with the likes of eyeos, so we could have a full desktop in the web browser.

With that said, does this project not scare the hell out of anyone else? With malware, phishing, and arbitrary-code-execution exploits rampant (mostly on Windows thankfully), what’s to stop criminals from embedding Gtk+ apps in webpages that phish for my passwords or worse?

I can’t imagine it’s very hard to recreate the gnome-keyring or gksu window and present it to unsuspecting users through html5 websockets.

Web-browsers are capable of connecting to multiple domains. If each domain uses this gtk interface, how do I determine which domain that login popup belongs to? I don’t want to login into my “gtk facebook” page only to realize later that login box was from a malicious domain I happened across.

With a second look, after that knee-jerk reaction, it’s clear that this technology is confined within the canvas object of the web browser, which will eliminate arbitrary-code-execution and malware as security concerns, but still leaves phishing.

Assuming that the address bar of your web browser cannot be hidden, or the title bar text changed (in such a way that you can’t tell the window belongs to your web browser), then popup windows with gtk interfaces can be distinguished as local or internet-originating, and/or their domains identified. This assumes, however, that all browsers meet the minimum requirements and users are vigilant.

One aspect that may generate confusion, and become a potential vulnerability, is focus indicators. I assume that the gtk app on the remote client will still display the caret in its text box after I’ve moved focus to another html5 object (perhaps a text box). Now my screen is “indicating” that keyboard input can go to one of two places. And this will get worse if more remote apps are visible on screen at the same time. Are canvas focus events (enter/leave) sent to the remote-gtk app, to handle focus indicators?

Again, Alex, great work. I can see the potential applications for this, but only for private use on private networks. I hope my concerns for security are taken for what they are–my opinions–but lead you to greater thoughts for making this technology secure enough for use as a public standard on public networks. I would caution against relying solely on security provided by web browsers (the “let someone else handle it” mentality). Not all browsers are equal in this regard.