Monday, December 28, 2009

In the last few weeks, I've had a surprising number of discussions about commit messages. Many of them were with developers new to a project, trying to get them started. So here's a list of things you should do when committing, and why you should do it. Hint: the linux kernel mailing list gets it right, go there to learn.

Any software project is a collaborative project. It has at least two developers, the original developer and the original developer a few weeks or months later when the train of thought has long left the station. This later self needs to reestablish the context of a particular piece of code each time a new bug occurs or a new feature needs to be implemented.

Re-establishing the context of a piece of code is wasteful. We can't avoid it completely, so our efforts should go towards reducing it as much as possible. Commit messages can do exactly that, and as a result, a commit message shows whether a developer is a good collaborator.

A good commit message should answer three questions about a patch:

Why is it necessary? It may fix a bug, it may add a feature, it may improve performance, reliability, stability, or just be a change for the sake of correctness.

How does it address the issue? For short, obvious patches this part can be omitted, but otherwise it should be a high-level description of the approach taken.

What effects does the patch have? (In addition to the obvious ones, this may include benchmarks, side effects, etc.)

These three questions establish the context for the actual code changes, put reviewers and others into the frame of mind to look at the diff and check if the approach chosen was correct. A good commit message also helps maintainers to decide if a given patch is suitable for stable branches or inclusion in a distribution.

A patch without these questions answered is mostly useless. The burden for such a patch is on each and every reviewer to find out what the patch does and how it fixes a given issue. Given a large number of reviewers and a sufficiently complex patch, this means many man-hours get wasted just because the original developer did not write a good commit message. Worse, if the maintainers of the project enforce SCM discipline, they will reject the patch and the developer needs to spend time again to rewrite the patch, reviewers spend time reviewing it again, etc. The time wasted quickly multiplies and given that a commit message only takes a few minutes to write, it is simply not economically viable to omit them or do them badly.

Consider this a hint for proprietary software companies too - not having decent SCM discipline costs money!

How to do it better

There's no strict definition of the ideal commit message, but some general rules have emerged. A commit should contain exactly one logical change. A logical change includes adding a new feature, fixing a specific bug, etc. If it's not possible to describe the high-level change in a few words, it is most likely too complex for a single commit. The diff itself should be as concise as reasonably possible and it's almost always better to err on the side of too many patches than too few. As a rule of thumb, given only the commit message, another developer should be able to implement the same patch in a reasonable amount of time.

If you're using git, get familiar with "git add -p" (or -i) to split up changes into logical commits.

The git commit format

If you're submitting patches for git, the format is mostly standardised: a short one-line summary of the change (the maximum length of the line differs between projects, it's usually somewhere between 50 and 78 characters). This is the line that'll be seen most often, so make it count. Many git tools are in one way or another optimised for this format. After that one-line summary comes an empty line, then multiple paragraphs explaining the patch in detail (if needed). Don't describe the code, describe the intent and the approach. And keep the log in the present tense.
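As an illustration, a made-up commit message following this format (the bug, the subsystem prefix, and the function name are all invented) might look like this:

```
dix: fix NULL pointer dereference on device disable

If a device is disabled while a client still holds a grab on it,
the grab is never released and the next event for that grab
dereferences freed memory.

Release any active grabs in DisableDevice() before the device is
removed. Devices without active grabs are unaffected.
```

Note how it answers the three questions: why (the crash), how (release grabs before removal), and what effects it has (none for ungrabbed devices).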

Learn to love the log

I have used CVS (and SVN to a lesser extent) in the past and log was a tool that was hardly ever used. Mostly because it was pretty useless, both the tool and the information available. These days I look at git logs more often than at code. The git log tool is vastly superior to CVS log and the commit discipline in the projects I'm working on now is a lot better. I grep git logs more often than code files and I use git blame all the time to figure out why a particular piece of code looks the way it does. It's certainly saving me a lot of time and effort. It's come to the point where the most annoying X server bugs are the ones where the git history stops at the original import from XFree86. If you're not using your SCM's log tool yet, I recommend getting more familiar with it.

How not to do it

There's a bunch of common sins that are committed (yay, a pun!) regularly.

SCM is not a backup system! My personal pet hate. Developers who use it as such tend to do end-of-day commits, checking in everything they did that day. The result is useless: a random diff across the code with changes that are impossible to understand by anyone, including the original author, once a few months have passed. (On this note: universities, please stop teaching this crap.)

Per-file commits. More often than not a logical change affects more than one file, and it should not be split up into two commits.

Lazy commit messages: any commit labelled "misc fixes and cleanups" or similar. I've seen my fair share of those on non-FOSS projects and they always come back to bite you. They make it impossible to find when a bug was introduced, hard to bisect, and harder for anyone else to keep track of what's happening in the project.

Two changes in one patch. Something like "Fixed bug 2345 and renamed all foo to bar". Unless bug 2345 required the renaming, such fixes should be split up into multiple patches. Others may have to take one of those bug fixes and apply it to a stable branch but not the other one. Picking bad patches apart into useful chunks is one of the most time-consuming and frustrating things I've done since it doesn't actually add any value to the project.

Whitespace changes together with code changes. Needle in a haystack is a fun game, but not when you're looking at patches. It's a great way to introduce bugs, though, because almost no-one will spot the bug hidden in hundreds of lines that got reindented for fun and profit.

The ever-so-lovely code drops. Patches with hundreds of lines of code that dump a new feature into the code while at the same time rewriting half the existing infrastructure to support this feature. As a result, those hundreds of lines of code need to be reviewed every time a bug is discovered that is somehow related to that area of code. It's easier and less time-consuming to first rework the infrastructure one piece at a time, then plug the new feature on top. As a side effect, a project that relies on code drops too often discourages outside developers. Would you like to contribute to a project where the time spent filtering the signal from the noise outweighs the actual contribution to the code?

Unrelated whitespace changes in patches. A reviewer needs to get the big picture of a patch into their brain. Whitespace-only hunks just confuse: a reviewer has to look extra hard to check whether there's a real change or whether it can be ignored. That's not so bad for empty lines added or removed, but it's really bad for indentation changes.

There's plenty of excuses for the above, with the favourite one being "but it works!". It may work, but code is not a static thing. In a few weeks time, that code may have moved, been rewritten, may be called in a different manner or may have a bug in it. At the same time, the original developer may have moved on and no-one knows why the code is that way. In the worst case, everyone is afraid of touching it because nobody knows how it actually works.

Another common excuse is the "but I'm the only one working on it". Not true, any software project is a collaborative project (see above). Assuming that there's no-one else is simply short-sighted. In FOSS projects specifically we rely on outside contributors, be it testers, developers, triagers, users, etc. The harder it becomes for them to join, the more likely the project will fail.

Another, less common excuse these days is that the SCM used is too slow. Distributed SCMs avoid this issue, saving time and by inference money.

Thursday, December 24, 2009

Ran into a new bug? Wasn't there with the last version? Want a developer to fix it? Too easy, just bisect it. And here's how:

First of all, we assume you've been running whatever your distribution ships. Find out that version and find the upstream repository. Now clone that repository and check out the matching version. For the example below, let's assume 1.2 is fine and 1.3 is broken.

You can get the configure flags from your distribution's build system, e.g. Fedora's koji. Look up the build of the version, go into the build.log and copy the configure flags from there. That way, the project will be installed in the same location, with the same flags, as the rpm/deb/whatever package. Verify it's still broken with this version; if not, the cause might be distribution-specific patches. If it's still broken, check out the working version and verify that one too.

$> git checkout project-1.2 # the working version
$> make && make install
# test

So, you've just verified that vanilla 1.2 works and 1.3 doesn't? Great, off to bisecting.

git now gives you a version in between those two. Simply build and install it, test it, and depending on whether it works or not, run "git bisect good" or "git bisect bad". After a number of test runs, git will tell you which commit introduced the bug. And you're done; send this commit info to the developers and they'll have a much easier time fixing the bug.

Now, of course nothing is as easy as it seems so you may see versions that you can't build (git bisect skip) or you'll run into dependency issues between versions so you can't bisect further. I've had some reasonable success with the process above and the smaller the range of commits you can provide to the developers the better.

You can use the same process for distribution-specific patches too. After checking out a working version, just apply all patches from that distro in order and then bisect between the working version and HEAD. Works like a treat.

Friday, December 11, 2009

Hello, and welcome to this week's episode of "confusing wordcronyms". This time we'll have a look at letter combinations starting with X, ending in org or 86, and referring to a popular windowing system. Hint: the "X Window System" is an umbrella term for a windowing system based on the X protocol.

Here we go:

X

preferred shorthand for lazy typers or people with most of their keyboard missing. Refers to some incarnation of the X Window System. It is also often a symlink to the X server binary.

X.Org

Official name of the X.Org project. "The X.Org project provides an open source implementation of the X Window System. [...] The X.Org Foundation is the educational non-profit corporation whose Board serves this effort, and whose Members lead this work." [source]

x.org

URL used by the X.Org project. Sometimes used to refer to X.Org by those with a broken Shift key.

XFree86

"The XFree86 Project, Inc is a global volunteer organization which produces XFree86®, the freely redistributable open-source implementation of the X Window System continuously since 1992." [source] Due to some not-so-exciting political issues, the code produced by XFree86 was forked and most developers shifted to X.Org instead. Today's code is the continuation of what used to be XFree86 code, though XFree86 itself is mostly dead now.

Xorg

The binary name of the X.Org X server implementation.

xfree86

The DDX name for the commonly used X server. A DDX is the device-dependent part of the X server and together with the device-independent bits (DIX) and some other parts forms your binary. The xfree86 DDX is what loads your video drivers and input drivers (amongst other things). Other well-known DDXs are kdrive, Xnest and Xdmx. All patches you see with xfree86 prefixes or pathnames patch this particular DDX.

xf86

Shorthand for drivers that can be loaded by the xfree86 DDX. For example, xf86-input-evdev, xf86-video-ati, etc.

xorg

Firstname of the xorg.conf server configuration file. Lovingly spelled lowercase because we can.

If you're reading this, chances are that you are running Xorg, the xfree86 DDX from the X.Org X server implementation with various xf86-something drivers. How can that possibly be confusing? Pass the mustard, thanks.

Tuesday, November 24, 2009

Run-time configuration works by having the driver configuration exposed as a shared memory segment and letting another process change this state. Hardware monitoring works by dumping the hardware state read from the device into the shared memory and then having another process print this state. synclient was the default UI (though syndaemon used SHM as well).

X Server 1.6 introduced input device properties, a generic way to attach information to input devices. The information can be attached by drivers or clients, and both drivers and clients are notified about changed values. Hence, it's a good vehicle for device-specific configuration.

synaptics 1.0 had input device property support in the driver, but SHM configuration remained available.

synaptics 1.1 had synclient and syndaemon use input device properties by default, but SHM configuration remained available.

synaptics 1.2 (current release) has configuration through SHM removed from both the driver and synclient/syndaemon and the SHM area is not writable by the user anymore.

All releases support hardware monitoring through SHM. However, there's little reason to enable it these days unless a developer asks you to do so to get more information on the data the touchpad provides. I haven't asked anyone for the monitoring output for months, usually the data from the kernel is enough.

The following GUI tools use properties:

gnome-mouse-properties + gnome-settings-daemon

gsynaptics (merely a wrapper around synclient). Discontinued, see gpointing-device-settings.
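Outside the GUI tools, properties can be inspected and changed directly from a terminal with the xinput tool. A hedged example follows; the device and property names are illustrative, so use "xinput list" and "xinput list-props" to find the real ones on your system:

```
$> xinput list-props "SynPS/2 Synaptics TouchPad"
$> xinput set-prop "SynPS/2 Synaptics TouchPad" "Synaptics Tap Time" 180
```

Changes made this way take effect immediately but are lost on restart; the GUI tools above persist them for you.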

Wednesday, November 11, 2009

Many of you are probably familiar with evtest, a small debugging utility written by Vojtech Pavlik to check what input device data is leaving the kernel. I use it a lot and one of the standard requests I make in bug reports is to run evtest against the device file. The information it prints tells us the capabilities a device has and the events being sent when a certain action is performed.

I've used this information to add devices to my uinput test device collection to debug various server and/or driver failures. That worked for some cases but suffered from a major caveat: it was incredibly hard to reproduce issues that resulted from complex interactions. For months, I've been meaning to fix this and last weekend I finally had time to sit down and hack something up. That work is now in the evtest repository.

It works quite simply: the user runs evtest-capture against the device and performs the action that reproduces the bug. evtest-capture saves the event stream into an XML file, which can then be converted into a standalone uinput-based C program that resembles both the device and the interaction. I can re-create and run that program on my test box and reproduce and hopefully debug the issue.
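For illustration, the core of such a uinput program looks roughly like the hand-written sketch below. This is not actual evtest-capture output, just the 2009-era uinput API pattern; it needs write access to /dev/uinput (usually root) to run:

```c
/* Sketch: create a uinput device with one key and press/release it.
 * Hand-written illustration of the uinput API, not generated output.
 * Requires write access to /dev/uinput (usually root). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/uinput.h>

static void emit(int fd, int type, int code, int value)
{
    struct input_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.type = type;
    ev.code = code;
    ev.value = value;
    write(fd, &ev, sizeof(ev));
}

int main(void)
{
    struct uinput_user_dev dev;
    int fd = open("/dev/uinput", O_WRONLY | O_NONBLOCK);

    if (fd < 0) {
        perror("open /dev/uinput");
        return 1;
    }

    /* Declare the capabilities of the virtual device. */
    ioctl(fd, UI_SET_EVBIT, EV_KEY);
    ioctl(fd, UI_SET_KEYBIT, KEY_A);

    memset(&dev, 0, sizeof(dev));
    snprintf(dev.name, UINPUT_MAX_NAME_SIZE, "captured test device");
    dev.id.bustype = BUS_USB;
    write(fd, &dev, sizeof(dev));
    ioctl(fd, UI_DEV_CREATE);

    /* Replay the captured interaction: press and release one key. */
    emit(fd, EV_KEY, KEY_A, 1);
    emit(fd, EV_SYN, SYN_REPORT, 0);
    emit(fd, EV_KEY, KEY_A, 0);
    emit(fd, EV_SYN, SYN_REPORT, 0);

    ioctl(fd, UI_DEV_DESTROY);
    close(fd);
    return 0;
}
```

A generated program additionally replays the exact device capabilities and event timing from the XML capture, which is what makes the bug reproducible elsewhere.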

Wednesday, October 28, 2009

Thanks to Alan Coopersmith's efforts, X11R7.5 was released a few days ago. Except - what does that mean?

This post intends to shed some light on the components of the X11R7.5 release and where the version number comes from.

X Window System

If you're running a desktop system other than Windows or OS X, you're most likely running some instance of the X Window System, also referred to as "X11" or simply "X". X consists of several components that all make up the "X Window System", yet some of them are more visible than others.

X Protocol

The core component of X is the X Protocol. This is what defines X; it is essentially the API. The X Protocol consists of the core protocol, dating back to the 1980s, and a number of protocol extensions, essentially additions to the core protocol. If you hear terms like X Input, XRandR, RENDER, etc., all of these are protocol extensions.

X Server and the drivers

The X Server is the process that talks to the hardware drivers and listens to requests from applications to draw things. It also handles input events and passes them on to the right application. Depending on your hardware, you have a number of drivers. These days many setups have evdev and synaptics for input, and intel, ATI or nvidia for graphics.

The X Server supports the core protocol and most protocol extensions, but different X servers may support different versions. Generally, the most recent X Server supports the latest version of the protocol.

Xlib and friends

Xlib (or libX11) is the library that allows applications to talk X Protocol to the server. It wraps the low-level protocol into a slightly higher-level API. These days, most applications that display a GUI use Xlib at some point - though Xlib is usually abstracted away by a saner toolkit such as GTK or Qt.

Xlib has been the single library to talk the X Protocol for ages, but in recent years XCB is gaining some traction (and in fact recent versions of Xlib use XCB at the lowest level).

X applications

A number of applications are traditionally part of the X Window System. One of the well known ones is xeyes, but other, crucial tools such as setxkbmap and xkbcomp are part of these applications as well.

Misc other stuff

There are a number of other packages that include fonts, misc utils, data packages etc. I'll skip the details, it's just important to know they're there.

X11R7.what?

Back a few years ago, all the above components were part of one repository. To build one of the components, you also had to build the others. To release one, you'd have to release the whole lot. Over time, the version numbers crept up to 6.9 for this so-called monolithic tree.

X11R6.9 (X11 Release 6.9) was the last monolithic release. Around 2005, the monolithic tree was split up into separate repositories for each component. This also reset the version numbers for most of the components - those that inherited the 6.9 version numbers (or even 7.0) were reset to 1.0.

Since then, the X11R7.x releases (referred to as "katamari") are quite like distributions. They cherry-pick a bunch of module versions known to work together and combine them into one set. The modules themselves move mostly independently of the katamaris and thus their version numbers may skip between katamaris. For example, X11R7.4 had X Server 1.5, X11R7.5 has X Server 1.7.

This is where much confusion comes from. Many users don't know whether they're running 1.7, 7.5, 1.0 or 6.8. The intent of a katamari is simply to provide a set of modules that are sufficient to get a basic GUI running. That's why over time modules get added or removed from the katamari as well. A module that was part of X11R7.5 may not be part of X11R7.6 and of course the other way round (a full list of which versions are included is at the top of the X11R7.5 Changelog).

Which version actually matters?

Katamaris matter mostly for distributors. They represent a set of versions known to work together and make for easy picking. A distribution is free to start out with a katamari and then update to newer modules as they are released. The katamari is merely a starting point, not more.

For this reason, it rarely matters to an individual user whether a module they're running is part of a katamari. For bug reporting, developers need to know the versions of the individual modules affected so they can narrow down which bug may be triggered.

To get the versions of the X Server and the drivers, look at /var/log/Xorg.0.log. The first line states the version of the X server. Drivers are loaded dynamically, so you need to search for them in the log; each driver prints a LoadModule line and its module version when it is loaded.

Friday, October 2, 2009

It finally happened! After nearly 4 years of development, MPX has been released as part of XI2 in the new X Server 1.7.

The whole thing started when I began my PhD in late 2004. The problem I found was that there was no support for collaboration on a single shared display. All the solutions at the time were hacks at the toolkit or application level. I found that the only way to get truly collaborative interfaces is by adding support into the windowing system itself. So I started hacking on X in late 2005. I went from scratching my head and wondering how some of the stuff could compile (I had never heard of K&R function declarations) to rewriting large parts of the input subsystem and even ended up as release manager. Not in a single day though.

Now we're done. MPX is out, and we have generic low-level support for multiple input devices. You know the whole one keyboard-one mouse paradigm we've had since Doug Engelbart invented the mouse? It's over, you don't have to restrict yourself anymore when writing an app.

Of course, this is a low-level change and when you wake up tomorrow, not a lot will have actually changed. We still need the toolkits to support it, we need apps to pick it up, we need the desktop environments to start thinking about what can be made useful. Nonetheless, basic collaboration features are already there and it can only get better from here.

Thursday, August 20, 2009

I've had an interesting meeting with Jens Petersen yesterday about input methods. Jens is one of the i18n guys working for Red Hat.

Input methods are a way of merging several typed symbols into one actual symbol. Western languages rarely use them (the compose key isn't quite the same), but many eastern languages rely on them. To give one (made-up) example, an IM setup allows you to type "qqq" and converts it into the Chinese symbol for tree.

Unfortunately, IM implementations are somewhat broken and rely on a multitude of hacks. Right now, IM implementations often need to hook onto keycodes instead of keysyms. A keycode is a numerical value that is usually the same for a given key (except when it isn't). So "q" will always be the same keycode (except when it isn't). In X, a keycode has no meaning other than being an index into the keysym table.

Keysyms are the actual symbols that are to be displayed. So while the "q" key may have a keycode of 24, it will have the keysym for "q" in qwerty and the keysym for "a" in azerty.

And here's where everything goes wrong for IM. If you listen for keycodes, and you switch drivers, then keycode 24 isn't the same key anymore. If you listen for keysyms and you switch layout, keysym "q" isn't the same key anymore. Oops.

During a previous meeting and the one yesterday, we came up with a solution to fix them properly.

Let's take a step back and look at keyboard input. The user hits a physical key, usually because of what is printed on that key. That key generates a keycode, which represents a keysym. That keysym is usually the same symbol as what is printed on the keyboard. (Of course, there are exceptions, the prime example being a dvorak layout on a qwerty physical keyboard.) In the end, IM should aim to provide the same functionality, with the added step of combining multiple symbols into one.

For IM implementations, we can distinguish between two approaches. In the first approach, a set of keysyms should combine to a final symbol. For example, typing "tree" should result in a tree symbol. This case can be handled easily by the IM implementation only ever dealing with keysyms. Where the key is located doesn't matter and it works equally well with us(qwerty) and fr(dvorak). As a mental bridge: if the symbols come in via morse code and you can convert to the correct final symbol, then your IM is in this category. This approach is easy to deal with, so we can close the case on it.

In the second approach, a set of key presses should combine to a final symbol. For example, typing the top left key four times should result in a tree symbol. In this case, we can't hook onto keysyms because they may change with the layout. But we can't hook onto keycodes either because they are essentially random.

Wait. What? Why does the keysym change with the layout?

Because we have the wrong layout selected. If you're trying to type Chinese, you shouldn't have a us layout. If you're trying to type Japanese, you shouldn't have a french layout. Because these keysyms don't represent what the key is supposed to do. The keysyms are supposed to represent what is printed on the keyboard, and those symbols are Chinese, Japanese, Indic, etc. So the solution is to fix up the keysyms. Instead of trying to listen for a "q", the keyboard layout should generate a "tree" keysym. The IM implementation can then listen for this symbol and combine to the final symbol as required.

This essentially means that for each keyboard with intermediate symbols there should be an appropriate keyboard layout - just as there is for western languages. And once these keysyms are available, the second approach becomes identical to the first approach and it doesn't matter anymore where the physical key is located.
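As a hypothetical sketch of such a layout, in the symbols-file syntax used by xkeyboard-config (the layout name and the choice of keysym are invented to match the "tree" example above):

```
partial alphanumeric_keys
xkb_symbols "tree-example" {
    // The top-left letter key (<AD01>, "q" on a qwerty keyboard) now
    // produces the Unicode keysym U6728, the CJK character for "tree".
    key <AD01> { [ U6728 ] };
};
```

The IM implementation then listens for the U6728 keysym and combines it with others into the final symbol, regardless of which physical key produced it.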

The good thing about this approach is that users and developers can leverage existing tools for selecting and changing between different layouts. (Bonus points for using the word "leverage".) It also means that a more unified configuration between standard DE tools and IM tools is possible.

For the IM implementation, this simplifies things quite a bit. First of all, it can listen to the XKB group state to adjust automatically whether IM is needed or not. For example, if us(qwerty) and traditional Chinese are configured as layouts, the IM implementation can kick in whenever the group toggles to Chinese. As long as it is on us(qwerty), it can slumber in the background.

Second, no layout-specific hacks are required. The physical location of the key, the driver, they all don't matter anymore. Even morse-code is supported now ;)

Talking to Jens, his main concern is that XKB is limited to 4 groups at a time. This restriction is built into the protocol and won't disappear completely anytime soon. Though XI2 and XKB2 address this issue, it will take a while to get a meaningful adoption rate. Nonetheless, the approach above should make IM more robust and predictable for the large majority of users, without the issues that come up whenever hacks are involved.

I think this is the right approach, and so do Jens and Sergey Udaltsov, the xkeyboard-config maintainer. So now we just need to get this implemented, but it will take a while to sort out all the details and move all languages over.

Saturday, August 8, 2009

A few months back, in January or February, I decided to switch to zsh as my default shell and it has made my work a lot more effective. So I encourage you to try it; it has a number of features that are quite useful. Towards the bottom of this post is my own setup, feel free to use it.

Disclaimer: some or all of the features below are probably available in other shells. This is not a "$SHELL is so much better than $OTHERSHELL" posting, this is about how a particular setup has made my work more effective.

The main features I found useful, in no particular order:

history size of 5000 with duplicate removal means I type most commands now with Ctrl+R. Most of what I do is repetitive enough that if I have typed some weird command a few months back it will still be in the history.

merged histories. ever had 15 terminals open and then found out that the history of one is not available in the others, and on closing only the last one is added to the history? not a problem anymore.

git branch display - one of the scripts makes my prompt display the git branch if I'm in a git directory. Since I frequently work with 5+ branches, that's really handy. For example, my prompt looks like this:

:: whot@dingo:~/xorg/xserver (xi2-protocol-tests)>

indicating that the xserver repo is on branch xi2-protocol-tests. It also displays whether I have commits queued up or local changes, so I don't forget to commit something before pushing. Type disable-git-prompt to disable this again if your repo is _really_ big (e.g. the kernel), otherwise it takes forever to get the prompt to display.

"GUI" selection for tab-completion. hit Tab and below the line you get a list of all files and you can go through with them using Tab. Like this:

So anyway, have a look at my zsh files and use them as you will. Save them as $HOME/.zshrc and $HOME/.zsh/ to get started.

Thursday, July 23, 2009

This post is part of a mini-series of various recipes on how to deal with the new functionality in XI2. The examples here are merely snippets, full example programs to summarize each part are available here.

The ClientPointer principle

The ClientPointer (CP) principle is only partly interesting for normal applications since no XI2 client should ever need it. The exception is the window manager. About one quarter of XI2 protocol requests and replies are ambiguous in the presence of multiple master devices. The best example is XQueryPointer(3). If there are two or more master pointers, XQueryPointer has a <50% chance of returning the right data.

The ClientPointer is a dedicated master pointer assigned to each application, either implicitly or explicitly. This pointer is then used for any ambiguous requests the application may send. This adds predictability as the data returned is always from the same device. Given the above example, XQueryPointer requests from one client will always return the same pointer's coordinates. Thinking of xeyes this means that the eyes will follow the same cursor. For any requests or replies that require keyboard data, the master keyboard paired with the CP is used.

The CP is implicitly assigned whenever an application sends an ambiguous request. Then the server picks the first master pointer and assigns it to the client. This of course happens only when the client doesn't have an assigned CP yet.

Alternatively, the CP can be explicitly assigned. The XISetClientPointer(3) and XIGetClientPointer(3) calls are used to set and query the current ClientPointer for a client.

Status XISetClientPointer(Display *dpy, Window win, int deviceid);

Bool XIGetClientPointer(Display *dpy, Window win, int *deviceid);

Both calls take a window and a deviceid. If the window is a valid window, the client owning this window will have the CP set to the given device. The window parameter may also be just a pure client ID. Finally, the window parameter may be None, in which case the requesting client's CP is set to the given device. This is not useful beyond debugging; if a client understands enough XI2 to set the CP, it should be able to handle multiple devices properly.

Getting the CP takes the same parameters, but returns the deviceid. XIGetClientPointer returns True if the CP has been set for the target client, regardless of whether it was set implicitly or explicitly; if no CP is set yet, it returns False.
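A small sketch of the query side, assuming a running X server with XI2 support and the libXi development headers (error handling is trimmed; compile with -lX11 -lXi):

```c
/* Sketch: query this client's own ClientPointer.
 * Needs a running X server with XI2; compile with -lX11 -lXi. */
#include <stdio.h>
#include <X11/Xlib.h>
#include <X11/extensions/XInput2.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    int deviceid;

    if (!dpy) {
        fprintf(stderr, "cannot open display\n");
        return 1;
    }

    /* A window of None queries the requesting client's own CP. */
    if (XIGetClientPointer(dpy, None, &deviceid))
        printf("ClientPointer is device %d\n", deviceid);
    else
        printf("no ClientPointer assigned yet\n");

    XCloseDisplay(dpy);
    return 0;
}
```

Note that merely running this program triggers implicit CP assignment on servers that treat the query as ambiguous, so a second run will typically report a device.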

Event delivery, XI2 and grabs

The CP setting does not affect event delivery in any way. Regardless of which master pointer is the ClientPointer, any device can still interact with the client. This also means that the CP has no effect whatsoever on XI2 or XI1 requests since they are not ambiguous.

Grabs are a different matter. Since the activation of a grab is ambiguous in the core protocol (XGrabPointer - well, which pointer?) a grab will by default activate on the CP. This can be a bit iffy since an application that just grabs the pointer may not grab the one currently within the window boundaries. So the grabbing code has two exceptions. One, if a device is already grabbed by the client, a grab request will act on the already-grabbed device instead of the CP. Two, if a passive grab activates it will activate on the device triggering the grab, not on the CP.

In practice, this means that if a client has a passive grab on a button and any device presses this button, the passive grab activates on this device. If the client then requests an active grab (which toolkits such as GTK do), the active grab is set on the already-grabbed device. The result: in most cases the grab happens on the "correct" device for the current situation.

How to use the ClientPointer

As said above, the only application that should really need to know about the CP is a window manager that manages core applications as well as XI applications. The most straightforward way of managing an application is to set the CP whenever a pointer clicks into a client window. This ensures that if the application requests some ambiguous data, a pointer that is interacting with the application is used.

I have used this method in a custom window manager written for a user study several moons ago and it works well enough. Of course, you are free to contemplate situations where such a simple approach is not sufficient.

Wednesday, July 22, 2009

This post is part of a mini-series of various recipes on how to deal with the new functionality in XI2. The examples here are merely snippets, full example programs to summarize each part are available here.

What are grabs?

Grabs ensure event delivery to a particular client and to this client only. A common application of grabs are drop-down and popup menus. If such a menu is displayed, the pointer is grabbed to ensure that the next click is delivered to the client displaying the popup. The client can then either perform an action or undisplay the popup.

Two different types of grabs exist: active grabs ("grab here and now") and passive grabs ("grab whenever a button/key is pressed"). Both types of grabs may be synchronous or asynchronous. Asynchronous grabs essentially do just the above - deliver all future events to the grabbing client only (in relation to the grab window). Synchronous grabs stop event reporting after an event reported to the grabbing client and it is up to the client what happens next (see below). I recommend reading the XGrabPointer and XAllowEvents man pages for more detail, XI2 grabs work essentially in the same manner.

Active grabs

Devices may be actively grabbed with the XIGrabDevice() call. This call is quite similar to the core protocol's XGrabPointer, so I encourage you to read the matching man page for more detail than is provided here.

The device may be a master or slave device that is currently not grabbed, the window must be viewable, and time must be CurrentTime or later for the grab to succeed. If the device is a master pointer device, the cursor specified in cursor is displayed for the duration of the grab. Otherwise, the cursor argument is ignored.

Grab mode is either synchronous or asynchronous and applies to the device. If the device is a master device, the paired_device_mode applies to the paired master device. Otherwise, this argument is ignored.

The mask itself is a simple event mask, specifying the events to be delivered to the grabbing client. Depending on owner_events, events are reported only if selected by the mask (if owner_events is false) or are reported if selected by the client on the window (if owner_events is true).

The matching call to release an active grab is XIUngrabDevice(). If a synchronous grab is issued on a device, the client must use XIAllowEvents() to control further event delivery - just as with synchronous core grabs.

Noteworthy about XIGrabDevice is that the deviceid must specify a valid device. The fake deviceids of XIAllDevices or XIAllMasterDevices will result in a BadDevice error.
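A sketch of an asynchronous active grab on a pointer device may look like this (assuming an open display and a valid device id; the event selection is just an example):

```c
#include <X11/extensions/XInput2.h>

/* Sketch: asynchronously grab 'deviceid' on 'win', selecting for
 * button and motion events. Release later with XIUngrabDevice(). */
Status grab_pointer_device(Display *dpy, Window win, int deviceid)
{
    XIEventMask mask;
    unsigned char bits[(XI_LASTEVENT + 7) / 8] = {0};

    mask.deviceid = deviceid;
    mask.mask_len = sizeof(bits);
    mask.mask = bits;
    XISetMask(bits, XI_ButtonPress);
    XISetMask(bits, XI_ButtonRelease);
    XISetMask(bits, XI_Motion);

    return XIGrabDevice(dpy, deviceid, win, CurrentTime,
                        None /* cursor */,
                        GrabModeAsync /* grab_mode */,
                        GrabModeAsync /* paired_device_mode */,
                        False /* owner_events */, &mask);
}
```

This requires a running X server; it illustrates the parameter order rather than being a standalone program.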

Passive grabs on buttons and keys

Devices may be passively grabbed on button presses and key presses, similar to the core protocol passive grabs.

Let's have a look at button presses. Passive button grabs ensure that the next time the specified button is pressed, a grab is activated on the device and the client receives the events until the button is released again. This happens automatically, provided the matching modifiers are down.

The API to set a button grab is similar to the one from the core protocol, with one notable exception: button grabs can be registered for multiple modifier combinations in one go.

The first couple of parameters are identical to the active grab calls. The num_modifiers parameter specifies the length of the modifiers_inout array. The modifiers_inout array itself specifies the modifier states under which this grab should activate. So instead of one request per button-modifier combination, a client can submit all modifier combinations in one go. If any of these combinations fails, the server returns it in the modifiers_inout array.

typedef struct {
    int modifiers;
    int status;
} XIGrabModifiers;

The client sets modifiers to the modifiers that should activate the grab, status is ignored by the server. The server returns the number of modifier combinations that could not be set and their status value. The modifiers themselves are always the effective modifiers. No difference between latched, locked, etc. is made.

Both button and keycode grabs can be removed with XIUngrabButton and XIUngrabKeycode.

The fake deviceids of XIAllDevices and XIAllMasterDevices are permitted for passive button and key grabs.
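A sketch of a passive button grab with two modifier combinations submitted at once (the particular button, mask and modifier choices are just examples):

```c
#include <X11/extensions/XInput2.h>

/* Sketch: passively grab button 1 on 'win' for two modifier
 * combinations in a single request. Returns the number of
 * combinations that could NOT be grabbed; their status is filled
 * into mods[i].status by the server. */
int grab_button_one(Display *dpy, Window win, int deviceid)
{
    XIGrabModifiers mods[2];
    XIEventMask mask;
    unsigned char bits[(XI_LASTEVENT + 7) / 8] = {0};

    mask.deviceid = deviceid;
    mask.mask_len = sizeof(bits);
    mask.mask = bits;
    XISetMask(bits, XI_ButtonPress);
    XISetMask(bits, XI_ButtonRelease);

    mods[0].modifiers = ShiftMask;            /* grab with Shift down */
    mods[1].modifiers = ShiftMask | Mod1Mask; /* and with Shift+Alt */

    return XIGrabButton(dpy, deviceid, 1 /* button */, win,
                        None /* cursor */,
                        GrabModeAsync, GrabModeAsync,
                        False /* owner_events */, &mask,
                        2, mods);
}
```

Remove the grab again with XIUngrabButton, passing the same modifier combinations.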

Passive grabs on enter/focus events

Grabbing on enter and focus events is a new feature of XI2. Similar in principle to button presses except that the device is grabbed when the pointer or the keyboard focus is set to the specified window.

The parameters are the same for both calls (XIGrabFocusIn doesn't take a cursor parameter) and they match the button/keycode grab calls anyway. When an enter or focus grab activates, the grabbing client receives an additional XI_Enter or XI_FocusIn event with detail XINotifyPassiveGrab (provided it is set on the window or in the grab mask). To avoid inconsistencies with the core protocol, this event is sent after the enter event is sent to the underlying window.

Provided that nmodifiers is 0 after the call to XIGrabEnter, each time the pointer moves into the window win the device will be grabbed (provided no modifiers are logically down). Once the pointer leaves the window again, the device is automatically ungrabbed.

Passive enter/focus grabs can be removed with the XIUngrabEnter and XIUngrabFocusIn calls.

The fake deviceids of XIAllDevices and XIAllMasterDevices are permitted for passive enter/focus grabs as well.
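A sketch of an enter grab that activates regardless of modifier state (the event selection is just an example):

```c
#include <X11/extensions/XInput2.h>

/* Sketch: grab 'deviceid' whenever its pointer enters 'win',
 * for any modifier combination. Returns the number of failed
 * modifier combinations (0 on full success). */
int grab_on_enter(Display *dpy, Window win, int deviceid)
{
    XIGrabModifiers mods = { XIAnyModifier, 0 };
    XIEventMask mask;
    unsigned char bits[(XI_LASTEVENT + 7) / 8] = {0};

    mask.deviceid = deviceid;
    mask.mask_len = sizeof(bits);
    mask.mask = bits;
    XISetMask(bits, XI_Enter);
    XISetMask(bits, XI_Leave);
    XISetMask(bits, XI_Motion);

    return XIGrabEnter(dpy, deviceid, win, None /* cursor */,
                       GrabModeAsync, GrabModeAsync,
                       False /* owner_events */, &mask,
                       1, &mods);
}
```

Once the pointer leaves the window, the device is ungrabbed automatically; XIUngrabEnter removes the passive grab itself.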

Wednesday, July 15, 2009

This post is part of a mini-series of various recipes on how to deal with the new functionality in XI2. The examples here are merely snippets, full example programs to summarize each part are available here.

All XI2 events are cookie events and must be retrieved with XGetEventData.

XIDeviceEvent

The XIDeviceEvent is the default event for button and key press and releases and motion events. The event types that produce an XIDeviceEvent are XI_Motion, XI_ButtonPress, XI_ButtonRelease, XI_KeyPress and XI_KeyRelease. Note that proximity events do not exist in XI2. A device that supports proximity should instead provide another axis and send valuator events on this axis.

The event itself is (reasonably) close to the core events, so most fields may be familiar in one way or another. It is a GenericEvent, so the type is always GenericEvent (35), extension is the X Input extension's opcode, and evtype is the actual type of the event: XI_KeyPress, XI_KeyRelease, XI_ButtonPress, XI_ButtonRelease, or XI_Motion.

Each event provides the device the event came from and the actual source the event originated from. So for applications that listen to master device events, deviceid is the id of the master device, and sourceid is the id of the physical device that just got moved/clicked/typed.

For button events, detail is the button number (after mapping applies of course). For key events, detail is the keycode. XI2 supports 32-bit keycodes, btw. For motion events, detail is 0. The flags field is a combination of various flags that apply for this event. Right now, the only defined flag is XIKeyRepeat for XI_KeyPress events. If this flag is set, the event is the result of an in-server key repeat instead of a physical key press (waiting for daniels to send me the patch for the server).

Each event includes root-absolute and window-relative coordinates with subpixel precision. For example, if you have your mouse slowed down by constant deceleration, you'll see the pointer's X coordinate move from 100.0 to 100.25, 100.5, 100.75, 101, etc. The same happens with devices that have their own coordinate range (except that this bit is missing in the server right now).

XIDeviceEvents contain the button, modifier and valuator state in four different structs:

The buttons include the button state for each button on this device. Since we don't have any restrictions on the number of buttons in the protocol, the mask looks like this:

typedef struct {
    int mask_len;
    unsigned char *mask;
} XIButtonState;

The mask_len specifies the length of the actual mask in bytes. The bit for button N is defined as (1 << N); if it is set, the button is currently logically down. Quite similar is the valuator state, except that the mask specifies which valuators are provided in the values array.

Again, mask_len is in bytes and for each bit set in mask, one double represents the current value of this valuator in this event. These coordinates are always in the device-specific coordinate system (screen coordinates for relative devices). To give you an example, if mask_len is 1 and bits 0 and 5 are set in mask, then values is an array of size 2, with the values for axis 0 and axis 5.

Finally, the event contains the state of the modifier keys and the current XKB group info.

The base modifiers are the ones currently pressed, latched modifiers are pressed until a key is pressed that's configured to unlatch them (e.g. some shift-capslock interactions have this behaviour), and locked modifiers are permanently active until unlocked (default capslock behaviour in the US layout). The effective modifiers are the bitwise OR of the three above - which is essentially equivalent to the modifier state supplied in the core protocol events.

The group state is a bit more complicated, since the effective group is the arithmetic sum of all three after the group overflow handling is taken into account. The meaning of base, latched and locked is essentially the same otherwise.

Enter/leave and focus events

One of the biggest deficiencies of XI1 is the lack of enter/leave events for extended devices. XI2 provides both and they are essentially the same as the device events above with three extra fields: mode, focus and same_screen. Both enter/leave events and focus events are the same as core events squashed in an XI2 format. I recommend reading the core protocol spec for these events, it's much more verbose (and eloquent) than this blog.

Unsurprisingly, enter/leave events are generated separately for each device. While the core protocol has a quite funky model to ensure that applications aren't confused when multiple pointers exit or leave a window, the XI2 events are sent for each pointer, regardless of how many devices are currently in the window.

Property events

Property events have stayed the same, except that they use the XGenericEvent (and cookie) format now. Property events contain the property that changed, the deviceid and a field detailing what actually changed on this property (one of XIPropertyDeleted, XIPropertyCreated, XIPropertyModified).

Raw events

Raw events are something new. Normal input events are heavily processed by the server (clipped, accelerated, mapped to absolute, etc.). Raw events are essentially a container to forward the data the server works with (e.g. the data passed up by the driver) and thus do not contain state other than the new information. The three interesting fields are detail, valuators and raw_values.

The detail field for button events is the unmapped button number; for key events, it is the key code. Possible evtypes for raw events are XI_RawMotion, XI_RawKeyPress, XI_RawKeyRelease, XI_RawButtonPress and XI_RawButtonRelease.

Valuator information works in the same manner as in XIDeviceEvents and contains the transformed (i.e. accelerated) valuators as used in the server. The raw_values array provides the untransformed values as they were passed up from the driver. This is useful for applications that need to provide their own acceleration code (e.g. games). For example, the following bit shows the acceleration applied on each axis:
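The original snippet is not preserved here, so the following is a reconstructed sketch of what it likely showed: walking the valuator mask of an XIRawEvent and printing the accelerated value next to the raw one.

```c
#include <stdio.h>
#include <X11/extensions/XInput2.h>

/* Sketch (reconstructed): for each axis present in a raw event,
 * print the accelerated value alongside the raw driver value. */
void print_rawmotion(XIRawEvent *event)
{
    double *val = event->valuators.values;
    double *raw_val = event->raw_values;
    int i;

    for (i = 0; i < event->valuators.mask_len * 8; i++) {
        if (XIMaskIsSet(event->valuators.mask, i)) {
            printf("axis %d: accelerated %f, raw %f\n",
                   i, *val, *raw_val);
            val++;
            raw_val++;
        }
    }
}
```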

Since raw events do not have target windows they are delivered exclusively to all root windows. Thus, a client that registers for raw events on a standard client window will receive a BadValue from XISelectEvents(). Like normal events however, if a client has a grab on the device, then the event is delivered only to the grabbing client.

The data pointer is always the extension-specific event, depending on evtype. More on that in the next XI2 recipes post. Failure to call XFreeEventData() will result in a memory leak.

XIFreeEventData is gone

Previous examples featured XIFreeEventData; this call is now replaced by XFreeEventData in Xlib. Both do the same thing, so you can just run sed over your source files.

Fewer pointers in events

The removal of the size restrictions means a few pointers in events have changed into structs - nothing a search/replace can't fix. One example for this is the button state which used to be a pointer - now it's part of the XIDeviceEvent struct.

No XIEvent padding

This isn't related to cookies but it's quite important: XIEvents may not be used for passing into XNextEvent. The previous struct included padding to force the same size as an XEvent. Except not on 64 bit, so it was broken anyway. The padding is removed, so you must not pass XIEvents into XNextEvent() and similar functions (unless you are a big fan of scrambled memory).

Both are designed to deal with events delivered by servers supporting the X Generic Event extension (XGE). These events are longer than 32 bytes on the wire and can become rather large when they are unpacked into Xlib's event structures. Since Xlib has an internal maximum size for XEvents we cannot easily deal with XGE events without introducing memory leaks.

XGE event cookies

A new datatype has been introduced - the XGenericEventCookie. This datatype is essentially a wrapper to fit into Xlib but provide access to the actual event data.

It's simple enough and overlaps with XGenericEvents and of course XEvents:

The two interesting fields are the cookie and the data pointer. The cookie is simply a unique number assigned to each event as it is received. It serves to identify the event when data needs to be retrieved from the Xlib internal event storage. The data pointer is a pointer to the actual event data - its type is of whatever type the extension has specified for this event type (e.g. XIDeviceEvent for XI2 motion events).

Fetching cookie data

Retrieving an event through XNextEvent or similar retrieves a cookie event instead - with a data pointer NULL. The extra data can then be received by passing the cookie event into XGetEventData. XGetEventData returns True if the cookie has been fetched successfully or False for invalid cookies (including already claimed cookies) or events that aren't cookie events.

Once data has been obtained by the client, it becomes the client's responsibility to free this data with XFreeEventData. Failure to do so will leak memory. Unclaimed cookies are freed automatically by the library, so if you never call XGetEventData, memory doesn't leak.
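The resulting event loop may look like this sketch (xi_opcode is assumed to come from an earlier XQueryExtension("XInputExtension", ...) call, and dpy is an open display):

```c
/* Sketch of the cookie handling in an event loop. */
XEvent ev;
XGenericEventCookie *cookie = &ev.xcookie;

XNextEvent(dpy, &ev);
if (XGetEventData(dpy, cookie) &&
    cookie->type == GenericEvent &&
    cookie->extension == xi_opcode) {
    XIDeviceEvent *device_data = cookie->data;
    /* ... process device_data->evtype here ... */
}
XFreeEventData(dpy, cookie); /* safe even if XGetEventData failed */
```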

XGetEventData and XFreeEventData are safe to be called with non-cookie events.

One cookie - one claim

The important thing about the cookies:

Each cookie can only ever be claimed once for each event.

XGetEventData and XFreeEventData are symmetrical

Each cookie returned by the library can be claimed exactly once, even if it represents the same actual event. In the following snippet, both XGetEventData calls return the cookie data, even though they are the same event.
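Since the original snippet is not preserved here, the following is a reconstructed sketch of the idea: XPeekEvent and XNextEvent each hand the client its own copy of the event, so each copy carries its own cookie and each cookie can be claimed once.

```c
/* Sketch (reconstructed): two copies of the same event,
 * two cookies, two successful claims. */
XEvent ev;

XPeekEvent(dpy, &ev);                 /* first copy of the event */
if (XGetEventData(dpy, &ev.xcookie)) {
    /* succeeds: first claim of this copy's cookie */
}
XFreeEventData(dpy, &ev.xcookie);

XNextEvent(dpy, &ev);                 /* second copy of the same event */
if (XGetEventData(dpy, &ev.xcookie)) {
    /* succeeds too: a separate copy, hence a separate cookie */
}
XFreeEventData(dpy, &ev.xcookie);
```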

Saturday, July 4, 2009

I still see bug reports that blame HAL for various things including "my mouse buttons don't work", "the pointer jumps", and various other issues. In none of these cases is HAL at fault. From the X server's point of view, HAL is merely a replacement for the xorg.conf.

The simple tasks HAL does for us in the X server is:

List all input devices (the equivalent to the InputDevice sections in the xorg.conf).

Nominate the appropriate driver for each input device (the equivalent to the Driver "..." line in each InputDevice section).

Provide user-configured extra options such as the keyboard layout (the equivalent to the Option "Foo" "bar" lines).

Note that 2 and 3 are a result of your local configuration files and not some random guesses.

So whenever it's unclear if a problem is in fact caused by HAL, ask yourself: if you had an xorg.conf, could this problem be fixed by editing it? If not, then you need to report the bug against the input driver or the X server. Here's Fedora's rough checklist for reporting input bugs.

That HAL is being replaced by libudev is a completely separate issue.

Monday, June 29, 2009

Last week I started writing the next part of the XI2 recipes series, this time detailing button/key/motion events etc. Except that while fixing various inconsistencies I ran into a wall put up by Xlib years ago. In short, Xlib only allows for event sizes of up to 96 bytes (on 32 bit). Previously I've gotten around that by storing pointers to extra memory that the client must free (see XIFreeEventData(3)). However, this not only introduces a potentially huge memory leak in "naive" clients but also makes the actual event structure quite annoying. Having to dereference into three structs to get the button mask is just not particularly great.

So now I have to rethink how to deal with events on the client side (Xlib only, xcb is not affected, server isn't affected either).

Either way, this means two things:

the next XI2 tutorial will take another few days

there will likely be a fairly big API change in how XI2 clients can deal with events.

Tuesday, June 16, 2009

This post is part of a mini-series of various recipes on how to deal with the new functionality in XI2. The examples here are merely snippets, full example programs to summarize each part are available here.

In Part 1 I covered how to initialise and select for events and in Part 2 how to query for devices and monitor the hierarchy. In this part, I will cover how to get more information about the devices.

Device capabilities for master and slave devices

Once upon a time, mice were simply mice and keyboards were simply keyboards. Nowadays, some mice have 3 buttons, others have 8, some have two scrollwheels, etc. Tablets, touchscreens and touchpads behave like a mouse, but have their own coordinate systems, their own set of buttons etc.

Why is this information important anyway? The valuator (== axis) information is sent in device-dependent coordinates. So for example, a wacom tablet may send coordinates from 0 to 50000 instead of the screen resolution. Likewise, some devices may have a particular button that other devices don't have. Since button numbers are sequential, button 8 on one device may have a completely different functionality than button 8 on another device.

In Part 2, I detailed the master/slave device hierarchy and explained that each event from a slave device is routed through the attached master device. In addition to that, whenever a slave device sends an event, the master device changes capabilities to reflect the current slave device. Assuming that two slave devices (one mouse, one graphics tablet) hang off one master device (i.e. cursor), the master device will thus look like either the mouse or the tablet - depending on who sent the last event. This switch happens before the actual event.

The order of things is (if we assume the mouse was the last used device):

Move the mouse

Mouse events are sent to the client

Move the tablet

The master changes capabilities to look like the tablet.

A DeviceChanged event is sent to notify clients about this change.

The move event and all following events originating from the tablet are sent to clients.

Move the mouse

The master changes capabilities to look like the mouse.

A DeviceChanged event is sent to notify clients about this change.

The move event and all following events originating from the mouse are sent to clients.

rinse, wash, repeat

Thanks to the DeviceChanged events, the master device will always look to the client like the physical device that is currently sending the events. And since clients should listen to master devices, this means that clients can actually adjust their UI depending on what physical device is in use. For example, a drawing program such as the GIMP could switch automatically between mouse mode and tablet mode. The drawback of this approach though is that depending on when you call XIQueryDevice the information may be different. You can mitigate this by monitoring XIDeviceChangedEvents (see below).

Key and valuator information reflects the currently used slave device. Button information - not quite as simple. A master device always has as many buttons as the slave device with the highest number of buttons. This allows combining multiple devices for button presses, e.g. button 1 on one device and button 3 on another device is a combined 1+3 button state on the master device. The master device will change dynamically to reflect the right number of buttons whenever a slave device is attached or detached. More about that below.

Querying extra device information

So, assuming you just called XIQueryDevice, you got back a set of devices and XIDeviceInfo structs. The device capabilities are in the device classes, much in the same manner as in XI 1.

Right now, three classes are defined: button class, key class, and valuator class. A device may have zero or one button class, zero or one key class and zero or more valuator classes. A typical mouse will have a button class and two valuator classes (one for the x axis and one for the y axis).

All classes share the same two fields, type and sourceid. The sourceid denotes the device this particular class came from. For slave devices, this is the device id. For master devices, this is the device id of the slave that has sent the event through the server. The current X server implementation always copies all classes from the slave device into the master device, so all sourceids of one device are the same. In the future, we may add partial class copying and you are not guaranteed that the button class is from the same physical device as the valuator class, etc.

The type field is always XIButtonClass, and num_buttons tells us how many buttons the device actually has. This is complemented by the array of button labels. Each atom defines a button label (note that this label might be None, an Atom of zero, if the device cannot label its buttons). So for a standard mouse you should get num_buttons of at least 3 and the labels are (usually) the atoms for "Button Left", "Button Middle", "Button Right". Finally, the state field has a bit set for each button currently down on this device.

You can now write user interfaces that interpret the buttons as what they are. Previously, you had to apply magic to guess which button would do what, now the devices actually tell you what each button does (provided the kernel drivers do). The order of the buttons array is always the physical order, regardless of any button mapping.

It is a bit trickier on the master device. Since the master device buttons are the union of all attached slave devices, the number of buttons is always the highest number of buttons on any slave device. For example, if three devices have 3, 9 and 20 buttons respectively, the master device will have 20 buttons. If you press button 20 on device 3 and button 3 on device 1, the combined state is 20+3 on the master device. There is one downside to this union: if the devices have different button labels, the labels may change between press and release. For example, pressing button 1 on device 1 ("Button Left") will result in the DeviceChangedEvent with "Button Left" as label for 1. If you then press button 1 on device 2 ("Button 0"), no event is sent from the MD (because the button is already down). Now release device 1 (no release event, the button is still down), then release device 2. At the time of the release event, the label for button 1 was "Button 0", even though it was "Button Left" at the time of the press.

The key class info is essentially the same as the button class info, except that it specifies numerical keycodes instead of button labels. This is somewhat different to the core approach, which gives you a min_keycode and a max_keycode (which incidentally must always be the same anyway). In XI2, a keyboard with three keys will only give you the three keycodes it actually sends.

The number field is the physical number of the axis (you're not guaranteed to get them in order from the server). The label is - like button labels - an Atom specifying the axis type (e.g. "Rel X"). Min, max and resolution give you the minimum and maximum axis value and the resolution in units/m; mode is either absolute or relative, depending on the device. Value is the current value of the valuator - which of course only makes sense if the valuator is an absolute axis.

That's it. With this information, you can now get all necessary info about a device.
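Putting it together, walking the classes of one device may look like this sketch (the printed fields are just examples):

```c
#include <stdio.h>
#include <X11/extensions/XInput2.h>

/* Sketch: query one device and print a summary of its classes. */
void print_device_classes(Display *dpy, int deviceid)
{
    int ndevices, i;
    XIDeviceInfo *info = XIQueryDevice(dpy, deviceid, &ndevices);

    for (i = 0; i < info->num_classes; i++) {
        XIAnyClassInfo *any = info->classes[i];
        switch (any->type) {
        case XIButtonClass: {
            XIButtonClassInfo *b = (XIButtonClassInfo*)any;
            printf("%d buttons\n", b->num_buttons);
            break;
        }
        case XIKeyClass: {
            XIKeyClassInfo *k = (XIKeyClassInfo*)any;
            printf("%d keycodes\n", k->num_keycodes);
            break;
        }
        case XIValuatorClass: {
            XIValuatorClassInfo *v = (XIValuatorClassInfo*)any;
            printf("axis %d: range %f..%f, mode %d\n",
                   v->number, v->min, v->max, v->mode);
            break;
        }
        }
    }
    XIFreeDeviceInfo(info);
}
```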

Monitoring device changes

As mentioned above, the XIDeviceChangedEvent notifies you when a device has changed. Its information is mostly the same as what XIQueryDevice provides, with two extra fields: reason and sourceid. The sourceid simply specifies the device that triggered this event. In some cases, this is the slave device; in some cases sourceid and deviceid are the same. The reason field specifies why this event was sent. Right now, there are two reasons: XISlaveSwitch and XIDeviceChange. The former is sent whenever you change between physical devices attached to the same master. The latter is sent whenever a device changes capabilities of its own accord. One example for this is the master device changing the number of available buttons when a slave device is attached or detached.

Monday, June 15, 2009

About two years ago, I dabbled a bit with multitouch. The low-level parts were a hack, a custom kernel driver that spat out a special protocol, the X driver custom-written for this kernel driver and then a new type of events in the X server. The new events were the really interesting bits, but I had to ditch it because of lack of time.

Now, thanks mainly to Henrik Rydberg, we have a multitouch protocol in 2.6.30. The protocol is based on the event protocol that the evdev driver uses and adds the capabilities to track multiple touch points over time.

And now Stephane Chatty from the Interactive Computing Lab at ENAC sent me this link, showing off a few demos that use this new multi-touch API.

The good thing about this? A normal mouse or keyboard event usually goes hardware -> kernel -> X driver -> X server -> toolkit -> application. For multitouch, we now have the low-level bits in place. The demos above are simply missing the X bits in this equation. Bryan Goldstein has taken on the task to resurrect the blob branch mentioned before, so with a bit of luck we can beat that into shape, rebase it to current master and work on completing it. At which point multitouch is fully integrated with the desktop rather than relying on custom hardware-specific hacks at every point in the stack.

Note: don't hold your breath for that to arrive soon, at the moment I'm looking at an 8 months+ timeframe until a release with these features.

Tuesday, June 9, 2009

This post is part of a mini-series of various recipes on how to deal with the new functionality in XI2. The examples here are merely snippets, full example programs to summarize each part are available here.

A word of warning: XI2 is still in flux and the code documented here may change before the 2.0 release.

In Part 1 I covered how to initialise and select for events. In this part, I will cover how to query and modify the device hierarchy.

What is the device hierarchy?

The device hierarchy is the tree of master and slave devices. A master pointer is represented by a visible cursor, a master keyboard is represented by a keyboard focus. Slave pointers and keyboards are (usually) physical devices attached to one master device.

The distinction may sound odd at first but we've been using it for years. A computer has two sets of interfaces: physical interfaces are what we humans employ to interact with the computer (e.g. mouse, keyboard, touchpad). Virtual interfaces are what applications actually see. Think about it: if you have a laptop with two physical devices (a mouse and a touchpad), you're still only controlling one virtual device (the cursor). So although you have two very different physical interfaces, the application isn't aware of it at all.

This works mostly fine as long as you have only one virtual interface per type but it gets confusing really quickly if you have multiple users on the same screen at the same time. Hence the explicit device hierarchy in XI2.

We call virtual devices master devices, and physical devices slave devices. Note that there are exceptions where a slave device is an emulation of a physical device.

A device may be of one of five device types:

Master pointers are devices that represent a cursor on the screen. One master pointer is always available (the "Virtual core pointer"). Master pointers usually send core events, meaning they appear like a normal pointer device to non-XI applications.

Master keyboards are devices that represent a keyboard focus. One master keyboard is always available (the "Virtual core keyboard"). Master keyboards usually send core events, meaning they appear like a normal keyboard device to non-XI applications.

Slave pointers are pointer devices that are attached to a master pointer. Slave pointers never send core events, they are invisible to non-XI applications and can only interact with a core application if they are attached to a master device (in which case it's actually the master device that interacts)

Slave keyboards are keyboard devices that are attached to a master keyboard. Slave keyboards never send core events, they are invisible to non-XI applications and can only interact with a core application if they are attached to a master device (in which case it's actually the master device that interacts)

Floating slaves are slave devices that are currently not attached to a master device. They can only be used by XI or XI2 applications and do not have a visible cursor or keyboard focus.

So what does attachment mean? A master device cannot generate events by itself. If a slave device is attached to a master device, then each event that the slave device generates is also passed through the master device. This is how the X server has worked since 1.4: if you click a mouse button, the server sends a click event from the mouse and from the "virtual core pointer".

A floating device on the other hand does not send events through the master device. They don't control a visible cursor or keyboard focus and any application listening to a floating slave device needs to control focus and cursor manually. One example where floating slaves are useful is the use of graphics tablets in the GIMP (where the area of the tablet is mapped to the canvas).

For most applications, you will only ever care about master devices.

Querying the device hierarchy

At some point, clients may need to know which devices are actually present in the system right now.

As with event selection, XIAllDevices and XIAllMasterDevices are valid as the device ID parameter. Alternatively, just supply the device ID of the device you're interested in.

The attachment simply states which device this device is attached to. For master pointers, this is always the paired master keyboard and the other way round. For floating slaves, this value is undefined.

Now we know the layout of the hierarchy, including how many master devices and physical devices are actually present.

XIQueryDevice returns more information, such as the capabilities of each device. I'll leave this for another post (mainly because I just found a deficiency in the XI2 protocol that needs to be fixed first :-)).

Hierarchy events

Now, knowing the hierarchy is only useful for a short time as another client may change it at any point in time. So your client should listen for hierarchy events. These events are sent to all windows, so it doesn't really matter where you register. The traditional approach is to register on the root window.

The first couple of fields are standard. The flags field lists all changes that occurred, a combination of:

XIMasterAdded

A new master device has been created.

XIMasterRemoved

A master device has been deleted.

XISlaveAdded

A new slave device has been added (e.g. plugged in).

XISlaveRemoved

A slave device has been removed (e.g. unplugged).

XISlaveAttached

A slave device has been attached to a master device.

XISlaveDetached

A slave device has been detached (set floating).

XIDeviceEnabled

A device has been enabled (i.e. it may send events now).

XIDeviceDisabled

A device has been disabled (i.e. it will not send events until enabled again).

The info field contains the details for all devices currently present and those removed with this change. Each info->flags states what happened to this particular device. For example, one could search for the slave device that just got removed by searching for the XISlaveRemoved flag.

Creating a new pair of master devices is done with XIChangeHierarchy and an XIAddMasterInfo change. After this change is performed, there will be two new master devices: "My new master pointer" and "My new master keyboard" (remember, master devices always come in pairs). They will send core events (i.e. they are usable in non-XI applications) and they will be enabled immediately. If you registered for hierarchy events, you will get an event with the XIMasterAdded and XIDeviceEnabled flags set.

Removing a master device is done with an XIRemoveMasterInfo change. Removing device 10 this way (provided it is a master device) also removes its paired master device. Any slave devices currently attached to device 10 or its paired master device will be reattached to device 2 (for pointers) or device 3 (for keyboards).

Attaching and detaching slave devices is equally simple and shouldn't require a lot of explanation. If you want to change a slave from one master to another, a single attach command is sufficient; the slave device does not need to be detached first. Note that attaching a slave device will also enable it if it is currently disabled.

Finally, if you submit multiple commands in one go, they are applied in order until all of them are processed or an error occurs. If an error occurs, the client is notified and processing stops, but already processed commands will take effect. Regardless of an error, a hierarchy event is sent to all clients with the new state.

That's it. You now know how to stay aware of the device hierarchy and how to modify it. In the next part, I'll discuss how to get more information about the devices.

Sunday, June 7, 2009

XI2 is now merged into master. Over the next couple of days, I will post various recipes on how to deal with the new functionality. The examples here are merely snippets, full example programs to summarize each part are available here.

In this first part, I will cover general things, initialisation and event selection.

Why XI2?

One of the major reasons for XI2 was the merge of MPX into the server. The current X Input Extension version 1.5 is quite limiting and extending it to fully support MPX has been tough. Programming against it is mostly annoying, so a replacement was sought that makes it easy to program against multiple devices and is flexible enough to cover future use-cases.

The XI2 protocol is fairly conservative, adding only a few requests and events, but it also leaves room for more. XI2 and its APIs are somewhat closer to the core protocol paradigms. The big differences to XI from a client's perspective are: calls take a device ID as a parameter, there is no need to open devices, and event types and masks are constant (more on this below).

Right now, the only bindings for XI2 are Xlib bindings. Nonetheless, I encourage you to start playing with XI2 and think how applications may use it. By testing it now, you can help identify problems and missing bits with the current requests before version 2.0 is released. And of course, fixing issues before a release benefits everyone.

MPX

MPX allows the use of multiple cursors and keyboard foci simultaneously. This again leads to pretty funky user interfaces - bi-manual, multi-user, you name it. It also throws a number of assumptions about current GUIs out of the window, but I'll get to that some other time.

MPX introduces an explicit master/slave device hierarchy. The easiest way to remember which is which is: a physical device is a slave device, and a cursor or keyboard focus is a master device.

Slave devices are attached to one master and each time a slave generates an event, this event is routed through the master device to the client. Master devices always come in twos (one cursor, one keyboard focus) and these two are paired.

So a common setup may be a laptop with four master devices (i.e. two cursors and two keyboard foci): the touchpad and built-in keyboard control the first pair of master devices, and a USB wireless combo controls the second pair. The standard setup is to have one pair of master devices, with all devices attached to this pair's master pointer or master keyboard. Which, incidentally (or not :), is exactly the same setup as we've had since server 1.4. It also means that MPX only takes effect if you create a new pair of master devices; otherwise it's invisible.

I will cover more of MPX in a follow-up post. From most clients' perspective, master devices are the devices that matter; only configuration tools and some other specialised apps need to worry about slave devices.

First the client connects to the X server, then asks whether the extension is available. XQueryExtension not only tells us whether the X Input Extension is supported, it also returns the opcode of the extension. This opcode is needed for event parsing (all XI2 events use this opcode). The opcode is assigned when the server starts, so you cannot rely on it being constant across server sessions.

Finally, we announce that we support XI 2.0 and the server returns the version it supports. Although XIQueryVersion is a pure XI2 call, it is implemented so that it will not result in a BadRequest error if you run it against a server that doesn't support XI2 (this part is implemented in Xlib, so if you use xcb this behaviour is not the same).

XIQueryVersion not only returns the supported version; the server also stores the version your client announced. As XI2 progresses, it becomes important that you use this call, as the server may treat the client differently depending on the supported version.

Selecting for events

After initialising the GUI, a client usually needs to select for events. This can be achieved with XISelectEvents.

An XIEventMask defines the mask for one device. A mask is defined as (1 << event type) and the matching bits must be set in the eventmask.mask field. The size of eventmask.mask can be arbitrary as long as it has enough bits for the masks you need to set. The XI_LASTEVENT define specifies the highest event type in the current XI2 protocol version, so you can use this to determine the mask size. In this example, we only need 6 bits, so a 1 byte mask is enough.

XISelectEvents takes multiple event masks, so you can submit many of these (e.g. one for each device) in one go.

As shown above, each XIEventMask takes a deviceid. This is either the numeric ID for a device or the special ID of XIAllDevices or XIAllMasterDevices. If you select an event mask for XIAllDevices, all devices will send the selected events. XIAllMasterDevices does the same, but only for master devices.

Event masks for XIAllDevices and XIAllMasterDevices are in addition to device-specific event masks. For example, if you select for button press on XIAllDevices, button release on XIAllMasterDevices and motion events on device 2, device 2 will report all three types of events to the client (the effective mask is a bitwise OR).

Furthermore, XIAllDevices and XIAllMasterDevices always apply, even if the device has been added after the client has selected for events. So you only need to issue the event mask once, regardless of how many devices are currently connected and how many will be connected in the future.

A more detailed analysis of the data in each event will be described in a later post. Note that you need to check events against the opcode of the X Input Extension as returned by XQueryExtension(3) (as discussed before).

That's it. With this knowledge, you can already go and write simple programs that listen for events from any device. In the next part, I will cover how to list the input devices and listen for changes in the master/slave device hierarchy.