Tuesday, December 20, 2016

This is a common source of confusion: the legacy X.Org driver for touchpads is called xf86-input-synaptics but it is not a driver written by Synaptics, Inc. (the company).

The repository goes back to 2002 and for the first couple of years Peter Osterlund was the sole contributor. Back then it was called "synaptics" and really was a "synaptics device" driver, i.e. it handled PS/2 protocol requests to initialise Synaptics, Inc. touchpads. Evdev support was added in 2003, punting the initialisation work to the kernel instead. This was the groundwork for a generic touchpad driver. In 2008 the driver was renamed to xf86-input-synaptics and relicensed from GPL to MIT to take it under the X.Org umbrella. I've been involved with it since 2008 and the official maintainer since 2011.

For many years now, the driver has been a generic touchpad driver that handles any device the Linux kernel can handle. In fact, most bugs attributed to the synaptics driver not finding the touchpad are caused by the kernel not initialising the touchpad correctly. The synaptics driver reads the same evdev events that are also handled by libinput and the xf86-input-evdev driver; any differences in behaviour are driver-specific and not related to the hardware. The driver handles devices from Synaptics, Inc., ALPS, Elantech, Cypress, Apple and even some Wacom touch tablets. We don't care about what touchpad it is as long as the evdev events are sane.

Synaptics, Inc.'s developers are active in kernel development to help get new touchpads up and running. Once the kernel handles them, the xorg drivers and libinput will handle them too. I can't remember any significant contribution by Synaptics, Inc. to the X.org synaptics driver, so they are simply neither to credit nor to blame for the current state of the driver. The top 10 contributors since August 2008 when the first renamed version of xf86-input-synaptics was released are:

There's a long tail of other contributors but the top ten illustrate that it wasn't Synaptics, Inc. that wrote the driver. Any complaints about Synaptics, Inc. not maintaining/writing/fixing the driver are missing the point, because this driver was never a Synaptics, Inc. driver. That's not a criticism of Synaptics, Inc. btw, that's just how things are. We should have renamed the driver to just xf86-input-touchpad back in 2008 but that ship has sailed now. And synaptics is about to be superseded by libinput anyway, so it's simply not worth the effort now.

The other reason I included the commit count in the above: I'm also the main author of libinput. So "the synaptics developers" and "the libinput developers" are effectively the same person, i.e. me. Keep that in mind when you read random comments on the interwebs, it makes it easier to identify people just talking out of their behind.

Monday, December 19, 2016

A long-standing criticism of libinput is its touchpad acceleration code, oscillating somewhere between "terrible", "this is bad and you should feel bad" and "I can't complain because I keep missing the bloody send button". I finally found the time and some more laptops to sit down and figure out what's going on.

I recorded touch sequences of the following movements:

super-slow: a very slow movement as you would do when
pixel-precision is required. I recorded this by effectively slowly rolling
my finger. This is an unusual but sometimes required interaction.

slow: a slow movement as you would do when you need to hit a
target several pixels across from a short distance away, e.g. the Firefox
tab close button

medium: a medium-speed movement though probably closer to the
slow side. This would be similar to the movement when you move 5cm across
the screen.

medium-fast: a medium-to-fast speed movement. This would be
similar to the movement when you move 5cm across the screen onto a large
target, e.g. when moving between icons in the file manager.

fast: a fast movement. This would be similar to the movement when
you move between windows some distance apart.

flick: a flick movement. This would be similar to the movement when
you move to a corner of the screen.

Note that all these are by definition subjective and somewhat dependent on
the hardware. Either way, I tried to get something of a reasonable subset.

Next, I ran this through libinput 1.5.3, augmented with printfs in the
pointer acceleration code, and a script to post-process that output.
Unfortunately, libinput's pointer acceleration internally uses units
equivalent to a 1000dpi mouse and that's not something easy to understand.
Either way, the numbers themselves don't matter too much for analysis right now and I've now switched everything to mm/s anyway.
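For reference, the conversion between those internal 1000dpi units and millimetres is trivial; a quick sketch of the arithmetic (not libinput's actual code):

```python
def units_to_mm(units, dpi=1000):
    """Convert device units at the given resolution to millimetres
    (1 inch == 25.4 mm)."""
    return units * 25.4 / dpi

def velocity_mm_per_s(units, delta_t_us, dpi=1000):
    """Velocity in mm/s from a unit delta over a time delta in microseconds."""
    return units_to_mm(units, dpi) / (delta_t_us / 1e6)
```

So 1000 units at 1000dpi is exactly one inch, i.e. 25.4mm.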

A note ahead: the analysis relies on libinput recording an evemu replay.
That relies on uinput, and event timestamps are subject to a little bit of
drift across recordings. Some differences in the before/after of the same
recording can likely be blamed on that.

The graph I'll present for each recording is relatively simple: it shows the
velocity and the matching factor. The x axis is simply the events in
sequence, the y axes are the factor and the velocity (note: two different
scales in one graph). The graph colours in the bits that see some type of
acceleration: green means "maximum factor applied", yellow means
"decelerated", and the purple "adaptive" bits mark where per-velocity
acceleration is applied. Anything that remains white is used as-is (aside
from the constant deceleration). The colouring doesn't add new data, it just
presents roughly the same information in a more readable form.

Interesting numbers for the factor are 0.4 and 0.8. We have a constant
deceleration of 0.4 on touchpads, so a factor of 0.4 means "don't apply
acceleration" and 0.8 is the maximum factor. The maximum factor is twice as
big as the normal factor, so the pointer moves twice as fast. Anything below
0.4 means we decelerate the pointer, i.e. the pointer moves slower than the
finger.
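To illustrate how the factor is applied and how it maps to the graph colours, here is a simplified sketch with the 0.4/0.8 values hardcoded (illustrative only, not libinput's actual code):

```python
BASELINE = 0.4  # constant deceleration factor, i.e. "no acceleration applied"
MAXIMUM = 0.8   # maximum factor, pointer moves at twice the baseline speed

def apply_factor(delta, factor):
    """The pointer delta is simply the finger delta scaled by the factor."""
    return delta * factor

def classify(factor):
    """Map a factor to the colour categories used in the graphs."""
    if factor < BASELINE:
        return "decelerated"  # yellow: pointer moves slower than the finger
    if factor == BASELINE:
        return "as-is"        # white: finger movement used unchanged
    if factor < MAXIMUM:
        return "adaptive"     # purple: per-velocity acceleration
    return "maxed"            # green: maximum factor applied
```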

The super-slow movement shows that the factor is, aside from the
beginning, always below 0.4, i.e. the sequence sees deceleration applied.
The takeaway here is that acceleration appears to be doing the right thing,
slow motion is decelerated and while there may or may not be some tweaking to do, there
is no smoking gun.

Super slow motion is decelerated.

The slow movement shows that the factor is almost always 0.4, aside
from a few extremely slow events. This indicates that for the slow speed,
the pointer movement maps exactly to the finger movement save for our
constant deceleration. As above, there is no indicator that we're doing something seriously wrong.

Slow motion is largely used as-is with a few decelerations.

The medium movement gets interesting. If we look at the factor
applied, it changes wildly with the velocity across the whole range between
0.4 and the maximum 0.8. There is a short spike at the beginning where it
maxes out but the rest is accelerated on-demand, i.e. different finger
speeds will produce different acceleration. This shows the crux of what a
lot of users have been complaining about - what is a fairly slow motion
still results in an accelerated pointer. And because the acceleration
changes with the speed the pointer behaviour is unpredictable.

In medium-speed motion acceleration changes with the speed and even maxes out.

The medium-fast movement shows almost the whole movement maxing out
on the maximum acceleration factor, i.e. the pointer moves at twice the
speed of the finger. This is a problem because this is roughly the speed
you'd use to hit a "mentally preselected" target, i.e. you know exactly
where the pointer should end up and you're just intuitively moving it
there. If the pointer moves twice as fast, you're going to overshoot and
indeed that's what I've observed during the touchpad
tap analysis userstudy.

Medium-fast motion easily maxes out on acceleration.

The fast movement shows basically the same thing, almost the whole
sequence maxes out on the acceleration factor so the pointer will move twice
as far as intuitively guessed.

Fast motion maxes out acceleration.

So does the flick movement, but in that case we want it to go as far as
possible. Note that the speeds between fast and flick are virtually
identical here; I'm not sure if that's me just being equally fast or the
touchpad not quite picking up on the short motion.

Flick motion also maxes out acceleration.

Either way, the takeaway is simple: we accelerate too soon and there's a
fairly narrow window where we have adaptive acceleration, it's very easy to
top out. The simplest fix to get most touchpad movements working well is
to increase the current threshold on when acceleration applies. Beyond that
it's a bit harder to quantify, but a good idea seems to be to stretch out
the acceleration function so that the factor changes at a slower rate as the
velocity increases, and to raise the maximum acceleration factor so we don't
top out but keep going as the finger goes faster. This would be the intuitive
expectation since it resembles physics (more or less).
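As a toy model of that idea (the threshold and slope numbers here are entirely made up; the real profile lives in libinput's filter code):

```python
def accel_factor(v_mm_s, baseline=0.4, threshold=40.0, slope=0.01):
    """Toy acceleration profile: stay at the baseline factor below the
    threshold velocity, then grow linearly with velocity, without a hard
    cap so the factor keeps rising as the finger goes faster."""
    if v_mm_s <= threshold:
        return baseline
    return baseline + (v_mm_s - threshold) * slope
```

Raising the threshold keeps slow and medium movements unaccelerated; removing the cap means fast movements keep gaining speed.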

There's a set of patches on the list now that does exactly that. So let's see
what the result of this is. Note ahead: I also switched everything to mm/s, which causes some numbers to shift slightly.

The super-slow motion is largely unchanged though the velocity scale changes
quite a bit. Part of that is that the new code has a different unit which,
on my T440s, isn't exactly 1000dpi. So the numbers shift and the result of
that is that deceleration applies a bit more often than before.

Super-slow motion largely remains the same.

The slow motions are largely unchanged but more deceleration is now
applied. Tbh, I'm not sure if that's an artefact of the evemu replay, the
new accel code or the result of the not-quite-1000dpi of my touchpad.

Slow motion largely remains the same.

The medium motion is the first interesting one because that's where
we had the first observable issues. In the new code, the motion is almost
entirely unaccelerated, i.e. the pointer will move as the finger does.
Success!

Medium-speed motion now matches the finger speed.

The same is true of the medium-fast motion. In the recording the first few events were past the new thresholds so some acceleration is applied, the rest of the motion matches finger motion.

Medium-fast motion now matches the finger speed except at the beginning where some acceleration was applied.

The fast and flick motion are largely identical in having the
acceleration factor applied to almost the whole motion but the big change is
that the factor now goes up to 2.3 for the fast motion and 2.5 for the flick
motion, i.e. both movements would go a lot faster than before. In the graphics below you still see the blue area marked as "previously max acceleration factor" though it does not actually max out in either recording now.

Fast motion increases acceleration as speed increases.

Flick motion increases acceleration as speed increases.

In summary, what this means is that the new code accelerates later but when
it does accelerate, it goes faster. I tested this on a T440s, a T450p and an
Asus VivoBook with an Elantech touchpad (which is almost unusable with
current libinput). They don't quite feel the same yet and I'm not
happy with the actual acceleration, but for 90% of 'normal' movements the
touchpad now behaves very well. So at least we go from "this is terrible" to
"this needs tweaking". I'll go check if there's any champagne left.

Monday, December 12, 2016

A short while ago, I asked a bunch of people for long-term touchpad
usage data (specifically: evemu recordings). I currently have 25 sets of data, the shortest of which has
9422 events, the longest of which has 987746 events. I requested that
evemu-record be run in the background while people use their touchpad normally. Thus the
data is quite messy, it contains taps, two-finger scrolling, edge scrolling,
palm touches, etc. It's also raw data from the touchpad, not processed by libinput. Some care has to be taken with analysis, especially since
it is weighted towards long recordings. In other words, the user with 987k
events has a higher influence than the user with 9k events. So the data is useful for looking for patterns that can be
independently verified with other data later. But it's also useful for
disproving hypotheses, i.e. "we cannot do $foo because some users' events show $bla".

One of the things I've looked into was tapping.
In libinput, a tap has two properties: a time threshold and a movement
threshold. If the finger is held down longer than 180ms
or it moves more than 3mm it is not a tap. These
numbers are either taken from synaptics or just guesswork (both, probably).
The need for a time-based threshold is obvious: we don't know whether the user is
tapping until we see the finger up event. Only if that doesn't happen within
a given time do we know the user simply put the finger down. The movement
threshold is required because small movements occur while tapping, caused by
the finger really moving (e.g. when tapping shortly before/after a pointer
motion) or by the finger center moving (as the finger
flattens under pressure, the center may move a bit). Either way, these
thresholds delay real pointer movement, making the pointer less reactive
than it could be. So it's in our interest to have these thresholds low to
get reactive pointer movement but as high as necessary to
have reliable tap detection.
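In code, the tap decision boils down to two comparisons; a minimal sketch with the 180ms/3mm values mentioned above:

```python
def is_tap(duration_ms, movement_mm,
           time_threshold_ms=180, move_threshold_mm=3.0):
    """A touch only counts as a tap if the finger comes up within the time
    threshold and moved less than the movement threshold while down."""
    return (duration_ms <= time_threshold_ms and
            movement_mm <= move_threshold_mm)
```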

General data analysis

Let's look at the (messy) data. I wrote a script to calculate the time delta and
movement distance for every single-touch sequence, i.e. anything with two or
more fingers down was ignored. The script used a range of 250ms and
6mm of movement, discarding any sequences outside those thresholds. I also
ignored anything in the left-most or right-most 10% because it's likely that
anything that looks like a tap is a palm interaction [1]. I ran the script
against those files where the users reported that they use tapping (10
users) which gave me 6800 tap sequences. Note that the ranges are purposely
larger than libinput's to detect if there was a significant amount of attempted taps
that exceed the current thresholds and would be misdetected as non-taps.
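The script itself isn't public, but its core logic looks roughly like this (a reimplementation of the description above; the event tuples and the 100mm touchpad width are hypothetical, real evemu parsing omitted):

```python
import math

def tap_candidates(sequences, max_ms=250, max_mm=6.0, edge=0.1, width_mm=100.0):
    """Filter single-touch sequences down to possible taps.

    Each sequence is a list of (time_ms, x_mm, y_mm) tuples for one touch.
    Sequences exceeding the time/movement range or starting in the
    left-most or right-most 10% of the touchpad are discarded.
    """
    result = []
    for seq in sequences:
        t0, x0, y0 = seq[0]
        if x0 < edge * width_mm or x0 > (1 - edge) * width_mm:
            continue  # likely a palm rather than a tap
        duration = seq[-1][0] - t0
        movement = max(math.hypot(x - x0, y - y0) for _, x, y in seq)
        if duration <= max_ms and movement <= max_mm:
            result.append((duration, movement))
    return result
```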

Let's have a look at the results. First, a simple picture that merely prints
the start location of each tap, normalised to the width/height of the
touchpad. As you can see, taps are primarily clustered around the center but
can really occur anywhere on the touchpad. This means any attempt at
detecting taps by location would be unreliable.

You can easily see the empty areas in the left-most and right-most 10%, that is an artefact of the filtering.

The analysis of time is more interesting: There are spikes around the 50ms
mark with quite a few outliers going towards 100ms forming what looks like a
narrow normal distribution curve. The data points are overlaid with markers
for the mean [2], the 50 percentile, the 90 percentile
and the 95 percentile [3]. And the data says: 95% of events fall below
116ms. That's something to go on.

Times between touch down and touch up for a possible tap event.

Note that we're using a 250ms timeout here and thus even look at touches
that would not have been detected as tap by libinput. If we reduce to the
180ms libinput uses, we get a 95% percentile of 98ms, i.e. "of all taps currently detected as taps, 95% are 98ms or shorter".

The analysis of distance is similar: Most of the tap sequences have little
to no movement, with 50% falling below 0.2mm of movement. Again the
data points are overlaid with markers for the mean, the 50 percentile,
the 90 percentile and the 95 percentile. And the data says: 95% of events
fall below 1.8mm. Again, something to go on.

Movement between the touch down and the touch up event for a possible tap (10 == 1mm)

Note that we're using a 6mm threshold here and thus even look at touches
that would not have been detected as tap by libinput. If we reduce to the
3mm libinput uses, we get a 95% percentile of 1.2mm, i.e. "of all taps currently detected as taps, 95% move 1.2mm or less".
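The percentile markers themselves are straightforward to compute; a sketch using the nearest-rank definition (one of several percentile definitions, not necessarily the one my plotting script used):

```python
def percentile(data, p):
    """Nearest-rank percentile: the smallest value such that at least
    p% of the data is less than or equal to it (p as an integer)."""
    ordered = sorted(data)
    rank = (p * len(ordered) + 99) // 100  # integer ceiling of p% of n
    return ordered[max(rank - 1, 0)]
```

For example, percentile(list(range(1, 101)), 95) returns 95.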

Now let's combine the two. Below is a graph mapping times and distances from
touch sequences. In general, the longer the time, the more movement we get,
but most of the data is in the bottom left. Since doing
percentiles is tricky on 2 axes, I mapped the respective axes individually.
The biggest rectangle is the 95th percentile for time and distance, the
number below shows how many data points actually fall into this rectangle.
Looks promising, we still have a vast majority of touchpoints fall into
the respective 95 percentiles though the numbers are slightly lower than the individual axes
suggest.

Time to distance map for all possible taps

Again, this is for the 250ms by 6mm movement range. About 3.3% of the events fall into the area
between 180ms/3mm and 250ms/6mm. There is a chance that some of those touches were
short, small movements intended as taps; we just can't tell from the data.
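Counting how many points fall inside such a rectangle is then a small exercise; a self-contained sketch (nearest-rank percentiles, hypothetical (time, distance) tuples):

```python
def in_box_fraction(points, p=95):
    """Fraction of (time, distance) points inside the rectangle spanned
    by the per-axis nearest-rank p-th percentiles."""
    def pct(values):
        ordered = sorted(values)
        return ordered[(p * len(ordered) + 99) // 100 - 1]
    t_cut = pct([t for t, _ in points])
    d_cut = pct([d for _, d in points])
    inside = sum(1 for t, d in points if t <= t_cut and d <= d_cut)
    return inside / len(points)
```

With perfectly correlated axes the fraction equals the per-axis percentile; the less correlated time and distance are, the further it drops below that.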

So based on the above, we learned one thing: it would not be reliable to
detect taps based on their location. But we also suspect two things now: we
can reduce the timeout and movement threshold without sacrificing a lot of
reliability.

Verification of findings

Based on the above, our hypothesis is: we can reduce the timeout to 116ms
and the threshold to 1.8mm while still having a 93% detection reliability.
This is the most conservative reading, based on the extended thresholds.

To verify this, we needed to collect tap data from multiple users in a
standardised and reproducible way. We wrote a basic website that displays
5 circles (see the screenshot below) on a canvas and asked a bunch of co-workers in two
different offices [4] to tap them. While doing so, evemu-record was running in
the background to capture the touchpad interactions. The touchpad was the
one from a Lenovo T450 in both cases.

Screenshot of the <canvas> that users were asked to perform the taps on.

Some users ended up clicking instead of tapping and we had to discard
those recordings. The total number of useful recordings was 15 from the
Paris office and 27 from the Brisbane office. In total we had 245 taps (some
users missed the circle on the first go, others double-tapped).

We asked each user three questions: "do you know what tapping/tap-to-click
is?", "do you have tapping enabled" and "do you use it?". The answers are
listed below:

Do you know what tapping is? 33 yes, 12 no

Do you have tapping enabled? 19 yes, 26 no

Do you use tapping? 10 yes, 35 no

I admit I kinda screwed up the data collection here because it includes
those users whose recordings we had to discard. And the questions could've been better. So I'm not going to go into
too much detail. The only useful thing here though is: the majority of users
had tapping disabled and/or don't use it, which should make any potential
learning effect disappear [5].

Ok, let's look at the data sets, same scripts as above:

Times between touch down and touch up for tap events

Movement between the touch down and the touch up events of a tap (10 == 1mm)

95th percentile for time is 87ms. 95th percentile for distance is 1.09mm.
Both are well within the numbers we saw in the analysis above. The combined
diagram shows that 87% of events fall within the 87ms/1.09mm box.

Time to distance map for all taps

The few outliers here are close enough to the edge that expanding the box to
100ms/1.3mm gets us more than 95%. So it appears that our hypothesis is
correct, reducing the timeout to 116ms and 1.8mm will have a 95% detection
reliability. Furthermore, using the clean data it looks like we can use a
lower threshold than previously assumed and still get a good detection
ratio. Specifically, data collected in a controlled environment across 42 different users of varying familiarity with touchpad tapping shows that 100ms and 1.3mm gets us a 95% detection rate of taps.

What does this mean for users?

Based on the above, the libinput thresholds will be reduced to 100ms and 1.3mm.
Let's see how we go with this and then we can increase it in the
future if misdetection is higher than expected. Patches will be on the
wayland-devel list shortly.

For users that don't have tapping enabled, this will not change anything.
All users who have tapping enabled will see a more responsive cursor on small
movements as the time and distance thresholds have been significantly
reduced. Some users may see a drop in tap detection rate. This is
hopefully a subconscious enough effect that those users learn to tap faster
or with less movement. If not, we have to look at it separately and see how
we can deal with that.

If you find any issues with the analysis above, please let me know.

[1] These scripts analyse raw touchpad data, they don't benefit from
libinput's palm detection
[2] Note: mean != median; the median is less affected by strong outliers.
Look it up, it's worth knowing.
[3] X percentile means X% of events fall below this value
[4] The Brisbane and Paris offices. No separate analysis was done, so it is unknown whether close proximity to
baguettes has an effect on tap behaviour
[5] i.e. the effect of users learning how to use a system that doesn't work
well out-of-the-box. This may result in e.g. quicker taps from those who
are familiar with the system vs those who aren't.

Wednesday, December 7, 2016

xinput is a tool to query and modify X input device properties (amongst other things). Every so often someone complains about its non-intuitive interface, but this is where users are mistaken: xinput is not a configuration UI. It is a DUI - a developer user interface [1] - intended to test things without having to write a custom (more user-friendly) tool for each new property. It is nothing but a tool to access what is effectively a key-value store. To use it you need to know not only the key name(s) but also the allowed formats, some of which are only documented in header files. It is intended to be run under user supervision; anything it does won't survive device hotplugging. Relying on xinput for configuration is the same as relying on 'echo' to toggle parameters in /sys for kernel configuration. It kinda possibly maybe works most of the time but it's not pretty. And it's not intended to be, so please don't complain to me about the arcane user interface.

[1] don't do it, things will be a bit confusing, you may not do the right thing, you can easily do damage, etc. A lot of similarities... ;)

Tuesday, December 6, 2016

This post mostly affects developers of desktop environments/Wayland compositors. A systemd pull request was merged to add two new properties to some keyboards: XKB_FIXED_LAYOUT and XKB_FIXED_VARIANT. If set, the device must not be switched to a user-configured layout but rather the one set in the properties. This is required to make fake keyboard devices work correctly out-of-the-box. For example, Yubikeys emulate a keyboard and send the configured passwords as key codes matching a US keyboard layout. If a different layout is applied, then the password may get mangled by the client.
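On the compositor side the resulting logic is a simple precedence check; a sketch of the idea (the property names are the ones from the systemd change, everything else here is hypothetical):

```python
def effective_layout(udev_properties, user_layout, user_variant):
    """Return the (layout, variant) to apply to a keyboard device.

    If the udev properties pin the layout (e.g. for Yubikeys that send
    keycodes matching a US layout), they win over the user's configured
    layout.
    """
    fixed_layout = udev_properties.get("XKB_FIXED_LAYOUT")
    if fixed_layout:
        return fixed_layout, udev_properties.get("XKB_FIXED_VARIANT", "")
    return user_layout, user_variant
```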

Since udev and libinput are sitting below the keyboard layout there isn't much we can do in this layer. This is a job for those parts that handle keyboard layouts and layout configurations, i.e. GNOME, KDE, etc. I've filed a bug for gnome here, please do so for your desktop environment.

If you have a device that falls into this category, please submit a systemd patch/file a bug and cc me on it (@whot).

Monday, December 5, 2016

This post applies to most tools that interface with the X server and change settings in the server, including xinput, xmodmap, setxkbmap, xkbcomp, xrandr, xsetwacom and other tools that start with x. The one word to sum up the future for these tools under Wayland is: "non-functional".

An X window manager is little more than an innocent bystander when it comes to anything input-related. Short of handling global shortcuts and intercepting some mouse button presses (to bring the clicked window to the front) there is very little a window manager can do. It's a separate process from the X server, it does not receive most input events, and it cannot affect what events are being generated. When it comes to input device configuration, any X client can tell the server to change it - that's why general debugging tools like xinput work.

A Wayland compositor is much more: it is a window manager and the display server merged into one process. This gives the compositor a lot more power and responsibility. It handles all input events as they come out of libinput and also manages device configuration. Oh, and instead of the X protocol it speaks the Wayland protocol.

The difference becomes more obvious when you consider what happens when you toggle a setting in the GNOME control center. In both Wayland and X, the control center toggles a gsettings key and waits for some other process to pick it up. In both cases, mutter gets notified about the change but what happens then is quite different. In GNOME(X), mutter tells the X server to change a device property, the server passes that on to the xf86-input-libinput driver and from there the setting is toggled in libinput. In GNOME(Wayland), mutter toggles the setting directly in libinput.

Since there is no X server in the stack, the various tools can't talk to it. So to get the tools to work they would have to talk to the compositor instead. But they only know how to speak X protocol, and no Wayland protocol extension exists for input device configuration. Such a Wayland protocol extension would most likely have to be a private one since the various compositors expose device configuration in different ways. Whether this extension will be written and added to compositors is uncertain, I'm not aware of any plans or even intentions to do so (it's a very messy problem). But either way, until it exists, the tools will merely shout into the void, without even an echo to keep them entertained. Non-functional is thus a good summary.

pastebins are useful for dumping large data sets whenever the medium of conversation doesn't make this easy or useful. IRC is one example, or audio/video conferencing. But pastebins only work when the other side looks at the pastebin before it expires, and the default expiry date for a pastebin may only be a few days.

This makes them effectively useless for bugs where it may take a while for the bug to be triaged and the assignee to respond. It may take even longer to figure out the source of the bug, and if there's a regression it can take months to figure it out. Once the content disappears we have to re-request the data from the reporter. And there is a vicious dependency too: usually, logs are more important for difficult bugs. Difficult bugs take longer to fix. Thus, with pastebins, the more difficult the bug, the more likely the logs become unavailable.

All useful bug tracking systems have an attachment facility. Use that instead, it's archived with the bug and if a year later we notice a regression, we still have access to the data.

If you got here because I pasted the link to this blog post, please do the following: download the pastebin content as raw text, then add it as attachment to the bug (don't paste it as comment). Once that's done, we can have a look at your bug again.

Tuesday, November 29, 2016

I pushed the patch to require resolution today, expect this to hit the general public with libinput 1.6. If your graphics tablet does not provide axis resolution we will need to add a hwdb entry. Please file a bug in systemd and CC me on it (@whot).

How do you know if your device has resolution? Run sudo evemu-describe against the device node and look for the ABS_X/ABS_Y entries:

Since Fedora 22, xorg-x11-drv-libinput is the preferred input driver. For historical reasons, almost all users have the xorg-x11-drv-synaptics package installed. But actually using the synaptics driver over xorg-x11-drv-libinput requires a manually dropped xorg.conf.d snippet. And that's just not ideal. Unfortunately, in DNF/RPM we cannot just say "replace the xorg-x11-drv-synaptics package with xorg-x11-drv-libinput on update but still allow users to install xorg-x11-drv-synaptics after that".

So the path taken is a package rename. Starting with Fedora 26, xorg-x11-drv-libinput's RPM will Provide/Obsolete [1] xorg-x11-drv-synaptics and thus remove the old package on update. Users that need the synaptics driver then need to install xorg-x11-drv-synaptics-legacy. This driver will then install itself correctly without extra user intervention and will take precedence over the libinput driver. Removing xorg-x11-drv-synaptics-legacy will remove the driver assignment and thus fall back to libinput for touchpads. So aside from the name change, everything else works smoother now. Both packages are now updated in Rawhide and should be available from your local mirror soon.

What does this mean for you as a user? If you are a synaptics user, after an update/install, you need to now manually install xorg-x11-drv-synaptics-legacy. You can remove any xorg.conf.d snippets assigning the synaptics driver unless they also include other custom configuration.

[1] "Provide" in RPM-speak means the package provides functionality otherwise provided by some other package even though it may not necessarily provide the code from that package. "Obsolete" means that installing this package replaces the obsoleted package.

Monday, November 14, 2016

I've written more extensively about this here but here's an analogy that should get the point across a bit better: Wayland is just a protocol, just like HTTP. In both cases, you have two sides with very different roles and functionality. In the HTTP case, you have the server (e.g. Apache) and the client (a browser, e.g. Firefox). The communication protocol is HTTP but both sides make a lot of decisions unrelated to the protocol. The server decides what data is sent, the client decides how the data is presented to the user. Wayland is very similar. The server, called the "compositor", decides what data is sent (also: which of the clients even gets the data). The client renders the data [1] and decides what to do with input like key strokes, etc.

Asking Does $FEATURE work under Wayland? is akin to asking Does $FEATURE work under HTTP?. The only answer is: it depends on the compositor and on the client. It's the wrong question. You should ask questions related to the compositor and the client instead, e.g. "does $FEATURE work in GNOME?" or "does $FEATURE work in GTK applications?". That's a question that can be answered.

Of course, there are some cases where the fault is really the protocol itself. But often enough, it's not.

[1] albeit it does so by telling the compositor to display it. The analogy with HTTP only works to some extent... :)

Wednesday, November 2, 2016

Update: Dec 08 2016: someone's working on this project. Sorry about the late update, but feel free to pick other projects you want to work on.

Interested in hacking on some low-level stuff and implementing a feature that's useful to a lot of laptop owners out there? We have a feature on libinput's todo list but I'm just constantly losing my fight against the ever-growing todo list. So if you already know C and you're interested in playing around with some low-level bits of software this may be the project for you.

Specifically: within libinput, we want to disable certain devices based on a lid state. In the first instance this means that when the lid switch is toggled to closed, the touchpad and trackpoint get silently disabled to not send events anymore. [1] Since it's based on a switch state, this also means that we'll now have to listen to switch events and expose those devices to libinput users.

The things required to get all this working are:

Designing a switch interface plus the boilerplate code required (I've done most of this bit already)

Extending the current evdev backend to handle devices with EV_SW and exposing their events

Hooking up the switch devices to internal touchpads/trackpoints to disable them ad-hoc

Handle those devices where lid switch is broken in the hardware (more details on this when we get to this point)

You get to dabble with libinput and a bit of udev and the kernel. Possibly Xorg stuff, but that's unlikely at this point. This project is well suited for someone with a few spare weekends ahead. It's great for someone who hasn't worked with libinput before, but it's not a project to learn C, you better know that ahead of time. I'd provide the mentoring of course (I'm in UTC+10, so expect IRC/email). If you're interested let me know. Riches and fame may happen but are not guaranteed.

[1] A number of laptops have a hw issue where either device may send random events when the lid is closed

I finally have a bit of time to look at touchpad pointer acceleration in libinput. But when I did, I found a grand total of 5 bugs across freedesktop.org and Red Hat's bugzilla, despite this being the first thing anyone brings up whenever libinput is mentioned. 5 bugs - that's not much to work on. Note that over time there were also a lot of bugs where pointer acceleration was fixed once the touchpad's axis ranges were corrected, which is usually a two-liner for the udev hwdb.

Tuesday, September 20, 2016

First a definition: a trackstick is also called trackpoint, pointing stick, or "that red knob between G, H, and B". I'll be using trackstick here, because why not.

This post is the continuation of libinput and the Lenovo T450 and T460 series touchpads, where we focused on a stalling pointer when moving the finger really slowly. Turns out the T460s at least, and possibly others in the *60 series, have another bug that caused much worse behaviour, but we didn't notice for ages because we were focusing on the high-precision cursor movement. Specifically, the pointer would just randomly stop moving for a short while (spoiler alert: 300ms), regardless of the movement speed.

libinput has built-in palm detection and one of the things it does is to disable the touchpad when the trackstick is in use. It's not uncommon to rest the hand near or on the touchpad while using the trackstick and any detected touch would cause interference with the pointer motion. So events from the touchpad are ignored whenever the trackpoint sends events. [1]

On (some of) the T460s the trackpoint sends spurious events. In the recording I have, there are random events at 9s, then again 3.5s later, then 14s later, then 2s later, etc. Each time, our palm detection code would assume the trackpoint was in use and disable the touchpad for 300ms. If you were using the touchpad while this was happening, the touchpad would suddenly stop moving for 300ms and then continue as normal. Depending on how often these spurious events come in and the user's current caffeination state, this was somewhere between odd, annoying and infuriating.

The good news is: this is fixed in libinput now. libinput 1.5 and the upcoming 1.4.3 releases will have a fix that ignores these spurious events and makes the touchpad stalls a footnote of history. Hooray.

Monday, September 19, 2016

This post explains how the evdev protocol works. After reading this post you
should understand what evdev is and how to interpret evdev event dumps to
understand what your device is doing. The post is aimed mainly at users
having to debug a device, I will thus leave out or simplify some of the
technical details. I'll be using the output from evemu-record as example
because that is the primary debugging tool for evdev.

What is evdev?

evdev is a Linux-only generic protocol that the kernel uses to forward
information and events about input devices to userspace. It's not just for
mice and keyboards but for any device that has any sort of
axis, key or button, including things like webcams and remote controls. Each
device is represented as a device node in the form of
/dev/input/event0, with the trailing number increasing as you add
more devices. The node numbers are re-used after you unplug a device, so
don't hardcode the device node into a script. The device nodes are also
only readable by root, thus you need to run any debugging tools as root too.

evdev is the primary way to talk to input devices on Linux. All X.Org
drivers on Linux use evdev as protocol and libinput as well. Note that
"evdev" is also the shortcut used for xf86-input-evdev, the X.Org
driver to handle generic evdev devices, so watch out for context when you
read "evdev" on a mailing list.

Communicating with evdev devices

Communicating with a device is simple: open the device node and read from
it. Any data coming out is a struct input_event, defined in
/usr/include/linux/input.h:

I'll describe the contents later, but you can see that it's a very simple
struct.

Static information about the device such
as its name and capabilities can be queried with a set of
ioctls. Note
that you should always use
libevdev
to interact with a device, it blunts the few sharp edges evdev has.
See the libevdev
documentation for usage examples.

evemu-record, our primary debugging tool for anything evdev, is very
simple. It reads the static information about the device, prints it and then
simply reads and prints all events as they come in. The output is in
machine-readable format but it's annotated with human-readable comments
(starting with #). You can always ignore the non-comment bits. There's a
second command, evemu-describe, that only prints the description and
exits without waiting for events.

Relative devices and keyboards

The top part of an evemu-record output is the device description. This is
a list of static properties that tells us what the device is capable of. For
example, the USB mouse I have plugged in here prints:

The device name is the one (usually) set by the manufacturer and so are the
vendor and product IDs. The bus is one of the "BUS_USB" and similar
constants defined in /usr/include/linux/input.h. The version is often
quite arbitrary, only a few devices have something meaningful here.

We also have a set of supported events, categorised by "event type" and
"event code" (note how type and code are also part of the struct input_event).
The type is a general category, and
/usr/include/linux/input-event-codes.h defines quite a few of those.
The most important types are EV_KEY (keys and buttons), EV_REL (relative
axes) and EV_ABS (absolute axes). In the output above we can see that we
have EV_KEY and EV_REL set.

As a subitem of each type we have the event code. The event codes for this device are
self-explanatory: BTN_LEFT, BTN_RIGHT and BTN_MIDDLE are the left, right and
middle button. The axes are a relative x axis,
a relative y axis and a wheel axis (i.e. a mouse wheel). EV_MSC/MSC_SCAN is
used for raw scancodes and you can usually ignore it.
And finally we have the EV_SYN bits but let's ignore those, they are always
set for all devices.

Note that an event code cannot be on its own, it must be a tuple of (type,
code). For example, REL_X and ABS_X have the same numerical value and
without the type you won't know which one is which.

That's pretty much it. A keyboard will have a lot of EV_KEY
bits set and the EV_REL axes are obviously missing (but not always...).
Instead of BTN_LEFT, a keyboard would have e.g. KEY_ESC, KEY_A, KEY_B, etc.
90% of device
debugging is looking at the event codes and figuring out which ones are
missing or shouldn't be there.

Exercise: You should now be able to read an evemu-record
description from any mouse or keyboard device connected to your computer and
understand what it means. This also applies
to most special devices such as remotes - the only thing that changes are
the names for the keys/buttons. Just run sudo evemu-describe and pick any
device in the list.

The events from relative devices and keyboards

evdev is a serialised protocol. It sends a series of events and then a
synchronisation event to notify us that the preceding events all belong
together. This synchronisation event is EV_SYN SYN_REPORT. It is generated by
the kernel, not the device, and hence all EV_SYN codes are always available
on all devices.

Let's have a look at a mouse movement. As explained above, half the line is
machine-readable but we can ignore that bit and look at the human-readable
output on the right.

Mostly the same as button events. But wait, there is one difference: we have
a value of 2 as well. For key events, a value 2 means "key repeat".
If you're on the tty, then this is what generates repeat keys for you. In X
and Wayland we ignore these repeat events and instead use XKB-based key
repeat.

Now look at the keyboard events again and see if you can make sense of the sequence.
We have an Enter release (but no press), then ctrl down (and repeat),
followed by a 'c' press - but no release. The explanation is simple - as
soon as I hit enter in the terminal, evemu-record started recording so it
captured the enter release too. And it stopped recording as soon as ctrl+c
was down because that's when it was cancelled by the terminal. One important
takeaway here: the evdev protocol is not guaranteed to be balanced. You may
see a release for a key you've never seen the press for, and you may be
missing a release for a key/button you've seen the press for (this happens
when you stop recording). Oh, and there's one danger:
if you record your keyboard and you type your password, the keys will show
up in the output. Security experts generally recommend not publishing event
logs with your password in them.

Exercise: You should now be able to read an evemu-record
events list from any mouse or keyboard device connected to your computer and
understand the event sequence. This also applies to most special devices such as
remotes - the only thing that changes are the names for the keys/buttons.
Just run sudo evemu-record and pick any device listed.

Absolute devices

Things get a bit more complicated when we look at absolute input devices
like a touchscreen or a touchpad. Yes, touchpads are absolute devices in
hardware and the conversion to relative events is done in userspace by e.g.
libinput. The output of my touchpad is below. Note that I've manually removed a few
bits to make it easier to grasp, they will appear later in the multitouch
discussion.

We have a BTN_LEFT again and a set of other buttons that I'll explain in a
second. But first we look at the EV_ABS output. We have the same naming
system as above. ABS_X and ABS_Y are the x and y axis on the device,
ABS_PRESSURE is an (arbitrary) ranged pressure value.

Absolute axes have a bit more
state than just a simple bit. Specifically, they have a minimum and maximum
(not all hardware has the top-left sensor position on 0/0, it can
be an arbitrary position, specified by the minimum). Notable here is that
the axis ranges are simply the ones announced by the device - there is no
guarantee that the values fall within this range and indeed a lot of
touchpad devices tend to send values slightly outside that range.
Fuzz and flat can be safely ignored, but resolution is interesting. It is
given in units per millimeter and thus tells us the size of the device. In
the above case, (5112 - 1024)/42 means the device is 97mm wide. The
resolution is quite commonly wrong,
a lot of axis
overrides need the resolution changed
to the correct value.

The axis description also has a current value listed. The kernel only sends
events when the value changes, so even if the actual hardware keeps sending
events, you may never see them in the output if the value remains the same.
In other words, holding a finger perfectly still on a touchpad creates
plenty of hardware events, but you won't see anything coming out of the
event node.

Finally, we have properties on this device. These are used to indicate
general information about the device that's not otherwise obvious. In this
case INPUT_PROP_POINTER tells us that we need a pointer for this device (it
is a touchpad after all, a touchscreen would instead have INPUT_PROP_DIRECT
set). INPUT_PROP_BUTTONPAD means that this is a so-called clickpad, it does
not have separate physical buttons but instead the whole touchpad clicks.
Ignore INPUT_PROP_TOPBUTTONPAD because it only applies to the Lenovo *40
series of devices.

Ok, back to the buttons: aside from BTN_LEFT, we have BTN_TOUCH. This one
signals that the user is touching the surface of the touchpad (with some
in-kernel defined minimum pressure value). It's not just for finger-touches,
it's also used for graphics tablet stylus touches (so really, it's more
"contact" than "touch" but meh).

The BTN_TOOL_FINGER event tells us that a finger is in detectable range. This
gives us two bits of information: first, we have a finger (a tablet would have
e.g. BTN_TOOL_PEN) and second, we may have a finger in proximity without
touching. On many touchpads, BTN_TOOL_FINGER and BTN_TOUCH come in the same
event, but others can detect a finger hovering over the touchpad too (in which
case you'd also hope for ABS_DISTANCE being available on the touchpad).

Finally, the BTN_TOOL_DOUBLETAP up to BTN_TOOL_QUINTTAP tell us whether the
device can detect 2 through to 5 fingers on the touchpad. This doesn't actually
track the fingers, it merely tells you "3 fingers down" in the case of
BTN_TOOL_TRIPLETAP.

Exercise: Look at your touchpad's description and figure out if the size
of the touchpad is correct based on the axis information [1]. Check how many
fingers your touchpad can detect and whether it can do pressure or distance detection.

The events from absolute devices

Events from absolute axes are not really any different than events from
relative devices which we already covered. The same type/code combination with
a value and a timestamp, all framed by EV_SYN SYN_REPORT events. Here's an
example of me touching the touchpad:

In the first event you see BTN_TOOL_FINGER and BTN_TOUCH set (this touchpad
doesn't detect hovering fingers). An x/y coordinate pair and a pressure value.
The pressure changes in the second event, the third event changes pressure and
location. Finally, we have BTN_TOOL_FINGER and BTN_TOUCH released on finger up,
and the pressure value goes back to 0. Notice how the second event didn't
contain any x/y coordinates? As I said above, the kernel only sends updates on
absolute axes when the value changed.

In the first event, the touchpad detected all three fingers at the same time.
So we get BTN_TOUCH, x/y/pressure and BTN_TOOL_TRIPLETAP set. Note that the
various BTN_TOOL_* bits are mutually exclusive. BTN_TOOL_FINGER means
"exactly 1 finger down" and you can't have exactly 1 finger down when you have
three fingers down. In the second event x and pressure update (y has no event,
it stayed the same).

In the event after the break, we switch from three fingers to one finger.
BTN_TOOL_TRIPLETAP is released, BTN_TOOL_FINGER is set. That's very common.
Humans aren't robots, you can't release all fingers at exactly the same time, so
depending on the hardware scanout rate you have intermediate states where one
finger has left already, others are still down. In this case I released two
fingers between scanouts, one was still down. It's not uncommon to see a full
cycle from BTN_TOOL_FINGER to BTN_TOOL_DOUBLETAP to BTN_TOOL_TRIPLETAP on finger
down or the reverse on finger up.

Exercise: test out the pressure values on your touchpad and see how close
you can get to the actual announced range. Check how accurate the multifinger
detection is by tapping with two, three, four and five fingers. (In both cases,
you'll likely find that it's very much hit and miss).

Multitouch and slots

Now we're at the most complicated topic regarding evdev devices. In the
case of multitouch devices, we need to send multiple touches on the same
axes. So we need an additional dimension and that is called multitouch
slots (there is another, older multitouch protocol that doesn't use
slots but it is so rare now that you don't need to bother).

First: all axes that are multitouch-capable are repeated as ABS_MT_foo axis.
So if you have ABS_X, you also get ABS_MT_POSITION_X and both axes have the
same axis ranges and resolutions. The reason here is
backwards-compatibility: if a device only sends multitouch events, older
programs only listening to the ABS_X etc. events won't work. Some axes may
only be available for single-touch (ABS_MT_TOOL_WIDTH in this case).

We have an x and y position for multitouch as well as a pressure axis.
There are also two special multitouch axes that aren't really axes.
ABS_MT_SLOT and ABS_MT_TRACKING_ID. The former specifies which
slot is currently active, the latter is used to track touch points.

Slots are a static property of a device. My touchpad, as you can see above,
only supports 2 slots (min 0, max 1) and thus can track 2 fingers at a
time. Whenever the first finger is set down its coordinates will be tracked
in slot 0, the second finger will be tracked in slot 1. When the finger in
slot 0 is lifted, the second finger continues to be tracked in slot 1, and
if a new finger is set down, it will be tracked in slot 0. Sounds more
complicated than it is, think of it as an array of possible touchpoints.

The tracking ID is an incrementing number that lets us tell touch
points apart and also tells us when a touch starts and when it ends. The two
values are either -1 or a positive number. Any positive number means "new touch"
and -1 means "touch ended". So when you put two fingers down and lift them
again, you'll get a tracking ID of 1 in slot 0, a tracking ID of 2 in slot
1, then a tracking ID of -1 in both slots to signal they ended. The tracking
ID value itself is meaningless, it simply increases as touches are created.

We have a tracking ID (387) signalling finger down, as well as a position
plus pressure. Then come some updates and eventually a tracking ID of -1
(signalling finger up). Notice how there is no ABS_MT_SLOT here - the kernel
buffers those too, so while you stay in the same slot (0 in this case) you
don't see any events for it. Also notice how you get both single-finger as
well as multitouch in the same event stream. This is for backwards
compatibility. [2]

This was a really quick two-finger tap that illustrates the tracking IDs nicely.
In the first event we get a touch down, then an ABS_MT_SLOT event. This
tells us that subsequent events belong to the other slot, so it's the other
finger. There too we get a tracking ID + position. In the next event we get
an ABS_MT_SLOT to switch back to slot 0. Tracking ID of -1 means that touch
ended, and then we see the touch in slot 1 ended too.

Note that "scroll" is something handled in userspace, so what you see here
is just a two-finger move. Everything in there is something we've already
seen, but pay attention to the two middle events: as updates come in for
each finger, the ABS_MT_SLOT changes before the updates are sent. The kernel
filter for identical events is still in effect, so in the third event we
don't get an update for the X position on slot 1. The filtering is
per-touchpoint, so in this case this means that slot 1 position x is still
on 3511, just as it was in the previous event.

That's all you have to remember, really. If you think of evdev as a
serialised way of sending an array of touchpoints, with the slots as the
indices then it should be fairly clear. The rest is then just about actually
looking at the touch positions and making sense of them.

Exercise: do a pinch gesture on your touchpad. See if you can
track the two fingers moving closer together. Then do the same but only move
one finger. See how the non-moving finger gets less updates.

That's it. There are a few more details to evdev but much of that is just
more event types and codes. The few details you really have to worry about
when processing events are either documented in libevdev or abstracted away
completely. The above should be enough to understand what your device does,
and what goes wrong when your device isn't working. Good luck.

[1] If not, file a bug against systemd's hwdb and CC me so we can put
corrections in
[2] We treat some MT-capable touchpads as single-touch devices in libinput
because the MT data is garbage

Friday, September 16, 2016

libinput's touchpad acceleration is the cause for a few bugs and outcry from
a quite vocal (maj|in)ority. A common suggestion is "make it like
the synaptics driver". So I spent a few hours going through the pointer
acceleration code to figure out what xf86-input-synaptics actually does (I don't think
anyone knows at this point) [1].

If you just want the TLDR: synaptics doesn't use physical distances but
works in device units coupled with a few magic factors, also based on device
units. That pretty much tells you all that's needed.

Also a disclaimer: the last time some serious work was done on acceleration
was in 2008/2009. A lot of things have changed since and
since the server is effectively un-testable, we ended up with the mess below
that seems to make little sense. It probably made sense 8 years ago and
given that most or all of the patches have my signed-off-by it must've made
sense to me back then. But now we live in the glorious future and holy cow
it's awful and confusing.

Synaptics has three options to configure speed: MinSpeed, MaxSpeed and
AccelFactor. The first two are not explained beyond "speed factor" but given
how accel usually works let's assume they all somehow should work as a
multiplication on the delta (so a factor of 2 on a delta of dx/dy gives you
2dx/2dy). AccelFactor is documented as "acceleration factor for normal
pointer movements", so clearly the documentation isn't going to help clear
any confusion.

I'll skip the fact that synaptics also has a pressure-based motion
factor with four configuration options because oh my god what have we done.
Also, that one is disabled by default and has no effect unless set by the
user. And I'll also only handle default values here, I'm not going to get
into examples with configured values.

Also note: synaptics has a device-specific acceleration profile (the only
driver that does) and thus the acceleration handling is split between the
server and the driver.

Ok, let's get started. MinSpeed and MaxSpeed default to 0.4 and 0.7. The
MinSpeed is used to set constant acceleration (1/min_speed) so we always
apply a 2.5 constant acceleration multiplier to deltas from the touchpad.
Of course, if you set constant acceleration in the xorg.conf, then it
overwrites the calculated one.

MinSpeed and MaxSpeed are mangled during setup so that MaxSpeed is actually
MaxSpeed/MinSpeed and MinSpeed is always 1.0. I'm not 100% sure why, but
the later clipping to the min/max speed range ensures that we never go below a 1.0 acceleration factor
(and thus never decelerate).

The AccelFactor default is 200/diagonal-in-device-coordinates. On my T440s
it's thus 0.04 (and will be roughly the same for most PS/2 Synaptics
touchpads). But on a Cyapa with a different axis range it is 0.125. On a
T450s it's 0.035 when booted into PS2 and 0.09 when booted into RMI4.
Admittedly, the resolution halves under RMI4 so this possibly maybe makes
sense. Doesn't quite make as much sense when you consider the x220t which
also has a factor of 0.04 but the touchpad is only half the size of the
T440s.

It's correct that the frequency is roughly 80Hz, but I honestly don't know
what the 100 packets/s reference refers to. Either way, it means that we
always apply a factor of 12.5, regardless of the timing of the events.
Ironically, this one is hardcoded and not configurable unless you happen to know that it's the X server option VelocityScale or ExpectedRate (both of them set the same variable).

Ok, so we have three factors: 2.5 as a function of MinSpeed, 12.5 because of
80Hz (??) and 0.04 for the diagonal.

When the synaptics driver calculates a delta, it does so in device
coordinates and ignores the device resolution (because this code pre-dates
devices having resolutions). That's great until you have a device
with uneven resolutions like the x220t. That one has 75 and 129 units/mm for
x and y, so for any physical movement you're going to get almost twice as
many units for y than for x. Which means that if you move 5mm to the right
you end up with a different motion vector (and thus acceleration) than when
you move 5mm south.

The core X protocol actually defines how acceleration is supposed to be
handled. Look up the man page for XChangePointerControl(), it sets a
threshold and an accel factor:

The XChangePointerControl function defines how the pointing device
moves. The acceleration, expressed as a fraction, is a multiplier
for movement. For example, specifying 3/1 means the pointer moves
three times as fast as normal. The fraction may be rounded
arbitrarily by the X server. Acceleration only takes effect if the
pointer moves more than threshold pixels at once and only applies to
the amount beyond the value in the threshold argument.

Of course, "at once" is a bit of a blurry definition outside of maybe
theoretical physics. Consider the definition of "at once" for a gaming mouse
with 500Hz sampling rate vs. a touchpad with 80Hz (let us fondly remember
the 12.5 multiplier here) and the above description quickly dissolves into
ambiguity.

Anyway, moving on. Let's say the server just received a delta from the synaptics driver. The
pointer accel code in the server calculates the velocity over time,
basically by doing a hypot(dx, dy)/dtime-to-last-event. Time in the server
is always in ms, so our velocity is thus in device-units/ms (not adjusted
for device resolution).

Side-note: the velocity is calculated across several delta events so it gets
more accurate. There are some checks though so we don't calculate across
random movements: anything older than 300ms is discarded, anything not in
the same octant of movement is discarded (so we don't get a velocity of 0
for moving back/forth). And there's two calculations to make sure we only
calculate while the velocity is roughly the same and don't average between
fast and slow movements. I have my doubts about these, but until I have some
more concrete data let's just say this is accurate (although since the whole
lot is in device units, it probably isn't).

Anyway. The velocity is multiplied with the constant acceleration (2.5, see
above) and our 12.5 magic value. I'm starting to think that this is just
broken and would only make sense if we used a delta of "event count" rather
than milliseconds.

It is then passed to the synaptics driver for the actual acceleration
profile. The first thing the driver does is remove the constant acceleration
again, so our velocity is now just v * 12.5. According to the comment this
brings it back into "device-coordinate based velocity" but this seems wrong
or misguided since we never changed into any other coordinate system.

The driver applies the accel factor (0.04, see above) and then clips the
whole lot into the MinSpeed/MaxSpeed range (which is adjusted to move
MinSpeed to 1.0 and scale up MaxSpeed accordingly, remember?).
After the clipping, the pressure motion factor is calculated and applied. I
skipped this above but it's basically: the harder you press the higher the
acceleration factor. Based on some config options. Amusingly, pressure
motion has the potential to exceed the MinSpeed/MaxSpeed options. Who knows
what the reason for that is...

Oh, and btw: the clipping is actually done based on the accel factor set by
XChangePointerControl into the acceleration function here. The code is

So we have a factor set by XChangePointerControl() but it's only used to
determine the maximum factor we may have, and then we clip to that. I'm
missing some cross-dependency here because this is what the GUI acceleration
config bits hook into. Somewhere this sets things and changes the
acceleration by some amount but it wasn't obvious to me.

Alrighty. We have a factor now that's returned to the server and we're back
in normal pointer acceleration land (i.e. not synaptics-specific). Woohoo.
That factor is averaged across 4 events using Simpson's rule to smooth
out abrupt changes. Not sure this really does much, I don't think we've ever
done any evaluation on that. But it looks good on paper (we have that in
libinput as well).

Now the constant accel factor is applied to the deltas.
So far we've added the factor, removed it (in synaptics), and now
we're adding it again. Which also makes me wonder whether we're applying the
factor twice to all other devices but right now I'm past the point where I
really want to find out. With all the above, our acceleration factor is,
more or less:

f = units/ms * 12.5 * (200/diagonal) * (1.0/MinSpeed)

and the deltas we end up using in the server are

(dx, dy) = f * (dx, dy)

But remember, we're still in device units here (not adjusted for
resolution).

Anyway. You think we're finished? Oh no, the real fun bits start now. And if
you haven't headdesked in a while, now is a good time.

After acceleration, the server does some scaling because synaptics is an
absolute device (with axis ranges) in relative mode [2]. Absolute devices
are mapped into the whole screen by default but when they're sending
relative events, you still want a 45 degree line on the device to map into
45 degree cursor movement on the screen. The server does this by
adjusting dy in-line with the device-to-screen-ratio (taking
device resolution into account too). On my T440s this means:

dx is left as-is. Now you have the delta that's actually applied to
the cursor. Except that we're in device coordinates, so we map the current
cursor position to device coordinates, then apply the delta, then map back
into screen coordinates (i.e. pixels). You may have spotted the flaw here:
when the screen size changes, the dy scaling changes and thus the pointer
feel. Plug in another monitor, and touchpad acceleration changes. Also: the
same touchpad feels different on laptops when their screen hardware differs.

Ok, let's wrap this up. Figuring out what the synaptics driver does
is... "tricky". It seems much like a glorified random number scheme. I'm
not planning to implement "exactly the same acceleration as synaptics" in
libinput because this would be insane and, despite my best efforts, I'm not
that insane yet. Collecting data from synaptics users is almost meaningless,
because no two devices really employ the same acceleration profile (touchpad
axis ranges + screen size) and besides, there are 11 configuration options
that all influence each other.

What I do plan though is collect more motion data from a variety of
touchpads and see if I can augment the server enough that I can get a clear
picture of how motion maps to the velocity. If nothing else, this should
give us some picture on how different the various touchpads actually behave.

[1] fwiw, I had this really great idea of trying to get behind all this,
with diagrams and everything. But then I was printing json data from the X
server into the journal to be scooped up by sed and python script to print
velocity data. And I questioned some of my life choices.
[2] why the hell do we do this? because synaptics at some point became a
device that announces the axis ranges (seemed to make sense at the time, 2008) and
then other things started depending on it and with all the fixes to the
server to handle absolute devices in relative mode (for tablets) we painted
ourselves into a corner. Synaptics should switch back to being a relative
device, but last I tried it breaks pointer acceleration and that a) makes
the internets upset and b) restoring the "correct" behaviour is, well, you
read the article so far, right?

Friday, September 9, 2016

A great new feature has been merged during this 1.19 X server development cycle: we're now using threads for input [1]. Previously, there were two options for how an input driver would pass on events to the X server: polling or from within the signal handler. Polling simply adds all input devices' file descriptors to a select(2) loop that is processed in the mainloop of the server. The downside here is that if the server is busy rendering something, your input is delayed until that rendering is complete. Historically, polling was primarily used by the keyboard driver because it just doesn't matter much when key strokes are delayed - both because you need the client to render them anyway (which it can't do while busy) and possibly also because we're just so bloody used to typing delays.

The signal handler approach circumvented the delays by installing a SIGIO handler for each input device fd and calling that when any input occurs. This effectively interrupts the process until the signal handler completes, regardless of what the server is currently busy with. A great solution to provide immediate visible cursor movement (hence it is used by evdev, synaptics, wacom, and most of the now-retired legacy drivers) but it comes with a few side effects. First of all, because the main process is interrupted, the bit where we read the events must be completely separate to the bit where we process the events. That's easy enough, we've had an input event queue in the server for as long as I've been involved with X.Org development (~2006). The drivers push events into the queue during the signal handler, in the main loop the server reads them and processes them. In a busy server that may be several seconds after the pointer motion was performed on the screen but hey, it still feels responsive.

The bigger issue with the use of a signal handler is: you can't use malloc [2]. Or anything else useful. Look at the man page for signal(7), it literally has a list of allowed functions. This leads to two weird side-effects: one is that you have to pre-allocate everything you may ever need for event processing, the other is that you need to re-implement any function that is not currently async signal safe. The server actually has its own implementation of printf for this reason (for error logging). Let's just say this is ... suboptimal. Coincidentally, libevdev is mostly async signal safe for that reason too. It also means you can't use any libraries, because no-one [3] is insane enough to make libraries async signal-safe.

We were still mostly "happy" with it until libinput came along. libinput is a full input stack and expecting it to work within a signal handler is somewhere between optimistic, masochistic and sadistic. The xf86-input-libinput driver doesn't use the signal handler and the side effect of this is that a desktop with libinput didn't feel as responsive when the server was busy rendering.

Keith Packard stepped in and switched the server from the signal handler to using input threads. Or more specifically: one input thread on top of the main thread. That thread controls all the input devices' file descriptors and continuously reads events off them. It otherwise provides the same functionality the signal handler did before: visible pointer movement and shoving events into the event queue for the main thread to process them later. But of course, once you switch to threads, problems have 2 you now. A signal handler is "threading light", only one code path can be interrupted and you know you continue where you left off. So synchronisation primitives are easier than in threads where both code paths continue independently. Keith replaced the previous xf86BlockSIGIO() calls with corresponding input_lock() and input_unlock() calls and all the main drivers have been switched over. Some interesting race conditions kept happening nonetheless, but as of today, we think most of these are solved.

The best test we have at this point is libinput's internal test suite. It creates roughly 5000 devices within about 4 minutes and thus triggers most code paths to do with device addition and removal, especially the overlaps between devices sending events before/during/after they get added and/or removed. This is the largest source of possible errors as these are the code paths with the most actual simultaneous access to the input devices by both threads. But what the test suite can't test is normal everyday use. So until we get some more code maturity, expect the occasional crash and please do file bug reports. They'll be hard to reproduce and detect, but don't expect us to run into the same race conditions by accident.

[1] Yes, your calendar is right, it is indeed 2016, not the 90s or so
[2] Historical note: we actually mostly ignored this until about 2010 or so when glibc changed the malloc implementation and the server was just randomly hanging whenever we tried to malloc from within the signal handler. Users claimed this was bad UX, but I think it's right up there with motif.
[3] yeah, yeah, I know, there's always exceptions.