Mir

Summary

We have developed, and continue to develop, our next-generation display server, Mir. It is a system-level component intended to enable next-generation user experiences on devices ranging from traditional Linux desktops to mobile devices and embedded products powered by Ubuntu. On desktop form factors, Mir can be viewed as a replacement for the X Window System, making it the unified display server solution for Ubuntu. This wiki page outlines the motivation for the project, describes the high-level design, summarizes the scope, and provides the roadmap of the Mir display server.

The purpose of Mir is to enable the development of user interface shells. In the case of Ubuntu, Mir will be used by the next generation of Unity. In addition, Mir will be developed so that it stays flexible and usable enough for shells other than Unity to employ it.

Objectives

In general, we have the following attributes in mind when developing the system:

Well-Defined Functionality

We develop the system based on requirements and use-cases. We want to avoid unnecessary feature bloat, where the system evolves on its own timeline without an actual need.

Efficiency

The system should fulfill all of the requirements as efficiently as possible, with a focus on CPU cycles, GPU cycles, memory and power consumption. We want to establish a set of benchmarks that make sure that the system lives up to this attribute.

Test-Driven

The system should be under test as much as possible. We consider all three levels of testing (unit, integration and acceptance tests) to ensure high quality and to deliver a product that just works (tm). Moreover, development of a feature should start only once a well-defined acceptance test is available. A feature that we cannot test cannot be implemented to a high standard of quality.

Versatile & Flexible

The system should be easy to adapt and port to different platforms and use-cases (within the range of the well-defined functionality mentioned before). Running the system on a mobile device, or exposing only limited functionality such as a system-level compositor, should not be a special case but a requirement the system fulfills easily.

For instance, some embedded products now use Mir with a custom shell as a single compositor. Our Unity8-based phone products, in turn, use Mir with a Unity8-based shell, with nested Mir instances acting as system and session compositors to meet security and design requirements.

Security

We want to avoid exposing any sort of privileged protocol to client applications. In particular, we want to prevent (malicious) client applications from snooping on the input event stream or capturing the screen content without at least a prior authorization/authentication step. To this end, we restrict the set of non-privileged operations.

Toolkit Integration & Legacy X Application Support

Mir's client library should be easy to integrate with existing toolkits. Application authors relying on Qt/QML, GTK3, XUL etc. should not be required to perform additional porting as we will work on providing Mir integration for the most prominent toolkit choices. In reality though, certain legacy applications will not be able to transition away from X completely, and we will provide an in-session rootless X server that is integrated with Mir. It acts as an on-demand compatibility layer between legacy X applications and the session-level Unity/Mir instance.

Scope

This section gives a high-level overview of the functionality that the final version of the system should provide. Please refer to the section “Roadmap” for time estimates and targeted release version for the individual features.

Mir Internals

Compositor

The compositor is responsible for presenting the final scene consisting of all application and shell surfaces (windows) on screen. It contains a renderer that takes care of applying effects (e.g., drop shadows) to the individual surfaces. The compositor is synchronized to vblank to avoid tearing and wasting cycles.

Input Management

The system should support reading measurements (coordinates, keys, acceleration values …) from arbitrary input devices, pre-processing the event stream, presenting it to a chain of server-side filters (e.g., to support shell-level gesture recognition or keyboard interaction) and finally delivering it to client applications. We want the server-side input stack to be flexible in that it should support reading from arbitrary input devices, with a focus on the evdev kernel subsystem.
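The chain of server-side filters can be illustrated with a minimal sketch. It is written in Python for brevity and does not reflect Mir's actual API; `InputEvent`, `dispatch` and `make_shortcut_filter` are hypothetical names chosen for this example. Each filter sees an event before any client does and may consume it, in which case the event is not delivered further.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set


@dataclass(frozen=True)
class InputEvent:
    """Hypothetical pre-processed event, e.g. a key press."""
    kind: str     # "key" or "motion"
    payload: str  # key name or serialized coordinates, for illustration only


# A filter returns True when it consumes the event.
Filter = Callable[[InputEvent], bool]


def dispatch(event: InputEvent,
             filters: Iterable[Filter],
             deliver: Callable[[InputEvent], None]) -> bool:
    """Run the event through the filter chain, then deliver it to the client.

    Returns True if the event reached the client, False if a filter consumed it.
    """
    for event_filter in filters:
        if event_filter(event):
            return False
    deliver(event)
    return True


def make_shortcut_filter(shortcuts: Set[str]) -> Filter:
    """Build a shell-level filter that consumes reserved key combinations."""
    def shortcut_filter(event: InputEvent) -> bool:
        return event.kind == "key" and event.payload in shortcuts
    return shortcut_filter


# Usage: the shell reserves Alt+Tab, so only the plain key press reaches the client.
delivered = []
filters = [make_shortcut_filter({"Alt+Tab"})]
dispatch(InputEvent("key", "Alt+Tab"), filters, delivered.append)
dispatch(InputEvent("key", "a"), filters, delivered.append)
```

Keeping filters as plain callables in an ordered list is only one possible design; it makes it straightforward for a shell to insert gesture recognition or keyboard handling ahead of client delivery.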

Finally, we want to make sure that the input stack is as efficient as possible with respect to power consumption. Most importantly, we want to be able to throttle event propagation to client applications to match vblank, and to compensate for the resulting loss in sampling accuracy by predicting future motion events.

We have looked at multiple candidate input stacks and have chosen the one included in Android for its efficiency, clear design and flexibility. We adapted the stack to compile outside of the Android source tree, relying only on the STL and Boost.

Output Management

The system should support monitoring connected physical display devices, without assuming a certain type of connector. Moreover, the system should provide means for shell components to react to changes in the configuration of the physical display devices, in order to:

Support common multi-monitor use-cases.

Support seamless transitions between different form factors (thinking about the convergence device here).
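One way a shell component could react to display configuration changes is through a simple observer interface, sketched below in Python. The names `OutputManager` and `choose_layout`, the output names, and the policy of switching layout when a second output appears are all assumptions made for illustration; they are not taken from Mir.

```python
from typing import Callable, Dict, List, Tuple

Mode = Tuple[int, int]  # hypothetical output description: (width, height) in pixels
Observer = Callable[[Dict[str, Mode]], None]


class OutputManager:
    """Tracks connected outputs by name and notifies observers on every change."""

    def __init__(self) -> None:
        self._outputs: Dict[str, Mode] = {}
        self._observers: List[Observer] = []

    def subscribe(self, observer: Observer) -> None:
        self._observers.append(observer)

    def connect(self, name: str, mode: Mode) -> None:
        self._outputs[name] = mode
        self._notify()

    def disconnect(self, name: str) -> None:
        # Only notify if the output was actually known.
        if self._outputs.pop(name, None) is not None:
            self._notify()

    def _notify(self) -> None:
        snapshot = dict(self._outputs)  # observers get a copy, not internal state
        for observer in self._observers:
            observer(snapshot)


def choose_layout(outputs: Dict[str, Mode]) -> str:
    """Illustrative shell policy: desktop layout once a second output is attached."""
    return "desktop" if len(outputs) > 1 else "phone"


# Usage: docking and undocking a convergence device.
layouts = []
manager = OutputManager()
manager.subscribe(lambda outputs: layouts.append(choose_layout(outputs)))
manager.connect("internal-panel", (1080, 1920))
manager.connect("external-monitor", (1920, 1080))
manager.disconnect("external-monitor")
```

After this sequence `layouts` is `["phone", "desktop", "phone"]`. The manager makes no assumption about connector types; outputs are identified only by an opaque name.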

Another important area of functionality is support for multiple GPUs with different characteristics running in the same system. A prominent example is a high-end laptop that combines discrete graphics for games or 3D-intensive applications with an on-chip graphics solution for low-power scenarios. We want to be able to transition seamlessly between both GPUs and move applications and their respective EGL contexts from one GPU to the other.

Application Management

Applications should be first-class citizens in our display server. An application is named and consists of an arbitrary number of surfaces. The shell components can access the set of currently running/registered applications and operate on top of the collection to provide e.g. Alt-Tab functionality.

Shell

The shell, or system level UI, will be a first-class citizen of the display server, at least in terms of well-defined interfaces that are used to communicate back and forth between the shell and the other components of the display server. We do consider an in-process shell approach right now, but we might revisit this decision in the future.

Inter-app Data Exchange

Exchanging data between running applications is very limited in the X world. We have basic support for copy’n’paste and drag’n’drop operations, but the experience currently offered is very limited and barely functional. For this reason, we want the display server to provide an advanced way for applications to exchange arbitrary data, together with a seamless user experience when initiating and carrying out the actual data exchange.

Mir Today

On Android Drivers

Currently the Ubuntu Touch "phablet" images use Mir, as of 13.10 and beyond. There are also multiple mobile and embedded products leveraging this architecture and shipping today. Pre-14.04 development has also demonstrated a "convertible" Unity8 on this architecture, meaning the shell can transition from a mobile experience to a traditional desktop experience based on the appearance or disappearance of various input devices.

Architecture:

Mir on the Free Graphics Driver Stack

Right now, Mir is able to run on top of the free graphics driver stack, leveraging GBM, DRM and KMS to integrate with existing graphics hardware. This particular configuration can be run with the currently optional package unity8-desktop-session-mir. More information is available on the Unity8Desktop wiki, along with a demo video from a community member.

Architecture:

Mir on HW Supported By Closed Source Drivers

At the moment, partners have not delivered closed source or "proprietary" drivers compatible with Mir. However, we are in contact with GPU vendors and are working closely together with them to support Mir and to distill a reusable and unified EGL-centric driver model that further eases display server development in general and keeps cross-platform use-cases in mind.

April 2016

Motivation - Why Mir?

In recent years, the sophisticated user experience offered by mobile devices like the iPhone or Android-powered devices has changed the expectations of users regarding a “fast’n’fluid” (f’n’f) way of interacting with their devices. Historically, graphical user interfaces on the Linux platform have been powered by the X windowing system. X has a long and successful history, and it has served the purposes of both system-level and application-level UI well for more than three decades. However, users nowadays expect a more consistent and more integrated user experience than what is possible to offer on top of the X window system. Even more recent developments, like the introduction of compositors to the X stack, do not fully solve the situation, and both shell and application development have to deploy workarounds to overcome issues with the X rendering model. With respect to shell development (Unity), three major shortcomings of the X stack prevent us from delivering the user experience (f’n’f) we have in mind:

X shares a lot of system state across process boundaries. This is obviously not a problem in itself but a system-level UI that is meant to provide a beautiful and consistent user experience is likely to require tight control over the overall system state.

X's input model is complex and allows applications to snoop on input events they do not own. On the one hand, this raises serious security concerns, especially on mobile platforms. On the other hand, adjusting and extending X's input model is difficult, and supporting features such as input event batching and compression, motion event prediction with associated power-saving strategies, or flexible synchronization schemes for aligning input event delivery with rendering operations is too complex.

The compositor hierarchy ends on the session level, and no tight integration into the system from boot time onward is available. For that reason, there is a visible glitch when transitioning the system from a VT-level to the graphical shell level.

In addition to the points mentioned before, X's graphics driver model lacks focus and its adoption throughout the industry has been problematic. Again, focusing on the mobile use-cases, more consistent driver models like the Android graphics driver model offer much better support and adoption by SOC and GPU vendors. For this reason, we decided to go for a well-defined driver model and we stated the following requirements:

In summary, we want to provide a graphics stack that works across different platforms and driver models by limiting our assumptions to a bare minimum. The graphics stack and its display server component should be easy to integrate with the shell and act as a model that allows a shell to inject or define custom behavior easily. Here, our focus on security plays an important role: we want to avoid the need to expose a privileged protocol that would need to be guarded by additional security means like AppArmor. To this end, we prefer an in-process approach that allows a shell implementation to interact with the display server model in a much more flexible way.

Finally, we want to emphasize our focus on quality and enforce a test-driven development approach for the display server component. We require every component of the system to be under test to ensure its correct functionality and to provide us with a test harness that allows us to evolve the system efficiently and safely.

Why Not Wayland / Weston?

An obvious clarification first: Wayland is a protocol definition that defines how a client application should talk to a compositor component. It touches areas like surface creation/destruction, graphics buffer allocation/management, input event handling and a rough prototype for the integration of shell components. However, our evaluation of the protocol definition revealed that the Wayland protocol does not meet our requirements. First, we are aiming for a more extensible input event handling that takes future developments like 3D input devices (e.g. Leap Motion) into account. Please note though that Wayland's input event handling does not suffer from the security issues introduced by X's input event handling semantics (thanks to Daniel Stone and Kristian Høgsberg for pointing this out). With respect to mobile use-cases, we think that the handling of input methods should be reflected in the display server protocol, too. As another example, we consider the shell integration parts of the protocol as privileged and we'd rather avoid having any sort of shell behavior defined in the client facing protocol.

However, we still think that Wayland's attempt at standardizing the communication between clients and the display server component is very sensible and useful. Due to our different requirements, though, we decided to go for the following architecture with respect to protocol integration:

A protocol-agnostic inner core that is extremely well-defined, well-tested and portable.

An outer-shell together with a frontend-firewall that allow us to port our display server to arbitrary graphics stacks and bind it to multiple protocols.
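The split between a protocol-agnostic core and protocol-specific frontends can be sketched as follows. This Python example is illustrative only: `Core`, `TextFrontend` and `DictFrontend`, as well as both toy protocols, are invented for this sketch, and treating the "firewall" as a request allowlist is an assumption rather than a description of Mir's design.

```python
from typing import Dict, Tuple


class Core:
    """Hypothetical protocol-agnostic core: knows surfaces, not wire formats."""

    def __init__(self) -> None:
        self.surfaces: Dict[int, Tuple[str, int, int]] = {}
        self._next_id = 1

    def create_surface(self, app_name: str, width: int, height: int) -> int:
        surface_id = self._next_id
        self._next_id += 1
        self.surfaces[surface_id] = (app_name, width, height)
        return surface_id


class TextFrontend:
    """Invented line-based protocol: 'create <app> <width> <height>'."""

    def __init__(self, core: Core) -> None:
        self._core = core

    def handle(self, line: str) -> int:
        verb, app_name, width, height = line.split()
        if verb != "create":
            raise PermissionError(f"request not allowed: {verb}")
        return self._core.create_surface(app_name, int(width), int(height))


class DictFrontend:
    """Invented message-based protocol using dictionaries."""

    ALLOWED = {"create_surface"}  # requests a client may issue through this frontend

    def __init__(self, core: Core) -> None:
        self._core = core

    def handle(self, message: dict) -> int:
        if message.get("request") not in self.ALLOWED:
            raise PermissionError(f"request not allowed: {message.get('request')}")
        return self._core.create_surface(
            message["app"], int(message["width"]), int(message["height"]))


# Usage: two different protocols drive the same core.
core = Core()
first = TextFrontend(core).handle("create terminal 800 600")
second = DictFrontend(core).handle(
    {"request": "create_surface", "app": "browser", "width": 1024, "height": 768})
```

Both frontends end up calling the same `create_surface` method, so the core can be tested without any protocol in place, and each frontend rejects requests outside its allowed set before they reach the core.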

In summary, we have not chosen Wayland/Weston as our basis for delivering a next-generation user experience because it does not completely fulfill our requirements. Moreover, with our protocol- and platform-agnostic approach, we can make sure that we reach our goal of a consistent and beautiful user experience across platforms and device form factors. However, Wayland support could be added either by providing a Wayland-specific frontend implementation for our display server or by providing a client-side implementation of libwayland that ultimately talks to Mir.