What is a session manager?

The main purpose of PipeWire is to act as an intermediate layer between applications and devices. To achieve this, it provides a generic way for applications to create media streams, which can then be directed to any device or other application for playback or capture. This functionality defines PipeWire as a stream exchange framework. Apart from providing a mechanism to create media streams, however, stream exchange also requires a mechanism to define who is exchanging data with whom. In other words, it needs a mechanism to decide which application is going to be connected to which device, how and when.

In traditional setups, applications have direct access to devices. This means they have to choose the device they want to open themselves and set it up according to their media requirements (an audio sample rate, a format, a video resolution, etc.). While system configuration can define a “system default” device (e.g. in ALSA), in some setups this is not the case, which burdens the application developer with providing a way to configure device selection. Furthermore, such setups do not allow transparent switching of devices (e.g. switching audio playback from laptop speakers to a Bluetooth headset while music is playing), unless the application implements the complex operations required to do so. In some cases, another issue is that devices are controlled exclusively by a single application, which rules out more complex use cases where sharing a device is required. Last but not least, accessing devices directly increases the complexity of the applications’ media pipelines, which must handle multiple device formats and deal with misbehaving or non-standard devices.

PulseAudio has improved this situation significantly for audio applications. In PulseAudio, audio devices are opened and configured internally and audio applications can just create streams of any desired format and request to play or capture from the “default” device. Application developers no longer have to provide a means to configure which device to use, although they still can if they want to. PulseAudio maintains this “default” device preference internally and automatically creates the necessary internal links to make things work when a new stream comes in from an application. This default device preference can be changed at runtime and application streams can be transparently redirected to another device, abstracting away all complexity. The problem here, however, is that while this logic is great for most desktop applications, it does not scale well to other use cases. Also, PulseAudio does not handle video streams…

On the other side there is JACK, which deals with a specific use case as well: professional audio. JACK similarly allows applications to just create a stream and forget about the device. But unlike PulseAudio, it implements no connection logic internally. This is left to an external component: the session manager. The session manager watches for applications connecting or disconnecting and uses its own logic to link them to a device or a peer application. This may involve a “default” device target, but it normally follows a set of more complex user-configurable rules that allow flexibility in setting up the audio processing stage for professional audio applications. The problem here, however, is of course that JACK does not handle the typical desktop use case well and is complex to use for non-professionals.
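The split between a server that only stores links and an external session manager that decides on them can be illustrated with a toy model. The sketch below is plain Python and uses neither JACK's nor PipeWire's API; `Graph`, `SessionManager`, and the `on_client_*` callbacks are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy media server: it stores links but never decides on them itself."""
    links: set = field(default_factory=set)

    def link(self, source: str, target: str) -> None:
        self.links.add((source, target))


class SessionManager:
    """External component that owns the connection logic (hypothetical API)."""

    def __init__(self, graph, default_device, rules=None):
        self.graph = graph
        self.default_device = default_device
        # Per-application overrides, e.g. {"synth": "effects-rack"}.
        self.rules = rules or {}

    def on_client_connected(self, app: str) -> str:
        # A user rule wins; otherwise fall back to the default device.
        target = self.rules.get(app, self.default_device)
        self.graph.link(app, target)
        return target

    def on_client_disconnected(self, app: str) -> None:
        # Drop every link that starts at the departing application.
        self.graph.links = {l for l in self.graph.links if l[0] != app}


graph = Graph()
manager = SessionManager(graph, default_device="speakers",
                         rules={"synth": "effects-rack"})
manager.on_client_connected("music-player")
manager.on_client_connected("synth")
print(sorted(graph.links))
```

With an empty rule table this model behaves like a fixed "default device" policy; the rule table is what makes room for the more elaborate setups the text mentions.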

Which brings us back to PipeWire… Combining parts of all these designs together, PipeWire provides a flexible media server that can be used to implement desktop, embedded, professional and non-professional use cases for both audio and video. To make this possible, PipeWire also relies on a session manager, similar to the one in JACK, but with even more capabilities.

WirePlumber

PipeWire upstream has a very limited example session manager. It serves as a good example for building new ones and includes some functionality for basic desktop use cases and testing, but it goes no further than that. WirePlumber serves as a replacement for this example and additionally provides a framework for building custom session managers.

The main goal of WirePlumber as a session manager is obviously to watch for streams from applications and make sure that they get linked to the appropriate device or peer application according to the rules of the use case that it implements. However, unlike a JACK session manager, a PipeWire session manager has more responsibilities.

Device monitoring

PipeWire itself actually does not open any devices when it starts. It provides components that can do that, but they are not loaded by default in the daemon. A main task of the session manager is to load these components, for the devices that it is interested in, and configure the devices appropriately.

It is reasonable for this to be part of the session manager, since the decision of which devices to probe and how to configure them is specific to the use case. A car’s audio hardware requires a different configuration than a desktop’s sound card.

WirePlumber provides a module for monitoring devices that works with all of PipeWire’s device monitor components that implement the spa_device interface. This includes the ALSA, V4L2 and bluez5 monitors. Additionally, it provides a module that loads the special “JACK” device, which allows PipeWire to run as a client to the JACK audio server.
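To make the idea concrete, here is a small Python sketch of a daemon that starts with no devices and a session manager that chooses which monitors to load for a given use case. It is a toy model only: `ToyDaemon`, `start_session`, and the use-case table are hypothetical and do not correspond to WirePlumber's real modules or configuration.

```python
# Hypothetical table: which device monitors each use case asks for.
MONITORS_BY_USE_CASE = {
    "desktop": ["alsa", "v4l2", "bluez5"],
    "car": ["alsa"],
}


class ToyDaemon:
    """Starts with nothing loaded, like the daemon described in the text."""

    def __init__(self):
        self.loaded_monitors = []

    def load_monitor(self, name: str) -> None:
        # Loading the same monitor twice has no further effect.
        if name not in self.loaded_monitors:
            self.loaded_monitors.append(name)


def start_session(daemon: ToyDaemon, use_case: str) -> list:
    """Session-manager side: load only the monitors this use case needs."""
    for name in MONITORS_BY_USE_CASE[use_case]:
        daemon.load_monitor(name)
    return daemon.loaded_monitors


daemon = ToyDaemon()
print(daemon.loaded_monitors)            # nothing is loaded at startup
print(start_session(daemon, "desktop"))
```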

Client permissions

PipeWire takes security seriously and assumes by default that all applications are untrustworthy. Internally, it provides a permissions system similar to the one on UNIX filesystems, which allows setting read, write and execute (rwx) bits on all objects that a client can access through its IPC protocol. A client that does not have the required permissions to access an object cannot do anything malicious with it.
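A per-client, per-object bit check of this kind might look roughly like the following Python sketch. It is not PipeWire's implementation: the `Perm` values, the `PermissionTable` class, and the client and object names are assumptions made for illustration.

```python
from enum import IntFlag


class Perm(IntFlag):
    """rwx-style permission bits (values chosen for this sketch only)."""
    R = 4
    W = 2
    X = 1


class PermissionTable:
    """Maps (client, object) pairs to permission bits; unknown pairs get none."""

    def __init__(self):
        self._bits = {}

    def grant(self, client: str, obj: str, bits: Perm) -> None:
        self._bits[(client, obj)] = bits

    def allows(self, client: str, obj: str, needed: Perm) -> bool:
        held = self._bits.get((client, obj), Perm(0))
        # Every requested bit must be present in the granted set.
        return (held & needed) == needed


table = PermissionTable()
table.grant("browser", "camera-node", Perm.R)
print(table.allows("browser", "camera-node", Perm.R))
print(table.allows("browser", "camera-node", Perm.R | Perm.W))
print(table.allows("unknown-app", "camera-node", Perm.R))
```

The default of "no bits" for unknown pairs mirrors the assumption that clients are untrusted until something grants them access.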

Another task of the session manager, therefore, is to authenticate clients and grant them the appropriate permissions. WirePlumber provides a module for that, although, at the time of writing this post, this module is a dummy and does not do proper permissions management; it just grants all clients full access to all objects. There are plans to implement this properly for AGL and for the desktop, though, so stay tuned.

Endpoints

PipeWire internally represents the media flow using a graph of components that are called “nodes”, which are linked to one another. These are the purple and green boxes in the diagram above. Nodes abstract processing logic and provide a way for getting data in and out of PipeWire, delegating processing to clients or devices.

When managing this graph, it is often the case that several nodes need to be managed together as a single entity that provides more complex functionality. For instance, an audio DSP filter that operates on an audio device would be represented by a node that is linked directly to that audio device’s node. Applications that want their audio to pass through that filter should then have their nodes linked with the filter node instead of the device node. This increases the complexity of whichever component decides where to link what, as it now needs to have specific knowledge about this filter’s operation. Additionally, this does not work well with configuration UIs like pavucontrol or GNOME’s sound settings, which are built around the concept that applications connect directly to devices with nothing in between.

Another concern is that in modern systems streams are often associated with a use case. This is not visible on desktop systems so much, but think of your phone. Audio streams that deliver music are separated from audio streams that deliver notifications or alarm sounds and they come with separate volume controls and policy as to whether they are audible, whether they are emphasized (all other streams muted or ducked to a lower volume), etc… Similar properties apply to video streams, where, for instance, a camera feed that is meant for live preview on your screen has a different encoding and resolution than the feed that is meant for video recording and the feed that is meant for still image (photo) capture.

While it may not sound complex, associating streams with use cases can be very much so on embedded systems. In pure software, for example, implementing the audio use cases would be just a matter of categorizing application streams and adjusting their volume controls or their link status based on policy configuration. On embedded systems, however, it is common for all of this to be implemented on a dedicated hardware DSP that receives all the streams via different paths and applies all the mixing, volume alterations, effects and policy in hardware. Controlling the operation of this hardware, therefore, becomes specific to the device, and that means that the session manager, on the CPU side, needs to present an abstraction layer for the policy configuration to work similarly on different devices.

All these problems are solved in WirePlumber by implementing certain objects that are called endpoints. Endpoints, just like nodes, are also linked to one another forming a graph. Each one of them represents a user-conceivable place where media can be routed to/from (such as a pair of speakers or a bluetooth headset’s microphone) and provides a set of endpoint streams, which represent logical paths that can be taken to reach this place, often associated with a use case.
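As a rough mental model, an endpoint can be pictured as a small record that hides a group of nodes and exposes one stream per use case. The Python sketch below illustrates only that shape; the class and field names are hypothetical and are not WirePlumber's actual object model.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointStream:
    """A logical path into an endpoint, usually tied to a use case."""
    name: str
    volume: float = 1.0


@dataclass
class Endpoint:
    """A user-visible place where media can be routed to or from."""
    name: str
    direction: str  # "sink" or "source"
    streams: dict = field(default_factory=dict)
    # Underlying graph nodes; policy code does not need to look at these.
    nodes: list = field(default_factory=list)

    def add_stream(self, name: str) -> EndpointStream:
        stream = EndpointStream(name)
        self.streams[name] = stream
        return stream


speakers = Endpoint("speakers", "sink", nodes=["dsp-filter", "alsa-output"])
for use_case in ("music", "notifications", "alarm"):
    speakers.add_stream(use_case)

# Policy can adjust one use case without knowing about the filter node.
speakers.streams["notifications"].volume = 0.5
print(sorted(speakers.streams))
```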

The purpose of this endpoints graph (also called the “session management graph” in the documentation) is to provide a means of viewing the nodes graph from a higher-level perspective that involves use cases and targets that the user can understand. This makes it easier to write policy and other configuration, allowing the user to forget about device-specific details and focus on the actual user experience that this configuration will deliver.

WirePlumber constructs all endpoints using a module that is driven by user-configurable rules and has a modular system for loading system-specific endpoint providers. That system allows integrators to provide code that manages specific hardware, without having to re-implement a custom session manager from scratch.

Session Policy

Last but not least, WirePlumber provides a module that creates links between endpoints based on user-configurable policy rules. This is its main goal as a session manager. Unfortunately, the current way of configuring policy is not as flexible as we would like it to be, despite it being the second attempt at writing a policy management module. In the very near future, my plan is to experiment with Lua-based scripts that will describe this policy. This subject will be discussed further in a future blog post, so I will keep it short here.
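To give a feel for what rule-driven linking means, here is a minimal Python sketch of a "first matching rule wins" lookup that maps a client stream's properties to an endpoint and endpoint stream. It does not reflect WirePlumber's actual policy format: the rule layout, the `pick_target` helper, and the endpoint names are invented for illustration, and the property keys are used only as plausible examples.

```python
# Hypothetical policy table: rules are tried in order, first match wins.
POLICY_RULES = [
    {"match": {"media.role": "Notification"},
     "target": ("speakers", "notifications")},
    {"match": {"media.class": "Stream/Output/Audio"},
     "target": ("speakers", "music")},
]


def pick_target(stream_props, rules=POLICY_RULES):
    """Return the (endpoint, endpoint stream) pair for a stream, or None."""
    for rule in rules:
        # A rule matches when every key it names has the expected value.
        if all(stream_props.get(k) == v for k, v in rule["match"].items()):
            return rule["target"]
    return None


print(pick_target({"media.class": "Stream/Output/Audio",
                   "media.role": "Notification"}))
print(pick_target({"media.class": "Stream/Output/Audio"}))
```

Because the first match wins, the more specific rule has to be listed before the general one.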

Modular Design

As you may have noticed, in all the above text about WirePlumber’s features I have mentioned that it provides “modules” that offer functionality. This is a key design aspect of WirePlumber. Every function is a module that builds upon a shared library with common functionality and interfaces that allow the modules to work together.

WirePlumber’s common library is based on GObject, which, among other things, allows implementing bindings to other languages easily. While current modules are all written in C, mechanisms exist to allow implementing them in different languages.

The idea behind all this is for WirePlumber to serve as a whole framework for building custom session managers for PipeWire. It is possible this way to replace functionality that already exists in some module or complement it with additional code. Combined with the modular and extensible nature of PipeWire itself, this can be a very powerful tool for adding custom functionality that goes beyond PipeWire’s original targets.

Comments (4)

Hi - thanks so much for your work in this area. Given the need for remote work during the times of Covid-19, I'd be interested in hearing about how Pipewire will allow for wayland-based remote desktop sharing. Might this be something your team could write about in the future? Forgive me if this is a bit of a naive question. Thank you again for all of your work in this area.

Would you believe it's already there? If you're using a relatively recent version of GNOME (from about the last couple of years), it already has support for casting using PipeWire - either per-window or per-screen. This is exposed through a screencasting portal, which can be used by system applications or Flatpak applications alike, already supported in browsers like Chromium and Firefox as well as other tools such as OBS. Other compositors such as KDE, Weston, wlroots, and more also support the portal.

If you're missing screen-capture functionality somewhere, it's worth following up with the developers and asking them to implement the 'org.freedesktop.portal.ScreenCast' XDG interface, which not only allows for Wayland support, but also provides a much more secure and efficient pipeline than the old X11 method of any client capturing images at will.

PipeWire already allows Wayland-based remote desktop sharing in distributions that have integrated this functionality, such as Fedora Workstation. The idea is that the Wayland compositor (gnome-shell, weston, sway, etc...) connects to PipeWire and creates a "source" node that provides access to video frames which are captured directly from the graphics output. Then, applications that want to capture (such as a web browser or a VNC/RDP server app) also connect to PipeWire and create a "sink" node that is able to consume these video frames. When the session manager detects these two nodes, it links them together, creating a stream that transfers video buffers from the graphics output to the application.

This video stream is typically implemented with "dmabuf" buffers, which means there is no memory copy involved inside PipeWire and the application gets access directly to a hardware buffer with the contents of the screen, making this whole operation as efficient as it can get. At the same time, it is also secure, because the application never gets access to the hardware or the compositor's internals and the session manager arbitrates this stream, meaning that it can disallow it (for instance, if the user has not permitted this app to capture from the screen).

Of course, PipeWire is only the middle-man that enables the compositor and the application to share media. They both need to have support for PipeWire, which may not be the case for every compositor and every application yet. But gnome-shell, weston, firefox, chrome, they all have support already (or at least, patches exist). Hopefully this will be more widely adopted in the years to come, making PipeWire the de-facto media access API for all kinds of applications.