I don't think the Wayland protocol is actually involved in this. Wayland describ...

gf000 · on Jan 26, 2025

According to Asahi Lina, X does an async ioctl that can update the cursor even during the scanout of the current frame, while Wayland does atomic, synced updates on everything, cursor included. This has the benefit of no tearing, and the cursor's state is always in sync with the content, but it does add an average of 1 more frame latency (an update either lands just in time for the next frame, or it slips to the frame after).

arghwhat · on Jan 27, 2025

This is not what Wayland does, it is what a particular display server with Wayland support decided to do.

Second, just to be clear, this only discusses mouse cursors on the desktop - not the content of windows, and in particular not games even if they have cursors. Just the white cursor you browse the Web with.

Anyway, what you refer to is the legacy drm interface that was replaced by the atomic one. The legacy interface is very broken and does not expose new hardware features, but it did indeed handle cursors as its own magical entity.

The atomic API does support tearing updates, but cursor updates are currently rejected in that path as drivers are not ready for that, and at the same time, current consensus is that tearing is toggled on when a particular fullscreen game demands it, and games composite any cursors in their own render pass so they're unaffected. Drivers will probably support this eventually, but it's not meant to be a general solution.

The legacy API could let some hardware swap the cursor position mid-scanout, possibly tearing the cursor, but just because the call is made mid-scanout does not mean that the driver or hardware would do it.

> but it does add an average of 1 more frame latency

If you commit just in time (display servers aim to commit as late as possible), then the delay between the commit and a tearing update made just before the pixels were pushed is dependent on the cursor position - if the cursor is at the first line shown, it makes no difference, if on the last shown, it'll be almost a frame newer.

Averaging over cursor positions means half a frame of extra latency, but with a steady sampling rate instead of rolling shutter.
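
The half-frame figure can be checked with a little arithmetic. The sketch below is only an illustration under simplifying assumptions that are not in the comment above: the atomic commit lands exactly when scanout starts, scanout takes the whole frame period (blanking is ignored), and the cursor is equally likely to sit on any row. The function names are made up for the example.

```python
def extra_latency_ms(cursor_row, total_rows, refresh_hz=60.0):
    """Age difference between an ideal just-in-time atomic commit and a
    tearing update applied right before the cursor's row is scanned out."""
    frame_ms = 1000.0 / refresh_hz
    # Row 0 is scanned immediately (no difference); the last row is scanned
    # almost a full frame later, so a tearing update there is that much newer.
    return frame_ms * cursor_row / total_rows


def average_extra_latency_ms(total_rows=1080, refresh_hz=60.0):
    """Average the difference over every possible cursor row."""
    total = sum(extra_latency_ms(r, total_rows, refresh_hz)
                for r in range(total_rows))
    return total / total_rows
```

With these assumptions the average comes out at roughly half the frame period (about 8.3 ms at 60 Hz), while a cursor on the first row sees no difference at all.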

Proper commit timing is usually the proper solution, and more importantly helps every other aspect of content delivery as well.

account42 · on Jan 27, 2025

> This is not what Wayland does, it is what a particular display server with Wayland support decided to do.

To the user that's an irrelevant distinction.

I also don't think this matters that much - with X11 this was optimized in one place by people that care about such details, while with Wayland now every compositor developer (who in general are much more interested in window management policy) needs to become a low-level performance expert.

> Second, just to be clear, this only discusses mouse cursors on the desktop - not the content of windows, and in particular not games even if they have cursors.

Games can and sometimes do use "hardware" cursors as well - after all, they also care about latency.

gf000 · on Jan 27, 2025

Sure, it's what Gnome Wayland does, but the Wayland protocol does sort of mandate that every frame should be perfect, and the cursor has to match the underlying content, e.g. if it moves over a text it has to change to denote that it is selectable.

I do believe it is a useful tradeoff, though.

ta3401 · on Jan 27, 2025

> Anyway, what you refer to is the legacy drm interface that was replaced by the atomic one. The legacy interface is very broken and does not expose new hardware features, but it did indeed handle cursors as its own magical entity.

Isn't it what many people refer to as "hardware cursor"? Is it possible for Wayland to rely on such a feature?

arghwhat · on Jan 27, 2025

Wayland display servers will already be using what is commonly referred to as hardware cursors.

They just use the atomic API to move a cursor or overlay plane, which reflects how the hardware handles things. That the legacy API exposed a specialized cursor API was just a quirk of the design.

Note that planes are a power optimization more than anything else, as they allow e.g. the cursor to move or decoded video frames to be displayed while the GPU's render-related units are powered down. Drawing the cursor movement, even though the render task is a rounding error, would require the render-related units to be on.

ta3401 · on Jan 27, 2025

Thank you. So, if I get this right, the cursor position, which is what the video card needs to position the mouse pointer picture on the screen as an overlay to the actual framebuffer, isn't updated asynchronously to the screen update (i.e. whenever the mouse is moved), but instead each time a frame is being rendered. The pointer is thus only moved at these times, which may avoid tearing (though I don't see why) and other nasty effects, yet introduces a small rendering lag.

I don't know however if the mouse pointer picture is still handled the VESA way, or if GPUs video cards nowadays have a more generic API, or what.

kllrnohj · on Jan 27, 2025

There really isn't such a thing as "the actual framebuffer". Instead the display hardware can do composition during scanout from a set of buffers at a set of positions with varying capabilities. These buffers are then just arbitrary dmabufs.

It doesn't give a damn if you give it 2 buffers and one contains a mouse cursor and the other everything else or if you give it 2 buffers and one is everything including the mouse and the other is a video, allowing complete power collapse of the GPU rendering units.

Often they support more than 2 of these as well, and with color conversions, 1D & 3D LUTs, and a handful of other useful properties. Mobile SoCs in particular, like your typical mid/high end snapdragon, actually have upwards of a dozen overlay planes. This is how Android manages to almost never hit GPU composition at all.

On desktop linux all of these go through the drm/kms APIs.

arghwhat · on Jan 28, 2025

Well, GPUs from the big players do give a damn as they tend to have surprisingly limited plane count and capabilities. It is often just a single primary, cursor and overlay plane, sometimes the latter is shared across all outputs, and sometimes what the plane can do depends on what the plane it overlaps with is doing.

Mobile chips are as you mention far ahead in this space, with some having outright arbitrary plane counts.

kllrnohj · on Jan 28, 2025

Although desktop GPU plane counts are much more limited, it's not as restricted as you're portraying. Here's the AMD SoC in the Steam Deck, for example: https://github.com/ValveSoftware/gamescope/blob/master/src/d...

Even though it's only 3 planes, they are relatively feature-rich still. In a typical desktop UI that would indeed be primary, cursor, and video planes. But if the system cursor is hidden, such as in a game, that frees up a plane that can be used for something else - such as the aforementioned game.

arghwhat · on Jan 28, 2025

What you are showing is just a standard color pipeline, which is the bare minimum for color management.

On AMD in particular, the cursor plane must match several aspects of any plane it overlaps with, including transform and color pipeline IIRC.

The AMD SoC in my laptop (much newer than the steam deck) only exposes two overlay planes to share among all 4 display controllers. Intel used to have a single overlay plane per display.

The Raspberry Pi 5 on the other hand intentionally limited the exposed overlay planes to "just" 48, as it can be as many as you have memory for.

You can peek at the reported capabilities of various devices here: https://drmdb.emersion.fr/

ta3401 · on Jan 28, 2025

> which may avoid tearing (though I don't see why)

What I meant here is that I didn't see why asynchronous updates may introduce tearing; but my tired brain couldn't quite formulate it properly. And to answer that, it's clear to me now that an update of the pointer position while the pointer sprite is being drawn would introduce a shift somewhere within that sprite, which is, I suppose, the tearing discussed (and not a whole frame tearing).
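
A toy model makes that shift visible. This is only a sketch of the idea, not how any real display controller works: `scan_out` and its parameters are hypothetical, the "sprite" is a block of `#` characters, and the position register is modelled as a function of the row currently being scanned.

```python
def scan_out(width, height, sprite_rows, cursor_x_at_row, cursor_y):
    """Build a frame row by row, reading the cursor x position again for
    every row, the way a position change mid-scanout would be observed."""
    frame = []
    for row in range(height):
        line = ["."] * width
        sprite_row = row - cursor_y
        if 0 <= sprite_row < len(sprite_rows):
            x = cursor_x_at_row(row)  # value of the position "register" now
            for i, ch in enumerate(sprite_rows[sprite_row]):
                if 0 <= x + i < width:
                    line[x + i] = ch
        frame.append("".join(line))
    return frame


SPRITE = ["##", "##", "##", "##"]

# Position only changes between frames: the sprite stays in one piece.
clean = scan_out(8, 6, SPRITE, lambda row: 1, cursor_y=1)

# Position changes from x=1 to x=4 while row 3 is about to be scanned:
# the top half is drawn at the old x, the bottom half at the new one.
torn = scan_out(8, 6, SPRITE, lambda row: 1 if row < 3 else 4, cursor_y=1)
```

Printing `torn` line by line shows the upper two sprite rows at the old position and the lower two at the new one, which is the within-sprite tear described above.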

> I don't know however if the mouse pointer picture is still handled the VESA way, or if GPUs video cards nowadays have a more generic API, or what.

Also, the VESA interface doesn't seem to handle mouse pointers, it's something that was available in the VGA BIOS, to provide a uniform support for this feature, as each vendor most likely did it their own way.

TapamN · on Jan 27, 2025

It seems like it should be possible to do the X async method without tearing.

When updating the cursor position, check if the line being output overlaps with the cursor. If it doesn't, it's safe to update the hardware cursor immediately, without tearing. Otherwise, defer updating the cursor until later (vblank would work) to avoid tearing.
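
As a rough sketch of that idea (purely hypothetical: `CursorUpdater`, its methods and the `scanline` argument are invented for illustration and do not correspond to any real driver or kernel API), the logic could look like this. Checking the new position as well as the old one is an extra assumption on my part, so that a sprite is not started halfway through its rows either.

```python
class CursorUpdater:
    """Apply cursor moves immediately unless the current scanline is
    inside the sprite; in that case hold the move until vblank."""

    def __init__(self, cursor_height):
        self.cursor_height = cursor_height
        self.programmed = (0, 0)  # position the imagined hardware is using
        self.pending = None       # move deferred until the next vblank

    def _overlaps(self, scanline, y):
        return y <= scanline < y + self.cursor_height

    def move(self, new_pos, scanline):
        """Return True if applied immediately, False if deferred."""
        old_y, new_y = self.programmed[1], new_pos[1]
        if self._overlaps(scanline, old_y) or self._overlaps(scanline, new_y):
            self.pending = new_pos  # a later move simply replaces this one
            return False
        self.programmed = new_pos
        self.pending = None
        return True

    def on_vblank(self):
        """Nothing is being scanned out, so any deferred move is safe."""
        if self.pending is not None:
            self.programmed = self.pending
            self.pending = None
```

For example, with a 32-row sprite at y=200, a move requested while scanline 215 is being output is deferred, while the same move requested at scanline 500 is applied at once.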

Of course, this assumes it's possible to read what row of the frame buffer is being displayed. I think most hardware would support it, but I could see driver support being poorly tested, or possibly even missing entirely from Linux's video APIs.

arghwhat · on Jan 27, 2025

This would have to be done by the kernel driver for your GPU. I kind of doubt that it's possible (you're not really scanning out lines anymore with things like Display Stream Compression, partial panel self refresh and weird buffer formats), and doubt even more that kernel devs would consider it worth the maintenance burden...

gf000 · on Jan 27, 2025

I believe this is called "racing the beam".

But given that different displays work differently, I'm not sure it would be worth the hassle.

bandrami · on Jan 27, 2025

I mean at some point it's a fundamental choice though, right? You can either have sync problems or lag problems and there's a threshold past which improving one makes the other worse. (This is true in audio, at least, and while I don't know video that well I can't see why it would be different.)

Vilian · on Jan 27, 2025

Wayland supports tearing too. Not sure if GNOME does, but on KDE, if the application supports it, it can draw with tearing for less latency in full screen.

yxhuvud · on Jan 27, 2025

Well, there are opportunities to do the wrong thing though, like sending an event to the client every time it gets an update. Which means that high poll rate mice would DDOS less efficient clients. This used to be a problem in Mutter, but that particular issue was fixed.
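
One general way to avoid flooding clients is to coalesce motion before forwarding it. The sketch below is a generic illustration and makes no claim about how Mutter actually fixed its issue; `MotionCoalescer` and its method names are hypothetical, and it assumes relative motion that can simply be summed and handed over once per client frame.

```python
class MotionCoalescer:
    """Accumulate relative mouse motion and hand it to the client at most
    once per flush, instead of once per device report."""

    def __init__(self):
        self._dx = 0
        self._dy = 0
        self._dirty = False

    def on_device_motion(self, dx, dy):
        # Called at the mouse poll rate (possibly thousands of times a second).
        self._dx += dx
        self._dy += dy
        self._dirty = True

    def flush(self):
        """Called once per client frame; returns the summed motion or None."""
        if not self._dirty:
            return None
        motion = (self._dx, self._dy)
        self._dx = self._dy = 0
        self._dirty = False
        return motion
```

Eight device reports of `(1, -2)` between two flushes reach the client as a single `(8, -16)` event, and a flush with no new motion delivers nothing.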