Previously if we had no measurements then `compute_max_render_time_us`
would pessimise its answer to ensure triple buffering could be reached:
```
if (frame_clock->state == CLUTTER_FRAME_CLOCK_STATE_DISPATCHED_ONE)
ret += refresh_interval_us;
```
But that also meant entering triple buffering even when not required.
Now we make `compute_max_render_time_us` more honest and return failure
if the answer isn't known (or is disabled). This in turn allows us to
optimize `calculate_next_update_time_us` for this special case, ensuring
triple buffering can be used, but isn't blindly always used.
This makes a visible difference to the latency when dragging windows in
Xorg, but will also help Wayland sessions on platforms lacking
TIMESTAMP_QUERY such as Raspberry Pi.
(cherry picked from commit 7852451a9e93f5116fa853350020a318d31cb710)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
But only if we've ever got actual swap measurements
(COGL_FEATURE_ID_TIMESTAMP_QUERY). If it's supported then we now drop to
double buffering and get optimal latency on a burst of cursor-only
updates.
Closes: https://launchpad.net/bugs/2023363
(cherry picked from commit b2c16a63d2127faa0ebd811fab65c8be97a74469)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
It was assuming an immediate transition from compositing (triple
buffering) to direct scanout (double buffering), whereas there is
a one frame delay in that transition as the buffer queue shrinks.
We don't lose any frames in the transition.
(cherry picked from commit d41b85398b32ed8c65830e7dfcd418604c9a53cd)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
1. When direct scanout is attempted
There's no compositing during direct scanout so the "render" time is zero.
Thus there is no need to implement triple buffering for direct scanouts.
Stick to double buffering and enjoy the lower latency.
2. If disabled by environment variable MUTTER_DEBUG_TRIPLE_BUFFERING
With possible values {never, auto, always} where auto is the default.
3. When VRR is in use
VRR calls `clutter_frame_clock_schedule_update_now` which would keep
the buffer queue full, which in turn prevented direct scanout mode.
Because OnscreenNative currently only supports direct scanout with
double buffering.
We now break that feedback loop by preventing triple buffering from
being scheduled when the frame clock mode becomes variable. Long term
this could also be solved by supporting triple buffering in direct
scanout mode. But whether or not that would be desirable given the
latency penalty remains to be seen.
(cherry picked from commit 280f7f6b26cd3e7a82706d1d001419295ea15d8b)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
We need this hint whether direct scanout succeeds or fails because it's
the mechanism by which we will tell the clock to enforce double buffering,
thus making direct scanout possible on future frames. Triple buffering
will be disabled until such time that direct scanout is not being attempted.
(cherry picked from commit a0c1c735a15456dd8d27c655694892e7d8751a16)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
This will allow the backend to provide performance hints to the frame
clock in future.
(cherry picked from commit 9b5f91b086b6a16c548626fed0e33f776cf3f030)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
This replaces the DISPATCHED state with four new states that are possible
with triple buffering:
DISPATCHED_ONE: Double buffering
DISPATCHED_ONE_AND_SCHEDULED: Scheduled switch to triple buffering
DISPATCHED_ONE_AND_SCHEDULED_NOW: Scheduled switch to triple buffering
DISPATCHED_TWO: Triple buffering
It's four states simply because you now have two booleans that are no
longer mutually exclusive: DISPATCHED and SCHEDULED.
(cherry picked from commit e323bc74b9abb3f694637848421237cf163594bc)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Chronologically they already overlap in time as presentation may
complete in the middle of the dispatch function, otherwise they are
contiguous in time. And most switch statements treated the two states
the same already so they're easy to merge into a single `DISPATCHED`
state.
Having fewer states now will make life easier when we add more states
later.
(cherry picked from commit a83535e24aecb9148f334b3d5cef9537709dc52a)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Error diffusion was introduced in 0555a5bbc1 for Nvidia where last
presentation time is always unknown (zero). Dispatch times would drift
apart always being a fraction of a frame late, and accumulated to cause
periodic frame skips. So error diffusion corrected that precisely and
avoided the skips.
That works great with double buffering but less great with triple
buffering. It's certainly still needed with triple buffering but
correcting for a lateness of many milliseconds isn't a good idea. That's
because a dispatch being that late is not due to main loop jitter but due
to Nvidia's swap buffers blocking when the queue is full. So scheduling
the next frame even earlier using last_dispatch_lateness_us would just
perpetuate the problem of swap buffers blocking for too long.
So now we lower the threshold of when error diffusion gets disabled. It's
still high enough to fix the original smoothness problem it was for, but
now low enough to detect Nvidia's occasionally blocking swaps and backs
off in that case.
Since the average duration of a blocking swap is half a frame interval
and we want to distinguish between that and sub-millisecond jitter, the
logical threshold is halfway again: refresh_interval_us/4.
(cherry picked from commit 4304155aa2ef681814641f3ccb5e60c06347178c)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
It's analogous to discard_pending_page_flips but represents swaps that
might become flips after the next frame notification callbacks, thanks
to triple buffering. Since the views are being rebuilt and their onscreens
are about to be destroyed, turning those swaps into more flips/posts would
just lead to unexpected behaviour (like trying to flip on a half-destroyed
inactive CRTC).
(cherry picked from commit ff972c6ef5f9f39302a5345aa85cd2bedf3acdcc)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
And when the number of pending posts decreases we know it's safe to submit
a new one. Since KMS generally only supports one outstanding post right now,
"decreases" means equal to zero.
(cherry picked from commit bce609b3867e3a33948d257dba43791481d4b533)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
This will allow us to keep track of up to two buffers that have been
swapped but not yet scanning out, for triple buffering.
This commit replaces mutter!1968
(cherry picked from commit cd14dbbe1b3ea42a7e8ca5b19a4a177dffa6cae0)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
All paths out of `meta_onscreen_native_swap_buffers_with_damage` from
here onward would set the same `CLUTTER_FRAME_RESULT_PENDING_PRESENTED`
(or terminate with `g_assert_not_reached`).
Even failed posts set this result because they will do a
`meta_onscreen_native_notify_frame_complete` in
`page_flip_feedback_discarded`.
(cherry picked from commit 24418b952c5bfd28c3e6bbd6bbc47f2cf37c0a5f)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Because it soon won't be the maximum. But we do want to verify that the
frame info queue is not empty, to avoid NULL dereferencing and catch logic
errors.
(cherry picked from commit 2076f6e3a91438bf883354888f019931f63dad00)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
It's slightly more efficient, but also we'll soon need it in multiple
functions.
(cherry picked from commit 0fe3c5c29776e87407dadee90edecc6c8ea36960)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Because a single iteration might also grow the list again.
(cherry picked from commit d3e50cc023acc440cb483604cc82b04778703f52)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
This is a case that triple buffering will encounter. We don't want it
to queue the same onscreen multiple times because that would represent
multiple flips occurring simultaneously.
It's a linear search but the list length is typically only 1 or 2 so
no need for anything fancier yet.
(cherry picked from commit 19e62b1299405d630cb7d8ce1f210f00a4903db5)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Same as in cogl_onscreen_swap_buffers_with_damage.
(cherry picked from commit 0348913afa7f4478ce7ed21a52e298f3054cbb91)
Fixes: 99209958b9 ("cogl: Store latest GPU work completed sync fd")
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4057>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
While in double buffering we only care about one previous presentation,
triple buffering will sometimes need to refer to the presentation from
two dispatches prior. So it might help to separate those frame stats
more clearly. This way each frame's dispatch and presentation times are
stored more cohesively such as to not be overwritten during overlapping
frame lifetimes.
Having two types of frame reference (dispatch and presentation) moving
at difference speeds meant that they could not be stored in a ring. Not
all dispatches become presentations and so storing them in a ring would
necessitate very complex conflict avoidance. Instead, a simple reference
counting model was used.
(cherry picked from commit 817a951b246b64fd1c4148e486d24017f71fc4c4)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3961>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
So that we maintain a perfectly balanced number of callbacks:
dispatch == notify_ready + notify_presented
Otherwise you can't put any useful logic inside notify_ready and be sure
you're handling all the empty frames.
(cherry picked from commit 85f0f4e227b7f6896fc9fe6d6b9da4d568d2a9e7)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3961>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
This consolidates duplicate code in meta_drm_buffer_gbm_blit_to_framebuffer
to use the newly added meta_drm_buffer_gbm_create_native_blit_image, which
also has the side-effect of caching creation of the EGLImage per GBM BO.
(cherry picked from commit c866ca2206ab279f639fa452f43812abd9be243a)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4027>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Creating an EGLImage is rather expensive and is taking the bulk of the
time the secondary GPU copy path is using for each frame. By caching
these per GBM BO we avoid this expensive recreation, which seems to
significantly improve FPS throughput in these scenarios, e.g. an
AMD or Intel iGPU with an NVIDIA dGPU.
(cherry picked from commit 0255c1f5c099bd0a761d5c42cfdbc74e4375882d)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4027>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
cogl_framebuffer_finish can result in a CPU-side stall because it waits for
the primary GPU to flush and execute all commands that were queued before
that. By using a GPU-side EGLSync we can let the primary GPU inform us when
it is done with the queued commands instead. We then create another EGLSync
on the secondary GPU using the same fd so the primary GPU effectively
signals the secondary GPU when it is done rendering, causing the latter
to wait for the former before copying part of the frames it needs for
monitors attached to it directly.
This solves the corruption that cogl_framebuffer_finish also solved, but
without needing a CPU-side stall.
(cherry picked from commit 3a40413f7cf036622da3c22c609b703c138c709c)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4015>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
This adds meta_egl_create_sync and meta_egl_destroy_sync to be able to
create and dispose EGLSync objects, respectively, as well as
meta_egl_wait_sync to be able to wait for an EGLSync on the GPU.
(cherry picked from commit d2e70c15ec1cf6b6f7cd8c8825a5d92a1f89b35b)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4015>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
e994fbf02 moved warping the pointer to before the destruction of the
resource to prevent dereferencing the constraint after destruction.
This however meant that the constraint was still active when the motion
event caused by the warp is handled, which would constrain the pointer
back again to its original position.
This moves the warping of the pointer back to after the destruction of
the resource and instead just retrieves the seat earlier while the
constraint is still valid.
Fixes: e994fbf02 ("wayland/pointer-constraints: Warp pointer before destroying resource")
Closes: https://gitlab.gnome.org/GNOME/mutter/-/issues/3696
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4098>
(cherry picked from commit cb34fafd57)
Quoting Ebassi https://www.bassi.io/articles/2023/02/20/bindable-api-2023/:
Whenever you’re describing a function that takes a callback, you
should always annotate the callback argument with the argument that
contains the user data using the (closure argument) annotation
You should not annotate the data argument with a unary (closure).
The unary (closure) is meant to be used when annotating the callback
type
Recently gobject-introspection became a bit more strict with this and
that generated some warnings:
Warning: Cogl: invalid "closure" annotation: only valid on callback
parameters
This commit fix all the closure annotations.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4058>
(cherry picked from commit 077eb80a8d)
In some use cases there is a need to dynamically change the preferred
primary GPU, or get rid of the preference altogether. This is currently
not possible due to a change in udev introduced by systemd v247. This
version made the tags "sticky", meaning there is no way to remove them
once attached. When a tag gets removed, only the CURRENT_TAGS property
reflects that change, the removed tag will remain in the TAGS property.
This also bumps libgudev version to 238, since that version introduces
a function, which we need to get the current tags.
Related: https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1562
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4055>
(cherry picked from commit 57812546b9)
MetaWaylandDrmLeaseDevice and MetaWaylandDrmLeaseConnector hold a
reference to each other.
In both cases, the reference count was increased. Do not increase the
reference count when lease_connector->lease_device is stored to break
the reference count cycle.
Fixes: fb08a597e1 ("wayland/drm-lease: Advertize initial connectors")
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4031>
(cherry picked from commit 9c536939a1)
drmModeAddFB() doesn't take a format, but depth and bits per pixel.
These can be used to determine whether there should be an alpha channel
or not, and is roughly assumed to result in either XR24 or AR24 if one
passes 24 or 32 as depth, with 32 as bpp.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3926>
(cherry picked from commit 86c9d602cd)
During an onscreen swap, the cogl_onscreen_swap_buffers_with_damage()
function ensures that the Cogl renderer creates a sync object every
frame. This sync object is later shared with the Wayland clients that
utilize the explicit sync protocol.
However, in headless mode, the function mentioned above is not called.
As a result, the sync object the Cogl renderer stores seems to be not
created. This causes cogl_context_get_latest_sync_fd() function to
return an invalid sync fd, causing Mutter to not be able to materialize
the sync timeline point that the clients wait for when the explicit sync
protocol is in use.
This change simply adds a call to the cogl_framebuffer_flush() function
to the offscreen swap path to make sure that there is a sync object that
can be shared with the clients, which will be signalled when all the
queued operations before the swap are completed.
Signed-off-by: Doğukan Korkmaztürk <dkorkmazturk@nvidia.com>
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4056>
(cherry picked from commit 6a0cc1371c)