Previously if we had no measurements then `compute_max_render_time_us`
would pessimise its answer to ensure triple buffering could be reached:
```
if (frame_clock->state == CLUTTER_FRAME_CLOCK_STATE_DISPATCHED_ONE)
ret += refresh_interval_us;
```
But that also meant entering triple buffering even when not required.
Now we make `compute_max_render_time_us` more honest and return failure
if the answer isn't known (or is disabled). This in turn allows us to
optimize `calculate_next_update_time_us` for this special case, ensuring
triple buffering can be used, but isn't blindly always used.
This makes a visible difference to the latency when dragging windows in
Xorg, but will also help Wayland sessions on platforms lacking
TIMESTAMP_QUERY such as Raspberry Pi.
(cherry picked from commit 7852451a9e93f5116fa853350020a318d31cb710)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
But only if we've ever got actual swap measurements
(COGL_FEATURE_ID_TIMESTAMP_QUERY). If it's supported then we now drop to
double buffering and get optimal latency on a burst of cursor-only
updates.
Closes: https://launchpad.net/bugs/2023363
(cherry picked from commit b2c16a63d2127faa0ebd811fab65c8be97a74469)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
It was assuming an immediate transition from compositing (triple
buffering) to direct scanout (double buffering), whereas there is
a one frame delay in that transition as the buffer queue shrinks.
We don't lose any frames in the transition.
(cherry picked from commit d41b85398b32ed8c65830e7dfcd418604c9a53cd)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
1. When direct scanout is attempted
There's no compositing during direct scanout so the "render" time is zero.
Thus there is no need to implement triple buffering for direct scanouts.
Stick to double buffering and enjoy the lower latency.
2. If disabled by environment variable MUTTER_DEBUG_TRIPLE_BUFFERING
With possible values {never, auto, always} where auto is the default.
3. When VRR is in use
VRR calls `clutter_frame_clock_schedule_update_now` which would keep
the buffer queue full, which in turn prevented direct scanout mode.
Because OnscreenNative currently only supports direct scanout with
double buffering.
We now break that feedback loop by preventing triple buffering from
being scheduled when the frame clock mode becomes variable. Long term
this could also be solved by supporting triple buffering in direct
scanout mode. But whether or not that would be desirable given the
latency penalty remains to be seen.
(cherry picked from commit 280f7f6b26cd3e7a82706d1d001419295ea15d8b)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
We need this hint whether direct scanout succeeds or fails because it's
the mechanism by which we will tell the clock to enforce double buffering,
thus making direct scanout possible on future frames. Triple buffering
will be disabled until such time that direct scanout is not being attempted.
(cherry picked from commit a0c1c735a15456dd8d27c655694892e7d8751a16)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
This will allow the backend to provide performance hints to the frame
clock in future.
(cherry picked from commit 9b5f91b086b6a16c548626fed0e33f776cf3f030)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
This replaces the DISPATCHED state with four new states that are possible
with triple buffering:
DISPATCHED_ONE: Double buffering
DISPATCHED_ONE_AND_SCHEDULED: Scheduled switch to triple buffering
DISPATCHED_ONE_AND_SCHEDULED_NOW: Scheduled switch to triple buffering
DISPATCHED_TWO: Triple buffering
It's four states simply because you now have two booleans that are no
longer mutually exclusive: DISPATCHED and SCHEDULED.
(cherry picked from commit e323bc74b9abb3f694637848421237cf163594bc)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Chronologically they already overlap in time as presentation may
complete in the middle of the dispatch function, otherwise they are
contiguous in time. And most switch statements treated the two states
the same already so they're easy to merge into a single `DISPATCHED`
state.
Having fewer states now will make life easier when we add more states
later.
(cherry picked from commit a83535e24aecb9148f334b3d5cef9537709dc52a)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Error diffusion was introduced in 0555a5bbc1 for Nvidia where last
presentation time is always unknown (zero). Dispatch times would drift
apart always being a fraction of a frame late, and accumulated to cause
periodic frame skips. So error diffusion corrected that precisely and
avoided the skips.
That works great with double buffering but less great with triple
buffering. It's certainly still needed with triple buffering but
correcting for a lateness of many milliseconds isn't a good idea. That's
because a dispatch being that late is not due to main loop jitter but due
to Nvidia's swap buffers blocking when the queue is full. So scheduling
the next frame even earlier using last_dispatch_lateness_us would just
perpetuate the problem of swap buffers blocking for too long.
So now we lower the threshold of when error diffusion gets disabled. It's
still high enough to fix the original smoothness problem it was for, but
now low enough to detect Nvidia's occasionally blocking swaps and backs
off in that case.
Since the average duration of a blocking swap is half a frame interval
and we want to distinguish between that and sub-millisecond jitter, the
logical threshold is halfway again: refresh_interval_us/4.
(cherry picked from commit 4304155aa2ef681814641f3ccb5e60c06347178c)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
It's analogous to discard_pending_page_flips but represents swaps that
might become flips after the next frame notification callbacks, thanks
to triple buffering. Since the views are being rebuilt and their onscreens
are about to be destroyed, turning those swaps into more flips/posts would
just lead to unexpected behaviour (like trying to flip on a half-destroyed
inactive CRTC).
(cherry picked from commit ff972c6ef5f9f39302a5345aa85cd2bedf3acdcc)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
And when the number of pending posts decreases we know it's safe to submit
a new one. Since KMS generally only supports one outstanding post right now,
"decreases" means equal to zero.
(cherry picked from commit bce609b3867e3a33948d257dba43791481d4b533)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
This will allow us to keep track of up to two buffers that have been
swapped but not yet scanning out, for triple buffering.
This commit replaces mutter!1968
(cherry picked from commit cd14dbbe1b3ea42a7e8ca5b19a4a177dffa6cae0)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
All paths out of `meta_onscreen_native_swap_buffers_with_damage` from
here onward would set the same `CLUTTER_FRAME_RESULT_PENDING_PRESENTED`
(or terminate with `g_assert_not_reached`).
Even failed posts set this result because they will do a
`meta_onscreen_native_notify_frame_complete` in
`page_flip_feedback_discarded`.
(cherry picked from commit 24418b952c5bfd28c3e6bbd6bbc47f2cf37c0a5f)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Because it soon won't be the maximum. But we do want to verify that the
frame info queue is not empty, to avoid NULL dereferencing and catch logic
errors.
(cherry picked from commit 2076f6e3a91438bf883354888f019931f63dad00)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
It's slightly more efficient, but also we'll soon need it in multiple
functions.
(cherry picked from commit 0fe3c5c29776e87407dadee90edecc6c8ea36960)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Because a single iteration might also grow the list again.
(cherry picked from commit d3e50cc023acc440cb483604cc82b04778703f52)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
This is a case that triple buffering will encounter. We don't want it
to queue the same onscreen multiple times because that would represent
multiple flips occurring simultaneously.
It's a linear search but the list length is typically only 1 or 2 so
no need for anything fancier yet.
(cherry picked from commit 19e62b1299405d630cb7d8ce1f210f00a4903db5)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
While in double buffering we only care about one previous presentation,
triple buffering will sometimes need to refer to the presentation from
two dispatches prior. So it might help to separate those frame stats
more clearly. This way each frame's dispatch and presentation times are
stored more cohesively such as to not be overwritten during overlapping
frame lifetimes.
Having two types of frame reference (dispatch and presentation) moving
at difference speeds meant that they could not be stored in a ring. Not
all dispatches become presentations and so storing them in a ring would
necessitate very complex conflict avoidance. Instead, a simple reference
counting model was used.
(cherry picked from commit 817a951b246b64fd1c4148e486d24017f71fc4c4)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3961>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
So that we maintain a perfectly balanced number of callbacks:
dispatch == notify_ready + notify_presented
Otherwise you can't put any useful logic inside notify_ready and be sure
you're handling all the empty frames.
(cherry picked from commit 85f0f4e227b7f6896fc9fe6d6b9da4d568d2a9e7)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3961>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
The stage view color state is not the output color state but usually a
linear version of it. For direct scanout, we need the view color state
to match the output color state instead.
Fixes: 20c7653d49 ("compositor-view/native: Don't scan out surface with color state mismatch")
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4196>
(cherry picked from commit aea1aee79e0fff2ff78e1f4fc4b7650c51c8f14f)
It tests both the wl_surface integer scale and fractional scales, for
toplevels, subsurfaces and cursor surfaces. It doesn't yet test DND
surfaces.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4205>
(cherry picked from commit e7ff05632dd3b51169f7d87b95fa150e5f6d7a94)
This makes it easier to create e.g. subsurfaces or cursor surfaces that
inherit generic logic from the WaylandSurface test utility helper type.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4205>
(cherry picked from commit 7273f30234ed03bc6c8f7c48b360d6efb8c0fa87)
It isn't always the highest scale for the surface, so change the
terminology in the common path to reflect this.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4205>
(cherry picked from commit 29571397bc9c23134f9c7e0ea6f6d63d93a17cef)
This consolidates duplicate code in meta_drm_buffer_gbm_blit_to_framebuffer
to use the newly added meta_drm_buffer_gbm_create_native_blit_image, which
also has the side-effect of caching creation of the EGLImage per GBM BO.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4027>
(cherry picked from commit dfa3755f55badd6e58ae4469dbc412f3789a7095)
Creating an EGLImage is rather expensive and is taking the bulk of the
time the secondary GPU copy path is using for each frame. By caching
these per GBM BO we avoid this expensive recreation, which seems to
significantly improve FPS throughput in these scenarios, e.g. an
AMD or Intel iGPU with an NVIDIA dGPU.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4027>
(cherry picked from commit 507409f5d0a7bced42656276e806da6f516ca321)
To swap_buffer_result_feedback from page_flip_feedback_discarded. The
former is where META_KMS_ERROR_DISCARDED from disarm_all_frame_sources
gets handled here.
Fixes: af250506fb ("kms/impl-device: Queue result when discarding submitted update")
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3964>
(cherry picked from commit f8524159cc6b1980ae7e2d4894a7cab0d339fc67)
This uses the ref test infrastructure inside the cursor tests screen
cast client, and verifies the content on the screen cast matches the
content of the compositor, both when using the 'embedded' cursor mode,
and the 'metadata' cursor mode.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3859>
(cherry picked from commit a9a4923c03d5a00ccf4d2eece0a4fa98833b324f)
This test doesn't use a separate reference image, but verifies the
result is identical to that of the built in cursor.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3859>
(cherry picked from commit bcb52960dbe8ac2cf944acbadaed1b68e366915b)