Previously if we had no measurements then `compute_max_render_time_us`
would pessimise its answer to ensure triple buffering could be reached:
```
if (frame_clock->state == CLUTTER_FRAME_CLOCK_STATE_DISPATCHED_ONE)
  ret += refresh_interval_us;
```
But that also meant entering triple buffering even when not required.
Now we make `compute_max_render_time_us` more honest and return failure
if the answer isn't known (or is disabled). This in turn allows us to
optimize `calculate_next_update_time_us` for this special case, ensuring
triple buffering can be used, but isn't blindly always used.
This makes a visible difference to the latency when dragging windows in
Xorg, but will also help Wayland sessions on platforms lacking
TIMESTAMP_QUERY such as Raspberry Pi.
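As a rough illustration of the "return failure if the answer isn't known" convention, here is a minimal sketch. It is not the actual frame clock code: the `Sketch*` types, the field names and the fallback policy in the caller (schedule immediately) are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the frame clock's measurement state. */
typedef struct {
  bool    have_measurements;  /* false until a swap has been timed */
  int64_t longest_render_us;  /* longest measured render duration */
} SketchClock;

/* Returns false when the answer is not known, rather than guessing. */
static bool
sketch_compute_max_render_time_us (const SketchClock *clock,
                                   int64_t           *max_render_time_us)
{
  if (!clock->have_measurements)
    return false;

  *max_render_time_us = clock->longest_render_us;
  return true;
}

/* The caller decides what to do in the unknown case. Scheduling
 * immediately is an assumption made for this sketch only. */
static int64_t
sketch_next_update_time_us (const SketchClock *clock,
                            int64_t            now_us,
                            int64_t            next_presentation_us)
{
  int64_t max_render_time_us;

  if (!sketch_compute_max_render_time_us (clock, &max_render_time_us))
    return now_us;

  int64_t target_us = next_presentation_us - max_render_time_us;
  return target_us > now_us ? target_us : now_us;
}
```

The point of the out-parameter plus boolean result is that "unknown" can no longer be confused with a pessimised estimate.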
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
But only if we've ever got actual swap measurements
(COGL_FEATURE_ID_TIMESTAMP_QUERY). If it's supported then we now drop to
double buffering and get optimal latency on a burst of cursor-only
updates.
Closes: https://launchpad.net/bugs/2023363
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Detached onscreens have no valid view so avoid servicing callbacks on
them during/after sleep mode, as previously mentioned in 45bda2d969.
Closes: https://launchpad.net/bugs/2020049
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
It was assuming an immediate transition from compositing (triple
buffering) to direct scanout (double buffering), whereas there is
a one frame delay in that transition as the buffer queue shrinks.
We don't lose any frames in the transition.
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
1. When direct scanout is attempted
There's no compositing during direct scanout so the "render" time is zero.
Thus there is no need to implement triple buffering for direct scanouts.
Stick to double buffering and enjoy the lower latency.
2. If disabled by environment variable MUTTER_DEBUG_TRIPLE_BUFFERING
With possible values {never, auto, always} where auto is the default.
3. When VRR is in use
VRR calls `clutter_frame_clock_schedule_update_now` which would keep
the buffer queue full, which in turn prevented direct scanout mode,
because OnscreenNative currently only supports direct scanout with
double buffering.
We now break that feedback loop by preventing triple buffering from
being scheduled when the frame clock mode becomes variable. Long term
this could also be solved by supporting triple buffering in direct
scanout mode. But whether or not that would be desirable given the
latency penalty remains to be seen.
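For case 2, the mapping from the variable's value to a mode could look roughly like the sketch below. The enum and function names are hypothetical, and treating an unrecognized value the same as `auto` is an assumption; only the variable name and the three values come from the text above.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
  SKETCH_TRIPLE_BUFFERING_NEVER,
  SKETCH_TRIPLE_BUFFERING_AUTO,
  SKETCH_TRIPLE_BUFFERING_ALWAYS
} SketchTripleBufferingMode;

/* Maps the value of MUTTER_DEBUG_TRIPLE_BUFFERING to a mode.
 * Unset (NULL) means auto; falling back to auto for unrecognized
 * values is an assumption of this sketch. */
static SketchTripleBufferingMode
sketch_parse_triple_buffering (const char *value)
{
  if (value == NULL)
    return SKETCH_TRIPLE_BUFFERING_AUTO;
  if (strcmp (value, "never") == 0)
    return SKETCH_TRIPLE_BUFFERING_NEVER;
  if (strcmp (value, "always") == 0)
    return SKETCH_TRIPLE_BUFFERING_ALWAYS;

  return SKETCH_TRIPLE_BUFFERING_AUTO;
}
```

A caller would pass in the result of `getenv ("MUTTER_DEBUG_TRIPLE_BUFFERING")`.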
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
We need this hint whether direct scanout succeeds or fails because it's
the mechanism by which we will tell the clock to enforce double buffering,
thus making direct scanout possible on future frames. Triple buffering
will be disabled until such time that direct scanout is not being attempted.
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Chronologically they already overlap in time as presentation may
complete in the middle of the dispatch function; otherwise they are
contiguous in time. And most switch statements treated the two states
the same already so they're easy to merge into a single `DISPATCHED`
state.
Having fewer states now will make life easier when we add more states
later.
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Error diffusion was introduced in 0555a5bbc1 for Nvidia where last
presentation time is always unknown (zero). Dispatch times would drift
apart, always being a fraction of a frame late, and the lateness would
accumulate to cause periodic frame skips. Error diffusion corrected that
precisely and
avoided the skips.
That works great with double buffering but less great with triple
buffering. It's certainly still needed with triple buffering but
correcting for a lateness of many milliseconds isn't a good idea. That's
because a dispatch being that late is not due to main loop jitter but due
to Nvidia's swap buffers blocking when the queue is full. So scheduling
the next frame even earlier using last_dispatch_lateness_us would just
perpetuate the problem of swap buffers blocking for too long.
So now we lower the threshold at which error diffusion gets disabled. It's
still high enough to fix the original smoothness problem it was introduced
for, but now low enough to detect Nvidia's occasionally blocking swaps and
back off in that case.
Since the average duration of a blocking swap is half a frame interval
and we want to distinguish between that and sub-millisecond jitter, the
logical threshold is halfway again: refresh_interval_us/4.
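The threshold can be sketched as below. The function name is hypothetical, and the exact comparison operators and the handling of non-positive lateness are assumptions; only the `refresh_interval_us/4` cutoff comes from the description above.

```c
#include <stdint.h>

/* Returns the lateness that should be diffused into the next dispatch
 * time, or 0 when the lateness is too large to be main loop jitter
 * (and so more likely a blocking swap). */
static int64_t
sketch_diffusable_lateness_us (int64_t lateness_us,
                               int64_t refresh_interval_us)
{
  if (lateness_us > 0 && lateness_us < refresh_interval_us / 4)
    return lateness_us;

  return 0;
}
```

At 60 Hz (`refresh_interval_us` of 16667) the cutoff is 4166 microseconds, well above sub-millisecond jitter and well below the roughly half-frame average of a blocking swap.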
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
It's analogous to discard_pending_page_flips but represents swaps that
might become flips after the next frame notification callbacks, thanks
to triple buffering. Since the views are being rebuilt and their onscreens
are about to be destroyed, turning those swaps into more flips/posts would
just lead to unexpected behaviour (like trying to flip on a half-destroyed
inactive CRTC).
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Otherwise we could get:
  meta_kms_prepare_shutdown ->
  flush_callbacks ->
  ... ->
  try_post_latest_swap ->
  post and queue more callbacks
So later in shutdown those callbacks would trigger an assertion failure
in meta_kms_impl_device_atomic_finalize:
  g_hash_table_size (impl_device_atomic->page_flip_datas) == 0
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
So that they don't get overwritten prematurely during triple buffering
causing tearing.
https://launchpad.net/bugs/1999216
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
And when the number of pending posts decreases we know it's safe to submit
a new one. Since KMS generally only supports one outstanding post right now,
"decreases" means equal to zero.
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
This will allow us to keep track of up to two buffers that have been
swapped but are not yet scanning out, for triple buffering.
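A minimal sketch of such bookkeeping follows, assuming one slot for a frame that has been swapped but not yet posted and one for a frame that has been posted but is not yet scanning out. The type, field and function names are hypothetical, and a new post is only made when no other post is outstanding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-onscreen state: at most two frames that have been
 * swapped but are not yet scanning out. */
typedef struct {
  bool has_next;    /* swapped, waiting to be posted */
  bool has_posted;  /* posted to KMS, flip not yet completed */
} SketchOnscreen;

/* Posts the waiting frame if no other post is outstanding. */
static bool
sketch_try_post (SketchOnscreen *onscreen)
{
  if (onscreen->has_posted || !onscreen->has_next)
    return false;

  onscreen->has_next = false;
  onscreen->has_posted = true;
  return true;
}

/* Returns false if both slots are already occupied. */
static bool
sketch_swap (SketchOnscreen *onscreen)
{
  if (onscreen->has_next)
    return false;

  onscreen->has_next = true;
  sketch_try_post (onscreen);
  return true;
}

/* The posted frame starts scanning out, freeing the post slot. */
static void
sketch_flip_completed (SketchOnscreen *onscreen)
{
  assert (onscreen->has_posted);
  onscreen->has_posted = false;
  sketch_try_post (onscreen);
}
```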
This commit replaces mutter!1968
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
All paths out of `meta_onscreen_native_swap_buffers_with_damage` from
here onward would set the same `CLUTTER_FRAME_RESULT_PENDING_PRESENTED`
(or terminate with `g_assert_not_reached`).
Even failed posts set this result because they will do a
`meta_onscreen_native_notify_frame_complete` in
`page_flip_feedback_discarded`.
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Because it soon won't be the maximum. But we do want to verify that the
frame info queue is not empty, to avoid NULL dereferencing and catch logic
errors.
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
This is a case that triple buffering will encounter. We don't want it
to queue the same onscreen multiple times because that would represent
multiple flips occurring simultaneously.
It's a linear search but the list length is typically only 1 or 2 so
no need for anything fancier yet.
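The duplicate check amounts to something like the following sketch. A fixed-size array is used only to keep the example self-contained; the names and the capacity are hypothetical and say nothing about the list type used in the real code.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_QUEUED 8

typedef struct {
  const void *items[SKETCH_MAX_QUEUED];
  size_t      n_items;
} SketchFlipQueue;

/* Adds onscreen unless it is already queued. Returns true if it was
 * added, false if it was already present or the queue is full. */
static bool
sketch_queue_add_unique (SketchFlipQueue *queue,
                         const void      *onscreen)
{
  /* Linear search: the queue typically holds only 1 or 2 entries. */
  for (size_t i = 0; i < queue->n_items; i++)
    {
      if (queue->items[i] == onscreen)
        return false;
    }

  if (queue->n_items == SKETCH_MAX_QUEUED)
    return false;

  queue->items[queue->n_items++] = onscreen;
  return true;
}
```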
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
cogl_framebuffer_finish can result in a CPU-side stall because it waits for
the primary GPU to flush and execute all commands that were queued before
that. By using a GPU-side EGLSync we can let the primary GPU inform us when
it is done with the queued commands instead. We then create another EGLSync
on the secondary GPU using the same fd so the primary GPU effectively
signals the secondary GPU when it is done rendering, causing the latter
to wait for the former before copying part of the frames it needs for
monitors attached to it directly.
This solves the corruption that cogl_framebuffer_finish also solved, but
without needing a CPU-side stall.
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
See previous commit log on the effects of this.
This means the deadline evasion needs to be added in both cases in
clutter_frame_clock_notify_presented.
v2:
* Use meta_kms_update_set_sync_fd. (Jonas Ådahl)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3958>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
If the KMS thread is using the deadline timer, and a valid sync_file
descriptor is passed in:
1. The update is deferred, and the deadline timer is left armed, until
the sync_fd signals (becomes readable).
2. Implicit synchronization is disabled for the KMS update.
This means cursor updates should no longer miss a display refresh
cycle due to mutter's compositing GPU work finishing too late.
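The "signals (becomes readable)" condition can be checked with plain `poll()`, as in the sketch below. This only illustrates the decision; the function and enum names are hypothetical, and a polling check stands in for whatever event source the KMS thread really uses to wait on the fd.

```c
#include <poll.h>
#include <stdbool.h>

typedef enum {
  SKETCH_PROCESS_NOW,
  SKETCH_DEFER
} SketchAction;

/* Returns true if the fd is readable right now, without blocking. */
static bool
sketch_sync_fd_is_signaled (int sync_fd)
{
  struct pollfd pfd = { .fd = sync_fd, .events = POLLIN, .revents = 0 };

  return poll (&pfd, 1, 0) > 0 && (pfd.revents & POLLIN) != 0;
}

/* Defer only when the deadline timer is in use and a valid fd that
 * has not yet signaled was passed in. Checking the fd up front is an
 * assumption of this sketch. */
static SketchAction
sketch_handle_update (bool using_deadline_timer,
                      int  sync_fd)
{
  if (using_deadline_timer &&
      sync_fd >= 0 &&
      !sketch_sync_fd_is_signaled (sync_fd))
    return SKETCH_DEFER;

  return SKETCH_PROCESS_NOW;
}
```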
v2:
* Use g_autoptr for GSource in meta_kms_impl_device_handle_update.
(Sebastian Wick)
v3:
* Use meta_kms_update_get_sync_fd, don't track sync_fd in
CrtcFrame::submitted_update. (Jonas Ådahl)
v4:
* Clean up CrtcFrame::submitted_update members in crtc_frame_free.
v5:
* Coding style cleanup in meta_kms_impl_device_handle_update.
(Jonas Ådahl)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3958>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
To META_KMS_ASSIGN_PLANE_FLAG_DISABLE_IMPLICIT_SYNC. This describes the
effect of the flag, instead of the circumstances it's currently used
for.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3958>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
If both crtc->shortterm_max_dispatch_duration_us and
crtc->deadline_evasion_us are 0, i.e. we're not using the deadline
timer.
v2:
* Fix coding style. (Jonas Ådahl)
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3958>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
And take it into account in meta_kms_crtc_get_deadline_evasion.
This uses the same fundamental approach as clutter frame clock scheduling:
Measure the deadline timer dispatch duration, keep track of the longest
duration, and set the timer to fire such that the longest measured
dispatch duration would result in it completing shortly before start of
vblank.
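In sketch form, the approach looks like the code below. This is a simplification under stated assumptions: it keeps a plain running maximum rather than any short-term window, the names are hypothetical, and the constant margin is passed in as a parameter instead of using the real DEADLINE_EVASION_CONSTANT_US value.

```c
#include <stdint.h>

typedef struct {
  int64_t max_dispatch_duration_us;  /* longest measured dispatch */
} SketchCrtc;

/* Record how long one deadline timer dispatch took. */
static void
sketch_record_dispatch (SketchCrtc *crtc,
                        int64_t     start_us,
                        int64_t     end_us)
{
  int64_t duration_us = end_us - start_us;

  if (duration_us > crtc->max_dispatch_duration_us)
    crtc->max_dispatch_duration_us = duration_us;
}

/* How long before the start of vblank the timer should fire. */
static int64_t
sketch_deadline_evasion_us (const SketchCrtc *crtc,
                            int64_t           constant_us)
{
  return crtc->max_dispatch_duration_us + constant_us;
}

/* Fire early enough that the longest measured dispatch would still
 * complete shortly before the start of vblank. */
static int64_t
sketch_timer_fire_time_us (const SketchCrtc *crtc,
                           int64_t           vblank_start_us,
                           int64_t           constant_us)
{
  return vblank_start_us - sketch_deadline_evasion_us (crtc, constant_us);
}
```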
Closes: https://gitlab.gnome.org/GNOME/mutter/-/issues/3612
v2:
* Move DEADLINE_EVASION_CONSTANT_US addition from
meta_kms_crtc_determine_deadline to meta_kms_crtc_get_deadline_evasion.
* Calculate how long before start of vblank dispatch completed for
debug output in crtc_frame_deadline_dispatch.
* Shorten over-long lines in crtc_frame_deadline_dispatch.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3934>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
(cherry picked from commit 88e7f353)
And also "completion" time to measure when the commit returned.
This is structured so as to measure all timestamps first before logging
anything. That way our results shouldn't be (don't seem to be) affected
by the logging itself.
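The "measure first, log afterwards" ordering might be sketched as follows. Everything here is hypothetical scaffolding: the commit is represented by a callback, `sketch_fake_commit` is a stand-in for it, and `clock_gettime` is used in place of whatever monotonic clock helper the real code calls.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef struct {
  int64_t begin_us;
  int64_t completion_us;
} SketchCommitTimes;

static int64_t
sketch_now_us (void)
{
  struct timespec ts;

  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (int64_t) ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Stand-in for a real commit; it just counts how often it ran. */
static void
sketch_fake_commit (void *data)
{
  (*(int *) data)++;
}

static SketchCommitTimes
sketch_timed_commit (void (*commit) (void *), void *data)
{
  SketchCommitTimes times;

  /* Take every timestamp before logging anything. */
  times.begin_us = sketch_now_us ();
  commit (data);
  times.completion_us = sketch_now_us ();

  /* Logging happens last so its cost is not part of the measurement. */
  fprintf (stderr, "commit returned after %lld us\n",
           (long long) (times.completion_us - times.begin_us));

  return times;
}
```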
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3265>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
If we finish compositing in time, the composited result will be
submitted before the deadline timer is triggered, and we'll be fine.
If not, at least the cursor updates will be smooth, which makes it
appear smoother than not.
There is a risk that this can negatively impact composited updates when
moving the cursor, so make it possible to toggle a paint-debug flag for
now until this has been tested more.
This also means we need to disarm the deadline timer after handling an
update, as there might be a scheduled cursor update pending that we
have already handled.
Here is an illustration of the difference.
In the following scenario, with disarming, the composited frame E and
the cursor movement C both get presented. With this branch, only the
cursor movement C gets presented.
```
* A: beginning of composited frame
* B: begin notification reaches KMS thread
* C: cursor moved
* D: calculated deadline dispatch time (disabled with the branch)
* E: KMS update posted
* F: KMS update reaches KMS thread
* G: actual deadline (and with branch and gets committed)
Compositor thread: --------A---------------E---------
                            \               \
                             \               \
KMS thread:        -----------B------C----D---F-G----
```
In the following scenario, by not disarming, the cursor update C will be
presented, and the would-be-delayed composited frame E would be delayed
anyway, i.e. fixing cursor stutter.
```
* A: beginning of composited frame
* B: begin notification reaches KMS thread
* C: cursor moved
* D: calculated deadline dispatch time (and with branch will be dispatched)
* E: KMS update posted
* F: actual deadline
* G: KMS update reaches KMS thread (and with branch gets postponed)
Compositor thread: --------A---------------E---------
                            \               \
                             \               \
KMS thread:        -----------B------C----D-F-G------
```
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3184>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
The deadline evasion depends on debug flags, but they are not trackable,
so update the deadline evasion each time we schedule an update.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3184>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
(cherry picked from commit 6ec1312384)
This is meant to be the amount of time before a CRTC deadline we're
usually dispatching at. It's not yet set by anything however.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3184>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
And return early from `swap_buffers_with_damage` if the error would have
led to flipping a NULL buffer.
This is also the perfect time to remove the `egl_context_changed` parameter
and move `_cogl_winsys_egl_ensure_current` closer to the code that actually
needs it.
Related: https://bugs.launchpad.net/bugs/2069565
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3817>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
So that swap failure messages are not also followed by:
meta_stage_native_redraw_view: runtime check failed: (!META_IS_CRTC_KMS (crtc))
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3817>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
It won't be used until later when we flip, and in fact assigning
it early could have led to its own assertion failing on the next frame
in the unlikely event that we return with "Failed to ensure KMS FB ID...
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3891>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
`update_secondary_gpu_state_post_swap_buffers` decides what our front
buffer object will be. There is only one answer. So return it as the
function result instead of making the caller figure it out.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3830>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
It's always equal to `onscreen_native->next_frame` and we can't eliminate
that copy so easily. Removing the parameter removes all ambiguity about
where the next frame will come from.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3829>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>
Let the ClutterFrame (or rather MetaFrameNative) own both the scanout
object and the framebuffer object, and let the frame itself live for as
long as it's needed. This allows fields that are related to a
single frame to be placed together, which should make it easier to
reason about the lifetime of the fields that were previously stored
directly in MetaOnscreenNative.
Also take the opportunity to rename "current" to "presenting", to make
it clearer that the frame's buffer is what is currently being presented
to the user.
Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3799>
Signed-off-by: Mingi Sung <sungmg@saltyming.net>