mutter-performance-source

History

Ivan Molodetskikh 592fbee065 clutter: Compute max render time heuristically

Max render time shows how early the frame clock needs to be dispatched to make it to the predicted next presentation time. Before this commit it was set to refresh interval minus 2 ms. This means Mutter would always start compositing 14.7 ms before a display refresh on a 60 Hz screen or 4.9 ms before a display refresh on a 144 Hz screen. However, Mutter frequently does not need as much time to finish compositing and submit the buffer to KMS:

      max render time
      /------------\
---|---------------|---------------|---> presentations
      D----S          D--S

D - frame clock dispatch
S - buffer submission

This commit aims to automatically compute a shorter max render time to make Mutter start compositing as late as possible (but still making it in time for the presentation):

         max render time
             /-----\
---|---------------|---------------|---> presentations
             D----S          D--S

Why is this better? First of all, Mutter gets application contents to draw at the time when compositing starts. If a new application buffer arrives after the compositing has started, but before the next presentation, it won't make it on screen:

---|---------------|---------------|---> presentations
      D----S          D--S
        A-------------X----------->
                   ^ doesn't make it for this presentation

A - application buffer commit
X - application buffer sampled by Mutter

Here the application committed just a few ms too late and didn't make it on screen until the next presentation. If compositing starts later in the frame cycle, applications can commit buffers closer to the presentation. These buffers will be more up-to-date, thereby reducing input latency.

---|---------------|---------------|---> presentations
             D----S          D--S
        A----X---->
                   ^ made it!

Moreover, applications are recommended to render their frames on frame callbacks, which Mutter sends right after compositing is done.
Since this commit delays the compositing, it also reduces the latency for applications drawing on frame callbacks. Compare:

---|---------------|---------------|---> presentations
      D----S          D--S
           F--A-------X----------->
              \____________________/
                     latency

---|---------------|---------------|---> presentations
             D----S          D--S
                  F--A-------X---->
                     \_____________/
                      less latency

F - frame callback received, application starts rendering

So how do we actually estimate max render time? We want it to be as low as possible, but still large enough so as not to miss any frames by accident:

         max render time
             /-----\
---|---------------|---------------|---> presentations
             D------S------------->
                    oops, took a little too long

For a successful presentation, the frame needs to be submitted to KMS and the GPU work must be completed before the vblank. This deadline can be computed by subtracting the vblank duration (calculated from display mode) from the predicted next presentation time.

We don't know how long compositing will take, and we also don't know how long the GPU work will take, since clients can submit buffers with unfinished GPU work. So we measure and estimate these values.

The frame clock dispatch can be split into two phases:

1. From start of the dispatch to all GPU commands being submitted (but not finished), that is, until the call to eglSwapBuffers().
2. From eglSwapBuffers() to submitting the buffer to KMS and to GPU work completing. These happen in parallel, and we want the latest of the two to be done before the vblank.

We measure these three durations (phase 1, plus the two parallel parts of phase 2) and store them for the last 16 frames. The estimate for each duration is the maximum of these last 16 durations. Usually even taking just the last frame's durations as the estimates works well enough, but I found that screen-capturing with OBS Studio increases duration variability enough to cause frequent missed frames when using that method. Taking the maximum of the last 16 frames smooths out this variability.
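
The "maximum of the last 16 durations" estimate can be illustrated with a small sliding-window helper. This is only a sketch in Python for readability, not the code from this commit; the class name `DurationEstimate`, its methods, and the sample values are hypothetical. Only the window size of 16 and the use of a maximum come from the message above.

```python
from collections import deque

WINDOW = 16  # number of recent frames kept, as stated in the commit message


class DurationEstimate:
    """Hypothetical helper: estimate a duration as the max of recent samples."""

    def __init__(self, window: int = WINDOW) -> None:
        # A bounded deque silently drops the oldest sample once it is full.
        self._samples = deque(maxlen=window)

    def record(self, duration_us: int) -> None:
        """Store the measured duration (in microseconds) of one frame."""
        self._samples.append(duration_us)

    def estimate_us(self) -> int:
        """Return the largest stored duration, or 0 if nothing was measured."""
        return max(self._samples, default=0)


# Illustrative samples only: one slow frame dominates the estimate until
# 16 newer frames have pushed it out of the window.
estimate = DurationEstimate()
for duration_us in [900, 1100, 4000, 1000]:
    estimate.record(duration_us)
print(estimate.estimate_us())
```

Using a maximum rather than the latest sample means a single slow frame keeps the estimate conservative for the next 16 frames, which matches the motivation given for the OBS Studio case.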
The durations are naturally quite variable and the estimates aren't perfect. To take this into account, an additional constant 2 ms is added to the max render time.

How does it perform in practice? On my desktop with 144 Hz monitors I get a max render time of 4–5 ms instead of the default 4.9 ms (I had 1 ms manually configured in sway) and on my laptop with a 60 Hz screen I get a max render time of 4.8–5.5 ms instead of the default 14.7 ms (I had 5–6 ms manually configured in sway). Weston [1] went with a 7 ms default.

The main downside is that if there's a sudden heavy batch of work in the compositing, which would've made it in the default 14.7 ms but doesn't make it in the reduced 6 ms, there is a delayed frame which would otherwise not be there. Arguably, this happens rarely enough to be a good trade-off for reduced latency. One possible solution is a "next frame is expected to be heavy" function which manually increases max render time for the next frame. This would avoid this single dropped frame at the start of complex animations.

[1]: https://www.collabora.com/about-us/blog/2015/02/12/weston-repaint-scheduling/

Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1762>
2021-07-13 08:09:43 +00:00
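
The arithmetic in the commit message can be checked with a short sketch. It is written in Python for readability and is not the code from this commit: the function names, parameter names, and the example estimates are hypothetical. The "refresh interval minus 2 ms" default and the constant 2 ms margin come from the message. Adding the vblank duration is one reading of the statement that the deadline is the predicted presentation time minus the vblank duration.

```python
def default_max_render_time_us(refresh_rate_hz: float) -> float:
    """Behaviour before the commit, per the message: refresh interval minus 2 ms."""
    refresh_interval_us = 1_000_000 / refresh_rate_hz
    return refresh_interval_us - 2_000


def heuristic_max_render_time_us(
    dispatch_to_swap_us: float,
    swap_to_kms_us: float,
    swap_to_gpu_done_us: float,
    vblank_duration_us: float,
    margin_us: float = 2_000,  # constant 2 ms safety margin from the message
) -> float:
    """Sketch of the heuristic: how long before presentation to dispatch."""
    # KMS submission and GPU completion run in parallel after the swap,
    # so only the slower of the two matters.
    after_swap_us = max(swap_to_kms_us, swap_to_gpu_done_us)
    return dispatch_to_swap_us + after_swap_us + vblank_duration_us + margin_us


# Defaults quoted in the message: 14.7 ms at 60 Hz and 4.9 ms at 144 Hz.
print(round(default_max_render_time_us(60) / 1000, 1))
print(round(default_max_render_time_us(144) / 1000, 1))

# Made-up estimates, in microseconds, purely for illustration.
print(heuristic_max_render_time_us(1_500, 300, 1_200, 500))
```

With these made-up inputs the heuristic gives 5.2 ms, so on a 60 Hz display the frame clock would be dispatched about 9.5 ms later than under the old default.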
..
clutter	clutter: Compute max render time heuristically	2021-07-13 08:09:43 +00:00
.gitignore	clutter: Remove clutter specific version	2018-11-06 17:17:36 +01:00
meson.build	clutter: Move pointer a11y settings management from MetaInputSettings	2021-05-05 19:07:26 +00:00