~eliasnaur/gio#63:
Lagginess with x11 linux

I'm almost done porting nucular (gdlv's widget toolkit) to gio. I was hoping it would improve its lagginess (shiny isn't that great on that front). Unfortunately it seems slightly worse. Just to prove that it wasn't nucular's fault I wrote a very simple gio-only program [1] and recorded it [2]. For comparison I also recorded a nucular-on-shiny application [3]. Note how the image lags behind the movement of the cursor and how it seems to be slightly worse with gio. Given that gio is gpu rendered and rendering a much simpler scene than the nucular-on-shiny example I suspect the problem is added lag in the input handling path.

[1] https://github.com/aarzilli/blah/blob/master/main.go [2] https://github.com/aarzilli/blah/blob/master/gio.mp4 [3] https://github.com/aarzilli/blah/blob/master/nucular_on_shiny.mp4

Status
REPORTED
Submitter
Alessandro Arzilli
Assigned to
No-one
Submitted
6 days ago
Updated
10 minutes ago
Labels
No labels applied.

~eliasnaur 5 days ago

Interesting.

First and foremost, the performance counters in the upper left corner indicate that it often takes > 20ms to draw a frame (tot). I'd like to know why, because on my system the frame time is a stable 16.7ms (v-synced 60Hz). However, the f time is the GPU-reported time to draw the frame, excluding v-sync, which is ~0 as expected.

Missing every other frame could alone explain the added input lag compared to Shiny. What is your window manager? Are you running Xorg directly or through XWayland?

Secondly, the CPU time (CPU) is about the same as the total. That's because we render as fast as we can, wasting CPU time in waiting for the previous frame to v-sync. In an ideal world, we'd know the time to render a frame including compositing, say X ms, and delay the start of the frame 16.7-X ms to give input events a chance to arrive. Wayland estimates X indirectly through a frame callback, and as a result has much lower input latency (on my system), but there's no similar feature in X.

So perhaps counter-intuitively, low render times can add an extra frame of input lag because no input events have a chance to arrive during a frame.

The attached patch tries to delay a frame until the previous one is done, and also adds a delay of 12ms before polling for events and drawing. I don't expect either change to make a significant difference compared to the >20ms problem, but let me know if it does.

I'd also like to know what happens if you disable v-sync (set needVSync to false in egl_x11.go). Disabling v-sync is not a viable fix, but it gives us a sanity check that your configuration can draw at full speed.

~eliasnaur 5 days ago

Another thing I forgot: the continuously animating case is pretty much the worst case for systems without frame callbacks, such as X11. The reason is that when we're busy drawing frames, inputs are often delayed by a frame. When not animating, incoming input will be ready to process and its results ready to draw immediately.

ProfileOp enables animation mode, so please try the attached patch and see whether input lag is lower without it:

diff --git main.go main.go
index 84bf230..01bcdac 100644
--- main.go
+++ main.go
@@ -189,18 +189,24 @@ func loop(w *app.Window) error {
            ops.Reset()

            profileStr := ""
-           profile.Op{blah}.Add(&ops)
+           //profile.Op{blah}.Add(&ops)
            q := w.Queue()
            for _, e := range q.Events(blah) {
                if e, ok := e.(profile.Event); ok {
                    profileStr = e.Timings
                }
            }
-
+           pointer.InputOp{Key: pointer}.Add(&ops)
+           for _, e := range q.Events(pointer) {
+               if e, ok := e.(pointer.Event); ok {
+                   pos = e.Position
+               }
+           }
+
            drawImage(&ops, imgop, int(pos.X), int(pos.Y), img.Bounds().Dx(), img.Bounds().Dy())

            drawText(&ops, face, 10, 10, e.Size.X, 100, profileStr)
-
+
            e.Frame(&ops)
        case pointer.Event:
            pos = e.Position

Alessandro Arzilli 5 days ago

Responding to both messages, merged.

On Thu, Nov 07, 2019 at 07:26:46PM -0000, ~eliasnaur wrote:

Interesting.

First and foremost, the performance counters in the upper left corner indicate that it often takes > 20ms to draw a frame (tot). I'd like to know why, because on my system the frame time is a stable 16.7ms (v-synced 60Hz). However, the f time is the GPU-reported time to draw the frame, excluding v-sync, which is ~0 as expected.

Missing every other frame could alone explain the added input lag compared to Shiny. What is your Window manager? Are you running Xorg directly or through XWayland?

Cinnamon, which uses a fork of mutter as window manager. Running Xorg directly.

Secondly, the CPU time (CPU) is about the same as the total. That's because we render as fast as we can, wasting CPU time in waiting for the previous frame to v-sync. In an ideal world, we'd know the time to render a frame including compositing, say X ms, and delay the start of the frame 16.7-X ms to give input events a chance to arrive. Wayland estimates X indirectly through a frame callback, and as a result has much lower input latency (on my system), but there's no similar feature in X.

So perhaps counter-intuitively, low render times can add an extra frame of input lag because no input events have a chance to arrive during a frame.

The attached patch tries to delay a frame until the previous one is done, and also adds a delay of 12ms before polling for events and drawing. I don't expect either change to make a significant difference compared to the >20ms problem, but let me know if it does.

Didn't see any attached patch.

ProfileOp enables animation mode, so please try the attached patch and see whether input lag is lower without it:

Ok. I tried this and it makes very little difference, if any. I ran this test on openbox which isn't composited and that did improve lag:

https://github.com/aarzilli/blah/blob/master/gio-openbox-noprofile.mp4

On a side note, a version of ProfileOp that doesn't cause continuous redrawing would be nice.

I'd also like to know what happens if you disable v-sync (set needVSync to false in egl_x11.go). Disabling v-sync is not a viable fix, but it gives us a sanity check that your configuration can draw at full speed.

This actually improves things a lot:

https://github.com/aarzilli/blah/blob/master/gio-openbox-noprofile-novsync.mp4

For reference, this is nucular-on-gio with vsync disabled:

https://github.com/aarzilli/blah/blob/master/gio-noprofile-novsync-nucular.mp4

and in terms of lag it is completely acceptable (and I suspect that's all introduced by the compositor).

~eliasnaur 5 days ago

I'll work on disabling continuous redrawing for ProfileOp.

In the meantime, I've attached the missing patch.

Alessandro Arzilli 5 days ago

On Fri, Nov 08, 2019 at 08:41:01AM -0000, ~eliasnaur wrote:

I'll work on disabling continuous redrawing for ProfileOp.

In the meantime, I've attached the missing patch.

have you?

~eliasnaur 5 days ago

On Fri Nov 8, 2019 at 9:21 AM Alessandro Arzilli wrote:

On Fri, Nov 08, 2019 at 08:41:01AM -0000, ~eliasnaur wrote:

I'll work on disabling continuous redrawing for ProfileOp.

In the meantime, I've attached the missing patch.

have you?

Yes, but in my infinite wisdom I forgot that lists.sr.ht eats attachments.

I've pushed the patch to the x11-vsync-hacks branch on https://git.sr.ht/~eliasnaur/gio.

I'd like to know whether the patch achieves latency parity with Shiny when profiling is off. If so, does it achieve parity even with the hard-coded time.Sleep removed?

Which graphics card do you have?

I'm on the #gioui gophers.slack.com channel if it's convenient for you.

-- elias

Alessandro Arzilli 5 days ago

On Fri, Nov 08, 2019 at 11:10:47AM -0000, ~eliasnaur wrote:

Yes, but in my infinite wisdom I forgot that lists.sr.ht eats attachments.

I've pushed the patch to the x11-vsync-hacks branch on https://git.sr.ht/~eliasnaur/gio.

I'd like to know whether the patch achieves latency parity with Shiny when profiling is off. If so, does it achieve parity even with the hard-coded time.Sleep removed?

Unfortunately it doesn't seem to make any difference.

Which graphics card do you have?

Intel HD Graphics 5500 (rev 09) (that would be an integrated graphics card in the core i5)

I'm on the #gioui gophers.slack.com channel if it's convenient for you.

~eliasnaur 5 days ago

On Fri Nov 8, 2019 at 11:52 AM Alessandro Arzilli wrote:

On Fri, Nov 08, 2019 at 11:10:47AM -0000, ~eliasnaur wrote:

Yes, but in my infinite wisdom I forgot that lists.sr.ht eats attachments.

I've pushed the patch to the x11-vsync-hacks branch on https://git.sr.ht/~eliasnaur/gio.

I'd like to know whether the patch achieves latency parity with Shiny when profiling is off. If so, does it achieve parity even with the hard-coded time.Sleep removed?

Unfortunately it doesn't seem to make any difference.

I pushed the ProfileOp fix so that it won't continuously redraw.

I'd like to know the profiling timings, in particular whether they're > 20ms. If they are, please try to tweak the time.Sleep from the x11 hack branch to see whether that helps decrease the frame time.

~eliasnaur 5 days ago

I added another experiment to the x11-vsync-hacks branch: according to https://www.khronos.org/opengl/wiki/Swap_Interval, glFinish after eglSwapBuffers will ensure that the GPU doesn't queue up frames. This should lead to lower input latency.

This is particularly interesting since your frame times of > 20ms indicate that about 1.5 frames are queued up on the driver side.

Let me know whether it makes any difference.

~db47h 5 days ago

Sorry for being late on this one, but the example you posted doesn't work for me (the image displays but then nothing happens, unless I resize the window).

~eliasnaur 5 days ago

On Fri Nov 8, 2019 at 1:51 PM ~db47h wrote:

Sorry for being late on this one, but the example you posted doesn't work for me (the image displays but then nothing happens, unless I resize the window).

That's probably because I disabled ProfileOp's continuous re-rendering.

I run the example with the following patch:

diff --git main.go main.go
index 84bf230..0587399 100644
--- main.go
+++ main.go
@@ -196,11 +196,17 @@ func loop(w *app.Window) error {
                    profileStr = e.Timings
                }
            }
-   
+           pointer.InputOp{Key: "pointer"}.Add(&ops)
+           for _, e := range q.Events("pointer") {
+               if e, ok := e.(pointer.Event); ok {
+                   pos = e.Position
+               }
+           }
+
            drawImage(&ops, imgop, int(pos.X), int(pos.Y), img.Bounds().Dx(), img.Bounds().Dy())

            drawText(&ops, face, 10, 10, e.Size.X, 100, profileStr)
-           
+
            e.Frame(&ops)
        case pointer.Event:
            pos = e.Position

~db47h 5 days ago

much better, thanks :) It's kinda laggy, but not bad (I know, this is subjective).

Still, I can't figure out the lag difference between Wayland and X, I must be missing something. Here are the timings as I see it in a worst case scenario:

  • 0ms: handle events (say it takes 0 time)
  • 0ms: send draw event for frame #1
  • 1ms: user mouse move, buffered by X
  • 16.6ms: draw #1 returns
  • 16.6ms: handle events, send the mouse move (its Time field should be correct)
  • 16.6ms: send draw for frame #2
  • 33.3ms: draw #2 returns; object position updated on screen with mouse position at t-32ms

The above timings assume that the frame is on screen when draw returns. When I run Alessandro's example, the perceived lag is consistent with these timings.

The only difference I see with the Wayland loop is that on Wayland, the application will receive the mouse move while draw #1 is in progress, yet the result won't appear on screen until frame #2 is rendered, 32ms later. Is this correct?

I believe game engines deal with this in their physics system where the position of a moving object is interpolated so that when the frame is drawn, the object is drawn where it should be at that time (then an almost non-perceptible correction applied when it stops moving).

~eliasnaur 5 days ago

Another thought: Shiny has two X11 backends, the native (RENDER?) and the EGL. Alessandro, if you haven't done so already, could you switch to the EGL Shiny backend and see whether the input latency approaches that of Gio?

~eliasnaur 5 days ago

~db47h: the crucial difference is that Wayland tells us when it is time for another frame through the frame callback. See https://emersion.fr/blog/2018/wayland-rendering-loop/.

Basically, Wayland is able to shave a v-sync frame off the input latency by delaying the start of the next frame (16.7-X) milliseconds, where X is the (estimated) time to render (and composite!) a frame. It's during that delay additional input events have a chance to arrive.

It's pretty cool; on my system the Wayland input latency from Alessandro's test is almost as low as running without v-sync.

~eliasnaur 5 days ago

~db47h: what's odd about Alessandro's setup is that the CPU timings report > 20 ms per frame, which indicates that his GPU has queued up more than one frame. That alone can explain the extra input lag compared to Shiny. I tried to emulate the Wayland frame callback delay by time.Sleeping before handling events in the X11 backend, to no avail. The new approach is to eliminate the extra queued frame through a call to glFinish after eglSwapBuffers.

~db47h 5 days ago

Yes, these timings are odd. Alessandro, is it only Intel graphics or some kind of dual GPU setup (Intel+NVIDIA for example)? Also, what's your screen's actual refresh rate? 20ms looks like 50Hz.

~eliasnaur I was looking at your changes in the x11 branch, and frameReady doesn't do anything: the Event(system.FrameEvent{}) call only returns after the gl buffers are swapped (at least for me), so notifying the driver early is a NOOP. But don't drop the idea! It would be nice if the render loop could respond (g.results <- res) before eglSwapBuffers: no more delay imposed by v-sync (wouldn't help here anyway).

~eliasnaur 5 days ago

Ok, I managed to reproduce the lousy frame times, several > 20 ms. I previously tested on XWayland (through the Sway Wayland compositor), which reached 16.7 ms easily. With Cinnamon or just Gnome running in native Xorg mode (Fedora 31) Alessandro's erratic frame times are apparent.

My theory is that Xorg is compositing with v-sync, so adding v-sync to our drawing leads to an extra frame (or half a frame?) of extra latency.

I don't know what the correct fix is. Running without v-sync leads us to second guess the appropriate refresh rate, and even if we do guess correctly, our timer is bound to drift compared to the actual refreshes and variable compositing delay.

~db47h 5 days ago

My theory is that Xorg is compositing with v-sync, so adding v-sync to our drawing leads to an extra frame (or half a frame?) of extra latency.

That makes sense and it's happening with gnome-shell/mutter as well, only more sporadically. If you draw a large rotating square, you'll see frame drops every once in a while. Go fullscreen and it's gone.

That's partly why I started with a timer based event loop w/o vsync, but then again, that's second guessing what the frame rate should be.

Anyhow I don't think Gio is the only framework impacted by this on Cinnamon.

~db47h 5 days ago

Oh wait. You said

just Gnome running in native Xorg mode (Fedora 31) Alessandro's erratic frame times are apparent

You mean gnome-shell/mutter ?

~db47h 5 days ago

Another thing that could cause issues is that handleEvents() does not return while there are still xevents to be processed (that's why it doesn't send draw events by itself). In the early implementations of the driver, I had counter measures in place, but after some proper profiling, it turned out that in a worst case scenario it always returned after at most 10 events or so. Would your environments behave differently?

Alessandro Arzilli 2 days ago

On Fri, Nov 08, 2019 at 01:26:56PM -0000, ~eliasnaur wrote:

I pushed the ProfileOp fix so that it won't continuously redraw.

I'd like to know the profiling timings, in particular whether they're > 20ms. If they are, please try to tweak the time.Sleep from the x11 hack branch to see whether that helps decrease the frame time.

Apologies for taking so long to respond, my spare time was being consumed by a delve bug. The x11-vsync-hacks branch makes things generally better but also produces a strange behavior: immediately after I open the window, or immediately after I alt tab to it, the tot time is sub-1ms (around 700μs) and perceived lag is very low; after a while the tot time jumps up to 7~8ms and the perceived lag increases. If I change the sleep from 10ms to 20ms this doesn't happen (tot stays low).

Alessandro Arzilli 2 days ago

On Fri, Nov 08, 2019 at 02:29:51PM -0000, ~db47h wrote:

Yes, these timings are odd. Alessandro, is it only Intel graphics or some kind of dual GPU setup (intel+NVIDIA for example) ?.

Nope, just intel.

Also, what's your screen's actual refresh rate? 20ms looks like 50Hz.

xrandr says 60Hz:

Screen 0: minimum 8 x 8, current 2560 x 1440, maximum 32767 x 32767
eDP1 connected primary 2560x1440+0+0 (normal left inverted right x axis y axis) 310mm x 170mm
   2560x1440     60.00*+

Alessandro Arzilli 2 days ago

On Fri, Nov 08, 2019 at 03:08:51PM -0000, ~db47h wrote:

My theory is that Xorg is compositing with v-sync, so adding v-sync to our drawing leads to an extra frame (or half a frame?) of extra latency.

That makes sense and it's happening with gnome-shell/mutter as well, only more sporadically. If you draw a large rotating square, you'll see frame drops every once in a while. Go fullscreen and it's gone.

That's partly why I started with a timer based event loop w/o vsync, but then again, that's second guessing what the frame rate should be.

Anyhow I don't think Gio is the only framework impacted by this on Cinnamon.

Maybe I should reiterate that the problem also happens on an uncomposited openbox.

~db47h 2 days ago

The major difference between Gio and Shiny is that Gio uses Xlib's C API while Shiny uses a pure Go implementation of the newer XCB protocol (https://github.com/BurntSushi/xgb and https://github.com/BurntSushi/xgbutil). XCB has much better multithreading and lower latency overall (even for pure C applications) and to top it off, the Go implementation doesn't have a single bit of C code. That's very likely why you observe less lag in Shiny.

After digging a bit more on the vsync side of things it appears that ~eliasnaur's suggestion that the compositor and the app are somehow fighting over vsync is correct. Mainstream compositing window managers like Gnome-Shell and KWin support frame synchronization via the _NET_WM_SYNC_REQUEST and the very undocumented _NET_WM_FRAME_DRAWN EWMH hints. I intend to try and add support for these as this should help reduce lag on these WMs (i.e. most Linux users? here's hoping ;)

On Cinnamon, I don't know if that will help. There's a bug report (https://github.com/linuxmint/cinnamon/issues/8665) reporting that _NET_WM_SYNC_REQUEST is not implemented. I doubt it, however, since Cinnamon's WM (Muffin) is a fork of Mutter, which supports it (or it's a very old fork?). A quick way to test it is to run the following command:

xprop -root | grep ^_NET_SUPPORTED

As for OpenBox, that won't help at all because it's not compositing (it supports _NET_WM_SYNC_REQUEST though, but on its own it only helps reduce flickering while resizing). And I really can't figure out what's happening there, unless you're running Compton.

The sleep time in the x11-vsync-hacks branch should be less than 16ms. Anything more and you end up skipping frames.

Alessandro Arzilli a day ago

On Mon, Nov 11, 2019 at 05:57:00PM -0000, ~db47h wrote:

On Cinnamon, I don't know if that will help. There's a bug report (https://github.com/linuxmint/cinnamon/issues/8665) reporting that _NET_WM_SYNC_REQUEST is not implemented. I doubt it, however, since Cinnamon's WM (Muffin) is a fork of Mutter, which supports it (or it's a very old fork?). A quick way to test it is to run the following command:

xprop -root | grep ^_NET_SUPPORTED

it is indeed not supported:

_NET_SUPPORTED(ATOM) = _NET_WM_NAME, _NET_CLOSE_WINDOW, _NET_WM_STATE, _NET_WM_STATE_SHADED, _NET_WM_STATE_MAXIMIZED_HORZ, _NET_WM_STATE_MAXIMIZED_VERT, _NET_WM_STATE_TILED, _NET_WM_DESKTOP, _NET_NUMBER_OF_DESKTOPS, _NET_CURRENT_DESKTOP, _NET_WM_WINDOW_TYPE, _NET_WM_WINDOW_TYPE_DESKTOP, _NET_WM_WINDOW_TYPE_DOCK, _NET_WM_WINDOW_TYPE_TOOLBAR, _NET_WM_WINDOW_TYPE_MENU, _NET_WM_WINDOW_TYPE_UTILITY, _NET_WM_WINDOW_TYPE_SPLASH, _NET_WM_WINDOW_TYPE_DIALOG, _NET_WM_WINDOW_TYPE_DROPDOWN_MENU, _NET_WM_WINDOW_TYPE_POPUP_MENU, _NET_WM_WINDOW_TYPE_TOOLTIP, _NET_WM_WINDOW_TYPE_NOTIFICATION, _NET_WM_WINDOW_TYPE_COMBO, _NET_WM_WINDOW_TYPE_DND, _NET_WM_WINDOW_TYPE_NORMAL, _NET_WM_STATE_MODAL, _NET_CLIENT_LIST, _NET_CLIENT_LIST_STACKING, _NET_WM_STATE_SKIP_TASKBAR, _NET_WM_STATE_SKIP_PAGER, _NET_WM_WINDOW_TILE_INFO, _NET_WM_ICON_NAME, _NET_WM_ICON, _NET_WM_ICON_GEOMETRY, _NET_WM_MOVERESIZE, _NET_ACTIVE_WINDOW, _NET_WM_STRUT, _NET_WM_STATE_HIDDEN, _NET_WM_STATE_FULLSCREEN, _NET_WM_PING, _NET_WM_PID, _NET_WORKAREA, _NET_SHOWING_DESKTOP, _NET_DESKTOP_LAYOUT, _NET_DESKTOP_NAMES, _NET_WM_ALLOWED_ACTIONS, _NET_WM_ACTION_MOVE, _NET_WM_ACTION_RESIZE, _NET_WM_ACTION_SHADE, _NET_WM_ACTION_STICK, _NET_WM_ACTION_MAXIMIZE_HORZ, _NET_WM_ACTION_MAXIMIZE_VERT, _NET_WM_ACTION_CHANGE_DESKTOP, _NET_WM_ACTION_CLOSE, _NET_WM_STATE_ABOVE, _NET_WM_STATE_BELOW, _NET_STARTUP_ID, _NET_WM_STRUT_PARTIAL, _NET_WM_ACTION_FULLSCREEN, _NET_WM_ACTION_MINIMIZE, _NET_FRAME_EXTENTS, _GTK_SHOW_WINDOW_MENU, _NET_REQUEST_FRAME_EXTENTS, _NET_WM_USER_TIME, _NET_WM_STATE_DEMANDS_ATTENTION, _NET_MOVERESIZE_WINDOW, _NET_DESKTOP_GEOMETRY, _NET_DESKTOP_VIEWPORT, _NET_WM_USER_TIME_WINDOW, _NET_WM_ACTION_ABOVE, _NET_WM_ACTION_BELOW, _NET_WM_STATE_STICKY, _NET_WM_FULLSCREEN_MONITORS, _NET_WM_STATE_FOCUSED, _NET_WM_BYPASS_COMPOSITOR, _NET_WM_FRAME_DRAWN, _NET_WM_FRAME_TIMINGS, _NET_WM_XAPP_ICON_NAME, _NET_WM_XAPP_PROGRESS, _NET_WM_XAPP_PROGRESS_PULSE, _GTK_FRAME_EXTENTS, _GTK_SHOW_WINDOW_MENU

I imagine they forked before it was added...

As for OpenBox, that won't help at all because it's not compositing (it supports _NET_WM_SYNC_REQUEST though, but on its own it only helps reduce flickering while resizing). And I really can't figure out what's happening there, unless you're running Compton.

No compton, or any other compositors as far as I can tell.

The sleep time in the x11-vsync-hacks branch should be less than 16ms. Anything more and you end up skipping frames.

~db47h a day ago

Thanks for checking Alessandro. What's interesting is that Cinnamon supports _NET_WM_FRAME_DRAWN and _NET_WM_FRAME_TIMINGS! The first one is what we need to sync drawing with the WM. So I read the docs again and it turns out that it's not mandatory for the WM to advertise _NET_WM_SYNC_REQUEST, only _NET_WM_FRAME_DRAWN. There's still hope for Cinnamon :)
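For anyone who wants to script the check rather than read the list by eye, a minimal Go sketch follows. It only parses text in the format printed by `xprop -root | grep ^_NET_SUPPORTED`; the helper name `supportsAtom` and the shortened sample line are made up for this example, and nothing here talks to the X server.

```go
package main

import (
	"fmt"
	"strings"
)

// supportsAtom reports whether a "_NET_SUPPORTED(ATOM) = A, B, C" line, as
// printed by xprop, lists exactly the given atom name.
func supportsAtom(xpropLine, atom string) bool {
	i := strings.Index(xpropLine, "=")
	if i < 0 {
		return false // not a property line we understand
	}
	for _, a := range strings.Split(xpropLine[i+1:], ",") {
		if strings.TrimSpace(a) == atom {
			return true
		}
	}
	return false
}

func main() {
	// Shortened sample; a real line lists many more atoms.
	line := "_NET_SUPPORTED(ATOM) = _NET_WM_NAME, _NET_WM_FRAME_DRAWN, _NET_WM_FRAME_TIMINGS"
	for _, atom := range []string{"_NET_WM_SYNC_REQUEST", "_NET_WM_FRAME_DRAWN"} {
		fmt.Printf("%s supported: %v\n", atom, supportsAtom(line, atom))
	}
}
```

Matching whole, trimmed names avoids false positives from atoms that merely share a prefix.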

OpenBox... I did test on OpenBox 3.6.1 on Ubuntu 18.04, nvidia drivers and I get a solid ~16.5 ms per frame, with the occasional glitch (short enough that it's impossible to read the actual frame time). There's definitely input lag, but not worse than with Gnome Shell.

I wonder what kind of frame rate glxgears reports (I think it's in the mesa-demos package on Arch). You might want to try both windowed and fullscreen.

~eliasnaur, you said you can reproduce this, is it with an Intel GPU as well?

~eliasnaur 22 hours ago

~eliasnaur, you said you can reproduce this, is it with an Intel GPU as well?

Yes.

~kaey 21 hours ago

I can also reproduce this problem on amdgpu and i3wm with gio v0.0.0-20191111125045-2e0406802b84. The tot and draw times are a stable 16ms without drops. If I export vblank_mode=0, draw times drop to 100us and the lag is gone, but tearing is introduced.

As a side note, (almost) every shooter game suffers from input lag with vsync enabled; the general solution is to disable vsync and set an fps limit of 120.

~db47h 19 hours ago

Thanks for your input ~kaey. Fullscreen applications (like shooters) have the easy option to bypass the compositor by setting a single window property, thus reducing lag even with vsync, which can be further reduced by using XCB where available (everywhere these days?) instead of xlib. Windowed applications are another story, and unfortunately disabling vsync in Gio is not an option (think CPU usage & battery drain on laptops).

~db47h 19 hours ago

BTW, on a 60 Hz display, draw times of about 16ms (1/60s) are what we are looking for. Anything less and we're drawing more frames than the display can actually render (and wasting energy), anything higher and we're skipping frames (jitter).

~eliasnaur 18 hours ago

On Tue Nov 12, 2019 at 11:08 PM ~db47h wrote:

BTW, on a 60 Hz display, draw times of about 16ms (1/60s) are what we are looking for. Anything less and we're drawing more frames than the display can actually render (and wasting energy), anything higher and we're skipping frames (jitter).

Actually, we'd like draw to be as close to 0 as possible, with v-sync enabled. Spending 16ms on a frame that takes, say, 1 ms to draw is wasting 15ms in eglSwapBuffers that could have been spent processing additional input, lowering input latency.

That's what the frame callback is for in Wayland, and the hacky time.Sleep I did as an experiment: delay the start of the frame as long as possible to catch late input.

That said, we're probably not going to get better than 16ms on X. What I'm worried about is the > 20ms erratic frame times that Alessandro and I experienced.

-- elias

~kaey 14 hours ago

Fullscreen applications (like shooters) have the easy option to bypass the compositor

i3 is not a compositor so it's not a problem.

thus reducing lag even with vsync, which can be further reduced by using XCB where available

Vsync is problematic for games on Windows as well; it isn't an Xorg-specific problem. I don't know any games that run on Wayland AND use the frame callback described by Elias, so I can't test there. Gio is not for games of course, so some lag is fine, but I just tried moving a box in Firefox on https://draw.io and the lag is comparable to gio on sway. It may be worth examining what FF is doing (with an OpenGL debugger maybe).

disabling vsync in Gio is not an option (think CPU usage & battery drain on laptops).

vsync is about tearing, fps limit is about cpu usage. People use 240Hz monitors; is it worth running gio animations at 240 fps? An fps limit should be present regardless of the vsync setting.

~eliasnaur 10 hours ago

On Wed Nov 13, 2019 at 4:22 AM ~kaey wrote:

Gio is not for games of course so some lag is fine, but I just tried moving box in firefox on https://draw.io and lag is comparable to gio on sway. It may be worth examining what FF is doing (with opengl debugger maybe).

The existence of this issue indicates input latency is important outside games.

disabling vsync in Gio is not an option (think CPU usage & battery drain on laptops).

vsync is about tearing, fps limit is about cpu usage. People use 240Hz monitors; is it worth running gio animations at 240 fps? An fps limit should be present regardless of the vsync setting.

If I have a 240Hz monitor, I'd like Gio to draw at 240Hz. What we don't want is drawing frames that end up discarded because v-sync is off and we're drawing faster than the monitor can display.

Alessandro Arzilli 5 hours ago

On Tue, Nov 12, 2019 at 05:16:14PM -0000, ~db47h wrote:

Thanks for checking Alessandro. What's interesting is that Cinnamon supports _NET_WM_FRAME_DRAWN and _NET_WM_FRAME_TIMINGS! The first one is what we need to sync drawing with the WM. So I read the docs again and it turns out that it's not mandatory for the WM to advertise _NET_WM_SYNC_REQUEST, only _NET_WM_FRAME_DRAWN. There's still hope for Cinnamon :)

OpenBox... I did test on OpenBox 3.6.1 on Ubuntu 18.04, nvidia drivers and I get a solid ~16.5 ms per frame, with the occasional glitch (short enough that it's impossible to read the actual frame time). There's definitely input lag, but not worse than with Gnome Shell.

I wonder what kind of frame rate glxgears reports (I think it's in the mesa-demos package on Arch). You might want to try both windowed and fullscreen.

It says it's running synchronized to the vertical refresh and reporting almost exactly 60fps. This happens in window and fullscreen mode either with openbox or cinnamon.

Alessandro Arzilli 5 hours ago

On Wed, Nov 13, 2019 at 07:43:54AM -0000, ~eliasnaur wrote:

If I have a 240Hz monitor, I'd like Gio to draw at 240Hz.

I think this is very dependent on the kind of application but I think that for many GUI applications drawing 240 times per second just because the user is moving the mouse over it is a big waste of resources.

What we don't want is drawing frames that end up discarded because v-sync is off and we're drawing faster than the monitor can display.

~eliasnaur 4 hours ago

On Wed Nov 13, 2019 at 12:27 PM Alessandro Arzilli wrote:

On Tue, Nov 12, 2019 at 05:16:14PM -0000, ~db47h wrote:

Thanks for checking Alessandro. What's interesting is that Cinnamon supports _NET_WM_FRAME_DRAWN and _NET_WM_FRAME_TIMINGS! The first one is what we need to sync drawing with the WM. So I read the docs again and it turns out that it's not mandatory for the WM to advertise _NET_WM_SYNC_REQUEST, only _NET_WM_FRAME_DRAWN. There's still hope for Cinnamon :)

OpenBox... I did test on OpenBox 3.6.1 on Ubuntu 18.04, nvidia drivers and I get a solid ~16.5 ms per frame, with the occasional glitch (short enough that it's impossible to read the actual frame time). There's definitely input lag, but not worse than with Gnome Shell.

I wonder what kind of frame rate glxgears reports (I think it's in the mesa-demos package on Arch). You might want to try both windowed and fullscreen.

It says it's running synchronized to the vertical refresh and reporting almost exactly 60fps. This happens in window and fullscreen mode either with openbox or cinnamon.

Interesting. If it's not too much trouble, can you try the EGL variant to rule out glx/egl differences? I found

https://gist.github.com/tmpvar/445146

but I haven't tried it myself.

~eliasnaur 4 hours ago

On Wed Nov 13, 2019 at 12:40 PM Alessandro Arzilli wrote:

On Wed, Nov 13, 2019 at 07:43:54AM -0000, ~eliasnaur wrote:

If I have a 240Hz monitor, I'd like Gio to draw at 240Hz.

I think this is very dependent on the kind of application but I think that for many GUI applications drawing 240 times per second just because the user is moving the mouse over it is a big waste of resources.

What we don't want is drawing frames that end up discarded because v-sync is off and we're drawing faster than the monitor can display.

What else is a 240 Hz monitor good for, if not reduced input latency? Of course, 240 Hz is the limit: a clever program could only draw at, say, 60 Hz but switch to the maximum 240 Hz when input arrives.

Alessandro Arzilli 4 hours ago

On Wed, Nov 13, 2019 at 01:50:06PM -0000, ~eliasnaur wrote:

Interesting. If it's not too much trouble, can you try the EGL variant to rule out glx/egl differences? I found

https://gist.github.com/tmpvar/445146

but I haven't tried it myself.

Doesn't compile anymore, most of the functions it uses don't exist on the version of egl I have installed.

Alessandro Arzilli 4 hours ago

On Wed, Nov 13, 2019 at 01:51:53PM -0000, ~eliasnaur wrote:

What else is a 240 Hz monitor good for, if not reduced input latency?

Sure, but IMO for most GUI programs, past a certain point, it doesn't matter. To be honest I'm not even sure 240Hz matters for games.

~db47h 2 hours ago

As part of implementing _NET_WM_FRAME_DRAWN support, I've rewritten the X11 event handling loop to match Wayland's:

https://github.com/db47h/gio/commit/b5700276dcfd2a0268fe6698eef36692ae84c4a4

Since _NET_WM_FRAME_DRAWN support is not in yet, it feels laggier than the current implementation on a composited window manager (this is expected). As it is, it will however be the default on WMs not supporting _NET_WM_FRAME_DRAWN.

Could you guys give it a try and see how it affects all of you? VSync is still on by default, I'd suggest also disabling it and monitoring CPU usage while interacting with the app (in egl_x11.go, have x11Window.needVSync return false).

~kaey 10 minutes ago

Could you guys give it a try and see how it affects all of you?

This is without compositor. With vsync enabled lag is the same, but movement is choppier than on master branch. With vsync disabled result is the same as on master - no lag and redraws at 2k fps.
