Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wait_event: Sdl.wait_event implements power saving since SDL 2.0.22 #15

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

edwintorok
Copy link
Contributor

See libsdl-org/SDL@0dd7024

The wiki has also been updated and doesn't mention the lack of power saving anymore.

Use Sdl.wait_event_timeout instead of Sdl.poll_event, and compute the timeout based on the next Timeout action, clamped to the [1, 1000] ms interval so that mouse at rest events can still be processed (needed for tooltips).

Always process check_mouse_atrest after waiting for an event, whether we received one or not:

  • mouse movements will generate mouse events and wake up Sdl.wait_event_timeout.
  • if we receive no events then eventually we'll generate the mouse_at_rest event. Could've used SDL timers, but they are not bound by Tsdl.

Keep the Thread.delay 0.01 on recursion to protect against very short timeouts or a failing Sdl.wait_event_timeout
(which internally falls back to polling if waiting is not supported by the platform).

Confirmed via 'strace' that 'boguex 12' is idle now, waking up only once a second, and tooltips in 'boguex 44' still work.
(unfortunately 'boguex 44' spawns a 2nd thread which enables a ~50ms timer in the OCaml runtime to switch between them)

@edwintorok
Copy link
Contributor Author

This could potentially be further optimized, so that we wait for the '1s' only once, then emit the mouse at rest event, and then don't wake up until we either have a scheduled timeout or we get a mouse moved event.

@edwintorok
Copy link
Contributor Author

I've implemented the optimization too, now 'boguex 44' sleeps properly, only waking up once after 1s to process the tooltip event and then sleeping indefinitely until next event.

@edwintorok
Copy link
Contributor Author

The opensuse CI failure is from tsdl, not this library, and appears to be due to a very old version of SDL2 in OpenSUSE Leap (2.0.8) which is missing SDL_DISPLAYEVENT:
https://ci.ocamllabs.io/github/sanette/bogue/commit/c05d874d5bc217625abdf1fe8c73dc8acd0a8986/variant/opensuse-15.3-5.0_opam-2.1

@sanette
Copy link
Owner

sanette commented Feb 5, 2023

thank you very much! I am currently quite busy and won't be able to review this probably before next week.
Just a first comment: my current ubuntu 22.04 ships with sdl 2.0.20, so doesn't include the Sdl.wait_event fix; it might be useful to check the Sdl version in the code. IIRC the old Sdl.wait_event was really bad, preventing other threads to execute correctly.

@edwintorok
Copy link
Contributor Author

Yeah we could fall back to polling on older SDL versions.

@sanette
Copy link
Owner

sanette commented Feb 9, 2023

I think Id' be happy to merge this PR but could you please remove the part concerning dpi scale? It would be clearer to keep both problems separated

@edwintorok
Copy link
Contributor Author

edwintorok commented Feb 9, 2023

Thanks for the review, I removed the DPI change (sorry it leaked from the other branch) and added the SDL version check as discussed (looking closer the CPU friendly wait_event_timeout got introduced in SDL 2.0.16, so I used that as a minimum version to enable wait_event_timeout, otherwise fall back to polling and the old wait extra 0.01s between polls to be nicer to the CPU.

But actually your Ubuntu 22.04 should have a new enough SDL then (2.0.20 > 2.0.16) for this to work, but the version check is useful nonetheless (I do the version check once on startup).

@sanette
Copy link
Owner

sanette commented Feb 11, 2023

I like the use of Timeout, but we have a little problem:

CPU usage when idle CPU usage when moving mouse
Previous Bogue 1.3% 3.3%
New PR 0% 7 to 10%

the idle column is great, but moving mouse becomes expensive. Maybe that's because Timeouts use Mutex, and we are constantly creating many of them.

@edwintorok
Copy link
Contributor Author

maybe we get too many mouse events, there are various discussions and improvements on the SDL bug tracker, especially if you have a mouse that sends events at 1000Hz: kometbomb/klystrack#299
From my measurements we get ~100 mouse events/s (which is higher than our desired frame rate), I'll do some more investigation on where time is spent with flamegraphs.

'(lint-fmt)(failed: ocamlformat enabled but no .ocamlformat file!)'

ocamlformat is only enabled because it is enabled by default by Dune.
This project doesn't seem to use ocamlformat yet.

Signed-off-by: Edwin Török <edwin@etorok.net>
tsdl.0.9.6 is last one that would work with it, but this package needs
tsdl.0.9.7+.

See also ocaml/opam-repository#23235 to adjust
the availability condition of 'tsdl'.

bogue.opam is autogenerated, so use bogue.opam.template to declare
additional field ('dune-project' doesn't know about 'available').

Signed-off-by: Edwin Török <edwin@etorok.net>
See libsdl-org/SDL@0dd7024

The wiki has also been updated and doesn't mention the lack of power
saving anymore.

Use `Sdl.wait_event_timeout` instead of `Sdl.poll_event`,
and compute the timeout based on the next `Timeout` action,
clamped to the `[1, 1000] ms` interval so that mouse at
rest events can still be processed (needed for tooltips).

Always process `check_mouse_atrest` after waiting for an event,
whether we received one or not:
* mouse movements will generate mouse events and wake up `Sdl.wait_event_timeout`.
* if we receive no events then eventually we'll generate the
  mouse_at_rest event. Could've used SDL timers, but they are not bound
  by Tsdl.

Replace Thread.delay 0.01 with the Time.make_fps mechanism
to protect against very short timeouts or a failing `Sdl.wait_event_timeout`
(which internally falls back to polling if waiting is not supported by
the platform).

To rate limit events being produced too quickly (e.g. by a 1000Hz
mouse), add the call to 'fps' before calling wait_even_timeout.
As long as events are not produced too quickly (100Hz limit to match old
sleep duration) there will be no additional delays for event processing:
Bogue will wake up exactly when an event is received and will process it
immediately, only inserting the sleep when it runs out of events.

Confirmed via 'strace' that 'boguex 12' is idle now, waking up only once a
second, and tooltips in 'boguex 44' still work.
(unfortunately 'boguex 44' spawns a 2nd thread which enables a ~50ms timer in
the OCaml runtime to switch between them)

Signed-off-by: Edwin Török <edwin@etorok.net>
This avoids 'wait_event' having to wake up every second to check mouse
position:
* when the mouse moves it'll send and SDL event, and
  'check_mouse_at_rest' will get called, if position is different the
  timeout is canceled and a new 1s idle timer is started
* when the mouse is idle for 1 second then the timeout triggers, and
  pushes a mouse at rest event once. If no more events arrive the
  application will sleep until another event arrives, and won't wake up
  once a second anymore (there is no need: the mouse at rest event got
  delivered)

Signed-off-by: Edwin Török <edwin@etorok.net>
Avoids spawning separate thread for mouse enter/leave,
this allows 'boguex 44' to avoid waking up frequently due to the OCaml
timer which would be active when >1 thread is active in the program.

Signed-off-by: Edwin Török <edwin@etorok.net>
A timer action may add more timers, some of which may be in the past.
Ensure we do not return negative values here.

Signed-off-by: Edwin Török <edwin@etorok.net>
libsdl-org/SDL@0dd7024
is part of SDL 2.0.16.

Older versions are not very nice to the CPU, so fall back polling and
sleeping 0.01 inbetween.

Signed-off-by: Edwin Török <edwin@etorok.net>
Sometimes SDL can receive a lot of events (e.g. 1000Hz mouse),
and we want to limit the frame-rate when not animating.

Previously the limit would've been ~10ms between Sdl.poll_event + 5ms
from Thread.delay ~66 FPS.

When the events have no effect on the display (nothing is rendered,
and in fact Sdl.render_present will be skipped too) limit the processing
rate to 60 Hz to match typical monitor refresh rate.

When we do receive some useful events that change the display then use
just the old 5ms Thread sleep as a throttling mechanism, but do it after
we've rendered the frame to reduce input-to-display latency.

Signed-off-by: Edwin Török <edwin@etorok.net>
Signed-off-by: Edwin Török <edwin@etorok.net>
Time.adaptive_fps also has optional vsync support, off by default for
backwards compatibility.

If we cannot keep up with requested FPS then try adaptive VSync if
supported by the platform, otherwise turn vsync off.

This will require a new version of Tsdl that releases the runtime lock
[not yet released].

Signed-off-by: Edwin Török <edwin@etorok.net>
@edwintorok
Copy link
Contributor Author

edwintorok commented Feb 12, 2023

I've reworked the approach here, there were basically 2 things missing:

  • previously there was almost always a 0.01s sleep between 'Sdl.poll_event' which effectively limited the mouse motion event rate to 100Hz (once all events got consumed).
    However some devices (e.g. mouse) can generate motion events at 1000 Hz (mine only does 125Hz though, but perhaps your is different?). https://cpstesters.com/mouse-rate-checker/ can be used to check (or other tools like 'evhz'). I used 'xdotool' under 'X11' to generate lots of mouse events: while xdotool $(echo 'search BOGUE'; for i in $(seq 1 10000); do echo 'mousemove --window %1 10 10'; echo 'mousemove --window %1 100 100'; done|xargs echo); do echo -n; done

  • some of these events were complete no operations: 'is_fresh' was still saying 'true'. So add an additional FPS limiter (to 60 FPS, although we could potentially go even lower) when we get these nop events (mouse motion that has no effect on rendered area). If an application does actually draw something based on mouse motion then of course this won't kick in

  • vsync was not used at all, and in fact using vsync in its current form is not advisable, it will need a newer 'Tsdl' that releases the runtime lock on 'Sdl.render_present'. See Sdl: better support for vsync (release runtime lock during Sdl.render_present) dbuenzli/tsdl#90. I've included it in this PR for discussion/testing purposes for now, but I'll likely remove it and put it in a separate PR. However when vsync is used then we don't need to do the delay in the application (best let the OS time it so we don't miss the next vsync), except for the 5ms safety delay that protects against OCaml code overloading the process/the system.

  • I've also added a function to query SDL for the refresh rate. 60 Hz is all well and good for my monitors, but there are 144Hz monitors too, and that is not an integer multiple of 60.

I'm hoping that these changes would eventually restore the original CPU usage on your platform, but let me do some further testing on my machines first (I've just pushed a commit so you can have a look meanwhile if you want).

Thanks for bearing with me on this PR.

@edwintorok edwintorok marked this pull request as draft February 12, 2023 21:54
@@ -31,7 +31,7 @@ depends: [
"tsdl-image" {>= "0.3.0"}
"tsdl-ttf" {>= "0.3"}
"ocaml" {>= "4.08.0"}
"tsdl" {>= "0.9.7" & < "0.9.9"}
"tsdl" {> "0.9.9"}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you mean >= 0.9.9

@sanette
Copy link
Owner

sanette commented Feb 16, 2023

until now I have refrained from using a more recent version of tsdl because they implicitly require a recent version of SDL2. And last time I checked, this meant than many OS distributions were not able to build, because their default SDL was not that recent.

@edwintorok
Copy link
Contributor Author

I can understand the tension between wanting to support old systems and taking advantage of bugfixes/new features in later versions of SDL, both are desirable.
I'll try to see whether it is possible to use conditional compilation to still support old versions of 'tsdl', and expose (or automatically use) new features when new versions are available (e.g. we cannot use 'vsync' with old versions of 'tsdl' because they do not release the runtime lock on Sdl.render_present).

Conditional compilation is usually problematic with editor support, I'll try to find something that fits within regular OCaml syntax ('ppx_optcomp' seems to use just regular OCaml extension points, so even an editor without special support for it would work and it requires both sides of the conditional to be otherwise valid OCaml, which is good. It only has builtin support for conditionals based on ocaml version, but custom variables can be defined in imported files, so it might be possible to do something here with dune). Or dune's 'select...from' might be an alternative, but that is for whole files.
In the .mli we could then add '@SInCE' annotations to say which feature requires what minimum version of tsdl.
I'll do some experiments and propose something in a separate PR, and then remove the vsync from this one.

@edwintorok
Copy link
Contributor Author

Here is a draft example of how conditional compilation could work (note that this compares versions as strings, which is not ideal, but with a few more dune rules it should be possible to split that into tsdl_major/minor/patch):

==> dune <==
(executable
  (name vtest)
  (libraries tsdl)
  (preprocess (pps ppx_optcomp -- -cookie "ppx_optcomp.env=env ~tsdl_version:(Defined \"%{version:tsdl}\")"))
)

==> dune-project <==
(lang dune 2.9)

==> vtest.ml <==
let () = print_endline ""
[%%if show(tsdl_version) >= "v0.9.9"]
let () = print_endline "new TSDL"
[%%else]
let () = print_endline "old TSDL"
[%%endif]

let () =
    let major,minor,patch = Sdl.get_version () in
    Printf.printf "SDL version: %d.%d.%d\n" major minor patch

@sanette
Copy link
Owner

sanette commented Aug 5, 2023

I merged your PR into the "power_saving" branch. It works quite nicely! However I did have an issue:
when running boguex 34 35 36 37 then examples 34 35 36 work fine, but at the turn of example 37, there is a very large pause (several seconds) before painting the window. This is reproducible: I was able to get the logs with

BOGUE_DEBUG=true BOGUE_LOG_TO_FILE=true boguex 34 35 36 37

and here is a section that seems relevant:
(see the line Graphics: FPS:100 (round_trip=9223372036854758799))

[129][0] :	 Board: draw box
[129][0] :	 Graphics: clear=true
[129][0] :	 Graphics: flip layout #2522
[129][0] :	 Graphics: Clear canvas
[129][0] :	 Graphics: Render layers
[129][0] :	 Graphics: Present
[131][0] :	 Graphics: ==> Rendering took 2 ms
[131][0] :	 Graphics: ---------- end of loop -----------
[136][0] :	 Board: Cancelling Timeout 150
[136][0] :	 Board: Adding timeout 151
[136][0] :	 Graphics: FPS:100 (round_trip=9223372036854758799)

[17156][0] :	 Event: Event loop 0
[17156][0] :	 Event: 1==> Filtering event type: 0x200 (window_event)
[17156][0] :	 Event: 2==> Treating event type: 0x200 (window_event)
[17156][0] :	 Event: Window event [3]
[17156][0] :	 Event: Exposed

@@ -878,10 +878,11 @@ let event_loop anim new_anim board =
let e = !Trigger.my_event in
continue e 0

let nop_event_fps = Time.make_fps ()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function should be renewed at each run, otherwise the start hidden variable becomes meaningless


let no_timeout () = -1

let poll_noevent_fps = B_time.make_fps ()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function should be renewed at each run, otherwise the start hidden variable becomes meaningless

@@ -1291,6 +1302,8 @@ let init ?window ?(name="BOGUE Window") ?fill ?x ?y ~w ~h () =
end;

printd debug_graphics "Canvas created";
if not (Theme.get_bool "NO_VSYNC") then
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one should add the default value of NO_VSYNC to Theme.default_vars

@@ -985,6 +991,17 @@ let create ?shortcuts ?(connections = []) ?on_user_event windows =
mouse_alive = false;
on_user_event }

let get_monitor_refresh_rate board =
Option.bind Layout.(window_opt board.windows_house) @@ fun win ->
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

window_opt board.windows_house will always return None because it is not associated to any physical window.
The problem is that SDL (and bogue) allows to work with several windows. A workaround would be to select the first element of the board.windows

@edwintorok
Copy link
Contributor Author

This is reproducible: I was able to get the logs with

BOGUE_DEBUG=true BOGUE_LOG_TO_FILE=true boguex 34 35 36 37

and here is a section that seems relevant: (see the line Graphics: FPS:100 (round_trip=9223372036854758799))

Yes, I can reproduce that large number (that looks like an overflow of a negative number), but only with those 4 tests in sequence, removing e.g. '36' doesn't reproduce it anymore, and neither does '36 37' on its own.

A workaround is to guard against negative time, and this avoids the excessively large 'round_trip' and the unnecessarily large 'wait':

diff --git a/lib/b_time.ml b/lib/b_time.ml
index 1a57c268e4..3c91048a1d 100644
--- a/lib/b_time.ml
+++ b/lib/b_time.ml
@@ -46,9 +46,9 @@ let make_fps ?(min_delay=5) () =
   fun fps ->
     if !start = 0 then (delay min_delay; start := now ())
     else
-      let round_trip = now () - !start in begin
+      let round_trip = max 0 (now () - !start) in begin
         let wait = max min_delay ((1000 / fps) - round_trip) in
-        printd debug_graphics "FPS:%u (round_trip=%u)\n" (1000 / (round_trip + wait)) round_trip;
+        printd debug_graphics "FPS:%u (round_trip=%d, wait = %d)\n" (1000 / (round_trip + wait)) round_trip wait;
         if wait > 0 then
           delay wait;
         start := now ();

Although there is probably a more fundamental bug here (race condition if we use multiple threads, or somehow resetting SDL's internal clock?). Time should never go backwards (in the past I've seen this due to unsynchronized TSC, but that shouldn't happen with modern CPUs anymore, and if it was that then it wouldn't be this reliably reproducible in the exact same spot every time). Somehow resetting SDL's internal clock sounds more plausible....

@edwintorok
Copy link
Contributor Author

edwintorok commented Aug 5, 2023

Although there is probably a more fundamental bug here (race condition if we use multiple threads, or somehow resetting SDL's internal clock?). Time should never go backwards (in the past I've seen this due to unsynchronized TSC, but that shouldn't happen with modern CPUs anymore, and if it was that then it wouldn't be this reliably reproducible in the exact same spot every time). Somehow resetting SDL's internal clock sounds more plausible....

Calling so many examples causes Sdl.quit and Sdl.init to be called multiple times, also '36' is different from the others in that it initialises just the audio subsys but then quits everything (instead of just audio).

I pushed some commits to this branch which seems to fix the underlying problem too: 'start' needs to be reinitialized whenever SDL is reinitialized https://github.com/edwintorok/bogue/commits/fixwrap. I'm not entirely happy with relying on 'at_cleanup' to do this, it should instead be done in the (re)initialization code of bogue. I'll look into doing it more nicely later, meanwhile can you try whether the commits from my branch fix the delay on your system too?

@sanette
Copy link
Owner

sanette commented Aug 6, 2023

Although there is probably a more fundamental bug here

the culprit is probably simply poll_noevent_fps which should be re-inititalized at each run.
To fix this, instead of at_cleanup one could use the same strategy as adaptive_fps: return a "start" function,
and call this function in Main.run

@sanette
Copy link
Owner

sanette commented Aug 6, 2023

see
896b3fa
this seems to work for me

@@ -50,3 +50,4 @@ build: [
]
]
dev-repo: "git+https://github.com/sanette/bogue.git"
available: [ (os-distribution != "opensuse-leap" | os-version >= 16) ]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is good for avoiding CI errors, but what if an OpenSuse user decides to manually install a recent version of SDL2?

@sanette
Copy link
Owner

sanette commented Jan 1, 2024

I'll do some experiments and propose something in a separate PR, and then remove the vsync from this one.

I moved this to the 'vsync' branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants