-
Notifications
You must be signed in to change notification settings - Fork 869
Various of fixes for the HeadlessMode and more #8011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
992b4c8
to
d26c399
Compare
It seems my SDL event processing fix needs some massagging for SDL1. I'm looking into the SDL1.2 |
Probably that there is no actual limitation as it's just a 32-bit integer in the end of the day, and we don't use Let's reduce the number of custom events by putting the actual event identifier into |
Right, that what I thought should be used for sdl 1.2, but there is no such code in DevilutionX, thus my confusion. I'll try to make a few tests in order to prove / disprove that events with types >= 32 can be missed on the sdl 1.2 before going deeper in these changes. |
Oh, it's even worse, than just missing events. C17 standard (I assume same for all others, including CPP) says the following for bit shift: 6.5.7 Bitwise shift operators If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined. And here we are "lucky" that at least on x86 Also, Intel's manual[1] states that the results are undefined when cnt is greater than the operand size, but at least for 32- and 64-bit data sizes it has been observed that shift operations are performed by (cnt mod n), with n being the data size. So on x86 regardless of bits count the value is wrapped:
But on other platforms this is undefined and if 0 is returned (as I expected from the very beginning), then we get missing events on SDL 1.2 |
It might help explain this. DevilutionX/Source/interfac.cpp Lines 43 to 48 in e57a2d7
We should just fix it as glebm described, and then I can test on 3DS. |
Great there is an issue which can be explained by this. I was scratching my head, thinking it couldn't go unnoticed.
Awesome. I will rebase then on top of your PR. |
d26c399
to
66921e0
Compare
@StephenCWills @glebm thanks folks for helping me resolve the SDL1.2 issue. Now with only one type of custom event Am I correct that there is no full |
There isn't one but it'd be good to add it to the existing SDL1 Linux workflow |
Hey folks, just a quick reminder to review this PR when you get a chance! Thanks. |
Anything else to address from my side or this can be merged? |
Looks good to me, just waiting for another maintainer to also have a look and merge! /cc @StephenCWills @AJenbo |
Thanks @glebm for reviewing that. |
Source/interfac.cpp
Outdated
// We call `CustonEventHandler.poll` directly instead of | ||
// calling the `FetchMessage` to process real events rather | ||
// than the recorded ones in demo mode. | ||
while (CurrentEventHandler.poll(&event)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you have a look at #7712? It appears to me that you've replaced every call to PollEvent()
with SDL_PollEvent()
, which I expect will reintroduce this bug for a second time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I accidentally replaced PollEvent()
here and in FetchMessage_Real()
with SDL_PollEvent
or SDL_PeepEvent()
. I missed that PollEvent()
also handles controller states and motion by calling:
UnlockControllerState(*event);
ProcessControllerMotion(*event);
Good catch, thanks.
I updated the code by splitting the commit in question on two for simpler review:
- The first one splits the
EventHandler
into the actual handler and poller callback. This commit should not change the logic of event processing and is just preparation for the next commit. ThePollEvent()
calls frominput.h
should be kept as they are. - The second commit introduces
SDL_PeepEvent()
instead ofSDL_PollEvent()
for all the message handlers which are called from theShowProgress()
routine. This is actually a fix for the problem described.
The main idea is that ShowProgress()
handles only custom events and SDL_QUIT
, leaving other events in the queue for future processing. PollEvent()
is replaced with PollEventCustom()
, which performs the same input processing but uses the SDL_PeepEvent()
call for polling.
Please, take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to leave events in the queue rather than process and ignore most of them? Wouldn't it be odd if I pressed some keys in the loading screen and suddenly they got processed all at once once I get in-game?
Perhaps we only want to leave certain types of events in the queue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I encountered a nasty problem of lost keystroke events when a hero warps to another location (I assume there are other, not only warping). In my case these were save/load/pause keystrokes, which were dropped while being processed in the ShowProgress() routine.
Unfortunately once the event injected to the queue, I don't have a clue from outside, was a keystroke dropped or not yet processed. This breaks a lot of things in the training procedure and lead to an agent hang waiting for the key to be completed.
Wouldn't it be odd if I pressed some keys in the loading screen and suddenly they got processed all at once once I get in-game?
Well, this is of course arguable, but this is the behavior I expect. To be frank, I do not see this would really break the human experience - the game load time is not as it was in 90th - almost instance, I doubt anyone will really notice anything unusual if any prior event executes after a hero warps to another level. But a lost key breaks the automation, that's for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps automation should not be sending keystrokes during loading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once any SDL event lands in the queue between steps 2 and 3, it will be lost. This logic is demonstrated by my snippet.
You're right that human input is asynchronous and delayed at many stages from perception to motor action to event dispatch. But regardless of a delay along the way, the final destination of the event is a queue, and what matters is the actual moment when an event added to the queue.
No, I maintain that the moment when an event is queued does not matter. Rather, what I mean is, it is completely irrelevant to the human experience. The queue merely dictates the order in which those events are processed, nothing more. This ordering of events does not promise anything about the order in which external actions actually occurred except that it is accurate "within reason", mostly defined by how accurately a human can perceive the ordering of the physical actions that trigger those events.
To illustrate what I mean, consider that what we are talking about is a race condition between a key press and an in-game event. The former is triggered by physical action, and the moment it occurs is perceived by the user based on the tactile and auditory feedback they get from pressing the key on the keyboard. The latter is subject to the game's frame rate and CPU speed, and the moment that it occurs is perceived by the user through processing the visual signals coming from the display. These physical processes are completely independent, and the "true order in which they occurred" cannot be properly perceived by a human. That is why the human player can't tell the difference between a key press that gets queued between 2 and 3 and a keypress that gets queued after 3. Whether we change the behavior of the logic that processes events from the queue or not, the human experience does not change.
So the problem does affect human players, just not consistently or predictably, which arguably makes it worse, since these bugs are the hardest to diagnose. That's why I believe it's worth addressing.
I don't know what bugs you're talking about. Processing events in the order in which they are queued is not a bug. That's how the event queue is supposed to work.
My agent experienced the problem with lost events and hanging in infinite wait. This PR ensures that no events are lost, and extensive RL training (I usually run up to a hundred instsances simultaneously on a machine, which of course increases the chance to hit an issue) does not reproduce an issue for months.
It is asynchronous, and the only guarantee needed is that no events are lost, which this PR addresses.
If it is asynchronous, then there can be no guarantee. Even if you change the cutoff point from 2 to 3, the bot has no guarantee that the event it sent before it observed 3 actually made it to the queue before the moment 3 actually occurred. It sounds to me like you'd be better off implementing a timeout based on feedback from the game so the bot can recognize, the same way a human player would, that the action it intended to execute wasn't executed.
I'm not sure I follow...
What I was saying was that the logic you are using to determine whether the key press should or should not be processed is not generally applicable to all events. Ignore whether I think that the behavior regarding the save action or the ESC key is actually a bug. Whether you are able to show that there exists a bug with a particular input event does not and cannot prove that there exists a problem for all input events. There is no clear conceptual problem that requires us to process any input event in a different order from the order in which it is queued. All events are equal in the eyes of the event queue.
If you run this on the latest master, you should see something like the following https://youtu.be/90Q2Txev9c0. The player warps and continues moving unless another mouse button is pressed.
I don't suppose you've tried changing the order of those events? I ran some tests just now, and it doesn't seem to matter whether the SDL_RELEASED
event is processed before or after the custom event. Try running this code. The bug you've found has nothing to do with whether the engine "discards" the mouse up event.
--- a/Source/player.cpp
+++ b/Source/player.cpp
@@ -2860,12 +2862,34 @@ StartNewLvl(Player &player, interface_mode fom, int lvl)
if (&player == MyPlayer) {
player._pmode = PM_NEWLVL;
player._pInvincible = true;
+
SDL_Event event;
+
+ // Let's imagine the MOUSE BUTTON DOWN event lands in the
+ // queue right before the custom event was pushed.
+ event.type = SDL_MOUSEBUTTONDOWN;
+ event.button.x = MousePosition.x;
+ event.button.y = MousePosition.y;
+ event.button.button = SDL_BUTTON_LEFT;
+ event.button.state = SDL_PRESSED;
+ SDL_PushEvent(&event);
+
+ // and MOUSE BUTTON UP event lands in the queue also right before
+ // the custom event was pushed. This event will not be discarded
+ // by the `ShowProgress()` filter yet the player never stops
+ // moving until next MOUSE BUTTON sequence or when they reach their destination.
+ event.type = SDL_MOUSEBUTTONUP;
+ event.button.button = SDL_BUTTON_LEFT;
+ event.button.state = SDL_RELEASED;
+ SDL_PushEvent(&event);
+
CustomEventToSdlEvent(event, fom);
SDL_PushEvent(&event);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is why the human player can't tell the difference between a key press that gets queued between 2 and 3 and a keypress that gets queued after 3. Whether we change the behavior of the logic that processes events from the queue or not, the human experience does not change.
Absolutely agree - there's no visual difference for a human to spot regarding when an event is executed. But if a human presses ESC (or any other key action to your choice) expecting the game to show a menu and this doesn't happen, then that's an issue. Of course, we could speculate that the key was pressed almost simultaneously with the moment when a loading screen should show up, but in my opinion, this isn't proper behavior for an event queue. Lost events are a nightmare to debug and fix.
I don't know what bugs you're talking about.
Fair enough. It’s a weird edge case and not easy to catch. I guess it just comes down to different styles: some people are more reactive, others more proactive about fixing stuff.
Processing events in the order in which they are queued is not a bug. That's how the event queue is supposed to work.
Correct, the order is not an issue, but discarding events is.
If it is asynchronous, then there can be no guarantee. Even if you change the cutoff point from 2 to 3, the bot has no guarantee that the event it sent before it observed 3 actually made it to the queue before the moment 3 actually occurred. It sounds to me like you'd be better off implementing a timeout based on feedback from the game so the bot can recognize, the same way a human player would, that the action it intended to execute wasn't executed.
The agent does not concern itself with when an event is executed relative to the queue, this is out of the scope of this discussion. What the agent expects is that the event eventually reaches completion, regardless of event position in the queue. Unfortunately, with the current queue design, this is not guaranteed, thus this PR.
I don't suppose you've tried changing the order of those events?
No, I have not.
I ran some tests just now, and it doesn't seem to matter whether the SDL_RELEASED event is processed before or after the custom event. Try running this code.
Interesting. That's seems to me a regression, I was able to bisect to this one:
3e6b501 ("Fix pathfinding and increase player path limit")
Cc: @glebm
The bug you've found has nothing to do with whether the engine "discards" the mouse up event.
Ooops. But still, the bug I've found doesn't prove the opposite, that discard won't happen, right? :) And peeking into the queue to see whether the UP event exists or not won't be difficult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A mouse DOWN button event in the queue before warp leads to MakePlrPath
being called after a warp with the following arguments (in my quick test): pos.future=(25,31), target=(82,44), where pos.future
belongs to the current dungeon (town), but the target
"leaked" from the dungeon that the player just warped from.
The commit 3e6b501 I mentioned in the previous comment increases the walk path, which is why it is much easier to reproduce a spontaneous walk after a warp (before the commit, 25 steps were not enough and a player would stay put), but the commit does not cause the described issue. Sorry, @glebm, for the noise.
To my limited understanding, there might need to be some logic which resets the walk destination variable after a warp. Anyway, I leave this to devilutionx experts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, we could speculate that the key was pressed almost simultaneously with the moment when a loading screen should show up, but in my opinion, this isn't proper behavior for an event queue. Lost events are a nightmare to debug and fix.
This is not speculation. That's literally the thing you're describing. The event has to be queued in the same "display frame" as the custom event is queued, otherwise they would be processed in separate iterations of the main loop. If they are processed in separate iterations of the main loop, then there is no choice; you have to process them in the order they appear in the queue. Even if they are not processed in separate iterations of the main loop, you may as well process them in the order they appear in the queue, because there is no difference as far as the engine is concerned.
Correct, the order is not an issue, but discarding events is.
The agent does not concern itself with when an event is executed relative to the queue, this is out of the scope of this discussion. What the agent expects is that the event eventually reaches completion, regardless of event position in the queue. Unfortunately, with the current queue design, this is not guaranteed, thus this PR.
Perhaps I haven't been clear enough about this. No events are actually "lost" or "discarded" in the process of transitioning to the loading screen. The mouse up event that you claim was lost is actually being processed by the ProgressEventHandler()
function, which invokes the DisableInputEventHandler()
function, which in turn sets sgbMouseDown = CLICK_NONE
. The only way in which it is "lost" is the fact that it doesn't get processed by the GameEventHandler()
function. I am arguing that this is proper because that handler executes in an entirely different context where a mouse up event might open a UI menu or something. It only stands to reason that a mouse up event would work differently on the loading screen, which doesn't have a UI menu to open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me sum it all up so we can wrap this long discussion and close the PR:
-
Non-custom events on the loading screen are intentionally dropped to prevent a character from executing "old" events right after a warp.
-
Even if a non-custom event is dropped just before the loading screen appears, it's not a big deal, because a human player won't notice whether it was dropped right before or during the loading screen.
-
All event handlers in the engine should be able to handle dropped or partially dropped events (e.g. a key press consists of a "down" and an "up" events, and only the "up" event might get dropped because of point 1, thus "partially dropped" naming). Any bugs (like the one where the character keeps moving after a warp) should be fixed in their own handlers and are probably not caused by point 1 (at least there are no known such issues by now).
-
Any automation (like Lua scripts, etc.) has its own logic and needs to deal with dropped events (as described in point 1) on its own.
Points 1 and 4 are crucial for my automated agent, which makes me stick to my own branch. That's totally fine with me, I just wanted to clarify things.
Thanks to all of you folks for such a long conversation.
A quite legitimate warning: if FMT_EXCEPTIONS is not defined we should return something from the function. Let it be an empty string. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
66921e0
to
3a0afe3
Compare
There are various message handlers in the code, including: PollEvent() FetchMessage() FetchMessage_Real() These functions internally call SDL_PollEvent() and then handle the received events differently depending on the context. This commit separates the actual handlers from the poll invocations by adding an additional `poll` callback parameter. This parameter can be either SDL_PollEvent() or something else, allowing the caller of the message handler to decide which events need to be processed. This commit does not change the logic of event processing. Its main purpose is to make it possible to replace SDL_PollEvent() with SDL_PeepEvents() in the next patch. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
`ShowProgress` is called on any custom event, such as when a player warps to a new level. It is designed to handle only custom events and discards others from the queue. This behavior results in lost keystrokes sent by the AI agent. For instance, the issue was reproduced when attempting to pause the game by sending the PAUSE key, which was lost due to the behavior of `ShowProgress`. This change introduces a poller along with the custom handler. For the `ShowProgress` routine, the poller will peek at only custom events, leaving others in the queue. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Patch fixes various crashes and accesses to the audio or SDL subsystems that were not properly initialized in HeadlessMode: * Don't create an SDL window * Fully disable audio/sndfx during headless mode * Init SDL with SDL_INIT_EVENTS, otherwise an error: "The event system has been shut down" Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
This patch updates the function `gmenu_presskeys()` to return `true` if the keys are actually handled. This allows the caller function `PressKey()` to continue processing keys and not silently drop them, even when the game menu is active. Why bother? First of all, there should not be any reasons not to handle keys once the game menu is open. For example, why not handle the "Quick Load" shortcut once the player is dead and the menu is shown on the screen? The other reason is more practical. In headless mode, once the game is controlled by a 3rd party application (training of an AI agent, for example), lost key presses lead to a broken training procedure. Therefore, even if the menu is raised, it is crucial not to lose a keystroke. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
3a0afe3
to
1512614
Compare
This PR addresses various crashes in HeadlessMode and lost keystrokes caused by the game menu or incorrect event polling. These issues were observed during AI agent training, and commits targeting them are part of the RFC #7974 PR. I decided to split the original RFC on several PRs because it is unnecessarily complex and has an unclear status. This PR includes the following commits for easier review: