-
-
Notifications
You must be signed in to change notification settings - Fork 33.7k
Set executor to 15 and help to reduce flooting async core with updates #4252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@pvizeli, thanks for your PR! By analyzing the history of the files in this pull request, we identified @balloob, @fabianhjr and @fabaff to be potential reviewers. |
@robbiet480 say that fix the bug |
tasks = [] | ||
for entity in self.platform_entities: | ||
if entity.should_poll: | ||
task = self.component.hass.async_add_job( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this be a similar but easier approach:
tasks = []
to_update = []
for entity in self.platform_entities:
if not entity.should_poll:
continue
if hasattr(entity, 'async_update'):
tasks.append(entity.async_update_ha_state(True))
else:
to_update.append(entity)
for entity in to_update:
yield from self.component.hass.loop.run_in_executor(None, self.update_ha_state, True)
if tasks:
yield from asyncio.wait(tasks, loop=self.component.hass.loop)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should never return the task from async_add_job. You add a job because you don't care about the result. If you care about the result you yield from it.
self.component.hass.async_add_job( | ||
entity.async_update_ha_state(True) | ||
) | ||
# protect core for flooting all executors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from flooding
So, little description in the PR and just a comment about that it works for Robbie. So I did some research why it is fixing it. This is how the original code looked. The original code would inside a thread, call each update function for entities that should be polled. The current code is a complete overhaul and instead runs all commands in parallel at the same time. We incorrectly assumed that all components/platforms are able to figure this out. The good news is that components/platforms using Not all code is perfect however. Take for example Nest. Nest does not use a lock or an indicator that something is in progress thus it now all of a sudden will have many requests in flight and only the last one will set the cache. Good ol' race condition. This is what is causing #4241. However, all this does not give us a reason why Robbie's setup is no longer working after 3 minutes. My initial hunch is that it's because we are running out of executor threads. Right now we need executor threads for doing device updates and handling service calls. Note that we no longer have a prioritized task list for our thread pool with the move to the executor! So let's say that a user uses Philips Hue and has 8 lights. By scheduling all updates at the same time, we're running 8 tasks which each will yield to the executor to call the update method. That's 8 threads occupied! In the case of Hue we do use Platforms that have an async_update do not suffer this fate as they will yield from an async lock, giving control back to the event loop. So yes, we should move back to call update sequential:
We should also craft a patch for our current release, as this is causing issues today. |
@@ -219,6 +219,7 @@ def async_add_job(self, target: Callable[..., None], *args: Any) -> None: | |||
# if a task is sheduled | |||
if task is not None: | |||
self._pending_tasks.add(task) | |||
return task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not return tasks, if people want to await a task, they should just use yield from
Ok to merge when tests pass 🐬 |
Description:
Should help on #4251