Speed up history get_states
#23881
Conversation
Hey there @home-assistant/core, mind taking a look at this pull request as it's been labeled with an integration? (This is an automatic comment generated by codeowners-mention to help ensure issues and pull requests are seen by the right people.)
Won't this cause single-entity history graphs to be truncated to when Hass last booted up, rather than getting the full 24 hours?
I don’t think so; it gets the recorder run that was active at the moment you are requesting, so we are limiting the search to the recorder run we want: https://github.com/home-assistant/home-assistant/blob/dev/homeassistant/components/recorder/__init__.py#L88 Btw, the same is done for multiple-entity requests.
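The run-bounding idea described above can be sketched with a minimal, self-contained example. Note the schema, table, and column names here are illustrative only, not Home Assistant's actual recorder schema: the point is that adding a lower bound (the start of the run active at the requested time) lets the engine skip rows from earlier runs.

```python
import sqlite3

# Toy schema standing in for the recorder's states/runs tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (entity_id TEXT, state TEXT, last_updated REAL)")
conn.execute("CREATE TABLE recorder_runs (run_start REAL, run_end REAL)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?, ?)",
    [
        ("light.kitchen", "off", 10.0),   # belongs to the first run
        ("light.kitchen", "on", 105.0),   # second run
        ("light.kitchen", "off", 120.0),  # second run
    ],
)
# Two runs; the second (still open) starts at t=100.
conn.executemany(
    "INSERT INTO recorder_runs VALUES (?, ?)",
    [(0.0, 99.0), (100.0, None)],
)

point_in_time = 130.0

# Step 1: find the run that was active at the requested time.
(run_start,) = conn.execute(
    "SELECT run_start FROM recorder_runs "
    "WHERE run_start <= ? AND (run_end IS NULL OR run_end >= ?)",
    (point_in_time, point_in_time),
).fetchone()

# Step 2: last state before the requested time, bounded below by the
# run start -- the extra condition this PR introduces.
row = conn.execute(
    "SELECT state FROM states WHERE entity_id = ? "
    "AND last_updated < ? AND last_updated >= ? "
    "ORDER BY last_updated DESC LIMIT 1",
    ("light.kitchen", point_in_time, run_start),
).fetchone()
print(row[0])  # -> off
```

Without the `last_updated >= run_start` bound, the engine would have to consider every historical row for the entity before picking the newest one.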
You just changed this from a single line change to a bit more. Let me know when you're all done and I'll review!
Sorry! I'm happy to get the edit out and add it in a new PR? I'm done now btw :-)
Got it. I see the improvement in the single-query conditional to avoid the join; good catch. However, clamping single-entity results to a single recorder run is going to truncate the possible results, as far as I can tell. This affects more than the Hass frontend, as this method would be used in API requests for getting weeks of state history, which will now always be clamped to within a single run? I might be misremembering how the recorder runs table works, but does it essentially create a new run on each Hass boot? (Also, from the benchmarks I did when I wrote this code, I believe even on a Pi with millions of state changes, the single-entity query was pretty much instant - how much of an improvement are you seeing with this change? And what DB engine? I only did benchmarks on MySQL)
As far as I can tell, when searching for the first state before a specific time, it will first find the recorder run that was active at that time, and use the start of that run as the lower bound and the requested time as the upper bound. The only case I see is that if the requested time is very near the start of the recorder run, no states will be found. The multiple-entities query has always had this condition, and before the single-entity case was split into a different query it did too: 438edc5#diff-bfb762edec66809f6ea72fe4dd388975L125 The query to get history for periods is not changed, only the query to get 1 record at a specific time. My db is PostgreSQL, and I see large improvements: from 4 seconds to 0.2 seconds.
Ah ha, thanks for the info. I also just caught up on your musings in Discord, which helped me understand the situation. Am I understanding it right that the frontend requests don't actually hit this code path, and that the performance issues you saw were from the API calls? Sorry, just re-wrapping my head around all of this. It's been quite a long time since I modified this code, and it ain't simple!
Frontend does touch this code, as it fires the same API requests I was running with curl. When the frontend requests data for the graphs, we first get the state data for the requested period and then we create a fake 0 point. That zero-point query is what this PR touches.
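The "fake 0 point" mentioned above can be sketched in a few lines. This is an illustrative reconstruction, not the actual Home Assistant helper code: the graph needs a value at the very start of the requested window, so the last state change *before* the window is re-stamped at the window start.

```python
# States that changed inside the requested window [100.0, end),
# as (timestamp, state) pairs -- sample data, not real recorder output.
history = [(105.0, "on"), (120.0, "off")]

# Last change before the window started; finding this row is the
# single-record query that this PR speeds up.
last_before = (95.0, "on")

window_start = 100.0

# Synthesize the boundary point so the graph is not blank
# between the window start and the first real change at 105.0.
points = [(window_start, last_before[1])] + history
print(points[0])  # -> (100.0, 'on')
```

Without that synthesized point, a graph of the window would show nothing until the first in-window state change, which is the blank-leading-edge behavior the comment below attributes to Grafana.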
Okay, I remember now. That first-data-point challenge is a huge one. (Grafana doesn't even try to solve it, and leaves its graphs blank until the first data point in the displayed time period!) I am also remembering that the method names here are part of the confusion, as they are very generic. Okay, I'll get this formally reviewed very soon.
@OverloadUT would you have time to review this PR?
Adding a boundary at the start of the recorder run the point is in significantly decreases the query time. This speeds up fetching the history of 1 entity.
no need for joins with single entity (force-pushed from bab6a8c to 0ba478b)
Rebased this PR in the hope of getting a review :-)
Sorry about the delay here. I got blocked by not having good test db data in my dev instance and then put this off. Let me correct that and get this reviewed and tested ASAP.
Just an update that I started looking at this today. I've got a dev environment with a database filling up and I've been poking at it to ensure everything is working correctly. I am seeing a bug manifesting as a UI bug, so I'm going to check whether the issue is related to the API response. I should have this all done this weekend.
This all looks good to me. I tested it alongside the commit this was branched from, and I can indeed see a marked improvement to the time it takes to fetch the first datapoint for the history graphs.
With my test data, running on a very fast computer, a test query improved from 0.019181s to 0.006119s, which is over 3 times faster. I imagine the improvement on a slower machine like a Pi would be far more impactful.
At some point a refactor of this code would be very helpful - the names of the functions here are very confusing which made this review harder to do.
Description:
Adding a boundary at the start of the recorder run the point is in significantly decreases the query time. This speeds up fetching the history of 1 entity.
Checklist:
Local tests pass with `tox`. Your PR cannot be merged unless tests pass.