Speed up history `get_states` by bramkragten · Pull Request #23881 · home-assistant/core · GitHub

Speed up history get_states #23881


Merged
merged 3 commits into from
Aug 25, 2019

Conversation

bramkragten
Member
@bramkragten bramkragten commented May 15, 2019

Description:

Adding a lower boundary at the start of the recorder run the requested point falls in significantly decreases the query time. This speeds up fetching the history of a single entity.
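The effect of that boundary can be sketched with a toy table. This is a hedged illustration only: the schema, column names, and values below are invented for the example, and the real query is built with SQLAlchemy against the recorder's `states` table.

```python
# Sketch of the optimization: when fetching the single state that was
# current at `point_in_time`, adding a lower bound at the start of the
# recorder run containing that point lets the database skip all earlier
# rows instead of scanning the whole history of the entity.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (entity_id TEXT, state TEXT, last_updated REAL)")
rows = [("sensor.temp", str(i), float(i)) for i in range(100)]
conn.executemany("INSERT INTO states VALUES (?, ?, ?)", rows)

point_in_time = 42.5
run_start = 40.0  # start of the recorder run containing point_in_time

# Unbounded query: must consider every row with last_updated < point_in_time.
unbounded = conn.execute(
    "SELECT state FROM states WHERE entity_id = ? AND last_updated < ? "
    "ORDER BY last_updated DESC LIMIT 1",
    ("sensor.temp", point_in_time),
).fetchone()

# Bounded query: the extra lower bound restricts the scan to one run.
bounded = conn.execute(
    "SELECT state FROM states WHERE entity_id = ? "
    "AND last_updated >= ? AND last_updated < ? "
    "ORDER BY last_updated DESC LIMIT 1",
    ("sensor.temp", run_start, point_in_time),
).fetchone()

assert unbounded == bounded  # same result, much smaller search space
print(bounded[0])  # prints 42
```

Both queries return the same row whenever the entity has at least one state inside the run, which is the premise the discussion below examines.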

Checklist:

  • The code change is tested and works locally.
  • Local tests pass with tox. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the [development checklist][dev-checklist]

@bramkragten bramkragten requested a review from a team as a code owner May 15, 2019 15:07
@ghost
ghost commented May 15, 2019

Hey there @home-assistant/core, mind taking a look at this pull request as it's been labeled with an integration (history) you are listed as a codeowner for? Thanks!

This is an automatic comment generated by codeowners-mention to help ensure issues and pull requests are seen by the right people.

@OverloadUT
Contributor

Won't this cause single-entity history graphs to be truncated to when Hass last booted up, rather than getting the full 24 hours?

@bramkragten
Member Author
bramkragten commented May 15, 2019

I don’t think so, it gets the recorder run that was active at the moment you are requesting. So we are limiting the search to the recorder run we want: https://github.com/home-assistant/home-assistant/blob/dev/homeassistant/components/recorder/__init__.py#L88

Btw, the same is done for multiple-entity requests.
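The lookup referenced above ("the recorder run that was active at the moment you are requesting") can be sketched as follows. The function name and the run representation are invented for illustration; the real implementation lives in the recorder component linked in the comment.

```python
# Hedged sketch: each recorder run has a start time and, for finished
# runs, an end time (None marks the still-open current run). The run
# containing point_in_time supplies the lower bound used in the PR's query.
def run_for_time(runs, point_in_time):
    """Return the (start, end) run that was active at point_in_time."""
    for start, end in runs:
        if start <= point_in_time and (end is None or point_in_time < end):
            return (start, end)
    return None  # no run covered that moment (e.g. Hass was not running)

runs = [(0.0, 100.0), (100.0, 200.0), (200.0, None)]
assert run_for_time(runs, 150.0) == (100.0, 200.0)
assert run_for_time(runs, 250.0) == (200.0, None)
```

Because the run is chosen from the requested time, not the current boot, a query for a point last week is bounded by last week's run, which is why the results are not clamped to the latest boot.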

@OverloadUT
Contributor

You just changed this from a single line change to a bit more. Let me know when you're all done and I'll review!

@bramkragten
Member Author

Sorry! I'm happy to take the edit out and add it in a new PR? I'm done now btw :-)

@OverloadUT
Contributor

Got it. I see the improvement in the single-query conditional to avoid the join; good catch.

However, clamping single-entity results to a single recorder run is going to truncate the possible results, as far as I can tell. This affects more than the Hass frontend, as this method would be used in API requests for getting weeks of state history, which will now always be clamped to within a single run?

I might be misremembering how the recorder runs table works, but does it essentially create a new run on each Hass boot?

(Also, from the benchmarks I did when I wrote this code, I believe even on a Pi with millions of state changes, the single-entity query was pretty much instant - how much of an improvement are you seeing with this change? And what DB engine? I only did benchmarks on MySQL)

@bramkragten
Member Author
bramkragten commented May 15, 2019

As far as I can tell, when it searches for the first state before a specific time, it will first look up the recorder run that was active at that time, and will use the start time of that run as the lower bound and the requested time as the upper bound.

The only edge case I see is when the requested time is very near the start of the recorder run, in which case no states will be found.

The multiple entities query has always had this condition and before the single entity was split into a different query it did too: 438edc5#diff-bfb762edec66809f6ea72fe4dd388975L125

The query to get history for periods is not changed, only the query to get 1 record for a specific time.

My DB is PostgreSQL, and I see large improvements: from 4 seconds to 0.2 seconds.

@OverloadUT
Contributor

Ah ha, thanks for the info. I also just caught up on your musings in Discord, which helped me understand the situation.

Am I understanding it right that the frontend requests don't actually hit this code path, and that the performance issues you saw were from the API calls?

Sorry, just re-wrapping my head around all of this. It's been quite a long time since I modified this code, and it ain't simple!

@bramkragten
Member Author

The frontend does touch this code, as it fires the same API requests I was running with curl.
It ain't the simplest code to figure out, no...

When the frontend requests data for the graphs, we first get the state data for the requested period and then create a fake zero point. That zero-point query is what this PR touches.
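The "fake zero point" described above can be sketched like this. All names and the tuple layout are hypothetical, not the actual Home Assistant implementation: the last known state from before the window is copied to the window's start time so the graph has a value at its left edge.

```python
# Hedged sketch: pad a history window with a synthetic starting point.
def with_zero_point(states, state_before_window, window_start):
    """states: list of (timestamp, value) inside the window.

    state_before_window: the last (timestamp, value) before the window,
    fetched by the single-point query this PR speeds up, or None.
    """
    if state_before_window is not None:
        _, value = state_before_window
        # Re-stamp the old value at the window start so the graph line
        # begins at the left edge instead of at the first real change.
        return [(window_start, value)] + states
    return states

history = [(10.5, "21.0"), (12.0, "21.5")]
padded = with_zero_point(history, (9.2, "20.5"), 10.0)
assert padded[0] == (10.0, "20.5")
```

This is the "first data point challenge" OverloadUT mentions below: without the synthetic point, the graph would be blank until the first state change inside the window.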

@OverloadUT
Contributor

Okay, I remember now. That first data point challenge is a huge one. (Grafana doesn't even try to solve it, and leaves its graphs blank until the first data point in the displayed time period!) I am also remembering that the method names here are part of the confusion, as they are very generic.

Okay, I'll get this formally reviewed very soon.

@balloob
Member
balloob commented Jul 1, 2019

@OverloadUT would you have time to review this PR?

Adding a boundary of the start of the recorder run the point is in significantly decreases the time of the query. This speeds up the fetching of the history of 1 entity.
no need for joins with single entity
@bramkragten
Member Author

Rebased this PR in the hope of getting a review :-)

@pvizeli
Member
pvizeli commented Aug 5, 2019

@OverloadUT ?

@OverloadUT
Contributor

Sorry about the delay here. I got blocked by not having good test db data in my dev instance and then put this off. Let me correct that and get this reviewed and tested ASAP.

@OverloadUT
Contributor

Just an update that I started looking at this today. I've got a dev environment with a database filling up and I've been poking at it to ensure everything is working correctly. I am seeing a bug manifesting as a UI bug, so I'm going to check to see if the issue is related to the API response. I should have this all done this weekend.

Contributor
@OverloadUT OverloadUT left a comment


This all looks good to me. I tested it alongside the commit this was branched from, and I can indeed see a marked improvement to the time it takes to fetch the first datapoint for the history graphs.

In my test data, running on a very fast computer, a test query improved from 0.019181 s to 0.006119 s, which is over 3 times faster. I imagine the improvement on a slower machine like a Pi would be far more impactful.

At some point a refactor of this code would be very helpful - the names of the functions here are very confusing which made this review harder to do.

@pvizeli pvizeli merged commit 248619a into dev Aug 25, 2019
@delete-merged-branch delete-merged-branch bot deleted the history-opti branch August 25, 2019 19:11
@balloob balloob mentioned this pull request Sep 18, 2019