Speed up history get_states
#23881
Conversation
Hey there @home-assistant/core, mind taking a look at this pull request as it's been labeled with an integration? (This is an automatic comment generated by codeowners-mention to help ensure issues and pull requests are seen by the right people.)
Won't this cause single-entity history graphs to be truncated to when Hass last booted up, rather than getting the full 24 hours?
I don’t think so; it gets the recorder run that was active at the moment you are requesting, so we are limiting the search to the recorder run we want: https://github.com/home-assistant/home-assistant/blob/dev/homeassistant/components/recorder/__init__.py#L88 Btw, the same is done for multiple-entity requests.
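The run-bounding idea described above can be sketched with a minimal, self-contained example. Note the schema, table, and column names here are illustrative only, not Home Assistant's actual recorder schema: the point is that adding a lower bound (the start of the run active at the requested time) lets the engine skip rows from earlier runs.

```python
import sqlite3

# Toy schema standing in for the recorder's states/runs tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (entity_id TEXT, state TEXT, last_updated REAL)")
conn.execute("CREATE TABLE recorder_runs (run_start REAL, run_end REAL)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?, ?)",
    [
        ("light.kitchen", "off", 10.0),   # belongs to the first run
        ("light.kitchen", "on", 105.0),   # second run
        ("light.kitchen", "off", 120.0),  # second run
    ],
)
# Two runs; the second (still open) starts at t=100.
conn.executemany(
    "INSERT INTO recorder_runs VALUES (?, ?)",
    [(0.0, 99.0), (100.0, None)],
)

point_in_time = 130.0

# Step 1: find the run that was active at the requested time.
(run_start,) = conn.execute(
    "SELECT run_start FROM recorder_runs "
    "WHERE run_start <= ? AND (run_end IS NULL OR run_end >= ?)",
    (point_in_time, point_in_time),
).fetchone()

# Step 2: last state before the requested time, bounded below by the
# run start -- the extra condition this PR introduces.
row = conn.execute(
    "SELECT state FROM states WHERE entity_id = ? "
    "AND last_updated < ? AND last_updated >= ? "
    "ORDER BY last_updated DESC LIMIT 1",
    ("light.kitchen", point_in_time, run_start),
).fetchone()
print(row[0])  # -> off
```

Without the `last_updated >= run_start` bound, the engine would have to consider every historical row for the entity before picking the newest one.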
You just changed this from a single line change to a bit more. Let me know when you're all done and I'll review!
Sorry! I'm happy to get the edit out and add it in a new PR? I'm done now btw :-)
Got it. I see the improvement in the single-query conditional to avoid the join; good catch. However, clamping single-entity results to a single recorder run is going to truncate the possible results, as far as I can tell. This affects more than the Hass frontend, as this method would be used in API requests for getting weeks of state history, which will now always be clamped to within a single run? I might be misremembering how the recorder runs table works, but does it essentially create a new run on each Hass boot? (Also, from the benchmarks I did when I wrote this code, I believe even on a Pi with millions of state changes, the single-entity query was pretty much instant - how much of an improvement are you seeing with this change? And what DB engine? I only did benchmarks on MySQL)
As far as I can tell, when searching for the first state before a specific time, it will first find the recorder run that was active at that time, and use the start of that run as the lower bound and the requested time as the upper bound. The only case I see is that if the requested time is very near the start of the recorder run, no states will be found. The multiple-entities query has always had this condition, and before the single-entity case was split into a different query it did too: 438edc5#diff-bfb762edec66809f6ea72fe4dd388975L125 The query to get history for periods is not changed, only the query to get 1 record at a specific time. My db is PostgreSQL, and I see large improvements: from 4 seconds to 0.2 seconds.
Ah ha, thanks for the info. I also just caught up on your musings in Discord, which helped me understand the situation. Am I understanding it right that the frontend requests don't actually hit this code path, and that the performance issues you saw were from the API calls? Sorry, just re-wrapping my head around all of this. It's been quite a long time since I modified this code, and it ain't simple!
Frontend does touch this code, as it fires the same API requests I was running with curl. When the frontend requests data for the graphs, we first get the state data for the requested period and then we create a fake 0 point. That zero-point query is what this PR touches.
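The "fake 0 point" mentioned above can be sketched in a few lines. This is an illustrative reconstruction, not the actual Home Assistant helper code: the graph needs a value at the very start of the requested window, so the last state change *before* the window is re-stamped at the window start.

```python
# States that changed inside the requested window [100.0, end),
# as (timestamp, state) pairs -- sample data, not real recorder output.
history = [(105.0, "on"), (120.0, "off")]

# Last change before the window started; finding this row is the
# single-record query that this PR speeds up.
last_before = (95.0, "on")

window_start = 100.0

# Synthesize the boundary point so the graph is not blank
# between the window start and the first real change at 105.0.
points = [(window_start, last_before[1])] + history
print(points[0])  # -> (100.0, 'on')
```

Without that synthesized point, a graph of the window would show nothing until the first in-window state change, which is the blank-leading-edge behavior the comment below attributes to Grafana.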
Okay, I remember now. That first-data-point challenge is a huge one. (Grafana doesn't even try to solve it, and leaves its graphs blank until the first data point in the displayed time period!) I am also remembering that the method names here are part of the confusion, as they are very generic. Okay, I'll get this formally reviewed very soon.
@OverloadUT would you have time to review this PR?
Adding a boundary at the start of the recorder run the point is in significantly decreases the query time. This speeds up fetching the history of 1 entity.
no need for joins with single entity (force-pushed from bab6a8c to 0ba478b)
Rebased this PR in the hope of getting a review :-)
Sorry about the delay here. I got blocked by not having good test db data in my dev instance and then put this off. Let me correct that and get this reviewed and tested ASAP.
Just an update that I started looking at this today. I've got a dev environment with a database filling up and I've been poking at it to ensure everything is working correctly. I am seeing a bug manifesting as a UI bug, so I'm going to check whether the issue is related to the API response. I should have this all done this weekend.
This all looks good to me. I tested it alongside the commit this was branched from, and I can indeed see a marked improvement to the time it takes to fetch the first datapoint for the history graphs.
With my test data, running on a very fast computer, a test query improved from 0.019181s to 0.006119s, which is over 3 times faster. I imagine the improvement on a slower machine like a Pi would be far more impactful.
At some point a refactor of this code would be very helpful - the names of the functions here are very confusing which made this review harder to do.
Description:
Adding a boundary at the start of the recorder run the point is in significantly decreases the query time. This speeds up fetching the history of 1 entity.
Checklist:
Local tests pass with `tox`. Your PR cannot be merged unless tests pass.