-
-
Notifications
You must be signed in to change notification settings - Fork 120
create-index should run analyze after creating index #365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Relevant documentation: https://www.sqlite.org/lang_analyze.html |
I've not used the Annoyingly I use the word "analyze" to mean something else in the CLI - for these features: |
i added an index to one table with sqlite-utils, and then a query that used to take about 1 second started taking hundreds of seconds. running analyze got me back to sub second speed. |
After implementing #366 I can make it so |
Or I could leave off |
The one thing that worries me a little bit about doing this by default is that it adds a surprising new table to the database - it may be confusing to users if they run Options here are:
I'm currently leading towards that third option - @fgregg any thoughts? |
for options 2 and 3, i would worry about discoverablity. in other db’s it is not necessary to explicitly call analyze for most indices. ie for postgres
i suppose i would propose raising a warning if the stats table is created that explains what is going on and informs users about a —no-analyze argument. |
Is there a downside to having a Imagine the following sequence of events:
The user now has a database file with several million records and a statistics table that is wildly out of date, having been populated when they only had a few. Will this result in surprisingly bad query performance compared to it that statistics table did not exist at all? If so, I lean much harder towards |
The reason I'm hesitating on this is that I've not actually used ANALYZE at all in nearly five years of messing around with SQLite! So I'm nervous that there are surprise downsides I haven't thought of. My hunch is that ANALYZE is only worth worrying about on much larger databases, in which case I'm OK supporting it as a thoroughly documented power-user feature rather than a default. |
the table with the query ran so bad was about 50k. i think the scenario should not be worse than no stats. i also did not know that sqlite was so different from postgres and needed an explicit analyze call. |
the out-of-date scenario you describe could be addressed by automatically adding an analyze to the insert or convert commands if they implicate an index |
or using “ pragma optimize” |
Found one report on Stack Overflow from 9 years ago of someone seeing broken performance after running |
i don’t want to be such a partisan for analyze, but the query planner deciding not to use an index based on information collected by analyze is not necessarily a bug, but could be the correct choice.
|
I decided to go with making this opt-in, mainly for consistency with the other places where I added this feature - see: You can now run the following:
And |
thanks so much! always a pleasure to see how you work through these things |
sqlite's query planner depends upon analyze to make good use of indices. It would be nice if analyze was run as part of the create-index command.
If data is inserted later, things can get out date, but it would still probably be a net win.
The text was updated successfully, but these errors were encountered: