8000 Speed-up and cleanup objToJSON by eltoder · Pull Request #615 · ultrajson/ultrajson · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Speed-up and cleanup objToJSON #615

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 10, 2023

Conversation

eltoder
Copy link
Contributor
@eltoder eltoder commented Dec 4, 2023
  • Use PyDict_Next() to iterate over dicts.
  • Use macros to access lists, tuples, bytes.
  • Avoid calling PyErr_Occurred() if not necessary.
  • Fix a memory leak when encoding very large ints.
  • Delete dead and duplicate code.

Also,

  • Raise TypeError if toDict() returns a non-dict instead of silently converting it to null.

@hugovk
Copy link
Member
hugovk commented Dec 4, 2023

What sort of speedup are we looking at? Please can you share some benchmark scripts to demonstrate it?

@eltoder
Copy link
Contributor Author
eltoder commented Dec 5, 2023

@hugovk I ran tests/benkmark.py with and without the change. The results are noisy, so I ran both sides 5 times and took max for each result. The highlights are:

Medium complex object
(before) ujson encode: 24,684.92 calls/sec
(after) ujson encode: 29,502.89 calls/sec

Array with 256 dict{string, int} pairs
(before) ujson encode: 29,745.80 calls/sec
(after) ujson encode: 35,518.47 calls/sec

Dict with 256 arrays with 256 dict{string, int} pairs
(before) ujson encode: 109.98 calls/sec
(after) ujson encode: 129.91 calls/sec

So we got about 20% speedup in these 3 cases. The other cases are unchanged, modulo noise. I think this is pretty good, given that the new code is simpler and shorter.

@hugovk hugovk added the changelog: Changed For changes in existing functionality label Dec 5, 2023
* Use PyDict_Next() to iterate over dicts.
* Use macros to access lists, tuples, bytes.
* Avoid calling PyErr_Occurred() if not necessary.
* Fix a memory leak when encoding very large ints.
* Delete dead and duplicate code.

Also,

* Raise TypeError if toDict() returns a non-dict instead of silently
  converting it to null.
Do not create a list of tuples with (converted key, value) upfront.
Instead, convert keys and fetch values during iteration.

Also, if sorting fails, preserve the original exception instead of
overwriting it with a less informative ValueError. This is the same
behavior as the standard library's json module.
@eltoder eltoder force-pushed the feature/speedup-dumps branch from fba64dc to 5b9fd1e Compare December 7, 2023 07:19
@codecov-commenter
Copy link

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (a08b75b) 91.68% compared to head (810e9bf) 92.34%.

❗ Current head 810e9bf differs from pull request most recent head 5b9fd1e. Consider uploading reports for the commit 5b9fd1e to get more accurate results

Files Patch % Lines
python/objToJSON.c 94.87% 4 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #615      +/-   ##
==========================================
+ Coverage   91.68%   92.34%   +0.65%     
==========================================
  Files           6        6              
  Lines        1949     1908      -41     
==========================================
- Hits         1787     1762      -25     
+ Misses        162      146      -16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@eltoder
Copy link
Contributor Author
eltoder commented Dec 7, 2023

@hugovk I pushed another small commit that optimizes encoding with sorted keys. Now sorted output is also about 20% faster and uses less memory:

Dict with 256 arrays with 256 dict{string, int} pairs, outputting sorted keys
(before) ujson encode: 101.27 calls/sec
(after) ujson encode: 121.38 calls/sec

@bwoodsend bwoodsend merged commit 381f248 into ultrajson:main Dec 10, 2023
@bwoodsend
Copy link
Collaborator

@hugovk I'm going to go hit the release button unless you object?

@hugovk
Copy link
Member
hugovk commented Dec 10, 2023

Go for it!

Do we want a major release because of #614? I don't mind too much either way.

@bwoodsend
Copy link
Collaborator

Do we want a major release because of #614? I don't mind too much either way.

Actually, I don't think I do now. It seems too out of typical use to call a breaking change.

@eltoder eltoder deleted the feature/speedup-dumps branch December 11, 2023 01:42
@eltoder
Copy link
Contributor Author
eltoder commented Dec 11, 2023

@bwoodsend Thanks!

I assumed you were going to make a new major version because of #614, so I included

Raise TypeError if toDict() returns a non-dict instead of silently converting it to null.

in this PR, which is also technically a breaking change. We should probably at least highlight the change in release notes.

@jamadden
Copy link

Raise TypeError if toDict() returns a non-dict instead of silently converting it to null.

in this PR, which is also technically a breaking change. We should probably at least highlight the change in release notes.

Indeed, that change unexpectedly hit us and broke our CI pipeline. The ultimate error is ours, but it did take a bit of head-scratching to figure out what happened, as this is not currently called out in the release notes that I've been able to find.

@bwoodsend
Copy link
Collaborator

Ugh, sorry guys. I've added a breaking section to the release notes although that doesn't fix the fact that I broke semantic versioning. We could yank 5.9.0 from PyPI and rerelease as 6.0.0 although I'd definitely consider that to be too far.

@eltoder
Copy link
Contributor Author
eltoder commented Dec 12, 2023

@jamadden for my information, were you using this as a feature, or was this just a bug that went unnoticed? I was considering to allow toDict() to return None. I only didn't do this because there were no tests or documentation for this behavior.

@jamadden
Copy link

@eltoder In our case, in the one scenario we've identified so far, it was basically accidental. A unittest.mock.MagicMock instance had snuck its way as into a dict as a value. The test case responsible didn't care about that value, so when it examined the final JSON, it never noticed that some key had a null value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog: Changed For changes in existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0