Speed-up and cleanup objToJSON #615

eltoder · 2023-12-04T11:42:36Z

Use PyDict_Next() to iterate over dicts.
Use macros to access lists, tuples, bytes.
Avoid calling PyErr_Occurred() if not necessary.
Fix a memory leak when encoding very large ints.
Delete dead and duplicate code.

Also,

Raise TypeError if toDict() returns a non-dict instead of silently converting it to null.

hugovk · 2023-12-04T12:21:46Z

What sort of speedup are we looking at? Please can you share some benchmark scripts to demonstrate it?

eltoder · 2023-12-05T13:56:47Z

@hugovk I ran tests/benkmark.py with and without the change. The results are noisy, so I ran both sides 5 times and took max for each result. The highlights are:

Medium complex object
(before) ujson encode: 24,684.92 calls/sec
(after) ujson encode: 29,502.89 calls/sec

Array with 256 dict{string, int} pairs
(before) ujson encode: 29,745.80 calls/sec
(after) ujson encode: 35,518.47 calls/sec

Dict with 256 arrays with 256 dict{string, int} pairs
(before) ujson encode: 109.98 calls/sec
(after) ujson encode: 129.91 calls/sec

So we got about 20% speedup in these 3 cases. The other cases are unchanged, modulo noise. I think this is pretty good, given that the new code is simpler and shorter.

* Use PyDict_Next() to iterate over dicts. * Use macros to access lists, tuples, bytes. * Avoid calling PyErr_Occurred() if not necessary. * Fix a memory leak when encoding very large ints. * Delete dead and duplicate code. Also, * Raise TypeError if toDict() returns a non-dict instead of silently converting it to null.

Do not create a list of tuples with (converted key, value) upfront. Instead, convert keys and fetch values during iteration. Also, if sorting fails, preserve the original exception instead of overwriting it with a less informative ValueError. This is the same behavior as the standard library's json module.

codecov-commenter · 2023-12-07T07:20:55Z

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (a08b75b) 91.68% compared to head (810e9bf) 92.34%.

❗ Current head 810e9bf differs from pull request most recent head 5b9fd1e. Consider uploading reports for the commit 5b9fd1e to get more accurate results

Files	Patch %	Lines
python/objToJSON.c	94.87%	4 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #615      +/-   ##
==========================================
+ Coverage   91.68%   92.34%   +0.65%     
==========================================
  Files           6        6              
  Lines        1949     1908      -41     
==========================================
- Hits         1787     1762      -25     
+ Misses        162      146      -16

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

eltoder · 2023-12-07T07:25:24Z

@hugovk I pushed another small commit that optimizes encoding with sorted keys. Now sorted output is also about 20% faster and uses less memory:

Dict with 256 arrays with 256 dict{string, int} pairs, outputting sorted keys
(before) ujson encode: 101.27 calls/sec
(after) ujson encode: 121.38 calls/sec

bwoodsend · 2023-12-10T21:12:07Z

@hugovk I'm going to go hit the release button unless you object?

hugovk · 2023-12-10T21:18:31Z

Go for it!

Do we want a major release because of #614? I don't mind too much either way.

bwoodsend · 2023-12-10T22:34:30Z

Do we want a major release because of #614? I don't mind too much either way.

Actually, I don't think I do now. It seems too out of typical use to call a breaking change.

eltoder · 2023-12-11T02:22:05Z

@bwoodsend Thanks!

I assumed you were going to make a new major version because of #614, so I included

Raise TypeError if toDict() returns a non-dict instead of silently converting it to null.

in this PR, which is also technically a breaking change. We should probably at least highlight the change in release notes.

jamadden · 2023-12-11T20:18:15Z

Raise TypeError if toDict() returns a non-dict instead of silently converting it to null.

in this PR, which is also technically a breaking change. We should probably at least highlight the change in release notes.

Indeed, that change unexpectedly hit us and broke our CI pipeline. The ultimate error is ours, but it did take a bit of head-scratching to figure out what happened, as this is not currently called out in the release notes that I've been able to find.

bwoodsend · 2023-12-11T21:35:40Z

Ugh, sorry guys. I've added a breaking section to the release notes although that doesn't fix the fact that I broke semantic versioning. We could yank 5.9.0 from PyPI and rerelease as 6.0.0 although I'd definitely consider that to be too far.

eltoder · 2023-12-12T01:19:17Z

@jamadden for my information, were you using this as a feature, or was this just a bug that went unnoticed? I was considering to allow toDict() to return None. I only didn't do this because there were no tests or documentation for this behavior.

jamadden · 2023-12-12T11:33:33Z

@eltoder In our case, in the one scenario we've identified so far, it was basically accidental. A unittest.mock.MagicMock instance had snuck its way as into a dict as a value. The test case responsible didn't care about that value, so when it examined the final JSON, it never noticed that some key had a null value.

hugovk added the changelog: Changed For changes in existing functionality label Dec 5, 2023

eltoder added 2 commits December 6, 2023 20:14

eltoder force-pushed the feature/speedup-dumps branch from fba64dc to 5b9fd1e Compare December 7, 2023 07:19

bwoodsend approved these changes Dec 9, 2023

View reviewed changes

hugovk approved these changes Dec 10, 2023

View reviewed changes

bwoodsend merged commit 381f248 into ultrajson:main Dec 10, 2023

eltoder deleted the feature/speedup-dumps branch December 11, 2023 01:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Speed-up and cleanup objToJSON #615

Speed-up and cleanup objToJSON #615

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Speed-up and cleanup objToJSON #615

Speed-up and cleanup objToJSON #615

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!