Optimize list of digits for generating identifiers for gzip compression. (fixes #142) #397
The list is based on reserved words and identifiers used in dot-expressions. It saves quite a few bytes.
This will likely help most projects gain extra gzip juice. However, to maximize the benefit from gzip, a more sophisticated "two pass" method would need to be implemented. First you'd have to minify the source once and determine which keywords and variables can't be munged. Then, using that list of "unmungable" keywords/strings/etc., you would determine which characters in it appear most frequently. Use this new list as your
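The second pass described above could be sketched roughly as follows. This is only an illustration under assumptions: `BASE_DIGITS`, `orderDigitsByFrequency` and `makeNameGenerator` are hypothetical names, not UglifyJS functions, and the list of unmungable tokens is assumed to come from a first minification pass that is not shown here.

```javascript
// Sketch only: every name below is hypothetical, not part of UglifyJS.
// The 64 characters an identifier may be built from.
const BASE_DIGITS =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_0123456789";

// Pass 2: given the tokens that survived pass 1 unmangled (keywords,
// property names, ...), order the digits by how often each one occurs.
function orderDigitsByFrequency(unmungableTokens) {
  const counts = new Map([...BASE_DIGITS].map((c) => [c, 0]));
  for (const token of unmungableTokens) {
    for (const c of token) {
      if (counts.has(c)) counts.set(c, counts.get(c) + 1);
    }
  }
  // Array.prototype.sort is stable (ES2019+), so ties keep BASE_DIGITS order.
  return [...BASE_DIGITS]
    .sort((a, b) => counts.get(b) - counts.get(a))
    .join("");
}

// Turn a digit list into a function mapping 0, 1, 2, ... to distinct names.
// The first character is never 0-9, so every result is a valid identifier
// start. A real mangler would additionally skip reserved words like "do".
function makeNameGenerator(digits) {
  const leading = [...digits].filter((c) => !/[0-9]/.test(c));
  return function name(n) {
    let out = leading[n % leading.length];
    n = Math.floor(n / leading.length);
    while (n > 0) {
      n -= 1;
      out += digits[n % digits.length];
      n = Math.floor(n / digits.length);
    }
    return out;
  };
}

// Usage: the most frequent characters are handed out first.
const digits = orderDigitsByFrequency(["function", "return", "length"]);
const nextName = makeNameGenerator(digits);
console.log(digits.slice(0, 5), nextName(0), nextName(1));
```

The point of the ordering is that the shortest, most frequently emitted names reuse characters that already dominate the rest of the output, which tends to help gzip's entropy coding.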
Was the list weighted by how common each reserved word is in real-world code? I bet edit: @ralphholzmann has a much better approach.
@michaelficarra This is why I propose making the list dynamic based on the script being compressed.
@michaelficarra The list was based on jQuery, Prototype and Mootools. There are two basic things that most likely would never be minified: keywords and identifiers in dot-expressions. To generate a more weighted list, it would be trivial to take thousands of scripts from various websites, count everything, and generate the list again. I am currently unable to decide whether implementing a two-pass system for generating this list would be truly beneficial. I would not attempt it, since this weighted list already gives a few kilobytes more gzip compression than the original list on a source tree of about 3.5 MB. Every little bit counts.
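A rough sketch of how such a static list could be counted from a corpus is shown below. It is not the script used for this patch: `RESERVED`, `collectUnmangledTokens` and `characterWeights` are hypothetical names, and a regular expression stands in for a real parser, so strings and comments are scanned as well.

```javascript
// Sketch only: hypothetical helpers, not the script used for the patch.
// ES3-era keywords; a mangler never renames these.
const RESERVED = new Set([
  "break", "case", "catch", "continue", "default", "delete", "do", "else",
  "finally", "for", "function", "if", "in", "instanceof", "new", "return",
  "switch", "this", "throw", "try", "typeof", "var", "void", "while", "with",
]);

// Collect reserved words and the identifiers that follow a dot.
// Assumption: a regex approximation is good enough for counting purposes.
function collectUnmangledTokens(source) {
  const tokens = [];
  const re = /(\.\s*)?([A-Za-z_$][A-Za-z0-9_$]*)/g;
  let m;
  while ((m = re.exec(source)) !== null) {
    const isProperty = m[1] !== undefined;
    if (isProperty || RESERVED.has(m[2])) tokens.push(m[2]);
  }
  return tokens;
}

// Count characters over all collected tokens, most frequent first.
function characterWeights(sources) {
  const counts = new Map();
  for (const source of sources) {
    for (const token of collectUnmangledTokens(source)) {
      for (const c of token) counts.set(c, (counts.get(c) || 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Usage: pass the text of each script in the corpus.
console.log(characterWeights(["var x = foo.length; return a.b"]).slice(0, 3));
```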
@rvanvelzen A generalized list of digits will never be more efficient than a two-pass analysis per script. So the question isn't whether it will be truly beneficial (it will always be more beneficial); it's a question of whether or not it's worth it to implement.
@ralphholzmann That was the point I was trying to make, but expressing myself in English isn't always as easy as in Dutch. :-) Besides that: even this simple optimisation will have significant gains in all manner of projects. There are some small improvements that could be made, but most of those would only influence 0..9 in this list, which are probably never used at all. One extra gain could come from checking the contents of strings as well; those are not taken into account in this list.
I didn't even test it, but I think it's a good idea. Yeah, we should think about the dynamic version, that'll be a lot better in general. In any case, for now the patch is accepted. If it shaves 100b on jQuery then great, thanks! ;-)
Side note: because of the way the planets are aligned, I can't run nodeunit right now to see how all the tests fail. This patch is a nice example of a perfectly valid improvement which breaks all the tests, confirming my belief that unit testing (at least the way we think of it today) is not such a good idea.
- support for directives (i.e. "use strict";)
- newlines in multi-line comments trigger ASI
- added nodeunit dependency for NPM, other code restructuring
- apply ascii_only option to regexps
- allow defines when not mangling
- some parser/code generator fixes #376, #396
- (static) mangler optimization for gzip (#397)

Contributors (in no particular order):
Richard van Velzen <rvanvelzen@expert-shops.com>
Paul Baumgart <paul@proxv.com>
Mal Graty <mal.graty@googlemail.com>
Jez Ng <jezreel@gmail.com>
Robert Gust-Bardon <donate@robert.gust-bardon.org>
The savings are about 100 bytes on jQuery 1.7.2 and the same goes for Prototype.