Optimize list of digits for generating identifiers for gzip compression. (fixes #142) by rvanvelzen · Pull Request #397 · mishoo/UglifyJS-old · GitHub

Optimize list of digits for generating identifiers for gzip compression. (fixes #142) #397


Merged
merged 1 commit into from
May 25, 2012

Conversation

rvanvelzen
Contributor

The list is based on reserved words and identifiers used in dot-expressions. It saves quite a few bytes.

The savings are about 100 bytes on jQuery 1.7.2 and the same goes for Prototype.
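
For context, a mangler typically turns a counter into a short name by treating the digit list as the alphabet of a number system, so characters near the front of the list are used most often. The following is a minimal sketch of that idea, assuming a 64-character list whose first 54 entries can legally start an identifier. `makeNameGenerator` and the alphabetical ordering are illustrative assumptions, not the code or the list from this patch.

```javascript
// Sketch: turn a counter into a short identifier using an ordered digit list.
// Assumption: the first 54 characters of `digits` are valid identifier starts
// (letters, $ and _) and the last 10 are the decimal digits.
function makeNameGenerator(digits) {
    if (digits.length !== 64) throw new Error("expected 64 digits");
    return function (num) {
        var name = "";
        var base = 54; // the first character may not be 0-9
        do {
            name += digits.charAt(num % base);
            num = Math.floor(num / base);
            base = 64; // later characters may use the full list
        } while (num > 0);
        return name;
    };
}

// Hypothetical ordering, for illustration only.
var alphabetical = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_0123456789";
var nextName = makeNameGenerator(alphabetical);
```

Reordering the list does not change how many names are available or how long they are. It only changes which characters the generated names are made of, which is what affects how well gzip can compress the result.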

@ralphholzmann

This will likely help most projects gain extra gzip juice, however, to maximize the benefit from gzip, a more sophisticated "two pass" method would need to be implemented. First you'd have to minify the source once and determine what keywords and variables can't be munged. Then, using the list of "unmungable" keywords/strings/etc, you would then determine which characters in that list appear most frequently. Use this new list as your digits variable and do a second pass of the minifier over the code. This will yield a unique set of digits for each script being minified and will maximize the potential of gzip.
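
The two-pass idea described above could be sketched roughly as follows. This is only an outline under stated assumptions: `minify(source, digits)` is a hypothetical callback that returns the minified code together with the list of names it could not mangle, and `orderByFrequency` and `twoPassMinify` are illustrative helper names, not part of UglifyJS.

```javascript
// Characters that may start an identifier, followed by the ten decimal digits.
var LETTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_";
var NUMBERS = "0123456789";

// Order `chars` by how often each one occurs in `text`, most frequent first.
// Ties keep their original relative order so the result is deterministic.
function orderByFrequency(chars, text) {
    var counts = {};
    for (var i = 0; i < text.length; i++) {
        var c = text.charAt(i);
        counts[c] = (counts[c] || 0) + 1;
    }
    return chars.split("").map(function (c, index) {
        return { c: c, n: counts[c] || 0, index: index };
    }).sort(function (a, b) {
        return (b.n - a.n) || (a.index - b.index);
    }).map(function (entry) {
        return entry.c;
    }).join("");
}

// Two-pass minification. `minify(source, digits)` is a hypothetical callback
// that returns { code: string, unmangled: string[] }.
function twoPassMinify(source, minify) {
    // Pass 1: minify with a default list to learn what cannot be mangled.
    var first = minify(source, LETTERS + NUMBERS);
    var sample = first.unmangled.join(" ");
    // Letters and numbers are ranked separately so that the first 54
    // entries of the list can still start an identifier.
    var digits = orderByFrequency(LETTERS, sample) + orderByFrequency(NUMBERS, sample);
    // Pass 2: minify again with the script-specific list.
    return { digits: digits, code: minify(source, digits).code };
}
```

The cost of this approach is a second full minification run per script, which is the trade-off discussed in the rest of the thread.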

@michaelficarra
Contributor

Was the list weighted by how common each reserved word is in real-world code? I bet function is an extremely common keyword, so its characters should have a higher weight, pushing e out of that front position. Also surprising: 5 has a higher precedence than 1. Remember that numbers are used outside of identifiers, and I bet adding/subtracting 1 in an expression is a common task (length + 1, you get the idea). I don't think this is acceptable yet, at least not without a better explanation of the methodology. -1 for now.

edit: @ralphholzmann has a much better approach.

@ralphholzmann

@michaelficarra This is why I propose making the list dynamic based on the script being compressed.

@rvanvelzen
Contributor Author

@michaelficarra The list was based on jQuery, Prototype and Mootools. There are two basic things that most likely would never be minified: keywords and identifiers in dot-expressions. To generate a more weighted list it would be trivial to take thousands of scripts from various websites and just count everything and generate the list again.
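
As a rough illustration of that kind of counting (not the script actually used to build the list), the sketch below tallies the characters of reserved words and of property names that follow a dot. It is regex-based for brevity, so it also looks inside strings and comments and can misread number literals such as `1.e5`; a real tool would more likely walk the parser's output. `KEYWORDS` and `countUnmangledChars` are names assumed for this example.

```javascript
// ES5 keywords plus the literals null, true and false.
var KEYWORDS = [
    "break", "case", "catch", "continue", "debugger", "default", "delete",
    "do", "else", "finally", "for", "function", "if", "in", "instanceof",
    "new", "return", "switch", "this", "throw", "try", "typeof", "var",
    "void", "while", "with", "null", "true", "false"
];

// Count the characters of every word a mangler would have to leave alone:
// keywords, and property names that appear after a dot.
function countUnmangledChars(source) {
    var counts = {};
    function add(word) {
        for (var i = 0; i < word.length; i++) {
            var c = word.charAt(i);
            counts[c] = (counts[c] || 0) + 1;
        }
    }
    // Group 1 is the optional leading dot, group 2 the identifier itself.
    var token = /(\.\s*)?([A-Za-z_$][A-Za-z0-9_$]*)/g;
    var match;
    while ((match = token.exec(source)) !== null) {
        var isProperty = match[1] !== undefined;
        if (isProperty || KEYWORDS.indexOf(match[2]) !== -1) {
            add(match[2]);
        }
    }
    return counts;
}

var sampleCounts = countUnmangledChars("var n = a.length; return n;");
```

Running this over a large corpus and summing the counts per character would give the weights for a general-purpose list. Local names such as `n` and `a` in the sample are deliberately ignored, since the mangler is free to rename them.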

I am currently unable to decide whether implementing a two-pass system for generating this list would be truly beneficial. I would not attempt it, since this weighted list does provide a few kilobytes more compression with gzip than the original list on a source tree of just about 3.5 MB. Every little bit counts.

@ralphholzmann

@rvanvelzen a generalized list of digits will never be more efficient than a two-pass analysis per script. So the question isn't whether it will be truly beneficial -- it will always be more beneficial -- it's a question of whether or not it's worth it to implement.

@rvanvelzen
Contributor Author

@ralphholzmann That was the point I was trying to make, but expressing myself in English isn't always as easy as in Dutch. :-)

Besides that: even this simple optimisation will have significant gains in all manner of projects. There are some small improvements that could be made, but most of those would only influence 0..9 in this list, which are probably never used at all.

One extra gain could be checking the contents of strings as well. Those are not taken into account in this list.

mishoo added a commit that referenced this pull request May 25, 2012
Optimize list of digits for generating identifiers for gzip compression. (fixes #142)
@mishoo mishoo merged commit f834ec6 into mishoo:master May 25, 2012
@mishoo
Owner
mishoo commented May 25, 2012

I didn't even test it, but I think it's a good idea. Yeah, we should think about the dynamic version, that'll be a lot better in general. In any case, for now the patch is accepted: if it shaves 100b on jQuery then great, thanks! ;-)

@mishoo
Owner
mishoo commented May 25, 2012

Side note: because of the way planets are aligned, I can't run nodeunit right now to see how all the tests fail.

This patch is a nice example of a perfectly valid improvement which breaks all the tests, confirming my belief that unit testing (at least the way we think of it today) is not such a good idea.

mishoo added a commit that referenced this pull request May 28, 2012
- support for directives (i.e. "use strict";)
- newlines in multi-line comments trigger ASI
- added nodeunit dependency for NPM, other code restructuring
- apply ascii_only option to regexps
- allow defines when not mangling
- some parser/code generator fixes #376, #396
- (static) mangler optimization for gzip (#397)

Contributors (in no particular order):

Richard van Velzen <rvanvelzen@expert-shops.com>
Paul Baumgart <paul@proxv.com>
Mal Graty <mal.graty@googlemail.com>
Jez Ng <jezreel@gmail.com>
Robert Gust-Bardon <donate@robert.gust-bardon.org>