Optimize list of digits for generating identifiers for gzip compression. (fixes #142) #397

rvanvelzen · 2012-05-25T15:35:50Z

The list is based on reserved words and identifiers used in dot-expressions. It saves quite a few bytes.

The savings are about 100 bytes on jQuery 1.7.2 and the same goes for Prototype.

ralphholzmann · 2012-05-25T15:45:24Z

This will likely help most projects gain extra gzip juice. However, to maximize the benefit from gzip, a more sophisticated "two pass" method would need to be implemented. First you'd minify the source once and determine which keywords and variables can't be munged. Then, using that list of "unmungable" keywords/strings/etc., you'd determine which characters appear most frequently. Use this new list as your digits variable and do a second pass of the minifier over the code. This yields a unique set of digits for each script being minified and maximizes the potential of gzip.
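The second step of that proposal can be sketched roughly as follows. This is only an illustration in Python, not code from the patch or from UglifyJS: `IDENT_CHARS`, `digits_for`, and the `survivors` list are hypothetical names, and the first pass that collects the surviving tokens is not shown.

```python
from collections import Counter
import string

# Characters allowed in generated identifiers (an assumption of this sketch).
IDENT_CHARS = string.ascii_letters + "_$" + string.digits

def digits_for(unmungable):
    """Order IDENT_CHARS by how often each character occurs in the
    tokens that survive minification (most frequent first)."""
    counts = Counter(ch for tok in unmungable for ch in tok if ch in IDENT_CHARS)
    # sorted() is stable, so characters with equal counts keep their
    # original IDENT_CHARS order.
    return "".join(sorted(IDENT_CHARS, key=lambda ch: -counts[ch]))

# Pass 1 (not shown) would minify once and collect tokens like these.
survivors = ["function", "return", "length", "function", "prototype"]
digits = digits_for(survivors)
```

The resulting string would then be handed to the mangler as its digit list for the second pass.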

michaelficarra · 2012-05-25T15:49:42Z

Was the list weighted by how common each reserved word is in real-world code? I bet function is an extremely common keyword, so its characters should have a higher weight, pushing e out of that front position. Also surprising: 5 has a higher precedence than 1. Remember that numbers are used outside of identifiers, and I bet adding/subtracting 1 in an expression is a common task (length + 1, you get the idea). I don't think this is acceptable yet, at least not without a better explanation of the methodology. -1 for now.

edit: @ralphholzmann has a much better approach.

ralphholzmann · 2012-05-25T15:50:41Z

@michaelficarra This is why I propose making the list dynamic based on the script being compressed.

rvanvelzen · 2012-05-25T16:00:19Z

@michaelficarra The list was based on jQuery, Prototype and Mootools. There are two basic things that most likely would never be minified: keywords and identifiers in dot-expressions. To generate a more weighted list it would be trivial to take thousands of scripts from various websites and just count everything and generate the list again.
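A rough sketch of that counting step, in Python for brevity. It uses regular expressions rather than a real parser, so it will miscount inside strings, comments and regex literals; the `KEYWORDS` set is deliberately incomplete, and `unmangled_char_counts` is a hypothetical helper, not the script used for the patch.

```python
import re
from collections import Counter

# A small, incomplete set of reserved words, for illustration only.
KEYWORDS = {"function", "return", "var", "if", "else",
            "for", "while", "this", "new", "typeof"}

def unmangled_char_counts(source):
    """Count characters of keywords and of property names in dot-expressions."""
    counts = Counter()
    # Property names that follow a dot, e.g. the "length" in "a.length".
    for name in re.findall(r"\.([A-Za-z_$][\w$]*)", source):
        counts.update(name)
    # Bare words that are reserved words; the lookbehind skips ".name" matches
    # and matches that would start in the middle of a word.
    for word in re.findall(r"(?<![\w$.])[A-Za-z_$][\w$]*", source):
        if word in KEYWORDS:
            counts.update(word)
    return counts

sample = "function f(a){return a.length+a.items.length}"
counts = unmangled_char_counts(sample)
```

Because every occurrence is counted, a keyword that appears often contributes proportionally more, which is the kind of weighting asked about above. Summing the counters over many scripts would give the more general list.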

I am currently unable to decide whether implementing a two-pass system for generating this list would be truly beneficial. I would not attempt it, since this weighted list does provide a few kilobytes more compression with gzip than the original list on a source tree of just about 3.5 MB. Every little bit counts.
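For anyone wanting to reproduce this kind of comparison, a minimal way to measure it is to gzip both minified outputs and compare the sizes. The sketch below uses Python's standard `gzip` module; `gzip_size` and `savings` are hypothetical helper names, and the inputs would be the minifier output produced with the old and new digit lists.

```python
import gzip

def gzip_size(text):
    """Size in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8"), compresslevel=9))

def savings(before, after):
    """Bytes saved after gzip when going from `before` to `after`."""
    return gzip_size(before) - gzip_size(after)
```

A positive result from `savings(old_output, new_output)` means the new digit list compresses better for that input.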

ralphholzmann · 2012-05-25T16:05:46Z

@rvanvelzen a generalized list of digits will never be more efficient than a two-pass analysis per script. So the question isn't whether it will be truly beneficial -- it will always be more beneficial -- it's a question of whether or not it's worth it to implement.

rvanvelzen · 2012-05-25T16:57:32Z

@ralphholzmann That was the point I was trying to make, but expressing myself in English isn't always as easy as in Dutch. :-)

Besides that: even this simple optimisation will have significant gains in all manner of projects. There are some small improvements that could be made, but most of those would only influence 0..9 in this list, which are probably never used at all.

One extra gain could be checking the contents of strings as well. Those are not taken into account in this list.
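To see why the position of 0..9 matters so little, here is a sketch of how a digit list can be turned into identifiers. It assumes a scheme where the first character is drawn from the first 54 entries (no numerals, since an identifier cannot start with one) and later characters from all 64. The alphabetical `DIGITS` string is a placeholder, not the list from the patch, and the sketch ignores the need to skip reserved words.

```python
import string

# Placeholder ordering: 52 letters, "$", "_", then the ten numerals last.
DIGITS = string.ascii_letters + "$_" + string.digits

def name_for(num, digits=DIGITS):
    """Encode a counter as an identifier: 54 choices for the first
    character, 64 for every following one."""
    out, base = "", 54
    while True:
        out += digits[num % base]
        num //= base
        base = 64
        if num == 0:
            return out
```

With the numerals at the end of the list, the first name containing one is number 2916 (54 × 54), so a scope needs thousands of mangled names before their order has any effect.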

mishoo · 2012-05-25T22:07:06Z

I didn't even test it, but I think it's a good idea. Yeah, we should think about the dynamic version, that'll be a lot better in general. In any case, for now patch accepted; if it shaves 100 bytes on jQuery then great, thanks! ;-)

mishoo · 2012-05-25T22:20:03Z

Side note: because of the way planets are aligned, I can't run nodeunit right now to see how all the tests fail.

This patch is a nice example of a perfectly valid improvement which breaks all the tests, confirming my belief that unit testing (at least the way we think of it today) is not such a good idea.

- support for directives (i.e. "use strict";)
- newlines in multi-line comments trigger ASI
- added nodeunit dependency for NPM, other code restructuring
- apply ascii_only option to regexps
- allow defines when not mangling
- some parser/code generator fixes #376, #396
- (static) mangler optimization for gzip (#397)

Contributors (in no particular order):

Richard van Velzen <rvanvelzen@expert-shops.com>
Paul Baumgart <paul@proxv.com>
Mal Graty <mal.graty@googlemail.com>
Jez Ng <jezreel@gmail.com>
Robert Gust-Bardon <donate@robert.gust-bardon.org>

Optimize list of digits for generating identifiers for gzip compression.

4072f80

The list is based on reserved words and identifiers used in dot-expressions. It saves quite a few bytes.

mishoo added a commit that referenced this pull request May 25, 2012

Merge pull request #397 from rvanvelzen/optimize_gzip

f834ec6

Optimize list of digits for generating identifiers for gzip compression. (fixes #142)

mishoo merged commit f834ec6 into mishoo:master May 25, 2012

nicolas-grekas mentioned this pull request Sep 2, 2012

Optimize for gzip / borrow ideas from JSqueeze mishoo/UglifyJS#2

Closed
