8000 GitHub - rsp/git-experiments: Some experiments with git commits, hashes and collisions
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

rsp/git-experiments

Repository files navigation

git-experiments

https://github.com/rsp/git-experiments

Some experiments performed on Git repositories, originally created to to add some experimental data in my answer on Stack Overflow to: How safely can I assume unicity of a part of SHA1 hash?

This is still work in progress and is not finished yet.

Prerequisites

It should work on any standard Linux/Unix distribution. If it doesn't, please sumbit an issue.

It was written and tested on Ubuntu 14.04.

Commands

Basic

list-hashes

Usage: ./list-hashes DIRECTORY

Example: ./list-hashes .

limit-digits

Usage: cat hashes.txt | ./limit-digits NUMBER

Example: ./list-hashes . | ./limit-digits 7

Complex

list-collisions

List hash collisions for a certain number of digits (7 by default) in a given repository.

Usage: ./list-collisions DIRECTORY [DIGITS]

where DIRECTORY is a path to a Git repository (a dot for CWD) and DIGITS is the hash length.

Example:

git clone https://github.com/jquery/jquery.git
./list-collisions jquery 6

count-collisions

Count the number of hash collisions for every hash length for a given repository.

Usage: ./count-collisions DIRECTORY

where DIRECTORY is a path to a Git repository (a dot for CWD).

git clone https://github.com/jquery/jquery.git
./count-collisions jquery

Examples

Clone PostgreSQL repository to check for interesting data:

$ git clone ttps://github.com/postgres/postgres.git
...
$

Count the collisions for all hash lengths:

$ ./count-collisions postgres
count-collisions from git-experiments
https://github.com/rsp/git-experiments
by Rafał Pocztarski - https://github.com/rsp

Directory:	postgres
Repository:	https://github.com/postgres/postgres.git
GitHub page:	https://github.com/postgres/postgres
Total objects:	49405

Digits: 1, Unique: 16, Colliding: 49405, Noncolliding: 0
Digits: 2, Unique: 256, Colliding: 49405, Noncolliding: 0
Digits: 3, Unique: 4096, Colliding: 49405, Noncolliding: 0
Digits: 4, Unique: 34863, Colliding: 25935, Noncolliding: 23470
Digits: 5, Unique: 48304, Colliding: 2185, Noncolliding: 47220
Digits: 6, Unique: 49334, Colliding: 141, Noncolliding: 49264
Digits: 7, Unique: 49404, Colliding: 2, Noncolliding: 49403
Digits: 8, Unique: 49405, Colliding: 0, Noncolliding: 49405
$

Look for those 2 colliding hashes with 7-digit hashes:

$ ./list-collisions postgres 7

list-collisions from git-experiments
https://github.com/rsp/git-experiments
by Rafał Pocztarski - https://github.com/rsp

Directory:	postgres
Repository:	https://github.com/postgres/postgres.git
GitHub page:	https://github.com/postgres/postgres
Total objects:	49405
Using digits:	7
Unique: 49404, Colliding: 2, Noncolliding: 49403

Hashes that collide with first 7 digits:
aaeef4d17db9ded501fa02c9ca6c00f86258b171
aaeef4dae843fc9ecab7e6989e3015fda7cbc826

GitHub commit URLs of hashes that collide with first 7 digits:
https://github.com/postgres/postgres/commit/aaeef4d17db9ded501fa02c9ca6c00f86258b171
https://github.com/postgres/postgres/commit/aaeef4dae843fc9ecab7e6989e3015fda7cbc826
$

Interesting data

Some example repositories:

git clone https://github.com/rsp/git-experiments.git # 13 commits
git clone https://github.com/jquery/jquery.git # 6593 commits
git clone https://github.com/twbs/bootstrap.git # 11232 commits
git clone https://github.com/postgres/postgres.git # 49405 commits
git clone https://github.com/gcc-mirror/gcc.git # 188174 commits
git clone https://github.com/torvalds/linux.git # 506320 commits

(numbers of commits as of 2015-03-13)

Note: Use the https protocol to get GitHub links in the output.

Example data from 2015-03-13:

Project Total 1DC 2DC 3DC 4DC 5DC 6DC 7DC 8DC 9DC 10DC
jquery 6593 6593 6593 5231 590 41 4 0 0 0 0
bootstrap 11232 11232 11232 10519 1786 118 6 0 0 0 0
postgres 49405 49405 49405 49405 25935 2185 141 2 0 0 0
gcc 188174 188174 188174 188174 177476 30726 2092 144 10 0 0
linux 506320 506320 506320 506320 506068 193608 14947 924 54 0 0

(Total - total number of hashes, NDC - N-digit collisions)

A number of collisions is defined as the total number of hashes that collide with some other hashes (e.g. 3 hashes with the same prefix are counted as 3 colliding hashes).

Conclusions

I tested few popular Git repositories of various sizes and searched for collisions of the commit hashes prefixes of different sizes - this is if you only count commits and not other objects.

jQuery has two pairs of colliding 6-digit commit hashes:

so eg. this is ambiguous:

and gives 404 even though a shorter one:

works fine (at least for now).

Bootstrap has 3 pairs of 6-digit collisions, e.g.:

PostgreSQL has a 7-digit collision:

And Linux has 54 colliding 8-digit commit hashes, e.g.:

When you get all of the objects with git rev-list --all --objects and not only git rev-list --all then you get even more collisions (but it takes much longer). For example here are two pairs of colliding 11-digit hashes in the Linux repo:

d597639e2036f04f0226761e2d818b31f2db7820
d597639e203a100156501df8a0756fd09573e2de
ef91b6e893a00d903400f8e1303efc4d52b710af
ef91b6e893afc4c4ca488453ea9f19ced5fa5861

so for example:

$ git show d597639e203
error: short SHA1 d597639e203 is ambiguous.
error: short SHA1 d597639e203 is ambiguous.
fatal: ambiguous argument 'd597639e203': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
$

Author

Rafał Pocztarski - https://github.com/rsp

License

MIT License (Expat). See LICENSE.md for details.

About

Some experiments with git commits, hashes and collisions

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

0