Description
Hi
I'm doing some work on duff because I found it useful for fixing broken rsnapshot repositories (I will open some pull requests in a few days). Unfortunately, such repositories are a bit unusual: millions of files, mostly hardlinked in groups of 30-50.
It seems that I'm having a problem with large buckets (long lists): because each sampled file allocates 4KB of data that is only freed at the end of bucket processing, I'm getting "out of memory" errors at about 3GB allocated (the box is a lightweight 32-bit Atom-based system).
As sizeof(FileList) == 12, I see no problem increasing HASH_BITS to 16 (~800KB) or even 20 (~13MB).
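For reference, a back-of-the-envelope sketch of the bucket table footprint for various HASH_BITS values (assuming one FileList head per bucket and the sizeof(FileList) == 12 I see on my 32-bit box; the layout assumption may not match duff exactly):

```c
#include <stdio.h>

int main(void)
{
    const size_t filelist_size = 12;            /* sizeof(FileList) on my 32-bit box */

    for (unsigned bits = 10; bits <= 20; bits += 2) {
        size_t buckets = (size_t) 1 << bits;    /* 2^HASH_BITS buckets */
        printf("HASH_BITS = %2u -> %8zu buckets, ~%6zu KB\n",
               bits, buckets, buckets * filelist_size / 1024);
    }
    return 0;
}
```

Even at HASH_BITS = 20 the table itself is only ~12MB, which seems negligible compared to the 4KB-per-file samples.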
What do you think - would it be a good idea to add an option to make it runtime-configurable?
Another idea is to (optionally?) replace the sample with some simple, fast running checksum (crc64?).
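To make the idea concrete, here is a minimal bitwise CRC-64 (ECMA-182 polynomial, no lookup table) that could be fed file data block by block instead of keeping a 4KB sample per file; how it would actually plug into duff's comparison code is just a guess on my part:

```c
#include <stdint.h>
#include <stddef.h>

#define CRC64_ECMA_POLY 0x42f0e1eba9ea3693ULL

/* Update a running CRC-64 with one block of data.
 * Start with crc = 0 and call repeatedly as the file is read. */
uint64_t crc64_update(uint64_t crc, const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint64_t) data[i] << 56;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000000000000000ULL)
                ? (crc << 1) ^ CRC64_ECMA_POLY
                : (crc << 1);
    }
    return crc;
}
```

A table-driven version would obviously be faster; the point is just that an 8-byte running checksum per file would replace the 4KB sample and keep memory per bucket bounded.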