feat(backup): Improve dedup algorithm to work with old backup by zatteo · Pull Request #1077 · cozy/cozy-flagship-app · GitHub

feat(backup): Improve dedup algorithm to work with old backup #1077


Merged
merged 1 commit into from
Dec 15, 2023

Conversation

zatteo
Member
@zatteo zatteo commented Dec 14, 2023

Dedup in photo backup means that we do not upload pictures if they already exist in the Cozy.

To identify if two pictures are identical, we compare name and creation date.

For new backups, we compare against our own creation date stored in the io.cozy.files metadata. It just works.

But for photos uploaded by the old backup, we compare against the creation date taken from EXIF, where the timezone may have been badly managed. So if we compare dates strictly, it may not work.

So here, in dedup mode, we compare only part of the date by ignoring the "hour" field:

  • it is almost impossible to get a false match by ignoring only the "hour" field => OK
  • we can miss some matches => but we accept that our dedup is not 100% accurate
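
The comparison described above could look roughly like the sketch below. This is an illustration only, not the code merged in this PR: the `Media` shape, the `dateKeyWithoutHour` and `isSameMedia` names, the `ignoreHour` flag, and the choice to read date fields in UTC are all assumptions made for the example.

```typescript
// Hypothetical shape: the real cozy-flagship-app types are not shown in this PR description.
interface Media {
  name: string
  creationDate: number // milliseconds since the Unix epoch
}

// Build a comparison key from every date field except the hour.
// Fields are read in UTC so the key does not depend on the device timezone.
const dateKeyWithoutHour = (timestampMs: number): string => {
  const d = new Date(timestampMs)
  return [
    d.getUTCFullYear(),
    d.getUTCMonth() + 1, // getUTCMonth() is zero-based
    d.getUTCDate(),
    d.getUTCMinutes(),
    d.getUTCSeconds()
  ].join('-')
}

// Two media are treated as identical when their names match and their creation
// dates match, either strictly or (hypothetical `ignoreHour` flag) with the hour dropped.
const isSameMedia = (local: Media, remote: Media, ignoreHour: boolean): boolean => {
  if (local.name !== remote.name) return false
  if (!ignoreHour) return local.creationDate === remote.creationDate
  return dateKeyWithoutHour(local.creationDate) === dateKeyWithoutHour(remote.creationDate)
}

// Same photo, but the old backup stored the EXIF date with a two-hour timezone shift.
const local: Media = { name: 'IMG_0001.jpg', creationDate: Date.UTC(2023, 11, 14, 10, 30, 15) }
const remote: Media = { name: 'IMG_0001.jpg', creationDate: Date.UTC(2023, 11, 14, 12, 30, 15) }

console.log(isSameMedia(local, remote, false)) // false: strict comparison misses the match
console.log(isSameMedia(local, remote, true)) // true: the hour is ignored
```

A shift that crosses midnight changes the day field as well, so such a pair is still uploaded again. That is one of the missed matches the description accepts.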

@zatteo zatteo requested a review from Crash-- December 14, 2023 16:33
@zatteo zatteo merged commit 859e3b2 into master Dec 15, 2023
@zatteo zatteo deleted the feat/update-dedup-algorithm branch December 15, 2023 08:36