Add configs for DPI, Page Segmentation Mode, and Zotero non-linked attachments by danpf · Pull Request #41 · UB-Mannheim/zotero-ocr · GitHub

Add configs for DPI, Page Segmentation Mode, and Zotero non-linked attachments #41

Closed
danpf wants to merge 2 commits

danpf
@danpf danpf commented Jun 29, 2022

Thought I'd take a stab at this.

Added outputDPI and outputAsCopyAttachment as configuration options.

It seems to work, but I'm unable to get it to work with group libraries. Do you have any idea why that might be?
Briefly:

It works when I have a PDF selected in a sub-collection of my personal 'My Library', but when I use it on something selected in a sub-collection of my group library I get errors like the one below. The errors happen with the zotero-ocr plugin as well, so maybe I shouldn't be basing my logic off that plugin and that's my problem.

[JavaScript Error: "Parent item 1/4Q5DY97J not found" {file: "chrome://zotero/content/xpcom/data/item.js" line: 1537}]

My guess is that, for some reason, parents are mangled in the database in group libraries, but I'm not sure how to check or confirm that, because the code appears correct to me and this line seems to be returning the right stuff:
https://github.com/danpf/zotero-ocr/blob/9eb9a8ec9a5ada40be27d07ca6de847637c14d2b/chrome/content/zoteroocr.js#L105

I posted about the issue on zotero-dev but didn't get a response:
https://groups.google.com/g/zotero-dev/c/LVmcjIMqYvA

@stweil stweil changed the title [WIP] Add configs for DPI + copyattacments [WIP] Add configs for DPI + copyattachments Mar 6, 2023
@danpf
Author
danpf commented Mar 11, 2023

Not sure if you are interested in this, @stweil, but I got a response from the Zotero devs and was able to get this PR fixed for Group Library + 'hard' attachments. Their API is currently incompatible with linked attachments in the Group Libraries section. I think it would only make sense for them to implement that in the context of network drives, so they probably won't address that.
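
A minimal sketch of how a setting like outputAsCopyAttachment could choose between a stored ('hard') attachment and a linked file. This is not the PR's code. The helper name `attachOcrPdf` is hypothetical, and the `attachments` parameter is assumed to expose `importFromFile` and `linkFromFile` methods that take `{ file, parentItemID }`, in the style of `Zotero.Attachments`; inside the plugin one would presumably pass `Zotero.Attachments` itself.

```javascript
// Hypothetical helper, not taken from zotero-ocr.
// `attachments` is assumed to offer importFromFile/linkFromFile
// accepting an options object with `file` and `parentItemID`.
function attachOcrPdf(attachments, { file, parentItemID, outputAsCopyAttachment }) {
  const options = { file, parentItemID };
  // Stored copy: the PDF is imported into the library's storage.
  // Linked file: only a path to the file on disk is recorded.
  return outputAsCopyAttachment
    ? attachments.importFromFile(options)
    : attachments.linkFromFile(options);
}
```

Passing the attachment API in as a parameter keeps the choice testable outside Zotero; per the comment above, only the stored-copy branch is expected to work for group libraries.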

Docs:
This PR adds 3 new options to ZoteroOCR:

  • The ability to modify the output DPI
    • The default is set to 300
  • The ability to modify the Tesseract Page Segmentation Mode (PSM)
  • The ability to add the new PDFs as attachments rather than 'linked files'
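
As a rough illustration of how the first two options could reach Tesseract, here is a sketch that turns them into command-line arguments. The function name `buildTesseractArgs`, the option name `pageSegmentationMode`, and the default language `eng` are assumptions made for this example; `outputDPI` is the option name mentioned earlier in this PR. `--dpi`, `--psm`, `-l`, and the `pdf` output config are standard Tesseract CLI options.

```javascript
// Hypothetical helper, not taken from zotero-ocr.
// Builds the argument list for: tesseract <input> <outputBase> [options] pdf
function buildTesseractArgs(input, outputBase, options = {}) {
  const {
    language = "eng",          // assumed default language
    outputDPI = 300,           // default DPI stated in this PR
    pageSegmentationMode = 3,  // Tesseract's default PSM
  } = options;

  const dpi = Number(outputDPI);
  const psm = Number(pageSegmentationMode);
  if (!Number.isInteger(dpi) || dpi <= 0) {
    throw new RangeError(`DPI must be a positive integer, got ${outputDPI}`);
  }
  if (!Number.isInteger(psm) || psm < 0 || psm > 13) {
    throw new RangeError(`PSM must be an integer from 0 to 13, got ${pageSegmentationMode}`);
  }

  return [
    input, outputBase,
    "-l", language,
    "--dpi", String(dpi),
    "--psm", String(psm),
    "pdf", // output config: write a searchable PDF
  ];
}

// Example: 600 DPI, single uniform block of text (PSM 6).
console.log(buildTesseractArgs("pages.txt", "out", { outputDPI: 600, pageSegmentationMode: 6 }).join(" "));
```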

I have confirmed that this PR works on an M1 MacBook, and here is a new screenshot of the settings panel:
[image: screenshot of the settings panel]

If you would be interested in merging, please confirm that it works on your device as well. I don't normally touch JS.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.
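
Since the mode arrives from a settings field, it is worth validating before use. The sketch below shows one possible way to parse such a value and fall back to the default of 3; `parsePsmPreference`, `PSM_DESCRIPTIONS`, and `DEFAULT_PSM` are hypothetical names for this example, and the descriptions are abbreviated from the list above.

```javascript
// Hypothetical helper, not taken from zotero-ocr.
const DEFAULT_PSM = 3;

// Index = mode number, following the list above (0 to 13).
const PSM_DESCRIPTIONS = [
  "Orientation and script detection (OSD) only",
  "Automatic page segmentation with OSD",
  "Automatic page segmentation, no OSD or OCR (not implemented)",
  "Fully automatic page segmentation, no OSD (default)",
  "Single column of text of variable sizes",
  "Single uniform block of vertically aligned text",
  "Single uniform block of text",
  "Single text line",
  "Single word",
  "Single word in a circle",
  "Single character",
  "Sparse text",
  "Sparse text with OSD",
  "Raw line",
];

// Accepts a number or a string such as " 6 "; anything that is not
// a whole number in the known range falls back to DEFAULT_PSM.
function parsePsmPreference(raw) {
  const text = String(raw).trim();
  if (!/^\d+$/.test(text)) return DEFAULT_PSM;
  const mode = Number(text);
  return mode < PSM_DESCRIPTIONS.length ? mode : DEFAULT_PSM;
}

const mode = parsePsmPreference("6");
console.log(`PSM ${mode}: ${PSM_DESCRIPTIONS[mode]}`);
```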

@danpf danpf changed the title [WIP] Add configs for DPI + copyattachments Add configs for DPI, Page Segmentation Mode, and Zotero non-linked attachments Mar 11, 2023
@stweil
Member
stweil commented Aug 28, 2024

@danpf, thanks for your contribution. I am afraid that the ability to add the new PDFs as attachments rather than 'linked files' conflicts with the latest commit b61ca1f, which addresses the same issue as far as I can see, but does not support a user choice.

I think it would be good to solve this first and handle DPI and PSM for Tesseract in an extra pull request. We could either revert @aborel's commit and use your code or try to update your code based on Git master. I don't know which way will be better.

@aborel
Collaborator
aborel commented Aug 28, 2024

I'm sorry about the conflict, too! I think the proposed configurable settings are a valuable addition, and I'd like to integrate them through whatever path is seen as most convenient.
Since my own code has already been pushed as version 0.8.0 to an unknown number of users, and considering that the pull request only applies to Zotero 6 (a Zotero 7 version is needed, perhaps even more now that v7 is out of beta), I have a slight preference for updating the PR code. I'm willing to adapt @danpf's code outside of a PR if necessary.

@danpf
Author
danpf commented Sep 3, 2024

My current company doesn't use Zotero, so feel free to adapt this as you see fit. Unfortunately, I don't really have the time to update it, as it has been more than a year since I looked at this.

@aborel
Collaborator
aborel commented Sep 3, 2024

Thanks for the PR, and thanks for taking the time to reply here!
I have some time available to work on this, so I'm volunteering to integrate the proposed code into the main repository and adapt it to Zotero 7. I will perhaps leave the PR aside and work manually, but the contribution is very much appreciated nonetheless :-)

@aborel aborel self-assigned this Sep 3, 2024
aborel pushed a commit that referenced this pull request Sep 7, 2024
@aborel
Collaborator
aborel commented Sep 7, 2024

e3cf968 integrates the proposed code for Zotero 7 and Zotero 6. I also noticed that I had broken 0.8.0 for Zotero 6 users, and I have fixed that as well.

@stweil
Member
stweil commented Sep 7, 2024

Should I prepare a new release 0.8.1?

@aborel
Collaborator
aborel commented Sep 7, 2024

I have updated the README now, so I guess we're ready for the new release :-)

@stweil
Member
stweil commented Sep 7, 2024

Thank you, @danpf and @aborel. The new release 0.8.1 includes the new features, so I am closing this pull request here.

@stweil stweil closed this Sep 7, 2024
3 participants