Windows compatibility of Deepdoctection #354
Comments
On the one hand, supporting Windows would be nice. On the other hand, one problem with PyMuPDF is its license: AGPL would require changing this project's license and would basically prevent building anything that goes beyond open source. The current trend is that most parsers have a restrictive license, and I do not want to follow this trend. Poppler is mainly used for converting PDF bytes into numpy arrays. One alternative that seems to provide this is pypdfium2. To keep the number of dependencies low, users should install pypdfium2 themselves. One could then extend
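A minimal sketch of the idea above (rendering PDF bytes to numpy arrays with a user-installed pypdfium2), assuming pypdfium2 version 4 or later. The function names are hypothetical and not part of deepdoctection. The import is deferred so that pypdfium2 stays an optional dependency:

```python
import importlib.util


def pypdfium2_available() -> bool:
    """Return True if the optional pypdfium2 package is installed."""
    return importlib.util.find_spec("pypdfium2") is not None


def dpi_to_scale(dpi: float) -> float:
    """PDF user space has 72 points per inch, so the render scale is dpi / 72."""
    return dpi / 72.0


def pdf_bytes_to_numpy(pdf_bytes: bytes, dpi: float = 200.0) -> list:
    """Render each page of a PDF, given as bytes, into a numpy array.

    Hypothetical helper: assumes the pypdfium2 >= 4 API
    (PdfDocument, PdfPage.render, PdfBitmap.to_numpy).
    """
    if not pypdfium2_available():
        raise ImportError(
            "pypdfium2 is not installed. Install it with 'pip install pypdfium2'."
        )
    import pypdfium2 as pdfium  # deferred import keeps the dependency optional

    pdf = pdfium.PdfDocument(pdf_bytes)
    try:
        scale = dpi_to_scale(dpi)
        # Copy each array so it does not depend on the bitmap's buffer.
        return [pdf[i].render(scale=scale).to_numpy().copy() for i in range(len(pdf))]
    finally:
        pdf.close()
```

Called as `pages = pdf_bytes_to_numpy(open("sample.pdf", "rb").read())`, where `sample.pdf` is a placeholder path, this would return one array per page when pypdfium2 is present and raise an `ImportError` with an install hint otherwise.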
Okay, got it. Should I open a PR to contribute, or would you like to assign this task to the existing contributors? Another possible feature enhancement is classifying, detecting, and recognizing handwritten text in images, using TrOCR or a better lightweight model. I have looked into docTR: they are also trying to work on handwritten text but have not made any progress since 2022. Can we do this by integrating third-party models?
You can work on a PR. With respect to TrOCR, I do not want to add the model (yet), as it would require a handwritten text detector, for which there is not yet a satisfactory model that I am aware of. I am much more in favor of trying Kosmos 2.5 once it has been added to the transformers library.
Kosmos 2.5 seems to be a very large model to run on lower-specification servers or workstations: around 5.5 GB.
I have been looking into the deepdoctection dependency list and found that Poppler and Detectron2 are what stop deepdoctection from running on Windows without a Docker file.
PyMuPDF works independently of Poppler and can be used to convert PDFs into images (an img2pdf alternative).
Detectron2 can be installed on Windows; I am attaching references here: https://medium.com/@yogeshkumarpilli/how-to-install-detectron2-on-windows-10-or-11-2021-aug-with-the-latest-build-v0-5-c7333909676f , https://dev.to/reckon762/how-to-install-detectron2-on-windows-3hil
Is it possible to do it? @JaMe76 I am open to contributing.