PDF parsing formatting issues · Issue #132 · docwire/docwire · GitHub

PDF parsing formatting issues #132

Open
dimoqz opened this issue Jun 17, 2024 · 1 comment

Comments

@dimoqz
dimoqz commented Jun 17, 2024

After parsing a .pdf file and converting it to text, the original formatting is lost.
DocWire version 04.04.2024, MSVC compiler.
For example, take the file 3.pdf from the tests.
Original pdf:
[image: pdf]

After parsing:
[image: pdf_after]

@as-ascii
Contributor

Confirmed. I can see that this output was accepted as-is in the automatic tests, so it is not treated as an error. We will analyze the issue and check what can be improved in this area.
