On the pptx type, Docwire is much worse than DocToText. What is the reason for this? · Issue #128 · docwire/docwire
Open
UserZhangXiaoZhe opened this issue Jun 3, 2024 · 2 comments

Comments

@UserZhangXiaoZhe
[image]

DocWire: docwire-2024.04.04/arm64-osx-dynamic
DocToText: the latest version

@as-ascii
Contributor

Thank you for the analysis. Could you please recheck with the latest code? A lot of optimizations were introduced in this release:
https://github.com/docwire/docwire/releases/tag/2024.06.19
In addition, please let us know what the source of data for DocWire is: a CLI run or your own C++ application? If it is an application, what is the source type for the processing chain, a file path or a std::istream?
Format detection is faster if a file name with an extension is passed, but starting from this release:
https://github.com/docwire/docwire/releases/tag/2024.06.24
it is possible to pass a memory buffer or stream together with a file extension as a hint about the file format to the DocWire SDK. In this scenario, the parser matching the file extension is tried first, and detection based on binary data is performed only if that parser fails.
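
To illustrate why the extension hint can matter for speed, here is a minimal, self-contained sketch of the "hinted parser first, content detection as fallback" idea. It is not DocWire SDK code: `Parser`, `make_toy_parser`, `toy_parsers`, `select_parser`, and the `DOC:`/`ODP:`/`PPTX:` markers are all hypothetical names invented for this example, and real format detection inspects binary signatures rather than text prefixes.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical parser descriptor (an assumption for this sketch, not a DocWire type).
struct Parser {
    std::string extension;                            // e.g. "pptx"
    std::function<bool(std::string_view)> try_parse;  // true if the data is accepted
};

// Toy parser that accepts data starting with a made-up text marker.
Parser make_toy_parser(std::string extension, std::string marker) {
    return Parser{std::move(extension),
                  [marker = std::move(marker)](std::string_view data) {
                      return data.substr(0, marker.size()) == marker;
                  }};
}

// Three toy parsers in a fixed order; the order only matters for the fallback pass.
std::vector<Parser> toy_parsers() {
    return {make_toy_parser("doc", "DOC:"),
            make_toy_parser("odp", "ODP:"),
            make_toy_parser("pptx", "PPTX:")};
}

// Returns the extension of the first parser that accepts `data`, or nullopt.
// `probes` reports how many parsers had to be tried.
std::optional<std::string> select_parser(const std::vector<Parser>& parsers,
                                         std::string_view data,
                                         std::string_view hint,
                                         int& probes) {
    probes = 0;
    auto probe = [&](const Parser& p) {
        assert(p.try_parse && "every parser needs a probe function");
        ++probes;
        return p.try_parse(data);
    };
    auto matches_hint = [&](const Parser& p) {
        return !hint.empty() && std::string_view(p.extension) == hint;
    };

    // Pass 1: try only the parser whose extension matches the hint.
    for (const Parser& p : parsers) {
        if (matches_hint(p) && probe(p)) return p.extension;
    }
    // Pass 2: fall back to trying the remaining parsers against the content.
    for (const Parser& p : parsers) {
        if (!matches_hint(p) && probe(p)) return p.extension;
    }
    return std::nullopt;
}
```

With these toy parsers, calling `select_parser(toy_parsers(), "PPTX:slides", "pptx", probes)` needs a single probe, while the same call with an empty hint tries all three parsers before it reaches the matching one. A wrong hint costs one extra probe but still falls back to content-based detection.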

@as-ascii
Contributor

Hello,

Were you able to retest using the newest version of DocWire? In addition: are the test files confidential? If not, could you please attach them (especially the largest file) here so we can analyze the issue in our environment?
