Layoutlm docs 2 [force ci] by JaMe76 · Pull Request #57 · deepdoctection/deepdoctection · GitHub

Layoutlm docs 2 [force ci] #57


Merged
merged 4 commits into from
Sep 12, 2022
Conversation

JaMe76
Contributor
@JaMe76 JaMe76 commented Sep 12, 2022

This pull request adds some docs for LayoutLM:

  • How LayoutLM performs on sequence classification when the dataset is unrelated to CDIP. The added note shows how to prepare a dataset of PDF documents, how to extract text with pdfplumber, and how to train a LayoutLM classifier.
  • How to add a visual backbone for fine-tuning tasks. Adding visual features of cropped word bounding boxes to the final classifier layer can slightly improve F1 scores.
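
For readers who have not used pdfplumber with LayoutLM before, here is a minimal sketch of the text extraction step mentioned in the first point. It is not the code from the added docs, and deepdoctection ships its own wrappers for this. The function names `normalize_box` and `extract_words_with_boxes` are hypothetical, and the sketch assumes the usual LayoutLM convention of integer box coordinates scaled to the range 0 to 1000.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]


def normalize_box(box: Box, page_width: float, page_height: float,
                  scale: int = 1000) -> Tuple[int, int, int, int]:
    """Scale an (x0, top, x1, bottom) box to integers in [0, scale]."""
    x0, top, x1, bottom = box

    def clamp(value: float) -> int:
        # Keep coordinates inside the range the model expects.
        return max(0, min(scale, int(round(value))))

    return (
        clamp(scale * x0 / page_width),
        clamp(scale * top / page_height),
        clamp(scale * x1 / page_width),
        clamp(scale * bottom / page_height),
    )


def extract_words_with_boxes(pdf_path: str) -> List[Dict]:
    """Return one record per word: page index, text, and normalized box."""
    # Imported here so normalize_box can be used without pdfplumber installed.
    import pdfplumber

    records = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_index, page in enumerate(pdf.pages):
            # extract_words() yields dicts with "text", "x0", "top", "x1", "bottom".
            for word in page.extract_words():
                box = (word["x0"], word["top"], word["x1"], word["bottom"])
                records.append({
                    "page": page_index,
                    "text": word["text"],
                    "bbox": normalize_box(box, page.width, page.height),
                })
    return records
```

The resulting words and boxes would still need to be tokenized and aligned before they can be passed to a LayoutLM model. That step depends on the training setup and is left out here.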

@JaMe76 JaMe76 self-assigned this Sep 12, 2022
@JaMe76 JaMe76 merged commit d20b475 into master Sep 12, 2022
@JaMe76 JaMe76 deleted the layoutlm_docs_2 branch November 15, 2022 17:51