CLI marker_single error: pypdfium2/_helpers/document.py->Invalid input type 'PdfDocument' · Issue #736 · datalab-to/marker · GitHub
CLI marker_single error: pypdfium2/_helpers/document.py->Invalid input type 'PdfDocument' #736
Open
@tzl

Description

Error:

pypdfium2/_helpers/document.py", line 674, in _open_pdf
raise TypeError(f"Invalid input type '{type(input_data).__name__}'")
TypeError: Invalid input type 'PdfDocument'

Operation:

1. conda create -n marker-pdf python==3.10
2. conda activate marker-pdf
3. marker_single /User/mypath/novel_2019.pdf ./output/

Log:

/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/transformers/utils/hub.py:123: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
Loading detection model vikp/surya_det2 on device cpu with dtype torch.float32
Loading detection model vikp/surya_layout2 on device cpu with dtype torch.float32
Loading reading order model vikp/surya_order on device cpu with dtype torch.float32
Loaded texify model to cpu with torch.float32 dtype
Traceback (most recent call last):
File "/opt/anaconda3/envs/marker-pdf/bin/marker_single", line 8, in <module>
sys.exit(main())
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/convert_single.py", line 26, in main
full_text, images, out_meta = convert_single_pdf(fname, model_lst, max_pages=args.max_pages, langs=langs, batch_multiplier=args.batch_multiplier)
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/marker/convert.py", line 65, in convert_single_pdf
pages, toc = get_text_blocks(
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/marker/pdf/extract_text.py", line 85, in get_text_blocks
char_blocks = dictionary_output(doc, page_range=page_range, keep_chars=True)
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/pdftext/extraction.py", line 98, in dictionary_output
pages = _get_pages(pdf_path, page_range, workers=workers, flatten_pdf=flatten_pdf, quote_loosebox=quote_loosebox)
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/pdftext/extraction.py", line 48, in _get_pages
pdf_doc = _load_pdf(pdf_path, flatten_pdf)
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/pdftext/extraction.py", line 18, in _load_pdf
pdf = pdfium.PdfDocument(pdf)
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/pypdfium2/_helpers/document.py", line 78, in __init__
self.raw, to_hold, to_close = _open_pdf(self._input, self._password, self._autoclose)
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/pypdfium2/_helpers/document.py", line 674, in _open_pdf
raise TypeError(f"Invalid input type '{type(input_data).__name__}'")
TypeError: Invalid input type 'PdfDocument'
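
Reading the traceback: marker passes an already opened document object (`doc`) to `dictionary_output`, whose first parameter is named `pdf_path`, and `_load_pdf` then wraps it again with `pdfium.PdfDocument(pdf)`. That double wrap appears to be what raises the `TypeError`, and it may point to a mismatch between the installed marker-pdf and pdftext versions, though this is not confirmed. The sketch below reproduces the pattern with a stand-in class so it runs without pypdfium2. `FakePdfDocument`, `load_pdf_strict`, and `load_pdf_tolerant` are hypothetical names for illustration only, not pdftext or pypdfium2 APIs, and the set of accepted input types is an assumption.

```python
from pathlib import Path


class FakePdfDocument:
    """Hypothetical stand-in for a PDF document class (not the pypdfium2 API).

    Assumption: the constructor accepts a path or raw bytes and rejects
    anything else, including an instance of itself.
    """

    def __init__(self, input_data):
        if not isinstance(input_data, (str, Path, bytes)):
            raise TypeError(f"Invalid input type '{type(input_data).__name__}'")
        self.input_data = input_data


def load_pdf_strict(pdf):
    # Mirrors the pattern in the traceback: always wrap the argument,
    # which fails if the caller already passed an opened document.
    return FakePdfDocument(pdf)


def load_pdf_tolerant(pdf):
    # Defensive variant: pass an already opened document through unchanged.
    if isinstance(pdf, FakePdfDocument):
        return pdf
    return FakePdfDocument(pdf)


if __name__ == "__main__":
    doc = FakePdfDocument("novel_2019.pdf")
    try:
        load_pdf_strict(doc)
    except TypeError as exc:
        print("strict loader failed:", exc)
    print("tolerant loader returned the same object:", load_pdf_tolerant(doc) is doc)
```

If the cause is a version mismatch, the more likely remedy is aligning the marker-pdf and pdftext versions rather than patching the loader; the tolerant variant is only meant to show where the type check goes wrong.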

Env:

Package Version


annotated-types 0.7.0
attrs 25.3.0
certifi 2025.4.26
charset-normalizer 3.4.2
click 8.2.1
coloredlogs 15.0.1
filelock 3.18.0
filetype 1.2.0
flatbuffers 25.2.10
fsspec 2025.5.1
ftfy 6.3.1
grpcio 1.72.1
hf-xet 1.1.3
huggingface-hub 0.32.4
humanfriendly 10.0
idna 3.10
Jinja2 3.1.6
joblib 1.5.1
jsonschema 4.24.0
jsonschema-specifications 2025.4.1
marker-pdf 0.2.6
MarkupSafe 3.0.2
mpmath 1.3.0
msgpack 1.1.0
networkx 3.4.2
numpy 1.26.4
onnxruntime 1.22.0
opencv-python 4.11.0.86
packaging 25.0
pdftext 0.3.20
pillow 10.4.0
pip 25.1
protobuf 6.31.1
pydantic 2.11.5
pydantic_core 2.33.2
pydantic-settings 2.9.1
pypdfium2 4.30.1
python-dotenv 1.1.0
PyYAML 6.0.2
RapidFuzz 3.13.0
ray 2.46.0
referencing 0.36.2
regex 2024.11.6
requests 2.32.3
rpds-py 0.25.1
safetensors 0.5.3
scikit-learn 1.7.0
scipy 1.15.3
setuptools 78.1.1
surya-ocr 0.4.5
sympy 1.14.0
tabulate 0.9.0
texify 0.1.10
threadpoolctl 3.6.0
tokenizers 0.15.2
torch 2.2.2
tqdm 4.67.1
transformers 4.36.2
typing_extensions 4.14.0
typing-inspection 0.4.1
urllib3 2.4.0
wcwidth 0.2.13
wheel 0.45.1
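
Of the packages listed above, the ones that appear in the traceback are marker-pdf, pdftext, and pypdfium2 (surya-ocr is loaded earlier in the log). A minimal sketch for printing just those versions from inside the environment, using only the standard library; the choice of package names to query is my assumption about which ones matter here, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: these are the distributions relevant to the traceback.
PACKAGES = ("marker-pdf", "pdftext", "pypdfium2", "surya-ocr")


def installed_versions(names=PACKAGES):
    """Return a dict mapping each distribution name to its version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it with the `marker-pdf` conda environment activated so that it reports the same site-packages the CLI uses.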
