Description
Error:
File "pypdfium2/_helpers/document.py", line 674, in _open_pdf
raise TypeError(f"Invalid input type '{type(input_data).__name__}'")
TypeError: Invalid input type 'PdfDocument'
Operation:
1. conda create -n marker-pdf python==3.10
2. conda activate marker-pdf
3. marker_single /User/mypath/novel_2019.pdf ./output/
Log:
/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/transformers/utils/hub.py:123: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
Loading detection model vikp/surya_det2 on device cpu with dtype torch.float32
Loading detection model vikp/surya_layout2 on device cpu with dtype torch.float32
Loading reading order model vikp/surya_order on device cpu with dtype torch.float32
Loaded texify model to cpu with torch.float32 dtype
Traceback (most recent call last):
File "/opt/anaconda3/envs/marker-pdf/bin/marker_single", line 8, in <module>
sys.exit(main())
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/convert_single.py", line 26, in main
full_text, images, out_meta = convert_single_pdf(fname, model_lst, max_pages=args.max_pages, langs=langs, batch_multiplier=args.batch_multiplier)
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/marker/convert.py", line 65, in convert_single_pdf
pages, toc = get_text_blocks(
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/marker/pdf/extract_text.py", line 85, in get_text_blocks
char_blocks = dictionary_output(doc, page_range=page_range, keep_chars=True)
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/pdftext/extraction.py", line 98, in dictionary_output
pages = _get_pages(pdf_path, page_range, workers=workers, flatten_pdf=flatten_pdf, quote_loosebox=quote_loosebox)
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/pdftext/extraction.py", line 48, in _get_pages
pdf_doc = _load_pdf(pdf_path, flatten_pdf)
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/pdftext/extraction.py", line 18, in _load_pdf
pdf = pdfium.PdfDocument(pdf)
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/pypdfium2/_helpers/document.py", line 78, in __init__
self.raw, to_hold, to_close = _open_pdf(self._input, self._password, self._autoclose)
File "/opt/anaconda3/envs/marker-pdf/lib/python3.10/site-packages/pypdfium2/_helpers/document.py", line 674, in _open_pdf
raise TypeError(f"Invalid input type '{type(input_data).__name__}'")
TypeError: Invalid input type 'PdfDocument'
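The traceback above suggests that marker passes an already opened `PdfDocument` (`doc`) to pdftext's `dictionary_output`, which then tries to wrap it in `pdfium.PdfDocument(...)` a second time. A minimal sketch of that failure mode, and of a defensive loader that tolerates both a path and an open document, might look like the following. `StubDocument` and `load_pdf` are hypothetical stand-ins written for illustration, not pypdfium2 or pdftext code, and the set of accepted input types is an assumption.

```python
from pathlib import Path


class StubDocument:
    """Hypothetical stand-in for a PDF document class (not pypdfium2)."""

    def __init__(self, input_data):
        # Assumption: only path-like or bytes inputs are accepted;
        # anything else raises TypeError, mirroring the message in the log.
        if not isinstance(input_data, (str, Path, bytes)):
            raise TypeError(f"Invalid input type '{type(input_data).__name__}'")
        self.input_data = input_data


def load_pdf(pdf):
    """Return a StubDocument, reusing `pdf` if it is already one."""
    if isinstance(pdf, StubDocument):
        return pdf
    return StubDocument(pdf)


doc = StubDocument("novel_2019.pdf")

# Wrapping an open document again reproduces the shape of the error.
try:
    StubDocument(doc)
except TypeError as exc:
    error_message = str(exc)

# The defensive loader accepts both an open document and a path.
same_doc = load_pdf(doc)
from_path = load_pdf("novel_2019.pdf")
```

If this reading is right, the mismatch would be between what marker-pdf 0.2.6 passes and what pdftext 0.3.20 expects, so the installed versions of those two packages may simply be incompatible.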
Env:
Package Version
annotated-types 0.7.0
attrs 25.3.0
certifi 2025.4.26
charset-normalizer 3.4.2
click 8.2.1
coloredlogs 15.0.1
filelock 3.18.0
filetype 1.2.0
flatbuffers 25.2.10
fsspec 2025.5.1
ftfy 6.3.1
grpcio 1.72.1
hf-xet 1.1.3
huggingface-hub 0.32.4
humanfriendly 10.0
idna 3.10
Jinja2 3.1.6
joblib 1.5.1
jsonschema 4.24.0
jsonschema-specifications 2025.4.1
marker-pdf 0.2.6
MarkupSafe 3.0.2
mpmath 1.3.0
msgpack 1.1.0
networkx 3.4.2
numpy 1.26.4
onnxruntime 1.22.0
opencv-python 4.11.0.86
packaging 25.0
pdftext 0.3.20
pillow 10.4.0
pip 25.1
protobuf 6.31.1
pydantic 2.11.5
pydantic_core 2.33.2
pydantic-settings 2.9.1
pypdfium2 4.30.1
python-dotenv 1.1.0
PyYAML 6.0.2
RapidFuzz 3.13.0
ray 2.46.0
referencing 0.36.2
regex 2024.11.6
requests 2.32.3
rpds-py 0.25.1
safetensors 0.5.3
scikit-learn 1.7.0
scipy 1.15.3
setuptools 78.1.1
surya-ocr 0.4.5
sympy 1.14.0
tabulate 0.9.0
texify 0.1.10
threadpoolctl 3.6.0
tokenizers 0.15.2
torch 2.2.2
tqdm 4.67.1
transformers 4.36.2
typing_extensions 4.14.0
typing-inspection 0.4.1
urllib3 2.4.0
wcwidth 0.2.13
wheel 0.45.1