TypeError: can only concatenate list (not "NoneType") to list · Issue #644 · datalab-to/marker · GitHub
TypeError: can only concatenate list (not "NoneType") to list #644
Closed
@rjrobben

Description

Describe the bug

When processing a PDF using marker_single, a TypeError occurs during the line merging process.

Traceback

Traceback (most recent call last):
  File "/Users/xxxxx/.local/bin/marker_single", line 8, in <module>
    sys.exit(convert_single_cli())
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/marker/scripts/convert_single.py", line 35, in convert_single_cli
    rendered = converter(fpath)
               ^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/marker/converters/pdf.py", line 154, in __call__
    document = self.build_document(filepath)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/marker/converters/pdf.py", line 149, in build_document
    processor(document)
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/marker/processors/line_merge.py", line 130, in __call__
    self.merge_lines(lines, block)
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/marker/processors/line_merge.py", line 104, in merge_lines
    line.merge(other_line)
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/marker/schema/text/line.py", line 99, in merge
    self.structure = self.structure + other.structure
                     ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
TypeError: can only concatenate list (not "NoneType") to list

Cause

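The error occurs in the merge method of the Line class (marker/schema/text/line.py). The statement self.structure = self.structure + other.structure concatenates the two structure attributes directly, without checking for None. The message in the traceback, can only concatenate list (not "NoneType") to list, shows that in this run self.structure was a list and other.structure was None. The reverse case (self.structure is None) would also fail at the same statement, with a different TypeError message.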
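The failure can be reproduced without marker at all. The snippet below is a minimal sketch using plain Python values as stand-ins for the two structure attributes; the block id string is a made-up placeholder, not taken from the failing document.

```python
# Stand-ins for Line.structure on the two lines being merged.
# The id string is hypothetical; only the list/None types matter.
self_structure = ["/page/0/Span/1"]
other_structure = None

try:
    merged = self_structure + other_structure
except TypeError as exc:
    message = str(exc)

# Same message as the last line of the traceback above.
print(message)

# The reverse case also raises TypeError, but with different wording.
try:
    merged = other_structure + self_structure
except TypeError as exc:
    reverse_message = str(exc)

print(reverse_message)
```

This is why the wording of the error identifies other.structure, rather than self.structure, as the None value here.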

Proposed Fix

Modify the merge method to handle potential None values by treating them as empty lists before concatenation:

    def merge(self, other: "Line"):
        self.polygon = self.polygon.merge([other.polygon])
        # Handle potential None values for structure
        self_structure = self.structure if self.structure is not None else []
        other_structure = other.structure if other.structure is not None else []
        self.structure = self_structure + other_structure
        if self.formats is None:
            self.formats = other.formats
        elif other.formats is not None:
            self.formats = list(set(self.formats + other.formats))

I am not sure whether the fix is acceptable for the original intended purpose of merge.
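To check the guard in isolation, here is a small sketch that applies the same logic to a hypothetical FakeLine dataclass. FakeLine is not marker's Line class: it keeps only the structure and formats fields and leaves out the polygon merge, which is unrelated to the error.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeLine:
    """Hypothetical stand-in for marker's Line (structure and formats only)."""

    structure: Optional[List[str]] = None
    formats: Optional[List[str]] = None

    def merge(self, other: "FakeLine") -> None:
        # Polygon merging is omitted; it does not affect the TypeError.
        # Treat a missing structure as an empty list before concatenating.
        self_structure = self.structure if self.structure is not None else []
        other_structure = other.structure if other.structure is not None else []
        self.structure = self_structure + other_structure
        if self.formats is None:
            self.formats = other.formats
        elif other.formats is not None:
            self.formats = list(set(self.formats + other.formats))


# Case from the traceback: the left side is a list, the right side is None.
a = FakeLine(structure=["s1"], formats=["bold"])
a.merge(FakeLine())

# Reverse case: the left side is None.
b = FakeLine()
b.merge(FakeLine(structure=["s2"]))

# Both sides None: the result becomes an empty list rather than None.
c = FakeLine()
c.merge(FakeLine())
```

One behavior to weigh: with this guard, merging two lines whose structure is None leaves an empty list instead of None, so any later code that tests structure is None would see a different value after a merge.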

Environment (if relevant)

  • marker-pdf version: (Please add the version you are using)
  • Python version: 3.12
  • OS: macOS Sonoma

Additional context

This error was encountered while processing a PDF converted from Microsoft Word. The documents are quite dense with text.
