EMLParser is slow to process large EML documents · Issue #1 · NCEAS/eml · GitHub
EMLParser is slow to process large EML documents #1
Closed
@csjx

Description

The org.ecoinformatics.eml.EMLParser does not perform well when processing large EML documents (for instance, a document with 250 to 1000 fully fleshed-out attribute elements defined). Validating such a document can take 10, 30, 45, or more minutes -- the duration grows with document size.

To alleviate this, change the parser to use a SAX-based model rather than a DOM-based one.

org.ecoinformatics.eml.EMLParser uses two methods to validate a document: parseKeys() and parseKeyrefs(). Both call getPathContent() and pass in an XPath selector. getPathContent() creates a DOM and passes back an org.w3c.dom.NodeList.
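
As a rough illustration of the SAX-based direction proposed above, the sketch below collects keys and key references in a single streaming pass instead of building a DOM and running XPath queries against it. It is not the EMLParser implementation: the class name `SaxKeyCheck`, its `check()` method, and the rule it applies (every `id` attribute is a key, and the text of every `references` element must match some key) are assumptions made for the example. The real selectors used by parseKeys() and parseKeyrefs() may differ.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of org.ecoinformatics.eml.
class SaxKeyCheck {

    /** Collects keys and key references while the document streams past. */
    static class KeyHandler extends DefaultHandler {
        final Set<String> ids = new HashSet<>();
        final List<String> duplicateIds = new ArrayList<>();
        final List<String> references = new ArrayList<>();
        private StringBuilder text; // non-null only while inside <references>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Assumption: any "id" attribute counts as a key.
            String id = atts.getValue("id");
            if (id != null && !ids.add(id)) {
                duplicateIds.add(id);
            }
            // Assumption: a <references> element holds a key reference as text.
            if ("references".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver text in several chunks, so accumulate them.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("references".equals(qName) && text != null) {
                references.add(text.toString().trim());
                text = null;
            }
        }
    }

    /** Returns a list of problems; an empty list means keys and references agree. */
    static List<String> check(String xml) {
        KeyHandler handler = new KeyHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        List<String> problems = new ArrayList<>();
        for (String dup : handler.duplicateIds) {
            problems.add("duplicate id: " + dup);
        }
        // References are resolved after the pass, so forward references are fine.
        for (String ref : handler.references) {
            if (!handler.ids.contains(ref)) {
                problems.add("unresolved reference: " + ref);
            }
        }
        return problems;
    }
}
```

Because the handler keeps only the id set and the reference list, memory use grows with the number of keys rather than with the full document tree, and the document is read once rather than once per XPath query. Whether this removes the slowdown described here would need to be measured against a large document such as the attached example.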

See the attached file as an example.

eml250.xml.txt
