8000 Concerns on whitespace emitting and parsing · Issue #113 · qcam/saxy · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content
Concerns on whitespace emitting and parsing #113
Open
@jekku

Description

@jekku

Context

We use Saxy to parse XML files from several APIs.

There have been changes introduced to emit whitespaces correctly, connected to [this issue and its pull request](#51).

Concerns

For our first concern related to the above, we noticed a reduction in performance when using the latest version of Saxy (v1.4.0), as opposed to a fork based on v0.9.1.

We did some digging to check whether it was changes on our end, or if it was a performance regression in this library.

We observed that the generated SimpleForm data from a pretty-printed XML, using the latest version of saxy is double the file size from the original one. Very likely caused or correlated by the emitted / parsed whitespaces.

Running benchmarks between SimpleForm results coming from both versions, we found that parsed / emitted whitespaces cause some performance regressions.

Operating System: macOS
CPU Information: Apple M1 Pro
Number of Available Cores: 10
Available memory: 16 GB
Elixir 1.13.4
Erlang 25.0.4

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 5 s
parallel: 5
inputs: none specified
Estimated total run time: 24 s

Benchmarking new_saxy_version ...
Benchmarking old_saxy_version ...

Name                       ips        average  deviation         median         99th %
old_saxy_version          4.33      231.03 ms     ±3.94%      229.68 ms      257.67 ms
new_saxy_version          4.22      237.09 ms     ±4.67%      234.80 ms      265.52 ms

Comparison:
old_saxy_version          4.33
new_saxy_version          4.22 - 1.03x slower +6.05 ms

Reduction count statistics:

Name                     average  deviation         median         99th %
old_saxy_version         20.86 M     ±0.04%        20.86 M        20.87 M
new_saxy_version         24.84 M     ±0.03%        24.84 M        24.85 M

Comparison:
old_saxy_version         20.86 M
new_saxy_version         24.84 M - 1.19x reduction count +3.98 M

The second concern is about whitespace values within tags. Previously, an XML in the shape of:

<Body> <Value> </Value> </Body>

Would result into the contents of Value being parsed as nil. In the current build, it is parsed as a string containing whitespace.

In a previous issue mentioned in the beginning, this should only happen when certain attributes are provided. Like so:

<Body> <Value xml:space="preserve"> </Value> </Body>

Proposed solutions

  • We should only emit whitespaces when xml:space="preserve" is provided. By default whitespaces should not be emitted.
  • Alternatively, providing an option to ignore whitespaces might be a way to solve it, such that we can do calls similar to these:
Saxy.parse_stream(stream, Saxy.SimpleForm.Handler, [emit_whitespaces?: false])

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0