Support rowspans and colspans in grid tables #8202

tarleb · 2022-07-26T13:20:04Z

No description provided.

tarleb · 2022-07-26T13:24:16Z

This is unfinished work, so this PR is currently more of a preview and an RFC. The new parser doesn't yet support some Markdown features like column alignment, so it currently only works for reStructuredText.

The parser is a quite verbatim reimplementation of the algorithm used in rst, with all its pros and cons. I'm going to put it into a new package gridtables in the hope that it should be reasonably easy to reuse the library in commonmark-extensions.

tarleb · 2022-07-28T12:07:22Z

This should be good to go here (on the pandoc side), but I'll continue with more improvements in the gridtables package. See this project for my todo list.

jgm · 2022-07-28T16:40:01Z

Cool! I wasn't quite clear from the above what the status of column alignments is in markdown grid tables. We don't want to drop support for them if they're already supported. Maybe they're now supported by your gridtables? I guess we have tests already that test this feature?

Other question is about support in the markdown writer. I guess we still don't have that, correct?
Is this something you plan to add to gridtables?

tarleb · 2022-07-28T19:20:35Z

Yes, column alignments are supported. The change is almost completely backwards compatible, with two exceptions:

If there are multiple blocks in a cell then Para are not converted to Plain; they remain as Para elements. There was one test case that I changed because of this.
Zero-width characters are treated like half-width chars. This is consistent with the behavior in docutils RST, but differs from pandoc's current parser, which treats those as having no width.

Side note on character widths: neither docutils nor pandoc handle the table below as one might expect, with docutils failing and pandoc producing spurious results.

+----+---+---+
|Fo👨‍🔬|Bar| 1 |
+----+---+---+

I do plan to add writer support as well. I'd like to do:

Add support for doclayout-based output to gridtables.
Create another separate package text-tables (or maybe textual-tables) that comes with a typeclass and unified interfaces for different table representations.
Write the code to connect pandoc to these new types.

My hope here is to reduce the number of table conversions we need to do in our code. Appart from pandoc's default Table constructor, we also have AnnotatedTable, the (unfortunately named) GridTable for docx output, OrgTable, and possibly more -- I hope to move some of these to external packages and find a common interface.

It may take quite a few weeks though, due to various circumstances.

jgm · 2022-07-28T19:28:24Z

Zero-width characters are treated like half-width chars. This is consistent with the behavior in docutils RST, but differs from pandoc's current parser, which treats those as having no width.

Can you clarify? Is this because treating them as 0-width will mess up RST parsing, or for other reasons?

+----+---+---+
|Fo👨‍🔬|Bar| 1 |
+----+---+---+

Can you explain why this isn't handled properly? Is it because of a bad width calculation for the emoji, and if so, maybe we can fix this? (I know that we recently did a lot of work getting proper widths from the emojis library, but maybe we haven't integrated this into pandoc?)

ALthough it's not ideal to have reader but not writer support (round-trip will break, for example), it's probably okay for the moment to merge this.

tarleb · 2022-07-28T19:44:55Z

The problem with zero-width characters is the cell-tracing algorithm used by rst: the input is treated as an array of characters, and the algorithm then walks the columns and rows of that array to trace out the table cells. Adding placeholders into the array for wide characters is easy, but handling zero-width chars isn't.

Can you explain why this isn't handled properly? Is it because of a bad width calculation for the emoji, and if so, maybe we can fix this? (I know that we recently did a lot of work getting proper widths from the emojis library, but maybe we haven't integrated his into pandoc?)

Oh, I didn't know that was available! Yes, the problem is the width calculation, as the emoji consists of three characters (two emoji and the zero-width joiner). Emojis like 👨‍👩‍👦 contain five chars, etc. I'm not sure if its reasonable, but I was thinking about treating all characters connected with a zwj as a single character that has the width of the widest character that's being joined.

jgm · 2022-07-28T21:09:25Z

Actually the changes I was thinking about are in doclayout -- which should provide accurate widths for emojis now.
See the changelog.
https://hackage.haskell.org/package/doclayout-0.4/changelog
There's a function in there to detect zero-width joiners.

jgm · 2022-07-29T17:42:50Z

Let me know when you consider this ready to merge.

tarleb · 2022-07-29T19:52:08Z

If the different behavior with regard to zero-width characters is acceptable for now, then I think we're good to go.

To be honest, I'm still a bit nervous as the gridtables package makes heavy use of array manipulations. It feels fragile.¹ But, given that all tests pass, I'm confident that it won't cause any serious problems.

¹ Maybe Liquid Haskell and dependent types can solve that.

jgm · 2022-07-29T21:39:30Z

I am kind of nervous about the zero-width character issue.

They can be used not just with emojis, but, e.g., to defeat ligatures.

And it seems that there are some languages in which they are used more widely.
https://ptiglobal.com/2018/04/26/the-beauty-of-unicode-zero-width-characters/

There's also stuff like the text presentation selector, FE0E, which we use in pandoc's HTML writer to prevent macos from rendering the footnote backreferences as huge emojis.

Is it possible to modify the RST algorithm somehow to allow for zero-width code points?

Perhaps a simple modification would be to include each zero-width code point in a cell with the following (or maybe preceding) code point? That's sort of like what you were suggesting for the emojis, above, but generalized.

tarleb · 2022-07-29T21:51:26Z

👍 I'll give that a try.

The functions `gridTableWith` and `gridTableWith'` no longer takes a boolean argument that toggles whether a table head should be parsed: both, tables with heads and without heads, are always accepted now.

Grid tables in Markdown, reStructuredText, and Org can now contain cells spanning over multiple columns and/or multiple rows; table headers containing multiple rows are supported as well.

tarleb · 2022-07-30T09:39:55Z

OK, I've pushed some new additional tests with wide and zero-width characters to master. The updated gridtables version passes these tests now.

jgm · 2022-07-30T15:55:41Z

Excellent, thanks!

jgm · 2022-07-30T15:57:34Z

Sorry, I didn't mean to squash your commits.

tarleb · 2022-07-30T16:34:30Z

tarleb commented

Jul 30, 2022

🥳 🎉

No worries about squashing, I only added a second commit to highlight the API change.

tarleb force-pushed the gridtables branch 2 times, most recently from 7644d3b to 3a81809 Compare July 28, 2022 11:43

tarleb marked this pull request as ready for review July 28, 2022 11:45

tarleb force-pushed the gridtables branch 2 times, most recently from 993a2b8 to a39ad7a Compare July 28, 2022 12:07

tarleb force-pushed the gridtables branch 2 times, most recently from a05b874 to 48fea3d Compare July 29, 2022 08:55

tarleb added 3 commits July 30, 2022 11:27

Add tests for zero-width and fullwidth chars in grid tables

de5620b

T.P.Parsing: simplify gridTableWith', gridTableWith [API Change]

4638c09

The functions `gridTableWith` and `gridTableWith'` no longer takes a boolean argument that toggles whether a table head should be parsed: both, tables with heads and without heads, are always accepted now.

Support colspans, rowspans, and multirow headers in grid tables.

18b24d7

Grid tables in Markdown, reStructuredText, and Org can now contain cells spanning over multiple columns and/or multiple rows; table headers containing multiple rows are supported as well.

tarleb force-pushed the gridtables branch from 48fea3d to 18b24d7 Compare July 30, 2022 09:37

jgm merged commit c015c35 into jgm:master Jul 30, 2022

This was referenced Jul 30, 2022

RST reader - support new table features #6991

Closed

Table with span column and span row are not transform correctly from rst to asciidoc #7326

Open

tarleb deleted the gridtables branch July 30, 2022 16:32

This was referenced Oct 3, 2022

Generate grid tables quarto-dev/gridtables#5

Open

Documentation for grid tables and rowspan / colspan #8346

Closed

Documentation for grid tables and rowspan / colspan #8347

Closed

tumidi mentioned this pull request Oct 21, 2022

Update Table type to reflect current Pandoc API version mvhenderson/pandoc-filter-node#20

Merged

jgm mentioned this pull request Sep 17, 2023

Centering cell content in a grid table #9085

Open

jgm mentioned this pull request May 10, 2024

Unwanted rowspan in grid table #9743

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Support rowspans and colspans in grid tables #8202

Support rowspans and colspans in grid tables #8202

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Support rowspans and colspans in grid tables #8202

Support rowspans and colspans in grid tables #8202

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!