8000 Support rowspans and colspans in grid tables by tarleb · Pull Request #8202 · jgm/pandoc · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Support rowspans and colspans in grid tables #8202

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 30, 2022
Merged

Conversation

tarleb
Copy link
Collaborator
@tarleb tarleb commented Jul 26, 2022

No description provided.

@tarleb
Copy link
Collaborator Author
tarleb commented Jul 26, 2022

This is unfinished work, so this PR is currently more of a preview and an RFC. The new parser doesn't yet support some Markdown features like column alignment, so it currently only works for reStructuredText.

The parser is a quite verbatim reimplementation of the algorithm used in rst, with all its pros and cons. I'm going to put it into a new package gridtables in the hope that it should be reasonably easy to reuse the library in commonmark-extensions.

@tarleb tarleb force-pushed the gridtables branch 2 times, most recently from 7644d3b to 3a81809 Compare July 28, 2022 11:43
@tarleb tarleb marked this pull request as ready for review July 28, 2022 11:45
@tarleb tarleb force-pushed the gridtables branch 2 times, most recently from 993a2b8 to a39ad7a Compare July 28, 2022 12:07
@tarleb
Copy link
Collaborator Author
tarleb commented Jul 28, 2022

This should be good to go here (on the pandoc side), but I'll continue with more improvements in the gridtables package. See this project for my todo list.

@jgm
Copy link
Owner
jgm commented Jul 28, 2022

Cool! I wasn't quite clear from the above what the status of column alignments is in markdown grid tables. We don't want to drop support for them if they're already supported. Maybe they're now supported by your gridtables? I guess we have tests already that test this feature?

Other question is about support in the markdown writer. I guess we still don't have that, correct?
Is this something you plan to add to gridtables?

@tarleb
Copy link
Collaborator Author
tarleb commented Jul 28, 2022

Yes, column alignments are supported. The change is almost completely backwards compatible, with two exceptions:

  1. If there are multiple blocks in a cell then Para are not converted to Plain; they remain as Para elements. There was one test case that I changed because of this.
  2. Zero-width characters are treated like half-width chars. This is consistent with the behavior in docutils RST, but differs from pandoc's current parser, which treats those as having no width.

Side note on character widths: neither docutils nor pandoc handle the table below as one might expect, with docutils failing and pandoc producing spurious results.

+----+---+---+
|Fo👨‍🔬|Bar| 1 |
+----+---+---+

I do plan to add writer support as well. I'd like to do:

  1. Add support for doclayout-based output to gridtables.
  2. Create another separate package text-tables (or maybe textual-tables) that comes with a typeclass and unified interfaces for different table representations.
  3. Write the code to connect pandoc to these new types.

My hope here is to reduce the number of table conversions we need to do in our code. Appart from pandoc's default Table constructor, we also have AnnotatedTable, the (unfortunately named) GridTable for docx output, OrgTable, and possibly more -- I hope to move some of these to external packages and find a common interface.

It may take quite a few weeks though, due to various circumstances.

@jgm
Copy link
Owner
jgm commented Jul 28, 2022

Zero-width characters are treated like half-width chars. This is consistent with the behavior in docutils RST, but differs from pandoc's current parser, which treats those as having no width.

Can you clarify? Is this because treating them as 0-width will mess up RST parsing, or for other reasons?

+----+---+---+
|Fo👨‍🔬|Bar| 1 |
+----+---+---+

Can you explain why this isn't handled properly? Is it because of a bad width calculation for the emoji, and if so, maybe we can fix this? (I know that we recently did a lot of work getting proper widths from the emojis library, but maybe we haven't integrated this into pandoc?)

ALthough it's not ideal to have reader but not writer support (round-trip will break, for example), it's probably okay for the moment to merge this.

@tarleb
Copy link
Collaborator Author
tarleb commented Jul 28, 2022

The problem with zero-width characters is the cell-tracing algorithm used by rst: the input is treated as an array of characters, and the algorithm then walks the columns and rows of that array to trace out the table cells. Adding placeholders into the array for wide characters is easy, but handling zero-width chars isn't.

Can you explain why this isn't handled properly? Is it because of a bad width calculation for the emoji, and if so, maybe we can fix this? (I know that we recently did a lot of work getting proper widths from the emojis library, but maybe we haven't integrated his into pandoc?)

Oh, I didn't know that was available! Yes, the problem is the width calculation, as the emoji consists of three characters (two emoji and the zero-width joiner). Emojis like 👨‍👩‍👦 contain five chars, etc. I'm not sure if its reasonable, but I was thinking about treating all characters connected with a zwj as a single character that has the width of the widest character that's being joined.

@jgm
Copy link
Owner
jgm commented Jul 28, 2022

Actually the changes I was thinking about are in doclayout -- which should provide accurate widths for emojis now.
See the changelog.
https://hackage.haskell.org/package/doclayout-0.4/changelog
There's a function in there to detect zero-width joiners.

@tarleb tarleb force-pushed the gridtables branch 2 times, most recently from a05b874 to 48fea3d Compare July 29, 2022 08:55
@jgm
Copy link
Owner
jgm commented Jul 29, 2022

Let me know when you consider this ready to merge.

@tarleb
Copy link
Collaborator Author
tarleb commented Jul 29, 2022

If the different behavior with regard to zero-width characters is acceptable for now, then I think we're good to go.

To be honest, I'm still a bit nervous as the gridtables package makes heavy use of array manipulations. It feels fragile.¹ But, given that all tests pass, I'm confident that it won't cause any serious problems.

¹ Maybe Liquid Haskell and dependent types can solve that.

@jgm
Copy link
Owner
jgm commented Jul 29, 2022

I am kind of nervous about the zero-width character issue.

They can be used not just with emojis, but, e.g., to defeat ligatures.

And it seems that there are some languages in which they are used more widely.
https://ptiglobal.com/2018/04/26/the-beauty-of-unicode-zero-width-characters/

There's also stuff like the text presentation selector, FE0E, which we use in pandoc's HTML writer to prevent macos from rendering the footnote backreferences as huge emojis.

Is it possible to modify the RST algorithm somehow to allow for zero-width code points?

Perhaps a simple modification would be to include each zero-width code point in a cell with the following (or maybe preceding) code point? That's sort of like what you were suggesting for the emojis, above, but generalized.

@tarleb
Copy link
Collaborator Author
tarleb commented Jul 29, 2022

👍 I'll give that a try.

tarleb added 3 commits July 30, 2022 11:27
The functions `gridTableWith` and `gridTableWith'` no longer takes a
boolean argument that toggles whether a table head should be parsed:
both, tables with heads and without heads, are always accepted now.
Grid tables in Markdown, reStructuredText, and Org can now contain cells
spanning over multiple columns and/or multiple rows; table headers
containing multiple rows are supported as well.
@tarleb
Copy link
Collaborator Author
tarleb commented Jul 30, 2022

OK, I've pushed some new additional tests with wide and zero-width characters to master. The updated gridtables version passes these tests now.

@jgm
Copy link
Owner
jgm commented Jul 30, 2022

Excellent, thanks!

@jgm jgm merged commit c015c35 into jgm:master Jul 30, 2022
@jgm
Copy link
Owner
jgm commented Jul 30, 2022

Sorry, I didn't mean to squash your commits.

@tarleb
Copy link
Collaborator Author
tarleb commented Jul 30, 2022

🥳 🎉

No worries about squashing, I only added a second commit to highlight the API change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0