-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Support rowspans and colspans in grid tables #8202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is unfinished work, so this PR is currently more of a preview and an RFC. The new parser doesn't yet support some Markdown features like column alignment, so it currently only works for reStructuredText. The parser is a quite verbatim reimplementation of the algorithm used in rst, with all its pros and cons. I'm going to put it into a new package |
7644d3b
to
3a81809
Compare
993a2b8
to
a39ad7a
Compare
This should be good to go here (on the pandoc side), but I'll continue with more improvements in the gridtables package. See this project for my todo list. |
Cool! I wasn't quite clear from the above what the status of column alignments is in markdown grid tables. We don't want to drop support for them if they're already supported. Maybe they're now supported by your gridtables? I guess we have tests already that test this feature? Other question is about support in the markdown writer. I guess we still don't have that, correct? |
Yes, column alignments are supported. The change is almost completely backwards compatible, with two exceptions:
Side note on character widths: neither docutils nor pandoc handle the table below as one might expect, with docutils failing and pandoc producing spurious results.
I do plan to add writer support as well. I'd like to do:
My hope here is to reduce the number of table conversions we need to do in our code. Appart from pandoc's default Table constructor, we also have AnnotatedTable, the (unfortunately named) GridTable for docx output, OrgTable, and possibly more -- I hope to move some of these to external packages and find a common interface. It may take quite a few weeks though, due to various circumstances. |
Can you clarify? Is this because treating them as 0-width will mess up RST parsing, or for other reasons?
Can you explain why this isn't handled properly? Is it because of a bad width calculation for the emoji, and if so, maybe we can fix this? (I know that we recently did a lot of work getting proper widths from the emojis library, but maybe we haven't integrated this into pandoc?) ALthough it's not ideal to have reader but not writer support (round-trip will break, for example), it's probably okay for the moment to merge this. |
The problem with zero-width characters is the cell-tracing algorithm used by rst: the input is treated as an array of characters, and the algorithm then walks the columns and rows of that array to trace out the table cells. Adding placeholders into the array for wide characters is easy, but handling zero-width chars isn't.
Oh, I didn't know that was available! Yes, the problem is the width calculation, as the emoji consists of three characters (two emoji and the zero-width joiner). Emojis like 👨👩👦 contain five chars, etc. I'm not sure if its reasonable, but I was thinking about treating all characters connected with a zwj as a single character that has the width of the widest character that's being joined. |
Actually the changes I was thinking about are in doclayout -- which should provide accurate widths for emojis now. |
a05b874
to
48fea3d
Compare
Let me know when you consider this ready to merge. |
If the different behavior with regard to zero-width characters is acceptable for now, then I think we're good to go. To be honest, I'm still a bit nervous as the gridtables package makes heavy use of array manipulations. It feels fragile.¹ But, given that all tests pass, I'm confident that it won't cause any serious problems. ¹ Maybe Liquid Haskell and dependent types can solve that. |
I am kind of nervous about the zero-width character issue. They can be used not just with emojis, but, e.g., to defeat ligatures. And it seems that there are some languages in which they are used more widely. There's also stuff like the text presentation selector, FE0E, which we use in pandoc's HTML writer to prevent macos from rendering the footnote backreferences as huge emojis. Is it possible to modify the RST algorithm somehow to allow for zero-width code points? Perhaps a simple modification would be to include each zero-width code point in a cell with the following (or maybe preceding) code point? That's sort of like what you were suggesting for the emojis, above, but generalized. |
👍 I'll give that a try. |
The functions `gridTableWith` and `gridTableWith'` no longer takes a boolean argument that toggles whether a table head should be parsed: both, tables with heads and without heads, are always accepted now.
Grid tables in Markdown, reStructuredText, and Org can now contain cells spanning over multiple columns and/or multiple rows; table headers containing multiple rows are supported as well.
OK, I've pushed some new additional tests with wide and zero-width characters to master. The updated gridtables version passes these tests now. |
Excellent, thanks! |
Sorry, I didn't mean to squash your commits. |
🥳 🎉 No worries about squashing, I only added a second commit to highlight the API change. |
No description provided.