Description
Pandoc currently silently accepts redundant or conflicting key-value fields within a single YAML metadata block. This lets metadata errors accumulate unnoticed over time.
I suggest that Pandoc warn when a key appears more than once within a YAML metadata block (but not across multiple YAML metadata blocks).
The current Pandoc metadata block documentation says little about whether fields must be unique or non-overlapping, saying just:
A document may contain multiple metadata blocks. If two metadata blocks attempt to set the same field, the value from the second block will be taken.
Currently, Pandoc appears to read values in like a Data.Map.fromList operation: the last pair wins, overriding all previous values for that key.
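The last-wins behavior is easy to reproduce outside Haskell. Here is a minimal Python sketch, assuming the metadata block has already been reduced to a list of key-value pairs; the `pairs` list and the `duplicate_keys` helper are hypothetical names for illustration, not part of Pandoc:

```python
from collections import Counter

# Key-value pairs in document order (illustrative data only).
pairs = [("title", "title"), ("key", "value1"), ("key", "value2")]

# Building a mapping from the list keeps only the last value per key,
# analogous to what Data.Map.fromList does.
meta = dict(pairs)


def duplicate_keys(pairs):
    """Return the keys that occur more than once, in first-seen order."""
    counts = Counter(key for key, _ in pairs)
    return [key for key in counts if counts[key] > 1]


print(meta["key"])            # value2
print(duplicate_keys(pairs))  # ['key']
```

The point is that the duplicate is only detectable while the pairs are still a list; once they are collapsed into a map, the information is gone.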
Example of a conflicting pair of key-values:
---
title: title
key: value1
key: value2
...
Hello world!
yields value2, and no warnings:
$ xclip -o | pandoc -s -w native
Pandoc
Meta
{ unMeta =
fromList
[ ( "key" , MetaInlines [ Str "value2" ] )
, ( "title" , MetaInlines [ Str "title" ] )
]
}
[ Para [ Str "Hello" , Space , Str "world!" ] ]
(And if it's just a redundant key, like key: value2 twice, then it yields value2, also with no warnings.)
I was trying out a new site feature involving the metadata, and discovered I had ~10 instances, built up over the past decades, where a Pandoc YAML metadata field was: 1. required but missing; 2. duplicated exactly; or 3. given twice with conflicting values (only one of which was right). Pandoc hadn't warned or hinted about any of them, or else I would've fixed them long ago.
Now, enforcing required metadata is out of scope for Pandoc and should be handled by another tool like the build script, but the other 2 should be handled by Pandoc.
They have to be handled by Pandoc because it is difficult to handle them as a user. They do not appear in the Pandoc API: the unMeta map has already erased the duplicates. So there is no way to add a check or lint easily at the Pandoc user level. It needs to be done while reading the original YAML.
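For comparison, about the only thing a user can do today is scan the raw source text before Pandoc sees it. Below is a rough Python sketch of such a lint. It is an assumption-laden approximation, not a real YAML parser: it treats any line that is exactly `---` as opening a block, only recognizes unquoted top-level keys at column 0, and compares values as raw strings. The names `metadata_blocks` and `lint_duplicates` are hypothetical.

```python
import re
from collections import defaultdict

# Naive pattern for an unquoted top-level key at column 0 (assumption:
# no quoted keys, flow mappings, or other YAML syntax is handled).
KEY_RE = re.compile(r"^([A-Za-z0-9_-]+):\s*(.*)$")


def metadata_blocks(text):
    """Yield each metadata block as a list of lines, delimiters excluded."""
    block = None
    for line in text.splitlines():
        stripped = line.rstrip()
        if block is None:
            if stripped == "---":
                block = []  # opening delimiter
        elif stripped in ("---", "..."):
            yield block     # closing delimiter
            block = None
        else:
            block.append(line)


def lint_duplicates(text):
    """Return (block_index, key, kind) for keys repeated within one block."""
    findings = []
    for index, block in enumerate(metadata_blocks(text)):
        values = defaultdict(list)
        for line in block:
            match = KEY_RE.match(line)
            if match:
                values[match.group(1)].append(match.group(2).strip())
        for key, vals in values.items():
            if len(vals) > 1:
                kind = "redundant" if len(set(vals)) == 1 else "conflicting"
                findings.append((index, key, kind))
    return findings


doc = "---\ntitle: title\nkey: value1\nkey: value2\n...\nHello world!\n"
print(lint_duplicates(doc))  # [(0, 'key', 'conflicting')]
```

Because keys are counted per block, the same key appearing in two separate metadata blocks is not reported. The fragility of this kind of text scanning (horizontal rules, quoted keys, multi-line values) is the reason the check belongs in Pandoc's own YAML reader.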
One possibility would be to change the fromList to append values instead of overwriting them, but this would probably be confusing and no one would use it. So it would be more useful to just warn about repeated keys.
When might we want redundant or conflicting keys in a metadata block?
I have a hard time thinking of any legitimate use case for having the exact same key-value pair twice in the same metadata block. It's easy to imagine redundant pairs being useful across multiple metadata blocks, for templating or defaults, that sort of thing, but not within the same metadata block. So I think warning on a redundant pair is a very safe warning to add for what is almost certainly an oversight or error.
Conflicting keys are a little trickier. Again, you obviously might need them across blocks, but within blocks? It's possible that there are systems which try to munge metadata blocks and inject user metadata settings after a default, or vice-versa. But where the override is the goal, it seems you could achieve the same thing more safely and more cleanly by using multiple metadata blocks.
For example, instead of having multiple title or key values in a single metadata block, just write multiple blocks in a clean, compact, easier to generate & read way, like:
---
title: Template Title
key: template-value
...
---
key: my-value
...
Hello world!
which yields the expected AST:
$ xclip -o | pandoc -s -w native
Pandoc
Meta
{ unMeta =
fromList
[ ( "key" , MetaInlines [ Str "my-value" ] )
, ( "title"
, MetaInlines [ Str "Template" , Space , Str "Title" ]
)
]
}
[ Para [ Str "Hello" , Space , Str "world!" ] ]
So a warning there is useful to encourage systems to move to a safer way of handling metadata in general, by using metadata defined in other blocks or in other files (like the --metadata-file option).