This document describes requirements for pagination and layout of
books in Latin languages, based on the tradition of print book design
and composition. It is hoped that these principles can inform the
pagination of digital content as well, and serve as a reference for the
CSS Working Group and other interested parties. This work was inspired
by [JLREQ].
Status of This Document
This section describes the status of this document at the time of
its publication. Other documents may supersede this document. A list
of current W3C publications and the latest revision of this technical
report can be found in the W3C technical reports index at
http://www.w3.org/TR/.
This is a work in progress. No section should be considered
final, and the absence of any content does not imply that such content
is out of scope, or may not appear in the future. If you feel
something should be covered here, tell us! The initial
version of this document will focus on books, and at this time will not
include requirements specific to magazines or newspapers. The scope will
depend heavily on the willingness of people to contribute to this
document. Please contact the Digital Publishing Interest Group if you
would like to help.
This document was published by the Digital
Publishing Interest Group as a Working Draft. Once the document is
stable, the group will publish it as an Interest Group Note. If you wish
to make comments regarding this document, please send them to
public-digipub@w3.org.
All comments are welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a
draft document and may be updated, replaced or obsoleted by other
documents at any time. It is inappropriate to cite this document as
other than work in progress.
The disclosure obligations of the Participants of this group are
described in the charter.
Not all stories worth telling can fit in a tweet, on a computer
screen, or on a single piece of paper. Ever since the codex replaced the
scroll, humans have divided our stories into pages. Pagination is the
art and the craft of turning that scroll of content into discrete
pieces, whether destined for book pages or screens. Pagination requires
us to think about the document at all levels, from the total number of
pages to the tiny spaces between letters. Along with graphic design and
typography, it determines the look of the page.
Typography is the craft of endowing human language with a durable
visual form, and thus with an independent existence.
—Robert Bringhurst, The Elements of Typographic
Style
Good pagination, like good typography, aims to be invisible. As the
reader turns the page, the stream of words and images in her mind should
not be interrupted. Two thousand years of experience have taught us how
best to do this. The goal of this document is to describe those rules,
as clearly as possible, so they can be implemented in the Open Web
Platform. We hope for a day when the pagination of digital books will
be as beautiful and transparent as the best printed books.
2.
Conformance
As well as sections marked as non-normative, all authoring guidelines,
diagrams, examples, and notes in this specification are non-normative.
Everything else in this specification is normative.
The key words MUST, MUST NOT, REQUIRED,
SHOULD, SHOULD NOT, RECOMMENDED,
MAY, and OPTIONAL
in this specification are to be interpreted as described in [RFC2119].
TK
3.
Fundamentals
Makeup is a highly skilled procedure. If the text is merely divided
mechanically into portions of equal length, without regard to where
the divisions fall, some of the pages that result are bound to be
unacceptable logically or aesthetically: they will incorporate bad
breaks.
—Chicago Manual of Style, 14th Edition, 19.40.
What therefore God hath joined together, let not man put asunder.
—The Bible, Matthew 19:6
Every rule of pagination boils down to a single principle: break pages
with as little disruption to the reading experience as possible. A widow
leaves the last line of a paragraph isolated from the rest of the
thought. A recto hyphen means a word is interrupted by a page turn. A
heading at the bottom of a page removes the title from the section, and
the section from the title.
3.1
Tradeoffs
Pagination involves tradeoffs. Fixing a widow may result in a
misaligned spread. Fixing that may result in a loose line or
paragraph. What is acceptable in one book, or for one publisher, may
be unacceptable to another. What is acceptable in one country, or
language, may be unacceptable elsewhere.
3.2 Untangling the Vertical and the Horizontal
Page breaks are often line breaks. The tiniest change in kerning can
make a paragraph longer or shorter, and thus create a widow or an
orphan. The work of pagination, as done by typesetters, human or
machine, inevitably involves the consideration of the lines of text.
And so we will not try too hard to avoid talking about line breaks,
when they potentially influence pagination.
4. Hyphenation and Justification
Good hyphenation and justification are critically important to the
appearance and readability of text. Print typesetting systems can often
achieve very good results, but most online reading systems do this very
poorly.
4.1
Hyphenation
Text is often easier to read when words are allowed to break at the
end of lines, thus avoiding massive variations in word-spacing or
margins. But determining acceptable places to break words is a
difficult problem:
All of the following are the results of automated hyphenation
algorithms:
The following choices need to be made when considering hyphenation
of text.
Should this text be hyphenated at all? Hyphenation is generally
suppressed in headings.
What’s the shortest word that can be hyphenated? Five or six letters
is typical.
What's the minimum number of characters allowed before a
hyphen? Two is typical, and is sometimes stated as “two-up.”
What's the minimum number of characters allowed after a hyphen?
Three is typical, and can be stated as “three-down.”
How many consecutive lines can end with a hyphen (known as a
“ladder”)? Two or three is typical.
Should capitalized words be hyphenated?
Can the last word of a paragraph be hyphenated?
Can the last word in a column, page, or spread be hyphenated?
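The choices above can be read as a small set of tunable parameters. The following is a minimal sketch, assuming a hypothetical HyphenationPolicy record whose defaults follow the "typical" values quoted above; the names and the treatment of capitalized words are illustrative assumptions, not part of any existing system.

```python
from dataclasses import dataclass


@dataclass
class HyphenationPolicy:
    """Hypothetical settings; defaults follow the typical values in the text."""
    min_word_length: int = 5      # shortest word that may be hyphenated
    min_before: int = 2           # characters before the hyphen ("two-up")
    min_after: int = 3            # characters after the hyphen ("three-down")
    max_ladder: int = 3           # consecutive lines ending with a hyphen
    allow_capitalized: bool = False


def break_allowed(word: str, position: int, policy: HyphenationPolicy) -> bool:
    """Return True if `word` may be split just before index `position`."""
    if len(word) < policy.min_word_length:
        return False
    if word[:1].isupper() and not policy.allow_capitalized:
        return False
    enough_before = position >= policy.min_before
    enough_after = len(word) - position >= policy.min_after
    return enough_before and enough_after


def ladder_ok(line_ends_hyphenated: list[bool], policy: HyphenationPolicy) -> bool:
    """Return True if no run of hyphenated line ends exceeds max_ladder."""
    run = 0
    for hyphenated in line_ends_hyphenated:
        run = run + 1 if hyphenated else 0
        if run > policy.max_ladder:
            return False
    return True
```

A real system would combine these checks with a dictionary or pattern-based source of candidate break points; the sketch only filters candidates.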
4.1.2 Choosing hyphenation points
A key question is, “who decides what is acceptable?” The answer
depends on the language, the culture, the subject matter, and the
material being typeset.
4.1.2.1
Language
Each language has its own conventions about hyphenation. U.S.
English hyphenates differently than U.K. English. In some European
languages, words may be spelled differently when hyphenated.
TK
Of course, the same text may include words from many different
languages.
4.1.2.2
Culture
Even within the same language, authorities differ on the proper
hyphenation of words.
Copyeditors will often specify a canonical reference for
hyphenation, which is usually a particular edition of a particular
dictionary.
4.1.2.3
Subject Matter
Specialized subject matter may require additional hyphenation
dictionaries. This is common in medicine, law, and science.
4.1.2.4
Exceptions
Authors should be able to provide a list of exceptions, which add
to or override what the system would normally do. The format for
doing so should be easily understood.
TeX uses the following format. Possible hyphenation positions
are indicated with (surprise!) hyphens. Hyphenation should be
prevented where hyphens are absent.
\hyphenation { sur-pris-ingly tan-ta-liz-ing-ly these }
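As an illustration of how simple such a format is to consume, here is a minimal sketch that reads the word list inside a TeX-style exception declaration. The function name parse_exceptions and the returned structure are assumptions made for this example.

```python
def parse_exceptions(spec: str) -> dict[str, list[int]]:
    """Map each word to the character offsets where a break is permitted.

    `spec` is the whitespace-separated word list, with hyphens marking
    allowed break points. An empty list means the word is never hyphenated.
    """
    exceptions: dict[str, list[int]] = {}
    for entry in spec.split():
        parts = entry.split("-")
        offsets = []
        position = 0
        # Every part except the last is followed by a permitted break.
        for part in parts[:-1]:
            position += len(part)
            offsets.append(position)
        exceptions["".join(parts).lower()] = offsets
    return exceptions
```

For the list above, "surprisingly" maps to offsets 3 and 7, while "these" maps to an empty list and is therefore never broken.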
4.2
Justification
4.2.1
Algorithms
4.2.1.1
Greedy
4.2.1.2
Knuth-Plass (TeX)
4.2.1.3
Adobe (InDesign)
5. Paginating Single-Column Text
The simplest situation, which is very common, is when the content is
only text, in a single column. Aside from chapter and book optimizations
(to be discussed later) and line-breaking, the biggest issue is likely
to be widows (see Fig. 1 Text spread with widow for an
example).
5.1
Widows
A widow occurs when the last line or lines of a paragraph fall at the
top of a page. Publishers have different standards. Most frown on a
single line at the top of the page, although some accept it if that
line spans at least three-quarters of the page width. Others require
at least two lines of a paragraph at the top of a page.
Issue
1
[css3-break] does not consider a fractional value for the widows
property.
Many typesetting systems have settings to prevent widows. CSS
discusses these issues in [css3-break].
Unfortunately, these systems usually create another problem when they
fix the widow. In Fig. 2 Widow fixed, but pages don’t align,
there’s no longer a widow at the top of the page, but since the system
merely moved a line from the left page to the right, it left behind an
empty line, and the pages no longer align at the bottom.
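The side effect described here can be reproduced with a deliberately naive paginator. This is a sketch under simplifying assumptions: paragraphs are given only as line counts, orphans are ignored, and the function name paginate and its parameters are hypothetical.

```python
def paginate(paragraphs: list[int], lines_per_page: int, widows: int = 2) -> list[int]:
    """Return the number of lines placed on each page.

    `paragraphs` holds the line count of each paragraph. A page is cut
    short when filling it would carry fewer than `widows` lines of a
    paragraph to the top of the next page.
    """
    pages = []
    used = 0  # lines already placed on the current page
    for length in paragraphs:
        remaining = length
        while remaining > 0:
            space = lines_per_page - used
            if remaining <= space:
                used += remaining
                remaining = 0
            else:
                carried = remaining - space  # lines that would start the next page
                take = space
                if carried < widows:
                    # Hold lines back so that `widows` lines move together.
                    take = max(0, space - (widows - carried))
                used += take
                remaining -= take
                pages.append(used)
                used = 0
            if used == lines_per_page:
                pages.append(used)
                used = 0
    if used:
        pages.append(used)
    return pages
```

With a 5-line and a 6-line paragraph on 10-line pages, `widows=1` yields pages of 10 and 1 lines (a widow), while `widows=2` yields 9 and 2: the widow is gone, but the first page is now one line short of its neighbor.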
5.2
Orphans
An orphan has two possible meanings in
typesetting. It can refer to the minimum number of lines required
before a page break (as in [css3-break]).
It can also refer to the last line of a paragraph in any context. In
the former context, many publishers now accept a single line of a
paragraph before a page break. For the latter, standards vary widely.
Some publishers want the last line to be longer than the paragraph
indent. Some require one or two full words, or a certain number of
characters. Most avoid having only a fragment of a word as the last
line.
Issue
2
CSS does not currently address the second meaning of
orphan.
5.3 Constraints on page depth
In traditional typesetting, the first defense against bad breaks is
to change the depth of the page. “Running long” or “running short”
means including one more (or one fewer) line of text on each page of
the spread, thus sidestepping the previously identified issue.
A typical book design includes instructions on whether it’s
acceptable to run short, long, or (more rarely) both. Often there are
also constraints on how many consecutive spreads (or pages) may be
altered in this way. If running both long and short, it’s usually
forbidden to go from one to another without an intervening normal
spread.
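These design constraints are easy to state as a check over a sequence of spreads. The sketch below assumes each spread is encoded as +1 (run long), -1 (run short), or 0 (normal depth); the function name and the default limit of two consecutive altered spreads are illustrative assumptions.

```python
def depth_sequence_ok(spreads: list[int], max_consecutive: int = 2) -> bool:
    """Check a sequence of per-spread depth adjustments.

    Each entry is +1 (run long), -1 (run short), or 0 (normal depth).
    """
    run = 0        # length of the current run of altered spreads
    previous = 0
    for adjustment in spreads:
        if adjustment not in (-1, 0, 1):
            raise ValueError("adjustment must be -1, 0, or +1")
        if adjustment != 0 and previous != 0 and adjustment != previous:
            return False  # long next to short with no normal spread between
        run = run + 1 if adjustment != 0 else 0
        if run > max_consecutive:
            return False
        previous = adjustment
    return True
```

A pagination engine could use such a check to reject candidate solutions before comparing the remaining ones on other criteria.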
Running long or short may affect the space between the last line of
text and a page footer or folio. Most publishers prefer footers to be
in a fixed position. If, instead, the space between the last line of
text and the footer is fixed, the footer is said to "bounce."
5.4
Facing Pages
If a document has facing pages, the publisher usually requires that
they align top and bottom. Exceptions include:
The page is the last page of a chapter.
The page contains no text, only images or tables.
Aligning the facing pages would make some other issue worse.
5.5 Recto and Verso Hyphens
Publishers sometimes constrain what characters may appear before a
page break. Most commonly, the right-hand page of a spread may not end
with a word fragment, as the reader must turn the page before reading
the rest of the word. Less common is a prohibition on the verso page
ending with a hyphen.
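A rule of this kind might be checked as follows. This is a rough sketch that assumes odd-numbered pages are rectos and treats any line-final hyphen as a word fragment, which will misjudge compounds that happen to break at a hard hyphen; the function name is hypothetical.

```python
HYPHENS = ("-", "\u2010")  # hyphen-minus and the Unicode hyphen


def page_end_allowed(last_line: str, page_number: int,
                     forbid_verso_hyphen: bool = False) -> bool:
    """Return True if a page may end with `last_line`.

    Assumes odd-numbered pages are rectos (right-hand pages).
    """
    if not last_line.rstrip().endswith(HYPHENS):
        return True
    is_recto = page_number % 2 == 1
    if is_recto:
        return False  # the reader would have to turn the page mid-word
    return not forbid_verso_hyphen
```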
5.6 Space Breaks and Ornaments
Many novels, and some narrative non-fiction books, include small
breaks in the text. These are usually represented by one to three
blank lines, or by a small ornament or dingbat. Problems arise when
these breaks fall at the top or bottom of a page.
If the space break falls at the bottom of the page, confusion can
result. In Fig. 5 Incorrect: Space break at bottom of page, it’s hard
to tell there’s a space break, as it just looks like the page is a few
lines short.
In that case, asterisks or some other ornament is added to the top
or bottom of a page, as a visual reminder of the break. To get
everything to work out, the spread was run short, and the space break
(now with ornament) pushed to the top of the second page. See Fig. 6
Space break at top of page with asterisks.
This is an example where the page position of an element determines
its content as well as design. A ::page-top or ::page-bottom
pseudo-element might prove useful.
6. Paragraphs and indentation
TK
7.
Initial Capitals
Large, decorative letters have been used to start new sections of text
since long before printing. In fact, their use predates lowercase
letters entirely.
7.1
Drop caps
A drop cap is a larger-than-usual
letter at the start of a paragraph, with a baseline at least one line
lower than the first baseline of the paragraph. The size of drop caps
is usually indicated by how many lines they occupy—two-line and
three-line drop caps are the most common.
Aligning the letter vertically is a challenge. The cap height of the
letter should align with the cap height of the first line of text. The
baseline of the letter should fall on the baseline of one of the
following lines (the second for a 2-line drop cap, etc.).
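The two alignment conditions determine the size of the letter. As a sketch, assume the drop cap uses the same typeface as the body text, so both share one cap-height ratio (cap height divided by font size); the function name and the default ratio of 0.7 are illustrative assumptions, and real fonts vary.

```python
def drop_cap_font_size(body_font_size: float, line_height: float,
                       lines: int, cap_height_ratio: float = 0.7) -> float:
    """Font size at which an n-line drop cap meets both alignment rules.

    The capital must reach from the cap height of line 1 down to the
    baseline of line `lines`: (lines - 1) line heights plus the cap
    height of the body text.
    """
    if lines < 2:
        raise ValueError("a drop cap occupies at least two lines")
    body_cap_height = body_font_size * cap_height_ratio
    target_cap_height = (lines - 1) * line_height + body_cap_height
    return target_cap_height / cap_height_ratio
```

For 10-unit body text on a 12-unit line height, a three-line drop cap needs a cap height of 24 + 7 = 31 units, which is a font size of about 44.3 units at a ratio of 0.7.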
The horizontal position of the drop cap and the surrounding text is
also an issue, as variations in glyph shapes may require increasing or
decreasing space to the right of the drop cap, and in some cases
separate adjustments may be required for each line adjacent to the
drop cap.
The position of a drop cap in relation to the left margin may also
need to be adjusted. Letters like "C" may need to move left slightly
to visually align with the left margin.
A drop cap may be desired on a paragraph which starts with a
punctuation mark, most often a quotation mark. In this case, one
option is to delete the quotation mark entirely.
Note
Input on techniques for coping with initial punctuation on drop
caps would be appreciated.
7.2 Raised caps and sunken caps
A raised cap is a large letter used
to start a paragraph, which uses the same baseline as the rest of the
first line. A sunken cap both sinks
below the text baseline, and extends above.
8.
Running Heads and Footers
Books often have material printed at the top and/or bottom of each
page, outside the normal content area. These headers or footers may
serve as guideposts for the reader, fodder for designers, low-tech DRM, or
merely a way to know what book your fellow train passenger is reading.
There’s more to running headers than is dreamt of in the open web
platform…
8.1
Content
Running heads and footers may contain:
Content from the document: book title, chapter or part titles,
author name(s). Indexes and notes sections may have running heads to
identify which entries are on a particular page.
Content intended only for running heads: shortened versions of
chapter titles…
Counters of all sorts: page numbers, section numbers, chapter
numbers.
Ornaments, decorative type, or images
Copyright statements or other boilerplate
Date and/or time stamps
File names
Version numbers
Combinations of the above
In some cases the content of running heads may have an internal
structure—a chapter title might have an italic word—or may require
different text styles or fonts.
In this example, the running header contains the author name, the
page number, and an ornament. This seemingly simple case was quite
complex, using [css3-gcpm]-like
features implemented by PrinceXML.
An element whose content is used in running heads may appear many
times on a page. Authors must be able to specify which instance is
used. [css3-gcpm]
provides the start, first, last, and first-except keywords to
accomplish this:
first
The value of the first assignment on the page is used. If there is
no assignment on the page, the "entry value" is used.
start
If the element is the first element on the page, the value of the
first assignment is used. Otherwise the "entry value" is used. The
"entry value" may be empty if the element hasn’t yet appeared.
last
The "exit value" of the named string is used.
first-except
This is identical to first, except that the empty
string is used on the page where the value is assigned.
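The four keywords can be modeled as a small lookup over what happens on one page. This sketch reflects one reading of the definitions above; in particular it assumes the "exit value" is the last assignment on the page, or the entry value if there is none. The function and parameter names are hypothetical.

```python
def resolve_named_string(keyword: str, entry_value: str,
                         assignments: list[str],
                         starts_with_assignment: bool) -> str:
    """Pick the running-head text for one page.

    entry_value: value in effect when the page begins ("" if never set).
    assignments: values assigned on this page, in document order.
    starts_with_assignment: True if the assigning element is the first
        element on the page.
    """
    if keyword == "first":
        return assignments[0] if assignments else entry_value
    if keyword == "start":
        if starts_with_assignment and assignments:
            return assignments[0]
        return entry_value
    if keyword == "last":
        # Assumed meaning of "exit value": last assignment, else entry value.
        return assignments[-1] if assignments else entry_value
    if keyword == "first-except":
        return "" if assignments else entry_value
    raise ValueError(f"unknown keyword: {keyword}")
```

A dictionary page, for example, might use "first" on the left-hand running head and "last" on the right-hand one to show the range of entries on the spread.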
Issue
3
Are these values enough to handle indexes, dictionaries,
and other use cases?
8.2
Placement
Running headers and footers may appear in almost any position on a
page.
The position of the running head may be different on first pages
vs subsequent pages, or the running head may be omitted on first
pages
Running heads may align to the inside or outside, and thus be
different on left and right pages.
Authors may need to control the layering of running head text
(i.e. “z-index”).
The running head may overflow the page boundary (i.e. “bleed”)
[epub-3] has now deprecated support for headers and footers using
oeb-page-head and oeb-page-foot.
9.
Heads
9.1
General Considerations
TK
9.2 Heads at the top of a page
When a head falls at the top of a page, a spacing adjustment is often
necessary. Here's a typical arrangement, with a line and a half of
space above the head, and a half-line-space below, so that the text
stays on the proper baselines.
If that head appears at the top of the page, the space above it is no
longer applied, and the subsequent text will be off by a half-line.
Everything works out if we add a half-line-space back.
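The arithmetic behind this adjustment can be sketched in line units. The example assumes that space above a head is discarded at the top of a page, and it only computes how much space must be added back, without deciding whether that space goes above or below the head; the function name is hypothetical.

```python
from fractions import Fraction


def grid_padding(head_lines: int, space_above: Fraction,
                 space_below: Fraction, at_page_top: bool) -> Fraction:
    """Extra space, in lines, needed to keep following text on the grid."""
    if at_page_top:
        space_above = Fraction(0)  # assumption: dropped at the top of a page
    total = head_lines + space_above + space_below
    # Distance from `total` up to the next whole number of lines.
    return (-total) % 1
```

For a one-line head with 1.5 lines above and 0.5 below, the result is 0 in the middle of a page (3 lines in total) and 1/2 at the top of a page (1.5 lines in total).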
9.3 Heads at the bottom of a page
A head should never be the last thing on a page; it must be followed
by two or three lines of text.
9.4 Bridge heads, side heads, and run-in heads
TK
10.
Images
10.1 TK
Some things to note about this image:
The caption and image are treated as a unit.
Text runs around the image and caption.
The image runs right up to the gutter of the page (i.e. extends
beyond the usual content area).
10.2
Inline images
TK
10.3
Bleeds
TK
Images that cross spread
image before callout?
placing multiple images on page… inside/outside, top/bottom, stagger
broadside
placement of caption/title
11.
Tables
11.1
Alignment
Many tables have specialized requirements for the alignment of cells
in a given column.
11.1.1
Align on character
All entries in a given column may need to align to a predetermined
character, most commonly a decimal point. Typically, the longest
entry in the column should be centered, and then the other entries
should align to that entry.
In some cases, a composite “longest entry” needs to be constructed:
|   445.85  |
| 12345.6   |
|     1.234 |
|      .1   |
In this case, the user agent should act as if 12345.234 were the longest
entry, so the margin to the left of 12345.6 will be equal to the margin
to the right of 1.234.
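The composite "longest entry" amounts to taking the widest part before the alignment character and the widest part after it. Here is a minimal sketch that works in character counts rather than glyph widths, which is an assumption that only holds for monospaced output; the function name is hypothetical.

```python
def align_on_character(entries: list[str], char: str = ".") -> list[str]:
    """Pad entries so the alignment character falls in the same column."""
    split = [entry.partition(char) for entry in entries]
    left_width = max(len(left) for left, _, _ in split)
    right_width = max(len(sep) + len(right) for _, sep, right in split)
    return [
        left.rjust(left_width) + (sep + right).ljust(right_width)
        for left, sep, right in split
    ]
```

Applied to the four entries above, every result is nine characters wide, the width of the composite 12345.234, and centering that common block in the column gives the equal margins described. Entries without the character are right-aligned against the column where it would fall.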
When a collection of whole numbers with no decimal points are in a
column and are asked to align, the longest whole number should
center in the column and the rest of the whole numbers should right
align on the right indent of the longest whole number.
If the content of a table cell is being aligned to a character,
that content should not have wrapping applied by the rendering
system.
11.1.2 Flush left center alignment
Issue
4
What should we call this?
Also known as centering on the longest line, the longest line in a
column is found and centered, and other entries in the column are
aligned to the left edge of the longest line.
As before, header and footer cells are ignored, and the author
should be able to exclude specified cells from the alignment
process.
This type of alignment is often used in text, for poetry or prose
extracts.
User agents should not break single-word cells.
11.2
Table widths
In print, tables are not randomly sized but typically set to one of
a few fixed widths. This requirement necessitates that a composition
engine know how to “snap to” one of the desired widths. This may help
show relationships between separate tables.
broadside
placement of caption/title
spread
multi-page
continued lines
12.
Lists
13.
Footnotes
Having to read footnotes resembles having to go downstairs to answer
the door while in the midst of making love.
—Noël Coward
In print publishing, a footnote consists of two parts: a reference
(often rendered as an asterisk or superscripted number) and the footnote
body.
Footnotes themselves can be quite complicated. Footnotes can contain
multiple paragraphs, block quotes, poems, lists, and tables. Footnotes
can contain other footnotes (an edge case, admittedly, but David Foster
Wallace was notorious for this). Footnotes can extend across multiple
pages. In short, a footnote is a container that can hold almost
anything.
In order to describe footnotes in HTML, one must separate the footnote
reference (which is an inline element) from the footnote itself, as HTML
frowns on placing complex block structures inside paragraphs. This is
quite different from something like DocBook, where the content model
allows a footnote element inside a paragraph, and that footnote can
itself contain multiple paragraphs, etc.
Example 1
<p>It was the best of times<span class="ref-footnote-rw">*</span>, it was the blurst of times.</p>
<div class="block-rw footnotes-rw">
  <p><span class="num-footnote-rw">*</span>Oh yes, but the telephone is so impersonal.</p>
  <p>I prefer the hands-on touch you only get with hired goons.</p>
</div>
There may also be more than one reference to the same footnote.
Footnote handling as described in [css3-gcpm]
assumes the footnote is coded inline at the point of reference. This
situation is under discussion on the www-style list.
13.1 Inline footnotes and multiple footnote regions
Footnotes usually fall at the bottom of the page, but may need to be
placed at the end of a column, table, sidebar, or other document
structure.
13.3 Breaking footnotes across pages
Some footnotes can extend across more than one page. Limits on the
size of the footnote area(s) may be required, so that a page
containing only footnotes is avoided.
Note
Sometimes, footnotes may require so much space that they cannot
all be placed before the end of a document section. In this case,
it’s acceptable to have pages that consist only of footnotes.
13.4
Numbering
Three questions must be answered when numbering footnotes. First,
which numbering scheme should be used? Second, what are we actually
numbering? Third, is the numbering system reset at some point in the
document?
13.4.1
Numbering schemes
Footnotes are most commonly numbered with arabic numerals,
lower-case letters, or a sequence of symbols: *, †, ‡, §, ||, and #.
Symbols may be doubled or tripled after exhausting the sequence, but
long before |||||| is used, the choice of numbering should be
re-evaluated.
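The symbol scheme can be sketched as a cycle through the sequence with a growing repeat count. The function name and the max_repeat limit are assumptions for illustration; the default of 3 mirrors "doubled or tripled" above.

```python
SYMBOLS = ["*", "†", "‡", "§", "||", "#"]


def footnote_mark(n: int, max_repeat: int = 3) -> str:
    """Return the mark for the n-th footnote (1-based) in the symbol scheme."""
    if n < 1:
        raise ValueError("footnote numbers start at 1")
    cycles, index = divmod(n - 1, len(SYMBOLS))
    repeat = cycles + 1
    if repeat > max_repeat:
        raise ValueError("too many footnotes for symbols; consider numerals")
    return SYMBOLS[index] * repeat
```

The seventh footnote is marked `**`, and the seventeenth would be `||||||`, which is the point at which the text above suggests a different scheme.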
13.4.2
What are we counting?
Usually, footnote numbers count footnotes. But in some cases, the
reference may be a line number, paragraph number, or section number.
13.4.3
Resetting numbers
Footnote numbering may restart with each new chapter, or each new
page. The former is common with numeric footnotes, the latter with
footnotes using symbols.
Digital publications often render footnotes differently from print.
They may become pop-ups, move to the end of the section, or to the
end of the document. We are not currently attempting to document
digital best practices around footnotes.
14.
Cross-references
TK
15.
Sidebars
Some things to notice:
The image floats to the top of the column inside the sidebar
The columns themselves base-align
The sidebar title and “supertitle” are on the same line.
16.
Marginalia
alignment with reference
17.
Equations
17.1
Breaking equations
TK
17.2
Numbering equations
TK
17.3
Aligning equations
Many publishers require that all equations on a page align on the
equals sign.
    x + 3z = 7 + 2y
2x + y + z = 4
Intervening text which may
extend for several lines
   10 + 2y = 3x + 2z
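Aligning on the equals sign means right-aligning everything before it to a common width. This is a minimal sketch in character counts, which assumes monospaced output; a layout engine would use measured widths, and it would have to collect all equations on the page first, including those separated by intervening text. The function name is hypothetical.

```python
def align_equations(equations: list[str]) -> list[str]:
    """Pad each equation so that the first '=' falls in the same column."""
    split = []
    for equation in equations:
        left, sep, right = equation.partition("=")
        if not sep:
            raise ValueError(f"no equals sign in: {equation!r}")
        split.append((left.strip(), right.strip()))
    width = max(len(left) for left, _ in split)
    return [left.rjust(width) + " = " + right for left, right in split]
```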
18.
Columns
Often the first page of a chapter or article will be set in a single
column, and subsequent pages set in multiple columns.
19.
Punctuation
Spacing around punctuation marks is a known obsession of typographers.
19.1 Language-specific spacing rules
| Punctuation | English | French |
| Exclamation Point ! | ! | [thin space]! |
| Colon : | : | [thin space]: |
| Question Mark ? | ? | [thin space]? |
| Open Quote | “ | «[thin space] |
| Close Quote | ” | [thin space]» |
19.2 Em-dashes and en-dashes
To space or not to space? That is the question. Even within
publishing houses, arguments continue over the proper display of
em-dashes. Some imprints at Hachette use closed em-dashes, others
insist on thin spaces around em-dashes. If the same book is to be
published in the United Kingdom, em-dashes would be replaced with
en-dashes, with larger spaces around them.
Given the subtlety of many of these rules, it’s helpful to use CSS
to generate typographically-sophisticated output from material written
by lay authors, or to adapt content to varying publisher or language
requirements.
Older drafts of [css3-gcpm]
contained a text-replace property,
which has been implemented by PrinceXML.
body {
prince-text-replace: "—" "\200A—\200A";
}
In this example, we’re adding hair spaces around em-dashes.
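The same kind of substitution can adapt one manuscript to different house styles. The sketch below uses three hypothetical style names that loosely follow the variations described in this section; treating the "larger spaces" around en-dashes as ordinary word spaces is an assumption.

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"
THIN_SPACE = "\u2009"

# Hypothetical house styles; the names are illustrative only.
DASH_STYLES = {
    "closed": EM_DASH,
    "thin-spaced": THIN_SPACE + EM_DASH + THIN_SPACE,
    "spaced-en": " " + EN_DASH + " ",
}


def restyle_dashes(text: str, style: str) -> str:
    """Rewrite every em-dash, and any space around it, in the given style."""
    replacement = DASH_STYLES[style]
    return re.sub(r"\s*" + EM_DASH + r"\s*", replacement, text)
```

Because existing whitespace around each dash is absorbed first, the function gives the same result whether the source text was keyed with closed or spaced dashes.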
20. Special Considerations for Genres
20.1
Education
College textbooks
Elhi
Language
Study guides
20.2
Trade
Fiction
Narrative nonfiction
Children’s
YA
Religious
Bibles
Travel
How-to
Manga/Comics/Graphic Novels
20.3
STEM
20.4
Reference
Legal
Dictionaries
21.
Digital Issues
22. Large-Scale Issues in Pagination
22.1
Book optimization
In trade publishing, we often know how many pages will be in a book
before it is written. The nature of printing and binding also mandates
that the number of pages in a book be some multiple of eight, sixteen,
or thirty-two pages. Publishers often limit how many blank pages are
allowed at the end of a book.
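The arithmetic is simple enough to sketch: round the content up to a whole number of signatures and count what is left over. The function name and the default signature size of sixteen pages are assumptions for illustration.

```python
def printed_extent(content_pages: int, signature: int = 16) -> tuple[int, int]:
    """Return (total pages, blank pages) after rounding up to whole signatures."""
    if content_pages < 0 or signature <= 0:
        raise ValueError("page counts must be non-negative and signature positive")
    signatures = -(-content_pages // signature)  # ceiling division
    total = signatures * signature
    return total, total - content_pages
```

A 250-page book printed in sixteen-page signatures comes out at 256 pages with 6 blanks; a publisher with a stricter limit on blanks would have to adjust the pagination to gain or lose pages.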
22.2
Chapter optimization
A chapter that ends with only a few lines of text looks like a
mistake, and wastes paper (or electrons!). Generally, the last page of
a chapter should contain at least five lines of text.
A.
Baseline Grids
A baseline grid is a series of evenly-spaced horizontal alignment
lines. This is used to provide a vertical rhythm for a design, to align
adjacent content (text or graphics), and to align baselines on facing
pages in printed material.
The grid lines can be spaced at line-height intervals or a factor of
line-height.
Content can be aligned to the grid in various ways. Roman body text
typically sets the baseline on a grid line. Graphics might have their
top, bottom or both set on grid lines, or be centered between grid
lines. Text blocks (consider a multi-line heading with line-height at
1.4x grid height) might have their last baseline or first baseline on a
grid line, or have the block's combined height centered between grid
lines. Centering is much more important in ideographic type systems.
If normal layout would result in a misalignment, content shifts down
to the next available grid line.
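The shift can be expressed as rounding a position up to the next grid line. This sketch assumes positions are measured downward from the top of the content area, and the function and parameter names are hypothetical.

```python
import math


def snap_to_grid(position: float, grid_interval: float, origin: float = 0.0) -> float:
    """Return the first grid line at or below `position`.

    Positions increase down the page, so "below" means a larger value.
    """
    if grid_interval <= 0:
        raise ValueError("grid_interval must be positive")
    steps = math.ceil((position - origin) / grid_interval)
    return origin + steps * grid_interval
```

Content already on a grid line stays where it is; anything else moves down by less than one grid interval.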
Sometimes it's necessary to have particular content opt out of
aligning to a grid.
There can be one or more grids per document. Multiple grids can
overlap (body grid and side content grid) or run in series (a vertical
stack of pages). Grids can be nested (think of a document being
represented as a graphic inside another document). A particular piece of
content only aligns to a single grid.
B. Of Leading and Sinkage: The Language of Print
Translating print designs to the open web platform can be tricky.
Vertical distances are usually measured baseline to baseline.
Print designers sometimes talk about a "text page" which includes
the running head.
The basic text area is often specified with a gutter margin and a
text "measure". In [css3-page]
this area is described by left/right or inside/outside margins.
Leading
Line-height
Recto
Right-hand page of a spread
Verso
Left-hand page of a spread
C. The Classical Rules of Hyphenation and Pagination
At hyphenated line-ends, leave at least two characters behind, and
take at least three forward.
Avoid leaving the stub-end of a hyphenated word, or any word
shorter than four letters, as the last line of a paragraph.
Avoid more than three consecutive hyphenated lines.
Hyphenate proper names only as a last resort unless they occur with
the frequency of common nouns.
Hyphenate according to the conventions of the language.
Link short numerical and mathematical expressions with hard spaces.
Avoid beginning more than two consecutive lines with the same word.
Never begin a page with the last line of a multi-line paragraph.
Balance facing pages by moving single lines.
Avoid hyphenated breaks where the text is interrupted.
Abandon any and all rules of hyphenation and pagination that fail
to serve the needs of the text.
D.
Further Reading
Bringhurst, Robert. The Elements of Typographic Style
Felici, Jim. The Complete Manual of Typography
Haralambous, Yannis. Fonts & Encodings: From Advanced
Typography to Unicode and Everything in Between
Haslam, Andrew. Book Design
Highsmith, Cyrus. Inside Paragraphs
Kane, John. A Type Primer
Knuth, Donald. Digital Typography
Lawson, Alexander. Anatomy of a Typeface
Mitchell; Wightman. Book Typography
Nickel, Kristina. Ready to Print
Steer, Vincent. Printing Design and Layout (1948)
Tracy, Walter. Letters of Credit: A View of Type Design
Tschichold, Jan. The Form of the Book: Essays on the Morality of
Good Design
E.
Acknowledgments
Eric Aubourg, Luc Audrain, Bert Bos, Tom Byrer, James Clark, Brady
Duga, Ivan Herman, Tony Graham, Bill Kasdorf, Jean Kaplansky, Liam Quin,
Alan Stearns, Tzviya Siegman
F.
References
[JLREQ]
Yasuhiro Anan; Hiroyuki Chiba; Junzaburo Edamoto; Richard Ishida;
Tatsuo KOBAYASHI; Toshi Kobayashi; Kenzou Onozawa; Felix Sasaki;
Seiichi Kato; Hajime Shiozawa et al. Requirements for Japanese Text
Layout. 3 April 2012. W3C Note. URL: http://www.w3.org/TR/jlreq/
[epub-3]
Garth Conboy; Matt Garrish; Markus Gylling; William McCoy; MURATA
Makoto; Daniel Weck. EPUB 3.0.1. 26 June 2014. IDPF (International
Digital Publishing Forum) final Recommended Specification. URL:
http://idpf.org/epub/301