8000 GitHub - michal-h21/htfgen: create tex4ht supporting files for TeX fonts
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

michal-h21/htfgen

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

htfgen

This is a set of tools to simplify creation of htf fonts for tex4ht.

Installation

cd `kpsewhich -var-value TEXMFHOME`
mkdir -p tex/latex
cd tex/latex
git clone https://github.com/michal-h21/htfgen.git
cd htfgen
tex makeenc.tex
chmod +x lstexenc
chmod +x tfmtochars
chmod +x htfgen
chmod +x dvitohtf
chmod +x scanfdfile
chmod +x make-t1-htf 
# Should almost always be set, but you never know.
[ -z "$PWD" ] && pwd="$(pwd)"
ln -s $PWD/lstexenc /usr/local/bin/lstexenc
ln -s $PWD/tfmtochars /usr/local/bin/tfmtochars
ln -s $PWD/htfgen /usr/local/bin/htfgen
ln -s $PWD/dvitohtf /usr/local/bin/dvitohtf
ln -s $PWD/scanfdfile /usr/local/bin/scanfdfile
ln -s $PWD/make-t1-htf /usr/local/bin/make-t1-htf
mktexlsr # this is optional, only if your distro cannot find installed Lua files

Introduction

Objective of Htfgen is to automatize the creation of the HTF font mapping files. These files are used by tex4ht to map character codes in the DVI files to Unicode.

There are two main scripts: scanfdfile and dvitohtf. The first one searches for declared fonts in the FD files, the other generates literate TeX file for the HTF generation. Sample usage is as follows:

cat /usr/local/texlive/2021/texmf-dist/tex/latex/ebgaramond/*.fd | scanfdfile | dvitohtf > ebgaramont-htf.tex
etex ebgaramont-htf.tex

This will create HTF files for all detected fonts defined in FD files for EB Garamond.

dvitohtf can also generate HTF files for missing fonts in the DVI file. So if tex4ht reports missing HTF files, it can be used directly on the DVI file:

dvitohtf sample.dvi > missing.tex etex missing.tex

dvitohtf supports both virtual and tfm fonts. It looks for virtual fonts first, the tfm file is used only when no vf is found. It looks for all fonts referenced in the virtual font and tries to look for corresponding .enc files in pdftex.map. The .enc files contain glyph lists, which are then mapped to Unicode.

It also parses the .pfb file for font family name and tries to detect style (italic, bold, small caps) from the font full name saved in the .pfb file.

It computes hashes for the font tables, so duplicate font tables aren't written, the fonts with same characters just link to the first used font.

If no .enc file is found, then the font cannot be supported. There can be also missing mappings between glyphs and Unicode. The missing mappings are reported in the TeX file. Htfgen contains large mapping files, but some fonts just use some custom glyphs which doesn't have Unicode equivalent. For example Q_u ligatures etc. In this case the mapping must be added by hand to glyphlists/glyphlist-fixes.txt.

It works reasonably well for fonts generated by Fontinst, because they usually use standard glyph names, contains .enc files, etc. For complex virtual fonts, especially math, it fails. HTF files for such fonts still needs to be created by hand.

Find font families in LaTeX packages

Use the listfonts.lua script to find .fd files required by a LaTeX package like times or bookman

texlua listfonts.lua times avant bookman | xargs cat  | scanfdfile | dvitohtf

Update the tex4ht literate sources

It is possible to update the literate source file using the update_literate_sources.lua script. It takes list of .htf file as an argument and reads the source from standard input.

 texlua update_literate_sources.lua `ls *.htf`  < tex4ht-fonts-noncjk.tex  > tex4ht-fonts-noncjk-updated.tex

It is useful for updating already existing fonts in the sources.

Obsolete information

Simple usage

If all what you need is to add support for normal text font encoded in T1 fontenc, you may try to use the make-t1-htf script.

Lets say that you have following file:

\documentclass{article}

\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}

%\usepackage{dejavu}
\renewcommand{\familydefault}{\sfdefault}
\usepackage[defaultsans]{droidsans}
\newcommand\sample{Příliš žluťoučký kůň úpěl ďábelské ódy. \% ``Nazdar světě''}

\begin{document}

\sample

\textit{\sample}

\textbf{\sample}

\textsc{\sample}

\texttt{\sample}

\end{document}

Then you will get lot of warnings in the terminal output about missing .htf files:

--- warning --- Couldn't find font `DroidSans-t1.htf' (char codes: 0--255)

because T1 encoded fonts should produce identical output, we don't need to create full .htf file with char table, but we can create file which points to some existing full .htf file in T1 encoding. This file has such form:

.lm-ec

dot on the first line indicated that this points to file lm-ec.htf, which is .htf file for Latin Modern in T1 encoding.

We can add also some CSS information to the file, with this instruction:

.lm-ec
htfcss: DroidSans-Bold-t1 font-weight:bold; font-family: 'Droid Sans', sans-serif

first entry after htfcss is the font name, which is also used as CSS class. Then CSS instructions follows.

To simplify creation of this light .htf file, make-t1-htf script is provided. The usage is following:

usage: maket1htf fontname [style] [family]

available styles are rm for normal text, it -- italic, b -- bold and sc -- small-caps

The sample .htf file was created with command:

$ make-t1-htf DroidSans-Bold-t1 b "'Droid Sans', sans-serif"

Usage

First, we need to figure out, whether tested font is virtual. Virtual fonts aren't supported at the moment.

lstexenc fontname

for tfm font, we get something like:

$ lstexenc eccc1000
eccc1000        tfm     EXTENDED TEX FONT ENCODING - LATIN

for virtual font, the outpus is like this:

$ lstexenc ntxmia 
ntxmia	vf	FONTSPECIFIC
  txmia	tfm	FONTSPECIFIC
  txsyc	tfm	FONTSPECIFIC
  txr	vf	TEX TEXT
    rtxptmr	tfm	TEXBASE1ENCODING
    rtxr	tfm	FONTSPECIFIC
  ntxexb	tfm	UNSPECIFIED
  rtxmio	tfm	FONTSPECIFIC
  zxlr-8r	tfm	TEXBASE1ENCODING
  ptmr8r	tfm	TEXBASE1ENCODING
  zxxrl7z	tfm	ADOBESTANDARDENCODING

first entry on each line is a font name, second font type and third is font encoding. Font encoding information provided by fonts is unreliable!

If font type is tfm, we can process to another step, which is translation of 8-bit characters to unicode characters using glyph info encoded in enc or afm files.

tfmtochars script is used to generate tsv file with char number, glyph name and hex unicode value.

$ tfmtochars fontname [enc]

tfmtochars tries to find enc file in map files provided by pdftex, you need to provide the name only in the case the enc file is not configured for given font.

$ tfmtochars eccc1000
0	grave	0060
1	acute	00B4
2	circumflex	02C6
...
253	yacute	00FD
254	thorn	00FE
255	germandbls	00DF

for some fonts, enc file can't be found:

$ tfmtochars zxxrl7z
Cannot load enc file for txmia

You may try to find a enc file by hand (Google?). Common enc names can be found in fontname guide.

From lstexenc zxxrl7z we know that this font uses ADOBESTANDARDENCODING, which is coded as 8a:

$ tfmtochars zxxrl7z 8a 

we should control, whether the output is correct (nil values instead of hexadecimal codes are suspicious). We may use

$ tfmtochars zxxrl7z 8a | grep nil

to test that. If the output is correct, we need to save the output to the tsv file, with command:

$ tfmtochars zxxrl7z 8a > zxxrl7z.zsv

now we can generate the htf file with htfgen

$ htfgen zxxrl7z.tsv > zxxrl7z.htf

To test the htf file, you can use testfont.lua script:

$ texlua testfont.lua zxxrl7z

and compare the resulting pdf and html files. Note that occasionally some characters in either pdf or html files are missing, I am not sure why.

Issues

math fonts are mess

Math fonts with calligraphic features such as fraktur or scripts doesn't reflect these styles in glyph names, so it is impossible to automatically reflect these features. See unicode math symbols for unicode values of these features. You may need to correct unicode hex codes in the tsv file by hand. Any ideas how to solve that automatically will be really appreciated.

parse afm files when no enc is found

We may parse afm files for glyph names, when no encoding is found or provided. Not all fonts does have afm files, though.

virtual font 5EDC s

My idea is following:

  • first we need to make tsv files for all referenced fonts
  • then make new tsv with referenced characters from particular tsv files
  • sometimes character in the virtual font is composed from more characters from one or more referenced fonts. It is unlikely that this can be solved automatically

Some symbols in TeX fonts doesn't have unicode characters

About

create tex4ht supporting files for TeX fonts

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  
0