Fontmake is slow #367
Yes, we know it's very slow.
It seems that interpolation is the slowest part, given that generating a variable font is several times faster than generating the static instances.
cProfile run on a UFO to ttf build in Python 3.6.3.
Ranked by cumulative time over top 50 function calls:
Ranked by internal time over top 50 function calls:
cProfile run on a UFO to otf build in Python 3.6.3.
Ranked by cumulative time over top 50 function calls:
Ranked by internal time over top 50 function calls:
@khaledhosny do you have a smaller (than Mada) body of source that you build from a designspace (or is it possible to push one)? Happy to add data on
There is https://github.com/alif-type/reem-kufi, but it does not do any interpolation, which I think is the interesting difference. But I can run with cProfile myself; I'll add my results here.
@davelab6 just asked me if we could run fontmake in parallel on the Google Cloud Platform, since I'm doing something similar for Font Bakery right now. Sure, we could; it would need some modifications, but we could reuse or extend what we have. It would probably build one instance per pod. Similarly, one quick improvement on a local machine would be to spawn one fontmake process per instance to create; the kernel would take care of distributing these across processor cores. It would be nice, though, to run the work within a single instance build in parallel, i.e. map per glyph, interpolate, and then reduce to the new font; on a multiprocessor machine that would be awesome. Still, my guess is that something fishy is going on in that code; judging from what it does, I believe it should be much faster. So a thorough performance review may help as well.
Yeah, that's the hard part :-) Usually it's either moving calls to functions out of a loop, executing them just once beforehand, or doing some memoization (buying speed with memory), etc. Caching is another form of hell, though. The problem is that a performance review is time-consuming, since you need to get to know the code very well. About profiling to identify bottlenecks: I remember reading a blog post (which I can't find anymore, of course). The author used to interrupt the running program at random points in time and claimed that the performance-critical code would very often be where the program was at that moment, because that's where most time is spent. So he would take a deep look at these common locations and start optimizing there. This is a bit brutal and dirty, but not a really bad idea. For Python, this would mean starting fontmake and then hitting
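That "interrupt it at random" idea can be automated in pure Python with a signal-based sampler. A toy sketch (Unix-only because of SIGALRM; `hot_loop` is a placeholder workload, not fontmake code):

```python
import collections
import signal
import time

samples = collections.Counter()


def record_stack(signum, frame):
    # Count the function executing at the moment of interruption; hot
    # code shows up here most often, which is the random-pause insight.
    samples[frame.f_code.co_name] += 1


def hot_loop(deadline):
    total = 0
    while time.perf_counter() < deadline:
        total += sum(i * i for i in range(500))
    return total


signal.signal(signal.SIGALRM, record_stack)
signal.setitimer(signal.ITIMER_REAL, 0.01, 0.01)  # sample every 10 ms
hot_loop(time.perf_counter() + 0.3)
signal.setitimer(signal.ITIMER_REAL, 0, 0)  # stop sampling

print(samples.most_common(3))  # the hot code paths dominate the counts
```

Walking up `frame.f_back` instead of taking just the innermost frame would give full stacks, which is essentially what dedicated sampling profilers do.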
For parallelizing Python programs that do CPU-intensive (non-I/O-intensive) work, one cannot use multi-threading because of CPython's infamous global interpreter lock (GIL). So one has to use multiprocessing. But unlike threads, sub-processes by default don't share memory with the parent process, and communication between them may require serialization and deserialization (read
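The serialization cost is easy to see directly: every argument sent to a multiprocessing worker is pickled and unpickled, so shipping large in-memory objects (say, whole font objects) is far more expensive than shipping file paths. A small illustration (the path is hypothetical, just a string):

```python
import pickle

# A big in-memory structure standing in for a parsed font object.
big_object = {i: list(range(50)) for i in range(10_000)}
path_argument = "MyFont-Regular.ufo"  # hypothetical path, just a string

big_blob = pickle.dumps(big_object)
path_blob = pickle.dumps(path_argument)

# The path is orders of magnitude cheaper to move between processes;
# workers can re-load the real data from disk themselves.
print(len(big_blob), len(path_blob))
```

This is why the examples further down in this thread pass UFO paths to workers and construct the heavy objects inside each subprocess.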
Because it is truly a waste that 7 of my 8 cores sit idle in the background while fontmake is crunching my fonts.
Yeah, for now, would it be possible to just start, say, 8 fontmake processes by hand, to build different instances of the same project, with a shell script or so? Or would they race for the same intermediate folders on the hard drive?
Sure. Maybe not so bad, though, since shared memory can cause problems as well.
Yes, if you have one UFO for each output font:

```
$ fontmake -u MyFont-Regular.ufo -o ttf
$ fontmake -u MyFont-Bold.ufo -o ttf
```

If you have interpolated instances to make, you could run
Maybe something along these lines (Py3)?

```python
from multiprocessing import Pool

from fontmake.font_project import FontProject


def build_fonts(ufo_path):
    fp = FontProject()
    fp.run_from_ufos(ufo_path, output=('ttf', 'otf'))


if __name__ == '__main__':
    with Pool(4) as p:
        p.map(build_fonts, ["UFOPATH1", "UFOPATH2", "UFOPATH3", "UFOPATH4"])
```

The standard output/logging will get ugly, I suspect, but I believe that would do the trick to build in four parallel processes. Filesystem access across processes is possibly a problem too? I'm not somewhere I can test at the moment. Perhaps parallel builds of ttf and otf would be of benefit too? I don't know how much shared compilation happens there. If that works, the build script can be modified to accept arguments for the source paths, replacing the hard-coded UFO path list.
Removal of the logger and timer may be low-hanging fruit in production code too. I can look at this tonight if you haven't previously assessed it. It isn't immediately clear to me what the timer is doing.
The timer does nothing other than timing; it doesn't use resources. You see it in the profiler because it wraps the other method calls as a decorator.
I suggest you simply redirect stderr to /dev/null for now. We'll have to rethink the whole logging interface in a multiprocessing fontmake.
@anthrotype wondering if the decorated method calls are leading to this issue:
Which issue?
As you can see from the legend, the total time (
Misread the percall columns. Sorry!
What project is
Actually, to mute the fontmake logger while multiprocessing, instead of redirecting stderr to /dev/null, you can just configure the global logging verbosity with a level of "ERROR" or "CRITICAL", e.g.
BTW, I noted this elsewhere: the fact that fontmake's FontProject class takes a "verbose" argument to configure logging is a lie. The basicConfig function only works the first time it is run, since logging settings are global, not class- or instance-specific.
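Along those lines, a sketch of silencing fontmake's chatter process-wide (the logger name "fontmake" is assumed here; the levels are the standard `logging` ones):

```python
import logging

# Raise the global threshold so only errors get through. Because the
# logging configuration is process-global, this affects every logger in
# the process, not just one FontProject instance, which is exactly why
# the per-instance "verbose" argument cannot really work.
logging.basicConfig(level=logging.ERROR)
logging.getLogger("fontmake").setLevel(logging.ERROR)

log = logging.getLogger("fontmake")
log.info("suppressed: per-glyph progress chatter")
log.error("still shown: real failures")
print(log.isEnabledFor(logging.INFO), log.isEnabledFor(logging.ERROR))
```

In a multiprocessing setup, each worker process would need to run this once at startup, since children don't inherit a parent's already-applied logging configuration on all platforms.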
@graphicore This might get you the line-by-line profiling that you need, once a bit more digging has been done at a higher level?
It's no accident that that is one of the most computationally heavy tasks.
Cython possible there?
Cython is already used for the clipping: pyclipper is a Cython wrapper for the C++ Clipper library. BooleanOperations is a Python wrapper that "flattens" the contours (converts curves into polygons), passes them on to pyclipper, and re-curves the output polygons back into beziers. I'm afraid there's no easy shortcut here, Chris.
I can use a Makefile for this kind of parallelism, but it still wouldn't help me that much, since I build from a designspace, not from individual UFOs. Let me try to interpolate first, then build the OTFs, and see if it saves me much.
Building OTF instances, ordered by cumulative time:
and ordered by total time:
Building the TTF variable font, ordered by cumulative time:
and ordered by total time:
With PyPy it takes ~6 minutes to build everything, but without overlap removal and probably CFF optimization:
Python 3:
@khaledhosny is it necessary to modify the logging level to see that this is the case during the build? The default standard output looks the same to me (i.e. no notification of errors in any of the steps of the build process) as it does for our CPython builds (PyPy stdout shown):
Sorry, my wording was misleading: I intentionally disabled overlap removal and CFF optimization to speed up non-release builds (I was saying something else above originally, but edited the text before posting).
There are specific options for keeping overlaps and disabling subroutinization. Those have nothing to do with logging.
Ah, I see, I misunderstood. I thought you meant that (at least a portion of) the decrease in time was due to skipped routines.
Thanks both for the profiler data you have collected, it's very helpful.
It would be helpful, though, if there were a way to generate a single UFO instance from a designspace file; right now all the instances are generated at once, and this makes it hard to do things in parallel (more so with make, as one rule making multiple targets is not compatible with parallelism).
I know... But I believe the mutatorMath API doesn't allow that. At least
Nice discussion; I had missed this. This is the first time I'm seeing PyPy perform better than CPython. Good to know. It looks like the majority of time is spent in booleanOperations.flatten, so we're back to replacing that. Let's figure out who's going to work on that. cc @adrientetar
The majority of time for compiling a single UFO is spent on booleanOperations, yes. But when compiling a designspace, the majority of time is spent in mutatorMath/fontMath/defcon/...
For starters, how about changing epsilon in booleanOperations' flatten.py from 1e-8 to 1e-4? @typemytype
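To see why epsilon matters so much, here is a toy flattening routine. This is not booleanOperations' actual algorithm, and the step-count heuristic is invented for illustration, but it shows the shape of the trade-off: the segment count grows as the tolerance shrinks, so 1e-8 can demand orders of magnitude more segments than 1e-4 for the same curve:

```python
import math


def _dist_to_line(p, a, b):
    # Perpendicular distance of point p from the line through a and b.
    ax, ay = a
    bx, by = b
    px, py = p
    num = abs((by - ay) * px - (bx - ax) * py + bx * ay - by * ax)
    den = math.hypot(bx - ax, by - ay)
    return num / den if den else math.hypot(px - ax, py - ay)


def flatten_cubic(p0, p1, p2, p3, epsilon):
    """Toy uniform flattening of a cubic bezier: choose a step count so
    the deviation from the curve stays roughly under epsilon (a crude
    bound: deviation shrinks quadratically with the step count)."""
    bulge = max(_dist_to_line(p1, p0, p3), _dist_to_line(p2, p0, p3))
    steps = max(1, math.ceil(math.sqrt(bulge / epsilon)))
    points = []
    for i in range(steps + 1):
        t = i / steps
        mt = 1 - t
        x = mt**3 * p0[0] + 3 * mt**2 * t * p1[0] + 3 * mt * t**2 * p2[0] + t**3 * p3[0]
        y = mt**3 * p0[1] + 3 * mt**2 * t * p1[1] + 3 * mt * t**2 * p2[1] + t**3 * p3[1]
        points.append((x, y))
    return points


curve = ((0, 0), (0, 100), (100, 100), (100, 0))
coarse = flatten_cubic(*curve, epsilon=1e-4)
fine = flatten_cubic(*curve, epsilon=1e-8)
print(len(coarse), len(fine))  # far more segments at the tighter tolerance
```

Every extra segment is more work for the clipper and for re-curving, so a looser epsilon cuts the whole pipeline's cost, at the price of geometric accuracy in the removed overlaps.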
And for mutatorMath/fontMath/defcon, we should add a lean ufoLib to fontTools, with minimal Glyph, Font, Contour, etc. objects that already do math, and use varLib on them.
I hear PyPy works best with long-running processes, as its JIT compiler takes some time to warm up.
@anthrotype below is from the PyPy web site:
With this change,
I cannot reproduce this. flatten.py takes, like, 10 seconds out of a 180s runtime on NotoSansArabic-MM.glyphs for me.
@behdad If you'd like to replicate with the source that I used: the data I showed indicating the issue were direct builds from UFO source files to ttf/otf binaries using https://github.com/source-foundry/Hack/tree/master/source (tar.gz download: https://github.com/source-foundry/Hack/archive/v3.000.tar.gz). The parallel build data that I posted above included all four variant UFO source directories in the repository linked above. I don't believe anyone has presented profiling directly from Glyphs source files yet. Not sure how that might influence the booleanOperations issue that occurred in my testing.
I don't know if that is going to change that much; BooleanOperations is, and always will be, a heavy and time-consuming call... I guess the best option is to port to a more complete non-Python package, as proposed in typemytype/booleanOperations#40.
Pushed this for anyone who is building directly from UFO source and might find it of use. It automates the number of spawned processes for the requested compiles, up to a maximum of the number of cores detected on the system (this can be set manually if desired). It builds ttf + otf by default; modify directly in the script. Details in the README.
Thanks! We should add a mode to fontmake to do this automatically.
Agreed! It sounds like @anthrotype plans to tackle this at some stage. #367 (comment)
Khaled, I'm curious: what's the current build time for Mada?
I don't know, but it does not really matter. Python is slow and I don't think this is going to change.
FWIW, I managed to cut the build time in half (down to 15 minutes from 30 minutes) by using
I then managed to cut it down to 4 minutes by building only the masters with fontmake, using varLib to build a variable font out of them, and varLib.mutator to create the instances from it.
@khaledhosny If you're bored, try building the Mada instances with https://gitlab.gnome.org/GNOME/cantarell-fonts/blob/master/scripts/make-static-fonts.py and https://gitlab.gnome.org/GNOME/cantarell-fonts/blob/master/scripts/instantiator.py. It launches a subprocess per instance and runs psautohint on the resulting OTF. I can build the 14 Noto Sans upright OTFs in 2:50 minutes (without autohinting) on my 16 logical cores, and it only takes around 7 GB of RAM for the duration.
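The subprocess-per-instance pattern those scripts use sidesteps the GIL entirely, since each child is a separate interpreter. A generic sketch of the pattern (the child commands here just square numbers; they stand in for per-instance fontmake invocations):

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd):
    # Each call shells out to a fresh process, so the CPU-heavy work in
    # the children is never serialized by the parent's GIL; the parent
    # threads spend their time blocked waiting on the subprocesses.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Stand-ins for one build command per instance.
commands = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = list(pool.map(run_one, commands))

print(outputs)
```

Threads are enough on the parent side because the work there is pure I/O wait; the RAM figure above is the flip side, since every child process loads its own copy of the sources.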
Not sure if this helps, but the command
Building Mada using fontmake takes ~15 minutes on my (not terribly slow) machine. I can live with some slowness, but 15 minutes is a bit too much. It was actually ~5 minutes before aliftype/mada@a367974 (though 5 minutes is still a bit too much as well).