OCaml 4.03
The next version of the OCaml programming language, 4.03, has had a few beta versions already, and should be released in the following weeks. A major theme of this new release is to help users improve the efficiency of their programs by making changes to the language, compiler implementation, and tooling. It also contains a large variety of other changes that demonstrate the accelerating development of the language implementation over the past few years, which is caused in part by the steady increase in the number of external contributors. Some promising experimental projects that are still in development suggest that future releases may retain this accelerated pace.
A quick tour of OCaml
OCaml is a typed functional language of the ML family, close to Standard ML or F# and, to a lesser extent, Haskell. Functional programming emphasizes first-class functions, immutable data, and making it easier for users to reason about their programs. OCaml is a mature language with a robust implementation; it is garbage-collected and has a native compiler and efficient runtime resulting in reasonably fast programs, as well as separate compilation and fast compile times. Idiomatic OCaml programs tend to be shorter but slower than optimized C or C++ code, but in the same ballpark as the faster languages implemented on the Java Virtual Machine (JVM).
Here is a taste of OCaml code:
(* Binary tree with leaves carrying an integer. *) type tree = | Leaf of int | Node of tree * tree let rec exists_leaf test tree = match tree with | Leaf v -> test v | Node (left, right) -> exists_leaf test left || exists_leaf test right let has_even_leaf tree = exists_leaf (fun n -> n mod 2 = 0) tree
The values of type tree
are either the Leaf
constructor
followed by an integer or a Node
constructor followed by two
sub-trees. This is an instance of an algebraic data
type, which is a form of discriminated union common in functional programming.
The function exists_leaf
takes two arguments, a test function and
a tree, and returns true if one of the leaves of the tree satisfies
the test function. Pattern matching (match tree with ...
) allows
inspecting the value of an algebraic data type; it performs
a conditional test on the constructor (Leaf
or
Node
), and gives
names to its parameters (v
, left
,
right
).
The general function exists_leaf
is then used to define the more
specific has_even_leaf
function; in this definition, an anonymous
function (fun n ...)
is passed as a test function.
Algebraic data types and pattern-matching make it easy to write
programs that manipulate symbolic data: interpreters, compilers,
static analyzers, theorem provers, model checkers, etc. Another
strength of ML languages is their handling of generic, polymorphic
code. Instead of a type tree
whose leaves contain integers, one can
define a parameterized type 'a tree
(in Java notation, that would be
Tree<A>
) whose leaves contain values of type 'a
:
type 'a tree =
| Leaf of 'a
| Node of 'a tree * 'a tree
The rest of the code does not have to be changed at all; the function
exists_leaf
becomes a generic function that works on trees of any
type, without requiring extra annotations..
OCaml has never seen large adoption in the world of enterprise software, and as such remains lesser-known than other mainstream programming languages. It has, however, kept and grown an active community of users since the 1990s, with lots of code released as free software, visible industrial users, and widespread adoption in some domains of computer science academia. LWN readers may be familiar with the OCaml Weekly News, the weekly summary of activity in the OCaml community that was launched by David Mentré in March 2001 (as Caml Weekly News) and has been edited by Alan Schmitt since late 2001.
OCaml is the implementation language of Coccinelle, a "semantic patching" tool for C that is regularly used in the Linux kernel community and frequently mentioned on LWN (for example in 2009, 2010, and 2014). Some other notable open-source projects are the Coq proof assistant, the MLdonkey peer-to-peer client, and the MirageOS library operating system for constructing unikernels — its supporting company, Unikernel Systems, was acquired by Docker in January 2016.
One strong niche of typed functional languages is compiler implementations; OCaml is used for the OCaml compiler itself, but also for the Haxe compiler, the old bootstrap compiler for Rust, the Hack language implementation at Facebook, and the reference interpreter of the WebAssembly project. OCaml is also used in several open-source static analysis and program verification projects, such as the Frama-C static analyzer for C programs, the Infer memory-safety analyzer for Java and Objective-C programs, the Why3 program verification platform, and the Flow type analyzer for JavaScript. Some of these projects rely on Js_of_ocaml, which is an excellent compiler from OCaml to JavaScript, to easily deploy OCaml code on the web.
OCaml project development
Like Java or Python, but unlike C or Ada, the OCaml programming language has a dominant implementation, and the user-exposed language evolves along with this implementation. This implementation, born in 1995, was developed inside a programming language research group at INRIA, a French public research institute. Several of these initial developers, in particular Xavier Leroy and Damien Doligez, are still actively maintaining the language. They have since been joined by other maintainers from academia or industry. There are currently 13 contributors with commit access — but none of them work full-time on the OCaml implementation. As a frequent contributor, I have commit access, but am not an expert or maintainer of any specific part of the compiler. I mostly help fixing bugs and by reviewing and integrating patches proposed by external contributors.
In the last few years, several changes were made to encourage external contributions to the compiler distribution. Both bug fixes and patch proposals were previously handled on an austere Mantis issue tracker. In January 2014, the project started accepting pull requests against the non-official GitHub mirror of the reference Subversion repository and, in October 2015, this GitHub repository became the official source repository of the project. Compiler hacking sessions were also organized, notably by Jeremy Yallop and Leo White in Cambridge UK, to encourage people already familiar with the language to jump in and make their first contribution to the compiler implementation.
These various efforts proved successful, as the number of external contributions to the compiler distribution code base has noticeably increased. Mantis never received more than a few dozen patches a year, but 130 GitHub pull requests were submitted in 2014, 254 in 2015, and there have already been 149 pull requests in 2016.
Having many changes proposed by external contributors can be a curse as much as a blessing, if the bandwidth of available maintainers is too restricted to effectively evaluate, provide feedback, and make decisions on the proposals. Of the 535 pull requests so far, only 82 are still open, which shows that the project is reasonably effective at managing those contributions.
While the evolution of the compiler implementation and tooling is relatively fast, the language designers have remained fairly conservative in accepting language changes. Strict compatibility requirements are expected; breaking existing user programs is almost never accepted. Tastefully evolving a programming language is a difficult craft, and some users legitimately complain that the changes are too slow — or too fast. OCaml users appreciate the language compromise between expressivity and simplicity; each new advanced feature risks moving the language outside this comfort zone.
Compiler optimizations
The OCaml compiler distribution has historically remained a relatively simple project. In particular, the compiler did not perform aggressive optimizations. Good performance for the generated program came instead from a few high-impact optimizations and a choice of data representation and garbage-collection strategies tailored to typical OCaml programs. Those programs generally have a high allocation rate of short-lived data structures, more immutable than mutable data, and more symbolic data traversal and transformation than numeric computations.
The relative simplicity of the compiler has been changing under the pressure of users moving into new problem domains, in particular numeric computations.
To achieve separate compilation of highly polymorphic functions, OCaml uses a uniform data representation where each value fits in one word of memory. Immediate values such as integers, booleans, or constructors of an algebraic data type that have no parameters, are represented as "tagged" words, with their lowest bit set — so that the garbage collector knows they are not pointers. All other values, such as records/structures, double-precision floating-point numbers, or constructors with parameters, are "boxed": allocated on the heap and represented by a pointer.
Tagging and boxing impose a runtime overhead on numeric operations; inside a function declaration, the compiler will remove redundant boxing and unboxing operations, so that intermediate computations do not perform (un)boxing or (un)tagging. The choice of which (un)boxing operations to insert and remove is called an "unboxing strategy". Inlining can significantly increase performance, as it widens the scope of these local optimizations.
The main change for the 4.03 release is the addition of a new intermediate representation to the compiler, called Flambda. It is designed to support more aggressive inlining and unboxing strategies, and more optimizations than were previously feasible.
This new Flambda middle-end is the result of a multi-year effort by Pierre Chambart; reviewing it and integrating it into the compiler code base was itself massive and has been led by Mark Shinwell and Leo White. The results are promising but there is still a margin for improvement. In particular, compilation times are noticeably degraded (generally twice slower; the OCaml compiler is fast, which makes it a difficult target to catch). It is also currently impossible to link code that uses Flambda with code that doesn't, as the format of compiled object files differs. For this reason, this new compilation pass is not enabled by default in 4.03. Hopefully, the forthcoming iterations of this work will reduce compilation time and implement new optimizations, making Flambda the better choice for everyone.
The tragedy of optimization writers is that performance-sensitive code that already exists in the wild is often written in a style that specifically caters to the limitations of the existing optimizer, so it will not see much improvement from new optimization passes. The hope is that new optimizations will make it easier to write high-performance code by enabling useful idioms that were previously judged too costly.
Stronger inlining capabilities are particularly useful in the context of a general push in the functional programming community toward deeper layering of abstractions and, in particular, a programming style. Monads let users implement libraries for many programming notions (mutable state, backtracking, cooperative concurrency, exceptions, continuations,...) that are traditionally either directly part of the language implementation, or not supported at all. This can be very flexible and expressive, but it also requires aggressive optimizations to be competitive performance-wise with direct language support for the same features. In particular, monads make heavy use of series of short-lived anonymous functions or data structures. GHC, the dominant implementation of the Haskell programming language, has long needed these optimizations to be competitive on performance because of a different idiomatic style relying more heavily on monads and a lazy evaluation strategy that tends to incur more bookkeeping.
In parallel to the Flambda work, Alain Frisch has long been working on improving unboxing strategies, in particular for floating-point code. Again, the idea is that more flexible unboxing strategies make it easier for programmers to write well-performing floating-point code without having to know all the low-level tricks to write code that is unboxed well. Several interesting changes from Frisch and Vladimir Brankov have been merged during 4.03; more unboxing work has already been merged in the development branch that will become the future 4.04 release. For more details, Frisch has a blog post that explains the general issues of OCaml value representation and the details of the past, present, and future of unboxing strategies. More unboxing ideas can be found in a blog post by Andy Wingo about Guile, a GNU implementation of the dynamically typed functional language Scheme.
Tools for performance reasoning
Another important theme in the 4.x series is the addition of first-class support of attributes and extensions that is being led by Frisch. Attributes are part of the standard syntax that do not directly affect the program semantics, but are handled by external tools and preprocessors to analyze and transform the code, not unlike Java annotations. Extensions are quoted program fragments that cannot be interpreted directly, but must be translated by compile-time preprocessors into valid code. Some attributes (and extensions) are handled by the compiler itself, so they serve as compiler pragmas; 4.03 supports new attributes intended to let users better control performance of their applications.
During one of the compiler hacking sessions, Simon
Cruanes contributed a new [@tailcall]
attribute that
expresses the intent that a given function call should benefit from
tail-call optimization; it results in a warning if the call is in
fact not optimized. Because programmers reason about tail-calls to
analyze the stack-consumption behavior of their programs, it is
important to let them check that their assumptions are
correct.
Jérémie Dimino implemented [@unboxed]
and [@untagged]
attributes to be used in the Foreign Function Interface, which
lets the compiler unbox or untag OCaml data representations before
passing them to C functions, instead of doing the transformation
manually on the C side. This can result in further optimizations, as
other boxing or tagging operations may cancel out.
Finally, the Flambda pipeline comes with an [@inline]
attribute
for a function declaration to require that calls to the function be inlined
and an [@inlined]
attribute to check that a particular call site is
inlined. This should also help programmers reason about compiler
optimizations, which is
a necessary evil when the optimizer becomes more aggressive and
correspondingly less predictable.
Garbage-collection improvements
A final important area of change is the garbage collector (GC) implementation. This is unusual in the OCaml world as the GC code has been performing well and has been fairly stable since its implementation by Doligez in the early 1990s. It is a precise, generational, incremental garbage collector. For more details on the OCaml garbage collector, see this online book chapter.
Doligez worked on improving the worst-case latency for the workloads of Jane Street, which is a finance company relying heavily on OCaml. The timing of this work happens to coincide with work in the Haskell world to decrease garbage-collector pauses for Facebook workloads. Some younger languages like Go tend to have invasive changes to their GC implementation on each new release, but OCaml and Haskell implementations have traditionally remained fairly stable. But performance-sensitive industrial applications can still suggest further changes and industrial users have the funding and workforce to get these delicate changes implemented and accepted upstream.
François Bobot contributed support for ephemerons, which are a generalization of the weak pointers already supported by the OCaml GC. Weak pointers are references that are not counted by the GC as implying that the memory pointed to is still alive. They are useful when caching results of computation: one may not want the presence of an object in the cache to prevent it from being collected if nobody else is using it. However, weak pointers do not suffice to implement associative maps where both keys and their associated values should be held weakly; see this short PDF abstract by Bobot for more details and the Wikipedia ephemeron article for information about support in other languages.
Other changes
The detailed 4.03 release Changelog contains slightly more than 250 change items, which are mostly smaller, unrelated fixes, improvements, or features. We cannot list all or even most of them, but will mention three more notable changes brought by this release.
The first change is inline records, implemented by Frisch with help from Jacques Garrigue, which provide a way to give field names to the parameters of discriminated union constructors. This feature brings usability benefits — names are good for readability — but also had performance justification, as it sometimes allows for more compact data representation than was previously possible. It is also the result of a discussion spanning two and a half years and hundreds of messages: language design through open discussion can be taxing.
The second change is the work of David Allsopp to
provide a wrapping of
Windows native support for
symbolic links,
as part of a general push by Allsopp to make some currently
Unix-specific OCaml applications to work better on Windows. It is
possible to write portable applications in OCaml, but the Unix-using
majority of developers have a hard time resisting the convenience of
symbolic links or fork()
calls.
The third change is a series of improvements by Florian Angeletti to the official reference manual that clarify the behavior of some more advanced features. Most languages' reference manuals or specifications tend to be exhaustive but rather terse, and user communities often fall back to tutorials, books, or blog posts to exchange more beginner-friendly information. Contributors motivated to improve the centralized reference documentation and make it more accessible are an invaluable resource.
Future releases
Major OCaml releases have traditionally occurred approximately every year and half, with some number of minor bug-fix releases in between. With the increased pace of change, this rhythm is becoming more stressful for Doligez, who is the release manager, as each release has more changes and increased risk of regressions. It is also frustrating for the external contributors who want others to be able to use their changes without waiting too long. Finally, it creates a perverse incentive to rush adding invasive changes shortly before the feature freeze, a few months before each release, to avoid waiting for another full release cycle.
The current plan for the coming releases is to move to a shorter release cycle — closer to a bi-annual major release. This requires a significant change in development practices, so it may take a few more releases to come into effect. But one could theoretically hope for a 4.04 release in late 2016.
Index entries for this article | |
---|---|
GuestArticles | Scherer, Gabriel |
Posted Apr 21, 2016 7:16 UTC (Thu)
by isido (subscriber, #6976)
[Link]
Posted Apr 21, 2016 10:06 UTC (Thu)
by ncm (guest, #165)
[Link]
Posted Apr 21, 2016 15:21 UTC (Thu)
by david.a.wheeler (subscriber, #72896)
[Link] (1 responses)
Posted Apr 22, 2016 13:52 UTC (Fri)
by gasche (subscriber, #74946)
[Link]
The history of unboxing optimisations in the OCaml compiler is actually interesting. In the nineties there was a lot of research on type-based optimizations, the general idea being to keep full typing information in the compiler pipeline and aggressively use typing information for optimization. Probably the best-known instance of this approach is the TIL project at CMU, but Xavier Leroy and the Caml group were also experimenting with type-based optimizations in experimental compilers around 1995. They found out that type-based optimizations often involve difficult trade-offs and complex implementations, and that in practice simpler, untyped unboxing optimizations could be easier to work with and more robust in practice. The OCaml compiler thus adopted an untyped unboxing approach, and more generally this informed the design of the compiler pipeline which tends to throw types away relatively early and rely on more local value analyses. (This is a counter-intuitive approach from a functional programmer's perspective, and it also has some downsides compared to a more typeful approach akin to GHC's Core language, but it definitely helped keep the compiler simpler and more flexible.)
You can read about the comparison between typed and untyped approaches in "The effectiveness of type-based unboxing", Xavier Leroy, 1997.
OCaml 4.03
OCaml 4.03
Excellent!
Excellent!
http://gallium.inria.fr/~xleroy/publi/unboxing-tic97.pdf