Assembling the history of Unix
Warren Toomey, an Australian university lecturer, founded the Unix Heritage Society to reconstruct the early history of the Unix operating system. Recently this historical code has become much more accessible: we can now browse it in an instant on GitHub, thanks to the efforts of Diomidis Spinellis, a computer science professor at the Athens University of Economics and Business. The 50th anniversary of the invention of Unix will be in 2019; the painstaking work of Toomey and Spinellis makes it possible for us to appreciate Unix's epic story.
The Unix Heritage Society
Around 1993, while he was a researcher at the University of New South Wales, Toomey began asking on mailing lists and newsgroups for old Unix versions, intending to run them on a PDP-11 simulator. He started a group called the PDP-11 Unix Preservation Society; its mission grew to encompass all old Unix releases, and it was renamed the Unix Heritage Society in 2000. "I think the title is a bit grandiose," he said in an interview with LWN. "It's not really a society, just me and the mailing list."
Toomey's project faced two obstacles. The first was simply to locate enough parts of each old Unix version to assemble a complete copy. He haunted the newsgroups and mailing lists of old Unix hackers, and he heard rumors of people who knew where to get historical artifacts. Most of his requests went unanswered. He recalls spending five or six years repeatedly asking for specific files, until eventually someone would respond, "Oh, actually, I have it." By chance, Toomey discovered in his own university's computer room a dozen tapes with backups of the 6th and 7th Editions of Unix. The backups weren't bootable (there wasn't even a complete backup of either edition), but the discovery accelerated his project nevertheless.
His second obstacle was the long shadow of AT&T's original copyright. AT&T and other corporations allowed individuals to own copies of Unix, but not to share them. Toomey had found his university's copy of a System V source license, but this only provided a small bit of legal cover to ask strangers to share their vintage files with him. Occasionally, one of Toomey's inside informants might give him a 15-year-old copy of some file, saying, "Just don't tell anyone where you got it."
Whenever Toomey acquired what seemed to be a complete version of Unix, he had to get it up and running without any documentation to guide him. "You've got an artifact," he said. "It might be a binary or source code and there's no Makefile; you've got no idea what was the right sequence of things to do to build it."
Last year, for example, Toomey and his friends from the Unix Heritage Society resuscitated the first version of Unix for the PDP-7, written in mid-1970. The primary source was a dot-matrix printout containing PDP-7 assembly code, badly printed with notes and corrections scribbled on it. The members of the society converted the blurred copy to digital text with an OCR program, but they knew there were transcription errors that they'd have to backtrack and fix. Undaunted, they proceeded to the next stage: they learned the syntax of PDP-7 assembly code and wrote an assembler to convert the badly scanned text to machine code.
Now, with a set of executable binaries, the team had to store them in a filesystem, and here they hit a circular dependency. They didn't know the binary format of the filesystem for that version of the Unix kernel. The kernel itself implemented this filesystem, but they had to get the kernel to boot in order to use it for that purpose. Toomey decided to use a PDP-7 simulator to reverse-engineer the basic layout of a bootable disk image, and wrote a tool to create such an image containing the executables that he and his friends had assembled. "It's chicken-and-egg, but you work in stages," he said. "You get one little bit working and you use that to leverage up the next bit."
Unix's two inventors have helped him along the way. "Ken Thompson is minimalist in his communication," Toomey said. When the Unix Heritage Society brought up a PDP-11 version of Unix, he sent Thompson a series of emails about it, to which Thompson responded with single-word messages: "Amazing," or, "Incredible." Toomey said that while Dennis Ritchie was alive, he enthusiastically supported the project. "I really miss him an awful lot."
The Unix History Repository on GitHub
It's valuable to preserve snapshots of old-fashioned systems, but these snapshots don't fit modern programmers' methods for exploring the history of an evolving code base. Today, we read history with tools like Git. Spinellis has imported over 44 years of Unix code history into Git and published the repository on GitHub. The project builds on Toomey's accomplishments, but Spinellis wants more than just the code: he is building a moment-by-moment history of its evolution, and line-by-line attribution of each author's contributions.
Unix was developed without any version control at first. When development moved to the University of California at Berkeley in the late 1970s, coders began tracking certain files in an early version control system called SCCS, but even then it was not used for all files. Spinellis reconstructed as much history as he could by importing entire snapshots of early Unix versions into Git as if they were single commits. He researched primary sources like publications, technical reports, man pages, or names written in comments in the source code to attribute particular parts of the code to their authors.
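To make the snapshot-as-commit idea concrete, here is a minimal sketch of one way such an import could be expressed. It is not Spinellis's actual tooling: it simply builds a stream in the format read by `git fast-import`, recording a whole tree as a single backdated commit. The function name `snapshot_commit` and all of its parameters are hypothetical names chosen for illustration.

```python
from datetime import datetime, timezone


def snapshot_commit(branch, author, email, when, message, files):
    """Return a git fast-import stream (bytes) that records one whole
    snapshot as a single commit.

    `when` is a timezone-aware datetime used for both author and committer.
    `files` maps repository paths to file contents (bytes).
    All names here are illustrative assumptions, not an existing tool's API.
    """
    # fast-import's raw date format: seconds since the epoch plus a UTC offset.
    stamp = f"{int(when.timestamp())} +0000"
    ident = f"{author} <{email}> {stamp}"
    msg = message.encode("utf-8")
    out = [
        f"commit refs/heads/{branch}\n".encode(),
        f"author {ident}\n".encode(),
        f"committer {ident}\n".encode(),
        f"data {len(msg)}\n".encode(), msg, b"\n",
        # The snapshot replaces the entire tree of any previous commit.
        b"deleteall\n",
    ]
    for path, content in sorted(files.items()):
        out += [
            # Mode 100644 (regular file), with the contents given inline.
            f"M 100644 inline {path}\n".encode(),
            f"data {len(content)}\n".encode(), content, b"\n",
        ]
    return b"".join(out)
```

Writing the returned bytes to the standard input of `git fast-import` in an empty repository should create the commit; calling the function once per release, oldest first, would produce one commit per snapshot on the same branch. Per-file attribution of the kind the article describes would need more than this, since the sketch credits the whole snapshot to one author.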
Since publishing the repository on GitHub, Spinellis has continued to refine it periodically. He recently discovered an author unacknowledged in the Git logs whose contributions he wants to add. This March, the copyright holders for Unix Research Editions 8, 9, and 10 granted permission to distribute those versions, so that history can now be integrated into the repository. Additionally, Spinellis points out that he only followed one Unix variant to its conclusion: FreeBSD. Other variants like NetBSD and OpenBSD are just as old and interesting; their stories could be added to the repository as distinct branches.
But why?
Both Spinellis and Toomey enjoy reading old Unix code to see how much power the early programmers could jam into a tiny memory footprint. For example, the PDP-7 Unix that Toomey recovered last year is a minimalist masterpiece. It is a recognizable Unix system, including the fork() and exec() system calls, multiple user accounts, file permissions, and a directory structure, all implemented in only 4000 words of memory.
"But it's really not the source code that's important," said Toomey. "It's the ideas that are embodied in it." AT&T's efforts to protect the Unix code were irrelevant, he said, because the real value lies in concepts like connecting small utilities together with pipes and implementing the system in a portable programming language.
Spinellis agrees: when he examined the 1970 edition of Unix, he saw that, even though it was a bare prototype system, it already contained key architectural elements of modern Unix, such as abstracting I/O, and separating the kernel from the command-line interpreter. Within a few years, even more powerful concepts became visible: devices that appeared as files, a hierarchical filesystem, and a shell that ran as a user process distinct from the kernel. The very first Unix versions contained the basic ideas that inspired the modern operating system that dominates computing today.
Index entries for this article | |
---|---|
GuestArticles | Davis, A. Jesse Jiryu |
Posted Jun 14, 2017 16:47 UTC (Wed)
by epa (subscriber, #39769)
[Link] (8 responses)
Posted Jun 14, 2017 17:04 UTC (Wed)
by zlynx (guest, #2285)
[Link] (6 responses)
Posted Jun 14, 2017 18:32 UTC (Wed)
by smoogen (subscriber, #97)
[Link] (4 responses)
Posted Jun 15, 2017 6:01 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link] (1 responses)
Posted Jun 15, 2017 6:02 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link]
Posted Jun 15, 2017 13:10 UTC (Thu)
by jond (subscriber, #37669)
[Link] (1 responses)
Posted Jun 15, 2017 14:04 UTC (Thu)
by smoogen (subscriber, #97)
[Link]
Posted Jun 15, 2017 6:37 UTC (Thu)
by epa (subscriber, #39769)
[Link]
Posted Jun 14, 2017 19:37 UTC (Wed)
by rgmoore (✭ supporter ✭, #75)
[Link]
It could be self-hosting if you used any of the versions of Unix in the repository. After all, we consider GNU/Linux to be self hosting even though we're using an up-to-date version as the host rather than Linux 0.01 and a correspondingly ancient version of GNU.
Posted Jun 15, 2017 13:09 UTC (Thu)
by jond (subscriber, #37669)
[Link]
Posted Jun 15, 2017 14:15 UTC (Thu)
by joey (guest, #328)
[Link] (4 responses)
Posted Jun 16, 2017 9:04 UTC (Fri)
by anselm (subscriber, #2796)
[Link] (1 responses)
It seems to me that Thompson is unwarrantedly prolix. I would think that as the author of ed(1) he'd just reply “?” (or perhaps “!”).
Posted Jul 6, 2017 3:33 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
Posted Jul 6, 2017 0:33 UTC (Thu)
by vomlehn (guest, #45588)
[Link] (1 responses)
Posted Jul 6, 2017 11:18 UTC (Thu)
by sdalley (subscriber, #18550)
[Link]
?
was without peer, setting new standards for usability...
Posted Jun 16, 2017 7:20 UTC (Fri)
by madhatter (subscriber, #4665)
[Link]
Posted Jun 16, 2017 15:26 UTC (Fri)
by epa (subscriber, #39769)
[Link] (2 responses)
What's the oldest fragment which has 'changed' over time but not been entirely removed, and is still there?
Are there any nontrivial lines of code (shall we say containing five or more words plus some punctuation) which date back to the beginning?
Does git provide a way to answer these questions easily?
Posted Jun 22, 2017 20:27 UTC (Thu)
by DSpinellis (guest, #116958)
[Link] (1 responses)
Posted Jun 23, 2017 13:45 UTC (Fri)
by nix (subscriber, #2304)
[Link]
Posted Jun 16, 2017 21:27 UTC (Fri)
by giraffedata (guest, #1954)
[Link] (3 responses)
Why wouldn't they just contribute this code to the public domain (with words like "contribute to the public domain") or maximally license it (with words like "grant permission to everyone to copy, etc. this code") or even waive copyright (with words like, "hereby waive their rights under copyright")? The code can't be worth anything other than for generating goodwill with such a gift.
Posted Jun 19, 2017 3:05 UTC (Mon)
by fratti (guest, #105722)
[Link] (2 responses)
Another guess that does not assume they know what they're doing: This was the legal department who has no real grasp on the actual value (excluding historic value) of the code, so they're being stingy because it's their job to be stingy.
Posted Jun 24, 2017 13:38 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (1 responses)
Certainly as far as Pr1mos is concerned we know the copyright is held by one of two companies. Neither will sell their rights to the other because the one selling wants a lot of money for what the other considers not worth very much (this argument goes both ways :-(
And because nobody knows for certain who owns the copyright, nobody will do anything that could be a breach of copyright. (Both companies definitely hold a licence that gives them copyright-owner-like status, so keeping ancient copies maintained and working isn't a problem...)
Cheers,
Posted Jun 24, 2017 19:40 UTC (Sat)
by giraffedata (guest, #1954)
[Link]
By the way, I'd like to correct what I said about this promise not being enforceable for lack of consideration (i.e. nobody is giving the copyright owner anything for it). I forgot about promissory estoppel. If someone detrimentally relies on a clear promise, the promise is enforceable even without consideration, and that would certainly be the case if Toomey distributes Unix, thereby violating copyright, because he knew of this promise not to assert copyright.
Posted Jun 22, 2017 5:34 UTC (Thu)
by jtc (guest, #6246)
[Link] (6 responses)
I had always assumed that such important features and concepts (which were, of course, largely responsible for UNIX's reputation as a modern, well-designed operating system) were present in the very first version of UNIX in 1969, or at least in relatively early versions from, say, 1970, '71, or '72. I'm surprised to hear that this isn't the case.
Posted Jun 22, 2017 9:57 UTC (Thu)
by anselm (subscriber, #2796)
[Link]
You may wish to read Dennis Ritchie's paper, The Evolution of the Unix Time-sharing System.
Posted Jun 22, 2017 15:40 UTC (Thu)
by perlwolf (guest, #46060)
[Link] (2 responses)
Posted Jun 24, 2017 13:41 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (1 responses)
That was Pr1mos v18. There was a major step change between v14 and v15, iirc, so it's reasonable to assume v15 was pretty modern too, whenever that was.
Cheers,
Posted Jul 6, 2017 0:37 UTC (Thu)
by vomlehn (guest, #45588)
[Link]
Posted Jun 24, 2017 15:34 UTC (Sat)
by DSpinellis (guest, #116958)
[Link] (1 responses)
Posted Jun 25, 2017 4:18 UTC (Sun)
by zlynx (guest, #2285)
[Link]
Posted Jun 30, 2017 8:18 UTC (Fri)
by tdz (subscriber, #58733)
[Link]
Thompson responded with single-word messages: "Amazing," or, "Incredible."
Over the years I've noticed that I tend to write more concisely on technical topics than before. Now I see the end point.
Code archaeology queries
Git can provide the answer with git blame, but it takes many hours to run this command on the whole repository. The oldest surviving piece of code in a 2016 version of FreeBSD Unix was probably written by Dennis Ritchie on 1979-01-10 in the file usr/src/libc/gen/timezone.c. You can see the code in Figure 6 in the article A Repository of Unix History and Evolution.
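For readers who want to try this kind of query on a single file, here is a rough sketch rather than the method used for the result above. It assumes the text passed in is the output of `git blame --line-porcelain` for one file, and it reports the line with the earliest author timestamp. The function name `oldest_line` is a hypothetical name chosen for illustration.

```python
from datetime import datetime, timezone


def oldest_line(porcelain):
    """Return (author, date, text) for the line with the earliest author
    timestamp in `git blame --line-porcelain` output, or None if the input
    contains no blamed lines.
    """
    best = None
    author, when = None, None
    # Split on "\n" only, so unusual characters inside source lines are kept.
    for raw in porcelain.split("\n"):
        if raw.startswith("\t"):
            # A tab-prefixed line is the source text and ends one record.
            if when is not None and (best is None or when < best[1]):
                best = (author, when, raw[1:])
            author, when = None, None
        elif raw.startswith("author "):
            author = raw[len("author "):]
        elif raw.startswith("author-time "):
            # Seconds since the epoch, as recorded in the commit.
            when = int(raw[len("author-time "):])
    if best is None:
        return None
    day = datetime.fromtimestamp(best[1], tz=timezone.utc).date()
    return best[0], day, best[2]
```

Running this over every file in the repository means one blame per file, which is where the many hours go. The answer is also only as good as the reconstructed commit dates and attributions in the repository itself.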
The copyright owners of Unix are being surprisingly stingy. The article claims they granted permission to distribute ancient Unix, but the linked instrument from then (http://minnie.tuhs.org/pipermail/tuhs/2017-March/009354.html) does not actually do that. It says they "agree not to assert" their rights. That's not a copyright license; it's half of a contract. And because the other half of the contract, where someone gives the copyright owners something in consideration of that promise, is missing, it has no legal weight.
Assembling the history of Unix - copyright
I don't think not being sure they have copyright is a likely explanation for this company promising not to assert copyright instead of actually licensing the code or waiving copyright. I'm sure their lawyers know they can avoid any risk of not actually having copyright just by adding a few words. "To the extent we have copyright ..." "We waive whatever copyright we might have ..." "We make no warranty that we have copyright ..." In fact, such words are in that promise not to assert.
"... when he examined the 7th edition of Unix from 1979 ... it already contained key architectural elements of modern Unix, such as abstracting I/O, and separating the kernel from the command-line interpreter. Within a few years, even more powerful concepts became visible: devices that appeared as files, a hierarchical filesystem, and a shell that ran as a user process distinct from the kernel."