Assembling the history of Unix

June 14, 2017

This article was contributed by A. Jesse Jiryu Davis

The moment when an antique operating system that has not run in decades boots and presents a command prompt is thrilling for Warren Toomey. He compares it to restoring an old Model-T. "An old car looks pretty, but at the end of the day its purpose is to drive you somewhere. I love being able to turn the engine over and actually get it to do its job."

Toomey, an Australian university lecturer, founded the Unix Heritage Society to reconstruct the early history of the Unix operating system. Recently this historical code has become much more accessible: we can now browse it in an instant on GitHub, thanks to the efforts of a computer science professor at the Athens University of Economics and Business named Diomidis Spinellis. The 50th anniversary of the invention of Unix will be in 2019; the painstaking work of Toomey and Spinellis makes it possible for us to appreciate Unix's epic story.

The Unix Heritage Society

Around 1993, while he was a researcher at the University of New South Wales, Toomey began asking on mailing lists and newsgroups for old Unix versions, intending to run them on a PDP-11 simulator. He began a group called the PDP-11 Unix Preservation Society; as its mission grew to encompass all old Unix releases, it was renamed the Unix Heritage Society in 2000. "I think the title is a bit grandiose," he said in an interview with LWN. "It's not really a society, just me and the mailing list."

Toomey's project faced two obstacles; the first was simply to locate enough parts of each old Unix version to assemble a complete copy. He haunted the newsgroups and mailing lists of old Unix hackers, and he heard rumors of people who knew where to get historical artifacts. Most of his requests went unanswered. He recalls spending five or six years repeatedly asking for specific files, until eventually someone would respond, "Oh, actually, I have it." By chance, Toomey discovered in his own university's computer room a dozen tapes with backups of the 6th and 7th Editions of Unix. The backups weren't bootable—there wasn't even a complete backup of either edition—but the discovery accelerated his project nevertheless.

His second obstacle was the long shadow of AT&T's original copyright. AT&T and other corporations allowed individuals to own copies of Unix, but not to share them. Toomey had found his university's copy of a System V source license, but this only provided a small bit of legal cover to ask strangers to share their vintage files with him. Occasionally, one of Toomey's inside informants might give him a 15-year-old copy of some file, saying, "Just don't tell anyone where you got it."

Whenever Toomey acquired what seemed to be a complete version of Unix, he had to get it up and running without any documentation to guide him. "You've got an artifact," he said. "It might be a binary or source code and there's no Makefile, you've got no idea what was the right sequence of things to do to build it."

Last year, for example, Toomey and his friends from the Unix Heritage Society resuscitated the first version of Unix for the PDP-7, written in mid-1970. The primary source was a dot-matrix printout containing PDP-7 assembly code, badly printed with notes and corrections scribbled on it. The members of the society converted the blurred copy to digital text with an OCR program, but they knew there were transcription errors that they'd have to backtrack and fix. Undaunted, they proceeded to the next stage: they learned the syntax of PDP-7 assembly code and wrote an assembler to convert the badly scanned text to machine code.

Now, with a set of executable binaries, the team had to store them in a filesystem, and here they hit a circular dependency. They didn't know the binary format of the filesystem for that version of the Unix kernel. The kernel itself implemented this filesystem, but they had to get the kernel to boot in order to use it for that purpose. Toomey decided to use a PDP-7 simulator to reverse-engineer the basic layout of a bootable disk image, and wrote a tool to create such an image containing the executables that he and his friends had assembled. "It's chicken-and-egg, but you work in stages," he said. "You get one little bit working and you use that to leverage up the next bit."

Unix's two inventors have helped him along the way. "Ken Thompson is minimalist in his communication," Toomey said. When the Unix Heritage Society brought up a PDP-11 version of Unix, he sent Thompson a series of emails about it, to which Thompson responded with single-word messages: "Amazing," or, "Incredible." Toomey said that while Dennis Ritchie was alive, he enthusiastically supported the project. "I really miss him an awful lot."

The Unix History Repository on GitHub

It's valuable to preserve snapshots of old-fashioned systems, but these snapshots don't fit modern programmers' methods for exploring the history of an evolving code base. Today, we read history with tools like Git. Spinellis has imported over 44 years of Unix code history into Git and published the repository on GitHub. The project builds on Toomey's accomplishments, but Spinellis wants more than just the code: he is building a moment-by-moment history of its evolution, and line-by-line attribution of each author's contributions.

Unix was developed without any version control at first. When development moved to the University of California at Berkeley in the late 1970s, coders began tracking certain files in an early version control system called SCCS, but even then it was not used for all files. Spinellis reconstructed as much history as he could by importing entire snapshots of early Unix versions into Git as if they were single commits. To attribute particular parts of the code to their authors, he researched primary sources such as publications, technical reports, man pages, and names written in comments in the source code.

Since publishing the repository on GitHub, Spinellis has continued to refine it periodically. He recently discovered an author unacknowledged in the Git logs whose contributions he wants to add. This March, the copyright holders for Unix Research Editions 8, 9, and 10 granted permission to distribute those versions, so that history can now be integrated into the repository. Additionally, Spinellis points out that he only followed one Unix variant to its conclusion: FreeBSD. Other variants like NetBSD and OpenBSD are just as old and interesting; their stories could be added to the repository as distinct branches.

But why?

Both Spinellis and Toomey enjoy reading old Unix code to see how much power the early programmers could jam into a tiny memory footprint. For example, the PDP-7 Unix that Toomey recovered last year is a minimalist masterpiece. It is a recognizable Unix system, including the fork() and exec() system calls, multiple user accounts, file permissions, and a directory structure, all implemented in only 4000 words of memory.

"But it's really not the source code that's important," said Toomey. "It's the ideas that are embodied in it." AT&T's efforts to protect the Unix code were irrelevant, he said, because the real value lies in concepts like connecting small utilities together with pipes and implementing the system in a portable programming language.

Spinellis agrees: when he examined the 1970 edition of Unix, he saw that, even though it was a bare prototype system, it already contained key architectural elements of modern Unix, such as abstracting I/O, and separating the kernel from the command-line interpreter. Within a few years, even more powerful concepts became visible: devices that appeared as files, a hierarchical filesystem, and a shell that ran as a user process distinct from the kernel. The very first Unix versions contained the basic ideas that inspired the modern operating system that dominates computing today.




Assembling the history of Unix

Posted Jun 14, 2017 16:47 UTC (Wed) by epa (subscriber, #39769) [Link] (8 responses)

Does this mean they will also backport git (or a subset of it) to version 1 UNIX so that the repository can become self-hosting?

Assembling the history of Unix

Posted Jun 14, 2017 17:04 UTC (Wed) by zlynx (guest, #2285) [Link] (6 responses)

Didn't Unix v1 have things like 6 character limits on file names? That would make storing files by SHA1 hash a bit difficult.

Assembling the history of Unix

Posted Jun 14, 2017 18:32 UTC (Wed) by smoogen (subscriber, #97) [Link] (4 responses)

Maybe they would have git-6 which would use checksum to get the hash codes :). Only 65536 files allowed but that is probably an overflow elsewhere anyway.

Assembling the history of Unix

Posted Jun 15, 2017 6:01 UTC (Thu) by pbonzini (subscriber, #60935) [Link] (1 responses)

You can use SHA1 and truncate it. With two-character directory names and Sid-character file names, you can store 2^32 files/trees/commits. After 2^16 on average you'll get a collision, but you'll probably have filled up your disk first.
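The arithmetic above can be checked with a short sketch. It assumes the hash is written as hex, so two directory characters plus six file-name characters give 8 hex digits, or 32 bits. The function names are made up for illustration, and the sketch hashes the raw bytes rather than reproducing git's real object format.

```python
import hashlib
import math


def object_path(data: bytes, dir_chars: int = 2, file_chars: int = 6) -> str:
    """Build a 'dd/ffffff' style path from a truncated SHA-1 hex digest.

    Simplification: this hashes the raw bytes; real git hashes a header
    plus the content, which this sketch does not attempt to reproduce.
    """
    digest = hashlib.sha1(data).hexdigest()
    short = digest[:dir_chars + file_chars]
    return f"{short[:dir_chars]}/{short[dir_chars:]}"


def expected_objects_before_collision(hex_chars: int) -> float:
    """Birthday-bound estimate: roughly sqrt(pi/2 * N) objects for N names."""
    names = 16 ** hex_chars
    return math.sqrt(math.pi / 2 * names)


path = object_path(b"hello world\n")
namespace = 16 ** 8                      # 8 hex digits = 2^32 possible names
estimate = expected_objects_before_collision(8)
print(path, namespace, round(estimate))
```

The estimate lands between 2^16 and 2^17, which agrees with the "after 2^16 on average" figure as an order of magnitude.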

Assembling the history of Unix

Posted Jun 15, 2017 6:02 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

Six-character. My phone is a Debian user apparently, or a Sex Pistols fan.

Assembling the history of Unix

Posted Jun 15, 2017 13:10 UTC (Thu) by jond (subscriber, #37669) [Link] (1 responses)

lol I like this, but my immediate thoughts on seeing "git-6" were drawing parallels to IPv6 and I was imagining 128-bit hashes rather than 6 character.

Assembling the history of Unix

Posted Jun 15, 2017 14:04 UTC (Thu) by smoogen (subscriber, #97) [Link]

Well in true Unix fashion I expect the program to be called gt as having to type the i is superfluous use of the right hand in typing.

Assembling the history of Unix

Posted Jun 15, 2017 6:37 UTC (Thu) by epa (subscriber, #39769) [Link]

You could use subdirectories.

Assembling the history of Unix

Posted Jun 14, 2017 19:37 UTC (Wed) by rgmoore (✭ supporter ✭, #75) [Link]

It could be self-hosting if you used any of the versions of Unix in the repository. After all, we consider GNU/Linux to be self hosting even though we're using an up-to-date version as the host rather than Linux 0.01 and a correspondingly ancient version of GNU.

Assembling the history of Unix

Posted Jun 15, 2017 13:09 UTC (Thu) by jond (subscriber, #37669) [Link]

Spinellis wrote an excellent book called "Code Reading: The Open Source Perspective" which I personally recommend; and another called "Code Quality" that I've yet to read.

Assembling the history of Unix

Posted Jun 15, 2017 14:15 UTC (Thu) by joey (guest, #328) [Link] (4 responses)

Thompson responded with single-word messages: "Amazing," or, "Incredible."
Over the years I've noticed that I tend to write more concisely on technical topics than before. Now I see the end point.

Assembling the history of Unix

Posted Jun 16, 2017 9:04 UTC (Fri) by anselm (subscriber, #2796) [Link] (1 responses)

It seems to me that Thompson is unwarrantedly prolix. I would think that as the author of ed(1) he'd just reply “?” (or perhaps “!”).

Assembling the history of Unix

Posted Jul 6, 2017 3:33 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

Well, there is the (apparently unlikely to be true) anecdote of the two-character correspondence[1]. Maybe Thompson will make it happen?

[1]http://quoteinvestigator.com/2014/06/14/exclamation/

Assembling the history of Unix

Posted Jul 6, 2017 0:33 UTC (Thu) by vomlehn (guest, #45588) [Link] (1 responses)

y

Assembling the history of Unix

Posted Jul 6, 2017 11:18 UTC (Thu) by sdalley (subscriber, #18550) [Link]

And his lucid and informative

?

was without peer, setting new standards for usability...

Assembling the history of Unix

Posted Jun 16, 2017 7:20 UTC (Fri) by madhatter (subscriber, #4665) [Link]

I doubt I'd've come across this work without reading this article: many thanks indeed!

Code archaeology queries

Posted Jun 16, 2017 15:26 UTC (Fri) by epa (subscriber, #39769) [Link] (2 responses)

So, what's the oldest fragment of code (say, ten lines or more) surviving unchanged still in FreeBSD?

What's the oldest fragment which has 'changed' over time but not been entirely removed, and is still there?

Are there any nontrivial lines of code (shall we say containing five or more words plus some punctuation) which date back to the beginning?

Does git provide a way to answer these questions easily?

Code archaeology queries

Posted Jun 22, 2017 20:27 UTC (Thu) by DSpinellis (guest, #116958) [Link] (1 responses)

Git can provide the answer with git blame, but it takes many hours to run this command on the whole repository. The oldest surviving piece of code in a 2016 version of FreeBSD Unix was probably written by Dennis Ritchie on 1979-01-10 in the file usr/src/libc/gen/timezone.c. You can see the code in Figure 6 in the article A Repository of Unix History and Evolution.
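For a single file, the idea can be sketched as follows. This is not the method used for the paper, only an illustration that assumes you have captured the output of git blame --line-porcelain for one file as text. The function name and the sample records (hashes, authors, timestamps) are invented.

```python
def oldest_line(porcelain: str):
    """Return (author, author_time, text) for the earliest-dated line.

    Expects text in the `git blame --line-porcelain` layout: header lines
    such as 'author ...' and 'author-time ...' for each source line,
    followed by the source line itself prefixed with a tab.
    """
    oldest = None
    author = None
    when = None
    for raw in porcelain.splitlines():
        if raw.startswith("author "):
            author = raw[len("author "):]
        elif raw.startswith("author-time "):
            when = int(raw[len("author-time "):])
        elif raw.startswith("\t"):
            # The tab-prefixed content line ends the current record.
            if when is not None and (oldest is None or when < oldest[1]):
                oldest = (author, when, raw[1:])
    return oldest


# Invented sample data, shaped like porcelain output for two lines.
SAMPLE = (
    "1111111111111111111111111111111111111111 1 1 1\n"
    "author Alice Example\n"
    "author-time 200\n"
    "\tint newer;\n"
    "2222222222222222222222222222222222222222 2 2 1\n"
    "author Bob Example\n"
    "author-time 100\n"
    "\tint older;\n"
)

result = oldest_line(SAMPLE)
print(result)
```

Repeating this for every file in the tree is what makes the whole-repository query so slow.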

Code archaeology queries

Posted Jun 23, 2017 13:45 UTC (Fri) by nix (subscriber, #2304) [Link]

Again, though, this has run up against the resolution limits of the technique. In the same article we have changes by Ken Thompson in usr/sys/sys/pipe.c at 1979-01-10 15:19:35. Does anyone really believe this was written twenty minutes after the stuff in timezone.c? The dates come from the dates on tarballs, dates from Peter Salus, dates on ancient tapes, even dates scribbled on random ancient printouts. (An amazing achievement!)

Assembling the history of Unix - copyright

Posted Jun 16, 2017 21:27 UTC (Fri) by giraffedata (guest, #1954) [Link] (3 responses)

The copyright owners of Unix are being surprisingly stingy. The article claims they granted permission to distribute ancient Unix, but the linked instrument from then (http://minnie.tuhs.org/pipermail/tuhs/2017-March/009354.html) does not actually do that. It says they "agree not to assert" their rights. That's not a copyright license; it's half of a contract. And because the other half of the contract, where someone gives the copyright owners something in consideration of that promise, is missing, it has no legal weight.

Why wouldn't they just contribute this code to the public domain (with words like "contribute to the public domain") or maximally license it (with words like "grant permission to everyone to copy, etc. this code") or even waive copyright (with words like, "hereby waive their rights under copyright")? The code can't be worth anything other than for generating goodwill with such a gift.

Assembling the history of Unix - copyright

Posted Jun 19, 2017 3:05 UTC (Mon) by fratti (guest, #105722) [Link] (2 responses)

One guess that assumes they know what they're doing: Defensive IP hoarding, possibly. They can threaten anyone who plans to sue them for intellectual property violations that they in turn must be violating some ancient copyright or patent they have somewhere, and as with all IP matters, only a lengthy court battle would answer the question as to whether the claim is valid.

Another guess that does not assume they know what they're doing: This was the legal department who has no real grasp on the actual value (excluding historic value) of the code, so they're being stingy because it's their job to be stingy.

Assembling the history of Unix - copyright

Posted Jun 24, 2017 13:38 UTC (Sat) by Wol (subscriber, #4433) [Link] (1 responses)

Another guess (and this, sadly, is the fate of Pr1mos). They think they are the copyright holders but they don't know for certain ...

Certainly as far as Pr1mos is concerned we know the copyright is held by one of two companies. Neither will sell their rights to the other because the one selling wants a lot of money for what the other considers not worth very much (this argument goes both ways :-(

And because nobody knows for certain who owns the copyright, nobody will do anything that could be a breach of copyright. (Both companies definitely hold a licence that gives them copyright-owner-like status, so keeping ancient copies maintained and working isn't a problem...)

Cheers,
Wol

Assembling the history of Unix - copyright

Posted Jun 24, 2017 19:40 UTC (Sat) by giraffedata (guest, #1954) [Link]

I don't think not being sure they have copyright is a likely explanation for this company promising not to assert copyright instead of actually licensing the code or waiving copyright. I'm sure their lawyers know they can avoid any risk of not actually having copyright just by adding a few words. "To the extent we have copyright ..." "We waive whatever copyright we might have ..." "We make no warranty that we have copyright ..." In fact, such words are in that promise not to assert.

By the way, I'd like to correct what I said about this promise not being enforceable for lack of consideration (i.e. nobody is giving the copyright owner anything for it). I forgot about promissory estoppel. If someone detrimentally relies on a clear promise, the promise is enforceable even without consideration, and that would certainly be the case if Toomey distributes Unix, thereby violating copyright, because he knew of this promise not to assert copyright.

Assembling the history of Unix

Posted Jun 22, 2017 5:34 UTC (Thu) by jtc (guest, #6246) [Link] (6 responses)

"... when he examined the 7th edition of Unix from 1979 ... it already contained key architectural elements of modern Unix, such as abstracting I/O, and separating the kernel from the command-line interpreter. Within a few years, even more powerful concepts became visible: devices that appeared as files, a hierarchical filesystem, and a shell that ran as a user process distinct from the kernel."

I had always assumed that such important features and concepts (which were, of course, largely responsible for UNIX's reputation as a modern, well-designed operating system) were present in the very first version of UNIX in 1969, or at least in relatively early versions from, say, 1970, '71, or '72. I'm surprised to hear that this isn't the case.

Assembling the history of Unix

Posted Jun 22, 2017 9:57 UTC (Thu) by anselm (subscriber, #2796) [Link]

You may wish to read Dennis Ritchie's paper, The Evolution of the Unix Time-sharing System.

Assembling the history of Unix

Posted Jun 22, 2017 15:40 UTC (Thu) by perlwolf (guest, #46060) [Link] (2 responses)

They were all present in version 5, circa 1973. A file descriptor could be used to read/write a file, pipeline, or a device; the shell was a regular program, the only OS support for the shell was fork/exec and copying open file descriptors during fork so that the child could (in user space) change any as needed before doing an exec. These capabilities are all described in Ritchie's ACM paper (1973, I think) and they were surely present from the very early phases - well before the version 5 I first worked on.

Assembling the history of Unix

Posted Jun 24, 2017 13:41 UTC (Sat) by Wol (subscriber, #4433) [Link] (1 responses)

Bear in mind, Unix was a cut-down version of Multics. And these ideas probably came from Multics. I worked on a Multics-derivative from 1982, and for a text-based machine it was very modern, with accounts, a command shell, a shell programming language, etc etc.

That was Pr1mos v18. There was a major step change between v14 and v15, iirc, so it's reasonable to assume v15 was pretty modern too, whenever that was.

Cheers,
Wol

Assembling the history of Unix

Posted Jul 6, 2017 0:37 UTC (Thu) by vomlehn (guest, #45588) [Link]

I wouldn't characterize UNIX as a "cut-down" version of Multics. It was a reaction to what was perceived as the bloat of Multics, but was not intended to be a subset.

Assembling the history of Unix

Posted Jun 24, 2017 15:34 UTC (Sat) by DSpinellis (guest, #116958) [Link] (1 responses)

The reference was to the PDP-7 1970 edition, not to the 7th Edition. It is remarkable that so many far-reaching technological achievements were implemented so early in the lifetime of Unix with remarkably meagre resources. (As an example, the PDP-7 assembly language lacks an "immediate" mode. To store a constant value to a register you first have to initialize some part of memory with that value and load it from there.)

Assembling the history of Unix

Posted Jun 25, 2017 4:18 UTC (Sun) by zlynx (guest, #2285) [Link]

Initialize the memory and load it from there sounds harder than it is though. A load immediate instruction in a lot of older CPUs was just "load from program counter and increment". The "memory" being loaded was just the following byte from the program. Pretty much identical to "load from register X and increment"

Assembling the history of Unix

Posted Jun 30, 2017 8:18 UTC (Fri) by tdz (subscriber, #58733) [Link]

This is so cool. Thank you!


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds