This project aims at applying techniques from deductive program verification and analysis of tree transformations to the problem of analyzing shell scripts, in particular those that are used in software installation. These scripts play a central role in the initial installation of UNIX and Linux machines, as well as in recurrent configuration, installation, and software maintenance tasks. Scripts interact with the state of the operating system, in particular via the file system.
Errors in scripts may have catastrophic consequences due to the fact that they are often executed at crucial moments, like the update of software packages that are installed on a UNIX-like machine. Scripts must work correctly in a multitude of different situations that probably have not been planned by the author of the script. For instance, installation scripts have to work correctly regardless of the choice of packages that currently are installed on the system, and they also have to work correctly in abnormal situations, for instance when the system administrator tries to resume an operation that was previously interrupted.
We see three main challenges:
At the end of the project we want to produce tools that formally analyse and attempt to verify scripts, possibly giving assurance of correctness, or pin-pointing possible errors. This is an ambitious goal - we certainly will have do compromises, make hypotheses on the structure of the scripts, and do some approximations of the file system and the behavior of system commands. These assumptions and approximations will be guided by use cases of installation scripts from the Debian GNU/Linux distribution.
We will start by laying out the formal models of the shell script language and the file system data structure, and by developing a language for expressing properties of scripts to be verified. We will pursue two avenues of formal techniques: one based on tree transducers which are a formal system of transformations on trees (like XML trees, or file system trees), and the other one using deductive program verification techniques as proposed by the Why3 environment. We will also aim at combining these two technologies.