July 25, 2020
In some earlier posts, I talked about ways in which the experience of programming has not improved as I would have expected when looking forward from 30 years ago. There are many reasons for this, one of which could be that I am being unreasonable, but a big reason is that software is often not written for the long term. We need to think in decades instead of focusing on the next release. Literate programming feels right, but there aren't many practitioners.
The term "literate programming" was coined by Donald Knuth in the 1980s. As he said (Knuth, Literate Programming, 1992, p. 99),
Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.
The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style.
The statement above is more of a philosophical attitude than a definition. In practice, literate programming has come to mean the use of tools of a certain type. The crucial feature is that code and documentation appear in the same file.
Literate programming started with a tool called WEB, which Knuth designed and used extensively. Under the WEB system, a single .web file provides both the source code and the documentation. The tangle command converts a .web file to the source code, and the weave command converts the same .web file to the documentation. WEB was originally written for Pascal, but CWEB is also available to work with C and C++, and there's also the noweb system, which can work with a variety of languages.
WEB and CWEB are closely allied with TeX, another program written by Knuth. TeX is used to typeset complex documents, particularly those involving sophisticated mathematics. For more than 30 years, TeX has been the standard tool for writing in mathematics, physics and computer science. Not much software has had such longevity.
The TeX system was written using the WEB framework, and one way to see what both WEB and TeX are capable of is to read Knuth's TeX: The Program, which presents TeX's source code with documentation. The book was produced from a .web file; the weave command converted it to a .tex file, which then went through the TeX program to produce the book. The same file was used with the tangle command to generate the source code fed to the compiler to create the executable. Read The TeXbook for TeX user documentation – although, these days, a book on LaTeX is probably a better place to start since LaTeX is built on top of TeX and is easier to use for most tasks.
The WEB system uses a few simple commands, and going too deeply into how it works isn't necessary. Basically, it allows you to write the code in an order that's natural for explaining how it works, even if that order conflicts with what the compiler or interpreter requires. WEB also has features to help with cross-references within the code and documentation, building an index, and so forth. One of the nicest aspects of WEB is that TeX's powerful layout framework is available to help compose the documentation, including complex formulas and diagrams.
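To make the tangle/weave split and the reordering a little more concrete, here is a toy sketch in Python. It is not WEB or noweb. It assumes a simplified noweb-style syntax in which <<name>>= opens a named code chunk, a line containing only @ returns to documentation, and <<name>> on its own line inside a chunk refers to another chunk. The names SOURCE, collect_chunks, tangle and weave are hypothetical, and this weave emits plain text rather than TeX.

```python
import re

# Hypothetical literate source in a simplified noweb-like syntax (an assumption,
# not the real WEB or noweb format). The prose explains the program top-down.
SOURCE = """\
We want to print the squares of a few numbers.
The overall shape of the program comes first.
<<*>>=
def main():
    <<print the squares>>

main()
@
Only now do we explain the loop itself.
<<print the squares>>=
for n in range(1, 4):
    print(n * n)
@
"""

CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")      # starts a named code chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")  # a reference to another chunk


def collect_chunks(source):
    """Gather the code lines of each named chunk, ignoring the prose."""
    chunks, current = {}, None
    for line in source.splitlines():
        m = CHUNK_DEF.match(line)
        if m:
            current = m.group(1)
            chunks.setdefault(current, [])  # same-named chunks are concatenated
        elif line.strip() == "@":
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name="*", indent=""):
    """Expand chunk references recursively, yielding code in compiler order."""
    out = []
    for line in chunks[name]:
        m = CHUNK_REF.match(line)
        if m:
            # Indent the expanded chunk to match where it was referenced.
            out.extend(tangle(chunks, m.group(2), indent + m.group(1)))
        else:
            out.append(indent + line if line else line)
    return out


def weave(source):
    """Keep the reader's order: prose as-is, code chunks labeled and indented."""
    out, in_code = [], False
    for line in source.splitlines():
        m = CHUNK_DEF.match(line)
        if m:
            out.append(f"<{m.group(1)}> =")
            in_code = True
        elif line.strip() == "@":
            in_code = False
        else:
            out.append("    " + line if in_code and line else line)
    return out


if __name__ == "__main__":
    program = "\n".join(tangle(collect_chunks(SOURCE)))
    print(program)
    exec(program, {})  # runs the tangled program, which prints 1, 4 and 9
    print("\n".join(weave(SOURCE)))
```

The source explains the overall shape of the program before the loop, yet tangle places the loop inside main(), where Python needs it, while weave keeps the explanatory order. Real tools add cross-references, indexing and typesetting, which this sketch leaves out.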
Literate programming sounds great, and it points a finger at a glaring problem that cries out for a solution, but it hasn't been widely adopted. Based on my own experience, the reasons literate programming never took off include the following.
WEB-like tools don't work well with IDEs. IDEs didn't exist when Knuth came up with WEB, so people didn't rely on them the way we do today. Spreading the source code throughout a document, intermixed with commands related to the documentation, makes it harder for an IDE to do the things we've come to expect. In theory, it's possible to make an IDE that works with .web files, but it adds another layer of complexity. It's a chicken-and-egg problem: people won't use literate programming tools without a good IDE, and the IDEs don't exist because the tools aren't used.
A .web document doesn't look right. Programmers like code to look a certain way, and WEB scrambles that. We take important cues, often unconsciously, from the way the code looks. Even something as minor as the "tabs versus spaces" debate winds people up, and a .web file looks very different from ordinary source code.
The WEB-oriented approach came into being at a time when source files tended to be more monolithic. Today, we often work with numerous smaller files, and projects may involve a variety of different languages.
We're more used to doing all of our work on a screen now, without any paper at all. In the 1980s, it wasn't unusual (at least for me) to print out many pages of code to help to understand it. The small monitors of the time made it difficult to grasp the big picture, and scrolling was slow.
Another factor, which may sound silly today, is that it was breathtaking to see your work in a form suitable for the Journal of the ACM, even if the code itself was awful. Today, a paper print-out is comparatively harder to read since we have things like syntax coloring when reading on a monitor.
With rare exceptions, even a carefully written book is harder to understand and navigate than well-written code with an IDE. The ways in which we want to jump around are too varied to be handled by anything like an index or cross-reference tables in a book – flipping through pages while using all five fingers as a crude LIFO stack only works up to a point. An IDE makes it easy to find where functions are defined, find every place where that function is called, etc.
In any case, WEB and WEB-like tools never gained much traction.
It's hard to argue with the aim of these tools, but they haven't worked out.
Unfortunately, the phrase "literate programming" has become closely tied to these tools and their particular ways of doing things.
Another problem with the word "literate" is the kind of images it brings to mind. The word conjures up the notion that, with enough effort and the right tools, code can read like a haiku expressing the feeling of waking up on a crisp morning surrounded by a dewy forest. The reality is that code is rarely anything like the dew-touched branch of a cedar; sometimes it's more like a pinball machine overseen by a drug-addicted carny on the midway of Hell's State Fair.
It's also true that software and documentation can never be literate in the same way that prose may be. Computer programs can't be made into literature, unless you consider the telephone book to be literature. The idea of numbered gadgets (telephones) that allow us to talk to each other is a gem of an idea, and the beauty of the idea is illustrated by the fact that so many of these gadgets are in use, but a telephone book is not interesting to read. Any gems of clarity and beauty found in source code are often buried in a mountain of plumbing, and the plumbing is every bit as important as these gems.
A more apt comparison might be with calculus textbooks. There are a small number of ideas that make calculus work, but it's hard to fully appreciate them without understanding how they relate to a long list of other topics. The epsilon-delta definition of continuity is a gem that can be expressed in a sentence or two, but understanding it takes several chapters, and really mastering it can take entire volumes.
Although literate programming tools are not widely used, it's easy to agree with the goal. Exposition is the crucial ingredient of well-written software.
Since "literate" is taken, and because it sounds impracticably lofty, a new term is needed. Compelling programming aims for quality of exposition, but without limiting the toolset. Fleshing out what it means to write compelling software is the subject of the next post(s).