Literate Programming

July 25, 2020

In some earlier posts, I talked about ways in which the experience of programming has not improved as I would have expected when looking forward from 30 years ago. There are many reasons for this, one of which could be that I am being unreasonable, but a big reason is that software is often not written for the long term. We need to think in decades instead of focusing on the next release. Literate programming feels right, but there aren't many practitioners.

What is Literate Programming?

The term "literate programming" was coined by Donald Knuth in the 1980s. As he said (Knuth, Literate Programming, 1992, p. 99),

Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style.

The statement above is more of a philosophical attitude than a definition. In practice, literate programming has come to mean the use of tools of a certain type. The crucial feature is that code and documentation appear in the same file.

Literate programming started with a tool called WEB, which Knuth designed and used extensively. Under the WEB system, a single .web file provides both the source code and the documentation. The tangle command converts a .web file to source code, and the weave command converts the same .web file to documentation. WEB was originally written for Pascal, but CWEB is available for C and C++, and there's also the noweb system, which can work with a variety of languages.

WEB and CWEB are closely allied with TeX, another program written by Knuth. TeX is used to typeset complex documents, particularly those involving sophisticated mathematics. For more than 30 years, TeX has been the standard tool for writing in mathematics, physics and computer science. Not much software has had such longevity.

The TeX system was written using the WEB framework, and one way to see what both WEB and TeX are capable of is to read Knuth's TeX: The Program, which provides TeX's source code with documentation. The book was produced from a .web file; the weave command converted it to a .tex file, which then went through the TeX program to produce the book. The same file was used with the tangle command to generate the source code fed to the compiler to create the executable. Read The TeXbook for TeX user documentation, although these days a book on LaTeX is probably a better place to start, since LaTeX is built on top of TeX and is easier to use for most tasks.

The WEB system uses a few simple commands, and going too deeply into how it works isn't necessary. Basically, it allows you to write the code in an order that's more natural with regard to explaining how it works, even if that order conflicts with what the compiler or interpreter requires. WEB also has features to help with cross-references within the code and documentation, building an index, and so forth. One of the nicest aspects of WEB is that TeX's powerful layout framework is available to help compose the documentation, including complex formulas and diagrams.
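To make the reordering idea concrete, here is a minimal sketch of a tangle step in Python. It assumes a simplified noweb-style notation (a line of the form <<name>>= opens a named chunk, a lone @ closes it, and <<name>> inside a chunk refers to another chunk); it is not how WEB itself is implemented, and the names SOURCE, parse, and tangle are made up for this illustration.

```python
import re

# A tiny noweb-style source: prose and named code chunks interleaved.
# The chunk that carries the main idea comes first, even though the
# interpreter needs the helper to be defined before it is called.
SOURCE = """\
The heart of the program is a single greeting.
<<main>>=
print(greet("world"))
@
The helper it relies on is explained afterwards.
<<helpers>>=
def greet(name):
    return "hello, " + name
@
Finally, the pieces are assembled in the order Python requires.
<<*>>=
<<helpers>>
<<main>>
@
"""

CHUNK_START = re.compile(r"^<<(.+)>>=\s*$")   # e.g. "<<main>>="
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")  # e.g. "    <<helpers>>"


def parse(source):
    """Split the source into prose lines and a dict of named code chunks."""
    chunks, prose, current = {}, [], None
    for line in source.splitlines():
        start = CHUNK_START.match(line)
        if start:
            current = start.group(1)
            chunks.setdefault(current, [])
        elif line.strip() == "@":
            current = None
        elif current is None:
            prose.append(line)
        else:
            chunks[current].append(line)
    return prose, chunks


def tangle(chunks, name="*", indent=""):
    """Expand chunk `name` recursively into interpreter-ready lines.

    Simplification: there is no check for circular references.
    """
    out = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            # Carry the reference's indentation into the expanded chunk.
            out.extend(tangle(chunks, ref.group(2), indent + ref.group(1)))
        else:
            out.append(indent + line)
    return out


prose, chunks = parse(SOURCE)
program = "\n".join(tangle(chunks))
```

Here program holds the helper definition followed by the call, which is the order the interpreter needs, while prose holds the explanatory text in the order it was written. A weave step would instead typeset both together in the author's order.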

Theory and Practice

Literate programming sounds great, and it points a finger at a glaring problem that cries out for a solution, but it hasn't been widely adopted. Based on my own experience, the reasons literate programming never took off include the following.

WEB-like tools don't work well with IDEs. IDEs didn't exist when Knuth came up with WEB, so people didn't rely on them the way we do today. Spreading the source code throughout a document, intermixed with commands related to the documentation, makes it harder for an IDE to do the things we've come to expect. In theory, it's possible to make an IDE that works with .web files, but it adds another layer of complexity. It's a chicken-and-egg problem; people won't use literate programming tools without a good IDE, and the IDEs don't exist because the tools aren't used.

A .web document doesn't look right. Programmers like the code to look a certain way, and WEB scrambles that. We take important cues, often unconsciously, from the way the code looks. Even something minor like the "tabs versus spaces" debate winds people up, and a .web file looks very different from ordinary source code.

The WEB-oriented approach came into being at a time when source files tended to be more monolithic. Today, we often work with numerous smaller files, and projects may involve a variety of different languages.

We're more used to doing all of our work on a screen now, without any paper at all. In the 1980s, it wasn't unusual (at least for me) to print out many pages of code to help understand it. The small monitors of the time made it difficult to grasp the big picture, and scrolling was slow.

Another factor, which may sound silly today, is that it was breathtaking to see your work in a form suitable for the Journal of the ACM, even if the code itself was awful. Today, a paper print-out is comparatively harder to read since we have things like syntax coloring when reading on a monitor.

With rare exceptions, even a carefully written book is harder to understand and navigate than well-written code in an IDE. The ways in which we want to jump around are too varied to be handled by anything like an index or cross-reference tables in a book; flipping through pages while using all five fingers as a crude LIFO stack only works up to a point. An IDE makes it easy to find where a function is defined, find every place where that function is called, etc.

In any case, WEB and WEB-like tools never gained much traction. It's hard to argue with the aim of these tools, but they haven't worked out. Unfortunately, the phrase "literate programming" has become closely tied to these tools and their particular ways of doing things.

Terminology

Another problem with the word "literate" is the kind of images it brings to mind. The word conjures up the notion that, with enough effort and the right tools, code can read like a haiku expressing the feeling of waking up on a crisp morning surrounded by a dewy forest. The reality is that code is rarely anything like the dew-touched branch of a cedar; sometimes it's more like a pinball machine overseen by a drug-addicted carny on the midway of Hell's State Fair.

It's also true that software and documentation can never be literate in the same way that prose may be. Computer programs can't be made into literature, unless you consider the telephone book to be literature. The idea of numbered gadgets (telephones) that allow us to talk to each other is a gem of an idea, and the beauty of the idea is illustrated by the fact that so many of these gadgets are in use, but a telephone book is not interesting to read. Any gems of clarity and beauty found in source code are often buried in a mountain of plumbing, and the plumbing is every bit as important as these gems.

A more apt comparison might be with calculus textbooks. There are a small number of ideas that make calculus work, but it's hard to fully appreciate them without understanding how they relate to a long list of other topics. The epsilon-delta definition of continuity is a gem that can be stated in a sentence or two, but it takes several chapters to explain, and perhaps entire volumes to really understand.
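For reference, here is one standard way to write that definition for a real-valued function at a single point. The symbols f, a, and x are the usual textbook names, assumed here for illustration: f is continuous at a when

```latex
\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x : \quad
|x - a| < \delta \;\Rightarrow\; |f(x) - f(a)| < \varepsilon
```

One line is enough to state it; the chapters go into explaining why the quantifiers come in that order and what follows from them.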

Compelling Programming

Although literate programming tools are not widely used, it's easy to agree with the goal. Exposition is the crucial ingredient of well-written software.

Since "literate" is taken, and because it sounds impracticably lofty, a new term is needed. Compelling programming aims for quality of exposition, but without limiting the toolset. Fleshing out what it means to write compelling software is the subject of the next post(s).
