Move Carefully and Build Things
August 16, 2020
Earlier posts introduced the idea of compelling programming without defining it, beyond the general idea that it's something good and making the observation that literate programming is on the right track. One way to get a handle on what makes for compelling programming is to examine some features of outstandingly well-written software.
There's no point in rehashing every presentation that's been done on best practices. I've been critical of Agile, Scrum and the like. There are lots of good books and websites on development strategies like these, and they can provide useful insights. What I object to is the idea that there's a magic path to perfection, if only we could discover it and stay on it. The gurus of Scrum (for instance) haven't said anything that hasn't been said before, at least in some form; where they may add value is as motivational speakers by repackaging and presenting ideas so that they are clear and memorable.
Robert Martin, Clean Architecture, p. 5, says that
The goal of software architecture is to minimize the human resources required to build and maintain the required system...design quality is simply the measure of the effort required to meet the needs of the customer.
There's nothing wrong with this statement, but it has a limited horizon. It's not always clear who might be an eventual "customer", or even what "the required system" will be expected to do. It's particularly unclear in the world of open source, or if the time horizon is measured in decades.
At the most basic level, the goal of compelling programming is to be "compelling" in the ordinary sense of the word. When code is compelling, it's used because doing so is easier than not using it. Obviously, it will save time if a project is able to rely on easily accessible pre-existing work of high quality, but there's more to it than that. If a project starts with work that's worthy of respect, then it's psychologically more difficult to shovel a pile of garbage onto the foundation. If the drywall in your house is in good shape, then you'll take more care when you paint it than if it's wavy and the trim work was poorly done.
Not only is there a psychological factor, but a good foundation steers any superstructure onto the path of quality for technical reasons. When working to a deadline and without good libraries, it's easy to step onto the wrong path, either for reasons of expediency ("this was quick and it works, even if it might cause problems down the road") or for reasons of ignorance ("a race condition...what's that?"). With good libraries, these kinds of mistakes are less likely because making them requires doing more work than accepting the choices made by the libraries, particularly when those libraries are the result of man-decades of effort.
The ways in which code may be used form a continuum, from being treated as a black box to being used as a means to satisfy curiosity about how it works. Apply three labels to different parts of the continuum: library, building blocks, and book.
Most uses fall somewhere on this continuum, but the continuum leaves out something crucial: the people who might contribute to the project. If a project is to thrive, then it must be accessible to contributors, and something on the book end of the spectrum is required. This is not to say that a literal book is necessary, but there must be some high-level exposition in addition to the code, no matter how well-commented the code may be. Well-written code is expository writing, and the larger and more complex the project, the more true this becomes.
One reason we don't see more exposition is that it's hard work. Whether individuals are driving the culture, or the culture is shaping individuals, it's not unusual for programmers to feel that this kind of work is unpleasant, a waste of time, or beneath them. People say foolish things, like "commenting is an anti-pattern," although these kinds of statements are often a form of contrarian trolling or a misguided attempt at signalling sophistication.
The debate about when and where to comment is ancient, but the need for high-level exposition throws a light on the tensions at work. The longer and more detailed the documentation, the more difficult it is to keep the documentation and code in sync. The entire project becomes more tightly wound, and the techniques and tools appropriate to the task of documentation are unavoidably idiosyncratic.
Providing documentation for projects whose only envisioned users are on the library end of the scale is often easier. This type of project typically has a clearer and more limited goal, and it will have a narrow access pipe that restricts the flow of information. Limiting the information flow tends to limit the number and complexity of "chaos interchanges," leaky abstractions, and the like, and this makes the documentation easier to write. When code expresses a collection of loosely coupled ideas, then the documentation can be kept close to the source code and tools like Doxygen are more likely to be sufficient.
The most interesting area is in the middle of the spectrum: code as building blocks. There are different levels in this middle area. On the library end of the scale, the user hopes to treat the building blocks like plastic Lego blocks. Lego sells a huge number of different blocks, and you can build an amazing variety of things, but if the blocks are trimmed and reshaped, then they're not Lego blocks anymore. The more trimming and shaping the user does (or melting and recasting in the extreme case), the closer to the book end of the scale that user is.
These blocks have two important qualities: how "unitary," hard or impervious they are, and how "fungal" or dendritic they are. Fungal code has mycelia extending into other areas; we say that it's entangled, or tightly coupled, or has leaky abstractions. Modifying non-fungal code is like working with plastic Lego blocks: they snap apart easily and can be reassembled just as easily. Working with fungal code is like doing surgery on Spock's Brain.
When using a given body of work, for every problem there will be natural conceptual points of cleavage. One problem may allow the blocks to come apart and be reassembled cleanly, but another problem may require delicate surgery. This is part of the reason that we are continually reinventing the same thing — it's the same, but not exactly the same. Compelling software should take into account the different possible points of cleavage, and it may be appropriate to provide the "same thing" expressed in different ways.
Sometimes the apparent imperviousness is merely "false opacity" due to a lack of documentation (or because the code is poorly written). Other times, the unitary nature is inescapable, like a steel ball bearing that can't be broken into pieces with any useful relation to a new whole. For example, most sorting algorithms are unitary in this sense; the next conceptual level consists of the fundamental expressions of the language: if-statements, variable assignments and the like.
The fungal and unitary axes are not entirely orthogonal. Part of the reason a bit of code forms a unit may be that the mycelia within the code are unavoidably entangled and there's no reasonable way to decouple the logic and interrelationships. Other times, there is some choice involved, and the result could be anywhere on the hardness range from styrofoam to steel. In many ways, this is the primary tension facing software designers. Either choice, more mycelia and greater hardness versus loosely coupled bite-sized pieces, can feel right. The benefits of modularity and loose coupling are widely touted, but tightly wound fungal code is often more efficient and may be easier to work with once you've gotten over a learning hump, assuming that there won't ever be a need to pull it apart, which is a large and dangerous assumption. When a project is loosely coupled, the individual pieces are easier to understand and work with, but there are more of them, and that can make the whole jumbled and disjointed.
The way basic algorithms, like sorting, may be implemented illustrates the tension discussed in the previous paragraph. Ideally, it should be possible to pull the sorting code out of a project and use it elsewhere without also pulling out a heap of steaming entrails that needs to be shoehorned into a Frankenstein monster. On the other hand, because it's so much more efficient, the right choice may be to implement the algorithm in a way that binds it to the "steaming entrails," forming a hard unit. Even when code is made hard by choice, compelling software has "build options" that allow it to work in a way that may be less efficient, but is easier to disassemble and understand.
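One way a "build option" of this kind might look is sketched below. The environment variable name and all three function names are assumptions made up for this example, and the "fast" path is only a stand-in that defers to Python's built-in sort rather than a truly project-bound implementation.

```python
import os

# Hypothetical switch: the variable name is an assumption for this sketch.
USE_REFERENCE_SORT = os.environ.get("USE_REFERENCE_SORT", "0") == "1"


def reference_sort(items, key=lambda x: x):
    """Plain insertion sort: quadratic, but self-contained and easy to read."""
    result = []
    for item in items:
        pos = len(result)
        # Walk left past every element that sorts strictly after the new
        # item, so elements with equal keys keep their original order.
        while pos > 0 and key(result[pos - 1]) > key(item):
            pos -= 1
        result.insert(pos, item)
    return result


def fast_sort(items, key=lambda x: x):
    """Stand-in for an optimized path; here it just uses the built-in sort."""
    return sorted(items, key=key)


def sort_records(items, key=lambda x: x, use_reference=USE_REFERENCE_SORT):
    """Single entry point: same interface either way, the flag picks the path."""
    impl = reference_sort if use_reference else fast_sort
    return impl(items, key=key)
```

Callers only ever see `sort_records`, so the efficient path can be as tightly bound to the rest of the project as it needs to be, while the reference path stays small enough to pull out and study.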
Another take on this choice is a rule: don't misuse focus in a way that is not sustainable. Someone's "back-propagating big data sorting algorithm with packet level garbage collection" may be fast and efficient, but if nobody can understand it and use it, then the job is only half done. In light of the fact that computers today are thousands or millions of times faster than those of 30 years ago, a less efficient implementation will often (but not always) be plenty fast.
Compelling software is broadly useful and easily reusable. Software often starts with a few key ideas, and the initial implementation shows the power of these ideas with the greatest clarity. Part of this clarity is due to the sparseness of the ideas surrounding the key concepts. For practical reasons (i.e., getting the job done), it may be necessary to add layers of complexity on top of these key ideas, but there should always be an easy way to revert to an implementation of only the most central ideas.
An example that was mentioned earlier relates to a couple of the points made above. I once wrote some software, first in C to get the ideas right, then in assembler to make it fast. The C was slower, but also easier to understand. For that project to be compelling, maintaining both versions would be necessary, perhaps in several incarnations with different points of contact between the C and assembler.
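Keeping two versions alive is only practical if something checks that they still agree. The sketch below is not the C and assembler project described above; it is a toy stand-in, with made-up function names, that pairs a clear bit-counting routine with a terser one and compares them on random inputs.

```python
import random


def popcount_reference(n):
    """Clear version: look at each bit of a non-negative integer in turn."""
    count = 0
    while n > 0:
        count += n % 2
        n //= 2
    return count


def popcount_fast(n):
    """Terse version: each pass clears the lowest set bit (Kernighan's trick)."""
    count = 0
    while n:
        n &= n - 1
        count += 1
    return count


def versions_agree(trials=1000, seed=0):
    """Differential check: both versions must return the same answers."""
    rng = random.Random(seed)
    return all(
        popcount_reference(n) == popcount_fast(n)
        for n in (rng.getrandbits(64) for _ in range(trials))
    )
```

A check like `versions_agree()` can run with the rest of the test suite, so the readable version remains a trustworthy description of what the fast version does.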
To be useful and reusable, the software must be accessible, and this requires choosing a language and coding style. Sometimes this is easy. If all potential users live in the Python universe, then Python is the obvious choice. Whatever language is chosen, for the result to be accessible, it's tempting to make it a rule to avoid doing anything that can't be translated into other languages in an obvious way. On the other hand, code written in a language like Haskell is, in my limited experience, more likely to be self-explanatory than C++ code — once you understand Haskell, which is no small thing.
Writing compelling software involves many inescapably idiosyncratic choices: the implementation language(s), the documentation framework, the repository through which the code is presented, expectations about how the user may reuse the code, and where the user is on the spectrum from library to book, to name only a few choices. No single tool, like WEB for literate programming, will suffice; we'll need an entire toolchest.