Language, Expression and Design



November 2013

Why the term 'literate programming' is leading us astray

by Chris Zheng, on midje-doc, testing, documentation

I don't think gunning for sacred cows is the best blogging practice for the long term. But my post yesterday on REST gave me over 3000 hits that day... which is about 3000 hits more than I ever got in my short blogging career. So I thought I'd try again with an even bigger taboo.

Donald Knuth.

To be honest, I haven't actually read any of his books. But my dad has, and we have spoken about a lot of the principles at length, so it's good enough for now. This is more or less a copy-and-paste job from the documentation on lein-midje-doc, a library I wrote to address the problem described below. I wrote the following about test files:

The best descriptions of our functions are found not in source files but in test files. Test files are potentially the best documentation because they provide information about what a function outputs, what inputs it accepts and what exceptions it throws. Instead of writing vague phrases in the doc-string, we can write the descriptions of what a function does directly with our tests...

... The irony, however, is that when a readme says 'for documentation, please read the test files', the common consensus is that the project developer is too slack to write proper documentation. However, if we are truly honest about our own faults, this occurs because most programmers are too slack to read tests. We only want to read pretty documentation.
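To make that claim concrete, here is a small sketch of a test file that documents a function's inputs, outputs and exceptions. It is an illustration only: `parse-age` is a hypothetical function, and the sketch uses `clojure.test` (which ships with Clojure) rather than midje so it runs without extra dependencies. midje-doc itself works on midje facts, whose syntax is different.

```clojure
(ns example.core-test
  (:require [clojure.test :refer [deftest is testing]]))

;; Hypothetical function under test, for illustration only.
(defn parse-age
  "Parses a string into a non-negative integer age."
  [s]
  (let [n (Long/parseLong s)]          ; throws NumberFormatException on non-numeric input
    (when (neg? n)
      (throw (IllegalArgumentException. "age must be non-negative")))
    n))

;; Each `testing` block states one behaviour of parse-age:
;; what it accepts, what it returns, and what it throws.
(deftest parse-age-spec
  (testing "accepts a numeric string and returns it as a number"
    (is (= 42 (parse-age "42"))))
  (testing "rejects strings that are not numbers"
    (is (thrown? NumberFormatException (parse-age "forty-two"))))
  (testing "rejects negative ages"
    (is (thrown? IllegalArgumentException (parse-age "-1")))))
```

Each `testing` string describes one behaviour, and unlike a doc-string, the assertion underneath it fails when the code stops matching that description. Running `(clojure.test/run-tests 'example.core-test)` at the REPL executes the whole spec.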

I spent a good month or two thinking about the problem, a week hacking the library together, another week on the documentation and got like 10 hits for the whole month. So I'm laughing at the irony here. I don't think I'm wrong about people not reading test files. I'm just completely wrong about people actually wanting to read pretty documentation. With this post, I'm going one step further than what I wrote in the documentation:

No one wants to read documentation, even if it's pretty or if it's up on the GitHub project page. Most of us just want to read controversial blog posts full of pictures and video. If that post can be shortened to a Twitter quip of less than 140 characters, even better.

It's a trivial, trivial world out there.... Sigh....

Apologies. I digressed. Now, back to Donald Knuth.

One of Knuth's many legacies was the coining of the phrase Literate Programming. It has become a very popular concept, especially lately. The main idea is that the code is written in a way that allows both a machine and a person to understand what is going on. Most seem to agree that it is a great idea.

From Wikipedia

Literate programming is an approach to programming introduced by Donald Knuth in which a program is given as an explanation of the program logic in a natural language, such as English, interspersed with snippets of macros and traditional source code, from which a compilable source code can be generated.

The literate programming paradigm, as conceived by Knuth, represents a move away from writing programs in the manner and order imposed by the computer, and instead enables programmers to develop programs in the order demanded by the logic and flow of their thoughts. Literate programs are written as an uninterrupted exposition of logic in an ordinary human language, much like the text of an essay, in which macros are included to hide abstractions and traditional source code.

I LOVE the idea. However, I disagree with the name. I believe that Knuth did serious harm to our understanding of the concept of code that is both human readable and machine executable when he gave it the name Literate Programming. Humans and machines are fundamentally different and rely on completely different methods of communication:

  • Communication with machines is usually very linear and procedural. It involves giving them a specific set of instructions: First Do This, Then Do That.... Machines don't really care what the code does. They just execute whatever code they are given.

  • Communication with humans usually takes a very different form. We wish to be engaged, inspired and taught, not given a sequence of instructions that each break down into even smaller sequences. The best documentation is usually separated into logical sections: an overview, a table of contents, a list of figures, topic chapters and subsections. It contains text, code, pictures, even sound and video. Documentation structures resemble trees, with links connecting related topics and content. They do not resemble program code and therefore should be created independently of the machine code itself.

In short: Only machines are programmed. Humans are engaged, inspired and taught. When we write literate programs, we place the primary importance on machines, not humans.

Instead, we should be thinking about writing executable documents or, rather, the concept I'm trying to promote: testable documentation.

Documentation is written for humans, and it is structured like a woven lattice. The fundamental structures of programs and documentation are very different from each other. Therefore, thinking that documentation can be automatically generated from doc-strings is a mechanistic approach, not a humanistic one.

We should be writing for people, not machines. Our mindset and tools for writing code should reflect this as well.

Introducing MidjeDoc

The midje-doc plugin attempts to bridge the gap between writing tests and writing documentation by introducing these novel features:

  • To generate .html documentation from a .clj test file.
  • To express documentation elements as Clojure data structures.
  • To render Clojure code and midje facts as code examples.
  • To allow tagging of elements for numbering and linking.

In this way, both the programmer and all users of the library benefit:

  • Errors in code examples can be eliminated, since every example is also a test that gets run.
  • There is no need to copy and paste test examples into a readme file.
  • Entire test suites can potentially be turned into nice looking documentation with relatively little work.

Here is a video demonstrating a workflow using midje, midje-doc and live-reload. For those with short attention spans who wish to cut to the chase, skip to around the 7:20 mark.

Apologies to the 10 people who actually bothered to read the GitHub readme and the library documentation. There is nothing new in this post that can't be found in those places. I did warn you that it was a cut-and-paste job. Like I was saying before... it's a trivial, trivial world out there =)
