# linking-together
p
Here's a classic paper that I have enjoyed for a long time. It really captures why it is so hard to learn your way around a program unless you are able to work with someone who wrote it or has been maintaining it. It also captures why having a rotating group of programmers who all work on the same code base without extended mentorship leads to a complete breakdown in conceptual integrity within the code base. Programming as Theory Building https://pages.cs.wisc.edu/~remzi/Naur.pdf
p
It's fascinating that it keeps coming up as a touch point in other conversations. I think it's definitely possible to recover the theory of a program in many cases, but it's often a huge undertaking. After reading the Lions book I felt like I had much of the theory of Unix v6, but that was after a huge amount of time and effort and with the aid of amazing documentation. But, if one accepts that the theory of a program is much better maintained by passing it on from one person to another, what does that imply about how to create software? Should we stop trying to make programmers plug compatible? Never let a team who maintains a program get smaller than two? Accept that a programmer we have who has the theory of our program is more valuable to us than one who knows the language but does not have the theory of our code? If so, should we start offering people ways to advance in their career without changing employers every two years, so we can keep them on the program they know and to which they can already add the most value?
k
On the other hand, there's some creative destruction that happens every time a programmer switches projects or a project folds up. Admittedly, most of the touch points were added by me 😅 As I said in one of them, it's one of 3 major influences of mine (along with the Parnas paper you mentioned at https://futureofcoding.slack.com/archives/C037X8XMFB3/p1660888406320979). The way I think about it is, it's a lucid statement of a problem, not a situation without a solution. We have technical tools to manage the problem: better languages, version control (which provides a Schelling point for crucial information in commit logs), automated tests (a Schelling point in test names), and creative ways to introduce new Schelling points for a reader like Literate Programming and Aspect-Oriented Programming or Common Lisp's `advice` facility.
We also have at least one counter-example of a program that does uncommonly well at preserving its theory over generations of contributors: with thanks to @Konrad Hinsen[1], `emacs`[2].
[1] http://akkartik.name/archives/foc/thinking-together/1650464577.515109.html#1650702843.175699
[2] https://merveilles.town/@akkartik/108796658699867329
I think the key reason lack of theory is still a pain point is social: lack of incentives. If a program breaks, users complain. If a program's theory breaks, nobody complains immediately. So it can rot quite badly before anyone notices, badly enough that the tech debt is impossible to repay.
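To make the "Schelling point in test names" idea concrete, here's a minimal sketch in Python; the function, the cap, and the whole scenario are invented for illustration:

```python
# A test whose *name* records a design decision, so future readers converge on
# it without needing the original author around. Everything here is a toy.

def retry_delay(attempt: int) -> float:
    """Exponential backoff, capped at 30 seconds."""
    return min(2.0 ** attempt, 30.0)

# The name answers "why is there a cap?" before anyone has to ask.
def test_retry_delay_is_capped_so_retries_cannot_stampede_the_server():
    assert retry_delay(1) == 2.0
    assert retry_delay(10) == 30.0

if __name__ == "__main__":
    test_retry_delay_is_capped_so_retries_cannot_stampede_the_server()
    print("ok")
```

A test runner finds the test by its name; more importantly, so does the next programmer grepping for "cap" or "stampede".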
p
Now I have to ask. What's the third major influence? 🙂
k
#1 is Christopher Alexander via Richard Gabriel: https://github.com/akkartik/mu#credits
Heh, I actually gave Naur top billing when I wrote that! I think he's more relevant to that particular project, but CA articulated the ethos I aspire to.
j
Yeah, this is by far my favorite programming paper. So full of insights and yet so misunderstood. I wrote a post a while ago attempting to explain and defend it: https://jimmyhmiller.github.io/incommunicability
p
@Jimmy Miller nice blog post! When you say a theory cannot be communicated, are you saying that when someone who has the theory is passing it on to an apprentice, they are doing something more like training or teaching than explaining or communicating? My experiences have somewhat mirrored Naur's when he talks about having to work closely with someone over time and coach them and have them make proposals and help them understand what is right and wrong about their proposals. It's a form of communicating, in a way, but it's not just explaining.
j
Yeah, theory is a know-how. It can't be stated in propositional form. But others can gain that theory through, like you said, working closely with someone over time. It is incommunicable, not untransferable.
k
Naur's paper is one of my personal classics as well. Especially since "theory building" is very much my official job description (I am a researcher in theoretical physics), though in a different context. I have been thinking about the commonalities of building theories in science and developing software for a while. A summary: https://science-in-the-digital-era.khinsen.net/#Formalization
k
@Jimmy Miller Thanks for that context on Gilbert Ryle! It helps me see where Naur is coming from. My attitude still is that "impossible" is hard to prove, and "forever" is a very long time. I totally agree with the premise that "an essential part of any program, the theory of it, is something that could not conceivably be expressed, but is inextricably bound to human beings." However, we're also only concerned with communicating theories to other human beings, not AIs or dolphins or aliens. It seems plausible that one human being can recreate in their head, at a distance, what some other had in theirs. This is the premise of all distance learning. It's the utopia Neal Stephenson's "Young Lady's Illustrated Primer" is pointing at. We already use a key part of learning from someone without interactivity quite pervasively in our society: exercises. 90% of my chess knowledge comes from a slim volume of, I believe, 16 chess games called "Chess Mastery by Question and Answer," where Fred Reinfeld peppers the reader with questions after every couple of moves. What do you think of this move? What would happen if White played _? Ramanujan may have been a genius, but a lot of his unique ability also came from happening upon a textbook of math exercises at a formative time. I'm totally willing to grant that there are aspects of muscle memory in juggling or bicycling that you can't learn just from reading books. But I wouldn't put programming in the same category. This is a useful conversation, because it helps explain to me the class of activities Naur is right about:
• that involve muscle memory
• where a good interactive curriculum doesn't exist yet
• where the distance in mental space between teacher and student is too vast
j
I'm totally willing to grant that there are aspects of muscle memory in juggling or bicycling that you can't learn just from reading books. But I wouldn't put programming in the same category.
I would put it in the same category, very much as @Personal Dynamic Media said above, re: training/teaching rather than explaining. Much of what makes for good programming in the large (architecture, decisions about API shape, knowing what/how much to build when) is a matter of intuition that can't be reduced to a few rules of thumb I could write down. The best way I've found to transmit that stuff is by sitting together with someone who is doing the work.
k
@Jack Rusher Would it change anything if I `s/reading books/following books with exercises`? I think much of the possibility of reading materials still lies untapped. They can be conveyor belts for accelerated change (http://akkartik.name/post/silfen-paths). I'd claim you in particular could make awesome interactive things for people to load up into Emacs that reward intense study. But it's a heck of a lot of work and often doesn't seem worthwhile.
j
Your chess example to me actually reinforces the point. How are people gaining a theory of chess? Not by memorizing propositional knowledge, but by performing moves and reflecting. The theory in the case of chess playing isn't contained in that book. What is contained is a series of steps that, coupled with our innate human abilities, can cause someone to gain a theory. That's the point. The theory is the know-how. The theory only exists when instantiated by a human. Following books with exercises can help you gain a theory because it builds know-how. Naur's point is that since theory is a know-how, and know-how relies on our skills and history, you won't have the same theory as the author.
k
The theory in the case of chess playing isn't contained in that book. What is contained is a series of steps that, coupled with our innate human abilities, can cause someone to gain a theory.
I don't disagree there. Perhaps we're saying the same thing. My claim is that it is possible for the original authors to make it possible for future readers to build up a theory of a codebase in their heads.
Naur's point is that since theory is a know-how, and know-how relies on our skills and history, you won't have the same theory as the author.
I don't particularly care that it's exactly the same. Is it close enough to make the same choices as the original authors? Codebases are tolerant of some amount of error. (http://akkartik.name/post/modularity) There's a spectrum here. Simple changes I make to my codebase will happen the same way whether I do them today or tomorrow. There's "one right way" to go with the grain of what's been built so far. More complex changes are more fragile, and I might do things one way today and another way if I'd attempted them yesterday. We all change, and the codebase isn't always right. Putting all these nuances together, my claim is: it is possible for a codebase to convey knowledge about itself to the extent that a diligent follower makes the same modifications as the original author in situations where the original author's actions would be relatively stable over a period of time.
j
@Kartik Agaram The best way for someone to learn a "know how" is to have a practical problem in the context of a project they care about, live with the pain of the problem for a bit, then find a solution. It's very hard for a book to provide the right prompts at the right times to induce someone to have this experience in their work life. An artificial set of problems in a book can be a version of this, but I would argue that it's strictly weaker because the book cannot tailor the prompts to what the student already knows. More generally, I would say that programming is largely a craft activity that has much more in common with carpentry than the culture around programming admits, and that we should encourage apprenticeship more than we do.
k
I can get behind that. To try to restate your point: it's possible, in principle, maybe, if we work really hard, that we can create nonlinear, choose-your-own-adventure experiences that convey a lot of the theory of a codebase in the narrow way I characterized above. But they can't compete with the rich experience outside the single narrow codebase that you would get when interacting with the right human to make changes to it. 💯 Please tell me if I'm still blind to something you said. Recorded artifacts are strictly inferior to the right human. Right now a tiny fraction of people find the right mentor, while the vast mass of humanity does without. Scaling up mentorship is a more important problem to solve than helping people in the absence of a mentor. However, both problems are important. I'd prefer not to choose between them.
j
I don’t particularly care that it’s exactly the same. Is it close enough to make the same choices as the original authors?
In many cases, yes. It is good enough. But being off by just a bit adds up over time. This is what Naur claims is the basis of decay in software. We also have to realize the lossiness of passing this information down the generations. The first person may get it approximately right, the next less so, and less so all the way down. That's how we end up with massive legacy codebases for which no one has a theory.
More complex changes are more fragile, and I might do things one way today and another way if I’d attempted them yesterday. We all change, and the codebase isn’t always right.
Yeah exactly. The codebase doesn’t contain the theory. We do. And we build and grow that theory over time. What is and isn’t a small change is relative to the theory we have. What is and isn’t a good change is relative to the theory we have, the purpose of this code, how it relates to the world around us. We are building that theory by programming.
Putting all these nuances together, my claim is: it is possible for a codebase to convey knowledge about itself to the extent that a diligent follower makes the same modifications as the original author in situations where the original author’s actions would be relatively stable over a period of time.
Yeah. I think Naur would agree. The point is that the theory isn’t what is in the codebase. It is what is in the people. We can teach people, even via written documentation and they can form a theory that is good enough for many purposes. But the theory is in them, not on the paper.
j
Does this suggest that projects should be accompanied by programming exercises? "Try changing SQLite's interpreter to a classic relational-algebra tree-walker instead. What was lost in the change?"
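For concreteness, a toy sketch of what "a classic relational-algebra tree-walker" could look like; this is not SQLite's actual design, and the relation format and operator names here are made up:

```python
# Toy relational-algebra tree-walker: a relation is a list of dicts, a query is
# a tree of operator nodes, and evaluate() walks the tree bottom-up.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Relation = List[Dict[str, Any]]

@dataclass
class Scan:                      # leaf: a base table
    rows: Relation

@dataclass
class Select:                    # keep rows matching a predicate
    child: Any
    pred: Callable[[Dict[str, Any]], bool]

@dataclass
class Project:                   # keep only the named columns
    child: Any
    cols: List[str]

def evaluate(node) -> Relation:
    if isinstance(node, Scan):
        return node.rows
    if isinstance(node, Select):
        return [r for r in evaluate(node.child) if node.pred(r)]
    if isinstance(node, Project):
        return [{c: r[c] for c in node.cols} for r in evaluate(node.child)]
    raise TypeError(f"unknown node: {node!r}")

# SELECT name FROM users WHERE age > 30, written as a tree:
users = Scan([{"name": "ada", "age": 36}, {"name": "bob", "age": 25}])
print(evaluate(Project(Select(users, lambda r: r["age"] > 30), ["name"])))
# -> [{'name': 'ada'}]
```

The exercise would be to swap something like this in for SQLite's interpreter and then articulate what got lost.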
j
This has a lot of parallels to an idea in legal knowledge representation: that what you are encoding is not the law, but one or more persons' interpretation of it. Ensuring that people share the interpretation is the whole point of the tools I'm working on. But I have always had difficulty with the idea that there is know-how that cannot be communicated. The other version is implicit knowledge that cannot be made explicit. That it is communicated more efficiently in a non-declarative way seems not the same thing as incommunicable. You could still declaratively state all the principles and how they interact with one another. It's just horrifically difficult to do, and inefficient as a means of sharing the knowledge. Which is a good reason not to bother trying, so I agree with the prescription. I just don't think the disease is impossibility. Just deep inefficiency.
k
I have heard stories of teams in various contexts (business, public policy, ...) using computer modelling as a means to reach precise agreement on their interpretation of some model. I'd love to see a first-hand experience report on this; all I have seen is vague references. I don't see it happening in science, in spite of a long history of computer modelling. Discussion in science happens around narratives about models (with mathematical equations if appropriate), but not around precise formal implementations.
j
@Jason Morris I think it is more than inefficiency. Let's take efficiency out of the picture. Let's say you know how to juggle and are given an infinite amount of time to write down every proposition you know about how to juggle. Let's say I don't know how to juggle. I am given an infinite amount of time to read and memorize every proposition you wrote down. Now, if asked questions about juggling, you and I give identical responses to all questions. Do you know something about juggling I don't? If knowledge-how is reducible to knowledge-that, then you shouldn't. But you in fact do know something I don't. You know how to juggle. We can prove it by having you juggle and then having me attempt it. Despite knowing all the same facts as you about juggling, I still can't juggle. You know something I don't.
w
Ah! This thread is good, and I'm mid-move so can't really comment. But as far as learning by doing goes, for me personally, The Witness takes up the challenge better than any other work of philosophy or fiction.
c
@Konrad Hinsen this might get into somewhat philosophical territory, but maybe it is already there… The paper and this thread 🧵 are pointing in a similar direction. It tries (again) to better express the relationship between personal experience (consciousness) and shared experience. The ever-tempting question: can experience be shared? Can communication happen? I'm reminded of the part of GEB where Hofstadter explains the difficulties of translating from one natural language into another. Regarding mathematics, and programming being a practice of it, it seems so tempting to say it allows for "clear" expression of certain things, and thus collaboration or a shared experience should be simple, or at least easier. But in reality this is not always, or at least not overwhelmingly, the case. Because you can solve a problem in mathematics in different ways, there is no "mathematical" reason why to solve a problem this way or another. When reading about the experiences of mathematicians like Grothendieck or Gödel, at least for me it became clear how much they rely on their intuition. In Daoism there is the Tao Te Jing, whose first line reads: the Dao which can be explained is not the real Dao. Frustrating as it may be, I do think that this is a deep truth and very much connected to what is being talked about here (the paper, the problem of representation, etc.).
k
@curious_reader From a philosophical point of view, talking about "in principle", it seems clear to me that the Tao Te Jing (and others) is right. From a practical point of view, what matters is how well communication can be made to approximate full understanding. We have mentioned at least three levels of communication here: one-way (e.g. written documentation), dialog, and shared practical experience (working together). I see Naur as emphasizing that the first level is most often not sufficient.
w
@curious_reader you're covering a lot of ground. Maybe we can sum part of it up with the image of a person pointing to the moon. The gesture is simultaneously helpful for guiding the viewer's gaze toward the moon yet inadequate in itself and certainly distinct from the moon. A viewer could imitate the gesture without knowing what they're pointing at. But between pointing from multiple locations (triangulating) and some "find the moon" exercises, the viewer can get good at finding the moon even when the connection is obscured in some way. I'm reminded of one specific area of The Witness that explores this idea. (Many areas do, and in many different ways.) In this particular area, you start with a direct connection between a shadow cast on a puzzle panel and the correct path you should trace on the panel. As the series of puzzle panels progresses, intervening objects obscure the connection in a bunch of different and gradually more significant ways until, near the end, you see a panel in full sunlight. The connection is now nowhere immediately obvious.
k
This thread has a connection with the recent FoC episode: https://futureofcoding.org/episodes/057. In particular the discussion of mediums and a meta-medium around the 1 hour mark. The claim that you can't ever really experience one medium in another seems analogous to the idea that you can't ever get at the precise theory in someone else's head. Which creates the image in my head of codebases as self-contained universes that aren't quite commensurable with any other codebase. Or of every person as a unique medium. What it is like to be Peter Naur, say. Side-stepping the question of perfect fidelity, I'm thinking about error bounds in approximating one medium with another, or one person's theory with another. There's a bit in the podcast about Finnegans Wake and whether novels really are a static medium compared to Smalltalk. I wonder if that discussion and this thread have been conflating 2 dimensions:
• How large the universe of a medium is. For example, a drawing program with 4 colors has a smaller universe than one with 24-bit color.
• The space of operations within easy reach. A codebase makes certain operations easy. Some of them are obvious to a new reader, others may require understanding some theory in the author's head. Yet others may be far away regardless. The codebase doesn't really help with those; you might be better off starting from scratch (though it may require a lot of knowledge to even assess that correctly).
The relative fractions of these categories will vary from codebase to codebase. A school project likely enables less than the early implementation of awk or vi, even if the LoC are comparable. In these terms, perhaps we can agree that even if novels have as large a universe as dynamic media (or at least both are infinite in size), computers make more of it easily accessible to someone who is not James Joyce.
j
Re: juggling, I am not persuaded. Sometimes, the thing you want to know I put into a model of words, which you consume by comprehension. But the words are not the knowledge. You have to comprehend them to generate the knowledge you want. Other times, the thing you want to know I put into a model of demonstration, which you consume by imitation. My demonstration is not the knowledge. You have to practice to generate the knowledge you want. In both cases what I know is made explicit, and in both cases the knowledge is communicated. Whether a communication is in natural language, or requires practice to be useful, is irrelevant to whether it is explicit and communicable. Words are not special.
j
@Jason Morris It sounds like you are agreeing... Or maybe just disagreeing with the word choice of communicable? Communicable in the sense Naur was using it just meant "Can be put in propositional form". Not can be transferred to other people.
j
Yeah, I disagree with the word choice. I also find the distinction somewhere between unhelpful and imaginary. But I've been allergic to a lot of philosophy for a long time, so don't mind me.
r
The tricky thing about a programming language is that it serves as two separate media at once. The first, and most clearly stated, is the interface between human and computer, where a programmer (though now potentially an AI) shapes text that governs a computer's behavior. The second is between fellow programmers, where the program as artifact is shared to be read. Today I read Programmer as Reader by Adele Goldberg, which goes over the difficulties encountered by a programmer attempting to gain a theory of a program, and how the Smalltalk-80 interface provides affordances for them. She differentiates the task between 4 layers - UI, functionality, structure, and language - each of which demands its own questions from the reader. Despite a program being textual, the context it exists in usually isn't. Better-designed systems, like Smalltalk-80 here, take advantage of this to assist in theory building, though part of the job is always left to the writer.