This is a very random thought - but something that...
# thinking-together
t
This is a very random thought - but something that I've been occasionally wondering for some time now. If we have lambda calculus as a model of functional languages and Turing machines as a model of imperative languages, what would be a good model for programming systems that have "document" as the basic underlying structure (Subtext is an example of this) - i.e., you have some tree structure and the program evaluates by modifying this document - appending new things or rewriting evaluated bits. (Lambda calculus itself is basically a tree, but what if this also allows imperative document edits?) Could this be something like a "Turing machine" that works on trees rather than tapes? There would be "current location" which can move in various ways around the tree and modify it. If your document has references (perhaps you can have
../../foo
to refer to
foo
of a parent of a parent), the machine would have to somehow walk up the tree, remembering that it wants to copy a value back to the original location - and then walk over the tree back to put the value in place of the reference). Is this something completely silly or something that lots of people have already done but under different names?
j
I might be missing some nuance here. But what you are describing sounds to me pretty similar to term or graph rewriting systems. https://en.m.wikipedia.org/wiki/Rewriting#Term_rewriting_systems They are Turing complete formalism. And there has been plenty of working using them for transforming documents, but also general evaluation. They do work exactly as you described. You have a tree structure and you modify it by rewriting things.
k
Supporting @Jimmy Miller's suggestion because that's exactly how my Digital Scientific Notation works. It's a term rewriting system embedded in a Wiki-like graph of cross-linked pages.
j
Note also that Lisp was originally conceived of as an automated form of term rewriting, the keyword “lambda” having been borrowed based on an incomplete understanding of Church’s paper. The first Lisp that was actually based on the lambda calculus was scheme.
d
you're getting some "already done under different names", so I'll add mine! my PL is really just a graph rewriter.
k
@Robin's recent idea seems tantalizingly related, but breaks my brain to think about..
t
Term rewriting is a nice reference I did not think of! I guess one difference between those and what I've been thinking about is that I imagined that you'd have a special "current" location in the tree (like instruction pointer...). You can certainly do this with term rewriting systems too though, if you just have a special term like
[X]
that marks the term/tree node
X
as being the current one. I guess term rewriting systems are basically how people define operational semantics of programming languages. It's strange people do not talk more about the connection between the two!
@Robin's idea reminded me of something I wrote about in a post inspired by spaces in cities (see https://tomasp.net/blog/2023/vague-spaces/) There are some thoughts about how programs live in a different kind of space than cities (which have fixed space they have to fit into, whereas program spaces can expand - but spreadsheet space expands in only limited ways - you cannot create arbitrary amount of space in any particular location - which I guess this idea was getting at?).
... but using some kind of term rewriting system as the basis for document-like programming systems seems like a nice way of doing things - and it looks like there's lots of (some, at least) people here thinking in this direction!
k
Maybe term rewriting systems should have something like a "current node". Rule application order is something usually swept under the rug. It's there, but everybody hopes it doesn't matter, and it's usually implicit (part of the rewriting engine) rather than explicit (part of the rule set).
r
@Kartik Agaram I think there might be a connection here too. I don’t know if I can speak to document-based languages in general, but at least for subtext, there is the loose connection in that both are based on spreadsheets. Beyond that non-abelian spreadsheets serve as mathematical models, although they aren’t models of computation specifically. They actually take computation mostly for granted, although I think that can be an interesting perspective too. When I was first reading Tomas’ question it occurred to me that non-abelian spreadsheets could be thought of as a model of the ‘document’ part of ‘document-based programs’. @tomasp Part of the point is absolutely that the space these things is is fixed, has particular characteristics, and is not created arbitrarily. I’m not sure if this makes them less like regular programs though. Generally the space of a non-abelian spreadsheet is far more expansive than the two dimensional space of a spreadsheet or paper or city. For one the “two dimensional” non-abelian spreadsheet has uncountably many cells, whereas a normal spreadsheet has countably many cells. And this only gets worse in the countably infinite dimensional case you need to account for syntax trees.
g
@tomasp 1) WRT to "current node", we already know how to do this using textual code and have many programming languages for this purpose. So, divide-and-conquer says that all you need to do is to map "trees" onto textual code, then you're done. 2) Here is my (probably over-simplified) understanding of term rewriting augmented by the concept of "current node": 1. Programmer writes an AST (the "inhaler AST") 2. Programmer writes a 1:1 corresponding rewrite AST that dove-tails with the inhaler AST. (Or, programmer annotates the above inhaler AST (this, though, violates the principles of KISS and human-readability)) 3. Term rewriter app inhales linear text and makes a CST (concrete parse tree) by culling the AST driven by the inhaled text (commonly known as "parsing") 4. Term rewriter app walks the rewrite AST using the newly-created CST. 5. Term rewriter app unparses the rewritten walked-tree into (new) linear text ("code"). 6. Run the generated code. If that's even close to what you're imagining, then I contend that this is "easily possible" to do using modern technology whilst loosening the thumbscrews, i.e. using OhmJS plus a nano-DSL written in OhmJS. I call it "t2t" (text-to-text transpilation) and am actively experimenting with it and its implications (meta-syntaxes, metaprogramming, etc.). More info/blather/discussion if interested.
k
@guitarvydas The problem with your proposal is that it doesn't fit the way rewriting-based documents are used in practice. They are more like spreadsheets. The fundamental action of the computer is not "run a program" but "update everything after the user has made a change to the document". The point of a "current node", as I see it, would be to make it clearer to the user what exactly happens during such an update. It's more of a user interface than a programming issue.
g
@Konrad Hinsen Continuing to ponder, some half-baked thoughts: 1) "run a program" and "update everything after the user has made a change to the document" are the same thing, except maybe differing in speed ; if your machine is fast enough, there is no point in trying to optimize the update by keeping update pointers, just re-do the whole thing in one fell swoop so fast that a human user perceives them to both be the same ; you want the action(s) to "feel" like a REPL 2) possible relationship: "... current node ... a user interface than a programming issue ..." ; Lisp lists are "trees". I use a Lisp debugger (Lispworks) that lets me single step through Lisp code ("trees") in the same way that an assembler debugger lets me step through lines of code ; does this sound like "current node"-iness?
k
@guitarvydas I guess we first have to agree on what we mean by "program", in particular if and how it is distinct from "input data". At the bit level, there is no such distinction. Both program and input data are bit patterns in memory that the computer acts on. But in what I see as the most common usage of the term, a "program" is something long-lived that reads "input data" for one of its many execution phases. With that distinction in place, it's the rewriting engine that is the program, and all the rules and terms/graphs to be rewritten are input data. Just like Excel is the program and all of the spreadsheet, including the formulas, is input data. But if you look at it from the point of view of semantics, it's the rewrite rules that are the program. Which are changed all the time in a computational document. Describing the interaction between a human and a computer via a rewriting-based document is running generated code is not wrong, but it's not helpful either unless you are thinking about writing a rewrite rule compiler. As for 2), that's a very nice example. But I haven't seen anything similar for rewriting. I tried building my own, but gave up because it turned out not to be that useful. I came up with different debugging tools, none of which traces the work done by the computer.
g
@Konrad Hinsen, You touch upon a good point. My feeling about "what is a program" is somehow different and I'm struggling to put it into words. Observation / pondering: control flow is not data. Smalltalk's encapsulation of data is not sufficient to isolate control flow, like how Unix pipelines isolate control flow.
k
@guitarvydas Smalltalk is an interesting case. Control flow in Smalltalk is entirely implemented in terms of the method dispatch algorithm. That's similar to rewrite engines in that it's hidden from view. All you see in the code is messages sent around. That's somewhat unrelated to your observation that data encapsulation doesn't imply control flow encapsulation. Or maybe it is related. I am decided for now.
g
@Konrad Hinsen Aye, and there's the rub. In my books, Smalltalk does not do message-passing (!). Smalltalk does method-calling. Method-calling involves blocking. Blocking is usually implemented as a state machine that performs low-level synchronization. In my mind, message-passing is "fire and forget". Blocking mixed with FTL (faster-than-light) rewriting ("referential transparency") works when your medium is paper, but, is overly-restrictive when your medium is reprogrammable electronic machines (aka "computers"). Continued...https://programmingsimplicity.substack.com/p/control-flow-is-not-data?r=1egdky
FWIW, some brainstorming, trying to get back to the original question https://programmingsimplicity.substack.com/p/tree-current-node-brainstorming?r=1egdky