Any ideas? <https://mobile.twitter.com/FKohlgruebe...
# linking-together
f
b
You might want to search for ‘structured editors’ or ‘structured editing’ or ‘projectional editing’
f
I know Structured Editing, but what I'm looking for is something different. In SE, you perform edits on an AST directly. While this prevents syntax errors, it's also the reason why SEs have a reputation for being hard to use. What I'm looking for is a language that's edited as plain text but stored as a tree/graph once editing is done. Think of it as having a pretty printer that transforms your code into text when you open it for editing, and a parser that turns the (edited) text into a structure when you save your program.
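A minimal sketch of the "edit as text, store as tree" round trip described above, assuming a toy arithmetic language (integers, identifiers, `+`, `*`, parentheses). The names `parse`, `pretty`, `save` and `open_for_editing` are hypothetical, and storing the tree as JSON is an assumption for illustration only. The point it shows: two differently formatted texts produce the same stored form, and each reader can choose their own printing style when opening it.

```python
import json
import re

# Hypothetical toy language: integers, identifiers, '+', '*', parentheses.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[+*()])")
PRECEDENCE = {"+": 1, "*": 2}

def tokenize(text):
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(text):
    """Recursive-descent parser: text -> tree of JSON-friendly lists."""
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def atom():
        tok = take()
        if tok.isdigit():
            return ["num", int(tok)]
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok[0].isalpha() or tok[0] == "_":
            return ["var", tok]
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():  # '*' binds tighter than '+'
        node = atom()
        while peek() == "*":
            take()
            node = ["*", node, atom()]
        return node

    def expr():
        node = term()
        while peek() == "+":
            take()
            node = ["+", node, term()]
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree

def pretty(node, spaced=True, parent_prec=0):
    """Tree -> text, adding parentheses only where precedence requires them."""
    kind = node[0]
    if kind == "num":
        return str(node[1])
    if kind == "var":
        return node[1]
    prec = PRECEDENCE[kind]
    sep = f" {kind} " if spaced else kind
    # Right operand gets prec + 1 so left-associative chains print without extra parens.
    text = pretty(node[1], spaced, prec) + sep + pretty(node[2], spaced, prec + 1)
    return f"({text})" if prec < parent_prec else text

def save(text):
    """Parse the edited text and return the canonical stored form."""
    return json.dumps(parse(text))

def open_for_editing(stored, spaced=True):
    """Render the stored tree as text in the reader's preferred style."""
    return pretty(json.loads(stored), spaced)
```

For example, `save("1+2*x")` and `save("  1 +   2*x ")` return the same string, and `open_for_editing(...)` renders it as `1 + 2 * x` (or `1+2*x` with `spaced=False`). Formatting never reaches storage, so it stops being something a team has to agree on.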
y
In that case, maybe Unison
i
Maybe isomorf? https://isomorf.io
g
What's the advantage of not storing code as a text file? If parsing is fast enough, I don't see much of one.
I think ReasonML and refmt initially wanted to let individual devs on a shared project check out code and write in whatever syntax suits them best, while still storing it as a text file in git, but I'm not sure if they're still going down that path.
I think if you're trying to make a code storage DB that is more entangled with the code than simple text, then you trade off the ability to change the underlying semantics of the language. If your storage isn't tied in any meaningful way to the code, then you might as well use text?
Text languages can have multiple parsers for different contexts, and it might not make sense to unify all of that into one really complex system rather than keeping standalone tools.
I think an example I'm familiar with is Clojure's simple initial Java compiler implementation, and the various efforts to write Clojure-in-Clojure to replace it, which failed over time. Pieces of those are still used for some more complex tasks, but they can't beat the simple compiler for just compiling programs.
If you tried to tack more complexity onto Clojure's Java compiler, which was never designed for those use cases, you would fail in a different way. If you try to design a system for too many use cases from scratch, it will never ship.
f
@Gary Trakhman The short answer is separation of concerns. Using the same representation for everything requires compromises all over the place. Some examples:
- Because the presentation representation equals the storage representation, code style and formatting are shared between people. These are personal preferences, and requiring consensus leads to a lot of bikeshedding discussions.
- Changing the syntax of a PL breaks existing programs because execution depends on syntax. All mature languages have syntax they'd like to change but can't, because it would break existing code.
- Building accurate tools (static analysis, diff, ...) is difficult because it requires complex analysis of the source text. Valuable information, like the fact that a function has been renamed (in a diff) or how many usages a variable has (a static analysis use case), is encoded implicitly (and in a language-dependent way). A tree/graph could make this information explicit and easier to use.

I'm not against text representations in general, but for structured data like programs, a "natural representation" offers many benefits. Text does make sense for user interaction (screen, keyboard), but it should be interpreted only once and as soon as possible.
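To make the diff point concrete, here is a small sketch of a "semantic diff" over a stored representation, assuming (hypothetically) that each definition has a stable ID and that references point at IDs rather than names. `Definition`, `semantic_diff` and the IDs `f1`/`f2` are illustrative names, not an existing tool. Under that assumption a rename is a single change to one `name` field, instead of changed lines at the definition and at every call site as in a textual diff.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Definition:
    name: str               # display name, free to change
    refs: tuple = ()        # stable IDs of the definitions this one uses

def semantic_diff(old, new):
    """Compare two versions of a program, each a dict: stable ID -> Definition."""
    changes = []
    for def_id in sorted(old.keys() | new.keys()):
        if def_id not in new:
            changes.append(("removed", old[def_id].name))
        elif def_id not in old:
            changes.append(("added", new[def_id].name))
        else:
            before, after = old[def_id], new[def_id]
            if before.name != after.name:
                changes.append(("renamed", before.name, after.name))
            if before.refs != after.refs:
                changes.append(("references changed", after.name))
    return changes

# Renaming `area` leaves its caller `report` untouched, because the caller
# refers to the stable ID "f1", not to the name.
old = {"f1": Definition("area"), "f2": Definition("report", ("f1",))}
new = {"f1": Definition("surface_area"), "f2": Definition("report", ("f1",))}
```

Here `semantic_diff(old, new)` reports exactly one change, `("renamed", "area", "surface_area")`, which is the information a line-based diff leaves implicit.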
e
The only bidirectional graphic <==> text language on the future of computing spreadsheet that I see is Luna. That's the Polish project whose fairly large team has been trying for several years now to conquer this very ambitious task. In a conventional text language, you design a syntax (user space), which is mapped to an intermediate space (the AST), which is then mapped to some target language + runtime. So one is juggling 4 things at once, with 3 mappings: user -> AST, AST -> runtime + target, runtime -> target OS. Each new platform requires a new runtime, and platform conflicts force you to go back and change the runtime, which might cause the code generator to change. So there is a back-and-forth motion as the target creates back-pressure, sometimes all the way to the syntax. When you add the requirement that the textual code can be mapped to a nice-looking graphical form with user inputs controlling it, you add 2 more mappings: internal model (or text) -> graphical form, and user input -> internal model (or text). I think the evidence (the Eve and Luna projects) suggests that designing the graphical side before pinning down your underlying language is an expensive way to do it. During language evolution all of these layers change very frequently, and keeping them all working smoothly together will exhaust even the most enthusiastic team. I am a lone voice in the wilderness with this position, but I firmly believe one should pin down the programming model and the "source of truth" language first, and make sure it is solid before attempting to add graphics. Some projects like Jai can go backwards from the AST to the source code, as Jai's AST carries sufficient information, but most languages cannot reconstruct the source from the AST. In most languages the text form is the "source of truth".
f
@Edward de Jong / Beads Project I believe that pinning down a language just doesn't work. Programming languages constantly change, and traditional SW dev strategies like Waterfall don't work because you just can't build things perfectly on the first try. We should make changing our PLs as easy as possible so that improvements can be made. Our choice of the "source of truth" representation (the one used for storing programs) has a big impact on how easily our PL can evolve. Separate representations (separation of concerns) allow breaking changes in one representation (e.g. removing syntactic sugar from the user-facing syntax) without breaking other layers (e.g. the storage representation).
y
@Gary Trakhman
> I think if you’re trying to make a code storage DB that is more entangled to the code than simple text, then you trade off the ability to change the underlying semantics of the language.
Isn’t it the opposite? If your code is in a text file then you can’t change the syntax (or at least it’s costly as in the switch from Python 2 to Python 3), but if you have a custom database it could have a version identifier and the new version of your PL could automatically apply migrations.
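A sketch of the versioned-storage idea, assuming a hypothetical stored document of the form `{"version": n, "defs": [...]}` and a chain of migration functions, one per version step. The two migrations below (renaming a node kind `lambda` to `fn`, and adding an optional `type` field) are invented examples, not changes from any real language. Loading an old program simply runs every migration between its version and the current one.

```python
CURRENT_VERSION = 3

def v1_to_v2(doc):
    # Hypothetical change: node kind "lambda" was renamed to "fn" in version 2.
    defs = [{**d, "kind": "fn" if d["kind"] == "lambda" else d["kind"]}
            for d in doc["defs"]]
    return {"version": 2, "defs": defs}

def v2_to_v3(doc):
    # Hypothetical change: version 3 adds an optional type annotation to each definition.
    defs = [{**d, "type": d.get("type")} for d in doc["defs"]]
    return {"version": 3, "defs": defs}

# Maps a stored version to the function that upgrades it by exactly one step.
MIGRATIONS = {1: v1_to_v2, 2: v2_to_v3}

def load(doc):
    """Upgrade a stored program to CURRENT_VERSION, one migration at a time."""
    version = doc["version"]
    if version > CURRENT_VERSION:
        raise ValueError(f"stored with newer version {version}")
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version = doc["version"]
    return doc
```

With this design each migration only has to understand two adjacent versions, which is the cost Gary raises next: every old version's shape still has to be known somewhere.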
g
It's just that you'd have to maintain ASTs for all the versions, whereas one-off tools can be thrown away or worked on in isolation, uncoupled from the canonical representation.
Here is a somewhat tangential conversation about a similar issue: https://github.com/ocaml-ppx/ppx_deriving/issues/153
ocaml-migrate-parsetree is a project that tries to unify multiple AST versions so people can write tooling around it, and it's not always going that well.
I think at a root level I can make an analogy to text files vs RDBs. RDBs are great if you have, or want to have, complex queries, but they're not great if you're constantly changing the schema. Text files (CSVs, JSON) are great for generic tools like grep, but it's hard to relate data across them. I think the current state of things, where compilers version themselves and constantly parse files and recreate data structures in memory, is not a bad tradeoff.
I think optimizing for breaking changes in the syntax is kind of weird. It only becomes a bottleneck once languages get really popular, but they get that way by being really lightweight and convenient at the start.
y
@Gary Trakhman In Lamdu, changes in visual syntax do not require changing the database. And for the changes we made that did affect it, we implemented migrations.
g
neat, yea.
i
@Felix Kohlgrüber Just saw this. I am. It’s actually using a data structure that is a graph with multiple categories, where you can enter one from another, so a “syntax token” (which can be more than just a piece of text) can differ across multiple languages, and translation is just casting into that category’s proper structure. A bit hacky, but it works for now. I wonder what kind of features you would look for in tooling like that?
f
@Ian Rumac Are you talking about a project of yours? I'd be interested in details / links. I'm looking for a program representation that's a solid basis for building all kinds of tooling (e.g. static analysis, program transformation / synthesis, semantic diff, version control, dependency management, ...). My hypothesis is that a program representation has a big impact on the effort needed to create high-quality tools. For example, finding usages of an item is trivial if they are explicit edges in a graph data structure and much harder if they need to be decoded from a textual syntax. Text encodings don't seem to offer the abstractions needed for programs. Things like primitive types, sets, lists, links (both ambiguous like interfaces and unambiguous like hash-based links) and so on might be better abstractions to represent programs. Finally, even though I want another program representation for better tool support, retaining the ability to edit programs as text seems to be important. Keyboards are text entry devices after all and Structure Editors have shown for decades that moving away from text editing creates serious UX issues.
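A minimal sketch of the "usages as explicit edges" point, assuming a hypothetical `ProgramGraph` where every definition has a stable ID and each reference is stored as an edge, with a reverse index kept up to date as edges are added. The IDs and names are made up. Two unrelated definitions share the name `count`; a plain-text search cannot tell them apart, while the graph answers "who uses this one?" with a direct lookup.

```python
from collections import defaultdict

class ProgramGraph:
    """Hypothetical program representation: definitions as nodes, references as edges."""

    def __init__(self):
        self.names = {}                   # node ID -> display name
        self.refs = defaultdict(set)      # node ID -> IDs it references
        self.used_by = defaultdict(set)   # reverse edges: node ID -> IDs referencing it

    def add_definition(self, node_id, name):
        self.names[node_id] = name

    def add_reference(self, from_id, to_id):
        # Store the edge in both directions so "find usages" needs no search.
        self.refs[from_id].add(to_id)
        self.used_by[to_id].add(from_id)

    def usages(self, node_id):
        return sorted(self.used_by.get(node_id, ()))

    def ids_named(self, name):
        # Roughly what a textual search for `name` gives you: every match, ambiguous.
        return sorted(i for i, n in self.names.items() if n == name)

g = ProgramGraph()
for node_id, name in [("mod_a.count", "count"), ("mod_b.count", "count"),
                      ("main", "main"), ("report", "report")]:
    g.add_definition(node_id, name)
g.add_reference("main", "mod_a.count")
g.add_reference("report", "mod_b.count")
```

`g.ids_named("count")` returns both definitions, whereas `g.usages("mod_a.count")` returns only `["main"]`.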
i
Yes, my pet project, Lotus Lambda 😄 ! No links yet, I’ll try to have a demo up in a month or two tho
> My hypothesis is that a program representation has a big impact on the effort needed to create high-quality tools. For example, finding usages of an item is trivial if they are explicit edges in a graph data structure
Exactly! I represent it as a tree that transforms into graphs/lists/whatever your heart desires in the category you’re editing. It starts with defining data types, then queries/mutations over the data, then defining architecture blueprints, because all that data can be reused to generate code, manage versioning, diffing, client/server communication, wrapping/refactoring en masse, documentation, whatever you like. For now I don’t want to touch editing the AST and code; I want to generate wrapping around the code so that your code stays the same as it is now and you just edit the pure logic without worrying about the boilerplate and types and yadayada.
> Keyboards are text entry devices after all and Structure Editors have shown for decades that moving away from text editing creates serious UX issues.
I just learned that lesson 😅 I made the editor like this (pic) instead of doing text editing, and it introduced so many UX issues that fighting them cost me about 2 all-dayer-all-nighter weekends. Now I’m back to writing a text editor with forced structure; hopefully this or next weekend will get me back on track, I just need to write a parser.