# thinking-together
d
This seems like a different approach to VPLs. I found the beginning of the talk boring, and the presenter might not know enough about dataflow VPLs, but his approach is different from all the VPLs that I have seen. You can start at minute 12 with the following link if you want to skip the introduction:

https://youtu.be/edQyRJyVsUg?t=716

🤔 2
f
I found the talk he references at the end really interesting:

https://www.youtube.com/watch?v=K0Tsa3smr1w

👍 3
this repo by the presenter also looks great
i
Yeah, Shaun Lebron is one of my favourite people in the Clojure community. His work on Parinfer is just inspiring.
As for the OP's talk about Clojure editors... there's a moment in the talk where it goes off the rails and folks in the audience start throwing out criticisms of visual languages. One of them that seems really sticky is "We only have tools for working with text: git, etc."
• Graphics professionals have version control, diff, and the other tools needed for good source asset management. Some of these tools are awful, but some are downright fantastic.
• There's no reason you couldn't make a visual language that serializes nicely to a git- and diff-friendly text encoding. Max/MSP and Pure Data are 40% of the way there. Luna is 70% of the way there.
• It's bullshit that "can't use it with diff" is a nail in the coffin of visual languages, but "can't see live state" isn't a nail in the coffin of text languages. There's such a failure of imagination when it comes to programming tools. Thank goodness we have this community!
💯 1
👍 2
Now watching the Q&A at the end.. folks saying "Nobody has come up with a way to do recursion / iteration"
🤦
I'm glad I watched this talk, in that it gives me all the more resolve to just go build stuff, so that when people come up with these basic af criticisms of visual programming there are more counterexamples, and the discussion can get past this point to more interesting stuff like "Are there reasons to not treat the evolution of the state space as the time axis?"
🍩 1
d
Yeah, I hate the argument 'We only have tools for working with text'. Fuck it, we should NEVER evolve and create new tools, we should stick with the tools that we have forever!
🍰 1
f
@Ivan Reese Do you have concrete examples of vcs / diff / ... tools used by graphics professionals? I'd be interested to learn about these.
Text as a storage representation is indeed universal in the sense that text-based tools work with all data that's text based. Git works the same whether you work with Python source code, TeX files, or any other text-based data. A universal representation is important because it enables compatibility across tools and also means that users need fewer tools to achieve their goals. Just imagine having to learn a new version control system and editor each time you start using another programming language.

On the other hand, we're often dealing with complex structures that don't map to text well. Programs (in all languages I know) are nested tree structures, and a lot of complexity is caused by having to re-create this structure from primitive text. In a VPL, a natural representation would probably be some kind of graph structure, and I suspect that encoding this into a textual representation would be possible, but not really usable.

So what if we had a structured (tree/graph-based) universal representation? People could write general-purpose editors / version control systems / ..., and a lot of issues related to parsing / serialisation would become much easier, since the structure of the domain would map to the storage representation more naturally.

A lot of people seem to think that there's no alternative to text that's as flexible and universally usable. I instead see a lot of issues with text (parsing, encoding of presentation / code style / line limits, data types, efficiency, accuracy of diffs, ...) that could possibly be resolved by switching to more expressive hierarchical / node-based formats. A "universal" format is important, but that doesn't mean it has to be a primitive stream of characters.

Thanks for reading until the end. This topic probably deserves a blog post...
🍰 1
🍩 1
i
When we say stuff like...
Text as a storage representation is indeed universal in a sense that text-based tools work with all data that's text based.
... I know what you mean, but there's a subtle implication that (eg) all compilers operate on all programming languages. (Sit down, LLVM.) That is of course not true, and you make that point, but we need to be careful that we don't accidentally hold VPLs to that absurd standard of universality.
It's possible to make a universal interchange format for VPLs. You just need to throw away anything relating to the human-centric aspects of that language, and only store the computation-centric aspects. Of course, we don't want to do that, and we don't need to do that.
If you want git, and you want diff, all you need is a way to save your VPL program as a text file. Existing visual languages already do this — most of them use JSON. Nothing new needs to be invented to solve this problem. (Go ahead and invent new stuff to make this better — just don't let this "VPLs don't work with text tools like git/diff" argument stand unopposed.)
As for artist diff/VCS/etc. tools... This is one of the more popular image diff tools: https://www.kaleidoscopeapp.com. There's a bunch of them, with different UIs depending on what you're trying to learn by doing the diff. In my company, we care less about determining similarity between arbitrary images and more about tracking provenance, so we lean more heavily on VCS (which, for artists, is known as DAM: digital asset management). DAM is a bit different from what we're used to with VCS, where Git/GitHub dominate and hg/svn/Perforce/etc. float on the periphery. Adobe made a stab at a DAM with a tool called Bridge, but it had a really rocky start and didn't take over the way their other tools have. So instead of one dominant tool made by Adobe (or Autodesk), you have a lot of smaller players like Evolphin, Filecamp, and a few hundred others. Lots of companies also roll their own solution (that's what my company did), because there's a lot of benefit to integrating with your other custom tools and workflows, and the revision/derivation-tracking part isn't hard, because pretty much everything leans on the filesystem.
Smaller teams just use Dropbox, which gives you revision history and sync, and then just use something out-of-band for discussion/issues/etc. The bigger DAM tools just feel like overgrown versions of Dropbox with more power-user features.
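To make the "just save your VPL program as a text file" point concrete, here is a minimal sketch, assuming a hypothetical dataflow graph made of nodes (with layout positions) and wires. The format is invented for illustration and is not the one Max/MSP, Pure Data, or Luna actually use. The key property is determinism: nodes sorted by ID, keys sorted, one node per line, so editing or moving one node shows up as a one-line change in `git diff`.

```python
import json

def to_text(graph: dict) -> str:
    """Serialize a hypothetical {"nodes": {id: attrs}, "wires": [(src, dst)]} graph.

    Output is deterministic: nodes sorted by ID, keys sorted, one node per line,
    so editing or moving a single node changes a single line in a text diff.
    """
    lines = ["nodes:"]
    for node_id in sorted(graph["nodes"]):
        lines.append(json.dumps({"id": node_id, **graph["nodes"][node_id]}, sort_keys=True))
    lines.append("wires:")
    for src, dst in sorted(graph["wires"]):
        lines.append(f"{src} -> {dst}")
    return "\n".join(lines) + "\n"

def from_text(text: str) -> dict:
    """Parse the format written by to_text back into a graph."""
    graph: dict = {"nodes": {}, "wires": []}
    section = None
    for line in text.splitlines():
        if line in ("nodes:", "wires:"):
            section = line[:-1]
        elif section == "nodes":
            node = json.loads(line)
            graph["nodes"][node.pop("id")] = node
        elif section == "wires":
            src, dst = line.split(" -> ")
            graph["wires"].append((src, dst))
    return graph

g = {
    "nodes": {
        "num1": {"op": "num", "value": 2, "x": 10, "y": 20},
        "add1": {"op": "+", "x": 120, "y": 20},
    },
    "wires": [("num1.out", "add1.in0")],
}
text = to_text(g)
print(text)
# nodes:
# {"id": "add1", "op": "+", "x": 120, "y": 20}
# {"id": "num1", "op": "num", "value": 2, "x": 10, "y": 20}
# wires:
# num1.out -> add1.in0
```

Layout (`x`, `y`) is stored alongside the computation on purpose: the human-centric parts of the program survive the round trip instead of being thrown away.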
d
I have been asking myself lately: is git that great? Every time I do a code review I have to read the code and execute it in my mind. That's exactly what live programming solves for us, so maybe being able to see changes in a running system would be better, or a side-by-side comparison of the running system before and after the change.
❤️ 1
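A rough sketch of the before/after comparison idea, assuming the reviewer can run both versions of the code: run the old and new versions on the same sample inputs and show only the inputs whose results changed. The `effect_diff` helper and the two `shipping_cost` versions are hypothetical.

```python
from typing import Any, Callable, Iterable

def effect_diff(old: Callable[[Any], Any], new: Callable[[Any], Any],
                inputs: Iterable[Any]) -> list[tuple[Any, Any, Any]]:
    """Return (input, old_result, new_result) for every input whose result changed."""
    changes = []
    for x in inputs:
        before, after = old(x), new(x)
        if before != after:
            changes.append((x, before, after))
    return changes

# Hypothetical "before" and "after" versions of a function under review.
def shipping_cost_v1(weight_kg: float) -> float:
    return 5.0 if weight_kg < 1 else 9.0

def shipping_cost_v2(weight_kg: float) -> float:
    return 5.0 if weight_kg <= 1 else 9.0   # the boundary moved

for x, before, after in effect_diff(shipping_cost_v1, shipping_cost_v2, [0.5, 1, 2]):
    print(f"input {x}: {before} -> {after}")   # prints: input 1: 9.0 -> 5.0
```

The one-character source change (`<` to `<=`) is easy to skim past in a text diff; the effect diff points straight at the input whose behaviour changed.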
I really like git for tracking changes and seeing which files I have modified since I started working. But I think it could be better integrated into the editor or into a VPL.
👍 1
i
Yeah. It can be a "Yes, and" rather than a "No, but". We can make a VPL that works beautifully with git, that also works beautifully with DAM tools, that also works wonderfully with the "Versions" feature on Mac, that also has its own sync service, that also has a rich, persistent, forking undo/redo history system internally. The fact we don't have that yet is 100% due to execution, 0% due to inherent, fundamental differences between VPLs and text languages.
👍 1
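One item on that list, a persistent, forking undo/redo history, is a small data structure in its own right. A minimal sketch, assuming each edit is stored as a full snapshot of the document (a real editor would more likely store operations or deltas): undo moves to the parent revision, and editing after an undo creates a new branch instead of discarding the old redo path.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class Revision:
    state: str
    parent: Optional["Revision"] = field(default=None, repr=False)
    children: list["Revision"] = field(default_factory=list)

class History:
    """Undo tree: undo goes to the parent; editing after an undo forks a branch."""

    def __init__(self, initial: str):
        self.root = self.current = Revision(initial)

    def edit(self, new_state: str) -> None:
        rev = Revision(new_state, parent=self.current)
        self.current.children.append(rev)
        self.current = rev

    def undo(self) -> None:
        if self.current.parent is not None:
            self.current = self.current.parent

    def redo(self, branch: int = -1) -> None:
        # Default to the most recent branch; older branches stay reachable by index.
        if self.current.children:
            self.current = self.current.children[branch]

h = History("a")
h.edit("ab")
h.undo()
h.edit("ac")   # forks: "ab" is kept as a sibling branch, not thrown away
h.undo()
print([c.state for c in h.current.children])   # prints ['ab', 'ac']
```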
j
@Daniel Garcia What about this project seems new / different to you?
d
That the blocks aren't connected to show the data flow. They're just there, and the code below infers the connections, but never in a permanent way.
Around here @Joshua Horowitz

https://youtu.be/edQyRJyVsUg?t=1234

I'm not sure I like it, but it's just a different idea
j
Agreed! Thanks for pointing that out.
s
Glad to join... text files are one of my favorite topics to rant about (ahem, discuss) 😄. @Daniel Garcia I agree it would be more useful to have multiple views of the diff: instead of just a source diff, we also want one or more 'effect diffs'. Maybe even let folks pin views they find useful to commits? One point wrt git that didn't come up above is that it's not just 'text' based, it's 'text file' based. So it has no notion of the identity of entities within a file; only file identity is preserved. For text languages we come up with all kinds of hacky/clever diff algorithms to guess which lines map to which other lines, but wouldn't it be more useful to have a proper history of finer-grained entities (functions, etc.)? I believe this can be simulated for non-text PLs by serializing some IDs, or perhaps just by using files for identity preservation. We have to give up the idea that the text diff view of this serialized format must be useful; you'd always want to look at it through your custom viewer. I think this is fine.
👍 1
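A minimal sketch of entity-level history via serialized IDs, assuming every top-level definition carries a stable ID that the editor preserves across edits (the IDs and the `{id: (name, body)}` snapshot shape are hypothetical). Because identity is explicit, a rename is reported as a rename of the same entity, not guessed at from which lines look similar.

```python
def entity_diff(old: dict[str, tuple[str, str]],
                new: dict[str, tuple[str, str]]) -> list[str]:
    """Compare two snapshots mapping stable entity ID -> (name, body)."""
    report = []
    for eid in sorted(old.keys() | new.keys()):
        if eid not in new:
            report.append(f"removed {old[eid][0]}")
        elif eid not in old:
            report.append(f"added {new[eid][0]}")
        else:
            (old_name, old_body), (new_name, new_body) = old[eid], new[eid]
            if old_name != new_name:
                report.append(f"renamed {old_name} -> {new_name}")
            if old_body != new_body:
                report.append(f"changed body of {new_name}")
    return report

before = {"f1": ("area", "w * h"), "f2": ("perimeter", "2 * (w + h)")}
after = {"f1": ("rect_area", "w * h"), "f3": ("diagonal", "(w**2 + h**2) ** 0.5")}
print(entity_diff(before, after))
# ['renamed area -> rect_area', 'removed perimeter', 'added diagonal']
```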
i
You could get around the "git is for files, not functions" problem by making a text editor that just treats whitespace lines as file boundaries. Boom, function-level granularity for git. </sarcasm>
e
If you look at computer hardware, we have either ARM or Intel architecture CPUs in 99.9% of all computers. This architecture is, at its core, a one-dimensional instruction pointer flying at light speed through the code segments. It hops via branches, and pushes/pops in and out of subroutines. This jumpy nature is extremely hard to turn into simple motion, so to be understandable, a visual language must stray very far from the machine. This is one of the reasons visual languages have been such a bear. It is an intrinsically very difficult problem: the closer one gets to the CPU, the more jumpy and chaotic things appear, so some serious tradeoffs and abstractions have to be introduced. There are aspects of program development that are very amenable to graphical representation, and I am of the opinion that a graphical approach can be of benefit, especially for tweaking parameters to make things look nice and for quickly laying out interfaces. There are wonderful prototyping tools that really should just go the extra mile and generate good code.
g
I'd be curious to see existing solutions for merging VPLs. Git works partly because teams generally split their code into smaller files, which breaks up the merging problem a lot. Most VPLs I've worked with dump the entire tree of code into one file, which isn't super friendly for merging with others.

Another issue seems to be how easy it is to fix conflicts. This is often relatively easy for text-based code. In my experience it gets harder the more structure there is to the data. References from one VPL node to another are often via GUIDs or other paths. I haven't thought about why I seem to run into those issues less with text code. Maybe it's because those connections are expressed less often in text code? Or written in a way that's less brittle, probably because they're harder to express in text in the first place?

As for images, I've never seen an image merge program, only image comparison programs. They are used to pick one image over another, not to resolve merge conflicts.

The sarcastic suggestion that every function should be considered a separate file might actually point to a possible solution. In games this problem comes up a lot, because the level data (where all the objects are in a level) is often either some binary format or JSON/XML etc., and isn't easily mergeable. One team's solution was to store levels as a collection of files, one file per object; they claim this helps with merging. http://the-witness.net/news/2011/12/engine-tech-concurrent-world-editing/ Maybe a similar solution would help VPLs?
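A sketch of what the one-file-per-object idea could look like for a VPL, assuming a hypothetical graph of nodes keyed by stable, filename-safe IDs. The `nodes/<id>.json` layout is an assumption for illustration, not the format described in the linked post. Two people editing different nodes then touch different files, which Git merges without conflicts.

```python
import json
import tempfile
from pathlib import Path

def save_graph(nodes: dict[str, dict], root: Path) -> None:
    """Write each node to <root>/nodes/<id>.json and delete files of removed nodes.

    Assumes node IDs are stable and safe to use as file names.
    """
    node_dir = root / "nodes"
    node_dir.mkdir(parents=True, exist_ok=True)
    for path in node_dir.glob("*.json"):
        if path.stem not in nodes:
            path.unlink()  # the node was deleted in the editor
    for node_id, node in nodes.items():
        text = json.dumps(node, sort_keys=True, indent=2) + "\n"
        (node_dir / f"{node_id}.json").write_text(text)

def load_graph(root: Path) -> dict[str, dict]:
    return {p.stem: json.loads(p.read_text()) for p in sorted((root / "nodes").glob("*.json"))}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    save_graph({"n1": {"op": "num", "value": 2}, "n2": {"op": "+"}}, root)
    save_graph({"n1": {"op": "num", "value": 3}}, root)  # n2 deleted, n1 edited
    reloaded = load_graph(root)
    files = sorted(p.name for p in (root / "nodes").iterdir())

print(files)     # prints ['n1.json']
print(reloaded)  # prints {'n1': {'op': 'num', 'value': 3}}
```

This only moves the merge boundary to node granularity; two people editing the same node still conflict, and references between nodes (the GUID problem above) are not checked.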
f
@gman I think the core problem with diffing/merging (V)PL code is the mismatch between the data structure of the code and the data structure the diff works on. The data structure that most naturally represents code is some kind of tree/graph. This data structure is usually stored by encoding it in a binary or textual (JSON/XML) format. The encoded data structure is a flat string / array of bytes that still contains all the information of the initial data structure, but needs to be decoded/parsed to use that information. We then usually take these encoded data structures and perform diffs on them. This works well for changes in the initial data structure that only lead to local changes in the encoded data structure (e.g. changing a `1` literal to a `0`), but fails in more complex cases (e.g. changing the order of elements, wrapping an element, ...). It's like comparing two strings by encoding them as images first and performing a pixel-by-pixel diff on the images afterwards: even small changes like adding a character would lead to large differences in the encoded images.

We should really move past text-based diffs and use AST / data structure diffs instead IMO. It's also interesting to note that more accurate diffs also reduce the number of conflicts. For example, if user A adds a parameter to a function and user B changes its return type, a line-based diff would yield a merge conflict (typically both edits touch the same signature line). A diff that works on an AST-like structure could detect these changes accurately and combine them without producing a conflict.

Splitting code into smaller files seems like a hack to me. It forces git to only search for changes locally (within the file), which might indeed help get smaller diffs for local changes. On the other hand, non-local changes like combining two files into one become even harder. If we performed diffs on data structures instead, local changes would lead to small diffs quite naturally...
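The parameter/return-type example can be sketched as a three-way merge over the fields of a structured signature instead of over lines. The `Signature` record is a hypothetical stand-in for an AST node; the per-field rule (take whichever side changed a field, conflict only if both changed it differently) is the usual three-way merge rule, just applied to fields.

```python
from dataclasses import dataclass, fields, replace

@dataclass(frozen=True)
class Signature:
    name: str
    params: tuple[str, ...]
    returns: str

def merge_field(base, ours, theirs):
    """Standard three-way rule for a single field."""
    if ours == theirs or theirs == base:
        return ours
    if ours == base:
        return theirs
    raise ValueError(f"conflict: {ours!r} vs {theirs!r}")

def merge_signature(base: Signature, ours: Signature, theirs: Signature) -> Signature:
    merged = {f.name: merge_field(getattr(base, f.name), getattr(ours, f.name),
                                  getattr(theirs, f.name))
              for f in fields(Signature)}
    return Signature(**merged)

base = Signature("area", ("w", "h"), "int")
user_a = replace(base, params=("w", "h", "scale"))  # adds a parameter
user_b = replace(base, returns="float")             # changes the return type
print(merge_signature(base, user_a, user_b))
# Signature(name='area', params=('w', 'h', 'scale'), returns='float')
```

Treating `params` as one atomic field is a simplification: if both users added different parameters, this would still conflict, and merging the parameter list itself would need a sequence-level merge.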
s
We should really move past text-based diffs and use AST / data structure diffs instead IMO.
Yeah, agreed. This comes up every now and then so that it almost feels like we all agree on this, but yet I haven’t seen this anywhere. Why is that? Is it too hard to implement? Are there implementation challenges that I can’t see (because I haven’t tried)? Or maybe is there a good solution I just haven’t heard about that just needs to become more popular? Or is it because it’s too daunting with all the text-based tools (git etc.) in place that nobody really believes we can pull this off? This seems like a smaller, much more manageable version of the grand “let’s revolutionize programming” problem, so if that is not happening for whatever reason I wonder if there are any insights to gain for the even bigger challenges we discuss here.
👏 1
d
We all agree on this, but yet I haven’t seen this anywhere. Why is that?
I'll add some dissenting opinion for you 😄
❤️ 1
AST Diff
I've tried using diffs of ASTs. The issue is that an AST representation tends to be verbose, both structurally and with an excess of redundant metadata that is otherwise perfectly comprehensible in the underlying text. The thing people may be missing is that the underlying text is the most elegant and human-readable version of the behaviour that the language designer and programmer could achieve. If the AST were clearer, it would be the underlying text. Just like @ says about diffing VPL text serialisations, AST visualisation is not WYSIWYG. It takes a lot of effort to look at an AST and actually match it to the code you wrote. I think the potential value is as an information layer on top of the text diff. You can use AST information to indicate that a value's type has changed in a passage of code despite the underlying text not changing. This, however, is very much in the realm of "neat idea", i.e. I can't recall a situation where it would have been super valuable.
🤩 1
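Python's standard `ast` module makes the verbosity point easy to check: dumping the tree for a one-line statement produces text many times longer than the source, full of node types and load/store context that the source conveys implicitly. The exact dump text differs between Python versions, so no expected output is shown here.

```python
import ast

source = "total = price * (1 + tax)"
dump = ast.dump(ast.parse(source))

print(len(source), len(dump))  # the dump is several times longer than the source
print(dump)                    # Module(body=[Assign(targets=[Name(id='total', ctx=Store())], ...
```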
Semantic Merge
A related idea that often comes up, and that I have been disappointed with: it didn't enable anything too useful. At the low level, semantic constraints make more things unmergeable, e.g. you can't merge a boolean and a number value. I haven't been able to make semantic knowledge about higher-level constructs produce a better result. You might be able to ensure syntactic correctness, but the real problem at the heart of merging is preserving intent. I have found that broken text can better preserve conflicting intent, and provide a better basis for a resolution, than coerced correct syntax. I think this is because:
- merging text = merging the actual material that is changed
- merging ASTs = merging a higher-level interpretation of the material
Surprisingly, I believe the underlying material that people are editing better captures intent, and the high-level interpretation can disguise it. I think AST merge lets you identify more types of conflict but rarely new types of resolution. If you want radically better merge, you need radically better information about the intent of the change in addition to the change itself. This is something that could be achieved by a new class of editor that captures more about a user's editing intent in addition to the actual changes to the AST or underlying material.
💡 1
m
It takes a lot of effort to look at an AST and actually match it to the code you wrote.
To be honest, it takes me some effort to read ordinary text diffs too; the highlighting that GitHub and Git UI clients add improves this a lot. So maybe by "AST diffs" we should always imply the ability to render them nicely?
d
The typical thing you lose in an AST visualisation is infix notation and layout. It's much worse than syntax highlighted code.
Take those away and math/logical expressions become barely readable.
f
@Stefan Good questions. I guess the reasons are somewhat historical / cultural (for example, the Unix Philosophy: "Avoid stringently columnar or binary input formats"). A lot of people have probably tried working with binary formats and failed because tools for viewing, searching, editing, diffing, ... weren't available. It also seems like most binary formats are very domain specific, which indeed requires custom tools for the tasks I listed before. These formats also can't be used beyond their original use cases, which means they're not universal. So the interesting question to me is whether it'd be possible to create a "universal" general-purpose binary format that could be used to encode domain specific data structures. Something that allows encoding trees, graphs, lists, primitive types, ... would be nice. You could then write tools for editing, searching, diffing, ... these "universal" files. Applications could encode their data structures in this format like they currently do with text. I don't know whether something like this exists, but even if it did, you'd still have to convince people to use it. This is probably the hardest part.
d
@Felix Kohlgrüber you are basically talking about knowledge graphs. You might like to spend some time with semantic web technologies (rdf, sparql etc.) to understand the horror 😈 IMHO it's an example of how increasing generality/universality makes solving valuable problems harder, not easier.
❤️ 1
😭 1
💡 2
f
@duncanawoods (replying to AST Diff) I agree that ASTs aren't a good option for representing code to the user, but this may be because they weren't designed with that use case in mind. ASTs are a representation suited for the compiler, not for a human. I wonder what a "user-centric program representation" would look like. Off the top of my head: keep comments / docs; only nest as much as necessary (1+2+3 as one node with three children instead of nested binary nodes); ... I don't know whether this has been explored in academia or anywhere else, but please leave a link below if you know of something. Also, you could still use a presentation layer on top of this data structure that renders it in the most readable form. This would allow much more flexibility in what presentations can be used, and would also put an end to debates about code style / formatting, etc. Style / formatting should be a user setting, not something that's part of the program representation.
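The "1+2+3 as one node with three children" idea can be sketched on top of Python's `ast` module, which parses `1 + 2 + 3` as nested binary `BinOp` nodes. The hypothetical `flatten_sum` below collapses the left-leaning chain into one n-ary node (a plain tuple here) that a presentation layer could render however the user prefers. It only splices the left operand, because Python's `+` groups left to right; an explicitly parenthesised `1 + (2 + 3)` keeps its own node. Requires Python 3.9+ for `ast.unparse`.

```python
import ast

def flatten_sum(node: ast.AST):
    """Turn chains of BinOp(Add) into ("+", [operand, ...]); other nodes become source text."""
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left = flatten_sum(node.left)
        right = flatten_sum(node.right)
        # Splice only the left side: (a + b) + c is the same chain as a + b + c.
        operands = left[1] if isinstance(left, tuple) else [left]
        return ("+", operands + [right])
    return ast.unparse(node)  # leaves keep their source text

print(flatten_sum(ast.parse("1 + 2 + 3", mode="eval").body))    # prints ('+', ['1', '2', '3'])
print(flatten_sum(ast.parse("1 + (2 + 3)", mode="eval").body))  # prints ('+', ['1', ('+', ['2', '3'])])
```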
@duncanawoods (replying to Semantic Merge) Hmm... I have trouble thinking of a concrete example where semantic merge conflict resolution is worse than text-based merge resolution. Could you provide one? Thanks. I agree that capturing the intent of a change is an interesting option as well. CRDTs and friends look interesting, but I'm not an expert in that domain.
d
@Felix Kohlgrüber (AST Diff) It feels like the concept reduces down to text diffs with syntax highlighting and gofmt type tools which is pretty much where we are now.
f
I'll check knowledge graphs out, thanks for the tip!
s
Thanks, @duncanawoods and @Felix Kohlgrüber — these are great points. On that note: I do know http://xmailserver.org/diff2.pdf, but can you point me to papers or articles about diff algorithms for graphs or trees?
d
@Stefan (graph diff) I have looked for graph diff/merge without any luck. In my case I used Operational Transform for diff/merge across graph data structures. I could use users' edit histories, so I didn't need a graph equivalent of longest-common-subsequence and had a better chance of preserving intention.
@Felix Kohlgrüber
a concrete example where semantic merge conflict resolution is worse than text-based merge resolution
The typical "worse" result would be a type conflict leading to a dead end, whereas a text merge can just smush things together. It might be wrong, but it's closer to the solution. I'll try to give the simplest example. Assume a three-way merge within lines. Original:
var x = 3;
User A:
var x = 4;
User B:
var x = "3";
Text merge:
var x = "4";
Semantic merge: conflict, can't merge a string literal and a number literal.
Essentially, I needed pairwise merge rules between different types of semantic transformation, whereas text merge only needs to implement one pairwise merge rule for all text edits. In this case, we could combine text and number edits by stringifying numbers, but doing that kind of thing is relaxing semantic constraints rather than taking advantage of them, so it now seems like a hard way to do a dumb text merge.
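The example can be reproduced with a deliberately naive sketch: a character-level three-way merge built on Python's `difflib.SequenceMatcher`, next to a "semantic" merge that treats the literal as one typed `(kind, value)` node. Both functions are hypothetical illustrations, not how Git or any particular semantic-merge tool works. Git's default merge is line-based, so it would report a conflict here because both users changed the same line.

```python
from difflib import SequenceMatcher

Edit = tuple[int, int, str]  # replace base[i1:i2] with the given text

def edits(base: str, other: str) -> list[Edit]:
    """Character-level edits that turn `base` into `other`."""
    ops = SequenceMatcher(None, base, other, autojunk=False).get_opcodes()
    return [(i1, i2, other[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]

def clashes(e: Edit, f: Edit) -> bool:
    """Do two edits (one from each side) touch the same base characters?"""
    if e == f:
        return False                       # both sides made the same change
    (a1, a2, _), (b1, b2, _) = e, f
    if a1 == a2 and b1 == b2:              # two insertions at the same spot
        return a1 == b1
    if a1 == a2:                           # insertion strictly inside f's range
        return b1 < a1 < b2
    if b1 == b2:
        return a1 < b1 < a2
    return max(a1, b1) < min(a2, b2)       # overlapping replaced ranges

def merge3(base: str, ours: str, theirs: str) -> str:
    """Naive character-level three-way text merge."""
    mine, yours = edits(base, ours), edits(base, theirs)
    for e in mine:
        for f in yours:
            if clashes(e, f):
                raise ValueError(f"text conflict: {e} vs {f}")
    out, pos = [], 0
    for i1, i2, text in sorted(set(mine) | set(yours)):
        out.append(base[pos:i1] + text)
        pos = i2
    return "".join(out) + base[pos:]

def merge_literal(base: tuple, ours: tuple, theirs: tuple) -> tuple:
    """Semantic merge of an atomic (kind, value) literal node."""
    if ours == theirs or theirs == base:
        return ours
    if ours == base:
        return theirs
    raise ValueError(f"semantic conflict: {ours} vs {theirs}")

print(merge3("var x = 3;", "var x = 4;", 'var x = "3";'))  # prints var x = "4";
try:
    merge_literal(("number", 3), ("number", 4), ("string", "3"))
except ValueError as err:
    print(err)  # prints semantic conflict: ('number', 4) vs ('string', '3')
```

The text merge succeeds only because insertions at the edge of a replaced range (the two quotes around the replaced digit) are not counted as clashes; that rule is the "smush things together" behaviour described above.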
s
Oh, just read about this: https://github.com/apple/swift-evolution/blob/master/proposals/0240-ordered-collection-diffing.md — haven’t looked at it yet, but looks like I just answered my own question.
i
... Just gonna pop in here and say... RDF and related ideas (like EAV) get short shrift! The semantic web was a good idea.. * smoke bomb *
... * un-smoke bomb *
So the interesting question to me is whether it'd be possible to create a "universal" general-purpose binary format that could be used to encode domain specific data structures.
UTF-8? * re-smoke bomb *
💣 3
f
@duncanawoods Thanks for the example!
s
> We should really move past text-based diffs and use AST / data structure diffs instead IMO.
Yeah, agreed. This comes up every now and then so that it almost feels like we all agree on this, but yet I haven’t seen this anywhere. Why is that?
I think you can't retrofit this idea onto systems built around the plain text medium. For any language, the language designer has put a lot of thought into the text syntax with specific goals such as comprehension. How editors and version control systems treat the text files is not something that is designed at all; it's already established, and an important context that the designer works within. The AST is designed, but not for the same purpose as the surface syntax.

When talking about AST/graph-based versioning, we're really talking about a context shift: a deeper shift in the primary surface medium, i.e. some structured editing medium instead of plain text blob editing. Really we want a top-to-bottom redesign here: languages designed for structured editing combined with a common medium and tooling for viewing and manipulation. I don't think anyone has nailed this yet with the fluidity and flexibility needed for multi-purpose expression.

As an example of a frame with a different medium: if we were to say we're going to implement a language that expresses programs as 'spreadsheets', we'd immediately think about what the rows here or columns there could represent, and not about a high-level plain text syntax. Diffs and versioning also fall out of the medium: we'd think of cell-oriented diffs.

Another aspect of structured editing is that versioning could be integrated with the editing experience. The idea is that you want to preserve the identity of the cells/nodes/items while you manipulate them. The potential for semantic merging does seem higher than with text. E.g. you might be able to encode that multiple items added to an entity are all merged, and that global identifier renames are merged properly with other changes. But we'd want any custom merge logic to also be embedded in the medium.
👍 3
s
Some more resources I found:
• [Fine-Grained and Accurate Source Code Differencing: GumTree](http://courses.cs.vt.edu/cs6704/spring17/slides_by_students/CS6704_gumtree_Kijin_AN_Feb15.pdf)
• [Algorithms for Graph Similarity and Subgraph Matching](https://www.cs.cmu.edu/~jingx/docs/DBreport.pdf)
• [A Differencing Algorithm for Object-Oriented Programs](https://www.cc.gatech.edu/~orso/papers/term.orso.harrold.ASE04.pdf)
• And then there is the git data model that's potentially interesting as well as CRDTs, which have tons of papers.
👍🏼 1
a
To answer @Stefan's original question as to why AST diffs aren't more widely used, I'd offer a few points.

First: I've tried doing this before, and some of the less intuitive technical challenges come as the structure of the ASTs changes over time. Your diff and merge algorithms then need a notion of migrations between each subsequent version of the language; even additions of new types / relationships can cause a problem for the algorithms. This unfortunately brings along a terrible scaling characteristic: the algorithms for each language get much more complex as time goes on, which is never something you want to build in. Meanwhile, Git works the same way on every type of text file, and it hasn't changed much in 10+ years of reliable operation.

Second: you need to consider what problem this is actually solving and what representation the user is expected to work on. Is it a text reconstruction of the AST you check in, or is it the AST itself? If it's text, then you likely have to store / diff that too, or formatting is lost on each commit. So all you do is bolt complexity onto Git. Also, it's non-trivial to regenerate code from an AST, especially code you'd want to work with as a developer.

In general, complex solutions are hard to maintain, collaborate on, and advance. With dozens of languages to support and thousands of versions, it's no surprise that text has won. Think in terms of natural selection: the same level of fitness for a fraction of the cost is going to win out over time.

I think AST diffs will be important in the future, but not for SCM use cases. They'll be used to help people code alongside generators without worrying about losing their manual contributions. That domain is sufficiently constrained and provides enough value that I think we'll see production tools using it within 4-10 years.
👍 2
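A minimal sketch of the migration problem: if stored ASTs carry a schema version, the diff/merge tooling has to upgrade both sides to the same version before comparing them, and every language release adds another step to the chain. The node shapes, version numbers, and migration functions below are invented for illustration, and only a single top-level node is handled to keep it short.

```python
from typing import Callable

# Hypothetical history of one language's AST schema:
#   v1 -> v2: "Call.args" renamed to "Call.arguments"
#   v2 -> v3: new "Call.type_args" field, defaulting to []
def v1_to_v2(node: dict) -> dict:
    node = dict(node)
    if node["kind"] == "Call":
        node["arguments"] = node.pop("args")
    return node

def v2_to_v3(node: dict) -> dict:
    node = dict(node)
    if node["kind"] == "Call":
        node.setdefault("type_args", [])
    return node

MIGRATIONS: dict[int, Callable[[dict], dict]] = {1: v1_to_v2, 2: v2_to_v3}
LATEST = 3

def upgrade(node: dict, version: int) -> dict:
    """Apply every migration between `version` and LATEST, in order."""
    while version < LATEST:
        node = MIGRATIONS[version](node)
        version += 1
    return node

old_commit = {"kind": "Call", "callee": "f", "args": ["x"]}  # saved with schema v1
new_commit = {"kind": "Call", "callee": "f", "arguments": ["x", "y"], "type_args": []}  # v3

upgraded = upgrade(old_commit, 1)
changed = sorted(k for k in upgraded.keys() | new_commit.keys()
                 if upgraded.get(k) != new_commit.get(k))
print(upgraded)  # prints {'kind': 'Call', 'callee': 'f', 'arguments': ['x'], 'type_args': []}
print(changed)   # prints ['arguments']
```

Without `upgrade`, a naive field comparison would report `args` removed and `arguments` and `type_args` added, which is exactly the kind of spurious difference a line-based tool never has to think about.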
s
Can we rely on identity-preserving manipulations in the tooling instead of general graph diffing? Consider that we don't use diffing to determine which data record was changed in a database; the model there is that we're modifying existing structure via targeted commands. Same thing for programs, or whatever we are representing. Since we're manipulating through a representation, clicking or typing on stuff, could we just preserve the identity of the objects being manipulated, and the intent of the author, by appending to the original structure?
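A minimal sketch of that idea, assuming the editor records every targeted command against a stable object ID (the `create`/`set`/`delete` command vocabulary is hypothetical). The current program is just the replay of the append-only log, and "what changed since revision N" is read straight off the log, with no after-the-fact diffing.

```python
from typing import Any

Op = tuple[str, str, str, Any]  # (command, object_id, field, value)

def replay(log: list[Op]) -> dict[str, dict[str, Any]]:
    """Rebuild the current objects by applying the command log in order."""
    objects: dict[str, dict[str, Any]] = {}
    for command, oid, fld, value in log:
        if command == "create":
            objects[oid] = {}
        elif command == "set":
            objects[oid][fld] = value
        elif command == "delete":
            del objects[oid]
        else:
            raise ValueError(f"unknown command {command!r}")
    return objects

log: list[Op] = [
    ("create", "n1", "", None),
    ("set", "n1", "name", "area"),
    ("set", "n1", "body", "w * h"),
    ("set", "n1", "name", "rect_area"),  # a rename keeps the identity n1
]
print(replay(log))  # prints {'n1': {'name': 'rect_area', 'body': 'w * h'}}

# Objects touched since revision 3 (i.e. after the first three commands):
touched = {oid for _, oid, _, _ in log[3:]}
print(touched)  # prints {'n1'}
```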