# thinking-together
p
We've published another research report that might be of interest to this audience. It's an exploration of what "local-first" software might look like, why it's important, and some of our explorations therein. https://www.inkandswitch.com/local-first.html
👍 1
❤️ 9
s
Very excited to dig into this! You Ink & Switch folks know how to publish long articles... Will find time for it soon
s
Great write-up! It captures particularly well what we lost by moving to the cloud. I hope for a future where not just the data but also the tools we use to view and manipulate it, as well as its 'sources', live interlinked in the same underlying distributed system.
With the Kanban app, for example, one user could post a comment on a card and another could move it to another column, and the merged result will reflect both of these changes.
Want this, extrapolated to 'code'.
r
This is one of my favorite articles in recent memory, thanks for sharing it and to everyone who put it together! The burning question for me is whether market forces will push cloud/web application developers to support local data storage. I tend to think not: larger companies, who are the main drivers in funding software development, will choose collaboration features over local data storage features. I'd even go further and say that, from their perspective, removing local data storage is probably a benefit, because it helps protect intellectual property.
A simple thought experiment that shows how difficult it is to combine collaboration and local data storage:
1. I have a file and a URL that anyone can use to edit it.
2. I edit that same file myself offline.
3. Someone else deletes that file online.
4. I go online and the file gets deleted locally via sync.
5. Without going back online, how do I get back the version of that file with my changes?
I'd argue that until there's an answer to that question, it's not an adequate local data storage solution. (If the paper has a solution that I missed, please point it out!)
Just to be crystal clear about why that's so important: if I can't do that, then the data can be deleted out from under my nose (via sync), and I have no reliable way to get it back (because being able to access an older online version isn't something I have full control over). The only local data storage solution I care about is one where I have 100% control over whether I can access the data, whenever I want, forever.
s
If we're keeping the entire history (like git), recovering the file might not be too hard? The article talks about multiple parallel versions and you being able to reject incoming changes. https://hyp.is/cxLjzmztEem5O6v508vzkw/www.inkandswitch.com/local-first.html
This is also highlighted as a problem needing more research in the HCI section. If a document doesn't have one authoritative version, what does it mean?
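A rough sketch of what that recovery could look like, assuming a local store that keeps every version (hypothetical Python, not anything from the article):

```python
# Hypothetical sketch: a local store keeps every version of a file.
# A remote delete syncs down as just another history entry (a tombstone),
# so an offline edit is never destroyed and can always be restored.

class VersionedFile:
    def __init__(self):
        self.history = []  # append-only (author, content); content=None is a delete

    def edit(self, author, content):
        self.history.append((author, content))

    def delete(self, author):
        self.history.append((author, None))  # tombstone, not destruction

    def current(self):
        return self.history[-1][1] if self.history else None

    def restore_latest_by(self, author):
        # Walk history backwards to find this author's last real content.
        for who, content in reversed(self.history):
            if who == author and content is not None:
                self.edit(author, content)  # restoring is itself a new change
                return content
        return None

f = VersionedFile()
f.edit("me", "my offline edits")  # I edit offline
f.delete("someone else")          # their delete syncs down
assert f.current() is None        # the file looks gone...
f.restore_latest_by("me")         # ...but I can recover it, fully offline
assert f.current() == "my offline edits"
```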
r
Yeah, I guess an automatically sync'd git repo is probably the best solution we currently have. That works decently well, but it surfaces a needle-in-a-haystack problem: it's hard enough to find the version of a file you're looking for with good, clean commit messages written by a human; it'll be even harder with automatically generated ones.
s
If I squint a bit, the model this leads to looks like an append-only log of all changes. Each person has their own local logs, which they can replicate and back up any way they choose. Other artifacts such as 'documents' are just materialized views over a subset of merged logs from different people. In this way, different versions of a doc are views that select different trails (transclusions?).
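As a sketch of that shape (hypothetical Python, none of these names come from the article):

```python
# Hypothetical sketch: each person owns append-only logs of changes; a
# "document" is a materialized view over whichever log entries you select.

class Log:
    def __init__(self, owner):
        self.owner = owner
        self.entries = []  # (doc_id, op), append-only

    def append(self, doc_id, op):
        self.entries.append((doc_id, op))

def materialize(doc_id, logs, select=lambda log, entry: True):
    """Build one version of a document by replaying a chosen trail.
    A real system would order concurrent ops with a CRDT, not log order."""
    return [op for log in logs
               for (d, op) in log.entries
               if d == doc_id and select(log, (d, op))]

alice, bob = Log("alice"), Log("bob")
alice.append("kanban", "comment on card 'report'")
bob.append("kanban", "move card 'report' to Done")

merged = materialize("kanban", [alice, bob])  # both trails, one doc
mine = materialize("kanban", [alice, bob],
                   select=lambda log, e: log.owner == "alice")
print(merged)  # a merged version of the doc
print(mine)    # a different version: just alice's trail
```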
💯 1
r
(PS I couldn't see the annotation here, it says I have to be logged in? https://via.hypothes.is/https://www.inkandswitch.com/local-first.html#annotations:cxLjzmztEem5O6v508vzkw)
s
Sorry, it's just a highlight, not any other annotation. Highlights the paragraph "Most CRDT research..."
🙏 1
Google Docs edits have zero commit messages. Yes, I can 'view history' and see who changed what part of the text. I don't think this is too bad. Larger changes being grouped under a commit message definitely seems useful, though.
r
Yeah, I think there are two competing ideas here: Collaboration and local data storage. The Google Docs model is great for collaboration (everyone always sees the most up-to-date document) but bad for data storage (someone can just delete something that's important to me). I'm trying to figure out if there's a solution that solves both of these well.
👍 1
Personally I prefer the git model to the Google Docs model: when and how I sync is entirely up to me. That's a model that prioritizes local data storage. But the obvious problem with git and other version control systems is that they're hard to use, and probably only really viable for programmers.
s
I like git/hg but the manual work required often gets to me. I work on two machines and every time I switch there's some pull/push tedium.
I'd also like to keep many changes private, only publishing the 'rolled-up' change as a unit - can't do that.
r
Yeah, agree 100% about all the manual work. I mitigate this by having tons of tiny scripts to automatically do some repo maintenance, but 1. asking most people to do this is ridiculous, and 2. even with the scripts it's still a PITA.
Practically, the git model just doesn't scale for most people beyond the handful of things they primarily work on. E.g., asking people to manage their whole hard drive via git would be insane, which illustrates the system's scalability flaws.
(Side note, not 100% sure what you mean by keeping some changes private: But a combination of local branches and re-writing history seems like it would solve that?)
s
If I make a very large number of very small commits for my own sake, it may not be useful to publish them as is but rather roll them up into one large change. Yet I don't want to lose the little changes I made. Private is perhaps not the word I should use. Rather, we want to view and publish the change history with different levels of granularity depending on context. From my perspective, local editor undo history is just a special case of change history and should also be synced between my devices.
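One way to sketch that granularity idea (hypothetical Python, just to illustrate): keep every tiny change forever, and make a rollup a label over a range of them, so the squashed and unsquashed views are two renderings of the same log.

```python
# Hypothetical sketch: fine-grained changes are never thrown away; a
# rollup is a label over a range of them, so "published" and "detailed"
# are two views of one history rather than two histories.

changes = []  # append-only list of tiny edits (could include undo steps)
rollups = []  # (label, start_index, end_index) spans over `changes`

def record(change):
    changes.append(change)

def roll_up(label, start, end):
    rollups.append((label, start, end))

def detailed_view():
    return list(changes)

def published_view():
    view, i = [], 0
    while i < len(changes):
        span = next(((l, s, e) for l, s, e in rollups if s == i), None)
        if span:
            view.append(span[0]); i = span[2]  # show the rollup label
        else:
            view.append(changes[i]); i += 1    # show the raw change
    return view

for c in ["fix typo", "rename x", "rename y", "add test"]:
    record(c)
roll_up("refactor renames", 1, 3)
print(detailed_view())   # all four small changes, for my own machines
print(published_view())  # ['fix typo', 'refactor renames', 'add test']
```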
r
How does `git merge --squash` fall short for this?
s
Technically this seems possible with git, but consider the work needed to switch from the squashed to the unsquashed view of the same stream of changes.
👍 1
r
Yup makes sense. I actually don't even know how to do that 🙃 (never had a reason to). But I know it's supposed to be possible.
p
Oh, just catching up from this morning. @shalabh, we have put code into this system; it's called Farm, and a report is forthcoming.
❤️ 1
All work in Pushpin / Farm is stored in append-only logs of your own work. For the same reason that git rebase is something you can only do before you publish commits, we haven't got a solution for "repackaging" work into cleaned-up commits. I think this is an interesting problem that none of us at the lab have a strong conceptual foundation around.
As for "deletion", every document in our system (which we call hypermerge) is produced by using automerge to combine append-only hypercore logs, which we then distribute by various mechanisms. You always have the full history of everything.
Our current implementation "greedily" applies all changes it can find from all users to every document -- that's fine for a research prototype, but what we really want is more sophisticated control over when and how you adopt "work" shared with you by other users.
We've done a bit of experimentation in the pixelpusher project around that (inkandswitch.com/pixelpusher.html) but there's a lot more to do there.
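To illustrate what that less-greedy adoption might look like, a sketch (hypothetical Python; this is not the hypermerge API):

```python
# Hypothetical sketch (not the real hypermerge API): instead of greedily
# applying every change found from every user, gate incoming changes on a
# local policy and park the rest, so nothing is lost and adoption is a choice.

class Doc:
    def __init__(self, adopt_now):
        self.applied = []
        self.pending = []
        self.adopt_now = adopt_now  # policy: whose work to apply immediately

    def receive(self, author, change):
        if self.adopt_now(author):
            self.applied.append((author, change))
        else:
            self.pending.append((author, change))  # kept, just not applied

    def adopt_from(self, author):
        still_pending = []
        for who, change in self.pending:
            (self.applied if who == author else still_pending).append((who, change))
        self.pending = still_pending

doc = Doc(adopt_now=lambda author: author in {"me", "trusted collaborator"})
doc.receive("me", "edit A")        # applied immediately
doc.receive("stranger", "edit B")  # parked until I decide
doc.adopt_from("stranger")         # adopt their work when I choose
```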
All our code & projects are BSD licensed and available on GitHub and I'd be happy to orient anyone interested in exploring it.
Oh, I should note WRT code that I think the U of C work going on under Ravi Chugh's group is very exciting -- I can't wait to represent code via mutation operations over ASTs instead of text operations on a string of bytes.
I'm hopeful our work and theirs will eventually find a way to link up.
s
Yes, structured editing seems to fit in with the hypermerge ideas (where 'structure' implies elements at a higher level than characters and lines). I think there are very interesting directions to explore here that move away from files. E.g., if you could write function definitions without files, a function body would link to other function objects in the history directly, rather than using unresolved names that get bound later. This ties into Joe Armstrong's idea of putting all functions in a global, searchable key/value database.
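A toy version of that idea (hypothetical Python; not Joe Armstrong's actual proposal or any real system): functions are stored under a hash of their definition, and a function refers to its dependencies by hash rather than by name.

```python
# Hypothetical sketch: functions live in a global key/value store keyed by
# a content hash; bodies reference other functions by hash, so links are
# fixed at write time instead of resolved through names later.
import hashlib

store = {}  # hash -> (source, hashes of the functions it links to)

def put(source, deps=()):
    key = hashlib.sha256((source + "".join(deps)).encode()).hexdigest()[:12]
    store[key] = (source, list(deps))
    return key

def show(key, depth=0):
    source, deps = store[key]
    print("  " * depth + f"{key}: {source}")
    for dep in deps:
        show(dep, depth + 1)

double = put("lambda x: x * 2")
# 'quadruple' links to the double *object* directly, not to the name:
quadruple = put("lambda x, f: f(f(x))", deps=[double])
show(quadruple)
```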
p
Well, everyone's excitement here is good motivation to keep editing the other posts and get them into publication shape.
s
👍 Looking forward to the Farm report. I did star the GitHub project last time I heard about it (when your previous article came out, I think). I think a hypermerge-like substrate could solve 'the problem' quite well, as opposed to git, Dropbox, GDrive etc., which are ad hoc and only solve part of the problem.
c
and this comment brings an interesting aspect to light: https://news.ycombinator.com/item?id=19816180
of course we have technical issues but social/organisational issues are more pressing imo 🤔
s
That HN comment is completely off-base IMO. There was, and still is, plenty of money made with non-web software, so the premise itself is shaky. Further, I don't see why a subscription-based model is at odds with the local-first idea. If anything, many business models might work successfully with local-first. A trivial one that comes to mind is based on the app-store model: your data is local-first but operated on by 'apps' you buy (or subscribe to), download, and run locally.