This will likely interest the folks here: <https:/...
# share-your-work
g
❤️ 1
k
This pitch for Frest resonates with me!
g
Lots more to say. As an abstraction for developers, for a start.
k
I've thought about this a lot without progress. It's still not clear to me it is solvable.
g
Sure, it’s solvable. You use the power of the relational database. You tag every row in every table with:
• an “at” timestamp; and
• a model id.
Separately, using other tables, you track how models relate to each other: this one was branched, in this way, from that one.
You’ll note that you can imagine several different ways for model branches to work. Assume model B is branched from model A. There are several options for how changes in A are seen in B:
• B’s view of A is frozen at the time it was branched, and no later changes in A are visible in B;
• B sees updates to A, but its own values override the values in A; or
• B sees updates to A, and values in A override values in B.
If you reflect on that for a bit, you realise sometimes you need one and sometimes another. In my blog, I discuss what that looks like.
Note that whatever you have, it can all be resolved with relational queries. Indeed, you can have something akin to Postgres’s Row Level Security, which is an automatic addition to any query. It would not actually be difficult to instrument a relational database so that, in order to query it, you have to supply a model value, and the tables automatically acquire the extra criteria they need to show only the values from that model and its antecedents.
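A minimal sketch of what that could look like in SQL (table and column names here are illustrative, not from the blog post):
```sql
-- Every row in every table carries an "at" timestamp and a model id.
CREATE TABLE person (
    id       bigint      NOT NULL,  -- synthetic row id
    model_id bigint      NOT NULL,  -- which model this version belongs to
    at       timestamptz NOT NULL DEFAULT now(),
    name     text
);

-- Separately, track how models relate to each other, including
-- which of the three visibility modes governs the branch.
CREATE TABLE model (
    id            bigint PRIMARY KEY,
    branched_from bigint REFERENCES model (id),
    branch_mode   text CHECK (branch_mode IN
                      ('frozen', 'child-wins', 'parent-wins'))
);
```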
k
Are models the same as relational db tables?
Oh I see, each row has a model id. I don't see any reason to say "sure" about this 😆 You're recommending storing a multiple of the data for multiple forks of models. That seems like the big source of risk here. I'm glad you're working to de-risk it.
g
The model ids pick out entirely separate versions of the same database, all with the same schema. Each one is a model. Then, you have merge operations to support branch-merge semantics. You mention risk. You can handle authorisation with the same Row Level Security semantics. This means you express relationally which roles have which permissions with which models, and enforce that using RLS.
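A rough sketch of that RLS idea in Postgres terms (the model_grant table and policy name are my invention, just to show the shape):
```sql
-- Which roles may see which models (hypothetical table).
CREATE TABLE model_grant (
    model_id  bigint NOT NULL REFERENCES model (id),
    role_name text   NOT NULL,
    PRIMARY KEY (model_id, role_name)
);

ALTER TABLE person ENABLE ROW LEVEL SECURITY;

-- Rows are visible only if the current role has been granted their model.
CREATE POLICY person_by_model ON person
    USING (EXISTS (
        SELECT 1 FROM model_grant g
        WHERE g.model_id = person.model_id
          AND g.role_name = current_user));
```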
You would also provide functions on models. So models can be “like this one, except I get to insert code before/after the values are found to change those values in whatever way I see fit”. IOW, you can have virtual models. A virtual model would be how you’d handle things like merging multiple models in a lazy manner.
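One plausible reading of a virtual model, sketched as a view that passes a base model through but rewrites values on the way out (the view and the transformation are illustrative):
```sql
-- A "virtual model": model 42's people, with code applied
-- after the values are found.
CREATE VIEW person_shouting AS
SELECT id, model_id, at, upper(name) AS name
FROM person
WHERE model_id = 42;
```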
k
When I said "risk", I was thinking only about performance issues. How large the database grows, how long it takes to clone one model from another, or how long it takes to query a specific model alongside its antecedent models (which is how I interpret "virtual models"). No matter how fast these operations are, they scale in proportion to the number of models one has, a number you're advocating to grow without any bound that I can see. I have some experience with trying to use a Datalog-like append-only model inside an SQL database, that's where these concerns come from. As a general principle of rhetoric, if you seem to be saying something is easy or obvious but the reader considers it difficult or non-obvious, you lose something. Persuasion requires agreeing with the concern and then describing how you would dispel it. This sort of reasoning is why I tend never to claim anything is easy by my approach. I express doubt in an effort to hoard credibility with the sort of discerning audience I'm trying to attract.
g
We live in an era of absurdly fast machines with absurd amounts of storage. The industry has taught us to think everything is slow, in part, I think, so that we stick with limited offerings on expensive hosted services. Ahem. Two responses:
1. I’m describing a semantics. It’s a pretty novel and useful one, and it comes at some cost in runtime and storage. That may or may not work for various applications.
2. I’m particularly interested in tools for individuals and small businesses. You could keep all your personal data, and the data for your carpet laying business with 10 employees, in a database system along the lines of what I describe. It would be “inefficient” in storage and computing time, but it would be respectful of what you really want to know and do, as an individual or small business owner, around the data that are important to you.
I’m afraid I don’t understand the point about rhetoric.
k
My point is I had some experience where what seemed like a small amount of data blew up once I added versions and tried to always append rather than mutate rows. On absurdly fast machines with absurd amounts of storage. I suppose I'll wait until you have something to try out before finding out if it works for the sort of case I care about. So we can forget about rhetoric and wait for the code 🙂
g
Like I say, this would work for some applications. Amazon probably wouldn’t care to use this across all their data. The carpet laying business is a different story.
I’d love to see what your use case was and to look into why it broke. The most obvious way of implementing this, properly indexed, shouldn’t make data retrieval more than, say, twice as slow, I would think.
k
Yeah. Unfortunately this particular thing was in the context of a job and I don't have access to the code. It wasn't a lot of data and it was a simple CRUD webapp. But the database did need to be queried online. Performance was not an issue when we were mutating rows, but very quickly became an issue once we started appending rows. Perhaps I didn't know what to index. But that too is a problem your users will face, unless you somehow magically index the right things. Perhaps the risk is that appending rather than mutating makes us more sensitive to indexing strategy. I can easily implement a database with mutation, but perhaps I need to learn something before I'm as effective at an append-only database.
g
Given my interest in smaller apps, I think we could happily have the database do its own indexing automatically. I’m kind of puzzled why this isn’t at least an option in major databases.
In case it’s not clear: the primary key on a table with a synthetic id column, an at column and a model_id column would be (model_id, id, at).
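Continuing the illustrative person table from earlier, that key makes a model-scoped lookup a single index range scan:
```sql
ALTER TABLE person ADD PRIMARY KEY (model_id, id, at);

-- The latest version of row 1 as model 7 sees it:
SELECT *
FROM person
WHERE model_id = 7 AND id = 1
ORDER BY at DESC
LIMIT 1;
```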
k
I think one difference with my experience may have been that I had to look up the most recent version id for rows. If you always know the precise model id that you need to look up, that may work well and my experience may not be relevant. If you're looking up by a fixed model id, does that mean that any time one creates a new fork they need to copy all the rows from the past into the new model id? That would be a slow operation, right?
If you do it a hundred times, the database would grow by a factor of a hundred. I suppose we could do something like what git does, let you check out multiple models. Checked out models would be fast to query but occupy linear space. Models that aren't checked out could be compressed to occupy sub-linear space.
g
You would probably want some forks to be copying, but I imagine mostly they would be lazy. So when you look up the person with id 1 in model A, we would look for something like:

SELECT * FROM person WHERE (id, model_id) = (1, A);

If A was branched from another model, you would need to extend that with a recursive CTE, or you could query for something like:

SELECT * FROM person WHERE id = 1 AND model_id IN (A, B, C);

assuming B and C are the antecedents of A in the branching hierarchy, and then you choose which model’s version of the record you should take, based on the branch mode or whatever you call it. To be clear: you could automatically add that logic behind the scenes. A developer using a Frest system would do the moral equivalent of:

SET model = A;
SELECT * FROM person WHERE id = 1;

(“moral equivalent” here because I’m very keen on making Datalog the query language of such systems…)
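A sketch of that antecedent walk as a recursive CTE, under the “child overrides parent” mode (the lineage CTE and the :A bind parameter are illustrative):
```sql
-- Walk A's chain of antecedents; depth 0 is A itself.
WITH RECURSIVE lineage (model_id, depth) AS (
    SELECT id, 0 FROM model WHERE id = :A
    UNION ALL
    SELECT m.branched_from, l.depth + 1
    FROM model m
    JOIN lineage l ON m.id = l.model_id
    WHERE m.branched_from IS NOT NULL
)
-- Take the version from the nearest model in the lineage,
-- and the most recent "at" within that model.
SELECT p.*
FROM person p
JOIN lineage l USING (model_id)
WHERE p.id = 1
ORDER BY l.depth, p.at DESC
LIMIT 1;
```
And the eager, copying kind of fork would be roughly:
```sql
-- Copy every row of model :A into a freshly created model :B.
INSERT INTO person (id, model_id, at, name)
SELECT id, :B, at, name
FROM person
WHERE model_id = :A;
```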
k
Yeah. That's the kind of thing that introduces some risk it'll get slow. So there's a trade-off here. You can complicate the query so performance becomes harder to reason about, or you can create a UX where people have to check out, and it'll make sense to people that those operations are slow. I think I'd lean towards the latter approach. I'd rather provide predictable performance and transparently expose slow operations to people.
g
Excel is also much slower than doing the same calculations in assembly language.
k
Branching and merging models is a frequent requirement in scientific computing. Most people still do it by hand, meaning badly. The only management solution I am aware of is git. You put all your code and all your data into git, if necessary using git-annex for big files, and then you have branching and merging. The UX is often a pain, in particular when your data is not in some text format. So I agree that doing this properly promises to be a huge productivity gain. I just don't see how Frest is the solution. Sure, all data can be mapped onto a relational model. But to turn that into good UX, the mapping has to make sense to a user. For much of the data I deal with, that's not obviously possible. Though I agree that for my personal data it should be OK.
g
It remains to build it and find out. Excel and Access are the data manipulation models most amenable to understanding by non-programmers. I am fairly convinced that functional and relational models are the best for non-programmers to follow. I believe that if you give the user tools to see the provenance of what they’re looking at (and this is an append-only-first model, where most things have a complete history), then you can make this work. It will certainly work for some folks, such as yourself. But I think the owner of the carpet laying business can make simple and effective use of this feature without having to do anything too confusing.
I am currently trying to sort out enough income that I can dedicate myself to building an app that will demonstrate these ideas. It would be a FileMaker/Access kind of app that integrates not just local tables but remote information sources: a single Datalog query interface where you can join your email to the content of local files to your calendar to any and all online data sources. Know an angel investor who wants to back the idea? This would be a phenomenal application within which to deploy an AI…
k
I agree 100% that a good model of computation doesn't have to be for everyone. And a relational programming system without database cruft (SQL etc.) sounds like a good match for many use cases.
👍 2