Is anyone aware of a programming language or libra...
# thinking-together
d
Is anyone aware of a programming language or library that efficiently solves the problem of incremental reactive recalculations involving collections? I'm a fan of libraries that support reactive updates, such as Assisticant, KnockoutJS, MobX, Vue.js and SwiftUI, but I don't know of one that contains the algorithm I want. I'll explain the problem by example. Suppose you have: 1. an "observable" list of a million items, and you insert or remove an item somewhere in the list 2. a filtered list based on the million items showing perhaps a thousand of the items 3. a projection of the filtered list (map/select) So, when you insert or remove the item, the library should efficiently (and automatically!) propagate the change through the filtered list to the projected list. If the new or removed item is filtered out anyway, propagation should stop so the projected list is not notified of a change. Ideally, change notifications should be deferred in some way so that if several changes are made to the same list item in rapid succession, the derived items (2 and 3) would only be notified once.
💯 1
🤔 1
r
I’m trying to solve this exact problem right now! I’m very close to trying to roll my own solution. I’m thinking of shifting from observables to a more decoupled event system. I find they are creating unintended side effects and are difficult to debug (and I need events for undo functionality). Anyway, let me know if you find a solution - or would like to help build one!
👍 3
c
Are you looking for something in-memory or a database?
r
In memory
e
I am sure that in Excel which can handle about a million rows before it tops out, that they are using some paginated system, whereby if something changes in a chunk the whole chunk is rebuilt but otherwise intermediate chunks with their subtotals is maintained. A lot of the code in Excel is about handling the scale of the databases that people are throwing at Excel. The lower scale functionality is pretty easy, having written a spreadsheet before. Very important to cluster the data otherwise your CPU will almost come to a halt. This is a very tricky area that takes lots of testing to verify that your algorithm interacts well with the CPU. Since you have a very specific set of requirements you will need to roll your own; the conventional systems are not built for this kind of load.
d
I'm starting to investigate the use of Lenses to address this problem, but I don't have any concrete solutions yet. I just discovered this week the existence of a long running "bx" community (bidirectional transformation) who work on these problems. EDIT: I checked, and the BX people mostly use Haskell, so their code probably won't help you, due to their library dependencies.
🤯 1
i
I believe transducers in Clojure were designed to allow you to build data structures and operations that can be composed in this way. But if perf is your main concern, it probably should be easy enough to roll your own in any language. Us front end devs have to do very similar things to efficiently render large table views in the browser.
d
In BV's "Future of Programming", he talks about how memory and processors are made of the same material (transistors), so eventually it should all just be a bunch of processors. I think this is the kind of application where something like that would work: where a bunch of units are constantly a function of others. So like if you could allocate arbitrary chunks of memory and "assign" an operation relative to some other chunk. Maybe that's more like PLA than a bunch of "processors", but maybe that's the goal: being able to reduce pure functional operations down to programmed logic on the hardware that's constantly updating, without all having to be cycled single-file through a CPU.
d
@Ivan Reese If it's easy enough, could you enlighten us? I've been thinking about this problem occasionally for several years and never fully cracked it
d
transducers in Clojure are unidirectional. In David's context, you can transform the model to the view, but you cannot modify the view and propagate the changes back to the model. Lenses are like transducers, except that they are bidirectional, which is needed for David's "reactive updates". That's why I was talking about "bx" or bidirectional transformation. Lenses look easy enough (I'm going to try my first implementation of them soon enough). But the requirement is for efficient bidirectional transformation of a million items. Surely that requires more thought than the usual simple Lens implementation?
c
I mocked something out that should effectively do what you're looking for:
Copy code
type Paper = {
	id: string
	subject: string
	date: string // ISO
}

const collection: Record<string, Paper> = {}

// This is the query I want to index:
// Filter for nutrition items this year, range 20-40.
Object.values(collection)
	.filter((item) => item.subject === "Nutrition" && item.date > "2020-01-01")
	.slice(20, 40)

// First, lets translate this into a composite index.
const filterIndex: Array<[string, string, string]> = [] // [subject, date, id]
for (const item of Object.values(collection)) {
	// uses binary search to insert in sorted order.
	addToIndex(filterIndex, [item.subject, item.date])
}

// Translate your query into subscriptions.
const subscriptions = [
	[
		"date",
		"2020-01-01",
		() => {
			/* Update callback */
		},
	],
	[
		"filterIndex",
		20,
		40,
		() => {
			/* Update callback */
		},
	],
	[
		"subject",
		"Nutrition",
		() => {
			/* Update callback */
		},
	],
]

function updateItem(id: string, update: Partial<Paper>) {
	// Emit on the old key-value because this will be removed from result set.
	for (const key in update) {
		subscriptions
			.filter(([a, b]) => a === key && b === collection[id][key])
			.forEach(([_a, _b, callback]) => callback())
	}

	const beforeIndex = removeFromIndex(filterIndex, collection[id])
	Object.assign(collection[id], update)
	const afterIndex = addToIndex(filterIndex, collection[id])

	// Emit on the new key value because this will be added to result set.
	for (const key in update) {
		subscriptions
			.filter(([a, b]) => a === key && b === collection[id][key])
			.forEach(([_a, _b, callback]) => callback())
	}

	if (beforeIndex !== afterIndex) {
		// Emit an update for all listeners on filterIndex between before and after.
	}
}
😕 1
This pattern is something I discovered when I was building a datalog prototype. Using reified indexes on your queries makes subscriptions a lot easier.
I guess my point is: it sounds like you want a reactive database.
e
In general it is not easy to make controls bidirectional. There are host of ergonomic decisions that are asymmetrical, and I don't believe any general system could handle a wide range of widgets. I just posted a dual simultaneous temperature control sample, where you can set the temp any of two ways, and it feeds back to the model which then automatically re-renders the widges. But you still have to manually code how you want the cursor to behave, and the scaling factors for the mouse, and how you want to clip movement. User input is rather different than rendering a control. It is good however, to try an make things as bidirectional as possible, as it would save a ton of code.
j
https://github.com/TimelyDataflow/differential-dataflow/ is state of the art, although the UX can be iffy if you aren't up to speed with Rust. https://opensource.janestreet.com/incremental/ is also good. It can handle updates to nested collections, unlike differential, but can't easily handle maintenance of nested loops. I wrote a bit about applying similar ideas to UI - https://scattered-thoughts.net/writing/relational-ui/ - and a friend of mine is building something in the same family for javascript/react, not sure how mature it is yet - https://datalogui.dev/ I'm also working on a language that is designed to be efficiently incrementally maintained, although I haven't actually mapped it to the incremental layer yet - https://scattered-thoughts.net/writing/imp-intro/
👍 1
I also did a bunch of work for materialize.io which is a proprietary SQL database that compiles down to differential dataflow. There's some interesting stuff on the blog about eg incremental maintenance on non-abelian aggregates.
👍 2
i
@David Piepgrass You're not actually looking for bidirectionality, are you? That seems like an additional wrinkle introduced by Doug Moen. In the case of unidirectionality, you'd have some function responsible for insertion into the large list. That function also runs the filter on each newly inserted item, and if an item passes the filter, add it to the filtered list too. If you're doing this in terms of some reactive library, the library should (caveat emptor) be smart enough to not re-filter the entire list on every change, and instead just apply the filter to new items. To batch together rapid updates, you'd want a debounce or a throttle — high likelihood your reactive library of choice has those. Otherwise, you can just roll this yourself — you just need some way to schedule code to run after a certain amount of time has passed, and a place to store some intermediate state. As for recalculating the projection, mapping should be just as easy as filtering — just do it to each new item as it comes in. Again, all the Rx libraries I've seen are smart about this. Finally, for reducing, efficiency will depend on the properties of the operation: is it associative? Commutative? Approximate? Can you build a small intermediate result that's easy to incrementally modify and recompute? Etc. This talk gives a nice summary of some good strategies: https://www.infoq.com/presentations/abstract-algebra-analytics/ It's entirely possible I'm missing details that make this problem far harder than I'm imagining. But I feel like I've done this exact thing a handful of times, both with an without reactive libraries, so hopefully this helps somewhat. One more bonus link — all the features you've asked for are documented visually/interactively here: https://rxmarbles.com/
💡 1
w
Associative? Commutative? Approximate? — In other words, qualities of the kind of aggregation make the difference. Filtering, though a special kind of aggregation, can be easy to recalculate or hard depending.
i
Hmmm @David Piepgrass in general or UI-wise? Check out how jetpack compose does this behind the scenes -

https://youtu.be/Q9MtlmmN4Q0

- pretty cheap inserts in the UI tree, if that’s what you’re looking for. Not sure I understand the question. But as far as I get it, your logic (observe, filter, map) is not UI related, so the UI could belong to any library , the logic is behind it. You could do that in Kotlin (Kotlin Flow), in Dart (RxDart and Freezed), in JS (Uhh RxJS & Collections.JS if you wanna use a baked solution), most languages have their own implementation.
You could even do DOM patching on your own
d
@Ian Rumac I am interested in a general solution for efficient recomputation - UI updates are the most well-known use case, but far from the only one. Compose is interesting... it is demonstrated as a thing for building UIs but I wonder if it could be used for other things. Their Gap Buffer approach might not scale well, but given a different data structure like AList (http://core.loyc.net/collections/) the worst-case performance should be better (though average perf may be worse). However, it does not appear to solve the "filter on large collection" scenario.
i
it’s not just for UI-s, states and effects are also memoized and reused instead of being recomputed. From your examples, thought you were talking about UI 🙂 Otherwise, it’s simple reactive programming, it’s just up to you how you’ll implement it. I’d do event-based granular updates to minimize recomputation tbh, if it’s just a list sending transactionable updates is the best way.