# devlog-together
k
I'm tugging on this thread today. Pensieve's 2D surface isn't actually very good at giving me a sense of space like, say, https://www.inkandswitch.com/capstone. I just spent some time trying to cross-link stories far apart in space and blowing my stack, copying ids laboriously into a scratch space outside Pensieve, then pasting them into hyperlinks in the right place in Pensieve. Then I realized I should just use new kinds of links. The value of hyperlinks is that they're inline, whereas links just line up at the bottom of each note. But I currently can't linkify arbitrary text anyway. (Potluck-style automation might help me get there.) For now, links are the best way to do hyperlinks in Pensieve. I also get back-links for free.
One problem I've been having with my note-taking app is that it grows sluggish over time. This is an unavoidable tension: simple programs tend to be inefficient, and optimizations tend to make programs less approachable. Meditating on

https://www.youtube.com/watch?v=rX0ItVEVjHc

helped me speed things up significantly -- by reading data redundantly from disk. In the process I realized that garbage collection is missold as "automatic memory management". All memory management is manual. There's no substitute for thinking about your program's memory footprint. More details in 🧵
My note-taking app consists of multiple text-editor widgets arranged on a 2D surface. I tend to add notes to the surface but leave them lying around. Over a month or so of use things would get sluggish and I'd have to start closing notes. Watching the Mike Acton talk got me to focus on this, and on ways to improve the app's memory footprint without complicating things too much. Lua doesn't have very good profiling tools, so I can't easily see where the bottleneck is. But I eventually hit upon a plan that worked very well: when a text-editor widget goes out of view, just blow away the data backing it. When it comes back into view, read it from file. That's it!

It took me a while to convince myself that this would be a net win. I'm replacing reads from memory with reads from disk. The key realization: it's a trickle of reads from disk replacing an unbounded amount of stuff that's just always hanging around in memory. I think this is a data-oriented solution in the footsteps of Mike Acton even though it didn't delete any fields from structs, or switch to structs of arrays, or anything like that. Lua doesn't really give me control over how cache lines are packed or over avoiding pointer indirection, but I was following the key principle Mike Acton articulated: you always have to manage your caches. Emphasis on the "you". No tool can do this.

This experience has changed how I think of garbage collection. I no longer think of it as "automated memory management". In my new worldview memory management is always manual. GC merely allows you to reduce the ceremony of deallocation. Where in C++ I had to maintain a bunch of destructors and explicitly call delete, with a GC I can just assign a variable to nil to blow away everything inside it. This seems like a powerful insight. GC isn't magic. It's still the programmer's responsibility to point out what can be reclaimed.

This experience shouldn't be so surprising. I've been around enough to know the theory. I've heard the war stories about how Google ran successive generations of its search frontend off disk, then memory, then disk again. I think the surprise stems from seeing this dynamic in action in a tiny program with a memory footprint of 10-20MB. There's still plenty of room to trash your processor's cache (0.5MB in level 1, 8MB in level 3).
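In sketch form, the idea looks roughly like this (field and function names such as `note.data` and `note.filename` are invented for illustration; this isn't the actual commit, just the shape of it):

```lua
-- Drop or reload a note's backing text depending on whether it's visible.
-- Metadata stays around; only the raw string is loaded and unloaded.
local function update_note_data(note, visible)
  if visible then
    if note.data == nil then
      -- scrolled back into view: re-read the contents from disk
      local f = assert(io.open(note.filename, 'r'))
      note.data = f:read('*a')
      f:close()
    end
  else
    -- scrolled out of view: let the GC reclaim the string
    note.data = nil
  end
end
```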
w
> The key realization: it's a trickle of reads from disk replacing an unbounded amount of stuff that's just always hanging around in memory.
So you were doing some unhelpful work? Disk reads to refresh data that wasn't actively being used? In a way GC issues seem to be a symptom of the general problem that certain program aspects, like performance, are pretty opaque.
> GC merely allows you to reduce the ceremony of deallocation.
A good way to put it. A problem with deallocation and naive GC is that the concern is too fine-grained. Things like stack allocation and generational GC can help. I recall Lua handles its closures relatively cleverly using "upvalues." The gist, if I remember correctly, being that (1) you don't even need to make a closure if a function stays within the lexical scope of its creation, and (2) when you do close over something, you only keep references to the variables you actually use; the rest of the lexical environment of the function definition can go away.
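A toy example of point (2), since Lua only creates upvalues for variables the inner function actually mentions (the names here are made up):

```lua
-- The returned closure references only `count`, so `count` is the only
-- upvalue kept alive. `huge_buffer` is never mentioned inside the closure
-- and can be collected as soon as make_counter returns.
local function make_counter()
  local huge_buffer = string.rep('x', 2^20)
  local count = 0
  return function()
    count = count + 1
    return count
  end
end

local next_id = make_counter()
print(next_id())  --> 1
print(next_id())  --> 2
```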
k
> So you were doing some unhelpful work? Disk reads to refresh data that wasn't actively being used?
The key commit is https://git.sr.ht/~akkartik/pensieve.love/commit/af7fba3263fbbdd1168ce7c941f9c6f79fca63d8

I have around 9GB of notes on disk (including all my mail for over a decade, chat logs, stuff like that). But only a tiny sliver of them is on the surface at any time: currently around 120 columns, far fewer than 50 notes/files per column, and the average file size per note is probably less than 5KB. Before this change, I was loading everything on the surface at startup and holding it until I closed notes/columns from the surface. With this strategy, the heap footprint according to `collectgarbage('count')` (https://www.lua.org/manual/5.1/manual.html#pdf-collectgarbage) was between 10 and 20MB. Initial load seemed instantaneous.

The commit above eagerly throws away file data that isn't in notes showing on screen right now. When the viewport moves, I load any newly visible files and throw away any files that are no longer visible. I continue to retain all metadata; it's just the raw strings that take up a lot of space. Now my footprint is halved.

But the key metric is how often I find myself distracted by what I'm doing to fix my note-taking system. Today I didn't perceive any sluggishness, where earlier things would slow down any time I was editing a note with more than a couple of sentences. I think the full GC might be a lot less frequent now. (I'm still seeing some signs of collection every second or so; see screenshot.)
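For anyone wanting to watch the same number: `collectgarbage('count')` reports the Lua heap in kilobytes, so a small periodic print inside LÖVE's update callback is enough to see whether a change like this moves it (the once-a-second timer here is just illustrative):

```lua
-- Print the Lua heap size roughly once a second.
local elapsed = 0
function love.update(dt)
  elapsed = elapsed + dt
  if elapsed >= 1 then
    print(string.format('heap: %.1f MB', collectgarbage('count') / 1024))
    elapsed = 0
  end
end
```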
j
Are you sure the problem was memory use and not some crazy quadratic complexity algorithm that starts to choke the process when you have more than N items?
(N.B. Whether or not memory was the cause of the slowdown, the observation that GC is just one strategy for deallocation is correct.)
k
Hmm, no, I can't completely exclude a CPU bottleneck. I tend to assume GC if a program has non-deterministic pauses.
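One crude way to test that assumption (a sketch using LÖVE's `love.timer`; the interpretation is a guess, not a measurement I've made): force a full collection at a quiet moment and see whether its cost is in the same ballpark as the pauses.

```lua
-- Time a forced full GC cycle. If this takes tens of milliseconds with the
-- surface fully loaded, GC pauses are a plausible culprit; if it's cheap,
-- the slowdown is more likely plain per-frame CPU work.
local before = love.timer.getTime()
collectgarbage('collect')
local elapsed_ms = (love.timer.getTime() - before) * 1000
print(string.format('full GC took %.1f ms', elapsed_ms))
```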
j
Maybe I'm not understanding the situation 🤔 It seemed like you were loading your full dataset into some structure in memory (presumably a table, given the language). If you don't then get rid of that table, what would trigger GC?
k
Yeah, there are a few things I'm leaving out.

1. My programs in dynamic languages (and hopefully those of others) are just constantly generating a slow trickle of garbage. Cons cells in Lisp, lists in Python, tables in Lua: they're just so darned convenient, I use them everywhere.
2. I've made other caching decisions. For example, rendering the prose requires translating strings into LÖVE's Text objects (which I imagine contain font glyphs). I don't want to perform this computation on every frame, so I try to save the Text objects. But keeping them around for the entire data set is prohibitive. So I keep them for just the currently rendered text and recreate them as needed on events like the viewport panning or edits. That adds to the constant churn of garbage. (Sketch below.)
3. I do edit the data set and add new notes to it. So the surface is just naturally growing from day to day.

Perhaps I should just spring-clean my equivalent of 'browser tabs' more often. I think I started on this because the absolute numbers seemed so small, but it's totally reasonable for a minimalist program to limit the amount of data on the surface.
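A rough sketch of the Text-object caching in point 2, purely for illustration: `love.graphics.newText` is the real LÖVE call, but the cache table and helper names are made up.

```lua
-- Hypothetical per-string cache of LÖVE Text objects, kept only for
-- whatever is currently on screen.
local text_cache = {}  -- string -> love.Text

local function text_for(line, font)
  local t = text_cache[line]
  if t == nil then
    -- bake the string into a drawable Text object once, not every frame
    t = love.graphics.newText(font, line)
    text_cache[line] = t
  end
  return t
end

-- On viewport pans or edits, drop the cache wholesale and let the next
-- draw repopulate it with just the visible lines. The old Text objects
-- become garbage for the GC to reclaim.
local function invalidate_text_cache()
  text_cache = {}
end
```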
(I just noticed the video at the top of this thread has somehow gotten deleted. Fat finger? Slack shredders approaching? Anyways, you can see it at https://spectra.video/w/051fef50-a121-40c0-bf3c-ee0730fb4a07)