# share-your-work
d
Alrighty, here's the post I mentioned I'd make in #CC2JRGVLK: I want to create a computing ecosystem that solves most of the problems in what I call 'unregulated I/O'. It is quite possibly mad. Or it might work, and I will be surprised. It takes design cues from Oberon, FlatBuffers, IPFS, Git, Rust. It also sounds dangerously close to the kind of "great idea" a compsci undergrad would come out with. Yet, I am running out of reasons why this isn't possible (at the very least). This is why I want your opinions 😅 That's all I'll say here - rest is in the 🧵
👀 2
🔥 1
So, I've got a problem with this thing I called 'unregulated I/O'. Here's what I mean by this:
• Unix set the standard of modelling files as byte arrays in the 70s.
• Likewise, storage I/O, IPC and RPC are mostly done via byte arrays. There are some exceptions, for example:
  ◦ The OS normally abstracts away most packet handling up to the transport layer.
  ◦ Windows has dabbled with wacky ports (COM1 et al.).
• This means that application programs have the responsibility of validating the binary data loaded via I/O.
  ◦ Improper validation accounts for the vast majority of attack vectors (especially if we include memory management bugs).
  ◦ Most modern applications employ widely-used libraries to minimise the amount of custom validation they have to perform, which is fair, because more eyes are on the libraries. Nevertheless, SQL injection and buffer overflows still happen in 2024. Exploit accessibility seems (to me) likely to increase with the employment of LLMs.
I'm proposing that the Unix model should be replaced with something more secure by design:
• An abstract data model should be established for I/O:
  ◦ The OS should abstract away a reasonable amount of validation.
    ▪︎ Syscalls in applications would be typed. For Rustaceans, I/O calls would yield something like a `std::io::Result<T>`.
  ◦ The available types should include those that application programmers want to get I/O'd data into ASAP: scalars, arrays, maps, tuples/structs/enums (the latter of which should be Rusty).
  ◦ We would certainly require encapsulation, and (possibly) higher-kinded facilities like mixins.
  ◦ All data in this system should be represented this way, including programs themselves.
    ▪︎ This means that program source code is "already an AST".
    ▪︎ Plain text would not be used for source code.
    ▪︎ UI development for these 'structured languages' must be improved. (maybe I should've said Scratch was an influence? 😏)
    ▪︎ These ASTs should be transformed by the OS into machine code (which can also be represented with this model: a `.s` file becomes an array of instruction enums).
    ▪︎ Eventually, the OS running this should be able to self-host in this way.
  ◦ Applications should barely ever concern themselves with any kind of binary data, though this is of course impossible to prevent in a Turing-complete environment.
• Data, as stored, should be content-addressable:
  ◦ Joe Armstrong has plenty of reasons why this is a good idea (especially for greenfield).
  ◦ The equivalent of a 'filesystem' for this ecosystem would instead be what is effectively a hashmap with wear leveling.
    ▪︎ Or, Optane could be revived (some hope). Would be nice to design around this possibility.
  ◦ 'Files', now more accurately 'objects', are stored by a hash of their contents.
    ▪︎ Important note: this is not object-oriented computing. We don't want to be piping vtables.
  ◦ The need for encapsulation means that our 'filesystem' effectively becomes a Merkle tree.
  ◦ In order to prevent massive hash cascades when writing to storage, we would need to employ mutable references (in a similar manner to symlinks).
  ◦ Fast random-access updates to very large objects could be achieved with a hasher suited to incremental verification, such as BLAKE3. (There's a rough sketch of this model just below.)
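To make the typed-data-model plus content-addressing idea a bit more concrete, here's a minimal Rust sketch. Everything in it is illustrative: the `Value` shape, the `Store` API and all the names are my own assumptions rather than a finished design; the only real dependency is the `blake3` crate, standing in for whichever incremental hasher the real system would use.

```rust
use std::collections::HashMap;

/// The abstract data model: everything that crosses an I/O boundary is one of
/// these, never a raw byte array handed to the application to validate.
/// (Shape is illustrative only.)
#[derive(Debug, Clone, PartialEq)]
enum Value {
    U64(u64),
    I64(i64),
    Bytes(Vec<u8>), // escape hatch, ideally rare
    Array(Vec<Value>),
    Map(Vec<(Value, Value)>),
    Tuple(Vec<Value>),
    /// A reference to another object by content hash; this is what turns the
    /// store into a Merkle tree.
    Ref(ObjectId),
}

/// Content address: the BLAKE3 digest of an object's canonical encoding.
type ObjectId = [u8; 32];

/// A toy content-addressed store: a hashmap from digest to encoded object,
/// plus mutable named references (the symlink-like layer) so that updating an
/// object doesn't cascade new hashes through every parent.
#[derive(Default)]
struct Store {
    objects: HashMap<ObjectId, Vec<u8>>,
    refs: HashMap<String, ObjectId>,
}

impl Store {
    /// Store an encoded object and return its content address.
    fn put(&mut self, encoded: Vec<u8>) -> ObjectId {
        let id = *blake3::hash(&encoded).as_bytes();
        self.objects.insert(id, encoded);
        id
    }

    /// Fetch by content address; a typed decoder (not shown) would turn the
    /// bytes back into a `Value` before the application ever sees them.
    fn get(&self, id: &ObjectId) -> Option<&[u8]> {
        self.objects.get(id).map(|v| v.as_slice())
    }

    /// Point a mutable name at a new version of an object.
    fn set_ref(&mut self, name: &str, id: ObjectId) {
        self.refs.insert(name.to_string(), id);
    }
}

fn main() {
    let mut store = Store::default();
    let v1 = store.put(b"version 1 of some object".to_vec());
    let v2 = store.put(b"version 2 of some object".to_vec());
    store.set_ref("my-document", v2);
    assert!(store.get(&v1).is_some()); // old version is still addressable
}
```

The `refs` map is the mutable-reference layer mentioned above: anything that points at the name rather than the hash doesn't need re-hashing when the object changes.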
Here are some fun implications of such a design:
• New programming languages would be required.
• Deduplication of data becomes trivial.
  ◦ On the subject, we'd need to be mindful of how granular we are with the storage of heavily-encapsulated objects.
  ◦ Denormalisation should probably happen when, e.g., an object's raw data is smaller than a hash digest (at the very least).
• Transfer of large objects over a network can be heavily optimised. Downloads effectively become a `git pull`.
• Core web technologies such as HTTP, HTML, CSS & JavaScript are no longer kosher, because they are based on plaintext.
  ◦ "This obsoletes the web" is a silly thing to say, but could be fun in the pitch.
  ◦ All of these formats could be transformed into the strongly-typed model presented above, though.
• Tabs vs spaces is no longer a concern, because code is no longer stored as plaintext; formatting becomes the UI's responsibility.
• Entire classes of attack should be all but eliminated (e.g. injection).
• The types used in the data model can themselves be represented in the data model, and we can relatively easily implement internationalisation for code:
  ◦ Here's a horrible illustration: `Enum Type { i8, i16, i32, i64, u8, u16, u32, u64, Array<Type>, Map<Type, Type>, Tuple<Type, ...>, Enum<Array<Type>> }`
  ◦ These types don't have canonical names, and I don't think they should.
  ◦ They do have hashes, though. So we can refer to types by their hash.
  ◦ We can then map human translations for these types and their encapsulated members in any number of natural languages: `Map Translations<Tuple<[Locale, Type]>, Array<String>>` (rough sketch of this below too).
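For the refer-to-types-by-hash and translations bit, here's an equally rough Rust sketch. The `Type` enum mirrors the illustration above; the encoding used for hashing and every name here are hypothetical, with `blake3` again standing in for the real hasher.

```rust
use std::collections::HashMap;

/// Types as data, mirroring the illustration above (shape is hypothetical).
#[derive(Debug)]
enum Type {
    U8, U16, U32, U64,
    I8, I16, I32, I64,
    Array(Box<Type>),
    Map(Box<Type>, Box<Type>),
    Tuple(Vec<Type>),
    Enum(Vec<Type>),
}

/// Content address of a type: the hash of some canonical encoding of it.
type TypeId = [u8; 32];

fn type_id(ty: &Type) -> TypeId {
    // Debug formatting stands in for a real canonical encoding.
    *blake3::hash(format!("{ty:?}").as_bytes()).as_bytes()
}

fn main() {
    // The type itself has no canonical name; human-readable names are just
    // per-locale metadata keyed by (locale, type hash).
    let point = Type::Tuple(vec![Type::I64, Type::I64]);
    let id = type_id(&point);

    let mut translations: HashMap<(String, TypeId), Vec<String>> = HashMap::new();
    translations.insert(
        ("en-GB".into(), id),
        vec!["Point".into(), "x".into(), "y".into()],
    );
    translations.insert(
        ("de-DE".into(), id),
        vec!["Punkt".into(), "x".into(), "y".into()],
    );

    println!("{:?}", translations.get(&("en-GB".to_string(), id)));
}
```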
I appreciate this is a lot, so if you've taken the time to read this, thank you ❤️ Please shoot me your questions and comments. I've got some visual explainers for this stuff lying around which I'll probably add too.
❤️ 3
...oh, and: much as I've searched, I can't find a project that's attempting to create an entire ecosystem out of these principles (even if it's just using VMs rather than an entire OS). If you know of any project doing this, please let me know, because I suspect they're probably doing a better job.
k
The main problem I see with your project is the wish to design a complex system from scratch. Such projects have basically always failed, running out of steam before accomplishing anything useful. One of the insights from John Gall's "Systems Bible" (highly recommended!) is (chapter 11): "A complex system that works is invariably found to have evolved from a simple system that worked", with the corollary that "A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a working simple system." That's in fact how today's computing systems evolved over a few decades. The result is a bit of a mess, but it works. And it's so big by now that it cannot be replaced, only evolved.
d
Are you familiar with Unison lang? As you mentioned, they aren't attempting to create an entire ecosystem, but I think it has a lot of overlap with your ideas.
k
Unison and IPFS are indeed the two main existing projects that have the most overlap. Neither tries a from-scratch approach. But unfortunately, the two don't really coexist well either, each having its own content-addressing scheme. Another language in that space is scrapscript.
d
Thanks @Konrad Hinsen @Daniel Garcia - those are exactly what I'm looking for 🙏 I actually don't want to have to "make something big", because yeah, I've also seen countless examples of things of this scale failing (or worse, leaving a stain on their surroundings... (cough) Windows Registry). I don't want to have to make an OS, but having the entire software ecosystem playing to the same conceptual tune is going to make things all the more sound - if that makes sense. Making a VM of it, in the same way as Unison or Scrapscript are doing (if I'm understanding them correctly), is where I'd want to start too. So I think I'm going to reach out to the authors of both and ask what they think about scaling them up.
👍 1
l
@Doug Thompson So excited to have found another one with similar ideas! YES! It seems "soo obvious", yet we're stuck with architectural suboptimalities stemming from decades-old choices! And yes, @Konrad Hinsen, you do have a valid point. It requires some unusual long-term determination to see it through. You might get a more established foundation by starting "simple" and growing into complexity along with usage, but real simplicity requires a deeper level of thought, the kind where, when it all connects, the system stays simple even at scale.
😄 1
k
I wish I knew how to achieve "real simplicity". Sometimes I wonder if the rare documented cases of widely used and long-lived technology remaining simple (e.g. TCP/IP) were just a matter of luck. As you say, designing large systems that are simple requires a lot of thought, but also, in my opinion, several iterations on feedback loops. It's nice to think that it can be done, given enough time, but I can't think of many successful examples.
👌 1
👍 1
d
I really struggle with simplifying because it involves finding that exact sweet spot where you've dumped all the decorative bits but haven't cut away things that are essential. Both feel good: dumping and cutting always feel good if you think you're homing in on the clean core of something, but it's sooo easy to go too far and chop off too much, simply because that good feeling isn't the same as the good feeling of Finding That Essence. And if you're developing something, by definition, you're starting with nothing and working up to the something that's in that sweet spot, so you're going in the other direction. You need a First Goal that's small enough but essential enough. I spend hours and hours thinking about this kind of thing, and switching between coding and "checking where I am in the Bigger Vision".
❤️ 2
And it's funny how "pragmatic" and "clean" often align, which is when you know you've got something important.