# linking-together
k
Random shower thought about hierarchical directories. Not as extreme as many ideas here, but still seems radical to a Unix weenie like myself. https://mastodon.social/@akkartik/102893835264728821
t
Dirconf is an interesting idea; my first thought is that it's a peculiar way to implement a database. Also apparently not mentioned in the original article is that each record, however small (e.g. a single number), will take up ~4 KB because of the file system block size, meaning that large configs could be thousands of times heavier than necessary. What format would this structured file be, and why not use that for your config directly? (And would this be an instance of the xkcd #927 problem?) Is there a reason to name the program that does this `cat` and not something else? In my view, this is all poking at two problems: (a) serializing in-memory data structures, and (b) the fact that serialized data is untyped. The end result is that everyone invents their own system for recreating structure and type. The fundamental problem being solved is that organized data needs to be shared, either between processes, between process invocations, or between entirely different machines. The same problems come up when processes communicate with pipes: one spits out text, and the other inverts the spitting-out process. In general, how do you cause a pattern of organized data like the one you currently have to appear in the memory of a computer program at a different place and time? In the system I am working on, which is (for the time being) less ambitious than an OS, there is no concept of a "program" as distinct from a function. You could make a self-contained "piece of software" which is then directly composable with other "software", and the medium of information exchange is in-memory data structures. It would be nice if we could do something like this in an OS: send a data structure to another program/invocation directly. Why are programs less composable than APIs? Why exactly is the boundary between programs not identical to the boundary between functions? Why exactly do we serialize data across that boundary? Is that an unavoidable requirement? Could the OS make it transparent?
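To make the "serialized data is untyped" point concrete, here is a minimal sketch in Python using the standard `json` module. The `settings` record and the `round_trip` helper are made up for illustration; the point is only that structure and type have to be recreated by convention on the receiving side.

```python
import json

def round_trip(value):
    """Serialize to text and parse it back, as two processes joined by a pipe would."""
    return json.loads(json.dumps(value))

# Hypothetical in-memory record: a tuple value and an integer key.
settings = {"size": (800, 600), 1: "first"}

restored = round_trip(settings)

# The tuple comes back as a list and the integer key comes back as a string,
# so the receiver cannot tell what the sender originally had.
print(restored)
```

The receiving side gets something close to the original, but not the original: it needs out-of-band knowledge to turn the list back into a tuple and the key back into an integer.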
d
I've fully implemented the 'dirconf' idea in Curv. It's used to represent Curv source files in general, and the Curv configuration file specifically. I don't require each integer to occupy a separate file (occupying 4K on disk) as Tim said. Instead, there are several different Curv file formats. One format is a regular text file, "*.curv", containing a Curv expression. (You can think of Curv as JSON extended with function values, if that helps.) Another Curv source code file format is a directory, which is an alternative syntax for a Curv record expression. Each directory entry is interpreted as a record member. PNG image files are also Curv source files, typically occurring as members of a directory-style Curv source file, which is how I get raw binary image data into Curv source code.
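As a rough illustration of "a directory is an alternative syntax for a record", here is a small Python sketch. It is not Curv's implementation: the `load_record` function, the use of JSON for leaf files, and the sample `config` tree are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path

def load_record(path: Path):
    """Hypothetical loader: a directory becomes a dict, a file becomes a value."""
    if path.is_dir():
        # Each directory entry is a record member, named after the entry.
        return {entry.stem: load_record(entry) for entry in sorted(path.iterdir())}
    text = path.read_text()
    try:
        # Assumption: leaf files hold JSON; anything else is kept as plain text.
        return json.loads(text)
    except json.JSONDecodeError:
        return text.strip()

def demo():
    """Build a tiny made-up config tree on disk and load it back as a record."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "config"
        (root / "window").mkdir(parents=True)
        (root / "window" / "width").write_text("800")
        (root / "window" / "height").write_text("600")
        (root / "title").write_text("hello")
        return load_record(root)

print(demo())
```

A real design would also dispatch on file extension (for example, decoding image files), which this sketch leaves out.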
❤️ 2
e
Very good points there; you are headed in the right direction. The UNIX concept that every device and system object is treated as a file has become embedded as an assumption in people's thinking, and an untyped stream of bytes is indeed a pain in the rear. The whole recent history of JSON as the new data structure of choice for messaging is about preserving some structure across machine boundaries. Too bad JSON is a crude, very poorly defined data structuring system.
k
@tbabb At least for my original post, it was a thought experiment. Unix holds that everything should be a file. Plan 9 points out that not everything is a file in Unix, and takes it a bit further. OP takes that further still. What format should be native? It doesn't matter! Having a strong default that has native support seems worth trying. There's an opposite and equally valuable set of thought experiments for exploring what happens when not everything is a file, and different things have distinct APIs.
"Is there a reason to name this `cat`?"
Because it uses just the syscalls `read()` and `write()`. Even if every int took up 4KB there's some use case where that's acceptable. However, in Linux, directories already inline files for the first 4KB to some number of inodes, IIRC. Also, one of the two ideas I alluded to was to use a single file for storage but allow reading inside it as if it were a directory. That doesn't have the storage inefficiency. But of course that was a lot to squeeze into 500 characters so nobody is expected to actually understand what I meant 🙂
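For reference, the core of a `cat`-like program really is just a loop over those two calls. A minimal sketch in Python, using the thin `os.read()` and `os.write()` wrappers around the syscalls; the function name, signature, and chunk size are choices made for this example, not anything from the original post.

```python
import os

def cat(in_fd: int, out_fd: int, chunk_size: int = 4096) -> int:
    """Copy bytes from in_fd to out_fd using only read() and write().

    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = os.read(in_fd, chunk_size)
        if not chunk:
            # An empty read means end of input.
            break
        # write() may accept fewer bytes than offered, so loop until done.
        offset = 0
        while offset < len(chunk):
            offset += os.write(out_fd, chunk[offset:])
        total += len(chunk)
    return total
```

Calling `cat(0, 1)` copies standard input to standard output. Nothing in the loop knows or cares what is behind either file descriptor.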
👍 2
g
Is this what PowerShell is/was all about? Apps pipe structured objects to each other instead of text. https://en.m.wikipedia.org/wiki/PowerShell
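A loose analogy in Python, in case it helps: pipeline stages that hand each other records instead of lines of text. This is not PowerShell code, and the stage names (`list_files`, `where`, `select`) and the sample data are invented for the sketch.

```python
from typing import Callable, Iterable, Iterator

def list_files() -> Iterator[dict]:
    # Hypothetical sample data standing in for a directory listing.
    yield {"name": "a.txt", "size": 120}
    yield {"name": "b.png", "size": 4096}
    yield {"name": "c.txt", "size": 30}

def where(records: Iterable[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Filter stage: fields are accessed by name, with no text parsing."""
    return (r for r in records if predicate(r))

def select(records: Iterable[dict], field: str) -> Iterator:
    """Projection stage: pull one named field out of each record."""
    return (r[field] for r in records)

def big_file_names() -> list:
    """Compose the stages like a pipeline: list | where size > 100 | select name."""
    return list(select(where(list_files(), lambda r: r["size"] > 100), "name"))

print(big_file_names())
```

Because every stage receives records with named, typed fields, no stage has to invert another stage's text formatting. All stages here live in one process, though, which is exactly the boundary the thread is asking about.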