File systems (and much else) should be databases (1995) - Future of Coding #thinking-together

Kartik Agaram

08/21/2019, 11:31 PM

File systems (and much else) should be databases (1995) http://okmij.org/ftp/papers/DreamOSPaper.html

This paper is an attempt to imagine what an OS would look like and how it would work if looking for a word 'foo' in something and deleting/closing/stopping this something, -- be it a paragraph of text, a network connection, a subscribed newsgroup, a process -- would all require roughly the same sequence of mouse clicks or keystrokes, and would be understood and interpreted in the same spirit by the operating system.

ASCII configuration files abound, for a very simple reason: they can be modified with any text editor from ex and edlin upwards, and can be viewed and created even without an editor, with a cat command.

These two paragraphs feel contradictory. Text files already provide roughly the same sequence of operations to update. Can anybody tell what the article expects the benefits of databases to be? I see something about the CPU cycles to parse text files. But that seems to perversely ignore the complexities of using a database.

Another take: your file system already has much that looks like a database with journalling and so on. Mission accomplished?

Personal note: I've looked at several papers this week and been tempted to go implement them. This is a big shift compared to the past 5 years, while I've been slowly putting the foundations of the Mu computer together. It feels like a weight has been lifted off my shoulders, and I can start prototyping new ideas again. Anybody want to play with adding a stupid database before a file system? The Mu computer currently has no access to its local file system, so we have a blank slate to play with.

Felix Kohlgrüber

08/22/2019, 8:09 AM

I think he's talking about unification / consistency. Almost all applications work with structured data of some sort (primitive types like strings and numbers, and collection types like lists / trees / graphs), but most apps use their own storage representation and UI for editing the data. His idea is to create a database that directly supports structured data. Then all apps could use this database, and a small set of tools could be used to edit the database data in a consistent fashion.

Text files and editors provide a uniform editing experience for text (= sequences of chars), but if that text encodes structured data, the edit operations on that structured data aren't consistent anymore. For example, hierarchies (trees) are represented in vastly different ways in text files. In Markdown lists, hierarchy is determined by indentation. In PLs with C-like syntax, hierarchy is expressed by blocks enclosed in curly braces. The same applies to lists etc. Because different text files represent structured data differently, they also require different editing operations. Adding a new item to an HTML list requires different editing operations than adding an item to a Markdown list, even though it's conceptually the same operation.

In summary, I think that a "file system" / "database" should provide a set of primitives that can naturally represent common data structures. Sequences of bytes organized in a tree seem limiting to me. To put it another way: you wouldn't want to work with a PL that only offered trees of byte-strings, so why would you accept that as a file system?
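To make the Markdown vs. HTML point concrete, here is a minimal Python sketch. The function names and the sample lists are made up for illustration and are not from the paper. It shows the same conceptual operation ("add an item to a list") needing different text edits per format, while the structured version is a single generic operation.

```python
from __future__ import annotations


def append_markdown_item(text: str, item: str) -> str:
    """Append a top-level bullet to a Markdown list stored as text."""
    return text.rstrip("\n") + f"\n- {item}\n"


def append_html_item(text: str, item: str) -> str:
    """Insert an <li> just before the last closing </ul> of an HTML list."""
    head, sep, tail = text.rpartition("</ul>")
    if not sep:
        raise ValueError("no </ul> found")
    return f"{head}  <li>{item}</li>\n{sep}{tail}"


def append_structured_item(items: list[str], item: str) -> list[str]:
    """The same operation when the storage layer already knows it is a list."""
    return items + [item]


# Hypothetical sample data: one list, three representations.
markdown = "- apples\n- pears\n"
html = "<ul>\n  <li>apples</li>\n  <li>pears</li>\n</ul>\n"
structured = ["apples", "pears"]

new_markdown = append_markdown_item(markdown, "plums")
new_html = append_html_item(html, "plums")
new_structured = append_structured_item(structured, "plums")
```

Each text format needs its own editing routine (and its own assumptions about whitespace and nesting), whereas the structured version needs none.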

👍 1

☝️ 1

alltom

08/23/2019, 12:30 AM

These two paragraphs feel contradictory. Text files already provide roughly the same sequence of operations to update. Can anybody tell what the article expects the benefits of databases to be? I see something about the CPU cycles to parse text files. But that seems to perversely ignore the complexities of using a database.

Another take: your file system already has much that looks like a database with journalling and so on. Mission accomplished?

Maybe you're right. There's a lot in the OS and our UIs that isn't editable as text. I wonder if the author's dissatisfaction comes from that? If deleting the line in the 'ps' output that represented a process killed the process, would that make them happy?
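One way to read the "editable ps" idea, as a rough Python sketch: compare the original listing with the edited one and treat every PID that disappeared as a kill request. The listing format and function names here are assumptions for illustration, and the sketch only computes the PIDs; it deliberately does not signal any process.

```python
from __future__ import annotations


def parse_pids(listing: str) -> set[int]:
    """Collect PIDs from lines whose first field is a number (skips the header)."""
    pids = set()
    for line in listing.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            pids.add(int(fields[0]))
    return pids


def pids_to_kill(original: str, edited: str) -> list[int]:
    """PIDs present in the original listing but deleted from the edited one."""
    return sorted(parse_pids(original) - parse_pids(edited))


# Hypothetical ps-style output before and after the user deletes the vim line.
original = """  PID TTY          TIME CMD
  101 pts/0    00:00:00 bash
  202 pts/0    00:00:03 vim
  303 pts/0    00:00:00 ps
"""
edited = """  PID TTY          TIME CMD
  101 pts/0    00:00:00 bash
  303 pts/0    00:00:00 ps
"""

to_kill = pids_to_kill(original, edited)
# A real tool might now call os.kill(pid, signal.SIGTERM) for each PID;
# that step is omitted here so the sketch has no side effects.
```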

Daniel Bachler

08/23/2019, 8:20 AM

I haven't read the paper yet, but I think such efforts often come from the frustration that so much data is opaque to other programs. Databases have a shared set of primitive types and a limited set of combination semantics that can be shared across languages. If programs used a serialization format that was interoperable at the level of the application's units of work, it would be much easier to create rich interactions between applications.

Imagine, for example, that instead of images being binary blobs with EXIF-style metadata sections that are XML-encoded RDF embedded in binary, the entire object model were persisted into a database, let's say SQLite. Let's say further that no files existed in the OS, and instead applications had to add tables, and entries in those tables, to store their data. It would then be much easier to write queries that find image files created in a certain period, with a certain camera and lens. Or it would be easier to query for emails in your local email client, because every message, contact etc. would live in the shared database. Unix-style OSs solved this issue by treating everything as a file, but that way you lose a lot of structure and the ability to discern data from metadata.

The real-world problem I see with such a system is that it is only really powerful if different applications share types as much as possible, but in reality applications rarely share type definitions even on a superficial level, even for the exact same concern. I think great effort would have to be spent to avoid a proliferation of types that would in the end not make it much easier to deal with data, because every app would represent its data in different tables in the database and queries would have to accommodate all these different forms. WinFS was a Microsoft effort around the mid 2000s that, from what I can tell, went in this direction but was unfortunately abandoned.
I wrote about this back in ... 2005: http://danielbachler.de/coding/2005/07/03/how-we-will-be-crushed-by-terrabytes-of-data.html (I just noticed that these old articles didn't transition the markup correctly, sorry about the mess)
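The image query described above could look roughly like this with Python's built-in sqlite3 module. The table layout, column names, and sample rows are all invented for illustration; a real shared schema is exactly the hard part discussed above.

```python
import sqlite3

# In-memory database standing in for the hypothetical OS-wide shared store.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE images (
        id       INTEGER PRIMARY KEY,
        title    TEXT,
        camera   TEXT,
        lens     TEXT,
        taken_at TEXT  -- ISO 8601 dates, which sort correctly as text
    )
    """
)
conn.executemany(
    "INSERT INTO images (title, camera, lens, taken_at) VALUES (?, ?, ?, ?)",
    [
        ("harbour", "CamA", "50mm", "2019-06-02"),
        ("market", "CamA", "50mm", "2019-07-15"),
        ("forest", "CamA", "24mm", "2019-07-20"),
        ("street", "CamB", "50mm", "2019-07-21"),
        ("winter", "CamA", "50mm", "2018-12-24"),
    ],
)


def find_images(conn, camera, lens, start, end):
    """Titles of images taken with a given camera and lens in a date range."""
    rows = conn.execute(
        "SELECT title FROM images "
        "WHERE camera = ? AND lens = ? AND taken_at BETWEEN ? AND ? "
        "ORDER BY taken_at",
        (camera, lens, start, end),
    ).fetchall()
    return [row[0] for row in rows]


matches = find_images(conn, "CamA", "50mm", "2019-01-01", "2019-12-31")
```

The query itself is short; it only works across applications if they all agree on what `camera`, `lens`, and `taken_at` mean, which is the type-proliferation concern raised above.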

👍 1
