A separate thread based on @Daniel Hines’s comment ...
# linking-together
y
A separate thread based on @Daniel Hines’s comment https://futureofcoding.slack.com/archives/C5U3SEW6A/p1560278279045900?thread_ts=1560278176.045100&cid=C5U3SEW6A I’ve thought about WinFS on and off for years, often hearing the same refrain from others who yearn for what WinFS might have been. And now I’m turning on it: if a filesystem-level database was that good an idea, we’d have it already. I’m being deliberately provocative here because I genuinely want to know what problems WinFS could have solved that 1. aren’t just as solvable at app/userspace level (sorry, I’m not an OS expert) 2. are based on what we know of WinFS’s actual API and capabilities, rather than capabilities of other databases that we wish a filesystem would have. (I’m also interested in apps using other databasey filesystem features, but it’s a slightly different discussion)
👍 1
d
“It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.” —Alan Perlis
Files are impenetrable blobs - I literally have to parse them to understand anything about their contents.
Text isn’t information, not until it’s understood.
But a database is different - as soon as I know its lingo, I can ask it any question it’s capable of answering.
All your apps store their data in some sort of file, but that file is completely hidden to every other app (and to you - you have to know the path or grep for it)
It makes a lot more sense to me to organize things via much flatter namespaced trees that are queryable with SQL or Datalog or some other well-understood semantics.
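To make "flat, namespaced, and queryable with SQL" concrete, here is a minimal sketch using Python's built-in `sqlite3`. The `items` table, its columns, and the sample rows are all made up for illustration; nothing here reflects WinFS's actual schema or API.

```python
import sqlite3

# Hypothetical flat, namespaced "items" table standing in for a queryable store.
# Table name, columns, and rows are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (namespace TEXT, name TEXT, kind TEXT, created TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        ("photos", "beach.jpg", "image", "2019-06-01"),
        ("photos", "cat.png", "image", "2019-06-10"),
        ("journal", "june.md", "text", "2019-06-11"),
    ],
)


def items_of_kind(kind):
    """Return (namespace, name) pairs for every item of the given kind."""
    rows = conn.execute(
        "SELECT namespace, name FROM items WHERE kind = ? ORDER BY name",
        (kind,),
    )
    return rows.fetchall()


print(items_of_kind("image"))
```

The point of the sketch is only that any tool that knows the schema can ask the question, without knowing which app wrote the rows.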
y
Which is why any sane engineer uses a preexisting file format and I/O library to handle all that for them. I understand the argument here, but my point is that the benefits mostly disappear at a practical level, because 1. if WinFS’s model doesn’t completely match your needs, you need to move your app’s format away from what WinFS assumes/understands 2. total platform lock-in
As for queryability by the OS, Spotlight seems to work OK. Separate indexes seem to work OK. They’re not perfect but they have major pragmatic benefits.
What I’m going for here, if it isn’t clear: I understand the theoretical benefits. I’m talking about actual product benefits that relate to how people actually use computers.
(At this point I have to confess that I don’t know exactly why WinFS died. My suspicion is that it turned out to have major problems at the practical/product level. However, it may be a totally different story.)
goes and googles it
Totally cheating here by pointing at discussion of The Article instead of just The Article, but The Article is far too long for me to read and digest right now, and also the discussion turned up a lot of really good little points: https://www.reddit.com/r/programming/comments/1svf88/why_winfs_died_long/
… especially the big Rob Pike quote.
s
Things that are not solvable at app/userspace level are cross-app data links. E.g. one photos app organizes your photos, another journal app (made by a different developer, without knowledge of photos app) lets you journal. How do you embed a photo in your journal? The filesystem answer is export/import and you end up with two copies with no record of the relationship. The WinFS answer is that the journal app can just reach into the db and link directly to the photo item. Now you can find uses ("where is this photo used?"), enforce policy ("don't let app A use any photos labelled X") and have updates to the photo (possibly via a third photo editor app) automatically flow to the journal.
👍 2
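A rough sketch of the photo/journal example above, assuming a shared store with an `items` table and a `links` table (both hypothetical, as are the app names and sample data; this is not WinFS's real data model). The journal entry stores a link to the photo rather than a copy, so "where is this photo used?" becomes a reverse lookup, and an edit by a third app shows up in the journal automatically.

```python
import sqlite3

# Hypothetical shared store: every app's items live in one table,
# and cross-app relationships live in a separate links table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (
        id INTEGER PRIMARY KEY, owner_app TEXT, name TEXT, body TEXT
    );
    CREATE TABLE links (
        source_id INTEGER REFERENCES items(id),
        target_id INTEGER REFERENCES items(id)
    );
    """
)
conn.execute("INSERT INTO items VALUES (1, 'photos', 'beach.jpg', 'pixels-v1')")
conn.execute("INSERT INTO items VALUES (2, 'journal', 'june-entry', 'A day out')")
# The journal entry embeds the photo by reference, not by copying it.
conn.execute("INSERT INTO links VALUES (2, 1)")


def used_by(target_id):
    """Names of all items that link to the given item ("find uses")."""
    rows = conn.execute(
        "SELECT items.name FROM links "
        "JOIN items ON items.id = links.source_id "
        "WHERE links.target_id = ? ORDER BY items.name",
        (target_id,),
    )
    return [name for (name,) in rows]


def embedded_bodies(source_id):
    """Current contents of everything the given item links to."""
    rows = conn.execute(
        "SELECT items.body FROM links "
        "JOIN items ON items.id = links.target_id "
        "WHERE links.source_id = ? ORDER BY items.id",
        (source_id,),
    )
    return [body for (body,) in rows]


# A third app (a photo editor) updates the photo in place.
conn.execute("UPDATE items SET body = 'pixels-v2' WHERE id = 1")
```

With this layout, `used_by(1)` answers "where is this photo used?" and `embedded_bodies(2)` reflects the editor's change without any export/import step. Policy enforcement would be one more query over the same two tables.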
k
But a database is different - as soon as I know its lingo, I can ask it any question it’s capable of answering.
Depends on what you mean by "its lingo".
• If you mean "its schema", then your statement is akin to saying you know what a function does because you know its signature. (Side note: functions can take string arguments, and databases can contain JSON blobs.)
• If you mean "what information it contains", then you're right, but you've bitten off something a lot harder. Parsing files is a minuscule fraction of the work involved in understanding what information a data store contains.
🙌 1
d
There’s a relationship between schema and meaning. It’s not perfect, but I’d take it over nothing any day.
I’m OK with blobs, I just don’t think they should be the default.
Database > XML or JSON > unstructured text. I think that should be our default order of preference.
k
Files are not nothing. I think you're massively overstating the benefits of a basic schema over the visual parsing my eye does when I look inside a file. And you're massively understating the kinds of problems both can run into in real-world situations. Your inequality is probably/maybe right (comparing "database" with "XML or JSON" feels like a category error), but all the things on it are squished close to the origin on the spectrum, and the eldritch things from the abyss programmers often have to deal with extend all the way to infinity. So it feels unimportant.
I've never once been faced with a crappy, sprawling morass of a software project and found myself saying, "I wish we had a more structured file system; I'm sure the nincompoops who created this would have done much better with a structured file system."
❤️ 1
y
@shalabh Can’t OLE/COM/DCOM do all that? (Forgive me, this is me depending on ancient Windows knowledge that’s hugely out of date.) The point is: Yes, that has to be handled at the OS level. But none of it requires an RDBMS underneath. The reason I mention DCOM is that I think that all the uses you mention were possible in Windows already.
I had to refresh my knowledge of what OLE/COM can do. OLE first appeared in 1990. Here’s an overview: https://docs.microsoft.com/en-us/cpp/mfc/ole-background?view=vs-2019
s
@yoz Ah yes, I think the motivation behind OLE was similar. But it was more of a bolted-on, opt-in solution rather than an omnipresent substrate. Indexing and search would still have to be ad hoc solutions though? Still 'technically possible' on top of a typical file system.
y
Yes, but describing it as “technically possible” implies that it’s a painful hack in contrast to the good, proper solution that WinFS enables. At least, that’s the implication I’m reading from your phrasing. (Please feel free to disagree!)
And this cuts to the heart of why this topic fascinates me. The “RDBMS in the filesystem” solution is theoretically purer in certain ways. However, even if you were starting your OS from scratch (rather than trying to fit WinFS into the Windows ecosystem 😱 ) I bet that both the incidental and essential complexity would far outweigh the benefits. Keeping the FS simple, and moving indexing and search off into their own ad-hoc modules, sounds like a better engineering solution for the use cases you describe. (To me, anyway. I bet there are a few good reasons why I’m wrong, and I’d love to hear them.)
k
(I'm interested in the subject as someone dabbling in building an OS from scratch.)
y
And I should pause for a moment just to thank everyone for participating in this thread and entertaining my rabble-rousing for a Big Software Architecture Idealism argument
s
I wouldn't say painful hack, but perhaps a less optimal outcome. TBH I'm not so sure this problem can/should be solved entirely in the persistence layer (I'm also not a big fan of the relational model). However I do feel when a feature is reimplemented in 1000 different ways in the higher layers, there might be benefit in pushing down a generic version of that pattern into the lower layers. In this specific case I see the features of transclusion and links - these notions should be provided by an OS starting from scratch (not just for persistence but also in the UI) because we're fundamentally dealing with mostly interconnected ideas and media, rather than disjoint and isolated ones.
👍 1
IOW, it's not the RDBMS that makes this interesting, it's that the OS provides a way to express rich linking between artifacts and these links are available for generic tooling. I see the hierarchical filesystem as also providing one way to express links via directory containment. Consider how many generic tools make great use of this feature:
    mv dir1 dir2  # Move all files in ..
    zip dir1  # compress all files in ..
    rm dir1  # delete ...
    find dir1 -name '*txt' | xargs grep 'hello'  # search all files in ...
Compare with an imaginary OS that only provides flat key/value, no directories: you can implement directories on top of key/value by storing them as value blobs, but they're not visible via a uniform interface since each app invents its own formats. So you can't have the powerful generic commands like above. Moving on this continuum in the other direction, we have an OS where richer interlinks than just 'directory containment' are available. This seems fundamentally different in what it enables. I don't think of this as more pure, but rather more aware of the relationships between artifacts. Perhaps more powerful generic tools become possible:
    zip all_dependencies_of(some_file)
    locate sources_for(some_binary)
    find linked_from(my_journal, all_images())
etc.
❤️ 4
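As a sketch of what a generic tool like the imagined `all_dependencies_of` might do, assume the OS exposes links as a plain mapping from each artifact to the artifacts it links to. The `LINKS` table and artifact names below are hypothetical; the traversal itself is an ordinary graph reachability walk.

```python
# Hypothetical link table exposed by the OS: artifact -> artifacts it links to.
LINKS = {
    "my_journal": ["beach.jpg", "notes.txt"],
    "notes.txt": ["diagram.png"],
    "beach.jpg": [],
    "diagram.png": [],
}


def all_dependencies_of(artifact, links=LINKS):
    """Return every artifact reachable from `artifact` by following links."""
    seen = set()
    stack = [artifact]
    while stack:
        current = stack.pop()
        for target in links.get(current, []):
            if target not in seen:  # the seen-set also guards against cycles
                seen.add(target)
                stack.append(target)
    return sorted(seen)


def linked_from(artifact, candidates, links=LINKS):
    """Return the candidates that `artifact` links to, directly or indirectly."""
    reachable = set(all_dependencies_of(artifact, links))
    return sorted(reachable & set(candidates))


print(all_dependencies_of("my_journal"))
print(linked_from("my_journal", ["beach.jpg", "diagram.png", "other.png"]))
```

Neither function knows anything about journals or images, which is the same property that makes `find` and `zip` useful over directory containment.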
e
WinFS was Microsoft's copy of the original Apple File System from System 7, which had the concept of a file containing unstructured data (the data fork), and then a two-level hierarchical database of structured data (the resource fork). When Apple invented the resource fork, it solved a huge problem that had plagued computer systems, which was that you wanted to add metadata to a file, maybe a custom icon, or a set of preferences for the program, or maybe some login info; and you wanted to guarantee that the metadata travelled with the other data without being separated or lost. It was a brilliant system, and allowed Apple OS to have far fewer files, and not have hidden names with a dot in front like Unix uses. It was considered one of the greatest innovations of the Apple OS. However, when the internet got going, and people started doing FTP of files, the resource fork, which could not be represented well on the intermediate Linux servers, would get corrupted/mangled/stripped. So Apple discontinued using resource forks in OSX, and no longer uses any resource forks in their programs, instead using a fake folder system for applications for example, where the OS presents a folder as an App. Microsoft took so many years to implement WinFS, because they took it way further than Apple, that people forgot its inspiration, as Microsoft was not content with a mere clone of the simplistic Apple system, and created a huge, complex and powerful system. However, they saw that Apple dropped their multilevel file system, and scuttled WinFS.
I think Bill Gates commented that he regretted that they wimped out on this change, and I think from a purely technical standpoint it would have been wonderful, but let's face it, the 50-year-old Linux file system is a millstone around everyone's neck, and bucking it isn't easy, because Linux must be 90% of all web servers, and Linux releases are mostly rehashes of different versions of thousands of components, and the core system changes like a glacier. The Apple system came with a program called ResEdit, which allowed you to register a four-letter code for your resource type, and add an editor for it to that program, and you could share your editor modules, and by having all the companies register ownership of a four-letter code, they created a whole ecosystem of custom data editors. This approach has been promoted recently by the guy that invented Macromedia Director, but without manufacturer support this type of cooperation never gets off the ground. There is no question in my mind that a standardized data type registry with shareable editors could revolutionize access to data. Too much of our data is stored in unshareable, uneditable formats. Our code at least has a textual form, and if you stay within the same language there is tons of code you can borrow, but data is stuck more or less in the era of .csv files, as a spreadsheet is the dominant data storage mechanism, as there is little standardization in databases, and transfer between databases is always problematic. A lot of companies make a healthy living converting data between formats.
c
Windows (NTFS) does have a feature called Alternate Data Streams that allows arbitrary metadata to be attached to files, but it was also dropped in prominence because users didn't understand that the metadata was lost when transferring the file on USB (FAT) or as an email attachment. The feature is still there but it is rarely used.
👍 1
g
I always felt Apple's resource fork was a good idea poorly implemented. They could have kept the idea without having it be a separate stream of data. All they would have had to do is mandate a file format. Another file format clearly inspired by Apple resources was Electronic Arts' IFF format (https://en.wikipedia.org/wiki/Interchange_File_Format), which lots of other file formats took inspiration from, but they didn't make the fatal mistake of keeping the metadata separate in a way that only works on one OS. That seems like the most obvious problem of WinFS: how do I transfer a document to someone else? It's the same issue Mercury OS seems like it will run into.
e
Users just don't get why a chunk of a file disappears when they copy the file. All because the email system was inherited from 50-year-old Unix. The one thing that Apple did well though was you could register your file format with them, reserving the 4-letter code, and then other people wouldn't conflict with your resource type, and there was a semblance of coordination which hardly ever happens in the computer biz. The complete lack of standardization has plagued our industry. From the encoding of characters to byte ordering, it has been a maddening profusion of conflicting standards which make data hard to transfer. And data is often very valuable. We talk about code all the time in this forum, but some companies don't have valuable code so much as valuable data accumulated.
g
I'd argue getting standardization is about as hard as boiling the ocean. There are 7 billion people and however many million developers. If no one can write any data until some committee decides on a standard then there will be no progress. You can look at web browsers. It takes years and years to push forward the next change. Even languages: maybe some wholly owned language makes lots of progress quickly. The committee-run languages take forever to progress. An OS could mandate a chunky format like IFF and ask devs to register chunk ids. But an OS couldn't decide what goes in each chunk. Nowadays they might be able to mandate a structured format in each chunk like JSON or something with enough metadata that you can guess what the data is, but I doubt that would have flown in the 80s when we passed data on 160k floppy disks.
amiga tick 1
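For a feel of what "a chunky format like IFF with a structured payload in each chunk" could look like, here is a simplified sketch. It borrows IFF's chunk layout (4-byte ASCII id, big-endian 32-bit length, data padded to an even size) but omits the outer `FORM` wrapper of real IFF files. The chunk ids `ICON` and `PREF` and the JSON payloads are invented for illustration.

```python
import json
import struct


def write_chunk(chunk_id, payload):
    """Encode one chunk: 4-byte ASCII id, big-endian uint32 length, JSON body."""
    if len(chunk_id) != 4:
        raise ValueError("chunk id must be exactly 4 characters")
    body = json.dumps(payload).encode("utf-8")
    pad = b"\x00" if len(body) % 2 else b""  # IFF pads chunk data to even length
    return chunk_id.encode("ascii") + struct.pack(">I", len(body)) + body + pad


def read_chunks(data):
    """Decode concatenated chunks into a list of (id, payload) pairs."""
    chunks = []
    offset = 0
    while offset < len(data):
        chunk_id = data[offset:offset + 4].decode("ascii")
        (length,) = struct.unpack(">I", data[offset + 4:offset + 8])
        body = data[offset + 8:offset + 8 + length]
        chunks.append((chunk_id, json.loads(body.decode("utf-8"))))
        offset += 8 + length + (length % 2)  # skip header, body, and pad byte
    return chunks


# Hypothetical registered ids; everything travels together in one byte stream.
blob = write_chunk("ICON", {"size": 32}) + write_chunk("PREF", {"theme": "dark"})
print(read_chunks(blob))
```

Because every chunk carries its own length, a reader that doesn't recognize an id can skip it, and the whole thing survives FTP or email as a single ordinary file.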