# thinking-together
n
I think the above question is better split into two, so here's a separate prompt: What is the difference between an exclusive data bucket and a tag? Is one inferior to the other? How do you tell them apart, behaviourally? (My definition of "bucket" here means simply an exclusive location that a datum is considered to "live" within, whereas a "tag" is not a residence. Example of buckets: the "folders" of a hierarchical file system. Example of tags: the labels placed upon Github issues or the #hashtags of social media posts.)
c
I am 100% sure these map to different processors in your brain. The exclusivity of a bucket allows you to leverage your Location powers. As things like the Mind Palace technique, and the trick of returning to the place where you lost your train of thought, show, leveraging our sense of physical location is very powerful. Tags are also brilliant, but they leverage a different thing: our natural ability to categorise and stereotype. This is mainly useful for describing queries and commands. I remember having an argument about 15 years ago with somebody who was saying that filesystems should be rewritten to be based only on tags, as that's more general. They made the (good) point that it's absurd to distinguish between /usr/photos/wedding/ and /usr/wedding/photos/. I still struggle to explain why I hate the idea so viscerally, but I do. I can't relax until I can say "that's there, that's *here*" etc. Basically, the ideal system for me is built on a bedrock of buckets, but with the ability to arbitrarily query it based on properties, including arbitrary user-defined tags. Something's location, i.e. its filepath, can be considered an implicit tag, but it should be recognised as fundamentally different. I am actually making a system like this: basically a Lucene+Tika-powered add-on for a filesystem. Impossible to know how universal my preferences are 🤷
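A rough sketch of the "bedrock of buckets plus queries" idea might look like the following. It is only an illustration under stated assumptions, not the Lucene+Tika add-on described above: `File` and `query` are hypothetical names, each file is assumed to have exactly one path, and the query scans a list instead of using an index.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: each file lives in exactly one bucket (its path),
# and carries any number of user-defined tags on top of that.
@dataclass
class File:
    path: str                                       # exclusive location: one per file
    tags: set[str] = field(default_factory=set)     # non-exclusive labels

def query(files, under=None, tags=()):
    """Return files located under `under` (if given) that carry all `tags`.

    The path can be queried like an implicit tag, but it is stored and
    changed separately from the user's tags.
    """
    results = []
    for f in files:
        if under is not None and not f.path.startswith(under.rstrip("/") + "/"):
            continue
        if not set(tags) <= f.tags:
            continue
        results.append(f)
    return results

files = [
    File("/usr/photos/wedding/cake.jpg", {"family", "2009"}),
    File("/usr/photos/holiday/beach.jpg", {"2009"}),
    File("/usr/docs/speech.txt", {"family"}),
]

print([f.path for f in query(files, under="/usr/photos", tags=["2009"])])
# ['/usr/photos/wedding/cake.jpg', '/usr/photos/holiday/beach.jpg']
```

Keeping `path` as a separate field, rather than folding it into `tags`, is what preserves the "that's *here*" property: a file can gain tags freely, but it can only be in one place.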
j
There was a good bit written about this distinction in the early 00s, trying to tease out the differences using the terms taxonomy and folksonomy. Some of that work might be of interest to your concerns.
@Chris Knott Do kangaroos go in `/animals/australian` or `/animals/marsupial`? Is a hot dog a sandwich?
c
I think the point is, it doesn't matter, there's value in it going Somewhere if that allows you to make use of a sense of location. Where does a spork go in a kitchen? I bet it varies by person, but I bet all of those people would remember where they put it, in their own kitchen. In someone else's kitchen they would have to start querying and searching, for sure.
I recently sorted out my Lego so been thinking about this a lot 😅
n
@Chris Knott So in the "models for information" thread, @ibdknox was talking about how he prefers the idea of "namespaced tags", which look a lot like file paths, so you can write something like `/usr/photo/wedding/`. The distinction from exclusive buckets, though, is that you can reference a single datum under multiple different paths! How does that idea feel to you?
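As a minimal sketch of namespaced tags, assume a tag is just a slash-separated string and a datum may carry any number of them. The helpers `tag` and `under` are hypothetical, not @ibdknox's actual design. Jack's kangaroo question stops being a dilemma once a datum can sit under both paths.

```python
from collections import defaultdict

# Hypothetical sketch: path-like tags, but a datum may carry several,
# so it is reachable under more than one "path".
tags_of = defaultdict(set)   # datum -> set of namespaced tags

def tag(datum, path):
    tags_of[datum].add(path.strip("/"))

def under(prefix):
    """All data carrying at least one tag at or below `prefix`."""
    prefix = prefix.strip("/")
    return sorted(
        d for d, paths in tags_of.items()
        if any(p == prefix or p.startswith(prefix + "/") for p in paths)
    )

tag("kangaroo", "/animals/australian")
tag("kangaroo", "/animals/marsupial")
tag("koala", "/animals/marsupial")
tag("wedding.jpg", "/usr/photo/wedding")
tag("wedding.jpg", "/usr/wedding/photo")

print(under("/animals/australian"))  # ['kangaroo']
print(under("/animals/marsupial"))   # ['kangaroo', 'koala']
print(under("/usr/wedding"))         # ['wedding.jpg']
```

The paths look hierarchical, but nothing here is exclusive: adding a second path never removes the first.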
@Jack Rusher Those terms are leading me to some interesting resources. Thank you!
s
@Nick Smith You mention a lot of different things as part of the “bucket” category, so it’s a little difficult to come up with a single, universal answer to what the differences are. Mathematically, a set and a tag can both be just a binary relation without any explicit properties assumed. Then they could be the same, behaviorally. A list implies some sort of order on multiple relations of that kind, which a tag doesn’t (although you’d always have to order elements tagged with the same tag in some way to display them). Tables and databases add more assumptions about the relations they model. Are they reflexive? Transitive (which you need to model hierarchy)? Etc. Tags probably feel more flexible because they don’t assume anything about the relation they model other than that one exists (and perhaps whether it’s one- or two-way), while “buckets”, especially when seen as containers, do assume that these relations have additional properties, which makes them less “flexible”.
At the end of the day, though, the “purity” of tags is diluted by practical concerns, e.g. how to display all elements tagged with the same tag, and then you’re forced to ensure the relations have certain properties, and suddenly you’re back in container-land, except that the same elements can now also be in several containers at once. And then you run into questions of identity: if I change that element in one category, does it change everywhere, or does it just change in the current context (was it just a copy)? Also, not hard-coding any properties of these relations basically just shifts the burden onto the user; they still need to remember what a tag means to them and keep different kinds of meaning apart. That’s when they suddenly re-introduce containment with hacks like namespaces or paths.
Technically, modeling fewer assumptions about relations gives you more flexibility in exchange for performance, as the system can’t make certain assumptions (e.g. order) which could be used for optimizations (e.g. indices). Thankfully, the way the system seems to work from a user’s perspective doesn’t necessarily have to be the way it’s implemented. That is probably what good abstractions are about: they’re not just hiding something, they’re hiding a dramatically different structure that you wouldn’t expect from how the system behaves, and hopefully the way it behaves is easier to understand than how it actually works.
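The framing of tags and containers as plain binary relations, differing only in which extra properties they satisfy, can be made concrete with a few property checks. This is a minimal sketch assuming a relation is a Python set of `(a, b)` pairs; the helper names and the sample relations are hypothetical.

```python
# Hypothetical sketch: a relation is just a set of (a, b) pairs.
def is_reflexive(rel, universe):
    return all((x, x) in rel for x in universe)

def is_transitive(rel):
    # Whenever a -> b and b -> d, require a -> d.
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)

def is_antisymmetric(rel):
    # Never both a -> b and b -> a for distinct a, b.
    return all(a == b or (b, a) not in rel for (a, b) in rel)

# "contains" in a folder tree, with indirect containment spelled out.
contains = {("home", "docs"), ("docs", "a.txt"), ("home", "a.txt")}
# A two-way "related to" link between two notes.
related = {("a", "b"), ("b", "a")}

print(is_transitive(contains), is_antisymmetric(contains))  # True True
print(is_transitive(related), is_antisymmetric(related))    # False False
```

A hierarchy needs the first pair of properties; a bare link relation promises neither, which is the sense in which it is more "flexible".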
n
@Stefan I think we're going off into the weeds a bit here. I'm (personally) not interested in talking about specific data structures or data models in this thread. That kind of discussion is best reserved for the thread I initiated yesterday. I listed a couple of examples only to hint at how the abstract notion of a "bucket" can manifest.
> “buckets”, especially when seen as containers, do assume that these relations have additional properties which makes them less “flexible”.
I think the actual interface that users are presented with is going to be the main driver of assumptions about what a bucket means. My definition of "bucket" here means simply an exclusive location that a datum is considered to "live" within. That's extremely broad, and my hope for this thread is that we can work out the relevance and utility of buckets in comparison to tags.
> And then you run into questions of identity: if I change that element in one category, does it change everywhere, or does it just change in the current context (was it just a copy)?
I actually believe this to be the only difference between buckets and tags. In the presence of mutation, the two concepts may behave differently, and without mutation, they are equivalent. I've not yet had time to develop detailed reasoning as to why this might be the case. I'll hopefully post something here in the next 24h.
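One way to picture the hypothesis that buckets and tags only come apart under mutation is the following sketch. It assumes that showing a datum in a second bucket means storing a copy, while tagging means two labels refer to one shared object. The names are hypothetical and the note contents are made up for illustration.

```python
import copy

note = {"title": "Kangaroo facts", "body": "Marsupial."}

# Bucket-style: each location owns its own copy of the datum.
buckets = {"australian": [copy.deepcopy(note)], "marsupial": [copy.deepcopy(note)]}
# Tag-style: both labels refer to the one shared datum.
tags = {"australian": [note], "marsupial": [note]}

def views(store):
    """What a reader sees when looking the datum up under each label."""
    return [store["australian"][0]["body"], store["marsupial"][0]["body"]]

# Without mutation, the two arrangements look identical.
assert views(buckets) == views(tags)

# Mutate the datum through one location only.
buckets["australian"][0]["body"] = "Marsupial. Hops."
tags["australian"][0]["body"] = "Marsupial. Hops."

print(views(buckets))  # ['Marsupial. Hops.', 'Marsupial.']
print(views(tags))     # ['Marsupial. Hops.', 'Marsupial. Hops.']
```

The sketch only compares values. In Python an identity check (`is`) would also tell the two apart before any mutation.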
p
Set ~= Tag, isn’t it? As for List / Table: they have an extra layer of connection via integers (list 1D, table 2D), but you could replace this “mesh” by adding proper next/prev information (tags) everywhere. Integers themselves can be modeled! You should check out Idris! Compare the list [1,2,3] with the set of numbers {1,3,2} + nexts {(1,2), (2,3)} + prevs {(2,1), (3,2)} + the automation that always adds these pieces of information whenever something changes. To me, naturally, if we wanted to be really, really precise, we would use only Sets and add our extra layers of information to the models ourselves, but Integers are so useful and make us so capable of hacking around with a degree of certainty, without fluff. Using integers also exposes logic you can use to reason about your code. On the other hand, building on Integers means you are exposed to “vulnerability”, because you don’t use the “real interfaces” that Integers themselves really have behind them (e.g. successor aka “next” / successor^-1 aka “prev”).
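The "set plus next/prev relations" encoding of a list can be sketched roughly like this. It is a minimal illustration assuming the elements are distinct, since a set cannot hold duplicates; `to_relations`, `insert_after` and `to_list` are hypothetical names, and `insert_after` stands in for the "automation" that keeps both relations in sync.

```python
def to_relations(items):
    """Encode a list as (elements, next-pairs, prev-pairs)."""
    elems = set(items)
    nexts = {(a, b) for a, b in zip(items, items[1:])}
    prevs = {(b, a) for a, b in nexts}
    return elems, nexts, prevs

def insert_after(elems, nexts, prevs, x, new):
    """Insert `new` after `x`, updating both relations together."""
    successor = dict(nexts)
    elems.add(new)
    if x in successor:
        y = successor[x]
        nexts.discard((x, y))
        prevs.discard((y, x))
        nexts.add((new, y))
        prevs.add((y, new))
    nexts.add((x, new))
    prevs.add((new, x))

def to_list(elems, nexts, prevs):
    """Recover the order by starting at the one element with no predecessor."""
    if not elems:
        return []
    has_prev = {a for a, _ in prevs}
    (head,) = elems - has_prev
    successor = dict(nexts)
    out = [head]
    while out[-1] in successor:
        out.append(successor[out[-1]])
    return out

elems, nexts, prevs = to_relations([1, 2, 3])
print(sorted(nexts), sorted(prevs))   # [(1, 2), (2, 3)] [(2, 1), (3, 2)]
insert_after(elems, nexts, prevs, 1, 5)
print(to_list(elems, nexts, prevs))   # [1, 5, 2, 3]
```

No integer indices appear anywhere; order lives entirely in the relations, at the cost of maintaining them on every change.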
n
We're off-topic again! I'm going to delete the mentions of those concrete data structures from the question. They are not really what I was trying to talk about at all here.
p
Ah. Ok. Then my answer is: nothing. 😄 But maybe I don’t get the idea.
n
I added an extra note to the question clarifying the conceptual difference.
c
I think the key distinguisher is the exclusivity. This is what maps to our natural understanding of location and identity. An object can only be in one place. If otherwise identical objects are in different locations, we perceive them as copies. I think this persists even if mutations on one affect the other. We perceive the "other object" as being magically affected "at a distance"; we can't perceive it as one object with two locations. The notion that an object can only be in one location at a time overpowers the related notion that objects can easily change their location and persist their identity. This is not quite the same as containment. The features of physical containment are transitivity and asymmetry. If A contains B, then B cannot contain A. If A contains B, and B contains C, then C is also inside A (indirectly). A filesystem obeys these physical properties, but something like Python's list obeys neither. This is why users get confused by lists that contain each other, and by objects being changed when they are in two lists. It is also why symlinks can be so confusing. (You thought you were deleting a copy? Too bad, sucker!)
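The contrast between physical-style containment and Python lists can be checked directly. This sketch assumes a filesystem without links, and that renaming a directory into its own subdirectory fails, as `rename(2)` specifies on Linux and macOS; the directory names are made up.

```python
import os
import tempfile

# Python lists: membership is neither asymmetric nor transitive.
a, b = [], []
a.append(b)
b.append(a)                    # a contains b, and b contains a
print(b in a, a in b)          # True True

c = []
inner_list = [c]
outer_list = [inner_list]
print(c in outer_list)         # False: `in` only sees direct members

# A directory tree behaves like physical containment.
with tempfile.TemporaryDirectory() as root:
    dir_a = os.path.join(root, "A")
    dir_b = os.path.join(dir_a, "B")
    os.makedirs(dir_b)
    open(os.path.join(dir_b, "c.txt"), "w").close()

    # Transitive: walking A finds c.txt, even though it sits inside A/B.
    found = [name for _, _, files in os.walk(dir_a) for name in files]

    # Asymmetric: A cannot be moved inside its own child B.
    try:
        os.rename(dir_a, os.path.join(dir_b, "A"))
        moved_into_child = True
    except OSError:
        moved_into_child = False

print(found, moved_into_child)  # ['c.txt'] False
```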
s
Let’s go off-topic to the other side then… :-) When you say “interface” I hear “how users understand it”. Then we can talk about my favorite topic: image schemas. These are cognitive patterns that we all use to structure our understanding. A “bucket” is an instance of the container schema. Things are either in it or not, and it has a boundary (which we might or might not be able to describe precisely). That means that the container is of a different quality than the elements it contains. A “tag” is an instance of the link schema. It just means there is something that connects the things at the two ends like a rope (imagine the force you feel when two items are tied together with a rope; and it is such physicality that gives meaning to the abstract concept). The two items connected to each other don’t automatically have different “status” like in the container version, although they could have through other schemas in effect simultaneously. We can cope with the same things being in different containers at once, usually through frames, which are larger contexts of experience. E.g. when you say “cut the flesh” and you are in a restaurant, you evoke completely different images than when you’re a surgeon in a hospital. And it even works out ok on the surgeon’s night out to the steak house. ;-) Now if you use a tag to signify membership to a group, e.g. all blog articles tagged with “technology”, then it’s used much more like a container and likely understood based on the container schema. There are also more complex image schemas that could be relevant, for instance whole-part, which is a little like a container, but where the elements don’t just have to be present, but also need to be arranged in a certain configuration for the whole to emerge. E.g. if you disassemble your car and still have all the parts, it’s not really a car anymore. (That’s also behind the layers example Kartik made in the other thread.) Or take the center-periphery schema. 
This is how we understand gradual or fuzzy relations. Some things are “core” or central, and if you change them the thing is no longer the same thing, e.g. a tree where you cut the trunk is still a tree, but it’s not the same tree anymore. Other things are peripheral and can change, and if they do, the thing still stays the same. E.g. if the tree loses all its leaves in winter, it’s still the same tree. Oh, did you have a haircut recently? No problem, still you. The difference between “bucket” and “tag” can be as simple as container vs. link, depending on how you use it. If the identity of an object changes when it’s modified in different places, does it mean it wasn’t supposed to be the same thing in the first place? This is at the core(!) of programming issues around value vs. reference types and affects your system design. What parts of the elements tagged or put into buckets can change without making them a different thing?
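One way to encode the center-periphery distinction in a value type is to let only the "core" fields take part in equality. This sketch uses Python's `dataclasses.field(compare=False)` for peripheral fields; the `Tree` class and its field names are hypothetical illustrations of the tree example above.

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class Tree:
    trunk: str                                      # core: change it, different tree
    leaves: int = field(default=0, compare=False)   # peripheral: excluded from ==

oak = Tree(trunk="oak-17", leaves=5000)
winter_oak = replace(oak, leaves=0)        # lost all its leaves
cut_oak = replace(oak, trunk="oak-17-cut") # trunk changed

print(winter_oak == oak)  # True: still the same tree
print(cut_oak == oak)     # False: no longer the same tree
```

Deciding which fields get `compare=False` is exactly the question of which parts of an element can change without making it a different thing.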
c
I think I disagree that tagging is the same as linking. Tagging to me is a kind of categorizing/grouping. It's a type of non-exclusive container. Then I would put "bucket" as a separate image schema: the exclusive container, which to me feels like a very important distinction. Mistaking a non-exclusive container for an exclusive container is surely the root of those spooky-action-at-a-distance gotchas.
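The behavioural difference between an exclusive and a non-exclusive container shows up as soon as an item is placed a second time. In this minimal sketch, an exclusive container is modelled as a mapping from each item to exactly one bucket, and a non-exclusive one as a mapping from each item to a set of tags. All names are hypothetical, and the spork is borrowed from the kitchen example earlier in the thread.

```python
# Exclusive container ("bucket"): each item maps to exactly one location.
location = {}

def put(item, bucket):
    location[item] = bucket   # placing it anew removes it from the old bucket

def contents(bucket):
    return sorted(i for i, b in location.items() if b == bucket)

# Non-exclusive container ("tag"): each item maps to a set of groups.
groups = {}

def add_tag(item, t):
    groups.setdefault(item, set()).add(t)

def tagged(t):
    return sorted(i for i, ts in groups.items() if t in ts)

put("spork", "cutlery drawer")
put("spork", "utensil pot")          # moved: no longer in the drawer
add_tag("spork", "cutlery")
add_tag("spork", "utensil")          # tagged again: now in both groups

print(contents("cutlery drawer"), contents("utensil pot"))  # [] ['spork']
print(tagged("cutlery"), tagged("utensil"))                 # ['spork'] ['spork']
```

A quick behavioural test follows: after putting an item somewhere new, check whether it can still be found in the old place.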
g
deep in the weeds here, but i think that the core of the ambiguity here is about identity vs value. in my head a value is something you can “get to” by eg a query or a computation (paths, links), whereas an identity refers to more of a physical object you might find in a place (something you can mutate). if you change a path or a computation, you get a different value. if you change “a thing”, the values update (properties of that thing update? language is hard). that kind of implies that paths and computations are “things” that “give you” values, which strikes me as both exciting and scary (too meta?)
if that’s the case, buckets are things that contain things whereas tags are things that return or evaluate to values (lists of things). i think? i think the system needs to address both copy & paste and linking but it needs to do so in a closed way (algebraically)
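A rough sketch of "buckets contain things, tags evaluate to values" follows. It treats a tag as a stored predicate that is re-run against the current state of all things. `Thing`, `Bucket` and `Tag` are hypothetical names, and this is one possible reading of the idea, not a worked-out algebra.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Thing:
    name: str
    color: str

@dataclass
class Bucket:
    items: list = field(default_factory=list)   # holds the things themselves

@dataclass
class Tag:
    predicate: Callable[[Thing], bool]          # a computation, not a container

    def evaluate(self, universe):
        """Re-run the query: the result is a fresh list of values."""
        return [t.name for t in universe if self.predicate(t)]

universe = [Thing("apple", "red"), Thing("lime", "green")]
box = Bucket([universe[0]])
red = Tag(lambda t: t.color == "red")

print(red.evaluate(universe))        # ['apple']
universe[1].color = "red"            # change a thing...
print(red.evaluate(universe))        # ['apple', 'lime']  ...and the tag's value updates
print([t.name for t in box.items])   # ['apple']  the bucket's membership does not
```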