Has anyone been searching for an ideal model for information Future of Coding #thinking-together


Nick Smith

07/23/2020, 7:28 AM

Has anyone been searching for an "ideal" model for information? By this I mean an abstract model (not a DRAM/disk representation) with a vocabulary that allows a human to read, specify, and understand information with ease. We have existing models like RDF ("Semantic web") and property graphs (e.g. Neo4j), and in programming languages we have object graphs (OO languages), trees (functional languages), and tables (relational DBs). None of these models seems to have arisen as "the one true answer", partly because they are all difficult to reason about when operated on programmatically (or so I claim).

I want a model for information that allows us to easily understand both static information structures (snapshots) and programmatically-maintained information structures (an evolving, stateful system). The ideal model for information might define concepts like "pointing" (edges) or "nesting" (groups), and each concept will have a clear meaning. The ideal model will also have a semantics for mutation, e.g. when you define a relationship like "A points to B", you will know whether B can be deleted or modified in some way, and how this affects the "points to" connection. The model will also have a notion of stewardship of information, i.e. it should be easy to observe the source of a piece of information, and to know whether it is mutable or just a snapshot. (Yes, I believe the ideal model for information should have a vocabulary to describe distribution and trust.)

Has anyone here thought hard about this? I've been churning on it for a month or two recently, and for many months over the last few years; I can't proceed with my own PL project until I get fundamentals like this right, since the model that I develop will fundamentally determine what a "program" even is. I think a key reason why discovering "the future of programming" is so hard is that we think too much about code and not enough about information. Code exists purely to transform information, and if you don't make strides on models for information then I don't think you can make strides on programming languages.

I have some ideas about a model, but I have no insights from others to validate it against. My conception is a very specific hybrid of mutable graphs and immutable ~~trees~~ nested sets that distinguishes between descriptions and references, and it seems very amenable to compact 2D representations, unlike most graphs. I won't describe it further just yet, since I don't want to skew anyone's responses. Anyway, I'd love to hear from other people who have been thinking about this kind of stuff.

Nick Smith

07/23/2020, 7:29 AM

@Jack Rusher From your recent posts, perhaps you have some insight? 🙂

Ian Rumac

07/23/2020, 7:56 AM

The ideal model is a user-defined n-ary tree… or, well, a list 🙂 My long-term project is a tree editor that does exactly that: it purely transforms the data. You have a tree T, you write a transform function f(t)->G, and now you have a projected tree G, but inside I keep them connected via references: the model isn't "G", the model is "f(t)->G". This way, you grow your codebase from the model outwards, expanding it into each category you need (SQL schema, model, viewmodel) via transformations, or by creating new trees that reference stuff from any of the existing trees. That way, when you mutate your original model, all the references get the mutation transformed into their own projection. I keep it all inside a large model called Lattice, which is basically a huge graph that connects over multiple dimensions (each transformation adds a new dimension to the graph). I haven't yet got to edges as mutation permissions, since I feel that isn't part of the pure model, but of a "projection".
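
A minimal sketch of Ian's point that "the model is f(t)->G, not G": the projection keeps a reference to the source tree plus the transform and recomputes on read, so mutating the source flows through automatically. `Tree`, `map_tree` and `Projection` are hypothetical names for illustration, not the API of Ian's Lattice. Recomputing on every read is the simplest possible policy; a real system would more likely propagate changes incrementally.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Tree:
    # A hypothetical n-ary tree: a label plus child trees.
    label: str
    children: List["Tree"] = field(default_factory=list)


def map_tree(t: Tree, f: Callable[[str], str]) -> Tree:
    # Apply f to every label, preserving the shape of the tree.
    return Tree(f(t.label), [map_tree(c, f) for c in t.children])


@dataclass
class Projection:
    # Stores the source tree and the transform rather than a materialised copy,
    # so the "model" is f(t) -> G, not G itself.
    source: Tree
    transform: Callable[[Tree], Tree]

    def view(self) -> Tree:
        # Recompute on demand: any mutation to the source shows up here.
        return self.transform(self.source)


model = Tree("user", [Tree("name"), Tree("email")])
sql_schema = Projection(model, lambda t: map_tree(t, lambda s: s.upper()))

model.children.append(Tree("age"))  # mutate the original model
print([c.label for c in sql_schema.view().children])
# prints ['NAME', 'EMAIL', 'AGE']
```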

Nick Smith

07/23/2020, 8:16 AM

Your "transform function" is the same as "derived state", a database "view", and a "projection", right? We have a lot of names for it. That's an example of a programmatically-maintained information structure as I called it earlier. What about static structures? How would you encode a text document, or a (static) spreadsheet, and how do you derive meaning from the encoding? Spreadsheets in particular do not have a clean representation as a list or a tree. You'd have to do some shoehorning.

S.M Mukarram Nainar

07/23/2020, 9:33 AM

What do you mean by "information"? In the most general form, this is the core question of ontology, which philosophers have been wrestling with for millennia. Slightly more restricted, you can look at GOFAI approaches to this problem, like Cyc, which have largely failed to bear the kind of revolutionary fruit promised. How general are you trying to be? What are you trying to achieve?

Jack Rusher

07/23/2020, 9:34 AM

This is a fairly deep topic for which a simple answer is impossible. A few fingers pointing toward the moon:

• It would be best for such a system to store the data in a structure that "changes" through accretion rather than mutation (practical example, much emulated since). If every modification point is essentially a snapshot, you get time travel and other useful semantics for free (Datomic is a recent database that uses this approach).
• Up-front schemas are brittle over time. The less you can get away with saying about structure, the better off you will be. (A special case of the more general advice to "design for change".)
• The approach whereby everything is a logical assertion (i.e. triples, sometimes called "Subject/Predicate/Object", sometimes "Entity/Attribute/Value", depending on the culture of the speaker) is the most flexible I've been able to come up with yet. This underlies RDF (and TripleStores generally), and is currently in vogue among those who have encountered Datalog, mostly via Datomic.
• It's very useful to be able to specify additional automatic behavior over triples via some kind of metadata, in the best case also specified in triples. (RDF via OWL, not Datomic.) I have many differences of taste with the place where RDF ultimately went, but there's loads of good stuff in there too.

You can find out more about the history of the "data as assertions" approach by reading up on Knowledge Representation (as distinct from data representation), Frame-based Reasoning, and so on. This stuff is one of the threads that leads to object orientation, though it is much less emphasized in a modern context than the strand motivated by simulation. The "immutable tree on disk" stuff comes from generalizing database journals, which for me starts in the 80s with Log-Structured File Systems. Today there's loads of work on immutable logs for all sorts of applications, especially with the recent popularity of cryptocurrencies.
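
To make "accretion rather than mutation" concrete, here is a minimal sketch of an entity/attribute/value log in which retractions are just more appended facts, so any past snapshot can be rebuilt by replaying the log up to a transaction number. `TripleLog`, `assert_`, `retract` and `as_of` are hypothetical names; this is not Datomic's API, only an illustration of the same idea under the assumption that attributes may hold several values.

```python
from typing import Any, Dict, List, Set, Tuple

# Each fact: (entity, attribute, value, transaction number, added?)
Fact = Tuple[str, str, Any, int, bool]


class TripleLog:
    """Hypothetical accrete-only store: facts are appended, never overwritten."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []
        self.tx = 0

    def assert_(self, e: str, a: str, v: Any) -> int:
        self.tx += 1
        self.facts.append((e, a, v, self.tx, True))
        return self.tx

    def retract(self, e: str, a: str, v: Any) -> int:
        # Retraction is itself a new fact, so history is preserved.
        self.tx += 1
        self.facts.append((e, a, v, self.tx, False))
        return self.tx

    def as_of(self, tx: int) -> Dict[Tuple[str, str], Set[Any]]:
        # Replay the log up to tx to reconstruct that snapshot ("time travel").
        state: Dict[Tuple[str, str], Set[Any]] = {}
        for e, a, v, t, added in self.facts:
            if t > tx:
                break  # facts are appended in transaction order
            values = state.setdefault((e, a), set())
            if added:
                values.add(v)
            else:
                values.discard(v)
        return state


log = TripleLog()
log.assert_("bob", "car", "car-1")
t2 = log.assert_("car-1", "wheels", 4)
log.retract("car-1", "wheels", 4)
log.assert_("car-1", "wheels", 3)  # say a wheel was removed
print(log.as_of(t2)[("car-1", "wheels")])      # {4}
print(log.as_of(log.tx)[("car-1", "wheels")])  # {3}
```

Nothing is ever deleted here, which is also why the storage question raised further down the thread comes up.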

Duncan Cragg

07/23/2020, 10:09 AM

Wow this thread has already brought out so many fellow travellers who I didn't know were exploring the same terrain as I am! The history of my own project over decades is intimately tied to my search for the "perfect" data and data transformation approach, in search of the "purest" model of programming - which is one that is most cognitively aligned. Aligned to the cognition of a .. ahem .. "average human", so probably less to that of the "average programmer"! So for me it is important to step back and look at common patterns of cognition - how we organise our perception of reality. One approach is to read Roget's Thesaurus! All the important cognitive concepts are there, organised in a specific idiosyncratic way. There are many other "meta-ontology" approaches, as have already been listed here.

Duncan Cragg

07/23/2020, 10:12 AM

I could start listing out what I believe are essential "reality modelling primitives" as you already have done yourself. For me, where I am right now in my journey, that is basically: identity of things, properties of things, relations between things.

Duncan Cragg

07/23/2020, 10:13 AM

At a more abstract level, you have symbols of value, order of those symbols and their aggregations, bags/sets of symbols. You have named properties that can have values.

Duncan Cragg

07/23/2020, 10:15 AM

So, as you can probably deduce, I've settled on something that looks like objects that have unique identity, which are bags of properties in the form "property name: property value". Property values are just symbols (text really) or ordered lists of symbols but can also be links to other objects by identity - relations between objects.
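
A rough sketch of the object model described above: objects have a unique identity, and each property maps a name to a symbol, an ordered list of symbols, or a link to another object by identity. `Obj`, `Link`, `new` and the global `store` are hypothetical names. Python dicts preserve insertion order, so this sketch inherits exactly the incidental property order Duncan complains about next.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class Link:
    # A relation to another object, by identity rather than by containment.
    target: str


# A property value: a symbol, an ordered list of symbols, or a link.
Value = Union[str, List[str], Link]


@dataclass
class Obj:
    props: Dict[str, Value] = field(default_factory=dict)
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))


store: Dict[str, Obj] = {}  # identity -> object


def new(**props: Value) -> Obj:
    o = Obj(dict(props))
    store[o.uid] = o
    return o


cup = new(kind="cup", colour="red")
me = new(kind="person", tags=["thirsty", "awake"], holding=Link(cup.uid))
print(store[me.props["holding"].target].props["colour"])  # red
```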

Duncan Cragg

07/23/2020, 10:16 AM

One issue that really bugs me is that I can't extract a single representation of order! There is order in my properties (I don't need that but I want it). There is order in my lists of property values, and there is order in the direction of the link pointers.

Chris Knott

07/23/2020, 10:18 AM

If you want to make it attractive to humans, it needs to be built out of concepts that humans have "hardware support" for. Sometimes there's a tendency to seek "elegant" solutions which use the minimum possible number of concepts, whereas I think it's often more natural to expand the number of concepts, even if it introduces "redundancy". For example, you can describe a bijection between XML and JSON that would involve translating JSON lists into something you can describe in XML, such as ["one", "two", "three"] becoming <list-item index="0">one</list-item><list-item index="1">two</list-item>... or something. You could say that XML is a more elegant way to represent information, because it is based basically only on the concepts of Containment, Naming and Properties. But because I believe that humans have "hardware support" for many more concepts, such as Sequences, I think that JSON is actually easier and more natural (for humans). So basically my only input would be to be sceptical of mathematical beauty if the format is intended to store information relating to the human world.
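
The JSON-to-XML direction of that bijection can be sketched with Python's standard `xml.etree.ElementTree`. The `list` root element and the two function names are assumptions; the point is that the encoding only uses containment, naming and attributes, so sequence order has to be spelled out as an `index` attribute and sorted back on decode. It only handles flat lists of strings.

```python
import json
import xml.etree.ElementTree as ET


def list_to_xml(items):
    # Encode a JSON-style list using only containment, naming and properties:
    # order stops being a primitive and becomes an explicit "index" attribute.
    root = ET.Element("list")
    for i, item in enumerate(items):
        el = ET.SubElement(root, "list-item", index=str(i))
        el.text = str(item)
    return ET.tostring(root, encoding="unicode")


def xml_to_list(xml_text):
    # Invert the encoding by sorting on the index attribute, not document order.
    root = ET.fromstring(xml_text)
    children = sorted(root, key=lambda el: int(el.get("index")))
    return [el.text for el in children]


data = json.loads('["one", "two", "three"]')
encoded = list_to_xml(data)
print(encoded)
```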

Chris Knott

07/23/2020, 10:20 AM

There is a list of things that have been observed in every human culture (Donald Brown's Human Universals), and it is surprisingly long. Unfortunately, it doesn't have much in the way of numbers (only the numbers 1 and 2), as some cultures are almost completely innumerate. It does have quite a few logical operations, like Not, Or, etc.

Nick Smith

07/23/2020, 10:27 AM

@S.M Mukarram Nainar By "information" I just mean data with associated meaning. I refrained from using the word "data" because it's a machine-oriented word and I don't want people to start talking about conventional data structures and pointers. I want to focus on the human interface, not the machine representation. I'm not too concerned with standardizing ontologies or AI right now. I'm more concerned with allowing humans to group a bunch of symbols together by hand or programmatically in a way that makes sense for their own understanding. I'm designing a programming language, but unlike almost everyone in the last few decades, I'm not taking "arrays" or "objects" or "algebraic data types" or "databases" or "relations" for granted: I'm trying to figure out, from complete scratch, what information structures are easiest for people to think with when they are integrated into a complex system (e.g. a distributed app). I'm trying to be as general as I can, but my domain is "programming", taken broadly. I want a model for information that allows users to encode information like "Bob has a car with 4 wheels" in the form that is most useful to their specific application. In a company database the car might be an immutable fact with an associated date and source. You might then tell me that the right model for encoding information about cars is the relational model. However, in a video game the car might be an entity with a position that changes over time, and so we need to be able to talk about the idea of change and the distinction between descriptions (where the description itself doesn't "change" as time passes) and references (where the entities referred to can change). The question of "how do I model information" may seem trivial (or "solved") at first glance, but the moment I ask that you don't treat conventional data structures as axioms ("we already figured it out, just mix these existing things together and follow best practices"), it becomes incredibly deep.

Nick Smith

07/23/2020, 10:43 AM

@Jack Rusher I've liked the idea of "accretion, not mutation" in the past. But I'm worried that making it a requirement in a programming system is going to lead to space consumption problems for certain use cases. How do you get around that, if you aren't restricting your application domain and user base? I agree that it's wise to limit unnecessary structure: scenarios where a relationship is specified with some kind of directionality/asymmetry that isn't inherent. I've been calling this "incidental order", and it's one of my biggest concerns. I've been inspired by Datalog, though I've come to the conclusion that mere relations (including as triples) aren't expressive/intuitive/"easy" enough as the sole information structure for general-purpose programming*. I think they're part of the answer, though. Do you believe otherwise?

* I probably wasn't explicit that I want a model for information that can be used ubiquitously throughout a computation, not just as a "final output".

Nick Smith

07/23/2020, 10:44 AM

And thank you for the links! I'm going to chase them up later.

Jack Rusher

07/23/2020, 11:08 AM

@Duncan Cragg Careful! 🙂 The age old practice of modeling human mental processes with first order logics has not gone very well. Even something as seemingly straightforward as object identity is quite fraught in practice: https://en.wikipedia.org/wiki/Ship_of_Theseus

Nick Smith

07/23/2020, 11:09 AM

@Duncan Cragg

where I am right now in my journey, that is basically: identity of things, properties of things, relations between things.

How do you distinguish between properties and relations? Is the fact that I'm "holding my tea cup" a property of myself, or am I just in a relationship with my tea cup?

At a more abstract level, you have symbols of value, order of those symbols and their aggregations, bags/sets of symbols.

What is a "symbol of value"? I don't quite understand that term. And you seem to recognise both sets and bags? I've always found those to be in contention with one another, and so I've (for the meantime) concluded that if a user is in a scenario where they want a bag (which should be rare), they can simulate it with a set quite easily. The converse is not possible, though.
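
Simulating a bag with a set, as described above, can be sketched by pairing each element with an occurrence number so that duplicates become distinct members. `bag_to_set` and `set_to_bag` are hypothetical helper names.

```python
from collections import Counter


def bag_to_set(bag):
    # Distinguish duplicates by pairing each element with its occurrence number.
    counts = Counter(bag)
    return {(x, k) for x, n in counts.items() for k in range(n)}


def set_to_bag(s):
    # Forget the occurrence numbers to recover the multiplicities.
    return Counter(x for x, _ in s)


fruit = ["apple", "pear", "apple"]
as_set = bag_to_set(fruit)
print(sorted(as_set))  # [('apple', 0), ('apple', 1), ('pear', 0)]
```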

So, as you can probably deduce, I've settled on something that looks like objects that have unique identity, which are bags of properties

I've currently settled on two notions that together can act like a conventional object. The first is something I'm tentatively calling a picture, which is a set with an exterior label (I call it the topic) that contains symbols and/or further pictures (these are called subtopics). These can be used to form the "descriptions" I was hinting at earlier; they are immutable value types. If you want a picture to be mutable (and thus behave like an object), you can stick it in a workspace, which is the unit of mutability. Workspaces are identified (and thus referenced, and queried) by the aforementioned symbols. A workspace's state evolves by a rulebook (a set of rules), which updates a workspace's picture in response to events. I plan to base the semantics of rules on logic programming. The end result will be something like "nested Datalog", though there is no literature that matches the model I described above. Picture topics replace the need to have a special notion of a statically-specified "attribute", because the topic can itself be any value. This should hopefully make meta-programming possible, but at the very least, I hope it leads to a very minimalist (and intuitive) semantics. All of the above is subject to change, of course! Given my current track record, I wouldn't be surprised if I find some major flaw in this kind of architecture.
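
A rough, hypothetical reading of pictures and workspaces in Python, to make the description easier to picture: a `Picture` is an immutable labelled set (topic plus contents), and a `Workspace` replaces its picture in response to events. Nick plans rule semantics based on logic programming; the plain functions used as rules here are a simplifying stand-in, and every name (`Picture`, `Workspace`, `record_tea`) is an assumption, not his design.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Union


@dataclass(frozen=True)
class Picture:
    # An immutable labelled set: a topic plus symbols and/or sub-pictures.
    topic: str
    contents: FrozenSet[Union[str, "Picture"]] = frozenset()


# Stand-in for a rule: maps (current picture, event) to a new picture.
Rule = Callable[[Picture, str], Picture]


class Workspace:
    """The unit of mutability: holds the current picture and a rulebook."""

    def __init__(self, name: str, picture: Picture, rulebook: List[Rule]):
        self.name, self.picture, self.rulebook = name, picture, rulebook

    def handle(self, event: str) -> None:
        # Pictures themselves are never mutated; the workspace swaps in
        # whatever new picture each rule produces.
        for rule in self.rulebook:
            self.picture = rule(self.picture, event)


def record_tea(p: Picture, event: str) -> Picture:
    if event == "pick up tea":
        return Picture(p.topic, p.contents | {Picture("holding", frozenset({"tea cup"}))})
    return p


me = Workspace("nick", Picture("person", frozenset({"awake"})), [record_tea])
before = me.picture
me.handle("pick up tea")
```

Because `before` is an immutable description, it still reads as the old snapshot after the event, while `me.picture` refers to the workspace's current state.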

Jack Rusher

07/23/2020, 11:21 AM

@Nick Smith In terms of storage, the natural thing to do in a log-structured environment is to have a "cleaner" that purges no longer referenced old versions subject to some policy. In the paper I linked, we built the system to purge whatever was no longer linked. In another system one might have a policy of "versions older than a year to which no one holds a dangling pointer". In either case, I would strongly encourage you not to consider practical matters of performance or storage efficiency at this stage. Go a bit mad, follow untrodden paths, find weird things! As for expressivity/ease of triples, one would expect a system that can construct other representations on top of triples for ease of use. What that looks like in your domain is a very open question. 🙂
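
A policy-driven cleaner like the one described can be sketched as a filter over versions: keep the current head, anything still referenced, and anything younger than a cutoff. The `Version` record, the `pinned` set standing in for "someone holds a pointer to it", and the one-year `max_age` are illustrative assumptions, not the system from the linked paper.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Version:
    vid: int
    created: float  # seconds since epoch
    data: object


def clean(versions: Dict[int, Version], head: int, pinned: Set[int],
          now: float, max_age: float) -> Dict[int, Version]:
    """Hypothetical cleaner policy: keep the head, anything still referenced
    (pinned), and anything younger than max_age; purge the rest."""
    return {
        vid: v for vid, v in versions.items()
        if vid == head or vid in pinned or now - v.created < max_age
    }


YEAR = 365 * 24 * 3600
# Five versions, one every half year; version 4 is the current head.
versions = {i: Version(i, created=i * YEAR / 2, data=f"v{i}") for i in range(5)}
kept = clean(versions, head=4, pinned={1}, now=4 * YEAR / 2, max_age=YEAR)
print(sorted(kept))  # [1, 3, 4]
```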

Nick Smith

07/23/2020, 11:25 AM

@Jack Rusher I'm definitely good at going mad! It's my perpetual state of being 🙂. Thank you. I'm going to keep pondering on all of this!

Ian Rumac

07/23/2020, 11:28 AM

@Jack Rusher which paper are you talking about exactly? I was on the verge of implementing a cleaner, an “interdimensional black hole”, so to speak, that would purge empty references in the background, but couldn't decide on when to run it.

Nick Smith

07/23/2020, 11:36 AM

@Duncan Cragg

One issue that really bugs me is that I can't extract a single representation of order!

If you make any breakthroughs on this, let me know. I'm still trying to figure out how to represent sequences (lists, priority queues) and trees (e.g. family trees, or any other domain-relevant tree structure) in a non-hierarchical manner, such that users can perform arbitrary traversals (in any direction) and queries. My big constraint here is that I want to ensure traversals will always terminate, so I need there to be some measure of progress associated with the traversal rules. Another challenge is multi-dimensional ordered structures, for example, a spreadsheet! How can one describe a traversal across a spreadsheet? Every cell in a spreadsheet is ordered in two dimensions! My current line of investigation is to determine whether all kinds of ordering can be exposed as numberings, and sequence manipulation is manipulation of those numberings, i.e. a list might look like

{(c,1), (a,2), (t,3)}

and a priority queue might look like

{(c, -12), (a, 7), (t, 34)}

I'm thinking of going deep and figuring out whether lists and trees can be treated as spatial data sets and manipulated geometrically.
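
The "ordering as numbering" idea can be sketched with sets of (element, position) pairs: order is recovered by sorting on the position, and inserting between two neighbours picks a new number instead of shifting everyone else. Exact fractional midpoints (often called fractional indexing) are one common way to do that and are an assumption here, not something proposed in the thread; the sketch also assumes positions are distinct and that a lower number means higher priority.

```python
from fractions import Fraction


def in_order(numbering):
    # A sequence is just a set of (element, position) pairs; order is
    # recovered by sorting on the position, never stored structurally.
    return [x for x, _ in sorted(numbering, key=lambda pair: pair[1])]


def insert_between(numbering, x, before, after):
    # Insert without renumbering anyone else: pick a position halfway
    # between two neighbours (exact rationals avoid running out of room).
    return numbering | {(x, Fraction(before + after, 2))}


word = {("c", 1), ("a", 2), ("t", 3)}
queue = {("c", -12), ("a", 7), ("t", 34)}
print(in_order(word))                            # ['c', 'a', 't']
print(in_order(queue)[0])                        # c (lowest number first)
print(in_order(insert_between(word, "r", 2, 3)))  # ['c', 'a', 'r', 't']
```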

Duncan Cragg

07/23/2020, 11:43 AM

How do you distinguish between properties and relations? Is the fact that I'm "holding my tea cup" a property of myself, or am I just in a relationship with my tea cup?

It's a property of you (the link to the cup) and a property of the cup (back-linking to you), since both of you care about the holding! Other links may be one-way: the puddle has a link to the sun that's warming and evaporating it, but the sun couldn't care less about the puddle.

Nick Smith

07/23/2020, 11:45 AM

That seems like a bit of an arbitrary distinction though, right? How do you formalize who "cares" about a relationship? How does it affect how a user models their application?

Duncan Cragg

07/23/2020, 11:46 AM

What is a "symbol of value"? I don't quite understand that term. And you seem to recognise both sets and bags? I've always found those to be in contention with one another, and so I've (for the meantime) concluded that if a user is in a scenario where they want a bag (which should be rare), they can simulate it with a set quite easily. The converse is not possible, though.

Symbols of value: "3.2", "up", "red". Set/bag: just a recognition that ordered lists sometimes don't need order, but still have to be laid out somehow in space, which means the order is incidental. I don't think this is a problem, though.

Duncan Cragg

07/23/2020, 11:46 AM

"who cares": it's up to you as the modeller

Duncan Cragg

07/23/2020, 11:48 AM

This thread is piling up and I need lunch.. 😄

Duncan Cragg

07/23/2020, 11:49 AM

What are your measures of success? To be programmable by techies or normals?

Duncan Cragg

07/23/2020, 11:51 AM

@Jack Rusher I take the approach of aggressively GCing old states/versions: if you want to model history, you do that explicitly yourself

Duncan Cragg

07/23/2020, 11:52 AM

On order (e.g. spreadsheets): that's a whole thread in itself

Duncan Cragg

07/23/2020, 11:52 AM

I'm getting overloaded. Time for lunch

Jack Rusher

07/23/2020, 12:18 PM

@Ian Rumac This paper describes one such implementation. We used multiple threads of execution. One of the benefits of the overall architecture is that multiple readers can continue running without any locking while writes are streaming in.

Duncan Cragg

07/23/2020, 12:34 PM

So ... fuelled up now.. 😄 Spreadsheets: well, tables, well... again it depends on the semantics of the table: for example, a list of objects can form a table with the common property labels as either row or col headers, and all the property values in the grid. If you have a genuine case of 2D tabular data - e.g., an f(x,y), then the order you select (x then y or y then x) only matters for optimisation

Duncan Cragg

07/23/2020, 12:36 PM

funny, can't think of an example of f(x,y) as a table, where col headers and row headers are either both symbolic or both ordinal .. can you?

Duncan Cragg

07/23/2020, 12:39 PM

(a 2D lookup table where row and col symbols are values not "property names")

Duncan Cragg

07/23/2020, 12:43 PM

If you make any breakthroughs on this, let me know. I'm still trying to figure out how to represent sequences (lists, priority queues) and trees (e.g. family trees, or any other domain-relevant tree structure) in a non-hierarchical manner, such that users can perform arbitrary traversals (in any direction) and queries. My big constraint here is that I want to ensure traversals will always terminate, so I need there to be some measure of progress associated with the traversal rules.

why non-hierarchical? how does that help with traversal termination?

Duncan Cragg

07/23/2020, 12:45 PM

I've currently settled on two notions that together can act like a conventional object. The first is something I'm tentatively calling a picture, which is a set with an exterior label (I call it the topic) that contains symbols and/or further pictures (these are called subtopics). These can be used to form the "descriptions" I was hinting at earlier; they are immutable value types. If you want a picture to be mutable (and thus behave like an object), you can stick it in a workspace, which is the unit of mutability. Workspaces are identified (and thus referenced, and queried) by the aforementioned symbols. A workspace's state evolves by a rulebook (a set of rules), which updates a workspace's picture in response to events. I plan to base the semantics of rules on logic programming. The end result will be something like "nested Datalog", though there is no literature that matches the model I described above.

Picture topics replace the need to have a special notion of a statically-specified "attribute", because the topic can itself be any value. This should hopefully make meta-programming possible, but at the very least, I hope it leads to a very minimalist (and intuitive) semantics.

This is what prompted my question above: who are your target audience? Is this going to be intuitive to everyone, if it's non-techies you're after?

Duncan Cragg

07/23/2020, 12:46 PM

(iow maybe you could draw a diagram with examples! 😄 )

wtaysom

07/23/2020, 1:05 PM

@Nick Smith "Has anyone..." asked in the place where a good fraction of us have. 😉 After playing in this space for twenty years, I'm not a believer in an ideal model. What is ideal will depend on the task at hand. What is better is having systems that you can readily shift between.

wtaysom

07/23/2020, 1:08 PM

You know you're in a bad programming place when a conceptually straightforward shift proves difficult: you realize that instead of having one of these things, you want to be able to shift between three different kinds, which will now require six months of development work.

wtaysom

07/23/2020, 1:09 PM

Also today via Hacker News, I was pointed to Terry Tao talking about mathematical notation, which is of course a model for the underlying notions. https://mathoverflow.net/questions/366070/what-are-the-benefits-of-writing-vector-inner-products-as-langle-u-v-rangle/366118#366118

Nick Smith

07/23/2020, 1:12 PM

What are your measures of success? To be programmable by techies or normals?

Definitely "normal people". Unusual folks like us can focus on developing the infrastructure that implements humane programming systems. Of course there's still going to be concepts that need to be learned and practiced for a person to develop competence.

Is this going to be intuitive to everyone, if it's non-techies you're after?

Once learned from effective teaching materials, and with the aid of effective visualisations, yes, I hope it will be intuitive. I actually think I can map it to everyday representations like nested dot points and tables. But first I have to figure out the semantics I want.

Nick Smith

07/23/2020, 1:17 PM

@Duncan Cragg

funny, can't think of an example of f(x,y) as a table

The data that people store in a spreadsheet is usually tabular, but the spreadsheet itself, as a graphical artifact, is most certainly a function f(x,y) from row/column number to datum. Navigating a spreadsheet using a keyboard requires locating successor/predecessor cells within each axis. So a spreadsheet is perhaps better termed a grid, which one could argue is the generalization of a sequence to 2D.
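
Treating a spreadsheet as a sparse function f(row, col), successor/predecessor navigation along one axis can be sketched as "find the nearest occupied cell in that direction along the same row or column". The dict-of-coordinates representation and the `next_cell` helper are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int]  # (row, column)


def next_cell(grid: Dict[Cell, object], at: Cell, axis: int,
              step: int = 1) -> Optional[Cell]:
    """Nearest occupied cell from `at` along one axis
    (axis 0 = down the rows, axis 1 = across the columns; step=-1 goes back)."""
    candidates = [
        c for c in grid
        # same position on the other axis, strictly ahead in the chosen direction
        if c[1 - axis] == at[1 - axis] and (c[axis] - at[axis]) * step > 0
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: abs(c[axis] - at[axis]))


sheet = {(0, 0): "Name", (0, 1): "Qty", (1, 0): "tea", (1, 1): 3, (3, 0): "cups"}
print(next_cell(sheet, (1, 0), axis=0))           # (3, 0): skips the empty row 2
print(next_cell(sheet, (1, 1), axis=1, step=-1))  # (1, 0)
```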

Nick Smith

07/23/2020, 1:20 PM

why non-hierarchical? how does that help with traversal termination?

Those are orthogonal, but the reason for eschewing hierarchy is to avoid a situation where a user specifies a data structure, and later realises that they can't express the query they want because the data is "ordered" in the wrong way. Then they change the structure or add a new one. Hierarchical data structuring always leads to this pain.

Orion Reed

07/23/2020, 1:39 PM

Been thinking about this problem for the last year or so, and I’ve ended up somewhere I didn’t intend, so this may not be an answer you’ll like. I don’t think there is any single representation that has all the desirable properties you’re looking for. Not only do I think that’s okay, I think we can do even better by moving up a level and instead looking for a representation that can itself represent the more desirable representations. If we take a hypergraph or metagraph, we can use it to represent sets, lists, trees, acyclic graphs, etc. We can thus embed structures with desirable properties, provided we accept a base representation that has few, if any, of the properties we would like.

Duncan Cragg

07/23/2020, 1:50 PM

We're gonna need some examples to chew on. 😄

Orion Reed

07/23/2020, 2:10 PM

I’ll pull out some references from my research library when I’m home, but in the meantime I’ll put a couple of things forward. There is a partial order to the generalisation of structures such that hypergraphs [1] are at the top, a subset of hypergraphs form a directed graph, a subset of directed graphs form an undirected graph, and this pattern follows down to trees, sets, lists, and so on. There are many possible semantics for these structures and I’ve yet to see much work here, as most of it is towards proving certain properties of these structures. Category theory provides useful properties but certainly doesn’t translate directly to nice semantics for an actual end-user system.

1. This is not quite true, as metagraphs are more general, though not yet well understood or defined. And also typed graphs are more general than untyped graphs.

(Sorry for all the Greek!)

Orion Reed

07/23/2020, 2:14 PM

I’d add that I’m not proposing that this is the best way to interact with a system, and there are places where natural representations do not always fit in a machine-oriented way, such as functions, n-dimensional arrays, etc. But in these places there are natural representations that can be used in part to interact with them. I.e. representations of grammars for strings (not the data type, just ordered sets of symbols) along with logics, type systems, etc given an appropriate semantics and grammar.

Nick Smith

07/23/2020, 3:52 PM

@Orion Reed Idk, I'm not really being sold on the hypergraph thing yet. I've never seen an information model based on manipulating graphs that feels "good" and "intuitive" for general-purpose applications, and if that's not your point, then I'm not sure what value you're proposing they have. Mathematical formalization takes a backseat to user experience in my world. Also, I can actually model hypergraphs quite easily in the information model I briefly described above (labelled nested sets). I'm not sure why I'd want to, though.

ibdknox

07/23/2020, 4:24 PM

@Nick Smith he's saying that if you use hypergraphs as your canonical representation, all other structures are reduced to views and mutation rules over that hypergraph. This gets you the property that @wtaysom was talking about that it is trivial to go from one representation to another. It also gives you the flexibility to create "good" and "intuitive" interfaces on top without changing the underlying model.
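
A small sketch of that idea: store everything as labelled hyperedges (edges over any number of nodes) and derive sets and lists as views over them. The edge labels `member` and `next`, the sample data, and both view functions are hypothetical. `list_view` assumes the `next` edges form a single acyclic chain; a cycle would make it loop forever, which is the kind of termination concern Nick raises earlier in the thread.

```python
from typing import List, Set, Tuple

# Each hyperedge is a label plus an ordered tuple of any number of nodes.
Edge = Tuple[str, Tuple[str, ...]]

graph: Set[Edge] = {
    ("member", ("pets", "rex")),
    ("member", ("pets", "tom")),
    ("next", ("todo", "buy milk", "walk rex")),
    ("next", ("todo", "walk rex", "call mum")),
    ("holds", ("alice", "cup", "left hand")),  # a genuine 3-ary relation
}


def set_view(g: Set[Edge], name: str) -> Set[str]:
    # A set is a view: everything related to `name` by a "member" edge.
    return {n[1] for label, n in g if label == "member" and n[0] == name}


def list_view(g: Set[Edge], name: str) -> List[str]:
    # A list is a view: follow "next" edges from the element with no predecessor.
    succ = {n[1]: n[2] for label, n in g if label == "next" and n[0] == name}
    firsts = set(succ) - set(succ.values())
    out: List[str] = []
    cur = next(iter(firsts), None)
    while cur is not None:
        out.append(cur)
        cur = succ.get(cur)
    return out


print(set_view(graph, "pets"))   # {'rex', 'tom'} (set order may vary)
print(list_view(graph, "todo"))  # ['buy milk', 'walk rex', 'call mum']
```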

Orion Reed

07/23/2020, 4:26 PM

@Nick Smith I agree, mathematical formalisations should take a backseat to user experience a lot of the time, and I also agree that manipulating graphs doesn’t feel good for general-purpose applications, at least not yet. My point (which by all means you can ignore!) is that it may be useful to build the information model at a slightly higher level than your planned use case, which may help de-emphasise finding a perfect representation that you may wish to tweak later. I think there’s lots of merit in sticking to your intuition and discovering the emergent properties of your system [1]. I bring up these points mostly as exploratory, and to help support the idea of separating the model from the internal workings of computers or current computer systems.

1. Many great ideas happened this way. Unison, for example, discovered many great consequences only after deciding on their basic principles of content-addressed immutable code.

Orion Reed

07/23/2020, 4:29 PM

@ibdknox yes that’s exactly it, well put

ibdknox

07/23/2020, 4:45 PM

if your goal is to go after end users and also be truly general purpose, I think you'll likely end up on a hypergraph. People do not naturally think in strict structures, they think in very loose, just in time structure. As several people have said in here, that means you need to flow seamlessly between representations and allow for partial/incomplete ones to still be useful and to link to completely different views that color things in a bit more.

Andrew F

07/23/2020, 4:46 PM

This is what I've been thinking about for the last several years, and haven't gotten all that far. One of my goals is to provide an intermediate format for converting/proxying between systems speaking different languages (immediately commercially useful and helps overcome network effects). I think my most interesting idea at the moment is that data is a program (under a suitably restricted interpreter), that outputs itself, or parts of itself identified by an argument (e.g. treating a hash map as a function from keys to values). This also helps with one of my long-standing goals to ease mixing of manually and procedurally generated data.
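
A minimal sketch of "data is a program that outputs itself, or parts of itself identified by an argument": a nested structure is wrapped so that calling it with a key path returns that part, and values may themselves be functions, so hand-written and generated data can mix. `as_function` and the sample data are hypothetical, and plain Python lookup is not a restricted interpreter in any meaningful sense; it only illustrates the shape of the idea.

```python
from typing import Any, Callable


def as_function(data: Any) -> Callable[..., Any]:
    """Called with no argument, the data outputs itself; called with a key
    path, it outputs that part of itself (hash map as key -> value function)."""
    def run(*path: Any) -> Any:
        out = data
        for key in path:
            # Generated (procedural) parts are functions; hand-written parts
            # are ordinary containers indexed by key.
            out = out(key) if callable(out) else out[key]
        return out
    return run


garage = as_function({
    "bob": {"car": {"wheels": 4}},  # manually written data
    "squares": lambda n: n * n,     # procedurally generated data
})
print(garage("bob", "car", "wheels"))  # 4
print(garage("squares", 12))           # 144
```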

Andrew F

07/23/2020, 4:51 PM

I thought about hypergraphs for a while, albeit when I was a lot younger and stupider, and my conclusion was that I'd rather go with a fundamental structure that makes it easy to implement hypergraphs. To me that looks more like a logic database or triple (tuple?) store.

Orion Reed

07/23/2020, 4:53 PM

@Andrew F that first goal of yours is one I’ve been pursuing for a while for the exact same reasons! Have you made progress you’d be willing to share?

Andrew F

07/23/2020, 5:10 PM

@Orion Reed haha, not really. I've spent the last several years building up ideas and tearing them down when I realize they're not general enough. I am confident at this point that the data store needs to be an active participant to support things like API translation proxies, and that solving the problem of Naming Things is important (tracking identities referenced in different ways in different representations). Ask me again in a year. :D Have you already read the Functorial Data Migration people's work?

Orion Reed

07/23/2020, 6:39 PM

@Andrew F I hadn’t but will check it out, it looks compelling. I’m assuming this is the work you’re referring to?

Andrew F

07/23/2020, 6:52 PM

@Orion Reed yep. I think they've built a company on their ideas, too. IIRC they still had to do some weird hacks that made me think it needed more work to be nice to work with in the real world, but definitely interesting. I need to read them more closely at some point.

Ian Rumac

07/23/2020, 7:25 PM

Oh, thanks to you guys I now know that what I was doing is a hypergraph 😄 IIRC my hypergraph was basically a store of mostly maps of <ID, Value> and <Type/Dimension/GraphID, <ID, NodeData>>. Guess I was calling it wrong the whole time 🤔
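The two maps in that message can be read as a small layered store, sketched here with Python dicts. Assumptions: the message gives only the map signatures, so `LayeredStore`, `attach`, `graphs_containing`, and the shape of `NodeData` (a plain dict of per-layer attributes) are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical: NodeData is not defined in the message, so it is modelled
# here as a plain dict of per-layer attributes (e.g. outgoing links).
NodeData = Dict[str, Any]


class LayeredStore:
    """Sketch of a store of <ID, Value> plus <GraphID, <ID, NodeData>>."""

    def __init__(self) -> None:
        self.values: Dict[str, Any] = {}                  # <ID, Value>
        self.layers: Dict[str, Dict[str, NodeData]] = {}  # <GraphID, <ID, NodeData>>

    def put(self, node_id: str, value: Any) -> None:
        self.values[node_id] = value

    def attach(self, graph_id: str, node_id: str, data: NodeData) -> None:
        # The same ID may be attached to many layers, so a single value
        # can take part in several graphs at once.
        self.layers.setdefault(graph_id, {})[node_id] = data

    def graphs_containing(self, node_id: str) -> List[str]:
        return sorted(g for g, nodes in self.layers.items() if node_id in nodes)
```

Because one ID can sit in several layers at once, a single value participates in several graphs simultaneously, which is roughly what makes this structure hypergraph-like.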

Duncan Cragg

07/23/2020, 7:35 PM

I had to look up hypergraph, but I'm still confused about why it's not just the same modelling approach as .. JSON?

Duncan Cragg

07/23/2020, 7:35 PM

Or really just relations.

Orion Reed

07/23/2020, 7:56 PM

@Duncan Cragg Relational models impose some more strict structure over your data in the form of tables. In graphs and hypergraphs the vertices/nodes and edges/lines are first class citizens.

ibdknox

07/23/2020, 7:58 PM

relations in 6th normal form are a good practical representation for hypergraphs

shalabh

07/24/2020, 12:38 AM

Amazing thread. Been slow churning on this for years with nothing to show for it. Anyway, recently found this paper which may be of interest - tries to define a 'conceptual graph' that represents info of how the user sees it, without the storage representation details: https://pdfs.semanticscholar.org/2ae6/ac8fc13710d9c086c0e5cb952eef52c9b3cd.pdf

Pezo - Zoltan Peto

07/24/2020, 12:58 AM

@Nick Smith I’ve been thinking about similar things for a while. What I see now is that (obviously?) graphs are the only truly extendable, fractal-like structures. Fractal-like means that the minimum we can do with the “Graph Information” on the “simplest level” is also the maximum we can do with it on “higher levels”. With that in mind I feel the direction is to build “Views” on top of the “Graph Information”, where the Views themselves are Graph-based entities. We can also say these Views would act like filters that reduce the noise of “all information” and deliver the proper context. So my answer is: just simple, pure Graphs. But a new question arises: how do we build Views on them, even on multiple levels? I just don’t see ANY other idea that can’t be translated to that approach. It seems to me it’s the alpha and omega. It’s no accident that RDF’s node-edge-node triples and similar structures emerged, and that ontology is full of them.
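The idea of Views that are themselves graphs, acting as filters over the underlying graph, can be sketched as a function from graph to graph. Assumptions: this sketch only filters by a node predicate, and the `Graph` encoding and `view` function are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

# A graph as (nodes, directed edges); a hypothetical minimal encoding.
Graph = Tuple[FrozenSet[str], FrozenSet[Tuple[str, str]]]


def view(graph: Graph, keep: Callable[[str], bool]) -> Graph:
    """Filter a graph down to the nodes that `keep` accepts.

    The result is itself a Graph, so views can be layered on views.
    """
    nodes, edges = graph
    kept = frozenset(n for n in nodes if keep(n))
    # Keep only edges whose two endpoints both survive the filter.
    kept_edges = frozenset((a, b) for (a, b) in edges if a in kept and b in kept)
    return kept, kept_edges
```

Since the output type equals the input type, `view(view(g, p), q)` is a view on a view, which mirrors the "multiple levels" question in the message.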

Garth Goldwater

07/24/2020, 12:58 AM

this thread is amazing. i’ve been working on this since i started hoarding popular science magazines in middle school and wanted a way to organize them lmao. lately i’ve become convinced that whatever structure you settle on is going to have to be expected to be in an “incomplete” state, and that there’s some kind of dual relationship between data and computation that needs to be leveraged (thinking about issues like caches and garbage collection as a UX feature rather than an infrastructure bug). i’m working on a really stupid version of this stuff in json and will be posting in #CCL5VVBAN and #C0120A3L30R, and i would be delighted if any of the people in this thread ruthlessly criticize where i end up!

wtaysom

07/24/2020, 1:29 AM

When @ibdknox says "6th normal form," it more or less means "all hash tables" with array keys as needed, and it does a great job of capturing dependencies of the form "for every _ and _, you can have a _." Correct me if I'm way off base.

ibdknox

07/24/2020, 1:42 AM

Yep! You can also think of it as triples where the entity can be a composite key. This gets you to true atomic units of data, reducing any further would cause information to be lost. The important thing that this captures is properties on hyperedges - e.g. something like marriage where the key would be two people and maybe the value would be the date or number of people in attendance.
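The 6NF idea described here (one table per attribute, keys that may be composite, properties attached to hyperedges such as a marriage) can be sketched with one Python dict per attribute. Assumptions: a couple is modelled as a `frozenset` so the order of the two people doesn't matter; the table names and the `as_triples` helper are hypothetical, and this is not how any particular database stores 6NF data.

```python
import datetime
from typing import Any, Dict, FrozenSet, List, Tuple

# One table per attribute, mapping a (possibly composite) key to one value.
# Each entry is an atomic fact: "for every <key>, you can have a <value>".
Person = str
Couple = FrozenSet[Person]  # unordered pair, so {ann, bob} == {bob, ann}

marriage_date: Dict[Couple, datetime.date] = {}
marriage_attendance: Dict[Couple, int] = {}


def as_triples(tables: Dict[str, Dict[Any, Any]]) -> List[Tuple[Any, str, Any]]:
    """Flatten attribute tables into (key, attribute, value) triples."""
    return [
        (key, attribute, value)
        for attribute, table in tables.items()
        for key, value in table.items()
    ]
```

Each table holds exactly one fact per key, so nothing can be split further without losing information, and flattening the tables gives the "triples with composite entities" reading.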

Nick Smith

07/24/2020, 1:48 AM

I think my "nested labelled sets" conception is just a particular manifestation of a 6NF DB, with the only difference being it has a notion of scope and modularity that allows you to easily shuffle chunks of data around. @ibdknox this is like your scope-limiting ~~bag~~ "database" notion in Eve (I re-checked the name), except you can nest them inside each other.

ibdknox

07/24/2020, 1:52 AM

yeah, we ended up calling them databases. We regretted adding them and removed them right after 0.2. The problem was that it became very confusing to figure out where things were and should be. Namespaced tags/attributes ended up working much better for us.

Nick Smith

07/24/2020, 1:54 AM

You have to have a boundary at some point though right? If your PL supports distributed apps you don't want to accidentally query some fact from a server in some Japanese village somewhere. At what point do you hit a "bucket" of data?

ibdknox

07/24/2020, 1:56 AM

you would bring those in as namespaced tags, so #foo vs #japanese-village/foo

Nick Smith

07/24/2020, 1:57 AM

And every query must draw facts from a specific namespace?

ibdknox

07/24/2020, 1:58 AM

you can arbitrarily union them together within a query, you can also bind them to some common tag and just query that tag
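The two options described here, unioning namespaced tags inside a query or binding them to a common tag, can be illustrated outside Eve. This Python sketch does not reproduce Eve's syntax or semantics; `Record`, `search`, `bind`, and the tag `any-foo` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List


@dataclass(frozen=True)
class Record:
    name: str
    tags: FrozenSet[str]  # a namespace is just a prefix like "japanese-village/"


records: List[Record] = [
    Record("local fact", frozenset({"foo"})),
    Record("remote fact", frozenset({"japanese-village/foo"})),
    Record("unrelated", frozenset({"bar"})),
]


def search(rs: List[Record], *tags: str) -> List[str]:
    """Union query: names of records carrying any of the given tags."""
    wanted = set(tags)
    return sorted(r.name for r in rs if wanted & r.tags)


def bind(rs: List[Record], source: str, common: str) -> List[Record]:
    """Return new records where everything tagged `source` also gets `common`."""
    return [replace(r, tags=r.tags | {common}) if source in r.tags else r
            for r in rs]
```

`search(records, "foo", "japanese-village/foo")` unions both namespaces in one query; binding both tags to `any-foo` lets a later query ask for `any-foo` alone.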

Nick Smith

07/24/2020, 2:00 AM

Isn't that the same as an Eve 0.2 "database" if you add the ability to make the database a variable? That would emulate the latter capability you mentioned anyway.

Nick Smith

07/24/2020, 2:06 AM

I guess I'm trying to understand this problem you mentioned:

"it became very confusing to figure out where things were and should be"

I don't quite understand how this could be a serious problem. I'm no fan of hierarchical organisation in file systems, but a little bit of hierarchy is good for modularity, which you really need in any distributed system.

ibdknox

07/24/2020, 2:06 AM

We should probably split this off so as to not derail the conversation further. The difference is largely in how people approach the two mechanisms. One is just a more specific name, the other is an actual place. We, ourselves, pretty consistently made mistakes about which place to query and write into, but we didn't experience the same problems with more specific names. Part of the difference is also how you selected a database in Eve (declaring it at the search/bind level) which made unions pretty awkward. That one's not at all fundamental and you could've chosen something else, but I suspect it would look a lot like namespaced tags if you did.

ibdknox

07/24/2020, 2:08 AM

It's easy to forget the database name, or forget to change it when you copy a block over, for example. And the bugs that result from that are pretty hard to understand without some nicer tooling.

Nick Smith

07/24/2020, 2:46 AM

I think "namespaced tags" and buckets/databases only diverge in meaning in the presence of mutation (changing values). I'll discuss that in this thread (once I think it over further): https://futureofcoding.slack.com/archives/C5T9GPWFL/p1595571049299600

Duncan Cragg

07/24/2020, 11:10 AM

Yes, @Nick Smith, that's a good point: you can't edit db views. You can't edit the value in a formula cell. So a bucket that can evolve may look the same as a query result, but the latter only evolves because one of the former does. Same with tagged data: the "bucket" of items with that tag depends on items being tagged, and can't be edited or evolve independent of that. Whereas a directory of items, a bucket, can.

Duncan Cragg

07/24/2020, 11:16 AM

From an end user perspective, the question is whether the item or a collection of items is the primary focus. In some modelling tasks, it's important to model a first class bucket of items, e.g. the actual people standing in an actual room, rather than having a big swimming pool of people and picking out those who are in the room to be tagged accordingly.
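The distinction between a bucket that can evolve on its own and a derived collection (a db view, a formula cell, a set of tagged items) can be sketched as two Python classes, using the people-in-a-room scenario. Assumptions: `Item`, `Bucket`, and `TagView` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass
class Item:
    name: str
    tags: Set[str] = field(default_factory=set)


class Bucket:
    """First-class collection: membership is stored, so it can be edited."""

    def __init__(self) -> None:
        self._members: Set[str] = set()

    def add(self, name: str) -> None:
        self._members.add(name)

    def remove(self, name: str) -> None:
        self._members.discard(name)

    @property
    def members(self) -> FrozenSet[str]:
        return frozenset(self._members)


class TagView:
    """Derived collection, like a query result or a formula cell.

    Membership is recomputed from the items on every read, so there is no
    add/remove: it only changes when some item's tags change.
    """

    def __init__(self, items: List[Item], tag: str) -> None:
        self._items = items
        self._tag = tag

    @property
    def members(self) -> FrozenSet[str]:
        return frozenset(i.name for i in self._items if self._tag in i.tags)
```

Both expose `members`, so they can look the same to a reader, but only `Bucket` can be edited directly; `TagView` evolves only when the tagged items do.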

Pezo - Zoltan Peto

07/24/2020, 11:46 AM

@Nick Smith Also don’t forget you can replace data on edges easily by introducing a new, intermediate node.

(N1)--[E]--(N2) might become (N1)--(E)--(N2), and then you have a bipartite graph to work with.
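The rewrite of a data-carrying edge into an intermediate node can be sketched as a small transformation. Assumptions: `reify` and the generated `e<i>` node IDs are hypothetical, and original node names are assumed not to collide with those IDs.

```python
from typing import Dict, List, Tuple

# A labelled edge: (source node, target node, data carried by the edge).
Edge = Tuple[str, str, Dict[str, object]]


def reify(edges: List[Edge]) -> Tuple[Dict[str, Dict[str, object]],
                                      List[Tuple[str, str]]]:
    """Turn each (N1)--[E]--(N2) into (N1)--(E)--(N2).

    Every edge becomes an intermediate node holding the edge's data, so the
    result is bipartite: each link joins an original node to an edge node.
    Assumes original node names never collide with the generated IDs.
    """
    edge_nodes: Dict[str, Dict[str, object]] = {}
    links: List[Tuple[str, str]] = []
    for i, (source, target, data) in enumerate(edges):
        edge_id = f"e{i}"  # hypothetical naming scheme for the new nodes
        edge_nodes[edge_id] = dict(data)
        links.append((source, edge_id))
        links.append((edge_id, target))
    return edge_nodes, links
```

After the rewrite, the edges themselves carry no data; everything that used to live on an edge now lives on an ordinary node.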

Nick Smith

07/24/2020, 12:10 PM

@Pezo - Zoltan Peto Tbh I'm not really keen on thinking about graphs as a foundational concept. Graphs are a solution looking for a problem. You can use a graph to model a set of relationships, but a graph is not itself a set of relationships. I think we should talk about "entities" and "pointers" between them as a potential use case for graphs rather than talking about graphs first and figuring out their applicability thereafter. Otherwise our thought patterns will never escape from our preconceptions of what a graph is and can be.

Orion Reed

07/24/2020, 12:51 PM

@Nick Smith I agree that most of us should not go from graphs -> use cases, and I want to add: “Not every problem is a graph problem”, which is evidently true. There’s also the useful truth that almost any imaginable problem can be translated into some sort of graph (space efficiency and practical implementation aside).

Orion Reed

07/24/2020, 12:59 PM

I often think in sets, partial orders, trees connected to lists inside of graphs next to some other structure, etcetera. It’s certainly a messy business and we’d miss a lot of perfectly sensible thinking if we forced people to first translate their thoughts into graphs. But I guess that’s what the computer is there for 😉

Garth Goldwater

07/24/2020, 3:13 PM

@ibdknox @wtaysom does that mean that clojure’s EDN is a natural fit for 6NF or am i missing something?

wtaysom

07/25/2020, 7:30 AM

I cannot read your mind @Garth Goldwater. Are you talking about https://github.com/edn-format/edn? What kind of connection are you imagining?

Garth Goldwater

07/26/2020, 12:37 AM

yeah, that’s the one. i just mean that it has support for array keys, for example. is having sets point to other sets the minimum requirement for, e.g., serializing a hypergraph, or are there other capabilities a data format would need for comparably powerful semantics?

Duncan Cragg

07/31/2020, 7:30 PM

Did this thread reach anyone's conclusion? 😊

Nick Smith

08/01/2020, 6:03 AM

It gave me stuff to think about at least!
