In something like prolog, terms can be nested. So ...
# thinking-together
j
In something like prolog, terms can be nested. So I can express the idea "Socrates believes that he is mortal" with
bel ieves(socrates, mortal(socrates)).
Are there any popular database types that make it easy to have relations of arbitrary arity as parameters of other relations, without unduly adding to the complexity of the schema? Preferably with ungrounded statements and open-world negation? Is there some obvious reason why not? Is there a computational complexity problem that arises in the real world? RDF allows triples to be referenced, I believe, but you are limited to arity 2, which seems needlessly limiting. Labeled graphs have arbitrary arity for non-entities, but entities are limited to two, and you usually can't refer to an edge. It seems... weird to me. Is it just that we don't really have the efficient reasoners over those kinds of expressions, so it hasn't been useful?
❤️ 1
n
Sounds like a "nested hypergraph" (which I have never used before): https://arxiv.org/abs/2405.12235
Might want to check out https://hypergraphdb.org/ > The unit of storage is a tuple made up of 0 or more other tuples.
This too seems similar: https://wiki.opencog.org/w/AtomSpace It hints at one of the complexities involved. Some atoms become executable. For example: Queries themselves are graphs. Atomese language is Turing-complete.
j
I've looked at nested hypergraphs before, but atomspace is interesting, thanks!
c
Can't you do this just with a unique ID/foreign key...? BELIEVES is a relation that takes a tuple as the 2nd parameter. (SOCRATES, IS, MORTAL), 123 (SOCRATES, BELIEVES, 123), 124
d
RDF* seems to do it that way, putting names on subgraphs and referencing those.
j
@Chris, I'm not sure how that representation would distinguish between asserting the fact itself, and merely asserting that the fact is believed. Statement 124 should be possible to express even if statement 124 is not known to be true.
c
Yeah you would have to separately distinguish which statements are TRUE. This would just be a particular relation though? It's no different from which statements are FUNNY or INTERESTING
(124, HAS PROPERTY, TRUE/FUNNY etc), 461
d
It is slightly different, because how do you know that 461 is true?
c
Yeah you're right in that if you want to do some kind of meta level inference there's a special kind of "true" which is USER thinks X is true
The system/UI would special case this
But you presumably want to preserve the ability to say that Socrates believes 1 = 2 is true etc though
d
yes, but I am not sure it has to look the same as "these are the things the system should believe to be true"
special casing the latter is probably OK
j
If I have to look to see if the statement exists, and then also check if it is listed as true, that's a lot of needless cognitive overhead for the human user. And as the nesting gets deeper, reading statements is going to get harder and harder.
g
The reason the relational model forbids this — the reason it is restricted to first order logic — is that this runs into the halting problem. Roughly, once relations can refer to other relations, they can form arbitrarily complex structures where queries never finish. The relational model is, almost mathematically so, the richest model you can have where queries always finish. Note that in the above suggestions where you refer to other relations by a code, you have to go into your external, Turing-complete language to use this. Not a reason a database should not have this feature, of course, but this might be a useful nuance to be aware of.
j
That helps explain it, thanks.
d
Every programming language runs into the halting problem, and yet, we don't care. Why do we care for a knowledge language?
g
SQL has poisoned the entire industry. So you’re used to thinking of a relational database as this heavyweight, fixed sort of thing with a terrible query language. The correct way to understand the relational model is that it lets you separate your first order logic (FOL) computation from the stuff that should be (either logically or pragmatically) done in a Turing complete language. The vast majority of the logic in just about any program can be written in FOL. Folks don’t even think about doing that because SQL is terrible. One of the great features of the relational model is that precisely because of its limits, optimisation is a relatively simple problem. We should be able to use relations freely across all of our programming tasks. The result would be that you can just declaratively specify what comes from what, and the query optimiser will make it work efficiently. These considerations still apply to SQL databases. If you execute a query, it will eventually finish and when it does, you’re guaranteed that if some data satisfies your query, it will be retrieved.
1
If we use a database that is Turing complete, you can write something that is logically correct, but the query engine might not be able to determine that. It is a great virtue that if something is true of your data, if you can express it in FOL, your database can find it.
j
I have difficulty gauging how practical a virtue the guarantee of termination really is. I can end up with non-terminating code elsewhere, and I just fix it. Is the benefit perhaps that someone cannot add data that makes a query stop terminating that used to terminate? Because not a lot of people are letting users run arbitrary untested queries against their data. So it feels like you would know in advance it will terminate unless the data can stop that from happening, maybe?
2
g
The relational model is, basically, “essence of declarative programming”. 2nd order logic, subject to halting problems, currently mostly needs to be written by humans. But the rest of it can be declarative — most of most programs, in fact. The part you can express in FOL benefits from this essence of declarative stuff insofar as you can just declare the rules of what facts follow from what other facts, and then the computer writes an efficient algorithm to implement the logic for you. The relational model provides a theory for generating code from declarations about what is true and what follows from what.
j
Don't get me wrong. I'm sold on declarative logic code. Love it, terminating issues and all. Big fan. I just am not sold on the FOL limitation, and I'm having difficulty believing that anyone without a comp sci degree really cares about guaranteed termination that much, unless it is guaranteed against changes in the data.
g
But that right there is one reason for the design. If you have a language in which someone can’t express a non-terminating query, they don’t have to understand any of that, but they know they have a completely reliable system that always answers their questions.
j
Regardless of the data, or just regardless of the query?
a
I’m curious too. If every query has a 60-second deadline, valid FOL and SOL queries will both sometimes fail. How much does it matter why? Are the performance problems harder to diagnose and fix with the more complicated languages? Are they more common?
I was playing with the idea of a query language the other day and realized that I was limiting myself to FOL. So I questioned the choice. I came to realize that apps will need additional flexibility in how they process data before it can be presented, and limiting the query language wasn’t going to make that requirement go away. All I was doing was forcing algorithms to be split across database and client, instead of allowing the entire algorithm to be done in one language. Worse, if the seam is in the middle of an algorithm, I might be forcing multiple network calls, with all the de/serialization that entails, to satisfy one semantic query. For what? So that I can say that every operation my database supports is fast? Wouldn’t I rather say that my database is easy to use?
I decided that I’d rather bring relational operations to a more complex language, and just isolate them enough that they can be plucked out by an optimizer.
g
Aha! You discovered the weakness in my cunning ruse! Not really. You can certainly write SQL queries that will time out. But in practice, those are rare, and if you had to solve the same problem in a Turing-complete language, there’s a good chance you have no other way to fix it. Computation complexity is always with us, either way. And in practice, there might be some things that are easier to write in a TC language. But I’d have to struggle with some quite unusual sort of data for a while to produce an example. So: a relational database can represent most of most programs more cleanly and simply than any mainstream TC language can, and it does a lot of the programming for you, particularly in terms of working out what order it should process each part of your logic in. And, because it can do that dynamically and automatically, it can retain efficiencies even as you change the model, whereas the typical TC solution might need substantial re-engineering to retain efficiency.
There is absolutely nothing wrong with a higher-order/TC query/logic language. Prolog is one such, is very useful for some types of problems, and is I believe easier for non-developers to learn than most TC languages. Mercury is another such, not so beginner friendly but very expressive and flexible.
Codd’s original conception was that the relational database engine would be used in combination with a TC language. Predicates for filters, for example, are not part of the relational theory but are just supplied from a TC language. So your idea for a language that contains both but separates them is not terrible. Although even with SQL, you can see the advantage of separating the relational side. One of the most underappreciated advantage of a separate relational storage engine is that you can use the same data store from multiple programs, written in different languages.
SQLite is interesting, in that actually leans a fair way into injecting the TC language. The features are not as well known as they should be, and depend on whether the language embedding supports it, but in SQLite, the host application/language can potentially define: • regular functions; • aggregate functions; and; • (what most folks don’t know) custom type deserialisers. Postgres has close similar features also.
j
The fact that all queries terminate isn't the actual benefit. As a result of that fact, the user can alter the data arbitrarily, and there is no possibility that by doing so they cause queries to become non-terminating. That's the actual upside. The user can't break it. If you are using a higher-order system, you have to either a) be careful not to use potentially infinite structures in your queries (in which case why not just use FPL), or b) guard against the user adding data that causes infinite loops. For my use case, b) was already taken care of in the context of the limited queries I have. Thanks for the enlightening chat!
g
You certainly need TC behaviour to be available. But separating what you can into the relational model has enormous benefits in terms of simplicity and flexibility.