Does anyone have thoughts about equality vs orderi...
# thinking-together
j
Does anyone have thoughts about equality vs ordering in maps/sets? I have some pondering here - https://www.scattered-thoughts.net/log/0048/#zest-ordering but the decision tree at the end is the main thing: • Order isn't observable at all. • Iteration order is either non-deterministic or expensive. • Determism can be manually recovered by storing both a map and a list of keys, but at the cost of storing two copies of each key. • Order is observable. • Order doesn't affect equality. ◦ Equality is not extensional ie
a == b
does not imply that
f(a) == f(b)
. ◦ If
[a: 0, b: 1] == [b: 1, a: 0]
then we must have
struct[a: i64, b: i64] == struct[b: i64, a: i64]
, but we still have to remember that the field order is different, which implies that type equality can't rely on interning and pointer comparison. • Order affects equality. ◦ Sets become surprising / less useful. ◦ If I want to add query planning, I can't promise that
f(db) == optimize-query(f)(db)
.
❤️ 1
Good job collapsing the tree slack 😄
Copy code
* Order isn't observable at all.
  * Iteration order is either non-deterministic or expensive.
  * Determism can be manually recovered by storing both a map and a list of keys, but at the cost of storing two copies of each key.
* Order is observable.
  * Order doesn't affect equality.
    * Equality is not extensional ie `a == b` does not imply that `f(a) == f(b)`.
    * If `[a: 0, b: 1] == [b: 1, a: 0]` then we must have `struct[a: i64, b: i64] == struct[b: i64, a: i64]`, but we still have to remember that the field order is different, which implies that type equality can't rely on interning and pointer comparison.
  * Order affects equality.
    * Sets become surprising / less useful.
    * If I want to add query planning, I can't promise that `f(db) == optimize-query(f)(db)`.
k
I don't particularly like order being observable by your description. Have you considered two iterators, akin to Lua's
pairs
and
ipairs
?
g
What about non-ordered but supporting composite keys where one of the components is the order?
❤️ 1
j
It’s so refreshing to read a discussion of design tradeoffs rather than being presented with a post hoc rationalization. Your discussions heavily weigh performance, so it would make sense to avoid ordering and allow non-determinism. I favor order for user-friendliness. Translating into your terminology, I provide sets as maps to a unit value that are auto-sorted by key value.
s
Do you see ordered objects as being mostly useful for users? Or mostly incidental and only occasionally useful? In general I prefer key/value bundles to be unordered. I can still add order by encoding another list on the side as you mention. It also seems easier to evolve the language from unordered to ordered iteration if needed, than to go in the other direction.
j
@shalabh if you’re asking me, I want all data to be user-visible, and that visualization must be realistic. All collection visualizations I know of are ordered. So collections should be ordered. Jamie raises a good point that ordering interferes with transparent query optimization. I refuse to let performance override usability, so I leave that as a problem for future work.
🤔 1
k
But aren't unordered collections more basic than ordered? We've all had a bag of marbles growing up? Arguably that's more the default state of the world we've evolved for. Even on the computer, I visualize unordered things all the time these days. The way I do it is I have a zone on screen and things I can move around continuously within it. Position has no semantics there.
j
Sets had to be invented. We are just so math-pilled that we no longer see them as an abstraction that has to be learned. That’s fine for us mathophiles but not so much for everyone else. Sets are also an abstraction in implementation: every practical implementation of sets has an order, because it needs to support iteration. You can ignore the order by squinting but it’s still there. Your bunch of things on the screen has an underlying order that you have chosen to ignore. So for both conceptual and implementation reasons I think ordered lists are more basic than sets.
1
🤯 1
g
Both are correct. Set has a more basic interface. List has a more basic implementation.
j
because it needs to support iteration
Well, one solution to the dilemma is just not supporting iteration on maps/sets. If you want to iterate you have to keep a list of keys separately.
Have you considered two iterators, akin to Lua's
pairs
and
ipairs
?
But the order of iteration is still observable, no? This does actually cause problems in practice. Eg to get reproducible builds a lot of compilers had to be refactored to never iterate over hashmaps. If equality doesn't take iteration order into account then it also makes memoization hard:
Copy code
>>> a = {'x': True, 'y': True}
>>> b = {'y': True, 'x': True}
>>> def f(x):
...     return ''.join(x.keys())
...
>>> a == b
True
>>> f(a) == f(b)
False
👍🏼 1
The options that appeal to me at the moment are: • Iteration is always in sorted order and we just pay the cost of trees instead of hashmaps. • Iteration is in insertion order and if you want to compare sets then you can sort them first.
ordering interferes with transparent query optimization. I refuse to let performance override usability
If you sort the keys that's actually less of a problem. You just have to sort at the end (and maybe before aggregations that you can't prove are commutative). But being able to observe insertion order means that you can't reorder loops into joins, which is a much bigger problem.
t
I think the most ergonomic is ordered Maps, thats where JS and python converged. If you want insanely fast then a data-structure for every occasion is where JVM and Rust converged to. I think from a pragmatic PoV having ordered datastrcutures will end up the least work overall coz you don't need to support less datastrcutures and your programs that require ordering will require less typing, but its at the ecosystem cost of a tiny bit of overhead. The consequence on ordering for join optimization is interesting. I kind of think contiguous memory access is super important for optimization and therefore forcing orders on everything might actually might still outweigh the drawbacks, like maybe more optimization oppertunities appear coz the set is not a primary collection (hashjoins?) (!?! dunno)