# thinking-together
s
On “missing” values, null and undefined aren’t the same as an empty string. For example, if there’s an object for a person’s name that has a middle name field, does empty string mean 1) “we know that this person has no middle name”, 2) “we don’t know” or 3) we haven’t yet been asked or tried to input this”? Another example is when you want an undefined attribute to inherit from some delegate. Empty string can’t serve this purpose because it may mean “we know this value is an empty string” instead of “we want this value to be inherited”.
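A minimal TypeScript sketch of that three-way distinction (the field and type names are invented for illustration, not from the thread):

```typescript
// Three different "missing" situations that a bare empty string can't distinguish.
type MiddleName =
  | { kind: "known"; value: string }   // value may legitimately be "" ("has no middle name")
  | { kind: "unknown" }                // we asked, but we don't know
  | { kind: "unasked" };               // we never asked or tried to input it

interface PersonName {
  first: string;
  middle: MiddleName;
  last: string;
}

const a: PersonName = { first: "Ada", middle: { kind: "known", value: "" }, last: "Lovelace" };
const b: PersonName = { first: "Bob", middle: { kind: "unknown" }, last: "Smith" };
```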
d
the delegate thing isn't something covered in the Twitter thread AFAIR
i
There was a good Rich Hickey talk about this a little over a year ago, if memory serves. He critiqued the use of Maybe in Haskell to represent something that either has a non-empty value, or does not have a value, for the reasons stated. Can't remember which talk, but I can probably look it up if there's interest.
s
Unlike zero, which was invented very early in maths, how come we never needed the empty string until we started working with computers? There could be one, two or fifty different reasons a value is missing - it depends on the problem domain. There is no universal understanding of how these reasons map onto the empty string, null or any other undefined object that's provided. The only right solution seems to be to have a domain-specific enum for the missing reason, and use an empty string as the value.
✔️ 4
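One way to read that suggestion, sketched in TypeScript (the reason names are invented for illustration):

```typescript
// The reasons a value can be missing are domain-specific, so name them explicitly
// instead of overloading "", null, or undefined.
enum MissingReason {
  NotApplicable = "not_applicable",   // e.g. "NA" on a form
  Unknown = "unknown",
  NotYetCollected = "not_yet_collected",
}

type Field<T> =
  | { present: true; value: T }                  // value may itself be "" and mean exactly that
  | { present: false; reason: MissingReason };

const middleName: Field<string> = { present: false, reason: MissingReason.NotApplicable };
```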
This talk?

https://www.youtube.com/watch?v=YR5WdGrpoug

👍 1
d
@shalabh "Unlike zero, which was invented very early in maths, how come we never needed the empty string until we started working with computers?"
?? The empty string was invented by mathematicians long before the first programmable computer was built. Axel Thue invented string rewriting systems in 1914, so the idea is at least that old.
s
Sure. But compare 0 in mathematics to the empty string in text. We've had the notion of missing information but no actual literal form for it. In some countries you write 'NA' in forms, in others you write '-' or maybe just scribble in the margins about why something is blank.
d
The best way to encode a missing value depends on the programming language. A `Maybe` type (or `Either` type) is the best solution if you are using Haskell, because then your code is compatible with all of the Haskell infrastructure that supports the use of `Maybe` types, and you don't want to force other Haskell programmers to have to write glue code to interface with your non-idiomatic API. If you are programming in Clojure, then `Maybe` types are not the best way to represent this concept. Clojure isn't Haskell. If you are using a Clojure map, then omitting a field from the map is a better way to indicate missing data. I think that Rich Hickey's talk is directed at Clojure programmers, not at Haskell programmers, in order to explain why you shouldn't blindly import Haskell idioms into Clojure, when Clojure has its own idioms.
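A rough TypeScript analogue of the two idioms being contrasted (this is neither the Haskell nor the Clojure code itself, just a sketch of the difference):

```typescript
// Haskell-ish style: absence is an explicit value the type system forces you to handle.
type Maybe<T> = { tag: "just"; value: T } | { tag: "nothing" };

interface PersonExplicit {
  name: string;
  middleName: Maybe<string>;   // always present as a field, possibly "nothing"
}

// Clojure-ish style: absence is simply not having the key at all.
interface PersonOmitted {
  name: string;
  middleName?: string;         // omitted entirely when there is no data
}

const p: PersonOmitted = { name: "Ada" };
console.log("middleName" in p); // false — the field was never supplied
```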
s
I feel this discussion points to how what we call values don’t include all the information we end up needing to track surrounding typical values. We need something more like slots with arbitrary attributes, a value being just one of them. Others might be things like: What’s the default value? What are its type(s)? How should it get initialized? What are the valid set of, range of, or validator function for this value? Should we persist this value? Is it editable? What are its permissions? Do we own it or link to it? How should it be copied when needed? etc. These are properties of the slot, not the value it references.
👍 3
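A hypothetical sketch of such a slot in TypeScript, with the value as just one attribute among many (all field names are invented for illustration):

```typescript
// A slot owns metadata about the value it holds, instead of encoding it in the value.
interface Slot<T> {
  value?: T;                       // the current value, if any
  defaultValue?: T;
  initialize?: () => T;            // how it should get initialized
  validate?: (v: T) => string[];   // returns a list of problems, empty if valid
  editable: boolean;
  persist: boolean;
  owned: boolean;                  // do we own it or link to it?
  copy?: (v: T) => T;              // how it should be copied when needed
}

const middleName: Slot<string> = {
  defaultValue: "",
  editable: true,
  persist: true,
  owned: true,
  validate: v => (v.length > 100 ? ["middle name too long"] : []),
};
```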
i
Of course, now we're deep into the data-modelling bikeshed. Aren't those pieces of metadata just values in themselves? What if we care about the metadata of the metadata? Which systems have done a good job resolving this? RDF / semantic web? EAV[T]?
s
Right, though not necessarily meta in the class sense, as these properties may be specific to the instance of the object which owns the value. My point is that it may not be appropriate to try to encode too much metadata (as we might be tempted to do with an empty string) into a value. Most languages and object systems lack abstractions around this, and end up scattering this metadata throughout the code in ways that are difficult to work with. Databases have similar problems.
w
Slots + attributes feels more like the right way of thinking about it than anything else I've come across. A variation: rather than using explicit attributes, there may be a system that understands the history and current state of slots: whether it's empty or full, whether it was always that way, how many times it has changed—whatever. It tracks what's happening/happened to slots. I think it's just shifting the problem into computation rather than data, so maybe not useful 🙂 But it's what I thought of after reading @Ivan Reese's comment.
s
In programming we generally have an aversion to ambiguity and like everything to be well defined. That’s not surprising, because whenever we have ambiguity, it usually leads to exceptions, crashes, undefined behavior, or hard to track down bugs. As a consequence we came up with a solution: get rid of ambiguity. We introduced more specific types that make the formerly ambiguous cases explicit and/or limit the possible values to only those we know how to work with. We added type checkers that prevent us from compiling ambiguity into our programs. So now, if your function needs two integers to work, it only works with integers. If you have a string “5” and want to stick it in there, you’ll need another function that’s been specifically designed to convert a string into an integer, which includes a “not possible” scenario and so you’re forced to deal with that scenario. That’s all great and clearly works better than what we had before.

Sometimes I wonder if that “let’s make everything more explicit” solution was maybe not the best one and we should’ve thought more into the opposite direction, about how we can keep the ambiguity, but figure out better ways to deal with it. After all, humans deal with ambiguity quite well. If you read this post and you don’t know what “ambiguity“ means, your comprehension process doesn’t crash and you can finish reading the whole post just fine. You’re even trying to make sense of what it could mean based on the context. Or you can look it up later and then it probably makes sense without you having to read the whole post again.

I think this is also something that makes programming hard to learn — we’re not used to having to make everything explicit. Most of the world works just fine with lots of ambiguity. Sure, sometimes that approach causes problems too, but look how far we got with it.
☝️ 2
w
That brings to mind two things for me:

1. More flexible notions of equivalence, rather than just equality when e.g. comparing types. Liskov substitution is one example, and "duck typing" is, I guess, a kind of extreme. The first works to enable context-dependent equality: i.e. in some specific context, we can use a looser notion of equality; the second one works toward equal for "all intents and purposes". Additional strategies could probably be found in each of those categories though. I could see something like neural net recognizers being used for more flexible equivalence definitions as being in the equal-for-all-intents-and-purposes category, for example.
2. Deferred / partial / placeholder definitions. It seems like a big part of how we deal with ambiguity when learning things is something like: we come across a term/concept that we don't understand while reading something—and it's not a problem. We just invent a unique placeholder symbol for that thing and accumulate multiple tentative definitions which each gain or lose support as you learn more. The 'placeholder' symbol in our minds has something like a tentativeness value associated with it: we can keep using the symbol, but we take into account the uncertainty of its meaning/value.

The only programming systems I can imagine using those concepts in a very deep way would have to depend on some kind of backtracking behavior though: trying out multiple routes in an attempt to sort out the ambiguity automatically.
s
The precise gear-like fittings required are also a problem when building things. Say I have a Point object from one library that I connect to a Canvas from another library (either via direct manipulation or code). The interfaces must match precisely (same language, same field names) for it to even work; otherwise it is a total failure. But consider how there are only a few ways that the Point shape could even 'fit' with a Canvas-like object. The Point has two fields, each one-dimensional. The canvas has at least two dimensions + maybe color. I have to precisely hand-code the mapping, but it seems like the system could provide some obvious mappings for me instead and let me choose? The current models of composition are mostly 'gear-like tight' or nothing. Maybe we need some models of composition that incorporate some kind of negotiation? What does this interplay look like?
1
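A toy sketch of what "the system proposes the obvious mappings and lets me choose" could look like in TypeScript (the shapes and the enumeration strategy are invented, not from any actual library):

```typescript
// Two libraries with structurally compatible but differently named fields.
interface Point { x: number; y: number }
interface CanvasPos { horizontal: number; vertical: number }

// A mapping says which Point field feeds each CanvasPos field.
type Mapping = { [K in keyof CanvasPos]: keyof Point };

function candidateMappings(): Mapping[] {
  // Only two ways two scalar fields can map onto two scalar fields.
  return [
    { horizontal: "x", vertical: "y" },
    { horizontal: "y", vertical: "x" },
  ];
}

function applyMapping(p: Point, m: Mapping): CanvasPos {
  return { horizontal: p[m.horizontal], vertical: p[m.vertical] };
}

// The user (or some negotiation heuristic) picks one of the proposals.
const chosen = candidateMappings()[0];
console.log(applyMapping({ x: 3, y: 4 }, chosen)); // { horizontal: 3, vertical: 4 }
```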
s
@Stefan “Sometimes I wonder if that “let’s make everything more explicit” solution was maybe not the best one and we should’ve thought more into the opposite direction, about how we can keep the ambiguity, but figure out better ways to deal with it.” I agree. An example that comes to mind is the use of constructors and immutable objects by people obsessed with objects never being in an invalid state. I’ve found it much easier to have classes that you instantiate and then call a few configuration methods to get them into a valid state than ones that attempt to have a constructor for each of the combinatorial explosion of valid initial configuration states. Much better to have an object that can report what’s wrong with its state.
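A minimal sketch of the configure-then-report style being described, in TypeScript (the class and field names are illustrative):

```typescript
// Instead of one constructor per valid combination of options, configure freely
// and ask the object what is wrong with its current state.
class Connection {
  host = "";
  port = 0;
  useTls = false;

  problems(): string[] {
    const issues: string[] = [];
    if (this.host === "") issues.push("host is not set");
    if (this.port <= 0 || this.port > 65535) issues.push("port is out of range");
    if (this.useTls && this.port === 80) issues.push("TLS requested on plain-HTTP port");
    return issues;
  }
}

const c = new Connection();
c.host = "example.org";
c.port = 443;
c.useTls = true;
console.log(c.problems()); // [] — now in a valid state
```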
w
Shalabh, that made me think of something: in your Point example the problem is to find an automatic translation/mapping function so that some data is structured correctly when moved into a different context. Seems like the other half of that is being able to move behaviors into new contexts. Seems like finding translations of behaviors automatically is a similar problem optimizing compilers have: searching among programs with equivalent semantics, but some different 'secondary' effects. But rather than optimizing performance here, we're looking for some kind of format compatibility...
The gear analogy also makes me think: what is an alternate model at that level which would do better with imprecision/ambiguity? An image that came to mind: the gears were instead composed of tiny little particles and in certain conditions they could be broken up, gear teeth knocked off etc.—but this always happens in the context of something like an annealing process where the objects coming apart are in a higher energy state, and will settle into some alternate/compatible equilibrium state after.
So the runtime for the language takes on some similarities to this sort of physical simulation (custom-tailored of course), so conditions for entering energetic vs. settled states are always consistently enforced—it's just an aspect of the background medium for the language's execution.
s
I don't talk specifically about ambiguity in this post, but you might see where I'm coming from when reading this: https://stefan-lesser.com/2020/01/17/categories-structure-our-world/ In essence, we can talk about `cat`s, although none of us will have exactly the same concept in mind. Nevertheless, we usually understand each other. Only in rare cases or with more abstract concepts does this become an issue. In programming we probably focus too much on defining the value and its type so narrowly that there is no room for interpretation. Although there usually still is. The concept of integer is very different from the type Int in your favorite programming language. For instance, your programming language likely only considers a finite range of integers an Int; perhaps it has another type to represent arbitrary-size integers. I see a huge opportunity in hiding this kind of complexity. It will likely not have any performance benefits, but putting these computation cycles to work to no longer make people distinguish between Int32 or Int64 or Int and Float, or Int and String seems like a path to massively simplify programming for people trying to learn it.
s
@westoncb the alternative to gear-like is biology-like which is what Alan Kay has also talked about. Tissues don't have to fit perfectly but if one emits some chemicals that the other has some receptors for, then you get some interaction. There is some fundamental compatibility built-in - the actual mechanisms of messaging for instance. For useful processes, there would also have to be some shared notion of what different messages mean. But there is also tolerance for imperfect fits - unidentified messages may be just ignored and accumulate - perhaps picked up by other neighboring cells that do respond to these. BTW an interesting thing from biology I saw a long, long time ago: two cells from heart tissue may 'pulse' at different rates independently, but when touching each other, they pulse in synchrony. This is so unlike the kinds of compositions we see in computers where aggregation has to be hand wired. I think you're right that the extension of this idea is to 'run any behavior' in a new context. I suppose behavior-as-program is just data for the 'host'.
👍 2
d
@Stefan "I see a huge opportunity in hiding this kind of complexity. It will likely not have any performance benefits, but putting these computation cycles to work to no longer make people distinguish between Int32 or Int64 or Int and Float, ... seems like a path to massively simplify programming for people trying to learn it."
I agree with this, and I'm doing it in my project for that reason. Curv has a single numeric type, the "number", which is represented internally as a float64. No need to explain why 1 and 1.0 are different objects, or why (in Ruby) 7/2 is 3 but 7.0/2.0 is 3.5. Javascript works this way. It's an old idea. Dartmouth BASIC worked this way in the mid 1960s. APL worked this way in the mid 1960s.

Where I encounter a problem with having a single Number type is when I want to work with huge, multi-dimensional arrays, which blow up memory if all numbers are 64 bits. In this case, I need to exert precise control over how array elements are laid out in memory. I need representation types like UInt32, Float32, and UNorm8 (a number between 0 and 1, represented using 8 bits, so that 0x00 is 0, 0xFF is 1, 0x55 is 1/3, and so on). This last one is used for representing RGB values in a pixel array. So now I can create a "typed array", where I use a representation type to specify how abstract values are mapped onto bit patterns in memory. This is a low-level programming interface, and is only needed by developers who are developing libraries that use efficient internal representations for data. This interface shouldn't be needed by typical end users, who only need to understand abstract values like "numbers". So there is a way to do low-level programming without inflicting all of the concerns of low-level programmers onto end users who are just using high-level library interfaces.
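The UNorm8 mapping is concrete enough to sketch; here is roughly what encoding and decoding look like in TypeScript (Curv's actual implementation is its own code and not shown here):

```typescript
// UNorm8: a number in [0, 1] stored in 8 bits, so 0x00 -> 0, 0xFF -> 1, 0x55 -> 1/3.
function encodeUNorm8(x: number): number {
  return Math.round(Math.min(1, Math.max(0, x)) * 255);
}

function decodeUNorm8(byte: number): number {
  return byte / 255;
}

// A pixel array: abstract "numbers" at the API, compact bytes in memory.
const pixels = new Uint8Array(3);
pixels[0] = encodeUNorm8(1 / 3);   // 0x55
pixels[1] = encodeUNorm8(1.0);     // 0xFF
pixels[2] = encodeUNorm8(0.0);     // 0x00
console.log(Array.from(pixels).map(decodeUNorm8)); // ~[0.333, 1, 0]
```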
n
An approach I like is to use separate operators for different functions. For example: 7 / 2 = 3.5, 7 // 2 = 3, 1 + 2 = 3, "1" ++ "2" = "12". When you limit ambiguity on one side, it's much safer to allow "1" + "2" = 3, because we know that '+' is always the mathematical plus operator.
👍 1
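TypeScript can't define new operators, but the same separation can be sketched as distinct named functions, one per meaning (the names are invented):

```typescript
// One operation per meaning, instead of one overloaded '+'.
const divide = (a: number, b: number): number => a / b;                 // 7 / 2    -> 3.5
const intDivide = (a: number, b: number): number => Math.trunc(a / b);  // 7 // 2   -> 3
const add = (a: number, b: number): number => a + b;                    // 1 + 2    -> 3
const concat = (a: string, b: string): string => a + b;                 // "1" ++ "2" -> "12"

// With concatenation split out into its own operation, a language could let its '+'
// coerce "1" + "2" to 3 without ambiguity; here the coercion is written explicitly.
console.log(add(Number("1"), Number("2"))); // 3
console.log(concat("1", "2"));              // "12"
```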
d
@Niko Autio I agree, it's the only sensible and correct way to design math operators. There is a paper by John Reynolds that formalizes the design principle using category theory. If you introduce a coercion between strings and numbers, then you have a problem, because there isn't a one to one correspondence between strings and numbers. There are multiple string representations for each number. For example, 1==1.0 and 01==1, but "1" != "1.0" and "01" != "1". So now your equality operator is broken, and the way to fix it, according to this principle, is to do what Perl 5 does, and provide different equality operators for numbers and strings. In other words, a coercion between numbers and strings doesn't simplify anything, it just pushes language complexity from one place to another.
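The many-to-one problem is easy to see with a couple of values (plain TypeScript here, not the Perl operators themselves):

```typescript
// Numbers have one identity but many string spellings,
// so a single coercing equality operator can't be consistent.
const n1: number = 1;
const n2: number = 1.0;
const s1: string = "1";
const s2: string = "1.0";
const s3: string = "01";

console.log(n1 === n2);                     // true  — the same number
console.log(s1 === s2);                     // false — different strings
console.log(Number(s3) === Number(s1));     // true as numbers
console.log(s3 === s1);                     // false as strings
// Perl 5's answer: '==' compares as numbers, 'eq' compares as strings.
```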
s
@Niko Autio What’s the behavior if the values are of different types?
n
@Doug Moen yea that's true. And to be precise, coercion between strings and numbers wasn't included in the things I like 😅 @Steve Dekorte "/" is float -> float -> float, so if any input is an integer it is converted to float. "//" is int -> int -> int, and if you try to insert a float you should <placeholder for some trade-off> 😁 But one thing I have been considering is implicit-write & explicit-read separation. So you could write the implicit way: "1" + "2", but the derived form of that code would be int.parse("1") + int.parse("2"). Something like:

https://www.beslogic.com/wp-content/uploads/2018/05/type-inference.png

Still, the challenge is that such visual separation doesn't fit too well in a text editor.
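A tiny sketch of that idea in TypeScript terms: the implicit form you type versus the explicit, derived form a tool could display (parseInt stands in for the int.parse in the message above):

```typescript
// What you might type (implicit):  "1" + "2"
// What the tool could show you (explicit, derived form):
const result = parseInt("1", 10) + parseInt("2", 10); // 3, not "12"
console.log(result);
```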