Perhaps all programming is distributed programming...
# thinking-together
k
Perhaps all programming is distributed programming? This little thread is melting my mind: https://queer.af/@erincandescent/105562630364133151

My mindset for the past 5 years has been that UX = gradual teaching on demand = never lying to the end-user, which implies never telling big lies, only little white lies that are easy to push aside to learn more. But at the heart of everything I did is the big lie that I'm programming "a" single unitary computer. I've always known about the many little controllers in our computers, but my mind treated them as exceptions. I think society has this consensual hallucination that software runs "on top of" hardware. The past six months (starting from my old realizations about BIOS: https://futureofcoding.slack.com/archives/C0120A3L30R/p1599112907014300) have been a slow, painful journey to come to grips with this fact.

Bat signal to @Charlie; this is fodder for your old thread from May 2019 about how most software is built across collections of computers. @Ian Bicking too had a post around the same time arguing that messaging and communication are everything, that decomposition is trivial, and that focusing on "programming" is often a modernist approach.

If we started from the assumption that "coding" is about orchestrating groups of computers, what would we do differently? What does the UX for programming look like if you also have to specify where computations happen? It seems to make Bret Victor's problem much harder. But following Alan Kay, perhaps we don't yet have the "hardest and most profound thing to then build every easier thing out of".

I'm also thinking about Dave Ackley's https://movablefeastmachine.org. He started out in security, so he likely knew all this when he began his project. You can't secure your computing substrate if you aren't thinking about 90% of the computers in it.

Anyways, the fever dream will likely break in a day or two and I'll go back to lying to myself, programming my little computer. But I thought I'd throw this out while it's fresh.
💯 1
💡 2
n
I'm definitely of the opinion that distribution should be built into the semantics of every programming language (actors etc.), rather than exposed as a library. That suggestion probably isn't too controversial within this community 🙂
💡 2
s
There are all the little systems inside our computers that are somewhat hidden from us, because we usually, even as developers, don't get to access them directly. The system takes care of that for us, and we get to benefit from faster image (de)compression or video encoding/decoding or disk encryption etc. with API calls that look just like any other API call, but hook into these other processing units that offload some work from the CPU.

But that's just half the picture. We also more and more program different processors explicitly, with completely different paradigms: CPUs, GPUs, TPUs, ... If your OS lights up pixels on a screen but doesn't utilize the dedicated graphics processing capabilities of your hardware, you are not using your system "correctly", or at least not as effectively (and likely not as efficiently) as you could (should?).

And it's only going to get worse, because this complexity is not going to go away. Now, with Intel under extreme pressure, the times of standardization are over. Sure, everybody will have SoCs soon, and it looks like a form of simplification. But market forces push everyone to differentiate, which will only lead to more complexity on that level. There might be a swing back in the far future, when the landscape is much more fragmented than it is today and the commercial benefits of diversification have been used up, so that consolidation can kick in again. Until then, it's going to be messy for a while.

I wonder if projects with the goal of simplifying the stack are better off building against a virtualization layer instead of trying to keep up with the further diversifying landscape of hardware architectures. Basically, bring your own (virtual) machine (and perhaps even instruction set), and let the virtualization/emulation figure out translation to actual hardware instructions. You'll likely do exactly that for testing already anyway.
๐Ÿ‘ 1
r
What does the UX for programming look like if you also have to specify where computations happen?
I love this question. I think this was one of the fundamental questions that drove the creation of the Unison Language, iirc.
💡 1
In another life, when the first generation of Nvidia GPUs that supported CUDA was released, I was working under a National Science Foundation grant to port some algorithms related to GIS (among other things) to run on GPUs. The biggest problem was getting data from the CPU to the GPU and back. GPUs had relatively small onboard memory at the time, and it was very easy to saturate the PCI bus when trying to move the data back and forth. It proved to be a huge bottleneck, and really broke any illusions of "a single machine."
💡 1
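For anyone who hasn't done this dance, here is roughly the shape of it: a minimal sketch using the CUDA runtime C API, with error handling omitted and the kernel itself left out, since the copies were the point.
```c
#include <cuda_runtime.h>
#include <stddef.h>

/* Sketch of the CPU<->GPU round trip that dominated our runtimes.
   The kernel launch in the middle is elided; the two copies were the problem. */
void process_on_gpu(float *host_data, size_t n) {
    float *dev_data = NULL;
    cudaMalloc((void **)&dev_data, n * sizeof(float));

    /* Host -> device across the PCI bus: easy to saturate with large inputs. */
    cudaMemcpy(dev_data, host_data, n * sizeof(float), cudaMemcpyHostToDevice);

    /* ... launch kernels on dev_data here ... */

    /* Device -> host: and you pay the bus again on the way back. */
    cudaMemcpy(host_data, dev_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_data);
}
```
Two separate memories, a narrow link between them, and most of the engineering effort spent deciding what crosses it and when: that is a distributed system in all but name.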
s
Perhaps all programming is distributed programming?
Yes, yes, yes! It's just that we have established models and tools to create a "single computer" abstraction out of certain kinds of smaller computers, e.g. the "PC" bundle. One way I look at this is to see a bundle as having observable, consistent states and "inner bundles". Consider this statement in a C-like language:
i = j + 1
This "C-system" (language + runtime) executes the above statement - you have values for i and j, then after the step, you have a new value for i. To simplify you have the "state before" and "state after". But if we look at the same system as a collection of "inner-systems", the transition consists of various nested states such as moving bits from memory to a register, an operation on the register, then moving it back to anther place in memory. Now the question is: if the first MOV has happened, but addition has not, what is the state in terms of the C-computer? It's not inconsistent, maybe it's in transition? Certainly it is on its way to being consistent and visible in outer system. We can go deeper than MOV - a register isn't a single physical array of memory in the CPU (neither is a "location in memory") - theres caches and other kinds of inner-systems all the way. One observable transition in the outer system contains many transitions in the inner systems.
What does the UX for programming look like if you also have to specify where computations happen?
This has been on my mind too, and I think for certain parts of programs we specifically don't want to specify where the computation happens. This is what I sometimes call topology-independent programs. Note that location presumes an addressing space. Above, in the C-system, there is no location (the C-system is the whole world), and when we zoom into it, we see locations such as CPU, RAM, AX, BX or whatever. The compiler made that mapping for us - it organized the inner systems to appear as if it's one outer system. (But we could have done it ourselves, and there are many alternative locations and ways to map that would have all worked out just fine.) However, if we zoom out from the C-system to a distributed database (or whatever it is part of), the single C-system is now one location in a larger address space, but the database-system will have location-independent concepts like a table. I think to specify where computation happens, we need to build this idea of mapping the outer-system meaning to the inner-system meaning, with distinct address spaces. To do this in a general way would mean we can always scale out to the next level using the same mapping ideas as we did at the previous step out.
๐Ÿ‘ 1
g
this is a great thread and i'm mostly commenting to get notified, but the one thing i'd like to point out is that for whatever reason, it's pretty easy to forget (or never learn) that all these chips are working together. it's pretty rare that just the microcontroller for your ram stick needs to be restarted, and it's even rarer that a failing stick corrupts your word documents. are there any system design lessons we can learn from that? why is it easier to make these smaller and cheaper components reliable than it is to make servers reliable?
🤔 1
k
The single-process serial computation (aka "Turing machine") is a useful abstraction. It corresponds nicely to how people perform symbolic manipulations by hand. It's also much easier to reason about than distributed computing models. Given that the Turing machine abstraction won't (and shouldn't) go away, let me rephrase the topic of this thread: should it be the dominant abstraction for developing software systems? Certainly not. Perhaps the mistake is to consider it a more fundamental abstraction than distributed models, and try to implement the latter in terms of the former. Perhaps it should be the other way round: implement the Turing machine as a high-level abstraction on top of a more powerful one, as a useful simplification to use when appropriate.
🤔 1
👍 1
k
@Konrad Hinsen Does Erlang (aka Actor Model) feel like a more powerful model? There's also the Smalltalk model of an internet of computers.
k
@Kartik Agaram I have next to no real-life experience with distributed computing, so I'll leave such questions to others. Everything I have personally used sucks (that's mostly MPI for parallel scientific computing).
๐Ÿ‘ 1
j
+1 with @Konrad Hinsen on the terribleness of MPI. I like Actors and pi calculus, but for networked things you really need to build for both component failure and network partition. Erlang is the best thought out thing in production that I've seen in this regard.
n
@Kartik Agaram I think Smalltalk's original model (going back to Kay's 1-page description of Smalltalk) is that each object in the system represents a mini-computer (or better, a computational process). I recall him writing somewhere that even by the late 70s he felt Smalltalk could be done better, and that the official Smalltalks had reified too much of the interesting bits to be able to evolve.
๐Ÿ‘ 3
To @Stefan's point, if more SoCs exposed the microcode of the system, it would be possible to defer 'reification' of software to hardware until the last minute, providing a lot more flexibility. So you could imagine a VM that could customize the processor it was running on to optimize the way it ran, i.e. instead of JITting the running of the VM, you reify the operations the VM needs to run efficiently down to the hardware level. From Kay: "Another example: all the Parc personal computers of various kinds were microcoded, and in the latter 70s you could sit down to any of them to e.g. run Smalltalk, whose image would be sent from a server, and the machine type would select the special microcode for the Smalltalk VM on that machine."
โค๏ธ 1
๐Ÿคฏ 1
๐Ÿ˜ฎ 1
That quote was from a comment to one of his answers on Quora, starting with "We need to note that the plan": https://www.quora.com/How-old-is-cloud-computing/answer/Alan-Kay-11?comment_id=179237241&comment_type=2
I think Alan made a comment that's stuck with me for a long time - hardware is just reified software
k
I actually sent him my OP -- and he responded among other things with that exact quote 🙂
n
ah would love to see his response, if it doesn't feel intrusive to share
i have a bot that checks his quora feed, i've been devouring his answers and follow-on comments. Always feels like I'm getting a small peek into the blue plane. I wish that he could one day release a book or even a set of his collected writings in one place.
โค๏ธ 1
k
Actually it was only one other thing: https://www.youtube.com/watch?v=AnrlSqtpOkw&t=135s
Which was interesting to rewatch in this context.
n
Yep, I love that version of Smalltalk. I've been trying to see if I could get it running bare-metal on a Raspberry Pi and give it to my kids and see what they can do with it
🤩 2
k
I'd love to hear if you do.
k
That video is depressing, considering that what we have today in terms of "personal dynamic media" is at the same time more complicated and more limited. As far as I can see, even today's Smalltalks (Squeak, Pharo) don't support this.
➕ 1
n
@Kartik Agaram sure, will post here when I get it working. I'm using https://github.com/michaelengel/crosstalk as a base, with modifications to support the Smalltalk version used above
s
I don't believe it is particularly useful to equate heterogeneous computing with distributed computing and use the same abstractions (especially something as high level as an actor model) because the constraints and failure cases tend to be wildly different
๐Ÿ‘ 1
I don't know of any consumer hardware that will continue to run if say, its GPU just dies
but in a true distributed system you (usually) care about robustness across potentially many machines that have much different uptime requirements than a consumer device
r
@Scott Anderson I'm going to play devil's advocate here 😈
I don't know of any consumer hardware that will continue to run if say, its GPU just dies
I have a direct counterexample: I have two GPUs in my laptop, the one bundled with my CPU, and an "external" Nvidia card. The Nvidia card regularly has driver failures, and my laptop falls back to the on-die graphics. This is almost seamless, and very much a distributed system. OS kernels handle these kinds of driver failures all the time, and they are conceptually very similar to a distributed database that experiences a network partition. Some more extreme examples are Plan 9's "everything is a file" and Emacs TRAMP. Both are examples of abstractions that cross both file system and network boundaries with similar semantics and error cases. Referring back to my earlier story about my National Science Foundation work: there was one group of grad students working on GPUs, and another group working on distributed graph algorithms on AWS. We were part of the same umbrella project, and regularly exchanged ideas about how to solve problems. The constraints and failure cases are not wildly different at all, imo.
🤔 1
s
Sure, and there are ways in modern graphics APIs to explicitly manage multiple GPUs. The thing is, it's very rare that they have symmetrical capabilities, so if you have a workload that actually needs that Nvidia hardware and you switch to Intel integrated graphics, you'll have a worse experience. A language with an actor model that abstracts heterogeneous hardware and treats it as a homogeneous distributed system won't really solve that
And making the cases where it doesn't matter easier doesn't really help anyone, I guess? I mean, potentially there is an interesting abstraction of a "command buffer" (a circular ring buffer that could look a little bit like messages in actor-model systems), and an actor could be an explicit piece of defined hardware that has specific capabilities (both fixed function and code), but the programming models and capabilities of different hardware are different enough that I'm not sure how useful it would be
Maybe there is an interesting use case where you can use local or remote compute for the same task and it's entirely invisible (you don't have a GPU, and it automatically uses a cloud GPU)
Also, device hang / device lost due to a driver crash is not the same as an actual hardware failure. The OS will recover from that, and as a system-wide resource it doesn't practically make sense for applications to handle that. In some cases (most of the interesting ones) a discrete GPU failing and falling back to an embedded GPU is effectively a non-working computer. Laptops switch regularly, but desktops will by default have embedded GPUs turned off, and don't have the same switching capabilities. Consoles and smartphones generally have one GPU as part of an SoC, etc.
The reason I bring up command buffers as an abstraction is that new consoles have custom I/O hardware that uses a GPU-style command buffer model for I/O requests, with the assumption that modern games will be making I/O requests at a similar rate to draw calls. Game engines also generally use command buffers to send commands to a render thread (running on another core on the CPU) to drive rendering on the GPU. It works great when you are mostly pushing data/commands to another device, but not so great for frequent two-way communication, because this pattern tends to be used in systems with high throughput and high latency (relatively speaking; high latency for a GPU is still really low compared to an HTTP request)
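Roughly the shape I mean, as a toy sketch: a single-producer/single-consumer ring of commands. Real GPU or I/O rings add fences, doorbells, wrap handling, and much more careful memory ordering; this is just to show why it suits one-way, high-throughput traffic.
```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 256  /* power of two, so index masking works */

typedef struct { uint32_t opcode; uint64_t payload; } Command;

typedef struct {
    Command slots[RING_SIZE];
    _Atomic uint32_t head;   /* consumer position (free-running counter) */
    _Atomic uint32_t tail;   /* producer position (free-running counter) */
} CommandRing;               /* zero-initialize before use */

/* Producer side: push and move on; no reply expected (one-way, high throughput). */
int ring_push(CommandRing *r, Command c) {
    uint32_t tail = atomic_load(&r->tail);
    uint32_t head = atomic_load(&r->head);
    if (tail - head == RING_SIZE) return 0;       /* full: back-pressure, don't block */
    r->slots[tail & (RING_SIZE - 1)] = c;
    atomic_store(&r->tail, tail + 1);
    return 1;
}

/* Consumer side (render thread, I/O engine, ...): drain whatever has arrived. */
int ring_pop(CommandRing *r, Command *out) {
    uint32_t head = atomic_load(&r->head);
    if (head == atomic_load(&r->tail)) return 0;  /* empty */
    *out = r->slots[head & (RING_SIZE - 1)];
    atomic_store(&r->head, head + 1);
    return 1;
}
```
Pushing is cheap and never waits on the consumer, which is exactly why the pattern works for draw calls and I/O requests and falls apart when you need a prompt answer back.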
r
I don't believe it is particularly useful to equate heterogeneous computing with distributed computing and use the same abstractions
Using the same abstraction does not mean requiring the same constraints. And continuing with graphics as an example, things like feature flags for OpenGL are used all the time to provide different feature sets depending on hardware capability. Similarly, Erlang and most actor models have ways to query the available resources for a particular process. Smalltalk has this as well.
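As a tiny sketch of what I mean by querying capability and degrading instead of dying (legacy-style OpenGL extension check; a core-profile context would use glGetStringi instead, and the extension name here is just one example):
```c
#include <GL/gl.h>
#include <string.h>

/* Requires a current GL context. Returns 1 if the compute-shader path is
   available, 0 if the caller should fall back to a slower CPU/fragment path. */
int can_use_compute_path(void) {
    const char *ext = (const char *)glGetString(GL_EXTENSIONS);
    return ext != NULL && strstr(ext, "GL_ARB_compute_shader") != NULL;
}
```
The application keeps running either way; it just picks a different path, which is the same move a distributed system makes when a node reports fewer resources than hoped.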
In some cases (most of the interesting ones) a discrete GPU failing and falling back to an embedded GPU is effectively a non-working computer.
It is a matter of opinion what is interesting... The fact that I can still use my laptop if one of the GPUs fails seems pretty damn useful to me personally, even if the experience is degraded. In the network case, variable bitrate video is also extremely interesting. If my network is suddenly oversaturated or degraded, I used to not be able to play a video on YouTube at all, or my video call would be disconnected. With variable bitrate streaming, the quality is simply lowered... Same abstraction, with built-in semantics for heterogeneous capabilities (dynamically changing capability, btw)
s
Not all applications can gracefully downgrade, and a ton of effort goes into supporting that. For example, Cyberpunk 2077 got delisted because it actually wasn't playable on last-generation console hardware. If your discrete GPU dies and you're playing the game on PC, it will be unplayable (either functionally or actually, due to lack of capabilities). You could apply the same logic to machine learning training and other GPU tasks that require high compute. Also, it's great that your laptop keeps running, but I'm not sure how a distributed systems approach in a userland application programming language fixes that? Maybe the idea is it could make it easier for driver and kernel developers to implement a local failsafe?
r
Not all distributed databases are usable if the network failure is bad enough. That doesn't mean the abstraction is wrong. There are always failure cases that cannot be recovered from. It does not mean using an abstraction of heterogeneous and distributed computing is a bad model (your original point, iiuc).
Maybe the idea is it could make it easier for driver and kernel developers to implement local failsafe?
This is exactly what things like microkernels and Docker do. IPC is a form of distributed abstraction.
I'm not sure how a distributed systems approach in a user land application programming language fixes that?
The point of exposing these to the user application level is that the application can have a say in how the failure is handled. Cyberpunk 2077 maybe can't run if the GPU fails, but Microsoft Word still can. You want to allow both options.
s
I guess I'm stuck on... What do we get if we treat a local machine like a distributed system? What features of distributed programming are generally not necessary for local hardware? GPU programmers do have to handle device removed (https://docs.microsoft.com/en-us/windows/uwp/gaming/handling-device-lost-scenarios), which could mean trying to reinitialize and reload, and that could mean using a different device, but it also could mean crashing or quitting. I'm focused on GPUs because I'm a graphics programmer, but I imagine you could apply this to any specific hardware (storage, audio, etc.). Maybe the issue is that this is all handled at the OS level, and the answer is to implement the OS in a different way rather than try to abstract over all OS APIs?
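For reference, the device-removed handling from that link boils down to something like this (a simplified sketch in C-style D3D11/DXGI, from memory; real code also releases and recreates every device-dependent resource):
```c
#define COBJMACROS
#include <d3d11.h>
#include <dxgi.h>

/* Present a frame; if the GPU "went away" (driver crash, reset, physical removal),
   note the reason and signal that the device and all its resources must be rebuilt,
   possibly on a different adapter. Returns 0 when a rebuild is needed. */
int present_frame(IDXGISwapChain *swapchain, ID3D11Device *device) {
    HRESULT hr = IDXGISwapChain_Present(swapchain, 1, 0);
    if (hr == DXGI_ERROR_DEVICE_REMOVED || hr == DXGI_ERROR_DEVICE_RESET) {
        HRESULT reason = ID3D11Device_GetDeviceRemovedReason(device);
        (void)reason;   /* log it; useful for telling hangs apart from removals */
        return 0;       /* caller: tear down and reinitialize (or give up and quit) */
    }
    return 1;
}
```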
r
Similarly for security models. In a Docker or cgroups situation, you may or may not have access to a system resource, depending on the security policy. Your application can decide how to handle that. Your database may try to set up its storage on a certain file system, but if it doesn't have access, maybe it falls back to a memory-backed store.
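A toy version of that fallback decision (hypothetical names; real databases obviously do far more than this):
```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical storage handle: either a real file or an in-memory buffer. */
typedef struct {
    FILE *file;       /* non-NULL when we got filesystem access */
    char *mem;        /* fallback buffer when we didn't */
    size_t mem_size;
} Store;

/* Try the durable option first; if the security policy (container, cgroup,
   read-only mount, ...) denies it, degrade to a memory-backed store instead
   of refusing to start. */
Store open_store(const char *path, size_t fallback_size) {
    Store s = {0};
    s.file = fopen(path, "r+b");
    if (s.file == NULL) {
        s.mem = calloc(fallback_size, 1);
        s.mem_size = fallback_size;
    }
    return s;
}
```
Same application code above this layer; what changes is which capabilities the environment granted, and the program gets a say in how to degrade.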
I'm focused on GPUs because I'm a graphics programmer
I'm a database programmer, so this is fun to compare perspectives 🙂
I guess I'm stuck on... What do we get if we treat a local machine like a distributed system?
Maybe the issue is that this is all handled at the OS level, and the answer is to implement the OS in a different way rather than try to abstract over all OS APIs?
Microkernels are the best practical example I can think of that try to do exactly this: http://www.microkernel.info It's more about security and reliability. It does have a performance cost. (Which I'm sure sounds like pure pain to a GPU programmer lol)
Your example about cloud GPUs is already here as well; that's basically what Google Stadia is.
k
When I started the thread I was thinking about all the ways in which non-programmable processors (built out of programmable components!) hide inside our computers. It's a place for bugs and especially security issues to hide. It might be an interesting exercise to ask how we might bootstrap a computer from a single tiny bit of RAM, either by building hardcoded circuits or requiring an upstream device to initialize them. I said "distributed computing" just because it was the closest term I could think of, but it did pull in unintended connotations that it's been interesting to see explored ๐Ÿ™‚
r
It might be an interesting exercise to ask how we might bootstrap a computer from a single tiny bit of RAM, either by building hardcoded circuits or requiring an upstream device to initialize them.
This sounds like the idea of a microkernel taken to the level of firmware / BIOS / UEFI. Sounds like an awesome research project to explore 🤩
I'm also reminded of a podcast I listened to a while ago: https://oxide.computer/podcast/on-the-metal-3-ron-minnich/ Google does some of this kind of thing on their servers, and also on Chromebooks, with u-root. (Trusting-trust type stuff at the firmware level.) It's frustratingly difficult because the hardware vendors do not want this. It has taken companies like Google, Facebook, and Amazon to strong-arm them into opening up their firmware more...
Also, Alan Kay's quote as mentioned by @Naveen Michaud-Agrawal
"Another example: all the Parc personal computers of various kinds were microcoded, and in the latter 70s you could sit down to any of them to e.g. run Smalltalk, whose image would be sent from a server, and the machine type would select the special microcode for the Smalltalk VM on that machine."
I remember talking to @Kartik Agaram about this exact quote a while ago. The closest modern equivalent we have is probably FPGAs.
💯 1
s
Yeah, I was probably being way too specific about the definition of a distributed system, now that we're explicitly talking about distributed systems within hardware. Back to the original post: from what I know of FPGA architecture, it's laid out in a similar way to the MFM
๐Ÿ‘ 2