Kartik Agaram
Consider the powerful, time-honored environment that gives us many "small programs, each doing one thing well": the Unix shell. There is a sort command, a cut command, and many more. A versatile collection of blocks that I can snap together in different ways (yay pipes!). There isn't much duplication of commands and the environment seems to have nice composition properties.
But it only goes so far.
If I write a program in Unix using Java or Python, can I reuse the Unix sort to sort an array of items inside my program? Of course not, what an improper question! The decent choice is to reimplement sorting in my program (or use the standard library where someone else has already re-implemented it).
The computer already knows how to sort things, why do I need to tell it again?
-- @shalabh (https://shalabh.com/programmable-systems/on-composition.html)
From the inventor of shells:
I felt that commands should be usable as library subroutines, or vice versa. This stemmed from my practice of writing CTSS [OS] commands in MAD, a simplified Algol-like language. It was much faster and the code was more maintainable than IBM 7094 assembly code. Since I needed MAD-friendly subroutine calls to access CTSS primitives, I wrote in assembly code a battery of interface subroutines, which very often mimicked CTSS basic command functions. I felt it was an awkward duplication of effort. However, I did not go further in the context of CTSS.
-- Louis Pouzin (https://multicians.org/shell.html)
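To make the asymmetry in the quotes concrete, here is a small sketch (in Python, assuming a Unix system with sort on the PATH): the only way a program can "reuse" the Unix sort is to fork a process and marshal through text, next to the in-language reimplementation it would normally use.

```python
import subprocess

items = ["banana", "apple", "cherry"]

# "Reusing" Unix sort from inside a program: spawn a process,
# serialize the list to text, parse text back into a list.
out = subprocess.run(
    ["sort"], input="\n".join(items), capture_output=True, text=True
).stdout
via_unix = out.splitlines()

# The "decent choice": the standard library's reimplementation.
via_stdlib = sorted(items)

assert via_unix == via_stdlib == ["apple", "banana", "cherry"]
```

Same computation twice; the difference is entirely in the invocation and marshalling machinery.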
Emmanuel Oga
11/26/2020, 3:14 AM
Emmanuel Oga
11/26/2020, 3:20 AM
Jack Rusher
11/26/2020, 9:10 AM
curious_reader
11/26/2020, 9:20 AM
Emmanuel Oga
11/26/2020, 9:52 AM
Chris G
11/26/2020, 4:57 PM
Andrew F
11/26/2020, 10:10 PM
Emmanuel Oga
11/26/2020, 11:59 PM
Emmanuel Oga
11/27/2020, 12:02 AM
Emmanuel Oga
11/27/2020, 12:03 AM
(find, i'm looking at you)
Emmanuel Oga
11/27/2020, 12:05 AM
powershell went in the right direction but after trying really hard to like it, I still don't feel like it has very good ergonomics
Kartik Agaram
I can remember the exact instant when I realized that a large part of my life from then on was going to be spent [doing this].
I've been grinding my wheels for the past week because I've been reading about how to print floating-point numbers in decimal. Absurdly, ridiculously difficult as it is, the time has gone less in figuring out how to do it and more in coming to terms with the irreducible difficulty of it. No, this shortcut doesn't work. No, we can't avoid bignum support. And on and on. No, reading one more research paper isn't going to make the problem magically simpler.
OP seems similar. Yes, it's actually not that hard to reuse code between shell scripts and standard libraries. All you have to do is specify, for every function call:
* Where to look for the function. (Is it a source file? A dynamic library somewhere in the path? A database? A registry somewhere online?)
* How to look for the function in that source. (An address in RAM? A mangled name? A URL? A file handle?)
* Where to look for each input, and how to look for it. (An address in RAM? A file handle to load lines asynchronously from? How to unmarshal?)
* Where the computation must run. (Same process? A new child sharing nothing? A threadpool? A coroutine? Some computer on the internet?)
* Where to send each output. (To a register? A memory location? An in-memory FIFO? The file system? A database table? An API endpoint on some server on the internet?)
Given the infinite detail of reality (http://johnsalvatier.org/blog/2017/reality-has-a-surprising-amount-of-detail), each of these questions can be answered with arbitrary, Turing-complete code. Still, totally doable. You just have to be willing to be verbose. At every single call site.
No? Well, if you aren't willing to be verbose, by Shannon's law of entropy you have to trade off flexibility for concision. Live within the life-boat of a shell script, and it will impose the standard policy called "`execve()` all the things". Live within the life-boat of a single language, and it will impose the standard policy called "push args on the stack and call." Or something like that.
It's certainly possible to explore other points on this spectrum. For example, it might be quite ergonomic to unbundle an OS to the point that any synchronous function call foo() forks a process in an isolated address space and receives its inputs asynchronously over a channel, simply by being called as foo!() (or spawn, or go, in a maze of subtle differences in semantics). But it seems clear you have to accept a limited menu of choices. Languages that share this menu can interoperate. Languages that make distinct choices will have to "be told again."
Regardless of what you choose, I submit that having to specify the algorithm again is not that big a deal next to these problems of namespace management, mutable stores and marshalling/unmarshalling. Computation is always the easiest part of any non-trivial program.
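The menu of call-site choices above can be sketched in miniature (Python; the function name is hypothetical): the same trivial computation, invoked under three different policies for where the code runs and how inputs and outputs travel.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def double(x):
    # The computation itself: the easy part.
    return 2 * x

# Policy 1: same process, args on the stack.
r1 = double(21)

# Policy 2: a thread pool; inputs and outputs travel through a future.
with ThreadPoolExecutor() as pool:
    r2 = pool.submit(double, 21).result()

# Policy 3: a child process sharing nothing; inputs and outputs are
# marshalled through argv and stdout as text.
r3 = int(subprocess.run(
    ["python3", "-c", "import sys; print(2 * int(sys.argv[1]))", "21"],
    capture_output=True, text=True
).stdout)

assert r1 == r2 == r3 == 42  # identical computation, three invocation policies
```

Everything that differs between the three lines is invocation machinery, not algorithm.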
Interesting question. It's clearly one Louis Pouzin grappled with, and I see a day later that I've dealt with it multiple times in myriad guises. It's a question my brain is designed to keep butting heads against every so often.
wtaysom
11/27/2020, 3:00 AM
Kartik Agaram
Emmanuel Oga
11/27/2020, 3:08 AM
Emmanuel Oga
11/27/2020, 3:13 AM
wtaysom
11/27/2020, 3:14 AM
Kartik Agaram
Jack Rusher
11/27/2020, 8:04 AM
curious_reader
11/27/2020, 8:50 AM
shalabh
11/27/2020, 10:54 AM
"I feel like shells could work closer to IDEs, with inline documentation, autocompletion and solid language oriented tools, it doesn't feel like scifi, it feels doable."
@Emmanuel Oga - have you looked into the Genera Lisp Machines? It comes very close to what you are describing. Each "function" is available at the "shell" (they call it something else, I think "listener"?). When you invoke a function from the listener, it does have autocomplete. E.g. if it accepts a file, all the filenames on the screen will "light up" and become clickable. This works because fundamentally it is not dumping streams of text on a tty, but attaching presentation objects on the screen - so each visible presentation is linked to the backing object and its type is available to the system. There are some details in this short demo: . You can also introspect and jump to source from the listener. I linked to a twitter thread about some more features: https://twitter.com/chatur_shalabh/status/1213740969201262593. Also check out http://lispm.de/genera-concepts (section "Data Level Integration")
shalabh
11/27/2020, 11:06 AM
"for all nice things people say about the wisdom of unix"
Some say fewer nice things than others :). I'm starting to think the primary purpose of an OS is providing composition models, which subsume the abstract objects they provide. E.g. the Unix models are the C ABI, bytestream pipes, and command shells; the rest is built on top of these. The concepts of processes and files sort of live within these models.
Emmanuel Oga
11/27/2020, 11:15 AM
Jack Rusher
11/27/2020, 1:59 PM
"I liken starting one's computing career with Unix, say as an undergraduate, to being born in East Africa. It is intolerably hot, your body is covered with lice and flies, you are malnourished and you suffer from numerous curable diseases. But, as far as young East Africans can tell, this is simply the natural condition and they live within it. By the time they find out differently, it is too late. They already think that the writing of shell scripts is a natural act."
-- Ken Pier, Xerox PARC (taken, of course, from The UNIX-HATERS Handbook; wiki, pdf)
Garth Goldwater
11/27/2020, 3:50 PM
Chris G
11/27/2020, 4:37 PM
Breck Yunits
11/27/2020, 10:09 PM
Emmanuel Oga
11/27/2020, 10:27 PM
Emmanuel Oga
11/27/2020, 10:40 PM
serialize . deserialize = id, which may seem obvious but in fact is not trivial.
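A quick illustration of why the round-trip law is not trivial (in Python, with json standing in as the serializer): the law holds for some values and fails silently for others.

```python
import json

def roundtrip(x):
    # deserialize . serialize
    return json.loads(json.dumps(x))

# The law holds here...
assert roundtrip({"a": 1, "b": [1, 2]}) == {"a": 1, "b": [1, 2]}

# ...but fails silently elsewhere: tuples come back as lists,
# and non-string dict keys come back as strings.
assert roundtrip((1, 2)) == [1, 2]          # not (1, 2)
assert roundtrip({1: "x"}) == {"1": "x"}    # not {1: "x"}
```

The encoding quietly collapses distinctions the program cared about, which is exactly the kind of failure that makes the law hard to guarantee in general.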
I guess there's a reason unix people went with plaintext: they were basically kicking the problem to someone else to deal with ("do the simplest thing that works" and such).
Andrew F
11/28/2020, 12:07 AM
Garth Goldwater
11/28/2020, 5:58 PM
shalabh
11/28/2020, 8:11 PM
"Yes, it's actually not that hard to reuse code between shell scripts and standard libraries. All you have to do is specify for every function call:"
<snipped a long list of mechanism details pertaining to function invocation>
here:
"Shannon's law of entropy you have to trade off flexibility for concision"
and here:
"People tried to solve problems. The problems turned out to be more difficult than expected. People came slowly to terms with how difficult they were."
Funnily, I kind of agree with Kartik, but only while we stick to the current models of abstraction. I'm going to use the following analogy to describe how I think about this.
Imagine we are all "electronics people" making circuit boards with LEDs, batteries, wires, switches etc. We send signals around encoded as voltage-on-wire or even FM. Note that voltage-on-wire serves a higher purpose (e.g. a specific voltage on a line may represent the "turn this light on" signal). We create wired ports and FM transmitters/receivers to hook up multiple such gizmos together. Unfortunately we all use different conventions for how the voltage levels or FM bands correspond to the signals we share. So most of our integration work is spent building adapters - the simple ones might just step the voltage up or down; the complex ones may receive a signal encoded as FM and re-encode the same signal as voltage-on-wire, or even re-transmit it on another FM band. There's nothing wrong with this, because this is reality. OTOH, we can also think of making things better, or even doing something different. Here, and this is key, I think there are two broad paths:
1. Standardize. If everyone uses 5V and the same FM band and the same "encoding" of the shared signals, we can more easily plug things into each other. We're still electronics people, doing mostly the same thing, but better and easier.
2. A new level of abstraction. Say we completely stop thinking about voltages and think in a new level of abstraction called "bytes". This is the idea of software. I'd say we're now doing a fundamentally different thing - we are now "software people" and no longer electronics people - because we are absolved of concerns about voltages and such things. The voltages didn't disappear, but they don't matter directly to us. In fact many different physical contraptions may use very different physical mechanisms to represent the same bytes, but from our abstraction level, we still think of those disparate systems through a unified model of "bytes".
Now the next chapter: bytes are themselves problematic, in ways that resemble the voltage and FM band mismatches. The thing is, bytes are used to represent "higher purpose" signals and messages that we send around - so encoding matters, again. The first approach is "standardize the encodings". This is very popular (cue the history of byte formats). But is there a second approach, one which would make bytes irrelevant to us? What are the new concepts we would think and design in? We may end up with many different byte-level encodings for the same messages we send around, but we will have a new unified model of design across all of them. We will no longer be "bytes people", so what will we be? (BTW, eventually we could forego bytes completely in some cases and map directly to voltage or other physical representations.)
Emmanuel Oga
11/28/2020, 11:45 PM
shalabh
11/29/2020, 12:51 AM
shalabh
11/29/2020, 1:09 AM
Andrew F
11/29/2020, 3:34 AM
shalabh
11/29/2020, 6:23 AM
"In the general case, I suspect your 'encoding grammar' is equivalent in complexity to a general purpose programming language."
I agree.
"I think of this as 'encoding' vs 'intent'."
You've put this really well. I think of this as "meaning", which only exists in our minds, and "representations" and "mechanisms", which exist outside the mind.
"I don't believe you can really get higher-level than 'arranging bits' without confronting this non-physicality."
If you're saying that, irrespective of the encoding, the machines can only hold signifiers, and all mechanisms in the end amount to just transmitting arbitrary symbols (bits are just a sequence of symbols "1", "0"), I must agree. But consider that we spend a lot of effort designing and manually implementing the mapping between multiple vocabularies of such symbols - all of which are entirely in the machine. So there is the possibility of removing this extra work. (BTW, @Konrad Hinsen and I have a discussion on this topic on my blog post.)
"avoiding encoding things that are not intended but merely side effects of the encoding a user is using to convey their intent to a computer"
Nicely put. In some sense these are encoding artifacts. Looking forward to seeing where you take these ideas.
Andrew F
11/29/2020, 8:51 PM
"So there is the possibility of removing this extra work."
Perhaps the closest thing to a disagreement is this: it would be more precise to say that we're only doing the work once, rather than not doing it at all (and we want to let a compiler handle the details of hooking up encodings to business logic). Probably this is already what you were thinking, but I like to be explicit that the essential complexity is still there. I think we should keep in mind that any "solution" to this problem of lots of encodings needs to be imagined in a world that still seethes with conflicting formats and protocols. Even besides the mess of existing formats that have zero chance of going away, different encodings have different useful properties, notably performance under various queries (also space efficiency, error resilience, etc). You wouldn't try to back a relational database with the same encoding you send over a network. I'm looking for a tool to navigate chaos more than tame it (at least in a global standard sense).
Emmanuel Oga
11/30/2020, 12:22 AM
Konrad Hinsen
11/30/2020, 10:47 AM
shalabh
11/30/2020, 6:31 PM
"Perhaps the closest thing to a disagreement is this: it would be more precise to say that we're only doing the work once, rather than not doing it at all (and we want to let a compiler handle the details of hooking up encodings to business logic)."
I think that's fair. There's definitely some work we'll always need to do, and some complexity in the way. BTW there's an old thread which disputes that there's a sharp distinction between incidental and essential complexity. Hopefully this link to the search history works.
Andrew F
11/30/2020, 6:53 PM
shalabh
11/30/2020, 7:14 PM
Konrad Hinsen
11/30/2020, 7:47 PM