# linking-together
j
An essay on the challenges of image based/live systems: https://www.scattered-thoughts.net/writing/there-are-no-strings-on-me/
k
This is outstanding, @jamii!
j
Ah yeah, I would’ve tagged had I remembered he joined here.
k
Nice! This reminds me of another post I saw yesterday, on what you need to keep in mind when doing interactive development in Lisp. It's quite a bit, but also because Common Lisp is a rather complex language. Wondering:
A saner option is to recompile and reload the entire codebase whenever a change is made, while preserving the state of the heap
Are there any real-life implementations of that idea?
j
Are there any real-life implementations of that idea?
Erlang, sort of. It's designed for upgrading production services, so it has a lot more toggles and manual controls than you'd want for interactive editing, and there's no undo if you screw up the migration. But it does work fairly well, certainly compared to everything else. I also see a lot of game developers do this in C, and it seems pretty useful even with the complete lack of type safety.
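For a feel of the "reload everything, keep the heap" idea, here is a toy Python sketch. It assumes the whole codebase is one source string and all long-lived state sits in a single dict. The names `load_codebase`, `SOURCE_V1` and `SOURCE_V2` are made up for illustration, and none of this reflects how Erlang or the C game-dev setups actually do it.

```python
# Toy model: all code lives in one source string, all state in one dict.
# load_codebase, SOURCE_V1 and SOURCE_V2 are hypothetical names.

SOURCE_V1 = """
def step(state):
    state["counter"] += 1
"""

SOURCE_V2 = """
def step(state):
    state["counter"] += 10
"""

def load_codebase(source):
    """Compile and run the entire codebase into a fresh namespace."""
    namespace = {}
    exec(compile(source, "<codebase>", "exec"), namespace)
    return namespace

state = {"counter": 0}            # the "heap" that survives reloads

code = load_codebase(SOURCE_V1)
code["step"](state)               # runs the v1 definition

code = load_codebase(SOURCE_V2)   # every definition is replaced at once
code["step"](state)               # runs the v2 definition on the same state
```

Because nothing holds on to the old namespace, there are no stale definitions left behind. The hard part this sketch skips is state whose shape no longer matches what the new code expects.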
Also tagging me did nothing, someone had to manually tell me about this thread. Is sending notification emails a paid feature on slack?
j
I know I’ve only ever seen notifications inside slack, or on my iPhone screen when I gave slack that permission. So it’s not a default that it does it.
j
I have set the preference though. It just doesn't seem to work. Makes slack much less useful for occasionally visited communities.
k
It's definitely a bug.
s
Excellent essay! I believe the issues described with live systems are not actually solved with dead systems, but just deferred until later. That’s because dead systems don’t often represent entire systems but only a slice of the system. Consider a system with a Rust (or another dead lang) web app backed by another web service and a db. One problem is evolving the web app code while preserving the db data and migrating the db schema. This is similar to evolving live code while the heap remains intact. Another problem is that upgrading the downstream service may change or delete an API method and leave it incompatible with the caller web app. This is similar to upgrading a subset of functions within the same live program. Dead-lang approaches don’t solve any of these problems of “external couplings” and “externalized data” but simply exclude them from their scope, leaving us to wrangle these with other approaches. Live systems seem less tractable because these problems are harder, and more worthwhile (my position).
Another note on this:
For a start, you can actually find the code as a single artifact rather than it being the product of a log of mutations.
Some of this gets lost even in dead-langs when you’re dealing with macros, metaprogramming or subclassing at some level of complexity. Not only do you have to simulate the runtime semantics in your head, you also have to simulate the type checker and compiler semantics.
Reading the essay, another idea that clicked for me was having a “version-aware runtime” - ie the system should be aware of the different versions of various artifacts (code, data, etc.). No dead-lang does this - there is only one version of all types.
j
Mentioning web apps and databases makes me think about how there are playbooks for migrations there—you (if you’re being careful) use monitoring, let both versions coexist, try to find callers using the old versions and encourage them to switch, etc. I wonder what the playbook looks like in a programming environment.
It’s possible to do monitoring in your runtime.
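As a rough illustration of in-runtime monitoring, the Python sketch below wraps an old version of a function so that every call records who made it. The decorator name `deprecated`, the counter `old_version_callers`, and the `area_v1`/`area_v2` functions are all hypothetical. It only looks one frame up the stack, which is enough to find direct callers.

```python
import inspect
from collections import Counter

# Hypothetical registry: caller name -> number of calls into old versions.
old_version_callers = Counter()

def deprecated(fn):
    """Wrap an old version so each call records which function called it."""
    def wrapper(*args, **kwargs):
        caller = inspect.stack()[1].function   # the frame that called wrapper
        old_version_callers[caller] += 1
        return fn(*args, **kwargs)
    return wrapper

@deprecated
def area_v1(w, h):
    return w * h

def area_v2(w, h, scale=1):
    return w * h * scale

def legacy_report():
    return area_v1(2, 3)      # still on the old version

def new_report():
    return area_v2(2, 3)      # already switched

legacy_report()
legacy_report()
new_report()
```

After this runs, `old_version_callers` names the callers that still need to switch, which is the same information the web-service playbook gets from request logs.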
k
@shalabh
I believe the issues described with live systems are not actually solved with dead systems, but just deferred until later. That’s because dead systems don’t often represent entire systems but only a slice of the system... upgrading [a] downstream service may change or delete an API method and leave it incompatible with the caller.
This depends on the context. If you think of the upstream and downstream as within a common ownership boundary, then such checks are valuable. However, if they span ownership boundaries then these checks can seem onerous. You're right that dead systems work almost accidentally for the second case, by just not doing some work. Ideally we'd have both tools in our pocket, and the flexibility to select from them depending on the situation. However, this is a lot of complexity, and all code carries costs. So worse may be better here. This situation is analogous to structured editing like in @alltom's recent submission https://futureofcoding.slack.com/archives/C5T9GPWFL/p1700405882617679. It's easy for a structured editor to feel like an overbearing presence. The implementor has to juggle both technical complexity and UX nuance. Meanwhile plain text is often "good enough" and so we continue to muddle along..
s
It’s possible to do monitoring in your runtime.
Yeah, the key requirement is version-awareness. Whether the endpoint name encodes the version or some introspection API returns the version of the service, any strategy needs a representation of the version of functions etc. within the system itself, and should allow multiple versions to coexist. An interesting implementation here is GemStone Smalltalk: it is a live system where you can evolve the schema by providing migration methods, and the objects can be migrated to newer definitions either opportunistically or in batch.
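To make "opportunistically or in batch" concrete, here is a minimal Python sketch. It assumes every record carries an explicit `version` field and that each migration upgrades a record by exactly one version. `MIGRATIONS`, `Store`, `get` and `migrate_all` are invented for illustration and are not GemStone's API.

```python
# Hypothetical sketch: records carry a schema version, migrations go one step up.
CURRENT_VERSION = 2

def split_name(record):
    """v1 -> v2: replace the single 'name' field with 'first' and 'last'."""
    first, last = record["name"].split(" ", 1)
    return {"version": 2, "first": first, "last": last}

MIGRATIONS = {1: split_name}   # from-version -> migration function

def migrate(record):
    """Apply migrations until the record reaches the current version."""
    while record["version"] < CURRENT_VERSION:
        record = MIGRATIONS[record["version"]](record)
    return record

class Store:
    def __init__(self, records):
        self.records = records            # key -> record, versions may be mixed

    def get(self, key):
        # Opportunistic: upgrade a record the first time it is touched.
        self.records[key] = migrate(self.records[key])
        return self.records[key]

    def migrate_all(self):
        # Batch: upgrade everything in one pass.
        for key in list(self.records):
            self.records[key] = migrate(self.records[key])

store = Store({
    "a": {"version": 1, "name": "Ada Lovelace"},
    "b": {"version": 1, "name": "Alan Turing"},
})

ada = store.get("a")                          # only "a" is upgraded here
b_version_before_batch = store.records["b"]["version"]
store.migrate_all()                           # now "b" is upgraded too
```

Between the `get` and the `migrate_all`, old and new versions coexist in the same store, which only works because each record says which version it is.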
@Kartik Agaram
Ideally we’d have both tools in our pocket, and the flexibility to select from them depending on the situation.
💯 I’d even add that ideally these would not be two separate tools but a single model that can be applied in varying use cases - from “early bound, validated” bundles of code to “late bound, but also validated” and in between.
However, this is a lot of complexity, and all code carries costs. So worse may be better here.
Would this be more complexity than present day type checkers? 😜
k
I absolutely think so! You're adding a whole new dimension to the problem. What are you taking out?
s
Ah, good point. If we keep all the existing static lang ideas intact and then layer on dynamic bindings, yes, it’s a whole other dimension. However, if we first reduce the language features (e.g. 2nd class functions only, like the essay suggests) and design the types and semantics of the language to be amenable to both the static and dynamic bindings we want, then maybe we have a chance.
The problem as I see it is that dynamic+live langs have typically not been designed to make validation or optimization easy, and static langs have not been designed to make late binding easy. I don’t think we want the total freedom of dynamic+live langs. We want some snapshotting of state, and we want visualization and provenance tracking of all dynamically generated code. Once we start adding these we can introduce ways to validate and optimize the slices of the live system that have stabilized.
k
Throwing a somewhat related paper into the discussion: "World Age in Julia" http://janvitek.org/pubs/oopsla20-j.pdf More about metaprogramming than live modification, but the issues are similar. Abstract:
Dynamic programming languages face semantic and performance challenges in the presence of features, such as eval, that can inject new code into a running program. The Julia programming language introduces the novel concept of world age to insulate optimized code from one of the most disruptive side-effects of eval: changes to the definition of an existing function. This paper provides the first formal semantics of world age in a core calculus named Juliette, and shows how world age enables compiler optimizations, such as inlining, in the presence of eval. While Julia also provides programmers with the means to bypass world age, we found that this mechanism is not used extensively: a static analysis of over 4,000 registered Julia packages shows that only 4–9% of packages bypass world age. This suggests that Julia’s semantics aligns with programmer expectations.
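A toy model of the world-age idea, in Python rather than Julia, might look like the sketch below. It is not how Julia implements it: the counter `world`, the table `methods`, and the helpers `define`, `lookup`, `run_toplevel` and `invoke_latest` are all assumptions made for illustration, with `invoke_latest` loosely modelled on Julia's `Base.invokelatest`.

```python
# Toy model of world age; every name here is hypothetical.
world = 0        # global counter, bumped on every definition
methods = {}     # name -> list of (birth_world, function), oldest first

def define(name, fn):
    """Add a definition and advance the world counter."""
    global world
    world += 1
    methods.setdefault(name, []).append((world, fn))

def lookup(name, age):
    """Newest definition of `name` that already existed at world `age`."""
    visible = [fn for birth, fn in methods.get(name, []) if birth <= age]
    if not visible:
        raise LookupError(f"no method {name} visible at world age {age}")
    return visible[-1]

def run_toplevel(name, *args):
    """A top-level call freezes the current world for its whole execution."""
    age = world
    return lookup(name, age)(age, *args)

def invoke_latest(name, *args):
    """Escape hatch: look the callee up in the newest world instead."""
    return lookup(name, world)(world, *args)

define("greet", lambda age: "hello")

def main(age):
    before = lookup("greet", age)(age)
    define("greet", lambda age: "goodbye")   # like eval redefining greet mid-run
    frozen = lookup("greet", age)(age)       # still resolves to the old definition
    latest = invoke_latest("greet")          # explicitly opts into the new one
    return before, frozen, latest

define("main", main)
first_run = run_toplevel("main")     # started before the redefinition
second_run = run_toplevel("greet")   # a later top-level call sees the new world
```

The point of the frozen lookup is that code already running can be optimized against the definitions it started with, while the next top-level call picks up the change.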
j
Julia's behaviour around closures is among the nicest I've seen:
julia> foo(x, y) = x + y
foo (generic function with 1 method)

julia> bar = x -> foo(x, 1)
#3 (generic function with 1 method)

julia> bar(1)
2

julia> @code_llvm bar(1)
;  @ REPL[1]:1 within `#3`
define i64 @"julia_#3_122"(i64 signext %0) #0 {
top:
; ┌ @ REPL[7]:1 within `foo`
; │┌ @ int.jl:87 within `+`
    %1 = add i64 %0, 1
; └└
  ret i64 %1
}

julia> Base.delete_method(@which foo(1,1))

julia> bar(1)
ERROR: MethodError: no method matching foo(::Int64, ::Int64)
Stacktrace:
 [1] (::var"#3#4")(x::Int64)
   @ Main ./REPL[1]:1
 [2] top-level scope
   @ REPL[6]:1

julia> @code_llvm bar(1)
;  @ REPL[1]:1 within `#3`
; Function Attrs: noreturn
define void @"julia_#3_133"(i64 signext %0) #0 {
top:
  %1 = alloca [2 x {}*], align 8
  %gcframe2 = alloca [3 x {}*], align 16
  %gcframe2.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 0
  %.sub = getelementptr inbounds [2 x {}*], [2 x {}*]* %1, i64 0, i64 0
  %2 = bitcast [3 x {}*]* %gcframe2 to i8*
  call void @llvm.memset.p0i8.i32(i8* noundef nonnull align 16 dereferenceable(24) %2, i8 0, i32 24, i1 false)
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #6
  %ppgcstack_i8 = getelementptr i8, i8* %thread_ptr, i64 -8
  %ppgcstack = bitcast i8* %ppgcstack_i8 to {}****
  %pgcstack = load {}***, {}**** %ppgcstack, align 8
  %3 = bitcast [3 x {}*]* %gcframe2 to i64*
  store i64 4, i64* %3, align 16
  %4 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 1
  %5 = bitcast {}** %4 to {}***
  %6 = load {}**, {}*** %pgcstack, align 8
  store {}** %6, {}*** %5, align 8
  %7 = bitcast {}*** %pgcstack to {}***
  store {}** %gcframe2.sub, {}*** %7, align 8
  %8 = call nonnull {}* @ijl_box_int64(i64 signext %0)
  %9 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 2
  store {}* %8, {}** %9, align 16
  store {}* %8, {}** %.sub, align 8
  %10 = getelementptr inbounds [2 x {}*], [2 x {}*]* %1, i64 0, i64 1
  store {}* inttoptr (i64 140053833609312 to {}*), {}** %10, align 8
  %11 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140053835183152 to {}*), {}** nonnull %.sub, i32 2)
  call void @llvm.trap()
  unreachable
}
But it doesn't allow redefining types - you have to delete and recreate a module, which means that functions in the old module don't get redefined. And then you're back to having old closures hanging around.
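The same "recreate the module, keep the old closures" trap is easy to reproduce in Python, so here is a rough analogue (not Julia, and not a claim about Julia's internals). The module name `shapes`, the `Point` type, `describe`, and the helpers `make_module` and `make_handler` are all invented for the example.

```python
import types

def make_module(source):
    """Build a fresh module object from source, like recreating a module."""
    mod = types.ModuleType("shapes")
    exec(source, mod.__dict__)
    return mod

V1 = """
class Point:
    def __init__(self, x):
        self.x = x

def describe(p):
    return "v1 point at %d" % p.x
"""

V2 = """
class Point:
    def __init__(self, x, y=0):
        self.x, self.y = x, y

def describe(p):
    return "v2 point at (%d, %d)" % (p.x, p.y)
"""

def make_handler(mod):
    describe = mod.describe          # captured once, at closure creation
    return lambda p: describe(p)

shapes = make_module(V1)
old_point = shapes.Point(1)
handler = make_handler(shapes)

shapes = make_module(V2)             # "delete and recreate the module"

stale_output = handler(old_point)                  # old closure runs v1 code
same_type = isinstance(old_point, shapes.Point)    # old data, new type
try:
    shapes.describe(old_point)                     # new code, old data
    new_code_on_old_data = "ok"
except AttributeError:
    new_code_on_old_data = "AttributeError"
```

The old closure keeps working against the old type, the old instance is not an instance of the new `Point`, and the new `describe` fails on it because the `y` field was never set.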
k
Yes, types are the hard part. They are not part of data and code, but live in the control layer that structures data and code. I doubt there can be a generic technique to update types in a live system in all circumstances. But perhaps there are ways to handle specific practically important cases.
j
I agree with a lot of what @jamii says. I’d also amplify some of the above comments: versioning is the key unsolved problem of image-based/live programming