# present-company
x
I'm currently learning about WASM+WASI, and I was wondering: what's the big difference with the JVM, aside from the lack of a GC (which arguably is a pretty big deal)? Why couldn't we decide to adapt the JVM to the needs of WASM, and instead went ahead and created a whole new byte code language? I've searched online, but most answers are unsatisfactory, e.g. JVM sandboxing isn't secure enough (if so, can't we improve it?), JVM depends too much on Oracle (I thought OpenJDK had solved that problem), and so on. Curious to know if someone here has an interesting explanation or a good article on the subject 🤔
k
Most arguments I have seen for WASM are about being low-level, language-neutral, high-performance, etc. in comparison to existing "managed" languages. But I guess there's also a dose of neophilia. The JVM is 25 years old. It can't be the solution to today's problems, because otherwise tons of highly paid engineers would have to have been blind!
x
Yeah that was the feeling I got from a lot of stuff I read: people are excited about it because it's the New Thing!! Which doesn't mean it's not better, but the tone of the discourse makes it harder to find a more "objective" comparison with older technologies like the JVM.
I guess it's probably more fun to write a new standard than to try and "retrofit" an existing bytecode into something else
a
If you're aiming for security, it's a lot easier to build it in from the start than try to retrofit it. I'm guessing one does not simply improve the JVM's security.
j
Also, the realization that WASM is built for a new platform is important. Tying it to the JVM would mean that significant, early changes to WASM required either quarreling with those who maintain the JVM spec (a waste of time) or forking from it (and thus creating a separate WASM again, but likely worse)
👍 2
There are lots of great initiatives we're seeing in WASM, like better JavaScript interoperability and the integration of a type system, that have the potential to enable smoother high-speed software development for the web, and to ensure interoperability between languages that all share a target not tightly coupled to a CPU architecture. Lots of interesting research that wouldn't be possible by trying to lobby a legacy system. No GC is also huge. I don't think many people are saying the JVM is bad though : )
n
Wasm grew naturally out of the use cases of asm.js. The fact that all the browser vendors saw value in asm.js and optimized their JS engines to support it was a big deal. I couldn't imagine the big 3 starting with a blank slate and agreeing on anything.
k
@Jacob Chvatal Very interesting allusions to the pragmatics of standards in your comments: 1. If you're going to fork, it can be more memorable to make something intended for a very different situation look different. "It's the same except this one little change" falls into an uncanny valley that can interfere with adoption. 2. If the goal is a standard people agree to not fork, starting from a fork perhaps sets the wrong example. Very interesting. I'm not sure I agree (why does a new standard not feel like a fork?) but you definitely got new neurons firing in my brain.
🍰 2
j
I think when a standard is established as a fork of another, especially for a compiler target, there are advantages to making the fork strictly a superset of the original ecosystem, so that tooling is easy to build (just fork the tools made for the original standard!), and because of this it's much more difficult to justify making a breaking change to the original spec than to justify adding new features. Starting from first principles, however, is a wild west: because every possible specification that fits the desired criteria will likely have a comparable time cost for implementing tooling, there are no deterrents to building precisely what's necessary, and no excuse is required for breaking compatibility. As you mentioned, the marketing story is relevant, and unfortunately is one of the most important aspects of the decision. Example: I didn't touch JVM languages for years because I hated the clunky verbosity of Java, but Clojure's now the language I use the most!
💯 2
j
It’s still early days but one of the many future Wasm proposals (https://github.com/WebAssembly/proposals) is for a GC (https://github.com/WebAssembly/gc/blob/master/proposals/gc/Overview.md) to efficiently support various languages that expect it (like those on the JVM).
j
You have correctly ascertained that the WASM approach (shipping byte code to run in a client-side sandbox) is very, very similar to 1990s Java applets (shipping byte code to run in a client-side sandbox) — which I might mention was also the Inferno model. Some might further note that the history of the JS runtime has been: interpreted text -> compiled byte code -> JITing the compiled byte code -> shipping the bytecode to be JITed, and how this represents gradual convergence with the design and history of the JVM. It's almost as if the constraints of the design space dictate the correct solution ("always has been" meme goes brrr). But if that's true, haven't we essentially seen billions wasted on parallel efforts for entirely social reasons? I'll leave this here for your consideration: https://ideolalia.com/essays/standing-in-the-shadow-of-giants.html
🤯 1
❤️ 6
w
Inferno! 🥂
k
That's a great link. But social reasons don't seem less real than technical ones. I suspect all reasons are social, if you probe deep enough.
n
One reason the Wasm folks couldn't "just use the JVM", is that JVM bytecode contains specific instructions to do OOP things like instantiating objects and invoking methods, and so they'd be committing to putting OOP on a pedestal for another 20 years. I'm very glad they didn't.
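To make this concrete, here's a sketch (class and method names are my own invention) of how even a trivial Java program compiles down to OOP-specific JVM instructions; the comments approximate what `javap -c` would show, though exact constant-pool indices vary by compiler:

```java
// A minimal program whose bytecode is unavoidably object-oriented.
public class BytecodeDemo {
    static class Greeter {
        String greet() { return "hello"; }
    }
    public static void main(String[] args) {
        Greeter g = new Greeter();     // bytecode: new, dup, invokespecial <init>
        System.out.println(g.greet()); // bytecode: getstatic, invokevirtual
    }
}
```

There is no way to express this allocation or call without `new` and `invoke*`; the instruction set itself assumes classes, objects, and virtual dispatch.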
j
@Nick Smith You can tell you're correct because there are absolutely no functional programming languages on the JVM. 🙄 I will also go ahead and make a pinky-swear bet with you that Wasm will add equivalent instructions within ten years to support the representation and manipulation of JS classes and objects.
👍 1
🤣 1
n
@Jack Rusher I didn't mean to suggest that non-OOP languages can't be compiled to Java bytecode because it contains specific instructions for OOP operations. I meant to assert that a bytecode format that will be the basis for "the future of the web" should be a minimalist one, that maps to actual CPU instructions, not OOP idiosyncrasies.
👍 2
The Wasm committee is in the process of adding "reference types", a memory management (GC) system, and "interface types" (FFI) to the specification. They don't need any more than that to support/integrate with OOP languages, as far as I can see.
a
@Xavier Lambein you could probably improve JVM sandboxing, but providing a high performance sandbox without imposing a specific memory/object model is basically the whole problem. The JVM isn't designed for that, so using it as the foundation for wasm would just add a lot of incidental complexity.
🤔 1
❤️ 1
x
Hmm 🤔 I don't know enough about the implementation details of the JVM, but your point is that this is easier to plan for when designing something from the ground up, rather than to retrofit into the JVM?
a
yes, basically. it's hard to make something more general, when the original designers were clever people who took full advantage of the narrower specification they had
there are a lot of assumptions built into it which would just be stumbling blocks for someone trying to compile C++, or even just C#.
some examples: you can't create a composite type (like a struct) on the JVM without allocating it to the heap. the jvm has an inheritance model but you can't use it to compile c++ because c++ has multiple inheritance.
you can probably hack your way around these issues, but then you aren't deriving any value from the jvm starting point.
and the jvm doesn't provide you with good tools to hack features like this together because it's not designed for stuff like pointer arithmetic. it doesn't want to tell you what the address of anything is, or guarantee where things will be allocated.
the design of wasm is really more like someone fixed llvm to support sandboxing, rather than the jvm
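To illustrate the struct point above, here's a minimal Java sketch (assuming standard pre-Valhalla semantics; the JIT's escape analysis can sometimes elide an allocation, but the source language gives you no way to ask for stack allocation or to learn an object's address):

```java
// Even the closest thing Java has to a C struct is a heap-allocated object.
public class StructDemo {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }
    public static void main(String[] args) {
        Point p = new Point(3.0, 4.0); // `new` always means a heap allocation
        // No pointer arithmetic: the JVM never exposes where p lives.
        double len = Math.sqrt(p.x * p.x + p.y * p.y);
        System.out.println(len); // prints 5.0
    }
}
```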
k
The JVM supports native multi-threading, which is great if you want to improve performance by taking advantage of all of your cores.
x
you can't create a composite type (like a struct) on the JVM without allocating it to the heap
Yuck 😬
(hmmm idk how to quote on slack apparently 😅)
But yeah OK the comparison with llvm is interesting, thanks!
👍 1
j
Compiling unmodified C, C++, and Fortran code to Java byte code with gcc in 2004: http://www.megacz.com/berkeley/research/papers/nestedvm.ivme04.pdf (extra credit to them for writing the paper using a version of TeX thus compiled) The JVM has not been a static target over the last 25-ish years. Just in the last couple of years they've added records, sealed types, and experimental support for proper tail calls. On the social side, I knew people at Sun and Netscape back in the day, and that they didn't collaborate is not surprising given (mumble, mumble). Anyway, despite understanding the technical decisions made along the way by both projects, it really does look to me like yet another example of Freud's _Narzissmus der kleinen Differenzen_ in tech.
😮 1
🤔 1
a
@Jack Rusher yes, to be fair my explanation kind of glosses over the complex history. It's my very subjective opinion that the JVM is not the best foundation for something like wasm in 2021, but I'm not sure I would have said the same thing in the 2000s.
but i'm sure there are people in this community who would question the foundations of the internet much more deeply even than that. certainly any big fans of Alan Kay 😛
j
@Andrew Martin To be clear, I'm not suggesting that it would make any sense to jettison the JS engines that exist today and integrate OpenJDK into all popular browsers. My statements are relative to the tremendous quantity of duplicated effort over the last 30 years. I could continue this rant into the space of Ruby (MJIT, YJIT), Python, Perl/Parrot, LuaJIT, GNU lightning, emacs using GCC to compile bytecode to native, the new JIT written for Guile, and on and on.
👍 4
a
at a glance it's not obvious to me that much of this duplicated effort could be avoided or is a bad thing, so I'd be very interested to hear your rant in full sometime 👍
k
A critical retrospective review of those efforts would probably be interesting. Did the different efforts learn from each other? Were the reasons they cited for starting from scratch justified, with hindsight? The fundamental question is how to best sample the search space for solutions of specific technical problems. I doubt we know much about that.
👍 2
j
I think generalized lessons about language/ecosystem design happen far too slowly, and aren't explicit enough (there's a lot of things people who have lots of experience working on PLs know that new PL designers might not know). One piece I like is Graydon Hoare's list of "features" Rust avoided: https://graydon2.dreamwidth.org/218040.html, though I know not every one of them would provoke universal agreement.
Thinking about Java, I think it's unlikely a 2021 language would:
• have a mutable date class
• synchronize objects method-by-method the way Java's Vector, Hashtable, or StringBuffer do
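A quick sketch of the first wart: because java.util.Date is mutable, handing out a reference to one hands out write access too:

```java
import java.util.Date;

// Aliasing a mutable Date lets any holder of the reference change it.
public class MutableDateDemo {
    public static void main(String[] args) {
        Date epoch = new Date(0L);           // 1970-01-01T00:00:00Z
        Date alias = epoch;                  // same object, not a copy
        alias.setTime(86_400_000L);          // mutation is visible via epoch too
        System.out.println(epoch.getTime()); // prints 86400000, not 0
    }
}
```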
❤️ 2
Java's pervasive use of heap-allocated objects is the kind of decision you might repeat today, but you'd hopefully be aware of what the performance cost would look like (Java is working on fixing that wart).
k
That's a nice example! I wouldn't expect everyone to agree on any such list, of course, but such lists are still useful for learning lessons from the past. However, they would probably need to be more verbose, with references to the real-life problems that motivated each point. As for your comparison between "Java" and a "2021 language", I'd love to see people think about the next step: how to correct design decisions that turned out to be bad without starting a new ecosystem from scratch? Could e.g. Java be improved while (1) not breaking existing code and (2) permitting developers to gradually port a codebase to better foundations? Ideally (3) without adding new layers of complexity? The only example I know of such a deliberate attempt at a backwards-compatible redesign is Fortran 90 and its follow-ups, which basically achieved these goals, with some compromises on (3).
j
Java is removing a few things, but it's slow going, and falls far short of a net reduction in complexity. It also hasn't done so without ever breaking existing code.
• Date was superseded in Java 8 by a new date/time library that uses immutable objects. I suspect Date will not be removed within the coming decade, however. Hashtable and Vector have been superseded by other classes for over a decade, but are still in the standard library.
• Wrapper types for primitives may be removed in the next few years. This requires removing some existing functionality, and is ongoing. It will most likely break some existing code, and that's been deemed acceptable. It does allow gradual evolution (prior versions support the idioms you need for future ones; old idioms have been deprecated before removal). (https://github.com/openjdk/valhalla-docs/blob/main/site/design-notes/state-of-valhalla/02-object-model.md)
Of course, I'm not sure all those decisions are optimal, but it's an example of a widely used ecosystem trying to square the circle.
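For contrast, a small sketch of the immutable java.time API that superseded Date in Java 8: every "modification" returns a new value, so aliasing is harmless:

```java
import java.time.Instant;

// Instant is immutable: plusSeconds returns a fresh object,
// leaving the original untouched.
public class ImmutableTimeDemo {
    public static void main(String[] args) {
        Instant epoch = Instant.EPOCH;
        Instant later = epoch.plusSeconds(60);
        System.out.println(epoch); // 1970-01-01T00:00:00Z (unchanged)
        System.out.println(later); // 1970-01-01T00:01:00Z
    }
}
```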
k
Thanks for those details, that looks like an interesting case indeed. Of course Java has the advantage of having a form of backwards-compatibility at the VM level. I guess one could compile old code with an old compiler and new code with a later one, perhaps at the cost of having to mess around when putting everything together.
k
Hashtable and Vector have been superseded by other classes for over a decade, but still are in the standard library.
Actually I think they were superseded more than two decades ago. Also, various IO libraries and two UI libraries: AWT and Swing. But I think you could argue that it's a feature of Java that you can still run code written a quarter of a century ago. If you're writing new code, just ignore the deprecated libraries.
Could e.g. Java be improved while (1) not breaking existing code and (2) permitting developers to gradually port a codebase to better foundations?
This is what FOAM does by adding a high-level modelling layer on top of Java (and JS and Swift). I feel that the design differences between most programming languages are the equivalent of shovel designers arguing over things like handle lengths and scoop geometries, when the real gains in efficiency would come from replacing all of your shovels with robot backhoes instead. For the typical client/server database-y app, developers are writing 50X too much code, and that's the real issue, not whether they're writing 48X too much or 52X too much. See:

https://www.youtube.com/watch?v=S4LbUv5FsGQ

w
For more than two decades, Map has been preferred over Hashtable and List over Vector. Can confirm, having not written more than a few lines of Java in two decades.
k
@Kevin Greer Once you work on larger software systems, a programming language is always only one ingredient among many. And as you say, the technical differences are often minor compared to the big picture. But for some reason, it's languages that all the public debate focuses on. Maybe one reason is that every project and every developer starts at a small scale, with languages taking center stage.
k
@Konrad Hinsen @Kevin Greer Totally agreed with both. Thinking aloud as I try to take in the story arc of this thread: just like we have blog posts of the form "language X for language Y programmers", I wonder if there's a market for blog posts of the form, "VM X for people familiar with the spec for VM Y". Doesn't roll off the tongue quite so well; the market is definitely much smaller.