this was so refreshing to read: <https://litestrea...
# linking-together
s
👍 1
i
This echoes my philosophy as well - almost no one is building a system that really requires more than one machine. The belief that we are has caused us to add so much incidental cruft that if you use industry best practices, you actually do need multiple machines. Taking an extreme example, the entire write load of Twitter is only ~6k tweets/s, which can easily fit on a single machine (back-of-envelope below). As another, there's a great paper from the HyPer folks that showed you could serve all of the queries to Wikipedia from a single instance. If you use them well, computers are insanely fast. The problem is just that we use them really poorly, and because of the incentives around software, we're more likely to go with our first poor solution to a problem than to take the time to come to a good one. This is how we ended up with things like Hadoop, which is millions of lines of code to run 100+ node clusters that can't beat a laptop.
💯 8
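quick back-of-envelope on that tweet number (the per-tweet size here is a guess, purely to show the order of magnitude):

```python
# Rough back-of-envelope; ~2 KB per tweet (payload + metadata + index overhead)
# is an assumption, not a measured figure.
tweets_per_sec = 6_000
bytes_per_tweet = 2 * 1024

write_rate_mb_s = tweets_per_sec * bytes_per_tweet / 1e6
print(f"~{write_rate_mb_s:.0f} MB/s sustained writes")  # ~12 MB/s

# A single commodity NVMe drive sustains hundreds of MB/s of writes,
# so the raw write volume is a rounding error for one machine.
```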
s
oh boy, this hits ^
yeah there's a lot to be said here… I think the move fast / startup mindset of building “just enough” software quickly / inefficiently until the next level of scale hits (and a new set of engineers can wrangle the PGbouncer) has been the modus operandi
the counter to this thinking is Jonathan Blow (and others like Alan Kay ofc).
Crave Cookie runs on Crystal + sqlite, <$300 a month: https://www.indiehackers.com/podcast/166-sam-eaton-of-crave-cookie
but this requires a slower, artisan attitude. Which is unfortunate…
I’m more on the data science side of things, and the layers of abstraction I keep encountering there are absolutely bonkers.
i
Yeah, the problem is that “fixing” this would require a whole lot of unlearning and re-education, which just isn’t likely to happen
s
TL;DR my take is that systems software engineering is just software engineering turned into risk management
yup
ultimately, new inventions tailored to new use cases are probably the only way these types of things go away. E.g. you can’t build “the next Google” by replicating what Google does, but worse. You have to build the thing that replaces the need for Google.com with a different solution
in many ways, true personal computing inventions could reduce the need for all this systems complexity (over time)
i
My hope is to flip the problem on its head and show that part of the problem is we didn’t actually go high level enough. Languages like python still let you specify too many details and those details are what prevent us from creating compilers that output extremely high performance programs by default. We can’t expect everyone to magically become great systems programmers, but giving them tools where a single integer add takes ~30 instructions is a hell of a handicap to start with.
💯 5
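to make that handicap concrete, here’s what CPython goes through for one add (exact opcodes and counts vary by version, so treat this as illustrative):

```python
import dis

def add(a, b):
    return a + b

# A handful of bytecodes for a single add; each one is dispatched by the
# interpreter loop and pays for type checks, boxing, and refcounting on top
# of the actual addition.
dis.dis(add)
# Typical output (CPython 3.x):
#   LOAD_FAST     a
#   LOAD_FAST     b
#   BINARY_ADD        (BINARY_OP in newer versions)
#   RETURN_VALUE
```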
Chris Lattner’s (LLVM, Swift) thesis actually talks a bit about this and his attempt to try and create a layout aware compiler
👍 1
rather than trying to convince anyone to unlearn stuff though, something like better compilers seems like a much more likely path to success
💯 1
it has to just be the default somehow
y
Well put @ibdknox @Srini K. I think this is a general problem of software development, not just “systems engineering”. Too much focus on trends / hype / frameworks, which causes overcomplicated solutions for even the simplest cases
Thinking out loud: a higher level language would be a solution, but it should probably work across the stack in order to really reduce the complexity of entire applications
r
Fun related anecdote: the original NASDAQ system ran on a single machine (using Java, no less)! https://signalsandthreads.com/multicast-and-the-markets/#2923 It actually made things a lot easier for making sure transactions were correctly synchronized. Then you just had to make sure the network routed things well.
👏 2
s
@Yousef El-Dardiry yeah, I think the key really is having system-level understandability. Even if a language is higher level, if the abstractions it was built on are leaky / shaky, you’re in for trouble. Arguably Ruby / Python are higher level in some specific ways but systems understanding is poorer (hence the need for systems engineering / risk management / server-wrangling)
r
@ibdknox
> My hope is to flip the problem on its head and show that part of the problem is we didn’t actually go high level enough. Languages like python still let you specify too many details and those details are what prevent us from creating compilers that output extremely high performance programs by default.
This reminds me of the Haskell ethos a bit? A language designed to describe pure computation at an extremely high level, then let the compiler figure out efficient ways to run those computations. The language is so high level that it didn't support the very concept of side effects (causing them to invent/discover the monad...) The result is that the GHC compiler truly is a marvel of modern engineering. But (devil's advocate time), it took 30 years of the most advanced CS research to get here, and it still has extremely unpredictable performance gotchas in some places. Maybe it's because I've been jaded by our current world, but I can't see "higher level" as a solution. Mainly because "creating compilers that output extremely high performance programs by default" is actually a really hard problem for the general case. Part of the problem being: there is no single general case. @Srini K makes the point that a higher level language often results in poorer systems understanding, not greater. Or to put it another way, all abstractions are leaky eventually... How do you handle it when the leaks happen?
☝️ 1
i
Haskell focused much more on things that are counter to what would make something understandable and fast, e.g. laziness.
😮 1
There are a handful of patterns we know to have very good performance by default; the question is whether we can create a language that can be compiled to those patterns and whose idioms feel natural to the user, such that you’re not constantly trying to go around them (side effects in Haskell being a good example).
Another approach is to separate the specification from the layout, à la the relational DB world.
💯 1
the logic of a program doesn’t have to bake in access paths, that’s just an artifact of the way our languages evolved. If we didn’t have pointers, arrays, etc, we could have the same kind of logical/physical independence.
💯 1
➕ 1
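a toy sketch of that independence in Python (hypothetical data, just to show the shape of the idea):

```python
from collections import defaultdict

# Some "facts" - the logical program only cares that they exist,
# not how they're laid out. (Made-up example data.)
people = [
    {"name": "ada",   "city": "london"},
    {"name": "grace", "city": "nyc"},
    {"name": "alan",  "city": "london"},
]

# Logical spec: "names of everyone in a given city" - no access path baked in.
def names_in_city(rows, city):
    return sorted(r["name"] for r in rows if r["city"] == city)

# One physical choice is the full scan above; another is to build an index.
# The logical question is unchanged - only the layout / access path moved,
# which is the DB-style logical/physical independence.
by_city = defaultdict(list)
for r in people:
    by_city[r["city"]].append(r["name"])

assert names_in_city(people, "london") == sorted(by_city["london"])
```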
A slightly less ambitious version of something like this is the DOTS effort in Unity, where following a relatively small set of conventions allows them to employ a specialized compiler to make your games much faster than they were on the traditional stack.
r
First, I have to fan boy out a little. I followed Eve for a long time, and I'm a fan of your work. 😄 Thank you. It's an honor to argue on the internet with you 🙂
👋 1
😄 1
ok back on topic: SIMD intrinsics / vectorization is another low level example of this sort of thing. I think it works great when the scope is limited. That's why I'm a fan of the "DSL" approach. Many tiny languages that are all good at different things but have a way to communicate. I don't want a swiss army knife, I want a tool box.
i
SIMD is actually a great example - it is exceptionally difficult to automatically vectorize because the semantics of most languages force your layout to be exactly as you described it.
Imagine for a moment you had a compiler that could make layout choices based on the way data is actually used. E.g. in the case of something like a physics simulation where you’re constantly doing math over a series of objects with x, y, vx, vy, we could choose to store those as x[], y[], vx[], vy[]
now vectorization is trivial
(the reality of something like that is actually even more interesting, to go as fast as you possibly could, you’d actually want to tile them as something like xxxx, yyyy, vxvxvxvx, vyvyvyvy for better cache utilization)
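roughly what that transformation looks like if you do it by hand today - NumPy arrays here are just a stand-in for “contiguous storage the hardware likes”, and the field names mirror the example above:

```python
import numpy as np

n = 10_000
dt = 0.016

# "Array of structs": one object per particle, fields interleaved in memory.
# Every update chases scattered fields, which resists vectorization.
particles = [{"x": 0.0, "y": 0.0, "vx": 1.0, "vy": 2.0} for _ in range(n)]
for p in particles:
    p["x"] += p["vx"] * dt
    p["y"] += p["vy"] * dt

# "Struct of arrays": x[], y[], vx[], vy[] stored contiguously.
# The same update becomes a couple of dense array ops that map directly
# onto SIMD-friendly loops.
x, y = np.zeros(n), np.zeros(n)
vx, vy = np.full(n, 1.0), np.full(n, 2.0)
x += vx * dt
y += vy * dt
```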
doing that by hand everywhere would be nuts - it’s so much incidental bookkeeping that no one in their right mind would do that transformation across their whole system, and our compilers have their hands tied because layout is explicitly part of the semantics of systems languages.
ideally you’d want something where you can mostly leave layout to the compiler, but provide a separate explicit specification of it if you really need to. (e.g. where layout has semantic meaning, like a tcp packet)
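fwiw Python’s struct module is a tiny existing example of that “explicit spec only where layout is semantic” idea - a hand-laid-out 20-byte TCP-style header (field values made up):

```python
import struct

# Explicit, semantic layout: a 20-byte TCP-style header in network byte order.
# This is exactly the kind of place where the layout *is* the meaning, so you
# want to spell it out rather than leave it to a compiler's discretion.
header = struct.pack(
    "!HHIIBBHHH",
    12345,    # source port        (16 bits)
    80,       # destination port   (16 bits)
    1,        # sequence number    (32 bits)
    0,        # acknowledgment     (32 bits)
    5 << 4,   # data offset (5 words) + reserved bits
    0x02,     # flags (SYN)
    65535,    # window size
    0,        # checksum (left as zero here)
    0,        # urgent pointer
)
assert len(header) == 20
```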
r
I see where you are heading. Something similar already happened for register allocation when we moved up from assembly language. You don't lay out your own registers anymore. The compiler has fancy graph coloring algorithms that do it for you.
💯 2
i
fwiw, I think the DSL approach is totally legit too and I would argue that there’s a pretty thin line between DSL and well architected library. The main issue you face going down the stable of languages route is that eventually you want to take something from your nice sound DSL and use it in your robot movement DSL - then what? How do you translate semantics across different languages? And can people internalize all the little differences between the two? Something like LISP is pretty close to that and the conventional wisdom quickly became “don’t use macros (the DSL creation feature) unless you absolutely have to.”
Computational performance is such a fascinating topic to me and it took me a long time to realize that it’s actually much simpler in concept than I thought. To go fast, you just can’t do much stuff. All of those super fancy structures we’d read about in research papers? They were almost always slower because in the end the only cleverness that matters is the bit where the total amount of work you do is less. You were often saving 10% here and hiding an extra 20% over there without realizing it. What makes the problem so difficult is that we hide a bunch of the work under abstractions that go all the way down to the computer arch. Realistically you can never account for all of that, but you’d do better than 99% of the code that’s out there if you had something that could give you better layouts.
so much of modern performance is bound to data movement
💯 2
r
That's a totally fair criticism against the DSL / zoo of languages approach. I don't have a good answer there either, just a personal preference 🙂 I will point out that the "smart compiler" approach has limits as well. This article is a bit specialized, but shows a practical example of where automated register allocation breaks down. Hardware engineers give preference to different registers for certain operations. It's hard to encode that into a modern compiler in an automatic way, though both Intel and ARM spend a lot of time and money attempting to do so. They often get "good enough" but it's never "optimal". Which is fine, but the problem is, because they are heuristic based, there is usually some input program that produces a terrible layout that completely destroys the performance, and then it's hard to fix without breaking other things. Computational performance is definitely a fascinating problem, with multiple approaches. 😄
i
Yeah, I think with something like layout it’s important that you can still manually specify it if you want. The problem with most automated approaches is that you can’t see into it or adjust it slightly without resorting to weird tricks like reordering completely unrelated things.
💯 2
This gets back to one of the biggest mental shifts we had after Eve which was that rather than focusing on simple, we should focus on “simple to understand”
❤️ 2
s
> Computational performance is such a fascinating topic to me and it took me a long time to realize that it’s actually much simpler in concept than I thought. To go fast, you just can’t do much stuff. All of those super fancy structures we’d read about in research papers? They were almost always slower because in the end the only cleverness that matters is the bit where the total amount of work you do is less. You were often saving 10% here and hiding an extra 20% over there without realizing it.
This is a profound insight. I remember Jonathan Blow talking about just this in an older talk he gave:

https://youtu.be/JjDsP5n2kSM?t=556

TL;DR lists are often fine! Start there, optimize only later / avoid premature optimization. All data structures are fundamentally about optimization. Fun story of him looking at the DOOM source code when it came out and getting chewed out by John Romero for saying asset loading was done sub-optimally (back when he wasn’t enlightened!)
f
This is a really interesting topic. How does, say, Python fare here? It's the staple ingredient in most popular data science / machine learning oriented stacks. Most heavy computation is done indirectly via libraries which in turn call out to tightly optimized C++ or GPU code. On the plus side, that gives you the flexibility of a high level language to declare your computational flow in, with the heavy work done in optimized land (rough sketch of that pattern below). Of course, this also makes it a lot of "fun" if you want to customize / inject yourself into some part of the computation, since you need to drop down to Cython or so. On a secondary note, what do you think of Blow's Jai language? Here's some information on Jai for the unfamiliar - basically, C++ "but better", intended for games but quite general.
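The pattern I mean, in miniature (a trivial sketch; real workloads hand the inner loop off to BLAS / GPU kernels in exactly the same way):

```python
import numpy as np

n = 100_000
a = np.random.rand(n)
b = np.random.rand(n)

# Pure-Python expression of the computation: every iteration pays interpreter
# dispatch and boxing costs.
slow = sum(x * y for x, y in zip(a, b))

# The same computation handed to an optimized native kernel in one call -
# Python's flexibility on the outside, the tight loop in compiled code.
fast = float(a @ b)

assert np.isclose(slow, fast)
```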
w
"almost no one is building a system that really requires more than one machine" — have hundreds of billions of dollars of transactions to prove it. Guess I'll retire after overseeing $1T. We've always used one server — running Rails! 🐺 Except for that one time when had a calculation that needed parallelizing over a few hundred EC2 instances. We used Redis.