# thinking-together
w
@Alan Johnson that intuitively makes sense but in practice this doesn't seem to happen. While managed languages don't have to worry about issues like aliasing in C++, so in theory they could be faster, in practice they never get that far. The level to which a compiler like LLVM can collapse both code and data in C++ to almost nothing is crazy; nothing in the managed world compares. Languages like Java have their own optimisation challenges, e.g. as soon as escape analysis fails it gets very inefficient.
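A tiny sketch of the kind of collapsing I mean (my own example, not from a real codebase): clang/LLVM at -O2 will typically turn this loop into a closed-form multiply and fold the constant call away entirely, though the exact result depends on compiler version and flags.

```cpp
#include <cstdint>

// Sum 0..n-1 with a plain loop. LLVM's scalar evolution typically
// rewrites this as n * (n - 1) / 2, so the loop disappears from the
// generated code (behaviour varies with compiler version and flags).
uint64_t sum_to(uint64_t n) {
    uint64_t total = 0;
    for (uint64_t i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}

// With a constant argument the whole computation constant-folds:
// this usually compiles down to "return 499999500000".
uint64_t sum_to_million() {
    return sum_to(1000000);
}
```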
i
I'd be interested to see whether this balance changed if there was an LLVM or JVM-sized investment put behind a language where the programming model was a better fit for modern CPU design — where the programming model mapped more cleanly to the different classes of speculative execution (branch prediction, etc) than C-like languages that expect execution to be strictly in order.
❤️ 2
Not sure that this is practical currently, but it's surely theoretically possible if one of the titans of industry decided it was a good idea. Or if, say, Intel decided they wanted to create the next hit systems programming language.
w
Yup, very much agree that the amount of effort put into making a language faster makes all the difference
s
Or if there was an equivalent investment in hardware that was more suitable for so called 'higher level' languages. E.g. https://en.wikipedia.org/wiki/Tagged_architecture which is emulated in software by almost every high level language.
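Roughly the kind of software emulation I mean, as a minimal sketch (a hypothetical value layout, not any particular runtime): a tag bit squeezed into each word, which a tagged architecture would track in hardware instead.

```cpp
#include <cassert>
#include <cstdint>

// A hypothetical dynamic-language value: the low bit is the tag.
//   ...1 -> a 63-bit "small integer" stored in the upper bits
//   ...0 -> a pointer to a heap object (heap pointers are aligned,
//           so their low bits are naturally zero)
// A tagged architecture carries this distinction in hardware; on
// commodity CPUs the runtime masks and shifts on every operation.
using Value = uintptr_t;

inline Value   from_int(int64_t i) { return (static_cast<uintptr_t>(i) << 1) | 1; }
inline bool    is_int(Value v)     { return v & 1; }
inline int64_t to_int(Value v)     { return static_cast<int64_t>(v) >> 1; }

// Adding two dynamic values: tag-check, untag, add, retag.
inline Value add(Value a, Value b) {
    assert(is_int(a) && is_int(b));  // real runtimes branch to a slow path here
    return from_int(to_int(a) + to_int(b));
}
```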
w
Rust is a good candidate: it lacks C++ aliasing issues (which are a huge impediment to optimisation) but is otherwise equivalent. So given endless effort put into optimizing it, it should eventually be faster than C++
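A rough sketch of the aliasing problem (my example, shown in C++ using the widely supported but non-standard __restrict extension, since that's roughly the guarantee Rust's &mut references give by construction):

```cpp
// Without any no-alias guarantee, the compiler must reload *scale on
// every iteration: a store to dst[i] might change *scale.
void scale_all(float* dst, const float* scale, int n) {
    for (int i = 0; i < n; ++i) {
        dst[i] *= *scale;            // *scale re-read each time
    }
}

// With __restrict the programmer promises the pointers don't alias,
// so *scale can be hoisted out of the loop and the loop vectorized
// more aggressively. Rust's borrow rules let the compiler assume this
// by default, with the language checking the promise.
void scale_all_restrict(float* __restrict dst,
                        const float* __restrict scale, int n) {
    for (int i = 0; i < n; ++i) {
        dst[i] *= *scale;
    }
}
```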
s
The fact that LLVM is the best we have is evidence to me that we're running on brute force rather than good ideas. All that complexity, millions of LOC, just to get good performance out of even simple programs.
🍰 1
w
And of course crazy efforts have gone into making JS fast, whereas similarly dynamic languages like Python have received relatively little optimisation
@Ivan Reese that's a nice article.. yes, the machine that C targets is far from reality 🙂 But we can't actually target the real hardware anymore, as even the CPU ISA pretends it's still 1990 or so
👍 2
s
The other angle that's important to me, where this whole model fails, is that in a large scale system, one process/binary is just one piece of the puzzle. Large scale architecture and topology design will dominate any single-process optimization. Basically, LLVM is then just a peephole optimizer for the whole system.
w
@shalabh well yes, but that's a chicken-and-egg problem. Certainly if we changed the hardware, we'd favor different kinds of languages.. but we are not. So as a language designer currently considering whether to give a feature X or Y semantics, where Y is 2x slower on current hardware, that is still a real concern for the practicality of your language
of all the things we can try to improve about this world, changing the hardware is about the furthest from our abilities 🙂 It may suck, but there's little point in complaining about it
s
..so we need a whole-system optimization model: a language/environment to express and execute the larger (whole system) processes, rather than local RAM bit fiddling optimizers.
w
you will always need the latter
the largest "whole systems" that I am somewhat familiar with are the ones at Google, and we're very far from any kind of optimizing at that level (in a compiler optimisation sense); it's all done by humans. This LLVM level optimization you so dislike controls 99% of data center costs
s
of all the things we can try to improve about this world, changing the hardware is about the furthest from our abilities 🙂 It may suck, but there's little point in complaining about it
@Wouter - yeah I think this is the situated vs radical perspectives that came up earlier. How much we want to work with existing tech, how much we want to reinvent and rethink etc. If we're looking at this as research and imagining ~50+ years out, ideas today may end up being actualized. So it's still worthwhile and interesting to explore and develop the ideas. Depends on the end goals and motivations.
👍 1
w
I guess I personally prefer to work on designs I can implement myself, today 🙂 Maybe I'm conservative? 😛
Also, even 50 years may not be enough to turn the tide in whatever way you think
remember, in the previous 50 years, we've seen little to no fundamental change in basic technologies like CPUs, memory, and OSes etc.. it is not impossible we're going to spend another 50 with them 😛
s
rather than local RAM bit fiddling optimizers
you will always need the latter
Agreed, but I'm saying that it could be much simpler than millions of LOC plus the C++ language spec, without giving anything up.
This LLVM level optimization you so dislike controls 99% of data center costs
Yes, I'm well aware. But also looking at how much proxying, redundant encoding/decoding, RPC fan-out and model duplication happens across all the services, it seems we're losing a lot as well. I don't dislike LLVM, I think it's pretty solid. I'm arguing for exploring a different model starting top-down. Not just LLVM but all language runtimes/compilers come from a world where 'process' meant OS/Unix process (maybe your whole world was a single OS instance). But processes in large scale systems are very different and we're kind of hacking them on top as a separate layer.
I guess I personally prefer to work on designs I can implement myself, today 🙂 Maybe I'm conservative?
Why not do both? 😄
w
I have limited time 🙂
i
I would be very surprised if Apple didn't end up making changes to some of their hardware to accommodate some Swift feature or framework. They're already knocking at the door with features like BitCode, and they'll be making their own ARM chips for Macs soon. I expect they eventually stop licensing ARM and develop their own instruction set and architecture, maybe even within the decade. That's why I suggested Intel could also end up doing something similar. It's not within the realm of possibility for any of us, but there are people that have the entire stack from wafer to monad within their purview.
s
FPGAs are another architecture that's growing again, and folks are using them in data centers.
I have limited time
Fair. I guess I'm not arguing for other folks to take this position but just that these positions focused on managed/higher level systems with longer term horizons are valuable and can become net wins.
Also, even 50 years may not be enough to turn the tide in whatever way you think
Yes, yes. It may not turn at all or likely even turn a completely different way. But these ideas are still very interesting to me.
❤️ 1
f
Check out https://millcomputing.com/ for some ISA differences from "50 years ago" - still heavily in (slow) development, alas
d
in the previous 50 years, we've seen little to no fundamental change in basic technologies like CPUs, memory
We have GPUs, which have a radically different architecture from CPUs optimized for running C programs, and which are capable of a wider range of general purpose computing than is admitted by conventional wisdom. And we now have non-volatile RAM. Current software doesn't fully exploit this hardware: we need new languages and operating systems for that.
👍 1
The levels to which a compiler like LLVM can collapse both code and data in C++ to almost nothing is crazy, nothing in the managed world compares.
"Managed" means uses a garbage collector, or compiles to VM code, or both. There's no necessary conflict between "managed" and using LLVM as a backend code generator, it's just a matter of designing the language or compiler to support this. My Curv language is an interpreted, dynamically typed language, with a subset that compiles into highly optimized machine code for the CPU and GPU.
w
And don't forget RAM caches. I recall a funny Ruby performance improvement that was accomplished by increasing the base size of objects from five words to eight. No other change than just padding out the struct. https://github.com/ruby/ruby/pull/495
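Presumably the win is that 8 words is 64 bytes, a whole cache line on typical hardware, so objects stop straddling two lines. A sketch of the same idea in C++ (assuming 64-bit pointers and 64-byte lines):

```cpp
#include <cstddef>

// Assumed cache-line size; 64 bytes is typical for current x86 and ARM.
constexpr std::size_t kCacheLine = 64;

// Five 8-byte words = 40 bytes: packed into an array, some objects
// straddle two cache lines, so touching one object can cost two misses.
struct Unpadded {
    void* fields[5];
};

// Aligning (and thus padding) to 64 bytes keeps each object in exactly
// one line. More memory is "wasted", yet access often gets faster.
struct alignas(kCacheLine) Padded {
    void* fields[5];
};

static_assert(sizeof(Unpadded) == 40, "assumes 64-bit pointers");
static_assert(sizeof(Padded) == 64, "exactly one cache line per object");
```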
d
don't forget RAM caches
K is a dynamically typed, interpreted language used for performance critical applications (high speed financial trading). It is super fast in part because the entire interpreter and runtime fits in L1 cache.
k
@Doug Moen What would a new operating system look like that could exploit GPU and NVRAM technologies?
d
@Ken Kan With NVRAM, your operating system doesn't lose its state when the hardware is shut down and loses power. That's called "persistence", and current operating systems support this via "hibernation". In current Unix-like operating systems, there's a big difference in how data is represented in RAM, vs how it is represented on disk (files, directories). With NVRAM, you could unify these representations. HP had a research project called "The Machine" which explored the implications of an NVRAM operating system in more detail. https://www.hpl.hp.com/research/systems-research/themachine/
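A hand-wavy sketch of what unifying those representations could look like, using mmap over a file as a stand-in for byte-addressable NVRAM (my illustration, not HP's design; real NVRAM durability also needs careful flushing and ordering, which this skips):

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>

// A record the program uses directly, in place, with no separate file
// format: with byte-addressable NVRAM the in-memory layout *is* the
// durable layout. Mapping a file is a rough stand-in for that idea.
struct Counter {
    uint64_t value;
};

Counter* open_persistent_counter(const char* path) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, sizeof(Counter)) != 0) { close(fd); return nullptr; }
    void* mem = mmap(nullptr, sizeof(Counter),
                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                          // the mapping stays valid
    if (mem == MAP_FAILED) return nullptr;
    return static_cast<Counter*>(mem);  // counter->value survives restarts
}
```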
@Ken Kan GPUs support data-parallel computing. This is a powerful paradigm for general purpose computing. If you have thousands, or tens of thousands of processing units for each gigabyte of memory, that might be a more scalable and efficient architecture, a better way to allocate your transistor budget. HP's "The Machine" project proposed to "bring processing closer to the data", and that's what GPUs do when you have thousands of cores and gigabytes of memory on the same chip. (Although the HP material doesn't talk about GPUs explicitly.)

There have been many research projects that investigated scalable, massively parallel computing, with huge numbers of cores, each having local memory. The advantage of GPUs is that every personal computer has one (I'm including cell phones in my definition of personal computer).

Currently, GPUs are difficult to program, the standard APIs are quite terrible. High level language support for data-parallel computing is quite limited. Conventional programming languages and libraries push you to use idioms that don't map well to data-parallel computing. So there are all these barriers that inhibit more widespread use of GPU capabilities, and I see a lot of wasted potential. Because of these barriers, GPUs are only used for a few specialized applications, like graphics and machine learning. If these barriers could be removed, via better languages and libraries, then GPUs and data-parallel programming might come to be used in a more general and pervasive way. GPU manufacturers will respond by designing GPU hardware to be more general purpose (eg, like the way they are currently responding to the rise of machine learning). That could eventually lead to more GPU-centric operating systems.
k
Interesting. It looks like The Machine is something HP is designing from the ground up so that the hardware and the OS are optimized for memory localized to GPUs. Is that the idea? The missing piece seems to be how to write software for it.