# linking-together
e
I will remind people that the Luna people are going for a very hard target, which is having dual graphical + textual representation, freely convertible. Also, I added info on Jonathan Blow's Jai language, which I estimate will release in 2020. It is an extremely thorough implementation, designed for building games, and is already greatly superior to C++ IMHO in its current state. Of course we can only see videos at this point, but you can tell it is real, and coming.
y
My guesstimate is that Jai is similar to D and could have been successful if Rust hadn't picked up, but that the trend in that space will go toward Rust.
🤔 1
s
Yes, Jai looks great, but it's still mostly a "better C++" in a long line of languages that are better versions of C or C++ (Nim, V, D, Rust). It will cover a niche that most of those languages don't, though, in that I think it will have language-level support for low-level patterns that game developers already use but can be painful to implement in a disciplined way in C++ or C#. Something that isn't a goal of Rust. I actually think Jai encodes Jon's knowledge of how "game programming should be" from the last 30 years of game development at the scale he's done it, and doesn't really solve problems for modern game developers, let alone future ones. So it's not really the "Future of Coding" for games, but more like the past.
👍 1
I don't think game developers will widely adopt Rust. It's possible that it could happen over time if someone built a Unity or Unreal quality game engine where Rust was the primary game programming language (not just for engine code). I'm not exactly sure where Rust will gain wide adoption though. Maybe areas where C++ is traditionally used for performance but safety is important?
The reason I say that Jai doesn't solve modern game development problems is that there isn't even a mention of SIMD, autovectorization, SPMD, etc. The only mention of concurrency or parallelism is that it will eventually have a "better concurrency model".
💡 1
so it seems like now he's just exposing OS threads and calling it a day
he cares about single-threaded performance, he cares a lot about memory (no GC pauses, reducing fragmentation, etc.), but he doesn't care about utilizing all available compute, and he doesn't care about networking. That's somewhat in line with the single-player puzzle games that Jon makes, but it's been 15 years since game engines didn't at least have a rendering thread
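For anyone unfamiliar, the baseline pattern being referenced is roughly the following. This is a minimal C++ sketch of a dedicated render thread fed by the main (simulation) thread; all names and structure here are my illustration, not from Jai or any shipping engine:

```cpp
// Minimal sketch of the "at least a rendering thread" baseline: simulation on
// the main thread, rendering on a worker, frames handed over through a queue.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Frame { std::vector<int> drawCommands; };

std::queue<Frame> frames;
std::mutex m;
std::condition_variable cv;
bool done = false;

void renderLoop() {
    for (;;) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return !frames.empty() || done; });
        if (frames.empty()) return;            // done and queue drained
        Frame f = std::move(frames.front());
        frames.pop();
        lock.unlock();
        // ... submit f.drawCommands to the graphics API here ...
    }
}

int main() {
    std::thread renderer(renderLoop);
    for (int i = 0; i < 3; ++i) {              // stand-in for the game loop
        Frame f{{i}};                          // "simulate", produce draw data
        { std::lock_guard<std::mutex> lock(m); frames.push(std::move(f)); }
        cv.notify_one();
    }
    { std::lock_guard<std::mutex> lock(m); done = true; }
    cv.notify_one();
    renderer.join();
}
```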
Unity is doing some good stuff with Burst Compiler and DOTS, but that could be taken further. I really want to make a hybrid visual/textual programming language inspired by VFX editors (Niagara, Unity VFX Graph, Bungie TFX) and machine learning frameworks, where you can choose to run code on heterogeneous processors (GPU for various pipeline stages, and CPU)
👍 1
e
I have watched a bunch of the Jai videos, and I can assure you he is doing a superb job on the performance bottlenecks that actually matter. The fancy instructions that Intel has added to their hardware in the last dozen years are garbage, and have nearly zero impact on the performance of actual software. The real killer problem is whether you are generating cache misses; every cache miss in the CPU can cost over 100 clocks, and he has spent a lot of thought on that crucial issue. Do not underestimate someone who has shipped on so many platforms with such a wide hardware power range; that gives you a real education. I have never met Mr. Blow, but I know from experience that staying in the cache really matters, way more than using one of the silly instructions that Intel added primarily for the purpose of slowing down the cloners.

One of the technical reasons Java programs are so slow is that they spray objects all over the heap, which causes a lot of cache misses. This is one of the intrinsic flaws of the OOP paradigm: it doesn't concentrate memory accesses into a small region. Yes, Jai is focused on giving you very low-level control over things, but 3D games do throw around a lot of data, and being able to optionally specify various low-level aspects will be appreciated by people on slower platforms (like mobile devices, which have very tight RAM constraints compared to desktop platforms).

Of the new language projects on the next-gen language spreadsheet, I would say he is ahead of the other teams; he has already got something like 50k lines of code of a game working, so only the Red team is ahead in terms of lines of code written in their language. Red has split their language into two main dialects, one for system programming and one for general use. That is an interesting strategy, because the system one can be simpler and not try to cover such a wide range of applications. It is more work to create two code bases and language specs, but it does reflect the fact that some people are doing system programming and don't need any user interface stuff. Certainly the libraries can be way smaller, and perhaps more focused on a lower-level implementation of network protocols, which would be a hindrance for someone writing a simple app that needs to post some fields to a web API form.
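To make the heap-spraying point concrete, here is a toy C++ sketch (my illustration, not code from Jai or Java): traversing heap-scattered nodes issues a dependent load per element, each of which can miss the cache, while a contiguous array streams through the cache predictably.

```cpp
#include <cstdio>
#include <vector>

struct Node { int value; Node* next; };

// Pointer chase: each ->next is a dependent load that can miss the cache,
// since nodes may be scattered anywhere on the heap.
int sumList(const Node* head) {
    int sum = 0;
    for (const Node* n = head; n; n = n->next) sum += n->value;
    return sum;
}

// Contiguous array: sequential access that the hardware prefetcher handles well.
int sumArray(const std::vector<int>& values) {
    int sum = 0;
    for (int v : values) sum += v;
    return sum;
}

int main() {
    Node c{3, nullptr}, b{2, &c}, a{1, &b};       // tiny stack-built list
    std::vector<int> v{1, 2, 3};
    printf("%d %d\n", sumList(&a), sumArray(v));  // 6 6
}
```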
s
I know the importance of cache-friendly data layouts. I've shipped multiple AAA games on console, worked on my own personal game (which was funded by Blow at one point) that targeted Xbox 360, and I just mentioned Unity DOTS (data oriented tech stack), whose major reason for existence is reducing cache misses in Unity game code. I even said that Jai is doing a good job with memory-related performance issues, although I should have explicitly mentioned cache management. The feature that allows for treating data as SOA or AOS with a single keyword is slick (see the sketch below). To say that SIMD doesn't matter at all because "fancy instructions that Intel has added to their hardware in the last dozen years are garbage" (AVX?) is naive. Utilizing SSE4 instructions efficiently is standard in modern game engines, as is scaling to ~4-8 cores. You can argue that the language shouldn't do this because game developers are already doing it, but that brings Jai back to "better C++" land imo, not the Future of Coding for game engines.
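For readers who haven't seen the SOA/AOS feature, here is roughly what the two layouts look like written by hand in C++. This is my sketch of the idea, not Jai syntax; Jai reportedly switches between the layouts with a single keyword:

```cpp
#include <vector>

// Array of Structures: all fields of one particle sit together. Convenient,
// but a loop that touches only x still drags y, z, and life into the cache.
struct ParticleAoS { float x, y, z, life; };

// Structure of Arrays: each field is contiguous across all particles, so a
// pass over x alone (or a SIMD load of several x values) wastes no cache space.
struct ParticlesSoA {
    std::vector<float> x, y, z, life;
};

int main() {
    std::vector<ParticleAoS> aos(1000);
    ParticlesSoA soa{std::vector<float>(1000), std::vector<float>(1000),
                     std::vector<float>(1000), std::vector<float>(1000)};
    float sum = 0;
    for (const auto& p : aos) sum += p.x;   // strided reads, 16-byte stride
    for (float x : soa.x) sum += x;         // dense sequential reads
    return static_cast<int>(sum);
}
```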
e
I did a comparison once where I turned off the compiler flags for instructions past the Pentium, which is a very simplistic instruction set, and it didn't affect performance in either storage or speed by more than 1%. So my assertion that most of the latest Intel instructions are near worthless stands. For hand-coded assembler work they can occasionally come in handy (my product had 1% assembler, for key bitmap rotation/scaling library functions). There is simply no excuse for some of the latest Intel instructions, whose purpose one cannot even understand; they are obviously intended to slow down cloners. Each baffling instruction will chew up precious engineering time at the clone companies, which cannot match Intel's vast labor pool. The only instruction I really like from the last 10 years is the true random number generator they added; we have wanted real random numbers for decades, and now we finally have them. You have to admit that VGF2P8AFFINEINVQB as an opcode is absurd, and how many programmers out of a million will ever use the Galois Field Affine Transformation Inverse instruction? Intel is spinning their wheels in the sand, with 10x the number of engineers they had to develop the Pentium, which was their big winner. They seem to be making things unnecessarily complicated, and their security holes are embarrassing. I like Intel; their chips are well made and they stand behind their product, but the post-Andy Grove era is not a pretty picture. How about ditching IEEE floating point for DEC64 if you want to make some actual progress? 0.1 + 0.2 still does not equal 0.3, and if they listened to programmers instead of strategizing to paralyze AMD and the Chinese cloners, we might get somewhere.
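The 0.1 + 0.2 complaint is easy to reproduce in any language that uses IEEE 754 binary doubles; a minimal C++ demonstration:

```cpp
// The classic binary-floating-point surprise referenced above: 0.1, 0.2, and
// 0.3 have no exact representation in base-2 floating point.
#include <cstdio>

int main() {
    double a = 0.1 + 0.2;
    printf("%.17f\n", a);       // prints 0.30000000000000004
    printf("%d\n", a == 0.3);   // prints 0 (false)
}
```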
k
> I did a comparison once where I turned off the compiler flags for instructions past the Pentium... and it didn't affect performance in either storage or speed by more than 1%.
Over what programs in what domain? Could other domains have different needs?
e
It was a huge 120k-line interactive graphics program that did pretty much everything you can do: image processing, sound, interactive bitmap graphics, printing, tons of functionality (the Discus CD labeler). Since one has to support machines without all the fancy instructions, because Intel has a crazy-quilt of availability, you end up avoiding them anyway. There is no evidence that creating a crazy-quilt of instructions has any effect on general performance; mass-market software developers like me don't waste time optimizing for fractional-marketshare device features.
k
Thanks, that helps put your generalization in context. And what was your methodology for measuring that 1% difference?
s
> You have to admit that VGF2P8AFFINEINVQB as an opcode is absurd, and how many programmers out of a million will ever use the Galois Field Affine Transformation Inverse instruction?
Yes, many of the higher-level, sometimes domain-specific instructions are wasteful; I agree with that. I think we're talking about different CPU instructions. I'm talking about explicit, efficient use of basic SIMD instructions like move, shuffle, add, multiply, etc. that effectively exist on all consumer devices. SSE has existed for 20 years. SSE, AltiVec, or NEON are on almost all consumer devices you'd want to ship a game on, including the Raspberry Pi and other cheap SoCs. I assume you're only talking about new AVX instructions, because you mentioned fractional-marketshare devices? Turning on compiler flags in many compilers does not (and often cannot) take advantage of SIMD, so I'm not surprised you saw no difference. Sometimes auto-vectorization can be a win, but often it isn't, and some compilers don't support it well (MSVC) or require you to write special code to make sure it is done in a performant way. Often this stuff is hidden in a math library (https://github.com/Microsoft/DirectXMath, https://glm.g-truc.net/0.9.9/index.html) or some place where it really matters, but I'd argue that even for 3D gameplay code there are advantages. I'm talking about the domain of games specifically because that's what Jon is trying to address in Jai. I'm not anti-Jai btw; there are a ton of things I like about it, but SIMD and multithreading are fundamental to game performance. The reason I'm bringing this up in the context of language is that we've seen language features introduced that make writing vectorized, cache-friendly, and parallel code easier, and I believe there are ways to improve on existing patterns that have been demonstrated. If you're not familiar, please read up on ISPC (https://ispc.github.io/index.html) and Burst Compiler/HPC# (https://docs.unity3d.com/Packages/com.unity.burst@0.2/manual/index.html, https://lucasmeijer.com/posts/cpp_unity/). It's also completely possible that Jon Blow is thinking about all of this and just hasn't demonstrated it in Jai.
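To make the distinction concrete, this is the kind of explicit, basic SIMD being discussed: a minimal C++ sketch using baseline SSE intrinsics, assuming an x86 target. The function and data are my illustration, not engine code:

```cpp
// Four floats added per instruction with SSE intrinsics (baseline x86 SIMD,
// around 20 years old): the "basic move/add/multiply" usage described above.
#include <cstdio>
#include <xmmintrin.h>

void addArrays(const float* a, const float* b, float* out, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {                 // 4 lanes per iteration
        __m128 va = _mm_loadu_ps(a + i);         // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];     // scalar tail
}

int main() {
    float a[6] = {1, 2, 3, 4, 5, 6}, b[6] = {10, 20, 30, 40, 50, 60}, out[6];
    addArrays(a, b, out, 6);
    for (float v : out) printf("%g ", v);        // 11 22 33 44 55 66
}
```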
👍 1
e
I ran benchmarks for speed, and counted the bytes of the EXE file on Windows. Adding the fancier instructions only shrank the code size by 1%, with a negligible effect on performance. Programs typically follow the 80-20 rule, where 80% of the time is spent in 20% of the code. And oftentimes performance is determined by how well you lay out the data, not which instructions you use to access it.