# present-company
o
So I have this C/C++ codebase, around 116,000 lines of code. A full release compilation takes 6:25 minutes. Not too bad to work on. If I break this down, a single line of C/C++ code takes 0.0033 s (3.3 milliseconds) to compile. For comparison, a game that runs at 60 fps draws a new frame in under 16 milliseconds. So while a game simulates and renders a frame, a C/C++ compiler compiles 5 lines of code. That is usually not even a single function. My CPU averages around 10 instructions per clock cycle. At 3.6 GHz it can do 3.6 * 10 * 1,000,000,000 instructions on each core, per second. That's 36,000,000 per millisecond, on 8 cores... but let's add some cache misses: any instruction can either run at full speed or go out to main memory, around 200x slower. We are looking at around 5 million instructions per millisecond. If we printed out (in 9 pt font) all the instructions executed to compile a single line of C/C++ code, we'd end up with over 6 kilometers of paper.
🍰 2
💥 5
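A quick back-of-envelope sketch of that arithmetic, using only the figures quoted above (116,000 lines, a 6:25 build, ~5 million instructions per millisecond):

```cpp
// Redoing the back-of-envelope math from the message above (rough estimates only).
#include <cstdio>

int main() {
    const double lines         = 116000.0;     // codebase size
    const double build_seconds = 6 * 60 + 25;  // 6:25 full release build
    const double insns_per_ms  = 5.0e6;        // ~5 million instructions/ms after cache misses

    const double ms_per_line    = build_seconds * 1000.0 / lines;
    const double insns_per_line = ms_per_line * insns_per_ms;

    std::printf("%.1f ms and ~%.0f million instructions per line\n",
                ms_per_line, insns_per_line / 1e6);  // ~3.3 ms, ~17 million instructions
}
```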
l
I'm curious about GPU-based compiling; e.g. 100k lines at an average of 80 chars/line = 8 MB ≈ one 4K frame at a single u8 channel. Tokenizing this and turning it into an AST may require a few passes, but I still think this could theoretically be done multiple orders of magnitude faster... 400 s to sub-400 ms?
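For what it's worth, a minimal sketch of what the first data-parallel pass could look like: classify every byte of the source independently, so each iteration maps onto one GPU thread. This is plain C++ for illustration only; the kernel launch, the token-boundary pass, and the AST building are left out, and the byte classes are made up.

```cpp
// Hypothetical first pass of a data-parallel tokenizer: one byte -> one class.
// Every iteration is independent, so on a GPU this loop becomes one thread per byte.
#include <cctype>
#include <cstdio>
#include <cstring>
#include <vector>

enum class ByteClass : unsigned char { Space, Ident, Digit, Punct };

static ByteClass classify(unsigned char c) {
    if (std::isspace(c))             return ByteClass::Space;
    if (std::isalpha(c) || c == '_') return ByteClass::Ident;
    if (std::isdigit(c))             return ByteClass::Digit;
    return ByteClass::Punct;
}

int main() {
    const char* src = "int x = 42; // one line of source";
    const size_t n  = std::strlen(src);

    std::vector<ByteClass> classes(n);
    for (size_t i = 0; i < n; ++i)   // no dependency between iterations
        classes[i] = classify(static_cast<unsigned char>(src[i]));

    // A second pass would look for class transitions to find token boundaries.
    for (size_t i = 0; i < n; ++i)
        std::printf("%d", static_cast<int>(classes[i]));
    std::printf("\n");
}
```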
j
GCC and Clang are really not tuned for compile latency. Compare to e.g. https://home.in.tum.de/~engelke/pubs/2403-cgo.pdf or https://arxiv.org/pdf/2305.13241.
Computer graphics are also well suited to data-parallel hardware, which has several orders of magnitude better throughput than serial hardware.
A lot of basic tasks (e.g. representing changes to code during optimization) don't have good solutions yet.
o
I don't think adding more compute is helping. It's absolutely abysmal that a C++ compiler executes around 40 MB of code just to compile a single line. And most lines are not complex at all, assignments and such. It probably re-compiles the standard library for each line. 😅
j
Most of the work is not per-line, but in finding good approximations to NP-complete problems that span entire functions.
😎 2
LLVM could certainly be faster (pointer-heavy IR, lots of dynamic function calls for extensibility, a mandatory linking phase even when only compiling a single compilation unit). I could believe getting a single order of magnitude improvement while achieving the same code quality. You can also get 2-3 orders of magnitude improvement today from baseline compilers, but typically at the expense of 2-5x worse runtime performance. That's absolutely the wrong tradeoff for release builds in large deployments, since it translates directly to either increased datacenter costs or reduced battery life on laptops/mobile.
Another consideration that's specific to C/C++ is that if you count how many times your includes get duplicated, the compiler is probably dealing with far more lines of code than you actually wrote.
☝️ 1
☝🏼 1
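An easy way to see that effect (assuming GCC or Clang; the file name and the numbers are only illustrative) is to compare the line count of the file you wrote against the preprocessed translation unit:

```cpp
// main.cpp: a couple of lines of "your" code...
#include <vector>   // ...but the compiler parses the whole preprocessed header chain

int main() { return std::vector<int>{1, 2, 3}.size() == 3 ? 0 : 1; }

// Compare what you wrote with what the compiler actually sees, e.g.:
//   wc -l main.cpp            -> a handful of lines
//   g++ -E main.cpp | wc -l   -> tens of thousands of lines after preprocessing
```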
> let's add some cache misses
Just a guess, but with LLVM being so pointer-heavy I'd not be surprised if you have many more cache misses than you're guesstimating. If you have a fairly recent Intel CPU you can measure both cache misses and instructions executed with
`perf stat -e cycles,instructions,cache-misses -- $BUILD_MY_THING`.