# present-company
o
So I have this C/C++ codebase, around 116,000 lines of code. A full release compilation takes 6:25 minutes. Not too bad to work on. If I break this down, a single line of C/C++ code takes 0.0033 s (3.3 milliseconds) to compile. For comparison, a game that runs at 60 fps draws a new frame in under 16 milliseconds. So while a game simulates and renders a frame, a C/C++ compiler compiles 5 lines of code. That is usually not even a single function. My CPU averages around 10 instructions per clock cycle. At 3.6 GHz it can do 3.6 * 10 * 1,000,000,000 instructions on each core, per second. That's 36,000,000 per millisecond, on 8 cores... but let's add some cache misses: any instruction can either run at full speed or go out to main memory, around 200x slower. We are looking at around 5 million instructions per millisecond. If we printed out (in 9 pt font) all the instructions executed to compile a single line of C/C++ code, we'd end up with over 6 kilometers of paper.
🍰 2
💥 5
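A quick back-of-envelope sketch of that arithmetic, using only the figures quoted above (116,000 lines, a 6:25 build, ~5 million instructions per millisecond):

```cpp
// Redoing the back-of-envelope math from the message above (rough estimates only).
#include <cstdio>

int main() {
    const double lines         = 116000.0;     // codebase size
    const double build_seconds = 6 * 60 + 25;  // 6:25 full release build
    const double insns_per_ms  = 5.0e6;        // ~5 million instructions/ms after cache misses

    const double ms_per_line    = build_seconds * 1000.0 / lines;
    const double insns_per_line = ms_per_line * insns_per_ms;

    std::printf("%.1f ms and ~%.0f million instructions per line\n",
                ms_per_line, insns_per_line / 1e6);  // ~3.3 ms, ~17 million instructions
}
```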
l
I'm curious about GPU-based compiling; e.g. 100k lines at an average of 80 chars/line = 8 MB ≈ one 4K frame at a single u8 channel. Tokenizing this and turning it into an AST may require a few passes, but I still think this could theoretically be done multiple orders of magnitude faster... 400 s to sub-400 ms?
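For what it's worth, a minimal sketch of what the first data-parallel pass could look like: classify every byte of the source independently, so each iteration maps onto one GPU thread. This is plain C++ for illustration only; the kernel launch, the token-boundary pass, and the AST building are left out, and the byte classes are made up.

```cpp
// Hypothetical first pass of a data-parallel tokenizer: one byte -> one class.
// Every iteration is independent, so on a GPU this loop becomes one thread per byte.
#include <cctype>
#include <cstdio>
#include <cstring>
#include <vector>

enum class ByteClass : unsigned char { Space, Ident, Digit, Punct };

static ByteClass classify(unsigned char c) {
    if (std::isspace(c))             return ByteClass::Space;
    if (std::isalpha(c) || c == '_') return ByteClass::Ident;
    if (std::isdigit(c))             return ByteClass::Digit;
    return ByteClass::Punct;
}

int main() {
    const char* src = "int x = 42; // one line of source";
    const size_t n  = std::strlen(src);

    std::vector<ByteClass> classes(n);
    for (size_t i = 0; i < n; ++i)   // no dependency between iterations
        classes[i] = classify(static_cast<unsigned char>(src[i]));

    // A second pass would look for class transitions to find token boundaries.
    for (size_t i = 0; i < n; ++i)
        std::printf("%d", static_cast<int>(classes[i]));
    std::printf("\n");
}
```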
j
GCC and Clang are really not tuned for compile latency. Compare to e.g. https://home.in.tum.de/~engelke/pubs/2403-cgo.pdf or https://arxiv.org/pdf/2305.13241.
Computer graphics are also well suited to data-parallel hardware, which has several orders of magnitude better throughput than serial hardware.
A lot of basic tasks (e.g. representing changes to code during optimization) don't have good solutions yet.
o
I don't think adding more compute is helping. It's absolutely abysmal that a C++ compiler executes around 40 MB of code just to compile a single line. And most lines are not complex at all, assignments and such. It probably re-compiles the standard library for each line. 😅
j
Most of the work is not per-line, but in finding good approximations to NP-complete problems that span entire functions.
😎 2
LLVM could certainly be faster (pointer-heavy IR, lots of dynamic function calls for extensibility, a mandatory linking phase even when only compiling a single compilation unit). I could believe getting a single order of magnitude improvement while achieving the same code quality. You can also get 2-3 orders of magnitude improvement today from baseline compilers, but typically at the expense of 2-5x worse runtime performance. That's absolutely the wrong tradeoff for release builds in large deployments, since it translates directly to either increased datacenter costs or reduced battery life on laptops/mobile.
Another consideration that's specific to C/C++ is that if you count how many times your includes get duplicated, the compiler is probably dealing with far more lines of code than you actually wrote.
☝️ 1
☝🏼 1
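An easy way to see that effect (assuming GCC or Clang; the file name and the numbers are only illustrative) is to compare the line count of the file you wrote against the preprocessed translation unit:

```cpp
// main.cpp: a couple of lines of "your" code...
#include <vector>   // ...but the compiler parses the whole preprocessed header chain

int main() { return std::vector<int>{1, 2, 3}.size() == 3 ? 0 : 1; }

// Compare what you wrote with what the compiler actually sees, e.g.:
//   wc -l main.cpp            -> a handful of lines
//   g++ -E main.cpp | wc -l   -> tens of thousands of lines after preprocessing
```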
> let's add some cache misses
Just a guess, but with LLVM being so pointer-heavy I'd not be surprised if you have many more cache misses than you're guesstimating. If you have a fairly recent Intel CPU you can measure both cache misses and instructions executed with
`perf stat -e cycles,instructions,cache-misses -- $BUILD_MY_THING`.