# of-graphics
p
Hello! I'm curious if anyone here has a good idea about interleaving work between a compute shader and a fragment shader. Some relevant details:
• My app is built with Rust and wgpu, and I'm running on an M1 MacBook Pro.
• I have a single encoder with a compute pipeline and a render pipeline.
• The compute shader writes to a storage buffer defined like this:
@group(0) @binding(2) var<storage, read_write> output: array<vec4<f32>>;
• The fragment shader reads from the same buffer. Basically, each fragment is just one element of the vec4<f32>. The fragment shader is very simple and doesn't touch anything else in the storage buffer.
I've added timestamp queries to the pipeline, and what I'm seeing is this:
Duration #1: 47.800208ms
Duration #2: 47.809876ms
Frame time: 51.2545ms
Duration #1 is computed from the compute shader timestamps (the time between the beginning and end of the compute pass), and Duration #2 is the duration of the render pass, computed the same way. Frame time is measured on the CPU.
I expected the compute and render durations to add up (approximately) to the frame time, but they don't, and I'm confused about why! Could it be due to interleaving of the compute pass and render pass? If so, I'm curious how the synchronization works. How does the GPU figure out the dependency between the writer (a compute shader invocation) and the reader (a fragment shader invocation)? I don't have any explicit synchronization, but I'm also not seeing any tearing or anything else that would indicate a data race between the shaders.
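For reference, the per-frame encoding is shaped roughly like this. This is a simplified sketch against a wgpu 0.19-era API, not the real code; the resource names, workgroup counts, and clear color are invented:

fn encode_frame(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    compute_pipeline: &wgpu::ComputePipeline,
    render_pipeline: &wgpu::RenderPipeline,
    bind_group: &wgpu::BindGroup,
    query_set: &wgpu::QuerySet, // 4 timestamp slots: compute begin/end, render begin/end
    frame_view: &wgpu::TextureView,
) {
    let mut encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
        label: Some("frame encoder"),
    });

    {
        // Compute pass: fills the storage buffer, with timestamps at entry and exit.
        let mut cpass = encoder.begin_compute_pass(&wgpu::ComputePassDescriptor {
            label: Some("compute pass"),
            timestamp_writes: Some(wgpu::ComputePassTimestampWrites {
                query_set,
                beginning_of_pass_write_index: Some(0),
                end_of_pass_write_index: Some(1),
            }),
        });
        cpass.set_pipeline(compute_pipeline);
        cpass.set_bind_group(0, bind_group, &[]);
        cpass.dispatch_workgroups(64, 64, 1); // workgroup counts are placeholders
    }

    {
        // Render pass: the fragment shader reads the same storage buffer.
        let mut rpass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
            label: Some("render pass"),
            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
                view: frame_view,
                resolve_target: None,
                ops: wgpu::Operations {
                    load: wgpu::LoadOp::Clear(wgpu::Color::BLACK),
                    store: wgpu::StoreOp::Store,
                },
            })],
            depth_stencil_attachment: None,
            timestamp_writes: Some(wgpu::RenderPassTimestampWrites {
                query_set,
                beginning_of_pass_write_index: Some(2),
                end_of_pass_write_index: Some(3),
            }),
            occlusion_query_set: None,
        });
        rpass.set_pipeline(render_pipeline);
        rpass.set_bind_group(0, bind_group, &[]);
        rpass.draw(0..3, 0..1); // fullscreen triangle
    }

    // Both passes go into the same command buffer and are submitted together.
    queue.submit(Some(encoder.finish()));
}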
d
those durations are suspiciously close for being times of completely different processes! plus, is it possible that the compute for frame N was run in frame N-1?
p
I agree. My colleague’s theory was that the two passes are interleaved: both running at the same time but somehow synchronized by the runtime. I’m skeptical that we’d get that for free.
I also wondered whether they are somehow running in parallel but with the compute shader one frame ahead, but I don’t see how that’s possible in the code. Both pipelines are added to the same encoder, so they’re definitely submitted together.
I asked on another forum, and the conclusion was that the timestamps may not be reliable. There are a bunch of wgpu issues related to timestamps on Metal and TBDR (tile-based deferred rendering) architectures.
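In case the tick-to-milliseconds conversion is part of the problem, the readback side is shaped roughly like this (same caveats as the encoding sketch above: invented names, a wgpu 0.19-era API, and TIMESTAMP_QUERY enabled on the device):

fn read_timestamps(device: &wgpu::Device, queue: &wgpu::Queue, readback_buf: &wgpu::Buffer) -> (f64, f64) {
    // Assumes the frame encoder already did, before queue.submit():
    //   encoder.resolve_query_set(&query_set, 0..4, &resolve_buf, 0);
    //   encoder.copy_buffer_to_buffer(&resolve_buf, 0, readback_buf, 0, 32);
    // with readback_buf created with usage MAP_READ | COPY_DST.
    readback_buf.slice(..).map_async(wgpu::MapMode::Read, |r| r.unwrap());
    device.poll(wgpu::Maintain::Wait);

    let ticks: Vec<u64> = {
        let bytes = readback_buf.slice(..).get_mapped_range();
        bytes
            .chunks_exact(8)
            .map(|c| u64::from_le_bytes(c.try_into().unwrap()))
            .collect()
    };
    readback_buf.unmap();

    // Raw timestamps are in ticks; the queue reports how many nanoseconds one tick is.
    let ns_per_tick = queue.get_timestamp_period() as f64;
    let compute_ms = ticks[1].wrapping_sub(ticks[0]) as f64 * ns_per_tick / 1.0e6; // Duration #1
    let render_ms = ticks[3].wrapping_sub(ticks[2]) as f64 * ns_per_tick / 1.0e6; // Duration #2
    (compute_ms, render_ms)
}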
j
I also vote “unreliable timestamps”
s
I did a bit of digging in the Metal docs and they do mention some kind of automatic dependency tracking between passes:
Metal automatically tracks dependencies between the compute and render passes. When the sample sends the command buffer to be executed, Metal detects that the compute pass writes to the output texture and the render pass reads from it, and makes sure the GPU finishes the compute pass before starting the render pass.
https://developer.apple.com/documentation/metal/compute_passes/processing_a_texture_in_a_compute_function
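If that's what's happening, then on the wgpu side the whole "contract" is presumably just that the buffer is bound as storage in both passes; there's no barrier for the app to write. A minimal sketch of what I mean (buffer name and size invented):

// wgpu/Metal derive the compute-write -> fragment-read ordering between the two
// passes from this usage and binding info; the app records no explicit sync.
let output = device.create_buffer(&wgpu::BufferDescriptor {
    label: Some("output"),
    size: 1920 * 1080 * 16, // one vec4<f32> (16 bytes) per fragment; resolution made up
    usage: wgpu::BufferUsages::STORAGE,
    mapped_at_creation: false,
});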
s
There are pretty good graphics debugging tools in Xcode/Instruments. If you can somehow hook them up to the Metal part of whatever you’re running, they would give you detailed information about what’s running where and when. https://developer.apple.com/documentation/xcode/metal-debugger/
Perhaps this gets you started: https://developer.apple.com/documentation/xcode/capturing-a-metal-workload-programmatically In particular: “Alternatively, in macOS 14 and later, you can set the environment variable on your Metal app: MTL_CAPTURE_ENABLED=1.” I’d assume that the library you’re using would likely expose any debugging facilities as well…?
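If setting that in the shell is awkward for your setup, the same switch can probably be flipped from inside the Rust app, as long as it happens before anything initializes Metal. A sketch, assuming a pre-2024 Rust edition where std::env::set_var is a safe call:

fn main() {
    // Equivalent to `export MTL_CAPTURE_ENABLED=1`; must run before wgpu
    // creates the Metal device so the capture layer can pick it up.
    std::env::set_var("MTL_CAPTURE_ENABLED", "1");
    // ... create the wgpu instance/adapter/device and run the app as usual ...
}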
Then you should be able to replay a captured trace like this: https://developer.apple.com/documentation/xcode/replaying-a-gpu-trace-file
p
@Sam Gentle That's interesting, thanks! Though I'm still skeptical that it could automatically determine the fine-grained dependencies between individual invocations of the compute/render passes. From the wording it sounds more like it would wait until the compute pass is complete before starting the render pass.
@Stefan Good call, I should see what Xcode will tell me. I do know how to capture a GPU trace; the only issue is that when I last tried it, it seemed that wgpu doesn't generate Metal debug symbols, so it's somewhat limited. But I could always try debugging a standalone pure Metal example that replicates the same access patterns. I found this, which seems like it could be helpful: https://developer.apple.com/documentation/xcode/analyzing-resource-dependencies