Patrick Dubroy
11/02/2024, 10:45 AM
@group(0) @binding(2) var<storage, read_write> output: array<vec4<f32>>;
• The fragment shader reads from the same buffer. Basically, each fragment is just one element of the vec4<f32>. The fragment shader is very simple, and doesn't touch anything else in the storage buffer.
I've added timestamp queries to the pipeline, and what I'm seeing is this:
Duration #1: 47.800208ms
Duration #2: 47.809876ms
Frame time: 51.2545ms
Duration #1 is computed from the compute shader timestamps (the duration between the beginning and end of the compute pass) and Duration #2 is the time for the render pass, computed the same way. Frame time is measured on the CPU.
I expected the durations of the compute shader and fragment shader to add up to the frame time (approximately). But they don't, and I'm confused about why! Could it be due to interleaving of the compute pass and render pass? If so, I'm curious how the synchronization works. How does the GPU figure out the dependencies between the writer (a compute shader invocation) and the reader (a fragment shader invocation)?
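One way to sanity-check these numbers: if both GPU intervals fall inside the CPU-measured frame, the two durations can only sum to more than the frame time when the intervals overlap, and the overlap must be at least d1 + d2 - frame. Below is a small sketch of that arithmetic. It assumes the timestamps have already been read back as nanosecond values; the `Interval` type, the helper names, and the raw begin/end values are hypothetical, chosen only to reproduce the durations quoted above.

```typescript
const NS_PER_MS = 1_000_000;

// One pass as a pair of GPU timestamps, in nanoseconds.
interface Interval {
  beginNs: number;
  endNs: number;
}

// Length of a single pass.
function durationNs(i: Interval): number {
  return i.endNs - i.beginNs;
}

// Time during which both passes are active (0 if they are disjoint).
function overlapNs(a: Interval, b: Interval): number {
  const begin = Math.max(a.beginNs, b.beginNs);
  const end = Math.min(a.endNs, b.endNs);
  return Math.max(0, end - begin);
}

// Lower bound on the overlap, assuming both passes lie inside one window
// (here: the CPU-measured frame) of length windowNs.
function minOverlapNs(d1Ns: number, d2Ns: number, windowNs: number): number {
  return Math.max(0, d1Ns + d2Ns - windowNs);
}

// Hypothetical raw timestamps that reproduce the reported durations.
const computePass: Interval = { beginNs: 0, endNs: 47_800_208 };
const renderPass: Interval = { beginNs: 3_000_000, endNs: 50_809_876 };
const frameNs = 51_254_500; // 51.2545 ms, measured on the CPU

const bound = minOverlapNs(durationNs(computePass), durationNs(renderPass), frameNs);
console.log(durationNs(computePass) / NS_PER_MS); // 47.800208
console.log(bound / NS_PER_MS); // 44.355584
```

With the reported values the two passes would have to overlap by at least about 44.36 ms, which is most of each pass. Comparing the raw begin/end timestamps with `overlapNs`, rather than only the durations, would show whether the two queries are effectively measuring the same interval.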
I don't have any explicit synchronization, but I'm also not seeing any tearing or anything that would indicate that there is a data race between the shaders.
Duncan Cragg
11/02/2024, 1:33 PM
Patrick Dubroy
11/02/2024, 1:56 PM
Patrick Dubroy
11/02/2024, 1:58 PM
Patrick Dubroy
11/02/2024, 2:01 PM
Jack Rusher
11/03/2024, 3:07 PM
Sam Gentle
11/03/2024, 11:31 PM
Metal automatically tracks dependencies between the compute and render passes. When the sample sends the command buffer to be executed, Metal detects that the compute pass writes to the output texture and the render pass reads from it, and makes sure the GPU finishes the compute pass before starting the render pass.
https://developer.apple.com/documentation/metal/compute_passes/processing_a_texture_in_a_compute_function
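The WebGPU side looks similar from the application's point of view: the API exposes no barrier call, so the compute pass and the render pass are simply recorded one after the other, and the implementation is responsible for making the storage-buffer write visible to the later read. A minimal sketch of that recording order follows. The interfaces are a hand-written subset of the WebGPU API (method and field names as in the spec) so the snippet stands alone, and `FrameResources` with all of its fields is hypothetical, not taken from the thread.

```typescript
// Hand-written subset of the WebGPU API, declared locally so the sketch
// compiles without @webgpu/types.
interface TimestampWrites {
  querySet: unknown;
  beginningOfPassWriteIndex: number;
  endOfPassWriteIndex: number;
}
interface PassLike {
  setPipeline(pipeline: unknown): void;
  setBindGroup(index: number, bindGroup: unknown): void;
  end(): void;
}
interface ComputePassLike extends PassLike {
  dispatchWorkgroups(countX: number): void;
}
interface RenderPassLike extends PassLike {
  draw(vertexCount: number): void;
}
interface EncoderLike {
  beginComputePass(desc: { timestampWrites: TimestampWrites }): ComputePassLike;
  beginRenderPass(desc: {
    colorAttachments: unknown[];
    timestampWrites: TimestampWrites;
  }): RenderPassLike;
  resolveQuerySet(querySet: unknown, firstQuery: number, queryCount: number,
                  destination: unknown, destinationOffset: number): void;
}

// Hypothetical resources, assumed to be created elsewhere.
interface FrameResources {
  computePipeline: unknown;
  renderPipeline: unknown;
  computeBindGroup: unknown; // binds `output` as read_write storage
  renderBindGroup: unknown;  // binds the same buffer for the fragment shader
  colorAttachment: unknown;
  querySet: unknown;         // assumed: a timestamp query set with 4 slots
  resolveBuffer: unknown;
  workgroupCount: number;
}

function encodeFrame(encoder: EncoderLike, r: FrameResources): void {
  // Pass 1: the compute shader writes the storage buffer (timestamps 0 and 1).
  const compute = encoder.beginComputePass({
    timestampWrites: { querySet: r.querySet, beginningOfPassWriteIndex: 0, endOfPassWriteIndex: 1 },
  });
  compute.setPipeline(r.computePipeline);
  compute.setBindGroup(0, r.computeBindGroup);
  compute.dispatchWorkgroups(r.workgroupCount);
  compute.end();

  // No explicit barrier is recorded between the passes.
  // Pass 2: the fragment shader reads the same buffer (timestamps 2 and 3).
  const render = encoder.beginRenderPass({
    colorAttachments: [r.colorAttachment],
    timestampWrites: { querySet: r.querySet, beginningOfPassWriteIndex: 2, endOfPassWriteIndex: 3 },
  });
  render.setPipeline(r.renderPipeline);
  render.setBindGroup(0, r.renderBindGroup);
  render.draw(3); // assumption: a single full-screen triangle
  render.end();

  // Copy the four timestamps into a buffer that can be read back later.
  encoder.resolveQuerySet(r.querySet, 0, 4, r.resolveBuffer, 0);
}
```

In a real application `encoder` would come from `device.createCommandEncoder()` and the result would be submitted with `device.queue.submit([encoder.finish()])`.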
Stefan
11/04/2024, 7:28 AM
Stefan
11/04/2024, 7:32 AM
Stefan
11/04/2024, 7:35 AM
Patrick Dubroy
11/04/2024, 7:37 AM
Patrick Dubroy
11/04/2024, 7:45 AM