Yes, currently 512. I have various investigations to do, and there are trade-offs to consider.
• As you can see, there isn't much startup delay assigning work to the threadpool, especially compared to the processing time for an individual graph/synth.
• With fewer samples (as you were possibly hinting at), the dynamics would change.
• The graph is optimised for flexibility, but making nodes that do combined work might be prudent at some point. Each node effectively means another pass over the memory, and in general a copy. The effect chain that works on the stereo path could be an optional single node that combines delay, phaser, reverb, etc., since it will often be the same.
• I've actually spent very little time optimising anything, just the bigger-picture stuff: using memory pools for buffers, and ensuring the audio thread doesn't have to wait for anything.
• I also have to make this work in the context of the overall app, where graphics/rendering are also occurring. I may have mentioned before that making a unified app has interesting challenges because the process boundaries have gone. Sonic Pi, for example, has a Ruby process that works on the notes and a separate SuperCollider process for the audio. This is effectively 'free' threading once the communication is worked out. In a single app, you have to manage more threading issues to get it all to work together.
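For anyone curious what "memory pools for buffers" can look like, here is a rough sketch of the general idea. It is not the actual code from the app: BufferPool, acquire, release and available are made-up names, and the 512-frame size is simply assumed from the figure above. Everything is allocated up front, so grabbing or returning a buffer during processing never touches the heap. The sketch is single-threaded for brevity; a pool shared with an audio thread would also need a lock-free or otherwise wait-free handoff.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical names throughout; an illustration only.
using Buffer = std::vector<float>;

class BufferPool {
public:
    // Allocate every buffer up front, away from the audio thread.
    BufferPool(std::size_t count, std::size_t frames) : frames_(frames) {
        free_.reserve(count);
        for (std::size_t i = 0; i < count; ++i)
            free_.push_back(std::make_unique<Buffer>(frames, 0.0f));
    }

    // Hand out a preallocated buffer. Returns nullptr when the pool is
    // empty instead of allocating a new one.
    std::unique_ptr<Buffer> acquire() {
        if (free_.empty()) return nullptr;
        auto buf = std::move(free_.back());
        free_.pop_back();
        return buf;
    }

    // Return a buffer to the pool. This does not allocate as long as only
    // buffers that came from this pool are returned, because the free
    // list's capacity was reserved in the constructor.
    void release(std::unique_ptr<Buffer> buf) {
        assert(buf && buf->size() == frames_);
        free_.push_back(std::move(buf));
    }

    std::size_t available() const { return free_.size(); }

private:
    std::size_t frames_;
    std::vector<std::unique_ptr<Buffer>> free_;
};
```

Typical use would be something like `BufferPool pool(64, 512);` at startup, then `auto buf = pool.acquire();` inside a node and `pool.release(std::move(buf));` once the node is done with it. The pool size of 64 is arbitrary here.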
In short, lots of tuning to do. But I'm going for the bigger picture 'get it all working/stable', ship it, and work on the details once people can play with it.