getImage() previously held _mutex across the 4K NV12->BGR sws_scale in
avframeToCVMat, blocking the decoder callback (onVideoFrame) for
100-300 ms per frame. Under multi-camera load this cascaded into 5-21 s
frame stalls and STALE PTS events in the log.

- avframeToCVMat: drop outer _mutex. NV12/YUV420P paths touch no shared
state; avframeAnyToCvmat still locks internally for swsCtx.
- getImage: split into two short locked phases with the BGR conversion
unlocked between them. Decoder callbacks can push new frames and run
the CUDA HW capture path in parallel with the reader's conversion.
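
For reviewers, a minimal sketch of the two-phase locking pattern
described above. It is not the actual code: RawFrame, Image,
convertUnlocked, Player, _latest, _cached and _cachedSeq are
hypothetical stand-ins (the real code works on AVFrame and cv::Mat),
and the byte inversion merely stands in for the sws_scale conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins for AVFrame and cv::Mat.
struct RawFrame { std::uint64_t seq; std::vector<std::uint8_t> data; };
using Image = std::vector<std::uint8_t>;

// Stand-in for the expensive NV12->BGR conversion. It reads only its
// argument and touches no shared state, so it needs no lock.
inline Image convertUnlocked(const RawFrame& f) {
    Image out(f.data.size());
    for (std::size_t i = 0; i < f.data.size(); ++i)
        out[i] = static_cast<std::uint8_t>(255 - f.data[i]);
    return out;
}

class Player {
public:
    // Decoder callback: holds _mutex only long enough to swap a pointer.
    void onVideoFrame(std::shared_ptr<const RawFrame> f) {
        std::lock_guard<std::mutex> lock(_mutex);
        _latest = std::move(f);
    }

    Image getImage() {
        std::shared_ptr<const RawFrame> frame;
        {   // Phase 1 (locked): reuse the cache or take a frame reference.
            std::lock_guard<std::mutex> lock(_mutex);
            if (!_latest) return {};
            if (!_cached.empty() && _cachedSeq == _latest->seq) return _cached;
            frame = _latest;
        }

        // Unlocked: the slow conversion runs while onVideoFrame stays free
        // to publish newer frames.
        Image bgr = convertUnlocked(*frame);

        {   // Phase 2 (locked): publish the converted image to the cache.
            std::lock_guard<std::mutex> lock(_mutex);
            _cached = bgr;
            _cachedSeq = frame->seq;
        }
        return bgr;
    }

private:
    std::mutex _mutex;
    std::shared_ptr<const RawFrame> _latest;
    Image _cached;
    std::uint64_t _cachedSeq = 0;
};
```

The shared_ptr copy taken in phase 1 keeps the frame alive during the
unlocked conversion even if the decoder replaces _latest meanwhile.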

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>