When the decoder hasn't produced a frame in 5 s, skip the call to
_playerClient->getImage() entirely and return the cached frame with
_pts unchanged. LabVIEW sees STALE PTS one poll earlier and can
trigger a reconnect sooner.

Threshold matches the existing checks on the duplicate-PTS branch and
in areImagesIdentical(), so all three stale paths agree. Near-zero cost:
one getLastFrameAgeMs() call before the main path.
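A minimal sketch of the early return, for illustration only. Frame,
FrameSource, poll() and kStaleThresholdMs are hypothetical names; only
getLastFrameAgeMs(), getImage() and _pts come from the change itself.
Whether the real comparison is >= or > is an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the cached image and its timestamp.
struct Frame {
    std::vector<std::uint8_t> pixels;
    std::int64_t pts = 0;
};

// The 5 s threshold from the description (assumed to be inclusive).
constexpr std::int64_t kStaleThresholdMs = 5000;

class FrameSource {
public:
    FrameSource(std::function<std::int64_t()> getLastFrameAgeMs,
                std::function<Frame()> getImage)
        : _getLastFrameAgeMs(std::move(getLastFrameAgeMs)),
          _getImage(std::move(getImage)) {}

    // Returns the cached frame untouched when the decoder is stale,
    // otherwise fetches a fresh frame and updates the cache.
    const Frame& poll() {
        if (_getLastFrameAgeMs() >= kStaleThresholdMs) {
            return _cached;  // same pixels, same pts: caller sees a stale PTS
        }
        _cached = _getImage();  // main path, skipped entirely when stale
        return _cached;
    }

private:
    std::function<std::int64_t()> _getLastFrameAgeMs;
    std::function<Frame()> _getImage;
    Frame _cached;
};
```

The only work added to the main path is the single age query before
the fetch.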
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- NvJpegPool: singleton pool of 4 NvJpegCompressor instances with
  lock-free slot acquisition (~160 MB VRAM). Threads that can't grab
  a slot fall back to TurboJPEG immediately instead of waiting.
- JPEG passthrough: BmpToJpeg now checks whether the input is already
  JPEG (FF D8 FF magic) and, if so, copies it through without
  re-encoding.
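The two points above could look roughly like this. It is a sketch
under assumptions, not the actual NvJpegPool: SlotPool, tryAcquire(),
release() and looksLikeJpeg() are hypothetical names, and the real
pool owns NvJpegCompressor instances rather than bare flags.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Fixed-size pool; each slot is claimed with a single compare-exchange.
class SlotPool {
public:
    static constexpr std::size_t kSlots = 4;

    // Tries each slot once and never blocks. Returns the claimed index,
    // or std::nullopt when every slot is busy (the caller then uses the
    // CPU encoder instead of waiting).
    std::optional<std::size_t> tryAcquire() {
        for (std::size_t i = 0; i < kSlots; ++i) {
            bool expected = false;
            if (_busy[i].compare_exchange_strong(expected, true,
                                                 std::memory_order_acquire,
                                                 std::memory_order_relaxed)) {
                return i;
            }
        }
        return std::nullopt;
    }

    // Hands a previously acquired slot back to the pool.
    void release(std::size_t slot) {
        assert(slot < kSlots);
        _busy[slot].store(false, std::memory_order_release);
    }

private:
    std::array<std::atomic<bool>, kSlots> _busy{};  // all false initially
};

// True when the buffer starts with the JPEG magic bytes FF D8 FF.
inline bool looksLikeJpeg(const std::uint8_t* data, std::size_t size) {
    return size >= 3 && data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF;
}
```

A caller would try tryAcquire() first, encode on the GPU if it gets a
slot, and otherwise go straight to the CPU path.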
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BmpToJpeg was slow (~25-45 ms for 4K) due to two bottlenecks:
1. cv::imdecode for BMP parsing (unnecessary for uncompressed BMP)
2. TurboJPEG CPU encoding (~11 ms for 4K)

Fix 1: Zero-copy BMP parsing. Parse the header directly and wrap the
pixel data in a cv::Mat without allocation or copy. Eliminates ~47 MB
of heap allocations per 4K frame.
Fix 2: NvJpegCompressor class using the nvJPEG hardware encoder on
NVIDIA GPUs (~1-2 ms for 4K). Integrated into CompressJpegToString so
all 5 JPEG encoding callsites benefit automatically. A reusable GPU
buffer avoids per-frame cudaMalloc/cudaFree. Falls back silently to
TurboJPEG on Intel/AMD or if nvJPEG fails.
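The reusable-buffer idea in Fix 2 can be sketched as a grow-only
allocation. ReusableBuffer and ensure() are hypothetical names, and
the allocator is injected so the sketch runs without CUDA; in the real
class the two callbacks would presumably wrap cudaMalloc and cudaFree.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Grow-only buffer: allocates on first use and when a larger frame
// arrives, so steady-state frames of the same size never allocate.
class ReusableBuffer {
public:
    using AllocFn = std::function<void*(std::size_t)>;
    using FreeFn = std::function<void(void*)>;

    ReusableBuffer(AllocFn alloc, FreeFn release)
        : _alloc(std::move(alloc)), _release(std::move(release)) {}

    ~ReusableBuffer() {
        if (_ptr) _release(_ptr);
    }

    ReusableBuffer(const ReusableBuffer&) = delete;
    ReusableBuffer& operator=(const ReusableBuffer&) = delete;

    // Returns storage of at least `bytes`. Returns nullptr if a needed
    // allocation fails, in which case the old buffer is kept and the
    // caller can fall back to the CPU encoder.
    void* ensure(std::size_t bytes) {
        if (bytes > _capacity) {
            void* fresh = _alloc(bytes);
            if (!fresh) return nullptr;
            if (_ptr) _release(_ptr);
            _ptr = fresh;
            _capacity = bytes;
        }
        return _ptr;
    }

private:
    AllocFn _alloc;
    FreeFn _release;
    void* _ptr = nullptr;
    std::size_t _capacity = 0;
};
```

With a fixed stream resolution this allocates once for the first
frame and then reuses the same memory for every later frame.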
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>