Improve ANSCV with software decoder:

Thread-local staging Mat (video_player.cpp:1400-1407) — single biggest win. Eliminates the 12 MB per-call malloc/free cycle.
Contiguous get_buffer2 allocator (video_decoder.cpp:35-102) — keeps the 3 bulk memcpys cache-friendly. Would also enable FAST/zero-copy for resolutions where visible_h % 64 == 0.
SW-decoder thread config (video_decoder.cpp:528-540) — thread_count=0, thread_type=FRAME|SLICE. FRAME is downgraded to SLICE-only by AV_CODEC_FLAG_LOW_DELAY, but decode throughput is sufficient for your input rate.
SetTargetFPS(100) delivery throttle (already there) — caps onVideoFrame post-decode work at 10 FPS. Keeps the caller path warm-cached.
Instrumentation — [MEDIA_DecInit] / [MEDIA_Convert] / [MEDIA_SWDec] / [MEDIA_Timing] / [MEDIA_JpegTiming] — always-on regression detector, zero cost when ANSCORE_DEBUGVIEW=OFF.
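
To illustrate the first item, here is a minimal sketch of the thread-local staging idea. It is not the code in video_player.cpp: `StagingBuffer`, `acquire`, and `stagingForThisThread` are hypothetical names, and a `std::vector` stands in for the staging `cv::Mat` so the example compiles without OpenCV. The point is only that a per-thread buffer pays for its allocation once and is then reused on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the staging cv::Mat: a per-thread byte buffer
// that grows when needed and is otherwise reused across calls.
struct StagingBuffer {
    std::vector<std::uint8_t> bytes;
    std::size_t reallocations = 0;  // counts capacity growth, for illustration only

    std::uint8_t* acquire(std::size_t width, std::size_t height, std::size_t channels) {
        const std::size_t needed = width * height * channels;
        assert(needed > 0);
        if (bytes.capacity() < needed) {
            ++reallocations;  // only the first call (or a larger frame) allocates
        }
        bytes.resize(needed);  // no reallocation when capacity already suffices
        return bytes.data();
    }
};

// One buffer per calling thread, so no locking is required.
inline StagingBuffer& stagingForThisThread() {
    thread_local StagingBuffer buffer;
    return buffer;
}
```

With OpenCV, the analogous pattern would presumably be a `thread_local cv::Mat` sized via `cv::Mat::create`, which reallocates only when the requested size or type changes.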
2026-04-20 12:18:43 +10:00
parent adf32da2a2
commit 9f0a10a4c8
13 changed files with 431 additions and 201 deletions


@@ -89,6 +89,15 @@ target_link_libraries(ANSCV
PRIVATE CUDA::nvjpeg
)
# libyuv — vendored at 3rdparty/libyuv (added by top-level CMakeLists when
# the submodule is present). Provides SIMD-accelerated I420→RGB24 used by
# CVideoPlayer::avframeYUV420PToCvMat for the SW-decode fast path.
if(ANSCORE_HAS_LIBYUV)
target_link_libraries(ANSCV PRIVATE yuv)
target_include_directories(ANSCV PRIVATE ${CMAKE_SOURCE_DIR}/3rdparty/libyuv/include)
target_compile_definitions(ANSCV PRIVATE ANSCORE_HAS_LIBYUV=1)
endif()
# Platform-specific libs
if(WIN32)
target_link_directories(ANSCV PRIVATE