Improve ANSCV with software decoder:

Thread-local staging Mat (video_player.cpp:1400-1407) — single biggest win. Eliminates the 12 MB per-call malloc/free cycle.
Contiguous get_buffer2 allocator (video_decoder.cpp:35-102) — keeps the 3 bulk memcpys cache-friendly. Would also enable FAST/zero-copy for resolutions where visible_h % 64 == 0.
SW-decoder thread config (video_decoder.cpp:528-540) — thread_count=0, thread_type=FRAME|SLICE. FRAME is downgraded to SLICE-only by AV_CODEC_FLAG_LOW_DELAY, but decode throughput is sufficient for your input rate.
SetTargetFPS(100) delivery throttle (already there) — caps onVideoFrame post-decode work at 10 FPS. Keeps the caller path warm-cached.
Instrumentation — [MEDIA_DecInit] / [MEDIA_Convert] / [MEDIA_SWDec] / [MEDIA_Timing] / [MEDIA_JpegTiming] — always-on regression detector, zero cost when ANSCORE_DEBUGVIEW=OFF.
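
The thread-local staging idea from the first item can be sketched roughly as follows. This is a minimal illustration only, not the code in video_player.cpp: `StagingBuffer`, `threadStaging` and `stageI420` are hypothetical names, a `std::vector` stands in for the staging `cv::Mat`, and the planes are assumed to be tightly packed I420. For scale, a 3840x2160 I420 frame is 12,441,600 bytes, which is in the range of the 12 MB figure quoted above, though the commit does not state the resolution.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the staging cv::Mat; the real code in
// video_player.cpp is not shown in this commit excerpt.
struct StagingBuffer {
    std::vector<std::uint8_t> bytes;
    std::size_t allocations = 0;  // number of times the buffer had to grow

    // Returns a pointer to at least `size` writable bytes, growing only
    // when the current buffer is too small.
    std::uint8_t* acquire(std::size_t size) {
        if (bytes.size() < size) {
            bytes.resize(size);
            ++allocations;
        }
        return bytes.data();
    }
};

// One staging buffer per thread: no locking is needed, and the storage
// persists across calls made on the same thread.
inline StagingBuffer& threadStaging() {
    thread_local StagingBuffer buffer;
    return buffer;
}

// Hypothetical helper: packs tightly laid out Y, U and V planes of an
// I420 frame into the calling thread's staging buffer with three bulk
// copies. Width and height are assumed to be even.
inline const std::uint8_t* stageI420(const std::uint8_t* y,
                                     const std::uint8_t* u,
                                     const std::uint8_t* v,
                                     int width, int height) {
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    const std::size_t lumaSize = static_cast<std::size_t>(width) * height;
    const std::size_t chromaSize = lumaSize / 4;
    std::uint8_t* dst = threadStaging().acquire(lumaSize + 2 * chromaSize);
    std::memcpy(dst, y, lumaSize);
    std::memcpy(dst + lumaSize, u, chromaSize);
    std::memcpy(dst + lumaSize + chromaSize, v, chromaSize);
    return dst;
}
```

After the first frame of a given size, later calls on the same thread reuse the same storage, so the steady-state path performs no heap allocation. The returned pointer is only valid until the next call on that thread, so the caller has to finish with (or copy) the data before staging another frame.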
2026-04-20 12:18:43 +10:00
parent adf32da2a2
commit 9f0a10a4c8
13 changed files with 431 additions and 201 deletions


@@ -51,11 +51,19 @@ if(MSVC)
 "$<$<COMPILE_LANGUAGE:C,CXX>:/W3>"
 "$<$<COMPILE_LANGUAGE:C,CXX>:/utf-8>"
 "$<$<AND:$<COMPILE_LANGUAGE:C,CXX>,$<NOT:$<CONFIG:MINSIZEREL>>>:/Zi>"
+# RelWithDebInfo: keep /O2 but disable inlining so debuggers can land
+# breakpoints on small dispatch functions (e.g. avframeToCVMat).
+"$<$<AND:$<COMPILE_LANGUAGE:C,CXX>,$<CONFIG:RelWithDebInfo>>:/Ob0>"
 )
 add_link_options(
 "$<$<NOT:$<CONFIG:DEBUG>>:/DEBUG:FULL>"
-"$<$<NOT:$<CONFIG:DEBUG>>:/OPT:REF>"
-"$<$<NOT:$<CONFIG:DEBUG>>:/OPT:ICF>"
+# /OPT:REF and /OPT:ICF improve Release size/perf but confuse the
+# debugger (folds identical functions, strips unused ones). Apply
+# them to Release only, not RelWithDebInfo.
+"$<$<CONFIG:Release>:/OPT:REF>"
+"$<$<CONFIG:Release>:/OPT:ICF>"
+"$<$<CONFIG:RelWithDebInfo>:/OPT:NOREF>"
+"$<$<CONFIG:RelWithDebInfo>:/OPT:NOICF>"
 )
 add_compile_definitions(_CRT_SECURE_NO_WARNINGS _WINSOCK_DEPRECATED_NO_WARNINGS)
 elseif(CMAKE_CXX_COMPILER_ID MATCHES "Clang|GNU")
@@ -89,6 +97,35 @@ else()
 message(STATUS "ANSCORE_DEBUGVIEW = OFF — ANS_DBG verbose logging disabled (production)")
 endif()
+# ── Vendored libyuv (submodule: 3rdparty/libyuv) ────────────────
+# SIMD-optimized YUV conversion library. Only genuinely fast on toolchains
+# that understand GCC inline assembly (Clang/clang-cl/GCC) — 144 of its
+# SIMD row functions live in row_gcc.cc. MSVC (cl.exe) can only compile the
+# 8 routines in row_win.cc, so on MSVC libyuv silently falls back to the
+# scalar C code in row_common.cc and runs ~10× slower than OpenCV+IPP's
+# cv::cvtColor(COLOR_YUV2BGR_I420). Do NOT enable on MSVC builds.
+#
+# Default: OFF. Enable with `-DANSCORE_USE_LIBYUV=ON` only after switching
+# the project's compiler to clang-cl / clang / gcc.
+option(ANSCORE_USE_LIBYUV "Use libyuv for YUV→BGR conversion (only effective on Clang/GCC, NOT MSVC)" OFF)
+if(ANSCORE_USE_LIBYUV AND EXISTS "${CMAKE_SOURCE_DIR}/3rdparty/libyuv/CMakeLists.txt")
+if(MSVC AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
+message(WARNING "ANSCORE_USE_LIBYUV=ON but compiler is MSVC cl.exe — "
+"libyuv's SIMD paths (row_gcc.cc) won't compile. "
+"Expect ~10× slowdown vs cv::cvtColor. "
+"Use clang-cl (LLVM) instead, or keep ANSCORE_USE_LIBYUV=OFF.")
+endif()
+# Prevent libyuv from finding a system libjpeg and enabling HAVE_JPEG
+# (we don't use libyuv's MJPEG codepaths; avoids an unnecessary runtime dep).
+set(CMAKE_DISABLE_FIND_PACKAGE_JPEG TRUE)
+add_subdirectory(3rdparty/libyuv EXCLUDE_FROM_ALL)
+set(ANSCORE_HAS_LIBYUV ON)
+message(STATUS "libyuv: ENABLED (vendored from 3rdparty/libyuv, static target 'yuv')")
+else()
+set(ANSCORE_HAS_LIBYUV OFF)
+message(STATUS "libyuv: DISABLED — using cv::cvtColor+IPP path (fast on MSVC)")
+endif()
 # ── External Dependencies ───────────────────────────────────────
 include(cmake/Dependencies.cmake)