Improve ANSCV with software decoder:

Thread-local staging Mat (video_player.cpp:1400-1407) — single biggest win. Eliminates the 12 MB per-call malloc/free cycle.
Contiguous get_buffer2 allocator (video_decoder.cpp:35-102) — keeps the 3 bulk memcpys cache-friendly. Would also enable FAST/zero-copy for resolutions where visible_h % 64 == 0.
SW-decoder thread config (video_decoder.cpp:528-540) — thread_count=0, thread_type=FRAME|SLICE. FRAME is downgraded to SLICE-only by AV_CODEC_FLAG_LOW_DELAY, but decode throughput is sufficient for your input rate.
SetTargetFPS(100) delivery throttle (already there) — caps onVideoFrame post-decode work at 10 FPS. Keeps the caller path warm-cached.
Instrumentation — [MEDIA_DecInit] / [MEDIA_Convert] / [MEDIA_SWDec] / [MEDIA_Timing] / [MEDIA_JpegTiming] — always-on regression detector, zero cost when ANSCORE_DEBUGVIEW=OFF.
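
Illustrative sketch of the first two items (not the code in this commit): a thread-local staging buffer that is reused across calls instead of being allocated per frame, plus a fast path that skips the copy when the three planes already form one packed I420 block. `Yuv420Planes`, `isPackedI420`, `copyPlane` and `toPackedI420` are hypothetical names standing in for the AVFrame fields and the conversion in video_player.cpp; even width and height are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the plane pointers and strides of a YUV420P frame.
struct Yuv420Planes {
    const uint8_t* data[3];   // Y, U, V
    int linesize[3];          // bytes per row of each plane
    int width;                // visible width  (assumed even)
    int height;               // visible height (assumed even)
};

// True when Y, U and V already sit back to back with no row or plane padding,
// i.e. data[0] can be wrapped directly as packed I420.
inline bool isPackedI420(const Yuv420Planes& p) {
    const int cw = p.width / 2;
    const int ch = p.height / 2;
    return p.linesize[0] == p.width && p.linesize[1] == cw && p.linesize[2] == cw
        && p.data[1] == p.data[0] + static_cast<size_t>(p.width) * p.height
        && p.data[2] == p.data[1] + static_cast<size_t>(cw) * ch;
}

// One bulk memcpy when the source rows are tightly packed, row by row otherwise.
inline void copyPlane(uint8_t* dst, const uint8_t* src,
                      int stride, int rowBytes, int rows) {
    if (stride == rowBytes) {
        std::memcpy(dst, src, static_cast<size_t>(rowBytes) * rows);
        return;
    }
    for (int r = 0; r < rows; ++r) {
        std::memcpy(dst + static_cast<size_t>(r) * rowBytes,
                    src + static_cast<size_t>(r) * stride,
                    static_cast<size_t>(rowBytes));
    }
}

// Returns packed I420 data for the frame. Fast path: the source pointer itself.
// Slow path: a per-thread staging buffer that grows once and is then reused,
// so steady-state calls perform no allocation.
inline const uint8_t* toPackedI420(const Yuv420Planes& p) {
    assert(p.width % 2 == 0 && p.height % 2 == 0);
    if (isPackedI420(p)) {
        return p.data[0];
    }
    thread_local std::vector<uint8_t> staging;

    const int cw = p.width / 2;
    const int ch = p.height / 2;
    const size_t ySize = static_cast<size_t>(p.width) * p.height;
    const size_t cSize = static_cast<size_t>(cw) * ch;
    if (staging.size() < ySize + 2 * cSize) {
        staging.resize(ySize + 2 * cSize);
    }
    copyPlane(staging.data(),                 p.data[0], p.linesize[0], p.width, p.height);
    copyPlane(staging.data() + ySize,         p.data[1], p.linesize[1], cw, ch);
    copyPlane(staging.data() + ySize + cSize, p.data[2], p.linesize[2], cw, ch);
    return staging.data();
}
```

The pointer returned by the slow path stays valid only until the next call on the same thread, so a caller that keeps the frame beyond that point still has to copy it.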
2026-04-20 12:18:43 +10:00
parent adf32da2a2
commit 9f0a10a4c8
13 changed files with 431 additions and 201 deletions


@@ -147,6 +147,15 @@ public:
AVCodecContext* getAVCodeContext() {
return m_pContext;
}
// Custom AVCodecContext::get_buffer2 callback used by the SOFTWARE decoder.
// Allocates Y, U, and V planes of YUV420P / YUVJ420P frames in a SINGLE
// contiguous av_malloc block so that CVideoPlayer::avframeYUV420PToCvMat
// can wrap them zero-copy into an I420 cv::Mat when the allocated height
// matches the visible height (i.e. no codec padding rows between planes).
// For unhandled formats (HW surfaces, 10-bit, 4:2:2, 4:4:4, planar-alpha,
// …) it delegates to avcodec_default_get_buffer2, preserving correctness.
static int contiguousGetBuffer2(AVCodecContext* s, AVFrame* frame, int flags);
private:
BOOL readFrame();
int render(AVFrame* frame);