
ANSFrame Multi-Resolution Architecture Plan

Detailed Comparison: Current vs ANSFrame

Test Configuration

  • GPU: RTX 4070 Laptop (8 GB VRAM)
  • Cameras: 5 running (3840×2160, 2880×1620, 1920×1080)
  • AI Tasks: 12 (subscribing to cameras)
  • Engines: 7 TRT engines
  • Decode: Software (YUV420P, CPU)
  • Frame rate: SetTargetFPS(100) = ~10 FPS per camera
  • Baseline: 8.4 hour stable run (ANSLEGION42), 1,048,325 inferences

Per-Frame Generation (inside GetImage)

| Step | Current | ANSFrame |
|---|---|---|
| Decode YUV420P | ~5ms (CPU) | ~5ms (same) |
| Full-res BGR (cvtColor) | ~4-8ms (4K YUV→BGR) | ~4-8ms (same) |
| 640×640 letterbox | Not done here | ~0.8ms (resize YUV planes + cvtColor, done ONCE) |
| 1080p display | Not done (returns 4K) | ~1.5ms (resize YUV planes + cvtColor, done ONCE) |
| Total conversion time (excl. decode) | ~4-8ms | ~6-10ms (+~2.3ms for the 2 extra sizes) |

Clone & Dispatch to AI (per clone × 12 AI tasks)

| Step | Current (4K clone) | ANSFrame (1080p clone) |
|---|---|---|
| Clone image size | 3840×2160×3 = 24.9 MB | 1920×1080×3 = 6.2 MB |
| memcpy time per clone | 3-5ms | 0.8-1.2ms |
| 12 clones, total size | 299 MB | 74 MB |
| 12 clones, total time | 36-60ms | 10-14ms |

AI Preprocessing (per AI task)

| Step | Current | ANSFrame |
|---|---|---|
| Receive image | 4K BGR (24.9 MB) | 1080p BGR (6.2 MB) + ANSFrame ref |
| Local clone in engine | 24.9 MB memcpy (~3ms) | 6.2 MB memcpy (~0.8ms) |
| CPU letterbox resize | 4K→640×640 (~2-3ms) | SKIP (use ANSFrame.inference, 0ms) |
| BGR→RGB | 640×640 (~0.3ms) | 640×640 (~0.3ms) |
| GPU upload | 1.2 MB (~0.1ms) | 1.2 MB (~0.1ms) |
| Total preprocess per task | ~5-6ms | ~1.2ms |
| 12 tasks total | ~60-72ms | ~14ms |

Pipeline Crop Quality (ALPR, Face Recognition)

| Step | Current | ANSFrame |
|---|---|---|
| Detection input | 4K image | 1080p display image (pixels read from ANSFrame.inference) |
| Crop source | Same 4K image | ANSFrame.fullRes (4K original) |
| Crop quality | 4K | 4K (identical) |

Total Processing Time Per Frame Cycle

| Phase | Current (4K) | ANSFrame | Savings |
|---|---|---|---|
| GetImage generation | 4-8ms | 6-10ms | -2ms |
| Clone × 12 | 36-60ms | 10-14ms | +22-46ms |
| AI preprocess × 12 | 60-72ms | 14ms | +46-58ms |
| TRT inference × 12 | 60-600ms | 60-600ms | Same |
| Postprocess × 12 | 12-60ms | 12-60ms | Same |
| Total CPU overhead (generation + clone + preprocess) | 100-140ms | 30-38ms | ~70-100ms saved |
| Total incl. inference and postprocess | 172-800ms | 102-698ms | ~70-100ms saved |

RAM Usage

| Resource | Current | ANSFrame |
|---|---|---|
| GetImage output (per camera) | 24.9 MB (4K BGR) | 32.3 MB (the ANSFrame's 3 images) |
| Clones in flight (12 tasks) | 12 × 24.9 = 299 MB | 12 × 6.2 = 74 MB |
| AI local clone (12 tasks) | 12 × 24.9 = 299 MB | 12 × 6.2 = 74 MB |
| ANSFrame shared data | 0 | Same 32.3 MB as the GetImage output (shared, refcounted; not counted twice) |
| Total RAM per frame cycle | ~622 MB | ~182 MB |
| Peak RAM (5 cams, 2-3 cycles) | ~1.2-1.9 GB | ~0.36-0.55 GB |
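As a cross-check of the RAM rows, the per-cycle totals can be recomputed from exact buffer sizes. A minimal sketch, assuming 12 AI tasks, one dispatch clone plus one engine-local clone per task, and decimal megabytes; the constant and function names are illustrative. Because the ANSFrame's three images are the GetImage output, they are counted once:

```cpp
#include <cstdint>

// Bytes in a packed 8-bit BGR image.
constexpr std::int64_t bgrBytes(std::int64_t w, std::int64_t h) { return w * h * 3; }

constexpr std::int64_t kFull    = bgrBytes(3840, 2160);  // 24,883,200 B (~24.9 MB)
constexpr std::int64_t kDisplay = bgrBytes(1920, 1080);  //  6,220,800 B (~6.2 MB)
constexpr std::int64_t kInfer   = bgrBytes(640, 640);    //  1,228,800 B (~1.2 MB)

// Current path: GetImage output plus a dispatch clone and an engine-local
// clone per task, all at 4K.
constexpr std::int64_t currentCycleBytes(int tasks) {
    return kFull + 2LL * tasks * kFull;
}

// ANSFrame path: the three shared images once (they ARE the GetImage output),
// plus a dispatch clone and an engine-local clone of the 1080p display per task.
constexpr std::int64_t ansFrameCycleBytes(int tasks) {
    return (kFull + kDisplay + kInfer) + 2LL * tasks * kDisplay;
}
```

For 12 tasks this gives 622,080,000 B (~622 MB) for the current path and 181,632,000 B (~182 MB) with ANSFrame.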

GPU / VRAM Usage

| Resource | Current | ANSFrame |
|---|---|---|
| VRAM (engines + workspace) | ~5.9 GB | ~5.9 GB (same) |
| GPU upload per inference | 1.2 MB | 1.2 MB (same) |
| PCIe bandwidth | ~300 MB/s | ~300 MB/s (same) |
| SM utilization | 0-34% | 0-34% (same) |

CPU Usage

| Component | Current | ANSFrame |
|---|---|---|
| SW decode (5 cameras) | ~3-5 cores | ~3-5 cores (same) |
| YUV→BGR generation | ~0.3 cores | ~0.42 cores (+0.12 for extra sizes) |
| CPU resize per AI task | ~1.5 cores | 0 cores (pre-computed) |
| Clone memcpy | ~2.4 cores | ~0.5 cores |
| Total CPU for pipeline | ~7-9 cores | ~4-6 cores |
| CPU savings | | ~3 cores freed |

LabVIEW Thread Scheduling Impact

| Factor | Current | ANSFrame |
|---|---|---|
| Data per task dispatch | 24.9 MB | 6.2 MB (4x less) |
| Memory allocation pressure | 299 MB in flight | 74 MB (4x less) |
| Cache efficiency | Poor (24.9 MB > L3) | Better (6.2 MB closer to L3) |
| Processing time (LabVIEW) | 100-500ms | 70-300ms (~30-40% faster) |

Final Summary

| Metric | Current (8h stable) | ANSFrame (projected) | Improvement |
|---|---|---|---|
| Clone time (12 tasks) | 36-60ms | 10-14ms | 3-4x faster |
| Preprocess (12 tasks) | 60-72ms | 14ms | 4-5x faster |
| CPU overhead (generation + clone + preprocess) | 100-140ms | 30-38ms | ~3-4x less |
| RAM usage (frames) | ~1.2-1.9 GB | ~0.36-0.55 GB | ~70% less |
| CPU cores for pipeline | ~7-9 cores | ~4-6 cores | 3 cores freed |
| VRAM | No change | No change | |
| Crop quality (ALPR/FR) | 4K | 4K | Same |
| Processing time | 100-500ms | 70-300ms | ~30-40% faster |
| Code complexity | Simple | Medium | +~500 lines |

Overview

Replace the single-resolution cv::Mat** output from GetImage with a multi-resolution ANSFrame that contains 3 pre-computed images generated from the YUV420P decoded frame. This eliminates redundant resizing across AI tasks and cuts clone/memcpy volume by ~4x (6.2 MB 1080p clones instead of 24.9 MB 4K clones).

Current Flow (Inefficient)

Decoder → YUV420P → cvtColor → 4K BGR (25 MB) → GetImage returns 4K
LabVIEW: clone 4K (25 MB × 12 tasks = 300 MB memcpy)
Each AI task: CPU resize 4K → 640×640 (redundant × 12 tasks)

New Flow (Optimized)

Decoder → YUV420P → generate 3 images from YUV planes:
  ├─ Full resolution BGR (for crop/pipeline)
  ├─ 640×640 letterbox BGR (for detection inference)
  └─ 1080p BGR (for display, configurable)

GetImage returns ANSFrame (contains refs to all 3 images)
LabVIEW: clone 1080p only (6.2 MB × 12 = 74 MB, was 300 MB)
AI task: uses 640×640 directly (1.2 MB, no resize needed)
Pipeline crop: uses full resolution image (no upscaling artifacts)

Performance Comparison

Resize from YUV planes (Option A)

4K YUV420P frame (12.4 MB in 3 planes)
  │
  ├─ Full res: cvtColor(Y+U+V → BGR)                    ~4-8ms
  │  Result: 3840×2160 BGR (24.9 MB)
  │
  ├─ 640×640: resize Y(640×360) + U,V(320×180)           ~0.5ms
  │            + pad bottom + cvtColor                    ~0.3ms
  │  Result: 640×640 BGR (1.2 MB)                   Total: ~0.8ms
  │
  └─ 1080p: resize Y(1920×1080) + U,V(960×540)           ~1.0ms
             + cvtColor                                   ~0.5ms
     Result: 1920×1080 BGR (6.2 MB)                 Total: ~1.5ms

Total generation time: ~6-10ms (all 3 images)
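The plane sizes in the diagram follow from the letterbox ratio. A minimal sketch (the struct and function names are hypothetical) that computes the Y and U/V target sizes for a given source and inference size, rounding down to even dimensions because I420 chroma planes are exactly half the luma size in each direction:

```cpp
#include <algorithm>
#include <cmath>

struct LetterboxGeometry {
    int unpadW, unpadH;    // resized luma (Y) size before padding
    int chromaW, chromaH;  // resized U and V plane size
    float invRatio;        // original pixels per inference pixel (1 / r)
};

// Fits a W x H frame into an infSize x infSize square, preserving aspect ratio.
LetterboxGeometry computeLetterbox(int W, int H, int infSize) {
    const float r = std::min(static_cast<float>(infSize) / W,
                             static_cast<float>(infSize) / H);
    // Round to nearest, then clear the low bit so both sizes are even.
    const int unpadW = static_cast<int>(std::lround(r * W)) & ~1;
    const int unpadH = static_cast<int>(std::lround(r * H)) & ~1;
    return {unpadW, unpadH, unpadW / 2, unpadH / 2, 1.0f / r};
}
```

For 3840×2160 and infSize = 640 this yields a 640×360 Y plane and 320×180 U/V planes, with invRatio ≈ 6. A source such as 1000×561 shows why the even rounding matters: the unpadded height would be 359, which cannot be split into whole chroma rows, so it becomes 358.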

Resize from BGR (Option B — slower)

4K YUV420P → cvtColor → 4K BGR (24.9 MB)                 ~4-8ms
  │
  ├─ Full res: already done                               ~0ms
  ├─ 640×640: cv::resize(4K BGR → 640×640) + letterbox    ~2-3ms
  └─ 1080p: cv::resize(4K BGR → 1080p)                   ~1-2ms

Total generation time: ~7-13ms (all 3 images)

Recommendation: Option A (resize YUV planes)

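Option A is faster because YUV420P is 1.5 bytes/pixel vs 3 bytes/pixel for BGR, so there is half the data to resize: the two extra sizes cost ~2.3ms instead of ~3-5ms, making total generation roughly 15-25% faster (~6-10ms vs ~7-13ms). Quality is equivalent up to rounding: YUV→BGR is a per-pixel affine conversion, so resizing the planes before cvtColor gives essentially the same result as resizing after it (the same principle GPU NV12 resize relies on).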
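The equivalence argument can be checked numerically. A linear resize computes weighted averages of neighbouring samples, and an affine map commutes with averaging, so "resize planes, then convert" and "convert, then resize" agree except for 8-bit rounding, clamping to 0..255, and chroma-siting details. A small sketch using the common BT.601 limited-range coefficients (an assumption for illustration; the exact constants inside OpenCV's converter may differ slightly), with clamping deliberately omitted:

```cpp
#include <array>

using RGB = std::array<double, 3>;

// BT.601 limited-range YUV -> RGB, without clamping (affine in Y, U, V).
RGB yuvToRgb(double y, double u, double v) {
    const double c = 1.164 * (y - 16.0);
    return {c + 1.596 * (v - 128.0),
            c - 0.392 * (u - 128.0) - 0.813 * (v - 128.0),
            c + 2.017 * (u - 128.0)};
}

// A 2-tap linear resize step is an average of neighbouring samples.
double avg(double a, double b) { return 0.5 * (a + b); }

// Option A order: average the YUV samples, then convert.
RGB resizeThenConvert(const std::array<double, 3>& p, const std::array<double, 3>& q) {
    return yuvToRgb(avg(p[0], q[0]), avg(p[1], q[1]), avg(p[2], q[2]));
}

// Option B order: convert each sample, then average the RGB values.
RGB convertThenResize(const std::array<double, 3>& p, const std::array<double, 3>& q) {
    const RGB a = yuvToRgb(p[0], p[1], p[2]);
    const RGB b = yuvToRgb(q[0], q[1], q[2]);
    return {avg(a[0], b[0]), avg(a[1], b[1]), avg(a[2], b[2])};
}
```

With clamping and 8-bit rounding added back, the two orders can differ slightly near saturated colors; that is the extent of the quality difference between Options A and B.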

ANSFrame Structure

Header: include/ANSFrame.h

#pragma once
#include <opencv2/core/mat.hpp>
#include <atomic>
#include <cstdint>

// ANSFrame holds pre-computed multi-resolution images from a single decoded frame.
// Generated once in avframeYUV420PToCvMat, shared across all AI tasks via registry.
// Eliminates per-task resize and reduces clone size ~4x (1080p display vs 4K).
struct ANSFrame {
    // --- Pre-computed images (all BGR, CPU RAM) ---
    cv::Mat fullRes;        // Original resolution (e.g., 3840×2160) — for crop/pipeline
    cv::Mat inference;      // Model input size (e.g., 640×640 letterbox) — for detection
    cv::Mat display;        // Display resolution (e.g., 1920×1080) — for LabVIEW UI

    // --- Metadata ---
    int originalWidth = 0;  // Original frame width before any resize
    int originalHeight = 0; // Original frame height before any resize
    int inferenceWidth = 0; // Inference image width (e.g., 640)
    int inferenceHeight = 0;// Inference image height (e.g., 640)
    float letterboxRatio = 1.0f; // Scale ratio used for letterbox (for coordinate mapping)
    int64_t pts = 0;        // Presentation timestamp

    // --- Configuration (set per camera) ---
    int displayMaxHeight = 1080;  // Configurable display resolution
    int inferenceSize = 640;      // Configurable inference size (default 640)

    // --- Lifecycle ---
    std::atomic<int> refcount{1};
};

Changes to ANSFrame Registry

The existing ANSGpuFrameRegistry can be extended, or a new ANSFrameRegistry created, to map each display image to its parent ANSFrame, keyed by the cv::Mat data pointer. When LabVIEW clones the display image and sends it to an AI task, the task can look up the parent ANSFrame to access the inference or full-res image (clones are registered as well; see Clone-to-ANSFrame Mapping below).

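#include <mutex>
#include <unordered_map>
#include <opencv2/core/mat.hpp>
#include "ANSFrame.h"

// Registry: maps a Mat's data pointer (display image or any clone) → ANSFrame*
class ANSFrameRegistry {
    std::unordered_map<const uchar*, ANSFrame*> m_map;  // key = Mat.data pointer
    std::mutex m_mutex;
public:
    static ANSFrameRegistry& instance();                  // process-wide singleton
    void attach(const cv::Mat* displayMat, ANSFrame* frame);
    void addRef(const cv::Mat* src, const cv::Mat* dst);  // link a clone to the same frame
    ANSFrame* lookup(const cv::Mat& mat);                 // lookup by data pointer
    void release(const cv::Mat* mat);
};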

Implementation Steps

Step 1: Create ANSFrame structure and registry

Files to create:

  • include/ANSFrame.h — ANSFrame struct definition
  • modules/ANSCV/ANSFrameRegistry.h — Registry mapping display Mat → ANSFrame
  • modules/ANSCV/ANSFrameRegistry.cpp — Registry implementation

Key design decisions:

  • ANSFrame is allocated per decoded frame, shared across all clones
  • refcount tracks how many clones reference this frame
  • When refcount → 0, all 3 images are freed

Step 2: Generate multi-resolution images in avframeYUV420PToCvMat

File to modify: MediaClient/media/video_player.cpp

Replace the current avframeYUV420PToCvMat, which returns a single BGR image, with a new function that populates an ANSFrame with all 3 images.

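#include <algorithm>              // std::min
#include <opencv2/imgproc.hpp>    // cv::resize, cv::cvtColor
#include "ANSFrame.h"

ANSFrame* CVideoPlayer::generateANSFrame(const AVFrame* frame) {
    auto* ansFrame = new ANSFrame();
    const int W = frame->width;   // assumed even, as usual for YUV420P streams
    const int H = frame->height;
    ansFrame->originalWidth = W;
    ansFrame->originalHeight = H;
    ansFrame->pts = frame->pts;
    // NOTE: copy inferenceSize / displayMaxHeight from the per-camera
    // configuration here; otherwise the struct defaults (640 / 1080) apply.

    // Wrap the decoder planes without copying (linesize may exceed width)
    cv::Mat yFull(H, W, CV_8UC1, frame->data[0], frame->linesize[0]);
    cv::Mat uFull(H / 2, W / 2, CV_8UC1, frame->data[1], frame->linesize[1]);
    cv::Mat vFull(H / 2, W / 2, CV_8UC1, frame->data[2], frame->linesize[2]);

    // Copies y/u/v into the top-left corner of the planes of a contiguous
    // I420 buffer of size (h*3/2) x w. U and V are stored back to back after Y,
    // each (w/2)*(h/2) bytes with a row stride of w/2, so they are addressed
    // through Mat headers on raw offsets rather than as sub-rects of yuv.
    auto packI420 = [](const cv::Mat& y, const cv::Mat& u, const cv::Mat& v,
                       cv::Mat& yuv, int w, int h) {
        uchar* yBase = yuv.data;
        uchar* uBase = yBase + w * h;
        uchar* vBase = uBase + (w / 2) * (h / 2);
        y.copyTo(cv::Mat(y.rows, y.cols, CV_8UC1, yBase, w));
        u.copyTo(cv::Mat(u.rows, u.cols, CV_8UC1, uBase, w / 2));
        v.copyTo(cv::Mat(v.rows, v.cols, CV_8UC1, vBase, w / 2));
    };

    // 1. Full resolution: pack the planes, then cvtColor (no resize)
    cv::Mat yuv(H * 3 / 2, W, CV_8UC1);
    packI420(yFull, uFull, vFull, yuv, W, H);
    cv::cvtColor(yuv, ansFrame->fullRes, cv::COLOR_YUV2BGR_I420);

    // 2. Inference size (letterbox from YUV planes, padded at bottom/right)
    const int infSize = ansFrame->inferenceSize;  // default 640 (must be even)
    const float r = std::min(static_cast<float>(infSize) / W,
                             static_cast<float>(infSize) / H);
    const int unpadW = cvRound(r * W) & ~1;  // I420 needs even dimensions
    const int unpadH = cvRound(r * H) & ~1;
    ansFrame->letterboxRatio = 1.0f / r;     // original px per inference px

    cv::Mat yInf, uInf, vInf;
    cv::resize(yFull, yInf, cv::Size(unpadW, unpadH));
    cv::resize(uFull, uInf, cv::Size(unpadW / 2, unpadH / 2));
    cv::resize(vFull, vInf, cv::Size(unpadW / 2, unpadH / 2));

    // Padding: Y = 114 with U = V = 128 (neutral chroma) converts to gray
    // ~(114,114,114). Filling U/V with 114 as well would tint the padding.
    cv::Mat yuvInf(infSize * 3 / 2, infSize, CV_8UC1, cv::Scalar(128));
    yuvInf.rowRange(0, infSize).setTo(cv::Scalar(114));
    packI420(yInf, uInf, vInf, yuvInf, infSize, infSize);
    cv::cvtColor(yuvInf, ansFrame->inference, cv::COLOR_YUV2BGR_I420);
    ansFrame->inferenceWidth = infSize;
    ansFrame->inferenceHeight = infSize;

    // 3. Display resolution (1080p from YUV planes; never upscale)
    const int dispMaxH = ansFrame->displayMaxHeight;
    if (H <= dispMaxH) {
        // Source already fits: share fullRes (same buffer, treat as read-only)
        ansFrame->display = ansFrame->fullRes;
    } else {
        const int dispH = dispMaxH & ~1;
        const int dispW = cvRound(static_cast<double>(W) * dispH / H) & ~1;
        cv::Mat yDisp, uDisp, vDisp;
        cv::resize(yFull, yDisp, cv::Size(dispW, dispH));
        cv::resize(uFull, uDisp, cv::Size(dispW / 2, dispH / 2));
        cv::resize(vFull, vDisp, cv::Size(dispW / 2, dispH / 2));

        cv::Mat yuvDisp(dispH * 3 / 2, dispW, CV_8UC1);
        packI420(yDisp, uDisp, vDisp, yuvDisp, dispW, dispH);
        cv::cvtColor(yuvDisp, ansFrame->display, cv::COLOR_YUV2BGR_I420);
    }

    return ansFrame;
}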
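The subtle part of assembling the letterboxed I420 buffer is its memory layout: Y, U and V are stored back to back, and a chroma row is half as wide as a luma row, so the U and V planes cannot be addressed as sub-rectangles of the (h*3/2) x w buffer. A dependency-free sketch of the padding step (function and parameter names are illustrative, not the project's API) that pads with Y = 114 and U = V = 128, which converts to mid gray rather than a tinted color:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Writes tightly packed planes (y: uw x uh, u/v: uw/2 x uh/2) into the
// top-left corner of a size x size I420 buffer padded with neutral gray.
std::vector<std::uint8_t> assemblePaddedI420(const std::vector<std::uint8_t>& y,
                                             const std::vector<std::uint8_t>& u,
                                             const std::vector<std::uint8_t>& v,
                                             int uw, int uh, int size) {
    const std::size_t ySize = static_cast<std::size_t>(size) * size;
    const std::size_t cSize = ySize / 4;                    // each chroma plane: (size/2)^2
    std::vector<std::uint8_t> buf(ySize + 2 * cSize, 128);  // U = V = 128
    std::memset(buf.data(), 114, ySize);                    // Y = 114

    std::uint8_t* uBase = buf.data() + ySize;               // U follows Y
    std::uint8_t* vBase = uBase + cSize;                    // V follows U
    for (int row = 0; row < uh; ++row)                      // luma rows, stride = size
        std::memcpy(buf.data() + static_cast<std::size_t>(row) * size,
                    y.data() + static_cast<std::size_t>(row) * uw, uw);
    for (int row = 0; row < uh / 2; ++row) {                // chroma rows, stride = size/2
        const std::size_t dst = static_cast<std::size_t>(row) * (size / 2);
        const std::size_t src = static_cast<std::size_t>(row) * (uw / 2);
        std::memcpy(uBase + dst, u.data() + src, uw / 2);
        std::memcpy(vBase + dst, v.data() + src, uw / 2);
    }
    return buf;
}
```

The same offsets (Y at 0, U at size², V at size² + size²/4, chroma stride size/2) are what any Mat-header-based packing must reproduce.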

Step 3: Modify GetImage to return display image + attach ANSFrame

File to modify: modules/ANSCV/ANSRTSP.cpp (and other ANSCV classes)

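cv::Mat ANSRTSPClient::GetImage(int& width, int& height, int64_t& pts) {
    // ... existing logic to get frame from player ...

    // GetImage returns the DISPLAY image (1080p)
    // ANSFrame is attached to the Mat via registry.
    // NOTE: _currentANSFrame is replaced on the camera thread (onNewFrame), so
    // this read must be guarded by the same mutex that guards the replacement.
    ANSFrame* frame = _currentANSFrame;
    if (!frame) {  // nothing decoded yet
        width = height = 0;
        return cv::Mat();
    }
    width = frame->display.cols;
    height = frame->display.rows;
    pts = frame->pts;

    // Register: display Mat's data pointer → ANSFrame (a no-op if it is already
    // mapped to this frame, so repeated calls do not change the refcount)
    ANSFrameRegistry::instance().attach(&frame->display, frame);

    return frame->display;  // 1080p, ~6.2 MB (was 25 MB for 4K); shares pixels, no copy
}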

Step 4: Link clones to the ANSFrame in CloneImage

File to modify: modules/ANSCV/ANSOpenCV.cpp

int ANSCV_CloneImage_S(cv::Mat** imageIn, cv::Mat** imageOut) {
    *imageOut = anscv_mat_new(**imageIn);  // clone display image (6.2 MB, was 25 MB)

    // Link clone to same ANSFrame (refcount++). addRef() performs the lookup
    // and the increment under one registry lock; a separate lookup() followed
    // by refcount++ could race with release() freeing the frame in between.
    ANSFrameRegistry::instance().addRef(*imageIn, *imageOut);

    return 1;
}

Step 5: Modify engine Preprocess to use ANSFrame inference image

Files to modify: All engine Preprocess functions

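// In ANSRTYOLO::Preprocess (and the equivalent function in all other engines):
std::vector<std::vector<cv::cuda::GpuMat>> ANSRTYOLO::Preprocess(
    const cv::Mat& inputImage, ImageMetadata& outMeta) {

    // Model input size. m_inputW / m_inputH are placeholder names for the
    // engine's existing input-dimension members.
    const int modelW = m_inputW;
    const int modelH = m_inputH;

    // Try to get pre-resized inference image from ANSFrame
    ANSFrame* frame = ANSFrameRegistry::instance().lookup(inputImage);

    // Report boxes in the coordinate space of inputImage (the display clone
    // the caller holds), so Step 6 can map them to fullRes.
    outMeta.imgWidth = inputImage.cols;
    outMeta.imgHeight = inputImage.rows;

    cv::Mat srcForInference;
    if (frame && !frame->inference.empty() &&
        frame->inferenceWidth == modelW && frame->inferenceHeight == modelH) {
        // Pre-computed letterbox matches the model: ZERO resize needed.
        // (Testing inputImage.cols here would never match: inputImage is the
        // 1920-wide display image, not the 640-wide inference image.)
        srcForInference = frame->inference;
        // letterboxRatio maps inference px → original px; rescale it so that
        // boxes land in display (inputImage) coordinates instead.
        outMeta.ratio = frame->letterboxRatio *
                        static_cast<float>(inputImage.cols) / frame->originalWidth;
    } else if (frame && !frame->fullRes.empty()) {
        // Model input differs from the pre-computed size: use full resolution
        srcForInference = frame->fullRes;
        // ... letterbox fullRes to modelW x modelH, set outMeta.ratio so boxes
        //     come back in inputImage coordinates ...
    } else {
        // Fallback: no ANSFrame attached, use input image directly (backward compat)
        srcForInference = inputImage;
        // ... existing letterbox of inputImage ...
    }

    // Convert BGR → RGB
    cv::Mat cpuRGB;
    cv::cvtColor(srcForInference, cpuRGB, cv::COLOR_BGR2RGB);

    // Upload small image to GPU
    cv::cuda::GpuMat gpuResized;
    gpuResized.upload(cpuRGB, stream);
    // ...
}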
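If detections are to be reported in the coordinate space of the display image the caller holds (which Step 6 assumes when it scales bbox from display to full-res coordinates), a box from the 640×640 letterbox needs two scale factors: inference to original pixels (letterboxRatio), then original to display. A minimal sketch, assuming top-left-anchored padding as in generateANSFrame and a hypothetical Box type:

```cpp
struct Box { float x, y, w, h; };

// Maps a box from letterboxed inference coordinates to display coordinates.
// Padding is at the bottom/right, so there is no pad offset to subtract.
Box inferenceToDisplay(const Box& b, float letterboxRatio,
                       int originalWidth, int displayWidth) {
    // letterboxRatio = original px per inference px; displayWidth/originalWidth
    // rescales from the original frame to the uniformly scaled display image.
    const float s = letterboxRatio * static_cast<float>(displayWidth) / originalWidth;
    return {b.x * s, b.y * s, b.w * s, b.h * s};
}
```

For a 4K source (letterboxRatio = 6) and a 1920-wide display, s = 3, so an inference box at (100, 50, 60, 30) becomes (300, 150, 180, 90) in display pixels. If padding were centered instead, the pad offset would have to be subtracted before scaling.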

Step 6: Pipeline crop uses full resolution

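// In ANSLPR or any pipeline that crops detected objects.
// bbox is in the coordinate space of inputImage (the 1080p display clone).
// Instead of cropping from display image (1080p, upscaling artifacts):
ANSFrame* frame = ANSFrameRegistry::instance().lookup(inputImage);
cv::Mat cropSource = (frame && !frame->fullRes.empty())
    ? frame->fullRes     // Full 4K quality for face/plate recognition
    : inputImage;        // Fallback

// Scale bbox from display coords (inputImage) to cropSource coords
const float scaleX = static_cast<float>(cropSource.cols) / inputImage.cols;
const float scaleY = static_cast<float>(cropSource.rows) / inputImage.rows;
cv::Rect fullResBbox(cvRound(bbox.x * scaleX), cvRound(bbox.y * scaleY),
                     cvRound(bbox.width * scaleX), cvRound(bbox.height * scaleY));
// Clamp to the image: cv::Mat::operator()(Rect) throws on out-of-bounds ROIs
fullResBbox &= cv::Rect(0, 0, cropSource.cols, cropSource.rows);
cv::Mat crop = fullResBbox.empty() ? cv::Mat() : cropSource(fullResBbox).clone();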
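Edge boxes are the easy case to get wrong: after scaling, a box touching the display border can extend past the full-resolution image, and an out-of-bounds ROI throws. A dependency-free sketch of the scale-and-clamp step (IntRect and the function name are illustrative) that scales both corners, rounds to the nearest pixel, and clips to the image:

```cpp
#include <algorithm>
#include <cmath>

struct IntRect { int x, y, w, h; };

// Scales a display-space box to a source image of size srcW x srcH, rounding
// to the nearest pixel and clamping to the image bounds (empty if outside).
IntRect scaleAndClamp(float bx, float by, float bw, float bh,
                      int dispW, int dispH, int srcW, int srcH) {
    const float sx = static_cast<float>(srcW) / dispW;
    const float sy = static_cast<float>(srcH) / dispH;
    const int x0 = std::max(0, static_cast<int>(std::lround(bx * sx)));
    const int y0 = std::max(0, static_cast<int>(std::lround(by * sy)));
    const int x1 = std::min(srcW, static_cast<int>(std::lround((bx + bw) * sx)));
    const int y1 = std::min(srcH, static_cast<int>(std::lround((by + bh) * sy)));
    if (x1 <= x0 || y1 <= y0) return {0, 0, 0, 0};
    return {x0, y0, x1 - x0, y1 - y0};
}
```

Scaling the corners (rather than x and width separately) keeps adjacent boxes from gaining or losing a pixel to independent rounding.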

Step 7: Configuration API

// Set inference size (default 640) — before StartRTSP
void SetRTSPInferenceSize(ANSRTSPClient** Handle, int size);  // 640, 320, 1280

// Set display resolution (default 1080) — before StartRTSP
void SetRTSPDisplayResolution(ANSRTSPClient** Handle, int width, int height);

// Check if ANSFrame is available for a cloned image
int HasANSFrame(cv::Mat** image);  // returns 1 if ANSFrame attached

// Get specific resolution from ANSFrame
int GetANSFrameInference(cv::Mat** displayImage, cv::Mat** inferenceImage);
int GetANSFrameFullRes(cv::Mat** displayImage, cv::Mat** fullResImage);

Memory & Performance Impact

Per-Frame Memory

| Image | Resolution | Size | Before (single 4K) |
|---|---|---|---|
| Full resolution | 3840×2160 | 24.9 MB | 24.9 MB |
| Inference | 640×640 | 1.2 MB | (generated per AI task) |
| Display | 1920×1080 | 6.2 MB | (was part of 4K) |
| Total per frame | | 32.3 MB | 24.9 MB |

+7.4 MB per frame for pre-computed images, BUT:

Clone Savings (12 AI tasks)

| | Before | After |
|---|---|---|
| Clone size per task | 24.9 MB | 6.2 MB (display only) |
| 12 clones total | 299 MB | 74 MB |
| Clone time | 36-60ms | 10-14ms |
| Resize per task | 2-3ms × 12 = 24-36ms | 0ms (pre-computed) |
| Total savings | | ~225 MB RAM, ~50-80ms CPU |

Generation Time (one-time per frame)

| Step | Time |
|---|---|
| Full res: cvtColor YUV→BGR | ~4-8ms |
| 640×640: resize YUV planes + cvtColor | ~0.8ms |
| 1080p: resize YUV planes + cvtColor | ~1.5ms |
| Total | ~6-10ms |

vs Current: cvtColor 4K = ~4-8ms + resize per task = ~2-3ms × 12 = ~28-44ms total

Net savings: ~20-35ms per frame cycle across all tasks.

Files to Create/Modify

New files:

  1. include/ANSFrame.h — ANSFrame struct
  2. modules/ANSCV/ANSFrameRegistry.h — Registry header
  3. modules/ANSCV/ANSFrameRegistry.cpp — Registry implementation

Modified files:

  1. MediaClient/media/video_player.h — Add generateANSFrame declaration
  2. MediaClient/media/video_player.cpp — Implement generateANSFrame, modify getImage
  3. modules/ANSCV/ANSRTSP.h — Add ANSFrame member, SetInferenceSize
  4. modules/ANSCV/ANSRTSP.cpp — Modify GetImage to return display + attach ANSFrame
  5. modules/ANSCV/ANSOpenCV.cpp — Modify CloneImage_S to link to ANSFrame
  6. modules/ANSCV/ANSMatRegistry.h — Optional: integrate ANSFrame into mat registry
  7. modules/ANSODEngine/ANSRTYOLO.cpp — Use ANSFrame inference image
  8. modules/ANSODEngine/ANSTENSORTRTOD.cpp — Same
  9. modules/ANSODEngine/ANSTENSORRTPOSE.cpp — Same
  10. modules/ANSODEngine/ANSTENSORRTSEG.cpp — Same
  11. modules/ANSODEngine/ANSTENSORRTCL.cpp — Same
  12. modules/ANSODEngine/ANSYOLOV12RTOD.cpp — Same
  13. modules/ANSODEngine/ANSYOLOV10RTOD.cpp — Same
  14. modules/ANSODEngine/SCRFDFaceDetector.cpp — Same
  15. modules/ANSODEngine/dllmain.cpp — Set tl_currentANSFrame for pipeline lookup
  16. modules/ANSLPR/ANSLPR_OD.cpp — Use fullRes for plate crop
  17. modules/ANSFR/ARCFaceRT.cpp — Use fullRes for face crop
  18. modules/ANSFR/ANSFaceRecognizer.cpp — Use fullRes for face crop

Apply to other ANSCV classes:

  1. modules/ANSCV/ANSFLV.h/.cpp — Same pattern as ANSRTSP
  2. modules/ANSCV/ANSMJPEG.h/.cpp — Same
  3. modules/ANSCV/ANSRTMP.h/.cpp — Same
  4. modules/ANSCV/ANSSRT.h/.cpp — Same

Clone-to-ANSFrame Mapping (Critical Design)

The Problem

LabVIEW calls ANSCV_CloneImage_S to create a deep copy of the 1080p display image. The clone has a different data pointer than the original — so a simple pointer lookup won't find the ANSFrame.

GetImage returns display Mat:   data = 0xAAAA → registry: 0xAAAA → ANSFrame #1
CloneImage creates deep copy:   data = 0xBBBB → registry: ??? (not registered)
AI task tries lookup(0xBBBB):   NOT FOUND — fallback to slow path

The Solution

Register the clone's data pointer to the same ANSFrame during ANSCV_CloneImage_S:

GetImage:     data = 0xAAAA → registry: 0xAAAA → ANSFrame #1 (refcount=1)
CloneImage:   data = 0xBBBB → registry: 0xBBBB → ANSFrame #1 (refcount=2)
CloneImage:   data = 0xCCCC → registry: 0xCCCC → ANSFrame #1 (refcount=3)
ReleaseImage: remove 0xBBBB → ANSFrame #1 (refcount=2)
ReleaseImage: remove 0xCCCC → ANSFrame #1 (refcount=1)
Next GetImage: remove 0xAAAA → ANSFrame #1 (refcount=0) → FREE all 3 images

Implementation in ANSCV_CloneImage_S (ANSOpenCV.cpp)

int ANSCV_CloneImage_S(cv::Mat** imageIn, cv::Mat** imageOut) {
    *imageOut = anscv_mat_new(**imageIn);          // deep copy display (6.2 MB)
    gpu_frame_addref(*imageIn, *imageOut);          // existing: link GpuFrameData
    ANSFrameRegistry::instance().addRef(*imageIn, *imageOut);  // NEW: link ANSFrame
    return 1;
}

Implementation in ANSCV_ReleaseImage_S (ANSOpenCV.cpp)

int ANSCV_ReleaseImage_S(cv::Mat** imageIn) {
    ANSFrameRegistry::instance().release(*imageIn);  // NEW: refcount--, free if 0
    anscv_mat_delete(imageIn);                        // existing: free Mat
    return 1;
}

Implementation in ANSFrameRegistry

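class ANSFrameRegistry {
    std::unordered_map<const uchar*, ANSFrame*> m_map;  // Mat.data → ANSFrame
    std::mutex m_mutex;

    // Drop one reference; caller must hold m_mutex
    static void unrefLocked(ANSFrame* frame) {
        if (--frame->refcount <= 0) delete frame;
    }

public:
    static ANSFrameRegistry& instance() {
        static ANSFrameRegistry registry;  // thread-safe initialization (C++11)
        return registry;
    }

    // Register original display Mat → ANSFrame
    void attach(const cv::Mat* mat, ANSFrame* frame) {
        if (!mat || !mat->data || !frame) return;
        std::lock_guard<std::mutex> lock(m_mutex);
        // A stale mapping can exist if this address was reused after an
        // unreleased Mat was freed; drop its reference before overwriting
        auto it = m_map.find(mat->data);
        if (it != m_map.end() && it->second != frame) unrefLocked(it->second);
        m_map[mat->data] = frame;
    }

    // Link clone to same ANSFrame (called from CloneImage)
    void addRef(const cv::Mat* src, const cv::Mat* dst) {
        if (!src || !dst || !src->data || !dst->data) return;
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_map.find(src->data);
        if (it == m_map.end()) return;
        ANSFrame* frame = it->second;       // copy before m_map may rehash
        auto old = m_map.find(dst->data);   // same stale-address case as attach()
        if (old != m_map.end()) {
            if (old->second == frame) return;  // already linked
            unrefLocked(old->second);
        }
        frame->refcount++;
        m_map[dst->data] = frame;
    }

    // Lookup by any Mat (original or clone). The pointer stays valid only while
    // that Mat is registered and the frame has not been force-evicted.
    ANSFrame* lookup(const cv::Mat& mat) {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_map.find(mat.data);
        return (it != m_map.end()) ? it->second : nullptr;
    }

    // Release mapping (called from ReleaseImage)
    void release(const cv::Mat* mat) {
        if (!mat || !mat->data) return;
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_map.find(mat->data);
        if (it == m_map.end()) return;
        ANSFrame* frame = it->second;
        m_map.erase(it);
        unrefLocked(frame);
    }
};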

Thread Safety

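  • Registry uses std::mutex (same pattern as ANSGpuFrameRegistry); lookups, mapping changes and refcount changes all happen under that lock
  • ANSFrame images (fullRes, inference, display) are immutable after creation, so any thread may read them while the frame is alive
  • refcount is std::atomic<int>; because every change is made under the registry lock, the atomic is a safety margin rather than the primary guard
  • ANSFrame is freed when refcount reaches 0 (all clones released), or earlier if a leak protection (TTL, pool limit, camera stop) force-frees it. The raw ANSFrame* returned by lookup() does not keep the frame alive against a force-free, so lookup() should return copies of the cv::Mat headers taken under the registry lock (OpenCV's own refcount then keeps the pixel buffers alive), or frames should be owned through std::shared_ptr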
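One way to close the force-free gap is to swap the manual refcount for std::shared_ptr (a different technique from the refcount design above, shown for comparison): the registry holds one shared_ptr per registered key, lookup() hands out a copy, and eviction only drops the registry's references, so a frame in use survives until its last user finishes. A minimal, dependency-free sketch; Frame stands in for ANSFrame and the key stands in for Mat.data, both assumptions for illustration:

```cpp
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

struct Frame {  // stand-in for ANSFrame's three images
    std::vector<unsigned char> fullRes, inference, display;
};

// Registry variant that owns frames through std::shared_ptr instead of a
// manual refcount. Keys stand in for cv::Mat data pointers.
class SharedFrameRegistry {
    std::unordered_map<const void*, std::shared_ptr<const Frame>> m_map;
    std::mutex m_mutex;

public:
    void attach(const void* key, std::shared_ptr<const Frame> frame) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_map[key] = std::move(frame);
    }
    // Returns an owning reference (nullptr if unknown). The caller's copy keeps
    // the frame alive even if the registry entry is force-evicted meanwhile.
    std::shared_ptr<const Frame> lookup(const void* key) {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_map.find(key);
        if (it == m_map.end()) return nullptr;
        return it->second;
    }
    void release(const void* key) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_map.erase(key);
    }
    // TTL / owner cleanup: drops only the registry's references.
    void evictAll() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_map.clear();
    }
};
```

The trade-off is that eviction no longer bounds memory instantly: a frame evicted while a task holds it is freed when that task drops its pointer, which is the behavior the refcount design intends anyway.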

Lifecycle Diagram

Camera Thread:                    AI Task 1:              AI Task 2:

generateANSFrame()
  → fullRes, inference, display
  → refcount = 1
  → registry: display.data → AF

GetImage returns display

CloneImage(display, &clone1) ─────► clone1
  → registry: clone1.data → AF
  → refcount = 2

CloneImage(display, &clone2) ──────────────────────────► clone2
  → registry: clone2.data → AF
  → refcount = 3

                                  lookup(clone1) → AF
                                  use AF->inference
                                  use AF->fullRes (crop)
                                  ReleaseImage(clone1)
                                  → refcount = 2

                                                          lookup(clone2) → AF
                                                          use AF->inference
                                                          ReleaseImage(clone2)
                                                          → refcount = 1

Next GetImage:
  → new ANSFrame
  → old display.data removed
  → refcount = 0 → FREE old AF

Leak Prevention (Critical)

Leak Scenarios

| Scenario | What leaks | Size per leak |
|---|---|---|
| LabVIEW forgets to call ReleaseImage | ANSFrame (fullRes + inference + display) | ~32 MB |
| Camera reconnect while clones exist | Old ANSFrame stays alive until clones released | ~32 MB |
| LabVIEW crash/abort | All ANSFrames in registry | ~32 MB × N frames |
| AI task throws exception, skips Release | ANSFrame refcount never reaches 0 | ~32 MB |

Protection 1: TTL-Based Auto-Eviction

Same pattern as ANSGpuFrameRegistry::evictStaleFrames() — periodically scan for old ANSFrames and force-free them.

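#include <chrono>
#include <list>
#include <mutex>
#include <unordered_map>
#include "ANSFrame.h"

class ANSFrameRegistry {
    static constexpr int FRAME_TTL_SECONDS = 5;     // Max lifetime of any ANSFrame
    static constexpr int EVICT_INTERVAL_MS = 1000;  // Check every 1 second

    struct Entry {
        ANSFrame* frame;
        std::chrono::steady_clock::time_point createdAt;
    };

    std::unordered_map<const uchar*, ANSFrame*> m_map;  // Mat.data → ANSFrame
    std::list<Entry> m_frames;                          // every live ANSFrame, oldest first
    std::chrono::steady_clock::time_point m_lastEvict{};
    std::mutex m_mutex;

public:
    // NOTE: when release() frees a frame (refcount → 0) it must also erase the
    // frame's m_frames entry; otherwise this pass deletes it a second time.
    void evictStale() {
        const auto now = std::chrono::steady_clock::now();
        std::lock_guard<std::mutex> lock(m_mutex);  // also guards m_lastEvict
        // Throttle: only run every EVICT_INTERVAL_MS
        if (now - m_lastEvict < std::chrono::milliseconds(EVICT_INTERVAL_MS)) return;
        m_lastEvict = now;

        for (auto it = m_frames.begin(); it != m_frames.end(); ) {
            if (now - it->createdAt > std::chrono::seconds(FRAME_TTL_SECONDS)) {
                // Force-free regardless of refcount: remove all Mat* mappings first
                ANSFrame* frame = it->frame;
                for (auto mit = m_map.begin(); mit != m_map.end(); ) {
                    if (mit->second == frame) mit = m_map.erase(mit);
                    else ++mit;
                }
                delete frame;
                it = m_frames.erase(it);
            } else {
                ++it;
            }
        }
    }
};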

Call evictStale() from GetImage() (piggybacked on camera thread activity — same as gpu_frame_evict_stale()).
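The TTL bookkeeping is easier to test with the clock passed in rather than read inside. A dependency-free, single-threaded model of the same policy (locking omitted; all names are illustrative stand-ins for ANSFrameRegistry members): entries older than the TTL are force-freed together with all their key mappings, and a normal release that drops the last reference also removes the age entry, so eviction can never delete a frame twice:

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

struct TtlFrame { int refcount = 1; };  // stand-in for ANSFrame

class TtlRegistry {
    struct Entry { TtlFrame* frame; Clock::time_point createdAt; };
    std::list<Entry> m_frames;                         // creation order, oldest first
    std::unordered_map<const void*, TtlFrame*> m_map;  // Mat.data stand-in -> frame

    // Removes the frame's age entry and every key mapping, then frees it.
    void forget(TtlFrame* f) {
        m_frames.remove_if([f](const Entry& e) { return e.frame == f; });
        for (auto it = m_map.begin(); it != m_map.end(); )
            it = (it->second == f) ? m_map.erase(it) : std::next(it);
        delete f;
    }

public:
    TtlRegistry() = default;
    TtlRegistry(const TtlRegistry&) = delete;
    TtlRegistry& operator=(const TtlRegistry&) = delete;
    ~TtlRegistry() { while (!m_frames.empty()) forget(m_frames.front().frame); }

    // `now` must not decrease between calls, so m_frames stays age-ordered.
    TtlFrame* create(const void* key, Clock::time_point now) {
        auto* f = new TtlFrame();
        m_frames.push_back({f, now});
        m_map[key] = f;
        return f;
    }
    void addRef(const void* src, const void* dst) {
        auto it = m_map.find(src);
        if (it == m_map.end()) return;
        TtlFrame* f = it->second;  // copy before operator[] may rehash
        ++f->refcount;
        m_map[dst] = f;
    }
    void release(const void* key) {
        auto it = m_map.find(key);
        if (it == m_map.end()) return;      // unknown or already force-freed
        TtlFrame* f = it->second;
        m_map.erase(it);
        if (--f->refcount <= 0) forget(f);  // also drops its m_frames entry
    }
    // Force-frees every frame older than ttl, regardless of refcount.
    void evictStale(Clock::time_point now, std::chrono::seconds ttl) {
        while (!m_frames.empty() && now - m_frames.front().createdAt > ttl)
            forget(m_frames.front().frame);
    }
    std::size_t liveFrames() const { return m_frames.size(); }
    bool has(const void* key) const { return m_map.count(key) != 0; }
};
```

Because m_frames is kept in creation order, the eviction loop can stop at the first entry that is still young instead of scanning the whole list.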

Protection 2: Max ANSFrame Pool Size

Limit total number of live ANSFrames. If pool is full, force-free the oldest before creating a new one.

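static constexpr std::size_t MAX_ANSFRAMES = 100;  // Max live frames across all cameras (~3.2 GB at ~32 MB each)

ANSFrame* createANSFrame(...) {
    evictStale();  // Clean up expired frames first (takes m_mutex itself)

    auto* frame = new ANSFrame();
    // ... populate (outside the lock: generation takes ~6-10ms) ...

    std::lock_guard<std::mutex> lock(m_mutex);
    // If still over limit, force-free the oldest (front of m_frames)
    while (m_frames.size() >= MAX_ANSFRAMES) {
        ANSFrame* oldest = m_frames.front().frame;
        for (auto mit = m_map.begin(); mit != m_map.end(); ) {
            if (mit->second == oldest) mit = m_map.erase(mit);
            else ++mit;
        }
        delete oldest;
        m_frames.pop_front();
    }

    m_frames.push_back({frame, std::chrono::steady_clock::now()});
    return frame;
}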

Protection 3: Camera-Scoped Cleanup

When a camera is stopped or destroyed, force-free ALL ANSFrames belonging to that camera (regardless of refcount).

// In ANSRTSPClient::Stop() and Destroy():
ANSFrameRegistry::instance().releaseByOwner(this);
_currentANSFrame = nullptr;  // the frame was just freed; don't keep a dangling pointer

// In ANSFrameRegistry:
void releaseByOwner(void* owner) {
    std::lock_guard<std::mutex> lock(m_mutex);
    for (auto it = m_frames.begin(); it != m_frames.end(); ) {
        if (it->frame->owner == owner) {
            // Remove all Mat* mappings
            for (auto mit = m_map.begin(); mit != m_map.end(); ) {
                if (mit->second == it->frame) mit = m_map.erase(mit);
                else ++mit;
            }
            delete it->frame;
            it = m_frames.erase(it);
        } else {
            ++it;
        }
    }
}

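Protection 4: One Current ANSFrame Per Camera (Latest-Frame Slot)

Each camera keeps only its latest ANSFrame. When a new frame arrives, the camera drops its reference to the previous ANSFrame (its display mapping is removed and refcount decremented). This bounds the camera's own references to one ANSFrame; older frames stay alive only while clones still hold them.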

class ANSRTSPClient {
    ANSFrame* _currentANSFrame = nullptr;  // guard with the mutex GetImage() uses

    void onNewFrame(AVFrame* decoded) {
        ANSFrame* newFrame = generateANSFrame(decoded);
        newFrame->owner = this;

        // Replace old frame: detachOwner must erase the old display.data mapping
        // and decrement refcount under the registry lock. Detaching before the
        // attach below also avoids a stale key if the allocator reuses the address.
        if (_currentANSFrame) {
            ANSFrameRegistry::instance().detachOwner(_currentANSFrame);
            // If refcount reaches 0, freed immediately
            // If clones still hold refs, freed when they release
        }
        _currentANSFrame = newFrame;
        ANSFrameRegistry::instance().attach(&newFrame->display, newFrame);
    }
};

Protection 5: ANSFrame Struct with Owner Tracking

struct ANSFrame {
    // ... existing fields ...

    // Leak protection
    void* owner = nullptr;          // Camera that created this frame
    std::chrono::steady_clock::time_point createdAt;
    std::atomic<int> refcount{1};

    ~ANSFrame() {
        // Images are cv::Mat — automatically freed by OpenCV refcount
        // No manual cleanup needed for fullRes, inference, display
    }
};

Memory Budget Analysis

With all protections:

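| Cameras | Max ANSFrames | Memory (worst case) |
|---|---|---|
| 5 running | 5 current + up to ~10 older frames pinned by in-flight clones | 5 × 32 MB = 160 MB if clones share the current frames; up to 15 × 32 MB ≈ 480 MB if each clone pins a different older frame |
| 20 running | 20 current + up to ~40 pinned by in-flight clones | 640 MB to ~1.9 GB |
| 100 created, 5 running | 5 current + up to ~10 pinned | 160 MB to ~480 MB |
| 100 created, 95 stopped | 0 for the stopped cameras (Stop frees their ANSFrames) | 0 MB for the stopped cameras |

Worst case is bounded by (running cameras + older frames still pinned by unreleased clones) × ~32 MB, and never exceeds MAX_ANSFRAMES × ~32 MB ≈ 3.2 GB. The TTL keeps pinned frames from accumulating, so usage does not grow over time.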
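The bound can be written as a small function. A sketch of the arithmetic, assuming ~32.3 MB per ANSFrame from a 4K source (24.9 + 6.2 + 1.2 MB) and the MAX_ANSFRAMES = 100 cap from Protection 2; the function name is illustrative:

```cpp
#include <algorithm>
#include <cstdint>

// 4K fullRes + 1080p display + 640x640 inference, in bytes.
constexpr std::int64_t kFrameBytes = 32'332'800;

// Upper bound on ANSFrame memory: one current frame per running camera plus
// every distinct older frame still pinned by an unreleased clone, capped by
// the pool limit.
constexpr std::int64_t ansFrameBoundBytes(int runningCameras, int pinnedOlderFrames,
                                          int maxFrames) {
    return std::min<std::int64_t>(runningCameras + pinnedOlderFrames, maxFrames) *
           kFrameBytes;
}
```

With 5 cameras this ranges from 161,664,000 B (no pinned frames) to 484,992,000 B (10 pinned frames); the pool cap limits any scenario to 3,233,280,000 B. Cameras with 1080p sources need less, since their display image shares the fullRes buffer.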

TTL Guarantee

Even if ALL protections fail, the 5-second TTL eviction ensures:

  • Maximum leak duration: 5 seconds
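  • Maximum leaked memory: if every frame leaks a clone, up to cameras × 5 seconds × 10 FPS × ~32 MB, further capped by MAX_ANSFRAMES × ~32 MB ≈ 3.2 GB. The latest-frame slot (Protection 4) bounds only the camera's own reference, not frames pinned by leaked clones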
  • Periodic cleanup on every GetImage call ensures no accumulation

Replacing GpuFrameRegistry

Current State (wasteful with NV12 disabled)

With _useNV12FastPath = false (current default), GpuFrameRegistry is never populated — no gpu_frame_attach is called. But gpu_frame_addref, gpu_frame_remove, and gpu_frame_evict_stale still run on every clone/release/replace — doing empty lookups that waste CPU cycles.

Current code paths that run but do nothing:
  ANSCV_CloneImage_S   → gpu_frame_addref → lookup → NOT FOUND → no-op
  ANSCV_ReleaseImage_S → gpu_frame_remove → lookup → NOT FOUND → no-op
  anscv_mat_replace    → gpu_frame_remove → lookup → NOT FOUND → no-op
  anscv_mat_replace    → gpu_frame_evict_stale → scans empty registry → no-op

Plan: ANSFrameRegistry replaces GpuFrameRegistry

ANSFrameRegistry serves the same purpose (mapping cv::Mat* → frame metadata) but without GPU complexity:

| Feature | GpuFrameRegistry | ANSFrameRegistry |
|---|---|---|
| Maps Mat* to | GpuFrameData (NV12 GPU pointers) | ANSFrame (3 CPU images) |
| Used when | NV12 fast path enabled | Always (SW or HW decode) |
| GPU dependency | CUDA, pool slots, D2D copy | None |
| Thread safety | mutex + atomic refcount | mutex + atomic refcount |
| Cleanup | TTL eviction + pool cooldown | TTL eviction (simpler) |

Migration Path

  1. Phase 1 (implement ANSFrame): ANSFrameRegistry runs alongside GpuFrameRegistry

    • CloneImage: calls both gpu_frame_addref + ansframe_addref
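    • ReleaseImage: calls both gpu_frame_remove + ansframe_release
    • anscv_mat_replace: also calls ansframe_release before overwriting a Mat, mirroring its existing gpu_frame_remove call, so the old data pointer's mapping cannot go stale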
    • Safe: both registries handle NOT FOUND gracefully
  2. Phase 2 (NV12 disabled permanently): Remove GpuFrameRegistry calls

    • Remove gpu_frame_addref from CloneImage
    • Remove gpu_frame_remove from ReleaseImage and anscv_mat_replace
    • Remove gpu_frame_evict_stale from anscv_mat_replace
    • Keep GpuFrameRegistry code for future NV12 re-enablement
  3. Phase 3 (optional, if NV12 re-enabled): Merge into single registry

    • ANSFrame struct gains optional GPU fields (yPlane, uvPlane, poolSlot)
    • Single registry, single refcount, single lookup

Backward Compatibility

  • If ANSFrame is not available (e.g., old camera module), engines fall back to current behavior (resize input image)
  • The cv::Mat** API stays the same — LabVIEW doesn't need changes
  • ANSFrame is transparent to LabVIEW — it only sees the display image
  • The GetANSFrameInference / GetANSFrameFullRes APIs are optional for advanced use

Risk Assessment

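| Risk | Mitigation |
|---|---|
| Extra 7.4 MB RAM per frame | Small vs ~225 MB saved on the 12 dispatch clones |
| ANSFrame lifecycle (refcount) | Same pattern as GpuFrameData (proven) |
| Coordinate mapping errors | letterboxRatio stored in ANSFrame (deterministic) |
| YUV plane resize quality | Same approach as GPU NV12 resize (equivalent up to rounding) |
| Thread safety | ANSFrame is immutable after creation, so it is safe to share; force-free paths (TTL, pool limit, camera stop) must not free a frame that a task is still reading |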