# ANSFrame Multi-Resolution Architecture Plan
## Detailed Comparison: Current vs ANSFrame
### Test Configuration
- GPU: RTX 4070 Laptop (8 GB VRAM)
- Cameras: 5 running (3840×2160, 2880×1620, 1920×1080)
- AI Tasks: 12 (subscribing to cameras)
- Engines: 7 TRT engines
- Decode: Software (YUV420P, CPU)
- Frame rate: SetTargetFPS(100) = ~10 FPS per camera
- Baseline: 8.4 hour stable run (ANSLEGION42), 1,048,325 inferences
### Per-Frame Generation (inside GetImage)
| Step | Current | ANSFrame |
|---|---|---|
| Decode YUV420P | ~5ms (CPU) | ~5ms (same) |
| Full res BGR (cvtColor) | ~4-8ms (4K YUV→BGR) | ~4-8ms (same) |
| 640×640 letterbox | Not done here | ~0.8ms (resize YUV planes + cvtColor, done ONCE) |
| 1080p display | Not done (returns 4K) | ~1.5ms (resize YUV planes + cvtColor, done ONCE) |
| **Total GetImage time (excl. decode)** | **~4-8ms** | **~6-10ms** (+~2ms for 2 extra sizes) |
### Clone & Dispatch to AI (per clone × 12 AI tasks)
| Step | Current (4K clone) | ANSFrame (1080p clone) |
|---|---|---|
| Clone image size | 3840×2160×3 = **24.9 MB** | 1920×1080×3 = **6.2 MB** |
| memcpy time per clone | **3-5ms** | **0.8-1.2ms** |
| 12 clones total size | **299 MB** | **74 MB** |
| 12 clones total time | **36-60ms** | **10-14ms** |
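The per-clone sizes above follow from width × height × bytes per pixel: 3 for packed BGR and 1.5 for YUV420P (I420). Here is a minimal sketch of that arithmetic for re-checking the tables when camera resolutions change. The helper names are hypothetical, and 1 MB means 10^6 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Packed 8-bit BGR: 3 bytes per pixel.
constexpr std::size_t bgrBytes(std::size_t w, std::size_t h) { return w * h * 3; }

// Planar YUV420P (I420): full-size Y plane plus two quarter-size chroma planes,
// i.e. 1.5 bytes per pixel. Assumes even width and height.
constexpr std::size_t i420Bytes(std::size_t w, std::size_t h) {
    return w * h + 2 * ((w / 2) * (h / 2));
}

// Total bytes copied when every AI task receives its own deep copy.
constexpr std::size_t cloneBytes(std::size_t perClone, std::size_t tasks) {
    return perClone * tasks;
}

// Compile-time checks against the figures used in this document.
static_assert(bgrBytes(3840, 2160) == 24'883'200, "4K BGR ~24.9 MB");
static_assert(bgrBytes(1920, 1080) == 6'220'800, "1080p BGR ~6.2 MB");
static_assert(bgrBytes(640, 640) == 1'228'800, "640x640 BGR ~1.2 MB");
static_assert(i420Bytes(3840, 2160) == 12'441'600, "4K YUV420P ~12.4 MB");
```

At 1.5 bytes/pixel, the 4K YUV420P source is half the size of its BGR conversion. That is the basis for resizing at the plane level (Option A below).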
### AI Preprocessing (per AI task)
| Step | Current | ANSFrame |
|---|---|---|
| Receive image | 4K BGR (24.9 MB) | 1080p BGR (6.2 MB) + ANSFrame ref |
| Local clone in engine | 24.9 MB memcpy (~3ms) | 6.2 MB memcpy (~0.8ms) |
| CPU letterbox resize | 4K→640×640 (~2-3ms) | **SKIP** (use ANSFrame.inference, 0ms) |
| BGR→RGB | 640×640 (~0.3ms) | 640×640 (~0.3ms) |
| GPU upload | 1.2 MB (~0.1ms) | 1.2 MB (~0.1ms) |
| **Total preprocess per task** | **~5-6ms** | **~1.2ms** |
| **12 tasks total** | **~60-72ms** | **~14ms** |
### Pipeline Crop Quality (ALPR, Face Recognition)
| Step | Current | ANSFrame |
|---|---|---|
| Detection from | 4K image (resized per task) | 1080p display clone → ANSFrame.inference (640×640, pre-computed) |
| Crop source | Same 4K image | ANSFrame.fullRes (4K original) |
| **Crop quality** | **4K** | **4K** (identical) |
### Total Processing Time Per Frame Cycle
| Phase | Current (4K) | ANSFrame | Savings |
|---|---|---|---|
| GetImage generation | 4-8ms | 6-10ms | -2ms |
| Clone × 12 | 36-60ms | 10-14ms | **+26-46ms** |
| AI preprocess × 12 | 60-72ms | 14ms | **+46-58ms** |
| TRT inference × 12 | 60-600ms | 60-600ms | Same |
| Postprocess × 12 | 12-60ms | 12-60ms | Same |
| **Total CPU overhead (excl. TRT inference)** | **112-200ms** | **42-98ms** | **~70-100ms saved** |
| **Total with inference** | **172-800ms** | **102-698ms** | **~70-100ms saved** |
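Every row is a min-max range, so totals and savings are easy to get wrong by hand. Here is a small sketch that sums the ranges and subtracts them endpoint by endpoint, best case against best case and worst against worst, using the per-row figures from the table above. `RangeMs`, `sum`, and `savings` are hypothetical helpers.

```cpp
#include <cassert>
#include <initializer_list>

// A [lo, hi] timing range in milliseconds.
struct RangeMs {
    double lo;
    double hi;
};

// Sum ranges endpoint by endpoint.
RangeMs sum(std::initializer_list<RangeMs> parts) {
    RangeMs total{0.0, 0.0};
    for (const RangeMs& p : parts) {
        total.lo += p.lo;
        total.hi += p.hi;
    }
    return total;
}

// Compare like with like: best case vs best case, worst vs worst.
RangeMs savings(RangeMs before, RangeMs after) {
    return {before.lo - after.lo, before.hi - after.hi};
}
```

Endpoint-wise subtraction assumes that both pipelines hit their best cases together. Treat the result as a rough bound, not a distribution.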
### RAM Usage
| Resource | Current | ANSFrame |
|---|---|---|
| GetImage output (per camera) | 24.9 MB (4K BGR) | 32.3 MB (3 images) |
| Clones in flight (12 tasks) | 12 × 24.9 = **299 MB** | 12 × 6.2 = **74 MB** |
| AI local clone (12 tasks) | 12 × 24.9 = **299 MB** | 12 × 6.2 = **74 MB** |
| ANSFrame shared data | 0 | (same 32.3 MB as GetImage output; not counted twice) |
| **Total RAM per frame cycle** | **~623 MB** | **~180 MB** |
| **Peak RAM (5 cams, 2-3 cycles)** | **~1.2-1.9 GB** | **~0.4-0.5 GB** |
### GPU / VRAM Usage
| Resource | Current | ANSFrame |
|---|---|---|
| VRAM (engines + workspace) | ~5.9 GB | ~5.9 GB (same) |
| GPU upload per inference | 1.2 MB | 1.2 MB (same) |
| PCIe bandwidth | ~300 MB/s | ~300 MB/s (same) |
| SM utilization | 0-34% | 0-34% (same) |
### CPU Usage
| Component | Current | ANSFrame |
|---|---|---|
| SW decode (5 cameras) | ~3-5 cores | ~3-5 cores (same) |
| YUV→BGR generation | ~0.3 cores | ~0.42 cores (+0.12 for extra sizes) |
| CPU resize per AI task | ~1.5 cores | **0 cores (pre-computed)** |
| Clone memcpy | ~2.4 cores | **~0.5 cores** |
| **Total CPU for pipeline** | **~7-9 cores** | **~4-6 cores** |
| **CPU savings** | — | **~3 cores freed** |
### LabVIEW Thread Scheduling Impact
| Factor | Current | ANSFrame |
|---|---|---|
| Data per task dispatch | 24.9 MB | **6.2 MB** (4x less) |
| Memory allocation pressure | 299 MB in flight | **74 MB** (4x less) |
| Cache efficiency | Poor (24.9 MB > L3) | **Better** (6.2 MB closer to L3) |
| **Processing time (LabVIEW)** | **100-500ms** | **70-300ms (~30-40% faster)** |
### Final Summary
| Metric | Current (8h stable) | ANSFrame (projected) | Improvement |
|---|---|---|---|
| Clone time (12 tasks) | 36-60ms | 10-14ms | **3-4x faster** |
| Preprocess (12 tasks) | 60-72ms | 14ms | **4-5x faster** |
| CPU overhead total (excl. TRT inference) | 112-200ms | 42-98ms | **~2-2.7x less** |
| RAM usage (frames) | ~1.2-1.9 GB | ~0.4-0.5 GB | **~70% less** |
| CPU cores for pipeline | ~7-9 cores | ~4-6 cores | **3 cores freed** |
| VRAM | No change | No change | — |
| Crop quality (ALPR/FR) | 4K | 4K | Same |
| Processing time | 100-500ms | 70-300ms | **~30-40% faster** |
| Code complexity | Simple | Medium | +~500 lines |
---
## Overview
Replace the single-resolution `cv::Mat**` output from `GetImage` with a multi-resolution `ANSFrame` that holds 3 images pre-computed from the decoded YUV420P frame. This eliminates redundant per-task resizing and cuts clone/memcpy volume by ~4x (6.2 MB per 1080p clone instead of 24.9 MB per 4K clone).
## Current Flow (Inefficient)
```
Decoder → YUV420P → cvtColor → 4K BGR (25 MB) → GetImage returns 4K
LabVIEW: clone 4K (25 MB × 12 tasks = 300 MB memcpy)
Each AI task: CPU resize 4K → 640×640 (redundant × 12 tasks)
```
## New Flow (Optimized)
```
Decoder → YUV420P → generate 3 images from YUV planes:
├─ Full resolution BGR (for crop/pipeline)
├─ 640×640 letterbox BGR (for detection inference)
└─ 1080p BGR (for display, configurable)
GetImage returns ANSFrame (contains refs to all 3 images)
LabVIEW: clone 1080p only (6.2 MB × 12 = 74 MB, was 300 MB)
AI task: uses the pre-computed 640×640 image directly (1.2 MB, read via the registry, no resize needed)
Pipeline crop: uses full resolution image (no upscaling artifacts)
```
## Performance Comparison
### Resize from YUV420P planes (Option A — RECOMMENDED)
```
4K YUV420P frame (12.4 MB in 3 planes)
├─ Full res: cvtColor(Y+U+V → BGR) ~4-8ms
│ Result: 3840×2160 BGR (24.9 MB)
├─ 640×640: resize Y(640×360) + U,V(320×180) ~0.5ms
│ + pad bottom + cvtColor ~0.3ms
│ Result: 640×640 BGR (1.2 MB) Total: ~0.8ms
└─ 1080p: resize Y(1920×1080) + U,V(960×540) ~1.0ms
+ cvtColor ~0.5ms
Result: 1920×1080 BGR (6.2 MB) Total: ~1.5ms
Total generation time: ~6-10ms (all 3 images)
```
### Resize from BGR (Option B — slower)
```
4K YUV420P → cvtColor → 4K BGR (24.9 MB) ~4-8ms
├─ Full res: already done ~0ms
├─ 640×640: cv::resize(4K BGR → 640×640) + letterbox ~2-3ms
└─ 1080p: cv::resize(4K BGR → 1080p) ~1-2ms
Total generation time: ~7-13ms (all 3 images)
```
### Recommendation: Option A (resize YUV planes)
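Option A is faster because YUV420P is 1.5 bytes/pixel versus 3 bytes/pixel for BGR, so there is half as much data to resize. Generating the two extra sizes costs ~2.3ms instead of ~3-5ms, which saves ~1-3ms per frame overall (roughly 15-25%). Because `cvtColor` is applied after the resize (the same order the GPU NV12 resize uses), output quality is essentially equivalent. It is not bit-identical, because chroma is resampled before conversion rather than after.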
## ANSFrame Structure
### Header: `include/ANSFrame.h`
```cpp
#pragma once
#include <opencv2/core/mat.hpp>
#include <atomic>
#include <cstdint>
// ANSFrame holds pre-computed multi-resolution images from a single decoded frame.
// Generated once in avframeYUV420PToCvMat, shared across all AI tasks via registry.
// Eliminates per-task resize and reduces clone size by 20x.
struct ANSFrame {
// --- Pre-computed images (all BGR, CPU RAM) ---
cv::Mat fullRes; // Original resolution (e.g., 3840×2160) — for crop/pipeline
cv::Mat inference; // Model input size (e.g., 640×640 letterbox) — for detection
cv::Mat display; // Display resolution (e.g., 1920×1080) — for LabVIEW UI
// --- Metadata ---
int originalWidth = 0; // Original frame width before any resize
int originalHeight = 0; // Original frame height before any resize
int inferenceWidth = 0; // Inference image width (e.g., 640)
int inferenceHeight = 0;// Inference image height (e.g., 640)
float letterboxRatio = 1.0f; // Scale ratio used for letterbox (for coordinate mapping)
int64_t pts = 0; // Presentation timestamp
// --- Configuration (set per camera) ---
int displayMaxHeight = 1080; // Configurable display resolution
int inferenceSize = 640; // Configurable inference size (default 640)
// --- Lifecycle ---
std::atomic<int> refcount{1};
};
```
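The `letterboxRatio` field lets detections on the 640×640 image be mapped back to original-resolution coordinates. The sketch below shows that geometry under three assumptions: the scaled image is placed top-left with padding on the right/bottom (as in Step 2 below), the unpadded dimensions are rounded down to even values so the I420 chroma planes divide cleanly, and the ratio follows the `1 / r` convention. `LetterboxGeometry`, `computeLetterbox`, and `toOriginal` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

struct LetterboxGeometry {
    int unpadW = 0;      // width of the scaled image inside the square canvas
    int unpadH = 0;      // height of the scaled image inside the square canvas
    float ratio = 1.0f;  // original / inference scale (same convention as letterboxRatio = 1 / r)
};

// Scale a W x H frame to fit inside a size x size canvas, preserving aspect ratio.
// Dimensions are rounded down to even values so U/V planes are exactly half size.
LetterboxGeometry computeLetterbox(int W, int H, int size) {
    assert(W > 0 && H > 0 && size > 0);
    const float r = std::min(static_cast<float>(size) / W, static_cast<float>(size) / H);
    LetterboxGeometry g;
    g.unpadW = static_cast<int>(r * W) & ~1;
    g.unpadH = static_cast<int>(r * H) & ~1;
    g.ratio = 1.0f / r;
    return g;
}

// Map a coordinate detected on the letterboxed image back to original-frame pixels.
// With top-left placement there is no offset to subtract, only a scale.
inline float toOriginal(float inferenceCoord, const LetterboxGeometry& g) {
    return inferenceCoord * g.ratio;
}
```

Rounding to even changes the effective vertical scale by at most one pixel of the 640-pixel canvas, which is normally negligible for detection.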
### Changes to `ANSFrame` Registry
The existing `ANSGpuFrameRegistry` can be extended, or a new `ANSFrameRegistry` created, to map a `cv::Mat`'s `data` pointer (initially the display image's) to its parent `ANSFrame`. When LabVIEW clones the display image and sends it to AI, the AI task can look up the parent `ANSFrame` to access the inference or full-res image.
```cpp
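#include <mutex>
#include <unordered_map>
#include <opencv2/core/mat.hpp>
#include "ANSFrame.h"

// Registry: maps a Mat's data pointer (display image or any of its clones) → ANSFrame*
class ANSFrameRegistry {
    std::unordered_map<const uchar*, ANSFrame*> m_map;  // key = Mat.data pointer
    std::mutex m_mutex;
public:
    static ANSFrameRegistry& instance();                  // process-wide singleton
    void attach(const cv::Mat* displayMat, ANSFrame* frame);
    void addRef(const cv::Mat* src, const cv::Mat* dst);  // link a clone to src's frame
    ANSFrame* lookup(const cv::Mat& mat);                 // lookup by data pointer
    void release(const cv::Mat* mat);
};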
```
## Implementation Steps
### Step 1: Create ANSFrame structure and registry
**Files to create:**
- `include/ANSFrame.h` — ANSFrame struct definition
- `modules/ANSCV/ANSFrameRegistry.h` — Registry mapping display Mat → ANSFrame
- `modules/ANSCV/ANSFrameRegistry.cpp` — Registry implementation
**Key design decisions:**
- ANSFrame is allocated per decoded frame, shared across all clones
- refcount counts the camera's own reference plus one per clone that references this frame
- When refcount → 0, all 3 images are freed
### Step 2: Generate multi-resolution images in avframeYUV420PToCvMat
**File to modify:** `MediaClient/media/video_player.cpp`
Replace the current `avframeYUV420PToCvMat`, which returns a single BGR image, with a new version that populates an ANSFrame with 3 images.
```cpp
ANSFrame* CVideoPlayer::generateANSFrame(const AVFrame* frame) {
    auto* ansFrame = new ANSFrame();
    const int W = frame->width;
    const int H = frame->height;
    ansFrame->originalWidth = W;
    ansFrame->originalHeight = H;

    // Wrap the decoded planes without copying; linesize is the row stride (may exceed width).
    cv::Mat yFull(H, W, CV_8UC1, frame->data[0], frame->linesize[0]);
    cv::Mat uFull(H / 2, W / 2, CV_8UC1, frame->data[1], frame->linesize[1]);
    cv::Mat vFull(H / 2, W / 2, CV_8UC1, frame->data[2], frame->linesize[2]);

    // 1. Full resolution: direct cvtColor (no resize)
    cv::Mat yuv(H * 3 / 2, W, CV_8UC1);
    // ... copy planes ...
    cv::cvtColor(yuv, ansFrame->fullRes, cv::COLOR_YUV2BGR_I420);

    // 2. Inference size (640×640 letterbox from YUV planes)
    const int infSize = ansFrame->inferenceSize;  // default 640
    const float r = std::min((float)infSize / W, (float)infSize / H);
    const int unpadW = (int)(r * W) & ~1;  // even, so U/V are exactly half size
    const int unpadH = (int)(r * H) & ~1;
    ansFrame->letterboxRatio = 1.0f / r;

    cv::Mat yResized, uResized, vResized;
    cv::resize(yFull, yResized, cv::Size(unpadW, unpadH));
    cv::resize(uFull, uResized, cv::Size(unpadW / 2, unpadH / 2));
    cv::resize(vFull, vResized, cv::Size(unpadW / 2, unpadH / 2));

    // Assemble padded I420 buffer. Pad luma with 114 and chroma with 128 (neutral):
    // filling U/V with 114 as well would tint the padding instead of producing gray.
    cv::Mat yuvInf(infSize * 3 / 2, infSize, CV_8UC1, cv::Scalar(128));
    yuvInf.rowRange(0, infSize).setTo(cv::Scalar(114));
    yResized.copyTo(yuvInf(cv::Rect(0, 0, unpadW, unpadH)));
    // ... copy U, V into their packed planes (stored after the Y plane) ...
    cv::cvtColor(yuvInf, ansFrame->inference, cv::COLOR_YUV2BGR_I420);
    ansFrame->inferenceWidth = infSize;
    ansFrame->inferenceHeight = infSize;

    // 3. Display resolution (1080p from YUV planes)
    // If H <= displayMaxHeight, display can simply share fullRes (no resize needed).
    const int dispH = ansFrame->displayMaxHeight;
    const float dispScale = (float)dispH / H;
    const int dispW = (int)(W * dispScale) & ~1;  // keep even for I420
    cv::Mat yDisp, uDisp, vDisp;
    cv::resize(yFull, yDisp, cv::Size(dispW, dispH));
    cv::resize(uFull, uDisp, cv::Size(dispW / 2, dispH / 2));
    cv::resize(vFull, vDisp, cv::Size(dispW / 2, dispH / 2));
    cv::Mat yuvDisp(dispH * 3 / 2, dispW, CV_8UC1);
    // ... assemble I420 ...
    cv::cvtColor(yuvDisp, ansFrame->display, cv::COLOR_YUV2BGR_I420);

    return ansFrame;
}
```
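The elided "copy planes" and "assemble I420" steps in the snippet above involve three details that are easy to get wrong:

- AVFrame planes have row strides (`linesize`) that can be larger than their width.
- In an I420 buffer, the U and V planes are stored back to back after the Y plane, not as a 2-D region.
- Letterbox padding needs 128 in U/V (neutral chroma) rather than the luma pad value.

The sketch below shows that assembly without OpenCV. `PlaneView`, `copyPlane`, and `packI420Letterbox` are hypothetical names, and the 114 luma pad follows the plan's gray padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read-only view of one 8-bit plane with an explicit row stride (like AVFrame::linesize).
struct PlaneView {
    const std::uint8_t* data;
    int width;
    int height;
    int stride;  // bytes between the starts of consecutive rows, >= width
};

// Copy a strided plane into the top-left corner of a tightly packed destination plane.
static void copyPlane(const PlaneView& src, std::uint8_t* dst, int dstWidth) {
    for (int row = 0; row < src.height; ++row) {
        std::memcpy(dst + static_cast<std::size_t>(row) * dstWidth,
                    src.data + static_cast<std::size_t>(row) * src.stride,
                    static_cast<std::size_t>(src.width));
    }
}

// Build a canvasW x canvasH I420 buffer: Y plane, then U plane, then V plane, tightly packed.
// The already-resized planes go top-left; the rest is filled with Y = lumaPad and
// U = V = 128 so the padding converts to neutral gray.
std::vector<std::uint8_t> packI420Letterbox(const PlaneView& y, const PlaneView& u,
                                            const PlaneView& v, int canvasW, int canvasH,
                                            std::uint8_t lumaPad = 114) {
    assert(canvasW % 2 == 0 && canvasH % 2 == 0);
    assert(y.width <= canvasW && y.height <= canvasH);
    assert(u.width * 2 == y.width && u.height * 2 == y.height);
    assert(v.width == u.width && v.height == u.height);

    const std::size_t ySize = static_cast<std::size_t>(canvasW) * canvasH;
    const std::size_t cSize = ySize / 4;  // each chroma plane is (W/2) x (H/2)
    std::vector<std::uint8_t> out(ySize + 2 * cSize);

    std::memset(out.data(), lumaPad, ySize);          // Y padding
    std::memset(out.data() + ySize, 128, 2 * cSize);  // U and V padding (neutral chroma)

    copyPlane(y, out.data(), canvasW);
    copyPlane(u, out.data() + ySize, canvasW / 2);
    copyPlane(v, out.data() + ySize + cSize, canvasW / 2);
    return out;
}
```

With OpenCV, you can wrap the result without copying as `cv::Mat(canvasH * 3 / 2, canvasW, CV_8UC1, buf.data())` and pass it to `cv::cvtColor(..., cv::COLOR_YUV2BGR_I420)`.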
### Step 3: Modify GetImage to return display image + attach ANSFrame
**File to modify:** `modules/ANSCV/ANSRTSP.cpp` (and other ANSCV classes)
```cpp
cv::Mat ANSRTSPClient::GetImage(int& width, int& height, int64_t& pts) {
    // ... existing logic to get frame from player ...

    // GetImage returns the DISPLAY image (1080p);
    // the ANSFrame is attached to that Mat via the registry.
    ANSFrame* frame = _currentANSFrame;
    if (!frame) return cv::Mat();  // no frame decoded yet

    width = frame->display.cols;
    height = frame->display.rows;
    pts = frame->pts;

    // Register: display Mat's data pointer → ANSFrame
    ANSFrameRegistry::instance().attach(&frame->display, frame);
    return frame->display;  // 1080p, ~6.2 MB (was 25 MB for 4K); shares data, no copy
}
```
### Step 4: Modify ANSCV_CloneImage_S to link clone to ANSFrame
**File to modify:** `modules/ANSCV/ANSOpenCV.cpp`
```cpp
int ANSCV_CloneImage_S(cv::Mat** imageIn, cv::Mat** imageOut) {
    *imageOut = anscv_mat_new(**imageIn);  // clone display image (6.2 MB, was 25 MB)
    // Link clone to the same ANSFrame (refcount++). addRef does the lookup, the
    // increment and the new mapping under one lock, so the frame cannot be freed
    // between lookup and increment (a separate lookup() + refcount++ would race).
    ANSFrameRegistry::instance().addRef(*imageIn, *imageOut);
    return 1;
}
```
### Step 5: Modify engine Preprocess to use ANSFrame inference image
**Files to modify:** All engine Preprocess functions
```cpp
// In ANSRTYOLO::DetectObjects (and all other engines):
std::vector<std::vector<cv::cuda::GpuMat>> ANSRTYOLO::Preprocess(
        const cv::Mat& inputImage, ImageMetadata& outMeta) {
    // Try to get pre-resized inference image from ANSFrame
    ANSFrame* frame = ANSFrameRegistry::instance().lookup(inputImage);
    cv::Mat srcForInference;
    // inputW/inputH: this engine's model input size (member names assumed).
    // Compare those against the pre-computed size: inputImage is the 1080p display
    // clone, so its width cannot tell us which buffer to use.
    if (frame && !frame->inference.empty() &&
        inputW == frame->inferenceWidth && inputH == frame->inferenceHeight) {
        // Use pre-computed 640×640: ZERO resize needed
        srcForInference = frame->inference;
        outMeta.imgHeight = frame->originalHeight;
        outMeta.imgWidth = frame->originalWidth;
        outMeta.ratio = frame->letterboxRatio;
    } else if (frame && !frame->fullRes.empty()) {
        // Model input differs from the pre-computed size: use full resolution
        srcForInference = frame->fullRes;
        // ... resize to model input from full res ...
    } else {
        // Fallback: use input image directly (backward compat)
        srcForInference = inputImage;
    }
    // Convert BGR → RGB
    cv::Mat cpuRGB;
    cv::cvtColor(srcForInference, cpuRGB, cv::COLOR_BGR2RGB);
    // Upload small image to GPU (`stream` is the engine's CUDA stream)
    cv::cuda::GpuMat gpuResized;
    gpuResized.upload(cpuRGB, stream);
    // ...
}
```
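The fast-path test should compare the engine's model input size with the pre-computed inference size. The incoming image is the 1080p display clone, so its width says nothing about which buffer to use. Here is a minimal sketch of that decision as a pure function. `InferenceSource` and `chooseInferenceSource` are hypothetical names, and the parameters stand in for the engine's and ANSFrame's fields.

```cpp
#include <cassert>

enum class InferenceSource {
    PrecomputedLetterbox,  // use ANSFrame::inference as-is (no resize)
    ResizeFromFullRes,     // model wants a different size: resize from ANSFrame::fullRes
    ResizeFromInput        // no usable ANSFrame: legacy path, resize the input image
};

// hasFrame:    registry lookup succeeded
// frameInfW/H: ANSFrame::inferenceWidth / inferenceHeight (0 if not generated)
// hasFullRes:  ANSFrame::fullRes is non-empty
// modelW/H:    the engine's network input dimensions
InferenceSource chooseInferenceSource(bool hasFrame, int frameInfW, int frameInfH,
                                      bool hasFullRes, int modelW, int modelH) {
    if (!hasFrame) return InferenceSource::ResizeFromInput;
    if (frameInfW > 0 && frameInfW == modelW && frameInfH == modelH)
        return InferenceSource::PrecomputedLetterbox;
    if (hasFullRes) return InferenceSource::ResizeFromFullRes;
    return InferenceSource::ResizeFromInput;
}
```

The sketch requires an exact size match rather than `<=`. That way a 640×640 letterbox is never fed to a 320×320 model, which would need a second resize and a different ratio.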
### Step 6: Pipeline crop uses full resolution
```cpp
// In ANSLPR or any pipeline that crops detected objects:
// Instead of cropping from the display image (1080p, upscaling artifacts):
ANSFrame* frame = ANSFrameRegistry::instance().lookup(inputImage);
cv::Mat cropSource = (frame && !frame->fullRes.empty())
    ? frame->fullRes   // Full 4K quality for face/plate recognition
    : inputImage;      // Fallback
// Scale bbox from display coords (inputImage) to full-res coords.
// Assumes bbox is in display coordinates; on the fallback path the scale is 1.0.
float scaleX = (float)cropSource.cols / inputImage.cols;
float scaleY = (float)cropSource.rows / inputImage.rows;
cv::Rect fullResBbox(cvRound(bbox.x * scaleX), cvRound(bbox.y * scaleY),
                     cvRound(bbox.width * scaleX), cvRound(bbox.height * scaleY));
fullResBbox &= cv::Rect(0, 0, cropSource.cols, cropSource.rows);  // clamp to image bounds
cv::Mat crop = fullResBbox.empty() ? cv::Mat() : cropSource(fullResBbox).clone();
```
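The bbox scaling in Step 6 assumes the detector's boxes are in display-image coordinates. Step 5 sets `outMeta.imgWidth/imgHeight` to the original size, which suggests postprocessing may already return original-resolution boxes; in that case the scale factor should be 1. Here is a minimal, OpenCV-free sketch of the display-to-full-resolution mapping. It rounds outward so edge pixels are not lost and clamps to the image. `Box` and `displayToFullRes` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Box {
    int x, y, width, height;
};

// Map a box from display coordinates (dispW x dispH) to full-resolution coordinates
// (fullW x fullH), rounding outward, then clamp to the full-resolution image bounds.
Box displayToFullRes(const Box& b, int dispW, int dispH, int fullW, int fullH) {
    assert(dispW > 0 && dispH > 0);
    const double sx = static_cast<double>(fullW) / dispW;
    const double sy = static_cast<double>(fullH) / dispH;
    int x0 = static_cast<int>(std::floor(b.x * sx));
    int y0 = static_cast<int>(std::floor(b.y * sy));
    int x1 = static_cast<int>(std::ceil((b.x + b.width) * sx));
    int y1 = static_cast<int>(std::ceil((b.y + b.height) * sy));
    x0 = std::max(0, std::min(x0, fullW));
    y0 = std::max(0, std::min(y0, fullH));
    x1 = std::max(0, std::min(x1, fullW));
    y1 = std::max(0, std::min(y1, fullH));
    return {x0, y0, x1 - x0, y1 - y0};  // empty if the box lies outside the frame
}
```

With OpenCV, the clamp corresponds to intersecting with `cv::Rect(0, 0, cols, rows)` via `operator&`.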
### Step 7: Configuration API
```cpp
// Set inference size (default 640) — before StartRTSP
void SetRTSPInferenceSize(ANSRTSPClient** Handle, int size); // 640, 320, 1280
// Set display resolution (default 1080) — before StartRTSP
void SetRTSPDisplayResolution(ANSRTSPClient** Handle, int width, int height);
// Check if ANSFrame is available for a cloned image
int HasANSFrame(cv::Mat** image); // returns 1 if ANSFrame attached
// Get specific resolution from ANSFrame
int GetANSFrameInference(cv::Mat** displayImage, cv::Mat** inferenceImage);
int GetANSFrameFullRes(cv::Mat** displayImage, cv::Mat** fullResImage);
```
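`SetRTSPDisplayResolution` takes a width and height, while `ANSFrame` stores only `displayMaxHeight`. One reasonable policy, sketched here as an assumption rather than the planned behavior, has three rules:

- Keep the source aspect ratio.
- Never upscale.
- Round to even dimensions so the display I420 planes divide cleanly.

`DisplaySize` and `computeDisplaySize` are hypothetical. The sketch uses integer cross-multiplication to pick the limiting side, which avoids floating-point rounding at exact 16:9 ratios.

```cpp
#include <algorithm>
#include <cassert>

struct DisplaySize {
    int width;
    int height;
};

// Fit a srcW x srcH frame inside maxW x maxH, preserving aspect ratio.
// Frames that already fit are returned unchanged (no upscaling, no resize needed).
// Scaled results are rounded down to even values for YUV420P chroma subsampling.
DisplaySize computeDisplaySize(int srcW, int srcH, int maxW, int maxH) {
    assert(srcW > 0 && srcH > 0 && maxW > 0 && maxH > 0);
    if (srcW <= maxW && srcH <= maxH) return {srcW, srcH};
    using ll = long long;
    int w, h;
    if (ll(maxW) * srcH <= ll(maxH) * srcW) {  // width is the limiting side
        w = maxW;
        h = static_cast<int>(ll(srcH) * maxW / srcW);
    } else {                                   // height is the limiting side
        h = maxH;
        w = static_cast<int>(ll(srcW) * maxH / srcH);
    }
    return {std::max(2, w & ~1), std::max(2, h & ~1)};
}
```

Returning the source size unchanged covers the 1080p cameras in the test setup: their display image can share `fullRes` instead of being resized.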
## Memory & Performance Impact
### Per-Frame Memory
| Image | Resolution | Size | Before (single 4K) |
|---|---|---|---|
| Full resolution | 3840×2160 | 24.9 MB | 24.9 MB |
| Inference | 640×640 | 1.2 MB | (generated per AI task) |
| Display | 1920×1080 | 6.2 MB | (was part of 4K) |
| **Total per frame** | | **32.3 MB** | **24.9 MB** |
The pre-computed images add 7.4 MB per frame, but the clone savings more than offset this:
### Clone Savings (12 AI tasks)
| | Before | After |
|---|---|---|
| Clone size per task | 24.9 MB | 6.2 MB (display only) |
| 12 clones total | 299 MB | 74 MB |
| Clone time | 36-60ms | 10-14ms |
| Resize per task | 2-3ms × 12 = 24-36ms | 0ms (pre-computed) |
| **Total savings** | | **~225 MB RAM, ~50-80ms CPU** |
### Generation Time (one-time per frame)
| Step | Time |
|---|---|
| Full res: cvtColor YUV→BGR | ~4-8ms |
| 640×640: resize YUV planes + cvtColor | ~0.8ms |
| 1080p: resize YUV planes + cvtColor | ~1.5ms |
| **Total** | **~6-10ms** |
vs Current: cvtColor 4K = ~4-8ms + resize per task = ~2-3ms × 12 = ~28-44ms total
**Net savings: ~20-35ms per frame cycle across all tasks.**
## Files to Create/Modify
### New files:
1. `include/ANSFrame.h` — ANSFrame struct
2. `modules/ANSCV/ANSFrameRegistry.h` — Registry header
3. `modules/ANSCV/ANSFrameRegistry.cpp` — Registry implementation
### Modified files:
4. `MediaClient/media/video_player.h` — Add generateANSFrame declaration
5. `MediaClient/media/video_player.cpp` — Implement generateANSFrame, modify getImage
6. `modules/ANSCV/ANSRTSP.h` — Add ANSFrame member, SetInferenceSize
7. `modules/ANSCV/ANSRTSP.cpp` — Modify GetImage to return display + attach ANSFrame
8. `modules/ANSCV/ANSOpenCV.cpp` — Modify CloneImage_S to link to ANSFrame
9. `modules/ANSCV/ANSMatRegistry.h` — Optional: integrate ANSFrame into mat registry
10. `modules/ANSODEngine/ANSRTYOLO.cpp` — Use ANSFrame inference image
11. `modules/ANSODEngine/ANSTENSORTRTOD.cpp` — Same
12. `modules/ANSODEngine/ANSTENSORRTPOSE.cpp` — Same
13. `modules/ANSODEngine/ANSTENSORRTSEG.cpp` — Same
14. `modules/ANSODEngine/ANSTENSORRTCL.cpp` — Same
15. `modules/ANSODEngine/ANSYOLOV12RTOD.cpp` — Same
16. `modules/ANSODEngine/ANSYOLOV10RTOD.cpp` — Same
17. `modules/ANSODEngine/SCRFDFaceDetector.cpp` — Same
18. `modules/ANSODEngine/dllmain.cpp` — Set tl_currentANSFrame for pipeline lookup
19. `modules/ANSLPR/ANSLPR_OD.cpp` — Use fullRes for plate crop
20. `modules/ANSFR/ARCFaceRT.cpp` — Use fullRes for face crop
21. `modules/ANSFR/ANSFaceRecognizer.cpp` — Use fullRes for face crop
### Apply to other ANSCV classes:
22. `modules/ANSCV/ANSFLV.h/.cpp` — Same pattern as ANSRTSP
23. `modules/ANSCV/ANSMJPEG.h/.cpp` — Same
24. `modules/ANSCV/ANSRTMP.h/.cpp` — Same
25. `modules/ANSCV/ANSSRT.h/.cpp` — Same
## Clone-to-ANSFrame Mapping (Critical Design)
### The Problem
LabVIEW calls `ANSCV_CloneImage_S` to create a deep copy of the 1080p display image. The clone has a **different `data` pointer** than the original — so a simple pointer lookup won't find the ANSFrame.
```
GetImage returns display Mat: data = 0xAAAA → registry: 0xAAAA → ANSFrame #1
CloneImage creates deep copy: data = 0xBBBB → registry: ??? (not registered)
AI task tries lookup(0xBBBB): NOT FOUND — fallback to slow path
```
### The Solution
Register the clone's `data` pointer to the same ANSFrame during `ANSCV_CloneImage_S`:
```
GetImage: data = 0xAAAA → registry: 0xAAAA → ANSFrame #1 (refcount=1)
CloneImage: data = 0xBBBB → registry: 0xBBBB → ANSFrame #1 (refcount=2)
CloneImage: data = 0xCCCC → registry: 0xCCCC → ANSFrame #1 (refcount=3)
ReleaseImage: remove 0xBBBB → ANSFrame #1 (refcount=2)
ReleaseImage: remove 0xCCCC → ANSFrame #1 (refcount=1)
Next GetImage: remove 0xAAAA → ANSFrame #1 (refcount=0) → FREE all 3 images
```
### Implementation in ANSCV_CloneImage_S (ANSOpenCV.cpp)
```cpp
int ANSCV_CloneImage_S(cv::Mat** imageIn, cv::Mat** imageOut) {
    *imageOut = anscv_mat_new(**imageIn);                      // deep copy display (6.2 MB)
    gpu_frame_addref(*imageIn, *imageOut);                     // existing: link GpuFrameData
    ANSFrameRegistry::instance().addRef(*imageIn, *imageOut);  // NEW: link ANSFrame
    return 1;
}
```
### Implementation in ANSCV_ReleaseImage_S (ANSOpenCV.cpp)
```cpp
int ANSCV_ReleaseImage_S(cv::Mat** imageIn) {
    ANSFrameRegistry::instance().release(*imageIn);  // NEW: refcount--, free if 0
    anscv_mat_delete(imageIn);                       // existing: free Mat
    return 1;
}
```
### Implementation in ANSFrameRegistry
```cpp
class ANSFrameRegistry {
    std::unordered_map<const uchar*, ANSFrame*> m_map;  // Mat.data → ANSFrame
    std::mutex m_mutex;
public:
    // Process-wide singleton (function-local static: thread-safe init since C++11)
    static ANSFrameRegistry& instance() {
        static ANSFrameRegistry registry;
        return registry;
    }

    // Register original display Mat → ANSFrame
    void attach(const cv::Mat* mat, ANSFrame* frame) {
        std::lock_guard<std::mutex> lock(m_mutex);
        // Remove old mapping if exists (e.g., data address reused by a new frame)
        auto it = m_map.find(mat->data);
        if (it != m_map.end() && it->second != frame) {
            if (--it->second->refcount <= 0) delete it->second;
        }
        m_map[mat->data] = frame;
    }

    // Link clone to same ANSFrame (called from CloneImage)
    void addRef(const cv::Mat* src, const cv::Mat* dst) {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_map.find(src->data);
        if (it == m_map.end()) return;
        ANSFrame* frame = it->second;
        frame->refcount++;
        m_map[dst->data] = frame;
    }

    // Lookup by any Mat (original or clone)
    ANSFrame* lookup(const cv::Mat& mat) {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_map.find(mat.data);
        return (it != m_map.end()) ? it->second : nullptr;
    }

    // Release mapping (called from ReleaseImage)
    void release(const cv::Mat* mat) {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_map.find(mat->data);
        if (it == m_map.end()) return;
        ANSFrame* frame = it->second;
        m_map.erase(it);
        if (--frame->refcount <= 0) delete frame;
    }
};
```
### Thread Safety
- Registry uses `std::mutex` — same pattern as `ANSGpuFrameRegistry`
- ANSFrame images (fullRes, inference, display) are **immutable after creation** — safe to read from any thread
- Only `refcount` is modified concurrently — uses `std::atomic<int>`
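- In normal operation, ANSFrame is freed only when refcount reaches 0 (all clones released). The forced-free paths under Leak Prevention (TTL eviction, pool cap, camera-scoped cleanup) delete a frame regardless of refcount. After any of those paths could have run, a raw `ANSFrame*` obtained from `lookup` must not be used.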
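The forced-free paths under Leak Prevention can delete an ANSFrame while an AI task is still reading `frame->inference` through the raw pointer returned by `lookup`. One way to close that gap is to store `std::shared_ptr` in the registry and return a `shared_ptr` from `lookup`. Forced eviction then drops only the registry's references, and the frame is freed when the last user's copy goes away. This is sketched below as an alternative to the manual refcount, not as the plan's design. `Frame` stands in for `ANSFrame`, `SharedFrameRegistry` is a hypothetical name, and keys are `Mat::data` addresses as in the plan.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>

struct Frame {  // stand-in for ANSFrame; images omitted
    int id = 0;
};

class SharedFrameRegistry {
public:
    // Map a Mat's data pointer (display image or a clone) to a frame.
    void attach(const void* key, std::shared_ptr<Frame> frame) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_map[key] = std::move(frame);
    }
    // Called from CloneImage: the clone co-owns the source's frame.
    bool addRef(const void* srcKey, const void* dstKey) {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_map.find(srcKey);
        if (it == m_map.end()) return false;
        std::shared_ptr<Frame> frame = it->second;  // copy first: operator[] may rehash and invalidate `it`
        m_map[dstKey] = std::move(frame);
        return true;
    }
    // Returns shared ownership; the frame stays alive while the caller holds it.
    std::shared_ptr<Frame> lookup(const void* key) {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_map.find(key);
        return it != m_map.end() ? it->second : nullptr;
    }
    // Called from ReleaseImage; forced eviction can use the same call per mapping.
    void release(const void* key) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_map.erase(key);
    }
    std::size_t size() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_map.size();
    }

private:
    std::mutex m_mutex;
    std::unordered_map<const void*, std::shared_ptr<Frame>> m_map;
};
```

With this design the explicit `std::atomic<int> refcount` becomes unnecessary. The cost is one extra atomic increment and decrement per `lookup`.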
### Lifecycle Diagram
```
Camera Thread:                          AI Task 1:                AI Task 2:
generateANSFrame()
  → fullRes, inference, display
  → refcount = 1
  → registry: display.data → AF
GetImage returns display
CloneImage(display, &clone1) ─────────► clone1
  → registry: clone1.data → AF
  → refcount = 2
CloneImage(display, &clone2) ───────────────────────────────────► clone2
  → registry: clone2.data → AF
  → refcount = 3
                                        lookup(clone1) → AF
                                        use AF->inference
                                        use AF->fullRes (crop)
                                        ReleaseImage(clone1)
                                          → refcount = 2
                                                                  lookup(clone2) → AF
                                                                  use AF->inference
                                                                  ReleaseImage(clone2)
                                                                    → refcount = 1
Next GetImage:
  → new ANSFrame
  → old display.data removed
  → refcount = 0 → FREE old AF
```
## Leak Prevention (Critical)
### Leak Scenarios
| Scenario | What leaks | Size per leak |
|---|---|---|
| LabVIEW forgets to call ReleaseImage | ANSFrame (fullRes + inference + display) | ~32 MB |
| Camera reconnect while clones exist | Old ANSFrame stays alive until clones released | ~32 MB |
| LabVIEW crash/abort | All ANSFrames in registry | ~32 MB × N frames |
| AI task throws exception, skips Release | ANSFrame refcount never reaches 0 | ~32 MB |
### Protection 1: TTL-Based Auto-Eviction
Same pattern as `ANSGpuFrameRegistry::evictStaleFrames()` — periodically scan for old ANSFrames and force-free them.
```cpp
class ANSFrameRegistry {
static constexpr int FRAME_TTL_SECONDS = 5; // Max lifetime of any ANSFrame
static constexpr int EVICT_INTERVAL_MS = 1000; // Check every 1 second
struct Entry {
ANSFrame* frame;
std::chrono::steady_clock::time_point createdAt;
};
void evictStale() {
auto now = std::chrono::steady_clock::now();
// Throttle: only run every EVICT_INTERVAL_MS
if (now - m_lastEvict < std::chrono::milliseconds(EVICT_INTERVAL_MS)) return;
m_lastEvict = now;
std::lock_guard<std::mutex> lock(m_mutex);
for (auto it = m_frames.begin(); it != m_frames.end(); ) {
double ageSec = std::chrono::duration<double>(now - it->createdAt).count();
if (ageSec > FRAME_TTL_SECONDS) {
// Force-free: remove all Mat* mappings to this frame
ANSFrame* frame = it->frame;
for (auto mit = m_map.begin(); mit != m_map.end(); ) {
if (mit->second == frame) mit = m_map.erase(mit);
else ++mit;
}
delete frame;
it = m_frames.erase(it);
} else {
++it;
}
}
}
};
```
Call `evictStale()` from `GetImage()` (piggybacked on camera thread activity — same as `gpu_frame_evict_stale()`).
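Here is a minimal sketch of the TTL policy that keeps all state under one mutex and takes `now` as a parameter, so the throttle and expiry can be tested deterministically. `TtlTracker` and its members are hypothetical. It only reports which frame ids expired and leaves unmapping and deletion to the registry. It also assumes frames are tracked in creation order, so the front of the list is always the oldest.

```cpp
#include <cassert>
#include <chrono>
#include <list>
#include <mutex>
#include <vector>

using Clock = std::chrono::steady_clock;

class TtlTracker {
public:
    TtlTracker(std::chrono::milliseconds ttl, std::chrono::milliseconds interval)
        : m_ttl(ttl), m_interval(interval) {}

    // Register a new frame; timestamps are assumed non-decreasing.
    void track(int frameId, Clock::time_point createdAt) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_entries.push_back({frameId, createdAt});
    }

    // Forget a frame that was released normally before its TTL ran out.
    void untrack(int frameId) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_entries.remove_if([frameId](const Entry& e) { return e.id == frameId; });
    }

    // Returns ids older than the TTL and forgets them. Runs at most once per interval.
    std::vector<int> evictStale(Clock::time_point now) {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::vector<int> expired;
        if (m_hasRun && now - m_lastRun < m_interval) return expired;  // throttled
        m_hasRun = true;
        m_lastRun = now;
        // Entries are in creation order, so stop at the first one still within the TTL.
        while (!m_entries.empty() && now - m_entries.front().createdAt > m_ttl) {
            expired.push_back(m_entries.front().id);
            m_entries.pop_front();
        }
        return expired;
    }

private:
    struct Entry {
        int id;
        Clock::time_point createdAt;
    };
    std::mutex m_mutex;
    std::list<Entry> m_entries;
    std::chrono::milliseconds m_ttl;
    std::chrono::milliseconds m_interval;
    Clock::time_point m_lastRun{};
    bool m_hasRun = false;
};
```

Because entries are ordered by age, each eviction pass stops at the first young frame instead of scanning every entry.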
### Protection 2: Max ANSFrame Pool Size
Limit total number of live ANSFrames. If pool is full, force-free the oldest before creating a new one.
```cpp
static constexpr std::size_t MAX_ANSFRAMES = 100;  // Max live frames across all cameras

ANSFrame* createANSFrame(...) {
    evictStale();  // Clean up expired frames first (takes m_mutex itself)
    auto* frame = new ANSFrame();
    // ... populate (outside the lock: the resize/cvtColor work is slow) ...
    std::lock_guard<std::mutex> lock(m_mutex);
    // If still over limit, force-free oldest
    while (m_frames.size() >= MAX_ANSFRAMES) {
        auto oldest = m_frames.begin();
        // ... force-remove all mappings + delete, then m_frames.erase(oldest) ...
    }
    m_frames.push_back({frame, std::chrono::steady_clock::now()});
    return frame;
}
```
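Here is a minimal sketch of the cap. It assumes the pool is kept in creation order, so the front is always the oldest frame. `FramePool` is hypothetical and returns evicted ids instead of unmapping real frames; each loop pass removes one entry, so the loop always terminates.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

class FramePool {
public:
    explicit FramePool(std::size_t maxFrames) : m_max(maxFrames) { assert(maxFrames > 0); }

    // Admit a new frame, force-evicting the oldest ones first if the pool is full.
    // Returns the evicted ids so the caller can unmap and free them.
    std::vector<int> admit(int frameId) {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::vector<int> evicted;
        while (m_frames.size() >= m_max) {
            evicted.push_back(m_frames.front());
            m_frames.pop_front();  // progress: one entry removed per pass
        }
        m_frames.push_back(frameId);
        return evicted;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_frames.size();
    }

private:
    std::mutex m_mutex;
    std::deque<int> m_frames;  // front = oldest
    std::size_t m_max;
};
```

Using `std::size_t` for the limit avoids a signed/unsigned comparison against `size()`.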
### Protection 3: Camera-Scoped Cleanup
When a camera is stopped or destroyed, force-free ALL ANSFrames belonging to that camera (regardless of refcount).
```cpp
// In ANSRTSPClient::Stop() and Destroy():
ANSFrameRegistry::instance().releaseByOwner(this);

// In ANSFrameRegistry:
void releaseByOwner(void* owner) {
    std::lock_guard<std::mutex> lock(m_mutex);
    for (auto it = m_frames.begin(); it != m_frames.end(); ) {
        if (it->frame->owner == owner) {
            // Remove all Mat.data mappings to this frame
            for (auto mit = m_map.begin(); mit != m_map.end(); ) {
                if (mit->second == it->frame) mit = m_map.erase(mit);
                else ++mit;
            }
            delete it->frame;
            it = m_frames.erase(it);
        } else {
            ++it;
        }
    }
}
```
### Protection 4: One ANSFrame Per Camera (Ring Buffer)
Each camera keeps only the **latest** ANSFrame. When a new frame arrives, the camera drops its reference to the previous ANSFrame (refcount decremented). This bounds the number of *current* frames to one per camera. An older frame stays alive only while clones still reference it.
```cpp
class ANSRTSPClient {
    ANSFrame* _currentANSFrame = nullptr;

    void onNewFrame(AVFrame* decoded) {
        ANSFrame* newFrame = generateANSFrame(decoded);
        newFrame->owner = this;
        // Replace old frame: drop the camera's reference (refcount--)
        if (_currentANSFrame) {
            ANSFrameRegistry::instance().detachOwner(_currentANSFrame);
            // If refcount reaches 0, freed immediately
            // If clones still hold refs, freed when they release
        }
        _currentANSFrame = newFrame;
        ANSFrameRegistry::instance().attach(&newFrame->display, newFrame);
    }
};
```
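The sketch below shows the single-slot behavior: one current frame per camera, with older frames surviving only while a clone still holds them. It uses `std::shared_ptr` for the camera's reference, which is an assumption; the plan uses a manual refcount. `CameraSlot` and `Frame` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

struct Frame {  // stand-in for ANSFrame
    long long pts = 0;
};

class CameraSlot {
public:
    // Called on the camera thread for every decoded frame.
    void publish(std::shared_ptr<Frame> next) {
        std::shared_ptr<Frame> old;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            old = std::move(m_current);
            m_current = std::move(next);
        }
        // `old` is released when this function returns, outside the lock; the frame
        // is freed only if no clone or AI task still holds a copy.
    }

    // Equivalent of GetImage + CloneImage: the caller gets shared ownership.
    std::shared_ptr<Frame> acquire() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_current;
    }

private:
    std::mutex m_mutex;
    std::shared_ptr<Frame> m_current;
};
```

Releasing the old frame outside the lock keeps the potentially expensive deallocation of three images off the critical section.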
### Protection 5: ANSFrame Struct with Owner Tracking
```cpp
#include <chrono>

struct ANSFrame {
    // ... existing fields (including refcount{1} under "Lifecycle") ...
    // Leak protection
    void* owner = nullptr;  // Camera that created this frame
    std::chrono::steady_clock::time_point createdAt = std::chrono::steady_clock::now();
    ~ANSFrame() {
        // Images are cv::Mat: automatically freed by OpenCV refcount
        // No manual cleanup needed for fullRes, inference, display
    }
};
```
### Memory Budget Analysis
With all protections:
| Cameras | Max ANSFrames | Memory (worst case) |
|---|---|---|
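| 5 running | 5 current + ~10 in-flight clones | 5 × 32 MB = 160 MB (+32 MB per older frame an in-flight clone still pins) |
| 20 running | 20 current + ~40 in-flight clones | 20 × 32 MB = 640 MB (+32 MB per pinned older frame) |
| 100 created, 5 running | 5 current + ~10 in-flight | 5 × 32 MB = 160 MB (+32 MB per pinned older frame) |
| 100 created, 95 stopped | 0 (stopped cameras free ANSFrame) | 0 MB |
**Bounded by:** `running_cameras × 32 MB` in steady state, plus 32 MB for each older frame still pinned by an unreleased clone. Pinned frames are freed within the 5-second TTL, and the pool cap (`MAX_ANSFRAMES = 100`) limits the absolute worst case to ~3.2 GB.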
### TTL Guarantee
Even if ALL protections fail, the 5-second TTL eviction ensures:
- Maximum leak duration: 5 seconds
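- Maximum leaked memory: without the one-frame-per-camera slot, up to `cameras × 5 seconds × 10 FPS × 32 MB / frame` (1.6 GB per camera). With the slot, the steady state is `cameras × 32 MB`, plus any frames still pinned by unreleased clones.
- Periodic cleanup on `GetImage` calls (throttled to once per second) ensures no accumulation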
## Replacing GpuFrameRegistry
### Current State (wasteful with NV12 disabled)
With `_useNV12FastPath = false` (current default), `GpuFrameRegistry` is never populated — no `gpu_frame_attach` is called. But `gpu_frame_addref`, `gpu_frame_remove`, and `gpu_frame_evict_stale` still run on every clone/release/replace — doing empty lookups that waste CPU cycles.
```
Current code paths that run but do nothing:
ANSCV_CloneImage_S → gpu_frame_addref → lookup → NOT FOUND → no-op
ANSCV_ReleaseImage_S → gpu_frame_remove → lookup → NOT FOUND → no-op
anscv_mat_replace → gpu_frame_remove → lookup → NOT FOUND → no-op
anscv_mat_replace → gpu_frame_evict_stale → scans empty registry → no-op
```
### Plan: ANSFrameRegistry replaces GpuFrameRegistry
ANSFrameRegistry serves the same purpose (mapping `cv::Mat*` → frame metadata) but without GPU complexity:
| Feature | GpuFrameRegistry | ANSFrameRegistry |
|---|---|---|
| Maps Mat* to | GpuFrameData (NV12 GPU pointers) | ANSFrame (3 CPU images) |
| Used when | NV12 fast path enabled | Always (SW or HW decode) |
| GPU dependency | CUDA, pool slots, D2D copy | None |
| Thread safety | mutex + atomic refcount | mutex + atomic refcount |
| Cleanup | TTL eviction + pool cooldown | TTL eviction (simpler) |
### Migration Path
1. **Phase 1 (implement ANSFrame):** ANSFrameRegistry runs alongside GpuFrameRegistry
- `CloneImage`: calls both `gpu_frame_addref` + `ansframe_addref`
- `ReleaseImage`: calls both `gpu_frame_remove` + `ansframe_release`
- Safe: both registries handle NOT FOUND gracefully
2. **Phase 2 (NV12 disabled permanently):** Remove GpuFrameRegistry calls
- Remove `gpu_frame_addref` from `CloneImage`
- Remove `gpu_frame_remove` from `ReleaseImage` and `anscv_mat_replace`
- Remove `gpu_frame_evict_stale` from `anscv_mat_replace`
- Keep GpuFrameRegistry code for future NV12 re-enablement
3. **Phase 3 (optional, if NV12 re-enabled):** Merge into single registry
- ANSFrame struct gains optional GPU fields (yPlane, uvPlane, poolSlot)
- Single registry, single refcount, single lookup
### Recommended: Implement Phase 1 first, Phase 2 after testing
## Backward Compatibility
- If ANSFrame is not available (e.g., old camera module), engines fall back to current behavior (resize input image)
- The `cv::Mat**` API stays the same — LabVIEW doesn't need changes
- ANSFrame is transparent to LabVIEW — it only sees the display image
- The `GetANSFrameInference` / `GetANSFrameFullRes` APIs are optional for advanced use
## Risk Assessment
| Risk | Mitigation |
|---|---|
| Extra 7.4 MB RAM per frame | Negligible vs ~225 MB clone savings |
| ANSFrame lifecycle (refcount) | Same pattern as GpuFrameData — proven |
| Coordinate mapping errors | letterboxRatio stored in ANSFrame — deterministic |
| YUV plane resize quality | Same resize-then-convert order as the GPU NV12 path; visually equivalent, not bit-identical |
| Thread safety | ANSFrame is immutable after creation — safe to share |