Two-part fix

Fix 1 — Chunk oversized bucket groups (the correctness fix)
ONNXOCRRecognizer::RecognizeBatch now slices each bucket group into chunks of at most kRecMaxBatch crops before submitting to TRT. With kRecMaxBatch = 24, a frame with 30 crops in bucket 320 produces two back-to-back batched calls (24 + 6), both within the profile and both on the fast path.
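
The chunking described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: ChunkRanges is a hypothetical helper name, and kRecMaxBatch is assumed to be 24 based on the commit message (the real constant lives in ONNXOCRTypes.h).

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed value, taken from the commit message (profile max raised to 24).
constexpr std::size_t kRecMaxBatch = 24;

// Hypothetical helper: returns [begin, end) index ranges that split a bucket
// group of `groupSize` crops into chunks of at most `maxBatch` crops each.
std::vector<std::pair<std::size_t, std::size_t>>
ChunkRanges(std::size_t groupSize, std::size_t maxBatch = kRecMaxBatch) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (maxBatch == 0) {
        return ranges;  // nothing sensible to do with a zero batch limit
    }
    for (std::size_t begin = 0; begin < groupSize; begin += maxBatch) {
        // The last chunk may be shorter than maxBatch.
        const std::size_t end = std::min(begin + maxBatch, groupSize);
        ranges.emplace_back(begin, end);
    }
    return ranges;
}
```

For a group of 30 crops this yields the ranges [0, 24) and [24, 30), which matches the 24 + 6 split mentioned above. Each range would then be submitted as one batched call.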

Fix 2 — Raise the profile max from 16 to 24 (the performance fix)
The old profile max batch was 16, but real scenes routinely hit 24 crops. Raising the profile max to 24 means the common 12-plate scene (24 crops) fits in a single batched call with no chunking needed. Scenes with more than 24 crops go through chunking, but those are rare.
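
As a rough illustration of the arithmetic, the sketch below builds the profile shape string from one shared constant and counts how many batched calls a given number of crops needs. MakeProfileShape and CallsNeeded are hypothetical names introduced here, and kRecMaxBatch = 24 is an assumption based on the commit message.

```cpp
#include <cstddef>
#include <string>

// Assumed value, taken from the commit message.
constexpr std::size_t kRecMaxBatch = 24;

// Hypothetical helper: formats "<input>:<batch>x3x48x<width>", the shape
// string layout used for the TRT profile options in this diff.
std::string MakeProfileShape(const std::string& inputName,
                             std::size_t batch, std::size_t width) {
    return inputName + ":" + std::to_string(batch) + "x3x48x" +
           std::to_string(width);
}

// Hypothetical helper: number of batched calls needed for `crops` crops when
// each call may carry at most `maxBatch` crops (ceiling division).
constexpr std::size_t CallsNeeded(std::size_t crops,
                                  std::size_t maxBatch = kRecMaxBatch) {
    return (crops + maxBatch - 1) / maxBatch;
}

static_assert(CallsNeeded(24) == 1, "24 crops fit in one call at max 24");
static_assert(CallsNeeded(30) == 2, "30 crops need two calls at max 24");
static_assert(CallsNeeded(24, 16) == 2, "24 crops exceed a max of 16");
```

Because the shape string and the chunk count both derive from the same constant, changing kRecMaxBatch moves the profile limit and the chunk size together.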
2026-04-15 07:27:55 +10:00
parent 5706615ed5
commit 7778f8c214
3 changed files with 44 additions and 6 deletions


@@ -60,7 +60,11 @@ static PerModelOcrOptions BuildNvidiaOcrOptions(
opts.classifierOpts.preferTensorRT = preferTensorRT;
opts.classifierOpts.trtFP16 = true;
-// Recognizer: TRT EP with dynamic shape profile.
+// Recognizer: TRT EP with dynamic shape profile. The max-batch
+// dimension is kRecMaxBatch (defined in ONNXOCRTypes.h) — the same
+// constant that ONNXOCRRecognizer::RecognizeBatch uses to chunk
+// oversized bucket groups. Keeping them in lockstep ensures the
+// recognizer never submits a shape that falls outside the TRT profile.
opts.recognizerOpts.useMaxCudnnWorkspace = true;
opts.recognizerOpts.preferTensorRT = preferTensorRT;
opts.recognizerOpts.trtFP16 = true;
@@ -71,12 +75,13 @@ static PerModelOcrOptions BuildNvidiaOcrOptions(
"input name — defaulting to 'x'" << std::endl;
recInputName = "x";
}
+const std::string maxB = std::to_string(kRecMaxBatch);
std::cout << "[PaddleOCRV5Engine] Recognizer input name: '"
<< recInputName << "' — building TRT dynamic profile "
-<< "[batch=1..16, W=320..960]" << std::endl;
+<< "[batch=1.." << maxB << ", W=320..960]" << std::endl;
opts.recognizerOpts.trtProfileMinShapes = recInputName + ":1x3x48x320";
opts.recognizerOpts.trtProfileOptShapes = recInputName + ":4x3x48x480";
-opts.recognizerOpts.trtProfileMaxShapes = recInputName + ":16x3x48x960";
+opts.recognizerOpts.trtProfileMaxShapes = recInputName + ":" + maxB + "x3x48x960";
}
return opts;
}