v1.1.1 add benchmark and trtllm offline code

SWivid
2025-04-03 18:33:48 +08:00
parent 2374f8ec39
commit fe5c562212
3 changed files with 11 additions and 9 deletions

@@ -114,9 +114,11 @@ Deployment solution with Triton and TensorRT-LLM.
#### Benchmark Results
Decoding on a single L20 GPU, using 26 different prompt_audio & target_text pairs.
-| Model | Concurrency | Avg Latency | RTF |
-|-------|-------------|----------------|-------|
-| F5-TTS Base (Vocos) | 1 | 253 ms | 0.0394|
+| Model | Concurrency | Avg Latency | RTF | Mode |
+|---------------------|----------------|-------------|--------|-----------------|
+| F5-TTS Base (Vocos) | 2 | 253 ms | 0.0394 | Client-Server |
+| F5-TTS Base (Vocos) | 1 (Batch_size) | - | 0.0402 | Offline TRT-LLM |
+| F5-TTS Base (Vocos) | 1 (Batch_size) | - | 0.1467 | Offline Pytorch |
See [detailed instructions](src/f5_tts/runtime/triton_trtllm/README.md) for more information.
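
To help read the RTF column, here is a minimal sketch that assumes the usual definition of real-time factor: synthesis time divided by the duration of the generated audio, so lower is faster. The hunk itself does not spell out how RTF was measured, and the function names below are hypothetical helpers, not part of the F5-TTS codebase.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF under the assumed definition: processing time / generated audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def relative_speedup(baseline_rtf: float, candidate_rtf: float) -> float:
    """How many times faster the candidate is than the baseline, given their RTFs."""
    if candidate_rtf <= 0:
        raise ValueError("candidate_rtf must be positive")
    return baseline_rtf / candidate_rtf


if __name__ == "__main__":
    # RTF values copied from the benchmark table in this hunk.
    offline_pytorch_rtf = 0.1467
    offline_trtllm_rtf = 0.0402

    # Under the assumed definition, an RTF of 0.0402 means 10 s of audio
    # takes about 0.402 s to synthesize.
    print(f"10 s of audio at RTF {offline_trtllm_rtf}: {offline_trtllm_rtf * 10.0:.3f} s")

    speedup = relative_speedup(offline_pytorch_rtf, offline_trtllm_rtf)
    print(f"Offline TRT-LLM vs offline PyTorch: {speedup:.2f}x")
```

Under that assumption, the two offline rows imply that the TRT-LLM path is roughly 3.6 times faster than the PyTorch path at batch size 1.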