Analytics

Inference requests per minute

Time-to-first-token latency

Completion tokens per second

Inference duration

Tokens written to the cache when creating a new entry

Tokens retrieved from the cache for this request

Percentage of successful requests
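
As a rough illustration of how the metrics listed above relate to one another, the sketch below derives them from a list of per-request records. Everything in it is an assumption made for the example: the `RequestRecord` fields, the `summarize` function, the 60-second window, and the choice to measure completion tokens per second over the full request duration rather than only the time after the first token. It is not taken from any particular analytics backend.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical per-request log entry; all field names are illustrative.
    start_s: float           # when the request was received (seconds)
    first_token_s: float     # when the first token was emitted
    end_s: float             # when the response finished
    completion_tokens: int   # tokens generated in the response
    cache_write_tokens: int  # tokens written to the cache for a new entry
    cache_read_tokens: int   # tokens retrieved from the cache
    success: bool            # whether the request completed successfully


def summarize(records, window_s=60.0):
    """Aggregate the listed metrics over records observed in a window of window_s seconds."""
    n = len(records)
    if n == 0:
        return {}
    total_duration = sum(r.end_s - r.start_s for r in records)
    total_completion = sum(r.completion_tokens for r in records)
    return {
        "requests_per_minute": n * 60.0 / window_s,
        "avg_time_to_first_token_s": sum(r.first_token_s - r.start_s for r in records) / n,
        # Assumption: throughput is measured over the whole request duration.
        "completion_tokens_per_second": (
            total_completion / total_duration if total_duration > 0 else 0.0
        ),
        "avg_inference_duration_s": total_duration / n,
        "cache_write_tokens": sum(r.cache_write_tokens for r in records),
        "cache_read_tokens": sum(r.cache_read_tokens for r in records),
        "success_rate_pct": 100.0 * sum(1 for r in records if r.success) / n,
    }
```

A real dashboard may define the window, the averaging method, or the throughput denominator differently, so these formulas are a starting point rather than a specification.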