zai

GLM 4.5 Air

zai/glm-4.5-air

AIR is a streamlined descendant of the GLM lineage designed for scenarios where every millisecond and cent count. By pruning redundant pathways and adopting low‑rank adaptation kernels, AIR delivers much of the expressive power of larger siblings while running comfortably on a single high‑end GPU or modest CPU cluster. Its latency is measured in tens of milliseconds, enabling responsive mobile chat and high‑frequency retrieval tasks. Compression‑aware training ensures knowledge retention despite the reduced footprint, keeping answers factual and coherent. With flexible quantization presets, AIR allows developers to trade accuracy for speed on the fly, optimizing cost at scale.

Tools

Function Calling

Context Window

128,000

Max Output Tokens

96,000

ProviderInput Token PriceOutput Token Price
zai$0.20/Million Tokens$1/Million Tokens