KV Streaming
Fast Long-Context LLM Serving via Streaming Layerwise-Compressed KV Cache
Project summary:
- Led a project to overlap model inference, KV cache streaming, and decoding in a layerwise manner to reduce time to first token (TTFT).
- Developed a layerwise inference engine, encoding and decoding tools, and a streaming server for pipelined execution.
- Achieved a 5–15% reduction in TTFT compared to the non-overlapped baseline.
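The layerwise overlap described above can be sketched as a producer-consumer pipeline: as soon as a layer's forward pass finishes, its KV block is handed to a background streamer while the next layer computes, instead of waiting for the full prefill to complete. The sketch below is a minimal illustration of that scheduling idea only; `compute_layer` and `stream_worker` are hypothetical placeholders, not the project's actual engine or compression codec.

```python
import queue
import threading

NUM_LAYERS = 4

def compute_layer(layer_idx, hidden):
    # Placeholder per-layer forward pass: returns updated activations
    # and the KV block produced by this layer.
    return hidden + 1, f"kv_layer_{layer_idx}"

def stream_worker(kv_queue, sent):
    # Drains compressed KV blocks and "sends" them while compute continues.
    while True:
        block = kv_queue.get()
        if block is None:  # sentinel: all layers streamed
            break
        sent.append(block)  # stand-in for network send / decoder handoff

def prefill_with_overlap(hidden=0):
    kv_queue = queue.Queue()
    sent = []
    streamer = threading.Thread(target=stream_worker, args=(kv_queue, sent))
    streamer.start()
    for layer in range(NUM_LAYERS):
        hidden, kv = compute_layer(layer, hidden)
        # Overlap: layer i's KV streams in the background
        # while layer i+1's compute proceeds on this thread.
        kv_queue.put(kv)
    kv_queue.put(None)
    streamer.join()
    return hidden, sent
```

Because streaming of layer *i* no longer blocks the compute of layer *i + 1*, the decoder can begin work before the whole cache arrives, which is the source of the TTFT reduction.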