From Prompt to Prediction: Understanding Prefill, Decode, and the KV Cache in LLMs

This article is divided into three parts; they are:

• How Attention Works During Prefill
• The Decode Phase of LLM Infe…