"flashattention" 태그

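We look at the main techniques for accelerating LLM inference, including FlashAttention, PagedAttention, and speculative decoding, which relieve memory bottlenecks and improve compute efficiency.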