
speculative-decoding



Description

Accelerate LLM inference using speculative decoding, Medusa multiple heads, and lookahead decoding techniques. Use when optimizing inference speed (1.5-3.6× speedup), reducing latency for real-time applications, or deploying models with limited compute. Covers draft models, tree-based attention, Jacobi iteration, parallel token generation, and production deployment strategies.
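To make the draft-then-verify idea concrete, here is a toy sketch of the core speculative decoding loop. This is not code from the skill itself: the "draft" and "target" models are stand-in arithmetic functions (next token = sum of context mod 10), chosen only so the example is self-contained and deterministic. A cheap draft model proposes `k` tokens ahead; the expensive target model then checks them, accepting the longest prefix it agrees with and correcting the first mismatch.

```python
# Toy sketch of speculative decoding. The two "models" below are
# hypothetical stand-ins, not real LLMs: both map a token sequence to
# the next token as (sum of context) mod 10.

def target_model(context):
    """Stand-in for the expensive target model's greedy next token."""
    return sum(context) % 10

def draft_model(context):
    """Stand-in for the cheap draft model; mostly agrees with the target,
    but deliberately diverges when the context ends in 7."""
    if context and context[-1] == 7:
        return 0
    return sum(context) % 10

def speculative_step(context, k=4):
    """Draft k tokens, then verify them against the target model.

    Returns the accepted tokens: the longest draft prefix the target
    agrees with, plus the target's correction at the first mismatch.
    """
    # Phase 1: cheap sequential drafting of k candidate tokens.
    draft, ctx = [], list(context)
    for _ in range(k):
        tok = draft_model(ctx)
        draft.append(tok)
        ctx.append(tok)

    # Phase 2: verification. In a real system this is ONE batched
    # forward pass of the target model over all k positions (and the
    # pass also yields a free "bonus" token); here it is a plain loop.
    accepted, ctx = [], list(context)
    for tok in draft:
        correct = target_model(ctx)
        if correct == tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(correct)  # take the target's token and stop
            break
    return accepted

def generate(context, n_tokens, k=4):
    """Generate n_tokens via repeated speculative steps."""
    out = list(context)
    while len(out) < len(context) + n_tokens:
        out.extend(speculative_step(out, k))
    return out[:len(context) + n_tokens]

def greedy_target(context, n_tokens):
    """Baseline: plain greedy decoding with the target model alone."""
    out = list(context)
    for _ in range(n_tokens):
        out.append(target_model(out))
    return out
```

The key property (which the skill's 1.5-3.6× speedup figures rely on) is that the output is identical to decoding with the target model alone: `generate(ctx, n)` always equals `greedy_target(ctx, n)`, but each accepted multi-token prefix costs only one target-model pass instead of one pass per token.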

Skill File

SKILL.md
Accelerate LLM inference using speculative decoding, Medusa multiple heads, and lookahead decoding techniques. Use when optimizing inference speed (1.5-3.6× speedup), reducing latency for real-time applications, or deploying models with limited compute. Covers draft models, tree-based attention, Jacobi iteration, parallel token generation, and production deployment strategies.

Information

Developer: davila7
Category: AI & Machine Learning
Created: Jan 15, 2026
Updated: Jan 15, 2026
