sglang

Fast structured generation and serving for LLMs with RadixAttention prefix caching

Category

AI & Machine Learning

Developer

davila7

Updated

Jan 2026

Tags

2 total

Description

Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use it for JSON or regex-constrained outputs, constrained decoding, agentic workflows with tool calls, or workloads with shared prefixes where you need up to 5× faster inference than vLLM. Runs on 300,000+ GPUs at organizations including xAI, AMD, NVIDIA, and LinkedIn.
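
To make the prefix-sharing idea concrete, here is a minimal sketch of why a prefix cache saves work when many requests start with the same tokens (for example, a shared system prompt). It is a toy token-level trie, not SGLang's actual RadixAttention data structure; the names `PrefixCache`, `match_prefix`, and `tokens_to_compute` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Children keyed by the next token ID in the sequence.
    children: dict[int, "Node"] = field(default_factory=dict)


class PrefixCache:
    """Toy token-level trie (illustrative assumption, not SGLang's implementation)."""

    def __init__(self) -> None:
        self.root = Node()

    def match_prefix(self, tokens: list[int]) -> int:
        """Return how many leading tokens are already present in the cache."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens: list[int]) -> None:
        """Record a token sequence so later requests can reuse its prefix."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, Node())


def tokens_to_compute(cache: PrefixCache, tokens: list[int]) -> int:
    """Count tokens that still need processing after reusing the cached prefix."""
    hit = cache.match_prefix(tokens)
    cache.insert(tokens)
    return len(tokens) - hit


cache = PrefixCache()
system_prompt = [1, 2, 3, 4]  # made-up token IDs for a shared prompt
print(tokens_to_compute(cache, system_prompt + [10, 11]))  # first request: nothing cached
print(tokens_to_compute(cache, system_prompt + [20]))      # second request: shared prefix reused
```

In this sketch the second request only pays for the token that follows the shared prompt. A real serving system would store attention key/value state per cached prefix and evict entries under memory pressure, both of which this toy omits.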

Skill File

SKILL.md

Tags

Ai, Workflow

Information

Developer: davila7
Category: AI & Machine Learning
Created: Jan 15, 2026
Updated: Jan 15, 2026
