tensorrt-llm
Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency
Description
TensorRT-LLM optimizes LLM inference with NVIDIA TensorRT for maximum throughput and minimal latency. Use it for production deployment on NVIDIA GPUs (e.g., A100/H100), when you need inference substantially faster than eager-mode PyTorch, or for serving models with FP8/INT4 quantization, in-flight batching, and multi-GPU scaling.
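As a rough illustration of how these features surface to a user, here is a minimal sketch using TensorRT-LLM's high-level LLM API. The model name, parallelism degree, and prompts are placeholders, and the exact parameter names may vary between releases; treat this as a sketch under those assumptions, not a verified deployment recipe.

```python
# Minimal sketch of serving a model with TensorRT-LLM's LLM API.
# Assumptions: the placeholder Hugging Face model id below, and a recent
# tensorrt_llm release exposing LLM and SamplingParams at the top level.
from tensorrt_llm import LLM, SamplingParams

# Builds (or loads) a TensorRT engine for the model. Setting
# tensor_parallel_size > 1 shards the model across multiple GPUs.
llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model id
    tensor_parallel_size=1,                      # >1 for multi-GPU scaling
)

sampling = SamplingParams(temperature=0.8, top_p=0.95)

# Batched prompts are scheduled with in-flight (continuous) batching,
# so new requests can join a running batch rather than wait for it.
outputs = llm.generate(
    ["What is TensorRT-LLM?", "Explain in-flight batching briefly."],
    sampling,
)

for out in outputs:
    print(out.outputs[0].text)
```

Quantization (e.g., FP8 or INT4) is typically applied when the engine is built, so the serving code above stays the same once a quantized engine is in place.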