Llama 3 8B Instruct
Meta · 8B params · 8k context
About
Smaller Llama 3 variant. Fast, cheap, ideal for high-volume inference.
Type: Chat
Parameters: 8B
Context: 8k
License: Llama 3 Community
Recommended deployment
Best quality: A100 PCIe x8 · 12k tok/s · $26.4/hour
Balanced: A100 SXM x4 · 6k tok/s · $6.6/hour
Budget: RTX 4090 x2 · 2k tok/s · $1.1/hour
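To compare the tiers above on a like-for-like basis, the hourly price can be converted into an effective cost per 1M tokens. The sketch below is illustrative only: it assumes "12k", "6k", and "2k" mean 12,000, 6,000, and 2,000 tokens per second, and that the deployment runs at that throughput for the whole hour, which real traffic rarely does. The function name is hypothetical.

```python
# (tier name, hourly price in USD, throughput in tokens/second), copied from the tiers listed above.
# Assumption: "12k tok/s" means 12,000 tokens per second, sustained for the full hour.
TIERS = [
    ("Best quality", 26.4, 12_000),
    ("Balanced", 6.6, 6_000),
    ("Budget", 1.1, 2_000),
]


def cost_per_million_tokens(hourly_usd: float, tokens_per_second: float) -> float:
    """Effective USD cost per 1M tokens at full, sustained utilization."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_usd / tokens_per_hour * 1_000_000


for name, price, tps in TIERS:
    print(f"{name}: ${cost_per_million_tokens(price, tps):.3f} per 1M tokens")
```

At lower utilization the effective cost per token rises proportionally, since the hourly price is fixed.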
Pricing
Input tokens: $0.10 per 1M tokens
Output tokens: $0.30 per 1M tokens
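As a rough illustration of how the per-token prices above translate into a per-request cost, here is a minimal sketch. The prices come from this page; the function name and the example token counts are hypothetical.

```python
# Per-token prices from the Pricing section (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.30


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: 1,000 prompt tokens and 500 completion tokens per request.
cost = request_cost(1_000, 500)
print(f"${cost:.6f} per request")
```

Because output tokens cost three times as much as input tokens, long completions dominate the bill more quickly than long prompts.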
Quick stats
Deploys: 5,680
Recommended GPU: A100 PCIe
Inference latency: ~280ms P50
Throughput: ~12k tok/s
Use cases
- Customer support automation
- Knowledge base search
- Agentic workflows
- Content generation