SpazeNode - AI Compute Console

Llama 3 8B Instruct

Meta · 8B params · 8k context

Smaller Llama 3 variant. Fast, cheap, ideal for high-volume inference.

Type

Chat

Parameters

8B params

Context

8k

License

Llama 3 Community

Best quality

A100 PCIe x8 · 12k tok/s

$26.40/hour

Balanced

A100 SXM x4 · 6k tok/s

$6.60/hour

Budget

RTX 4090 x2 · 2k tok/s

$1.10/hour
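
To compare the three dedicated tiers with per-token billing, it can help to convert each hourly price into an effective cost per 1M tokens. The sketch below does that arithmetic using only the prices and throughput figures listed above. It assumes the listed tok/s is sustained for the whole hour at full utilization, which the listing does not state, so treat the results as a lower bound on real cost. The names `TIERS` and `usd_per_million_tokens` are illustrative and not part of any SpazeNode API.

```python
# (hourly price in USD, listed throughput in tokens/second) for each tier above.
TIERS = {
    "Best quality": (26.4, 12_000),
    "Balanced": (6.6, 6_000),
    "Budget": (1.1, 2_000),
}


def usd_per_million_tokens(hourly_usd: float, tok_per_s: float) -> float:
    """Effective USD per 1M tokens, assuming the listed throughput is sustained."""
    tokens_per_hour = tok_per_s * 3600
    return hourly_usd / tokens_per_hour * 1_000_000


for name, (price, tps) in TIERS.items():
    print(f"{name}: ${usd_per_million_tokens(price, tps):.3f} per 1M tokens")
```

Under that assumption the tiers work out to roughly $0.61, $0.31, and $0.15 per 1M tokens respectively. At lower utilization the effective cost rises in proportion.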

Input tokens

$0.10 per 1M tokens

Output tokens

$0.30 per 1M tokens
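
As a rough illustration of how the per-token rates above add up, the following sketch estimates the cost of a request from its input and output token counts. The rates come from the listing. The function name `request_cost` and the example workload are hypothetical.

```python
# Per-token rates from the listing above (USD per 1M tokens).
INPUT_USD_PER_M = 0.10
OUTPUT_USD_PER_M = 0.30


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request under per-token billing."""
    return (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000


# Hypothetical workload: 1,000,000 requests of 500 input and 200 output tokens each.
total = 1_000_000 * request_cost(500, 200)
print(f"${total:.2f}")
```

For this example workload the total is $110.00: $50 for 500M input tokens plus $60 for 200M output tokens.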

Deploys

5,680

Recommended GPU

A100 PCIe

Inference latency

~280ms P50

Throughput

~12k tok/s