
How to Run Llama 70B Inference at Near-Cost with Per-Customer Tracking

March 21, 2026 · 5 min read

Llama 3.1 70B is one of the most capable open-source models available. Most providers charge $0.80-0.90 per million output tokens. With Daylite's Batch tier, you can run it for $0.20 per million input tokens and $0.35 per million output tokens, with per-customer cost tracking included.

Here's how to set it up in 2 minutes.

Prerequisites

Python Setup

# Install the SDK first, from a shell: pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://api.daylite.ai/v1",
    api_key="YOUR_DAYLITE_API_KEY",
)

# Simple completion
response = client.chat.completions.create(
    model="llama-3.1-70b",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
)
print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")
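The article does not show how Daylite's built-in per-customer tracking is configured, so the following is only a client-side sketch of the same idea: accumulate the token counts from `response.usage` under a customer ID of your own. The `UsageLedger` class and the `"customer-a"` ID are hypothetical names made up for illustration, and a `SimpleNamespace` stands in for `response.usage` so the example runs without a network call.

```python
from collections import defaultdict
from types import SimpleNamespace


class UsageLedger:
    """Accumulates token counts per customer ID (in memory, client-side)."""

    def __init__(self):
        self._totals = defaultdict(
            lambda: {"prompt_tokens": 0, "completion_tokens": 0}
        )

    def record(self, customer_id, usage):
        # `usage` is expected to look like `response.usage` from a chat completion.
        totals = self._totals[customer_id]
        totals["prompt_tokens"] += usage.prompt_tokens
        totals["completion_tokens"] += usage.completion_tokens

    def totals(self, customer_id):
        # Return a copy so callers cannot mutate the ledger by accident.
        return dict(self._totals[customer_id])


# Stand-in for response.usage (assumed shape: prompt_tokens, completion_tokens).
fake_usage = SimpleNamespace(prompt_tokens=12, completion_tokens=30, total_tokens=42)

ledger = UsageLedger()
ledger.record("customer-a", fake_usage)
ledger.record("customer-a", fake_usage)
print(ledger.totals("customer-a"))
```

In a real application you would call `ledger.record(customer_id, response.usage)` after each request and persist the totals somewhere more durable than process memory.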

Streaming

stream = client.chat.completions.create(
    model="llama-3.1-70b",
    messages=[{"role": "user", "content": "Write a haiku about solar energy"}],
    stream=True,
)

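for chunk in stream:
    # Some OpenAI-compatible servers send chunks with an empty choices list.
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)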
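If you also need the complete reply once streaming finishes (for logging or caching, say), one option is to collect the deltas as they arrive. The helper below is a minimal sketch; `collect_stream` and `_fake_chunk` are hypothetical names, and the fake chunks only imitate the `choices[0].delta.content` shape used in the loop above so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        # Skip chunks that carry no choices at all.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def _fake_chunk(text):
    # Imitates the shape of a streaming chunk: choices[0].delta.content.
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    _fake_chunk("Sun "),
    _fake_chunk(None),            # a delta with no text
    SimpleNamespace(choices=[]),  # a chunk with no choices
    _fake_chunk("rises"),
]
full_text = collect_stream(fake_stream)
print(full_text)
```

With the real client, you would pass the `stream` object returned by `client.chat.completions.create(..., stream=True)` in place of `fake_stream`.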

JavaScript Setup

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.daylite.ai/v1",
  apiKey: "YOUR_DAYLITE_API_KEY",
});

const response = await client.chat.completions.create({
  model: "llama-3.1-70b",
  messages: [
    { role: "user", content: "Explain quantum computing" },
  ],
});

console.log(response.choices[0].message.content);

Cost Optimization Tips

1. Use Batch for non-urgent workloads

If your task can wait a few hours (document processing, data extraction, batch summarization), use the Batch tier at $0.20/M input and $0.35/M output. That is significantly cheaper than Together AI, and per-customer cost tracking is included.

2. Choose the right model size

Not everything needs 70B. Llama 3.1 8B is excellent for simpler tasks at $0.08/M output tokens, which is about 85% cheaper than the $0.55/M output price listed for 70B in the pricing summary.

3. Use system prompts efficiently

Shorter system prompts = fewer input tokens = lower cost. Be concise.
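To see why this matters, remember that the system prompt is sent again with every request. The sketch below multiplies it out, using the $0.35/M input price listed for llama-3.1-70b in this article; the prompt lengths (800 and 200 tokens) and the request volume (100,000) are made-up numbers for illustration, and `system_prompt_cost` is a hypothetical helper.

```python
def system_prompt_cost(prompt_tokens, requests, price_per_million):
    """Dollar cost of resending a system prompt on every request."""
    return prompt_tokens * requests * price_per_million / 1_000_000


INPUT_PRICE = 0.35  # $/M input tokens listed for llama-3.1-70b in this article

# Assumed workload: 100,000 requests with an 800-token vs a 200-token system prompt.
long_prompt = system_prompt_cost(800, 100_000, INPUT_PRICE)
short_prompt = system_prompt_cost(200, 100_000, INPUT_PRICE)

print(f"800-token prompt: ${long_prompt:.2f}")
print(f"200-token prompt: ${short_prompt:.2f}")
```

Under these assumptions, trimming the prompt from 800 to 200 tokens cuts the system-prompt share of the bill from about $28 to about $7.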

Pricing Summary

Model           Input     Output
llama-3.1-70b   $0.35/M   $0.55/M
llama-3.1-8b    $0.05/M   $0.08/M
deepseek-v3     $0.25/M   $0.55/M
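These prices can be turned into a per-request or per-workload estimate from the token counts that `response.usage` reports. The following is a rough sketch, not an official billing formula: `PRICES` and `estimate_cost` are hypothetical names, the figures are copied from the table above, and the token volumes are invented for the example.

```python
# (input, output) prices in dollars per million tokens, copied from the table above.
PRICES = {
    "llama-3.1-70b": (0.35, 0.55),
    "llama-3.1-8b": (0.05, 0.08),
    "deepseek-v3": (0.25, 0.55),
}


def estimate_cost(model, prompt_tokens, completion_tokens):
    """Estimate the dollar cost of a workload from its token counts."""
    input_price, output_price = PRICES[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


# Assumed workload: 2M input tokens and 1M output tokens.
cost_70b = estimate_cost("llama-3.1-70b", 2_000_000, 1_000_000)
cost_8b = estimate_cost("llama-3.1-8b", 2_000_000, 1_000_000)

print(f"70B: ${cost_70b:.2f}")
print(f"8B:  ${cost_8b:.2f}")
```

For a single request, pass `response.usage.prompt_tokens` and `response.usage.completion_tokens` instead of the assumed totals.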

Ready to try? Get your free API key at daylite.ai/dashboard. 100K tokens/month free, no credit card.