How We Run AI Inference on Energy California Throws Away
March 21, 2026 · 8 min read
The Problem: 3.4 TWh of Wasted Solar
In 2024, California's grid operator (CAISO) curtailed 3.4 million megawatt-hours of solar and wind energy — a 29% increase from 2023. That's enough electricity to power 500,000 homes for a year. And it's being thrown away.
Why? Not because there's too much solar overall, but because 70% of curtailment is caused by local transmission congestion. The energy is generated in places like Fresno County, but the grid can't move it to where people need it — the Bay Area, Los Angeles, Sacramento.
The worst bottleneck is Path 15, a transmission corridor through rural Fresno County. By 2039, it's projected to be congested 84% of the year — over 7,300 hours annually.
The Insight: Bring Compute to the Energy
On March 10, 2026, researchers from Next 10 and the University of Pennsylvania published "Curtail to Compute" — a study that proposes siting data centers at solar curtailment zones to absorb the energy that would otherwise be wasted.
Their key finding: a 20 MW data center in Fresno County could run on curtailed solar for 54% of the year. Capital costs would be $94 million less than an equivalent facility in Silicon Valley. And investor returns would hit 28%, compared to 15% for urban sites.
That's what we're building at Daylite.
How Daylite Works
We run H100 GPUs in Fresno County's Path 15 corridor — right where the grid is congested and solar energy is being thrown away. The inference is served to Bay Area developers via dark fiber at 5-8ms latency — faster than AWS Oregon (15-25ms).
Energy strategy
- Solar peak hours (~150 days/year): Near-free curtailed solar energy. This is when we run our cheapest Batch tier.
- Nights: PG&E off-peak grid at $0.057/kWh — cheaper than running batteries (battery LCOS is $0.076/kWh).
- Small battery (1-2hr): For peak shaving and transition buffer, not for overnight storage.
Cooling
H100 GPUs generate 700W each. Air conditioning can't handle that density. We use direct-to-chip liquid coolingin a closed loop with dry coolers — minimal water consumption, even in Fresno's hot, dry climate. Adiabatic assist kicks in only on the hottest days.
What This Means for Developers
Daylite is an OpenAI-compatible API. You change your base_url and save ~40% on inference costs — blended across the full year. During solar peak hours, savings hit 60%.
We're not a GPU rental business (we've seen what happens to those — CoreWeave carries $14B in debt). We're an inference API with per-token pricing, optimized throughput via vLLM with FP8 quantization, and a structural energy cost advantage that grows as curtailment grows 29% per year.
The Honest Numbers
We believe in transparency. Here's the real math:
- Total cost advantage vs Bay Area competitors: ~18% (energy + rent + cooling combined)
- Pricing advantage to customers: Near-cost tokens + per-customer cost management platform ($299/mo)
- Seasonal reality: April peak = 190 MWh/day curtailed. August = barely 20 MWh.
- Blended energy cost: $0.05-0.07/kWh vs $0.15-0.25/kWh for Bay Area grid
Try It
Free tier: 100K tokens/month, no credit card. Try the playground or read the API docs.