LLM Token Production Energy Cost Calculator

Research-based calculations using FLOP analysis and geographic carbon intensity to estimate the environmental impact of AI model inference.

Brought to you by Inference.net
Configuration
Model parameters and infrastructure settings

Active Parameters: 32B

Overall Parameters: 1,000B

Calculations use active parameters — the subset actually involved in each token’s computation. Dense models have active == overall, while Mixture-of-Experts (MoE) models route each token through only a few experts, lowering energy and cost estimates. Because MoE models must still hold all parameters in memory, their real energy use is somewhat higher than estimated here, but the figure remains a reasonable approximation.

PUE range: 1.0 (Perfect) – 2.5 (Poor)
Economic Impact
Electricity pricing and cost calculations
Electricity price range: $0.05 – $0.50/kWh
Presets: $0.094/kWh (US Low) · $0.152/kWh (US Avg) · $0.379/kWh (US High)
Total Cost

$0.002460

for 1,000,000 tokens

Cost per 1M Tokens

$0.002460

per million tokens

Energy Analysis

Total Energy: 0.016186 kWh

Energy per Token: 1.619e-8 kWh

Total FLOPs: 6.40e+16

Precision: FP16

Environmental Impact
Carbon emissions based on grid location
Carbon Emissions

0.003933 kg CO₂

for 1,000,000 tokens

CO₂ per 1M Tokens

0.003933 kg CO₂

per million tokens

Grid Information

Region: Google us-central1

Carbon Intensity: 0.243 kg CO₂/kWh

PUE Factor: 1.20x

Calculation Methodology

Energy Calculation: The calculator uses the active parameter count when estimating compute. Each token triggers roughly $2 \times N_{active}$ floating-point operations, so total energy is $E = \frac{2 \times N_{active} \times T}{\eta}$, where $\eta$ is hardware efficiency in FLOPs/Joule. The result is then multiplied by the data-center PUE factor to account for cooling and power-delivery overhead.

Overall parameters represent the full model size (all experts for MoE).
Active parameters are the subset actually multiplied for a single token. Dense models have active = overall; MoE models often have active ≪ overall.

This distinction means MoE models show lower compute-energy and cost here than equally sized dense models. We do not currently account for memory bandwidth or expert-routing overhead, so real-world MoE energy can be somewhat higher.
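The compute-energy estimate above can be sketched in a few lines of Python. This is a minimal sketch, not the calculator's actual source: the baseline efficiency, precision multipliers, and PUE value are the assumptions stated on this page, and the function name `inference_energy_kwh` is hypothetical.

```python
# Sketch of the energy model: FLOPs -> Joules -> kWh.
JOULES_PER_KWH = 3.6e6
BASE_EFFICIENCY = 6.59e11        # FLOPs/Joule (H100, conservative baseline)
PRECISION_SPEEDUP = {"FP32": 1.0, "FP16": 2.0, "FP8": 4.0}

def inference_energy_kwh(active_params, tokens, precision="FP16", pue=1.20):
    flops = 2 * active_params * tokens        # ~2 FLOPs per active weight per token
    eta = BASE_EFFICIENCY * PRECISION_SPEEDUP[precision]
    joules = flops / eta * pue                # PUE covers cooling/power overhead
    return joules / JOULES_PER_KWH

# Example with this page's settings: 32B active params, 1M tokens, FP16, PUE 1.20
energy = inference_energy_kwh(32e9, 1_000_000)
print(f"{energy:.6f} kWh")   # 0.016186 kWh, matching the Energy Analysis panel
```

Note that the FP16 multiplier and PUE together reproduce the 0.016186 kWh total shown above from the 6.40e+16 FLOPs figure.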

Carbon Emissions: $\text{CO}_2 = E_{\text{kWh}} \times I_{\text{grid}}$, where $I_{\text{grid}}$ is the region-specific carbon intensity (kg CO₂/kWh). Selecting cleaner grids (lower $I_{\text{grid}}$) therefore reduces emissions even when energy use is unchanged.
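Emissions and electricity cost both follow directly from the energy total. A short sketch, using this page's us-central1 intensity and US-average price as defaults; the function names are hypothetical:

```python
def carbon_kg(energy_kwh, grid_intensity=0.243):
    """Emissions = energy (kWh) x regional carbon intensity (kg CO2/kWh)."""
    return energy_kwh * grid_intensity

def cost_usd(energy_kwh, price_per_kwh=0.152):
    """Electricity cost = energy (kWh) x price ($/kWh)."""
    return energy_kwh * price_per_kwh

# With the 0.016186 kWh total from the Energy Analysis panel:
print(round(carbon_kg(0.016186), 6))   # 0.003933 kg CO2 (Google us-central1)
print(round(cost_usd(0.016186), 6))    # 0.00246 USD at the US-average rate
```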

Hardware Assumptions: Based on NVIDIA H100 specifications (~$6.59 \times 10^{11}$ FLOPs/Joule, a conservative estimate). Lower-precision formats (FP16/FP8) increase efficiency by 2×/4× respectively over this baseline.

Variable definitions: $N_{active}$ – active parameter count; $T$ – token count; $E_{\text{kWh}}$ – total energy in kilowatt-hours; $I_{\text{grid}}$ – regional carbon intensity (kg CO₂/kWh); $\eta$ – hardware efficiency (FLOPs/J).

Sources: Özcan et al. (2023), "Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations"