LLM Token Production Energy Cost Calculator

Research-based calculations using FLOP analysis and geographic carbon intensity to estimate the environmental impact of AI model inference.

Brought to you by Inference.net
Configuration
Model parameters and infrastructure settings

Active Parameters: 32B

Overall Parameters: 1,000B

Calculations use active parameters — the subset actually involved in each token’s computation. Dense models have active == overall, while Mixture-of-Experts (MoE) models route each token through only a few experts, lowering energy and cost estimates. Because MoE models must still hold all parameters in memory, their real energy use is somewhat higher than estimated here, but the figure remains a reasonable approximation.

PUE range: 1.0 (Perfect) – 2.5 (Poor)
Economic Impact
Electricity pricing and cost calculations
Electricity price range: $0.05 – $0.50/kWh
Presets: $0.094/kWh (US Low) · $0.152/kWh (US Avg) · $0.379/kWh (US High)
Total Cost

$0.002460

for 1,000,000 tokens

Cost per 1M Tokens

$0.002460

per million tokens

Energy Analysis

Total Energy: 0.016186 kWh

Energy per Token: 1.619e-8 kWh

Total FLOPs: 6.40e+16

Precision: FP16

Environmental Impact
Carbon emissions based on grid location
Carbon Emissions

0.003933 kg CO₂

for 1,000,000 tokens

CO₂ per 1M Tokens

0.003933 kg CO₂

per million tokens

Grid Information

Region: Google us-central1

Carbon Intensity: 0.243 kg CO₂/kWh

PUE Factor: 1.20x

Calculation Methodology

Energy Calculation: The calculator uses the active parameter count when estimating compute. Each token triggers roughly $2 \times N_{active}$ floating-point operations, so total energy is $E = \frac{2 \times N_{active} \times T}{\eta}$, where $\eta$ is hardware efficiency in FLOPs/Joule. The result is then multiplied by the data-center PUE factor to account for cooling and power-delivery overhead.

Overall parameters represent the full model size (all experts for MoE).
Active parameters are the subset actually multiplied for a single token. Dense models have active = overall; MoE models often have active ≪ overall.

This distinction means MoE models show lower compute-energy and cost here than equally sized dense models. We do not currently account for memory bandwidth or expert-routing overhead, so real-world MoE energy can be somewhat higher.
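The compute-energy estimate above can be sketched in a few lines of Python. This is a minimal sketch, not the calculator's actual source: the baseline efficiency, precision multipliers, and PUE value are the assumptions stated on this page, and the function name `inference_energy_kwh` is hypothetical.

```python
# Sketch of the energy model: FLOPs -> Joules -> kWh.
JOULES_PER_KWH = 3.6e6
BASE_EFFICIENCY = 6.59e11        # FLOPs/Joule (H100, conservative baseline)
PRECISION_SPEEDUP = {"FP32": 1.0, "FP16": 2.0, "FP8": 4.0}

def inference_energy_kwh(active_params, tokens, precision="FP16", pue=1.20):
    flops = 2 * active_params * tokens        # ~2 FLOPs per active weight per token
    eta = BASE_EFFICIENCY * PRECISION_SPEEDUP[precision]
    joules = flops / eta * pue                # PUE covers cooling/power overhead
    return joules / JOULES_PER_KWH

# Example with this page's settings: 32B active params, 1M tokens, FP16, PUE 1.20
energy = inference_energy_kwh(32e9, 1_000_000)
print(f"{energy:.6f} kWh")   # 0.016186 kWh, matching the Energy Analysis panel
```

Note that the FP16 multiplier and PUE together reproduce the 0.016186 kWh total shown above from the 6.40e+16 FLOPs figure.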

Carbon Emissions: $\text{CO}_2 = E_{\text{kWh}} \times I_{\text{grid}}$, where $I_{\text{grid}}$ is the region-specific carbon intensity (kg CO₂/kWh). Selecting cleaner grids (lower $I_{\text{grid}}$) therefore reduces emissions even when energy use is unchanged.
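Emissions and electricity cost both follow directly from the energy total. A short sketch, using this page's us-central1 intensity and US-average price as defaults; the function names are hypothetical:

```python
def carbon_kg(energy_kwh, grid_intensity=0.243):
    """Emissions = energy (kWh) x regional carbon intensity (kg CO2/kWh)."""
    return energy_kwh * grid_intensity

def cost_usd(energy_kwh, price_per_kwh=0.152):
    """Electricity cost = energy (kWh) x price ($/kWh)."""
    return energy_kwh * price_per_kwh

# With the 0.016186 kWh total from the Energy Analysis panel:
print(round(carbon_kg(0.016186), 6))   # 0.003933 kg CO2 (Google us-central1)
print(round(cost_usd(0.016186), 6))    # 0.00246 USD at the US-average rate
```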

Hardware Assumptions: Based on NVIDIA H100 specifications (~$6.59 \times 10^{11}$ FLOPs/Joule, a conservative estimate). Lower-precision formats (FP16/FP8) increase efficiency by 2×/4× respectively over this baseline.

Variable definitions: $N_{active}$ – active parameter count; $T$ – token count; $E_{\text{kWh}}$ – total energy in kilowatt-hours; $I_{\text{grid}}$ – regional carbon intensity (kg CO₂/kWh); $\eta$ – hardware efficiency (FLOPs/J).

Sources: Özcan et al. (2023), "Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations"