Research-based calculations using FLOP analysis and geographic carbon intensity to estimate the environmental impact of AI model inference.
Active Parameters: 32B
Overall Parameters: 1,000B
Calculations use active parameters — the subset actually involved in each token’s computation. Dense models have active == overall, while Mixture-of-Experts (MoE) models activate only a few experts per token, lowering energy and cost estimates. Because MoE models must still hold all parameters in memory, real-world energy use is somewhat higher than estimated here, but the estimate remains a good approximation.
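As a rough sketch in Python of how the active/overall distinction plays out, using the common approximation of ~2 FLOPs per active parameter per generated token (an assumption consistent with the totals reported below):

```python
# Sketch: inference FLOPs scale with ACTIVE parameters, not overall model size.
# Assumes the common ~2 FLOPs per active parameter per generated token.
def inference_flops(active_params: float, tokens: int) -> float:
    return 2.0 * active_params * tokens

moe = inference_flops(32e9, 1_000_000)      # 32B active (this MoE model)
dense = inference_flops(1000e9, 1_000_000)  # if all 1,000B params were active

print(f"MoE:   {moe:.2e} FLOPs")    # 6.40e+16
print(f"Dense: {dense:.2e} FLOPs")  # 2.00e+18, ~31x more compute
```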
$0.002460
for 1,000,000 tokens
Total Energy: 0.016186141 kWh
Energy per Token: 1.619e-8 kWh
Total FLOPs: 6.40e+16
Precision: FP16
0.003933 kg CO₂
for 1,000,000 tokens
Region: Google us-central1
Carbon Intensity: 0.243 kg CO₂/kWh
PUE Factor: 1.20x
Energy Calculation: The calculator uses the active parameter count when estimating compute. Each token triggers roughly 2 · P_active floating-point operations, so total energy is E = (2 · P_active · N / η) × PUE, where η is hardware efficiency in FLOPs/Joule.
• Overall parameters represent the full model size (all experts for MoE).
• Active parameters are the subset actually multiplied for a single token. Dense models have active = overall; MoE models often have active ≪ overall.
This distinction means MoE models show lower compute energy and cost here than equally sized dense models. We do not currently account for memory bandwidth or expert-routing overhead, so real-world MoE energy can be somewhat higher.
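A minimal Python sketch of the energy formula; the FP16 efficiency of ≈1.318×10¹² FLOPs/J is an assumption back-derived so the output matches the figures above, not a published H100 specification:

```python
def inference_energy_kwh(active_params: float, tokens: int,
                         flops_per_joule: float, pue: float = 1.20) -> float:
    """E = (2 * P_active * N / eta) * PUE, converted from joules to kWh."""
    flops = 2.0 * active_params * tokens
    joules = flops / flops_per_joule * pue  # datacenter overhead via PUE
    return joules / 3.6e6  # 1 kWh = 3.6e6 J

ETA_FP16 = 1.318e12  # FLOPs/J at FP16 (assumed; reproduces the report above)
print(inference_energy_kwh(32e9, 1_000_000, ETA_FP16))  # ≈ 0.016186 kWh
```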
Carbon Emissions: C = E × I, where I is the region-specific carbon intensity (kg CO₂/kWh). Selecting cleaner grids (lower I) therefore reduces emissions even when energy use is unchanged.
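Emissions then follow directly from the energy total; a small Python sketch (the 0.050 kg CO₂/kWh figure is a hypothetical cleaner grid, not a real region):

```python
def co2_kg(energy_kwh: float, intensity_kg_per_kwh: float) -> float:
    # C = E * I: emissions scale linearly with grid carbon intensity.
    return energy_kwh * intensity_kg_per_kwh

energy = 0.016186141  # kWh, total energy from above
print(co2_kg(energy, 0.243))  # Google us-central1 -> ~0.003933 kg CO2
print(co2_kg(energy, 0.050))  # hypothetical cleaner grid -> ~0.000809 kg CO2
```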
Hardware Assumptions: Based on NVIDIA H100 specifications (η on the order of 10¹² FLOPs/Joule, a conservative estimate). Precision improvements (FP16/FP8) increase efficiency by 2× and 4× respectively.
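The precision multipliers can be sketched as follows (Python; the FP32 baseline efficiency is a hypothetical value for illustration, only the 2×/4× ratios come from the text):

```python
# Efficiency multipliers relative to FP32, per the text; baseline is illustrative.
PRECISION_MULTIPLIER = {"FP32": 1.0, "FP16": 2.0, "FP8": 4.0}

def effective_efficiency(base_flops_per_joule: float, precision: str) -> float:
    return base_flops_per_joule * PRECISION_MULTIPLIER[precision]

BASE_ETA = 6.59e11  # FLOPs/J at FP32 (hypothetical baseline)
for p in ("FP32", "FP16", "FP8"):
    print(p, f"{effective_efficiency(BASE_ETA, p):.2e} FLOPs/J")
```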
Variable definitions: P_active – active parameters (billions); N – token count; E – total energy in kilowatt-hours; I – regional carbon intensity (kg CO₂/kWh); η – hardware efficiency (FLOPs/J).
Sources: Özcan et al. (2023), "Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations"