Backblaze B2 — Goodput Calculator (Neocloud) · Guide
How to read the demo · what the numbers mean · the math behind them

How this demo works

From plain English to the math. Starts simple, gets deep. Designed so anyone — a finance lead, an infrastructure VP, or an engineer — can follow the story and verify the numbers.

Estimate only — not a quote. Figures throughout this document are projections derived from the Configure inputs and Backblaze list pricing on the date shown. Actual results depend on your specific workload, cluster utilization, contracted rates, and implementation. This is for discussion purposes and does not constitute a sales quote, pricing commitment, service-level agreement, or warranty. The I/O wait and flash-replacement defaults are working defaults pending pilot validation — instrument your own cluster and override the Configure fields before relying on these numbers for budget decisions. Final terms require a signed agreement.
The 30-second version

Want the math? Jump to §6. Want to validate against your own workload? §7. Quick walkthrough of the demo? §4.

What's in here

  1. Start here — what's "goodput" and why should I care?
  2. The opportunity — who has this problem
  3. What this tool shows
  4. Demo Walkthrough
  5. Every field, explained — and what we left out
  6. The math — formulas and derivation
    1. Goodput Recovered
    2. Checkpoint Stall Eliminated
    3. Flash Footprint Displaced
    4. Revenue Unlocked
    5. Tier Premium and Net ROI
    6. The bandwidth-capability cap
  7. Assumptions to validate against your workload
  8. FAQ — questions you'll get from engineers and CFOs

1. Start here — what's "goodput" and why should I care?

The kitchen story

Imagine a very expensive chef. The chef costs $1,000 an hour. The chef can cook fast, but only if the ingredients show up on time. If the delivery truck is late, the chef just stands there with a knife. You still pay the chef.

That's what's happening to GPUs every day. An H100 GPU rents for about $3 to $4 per hour. At $4, a cluster of 256 H100s costs more than $1,000 per hour to run. AI companies pay that whether the GPUs are training a model or sitting there waiting for the next batch of data.

Throughput vs goodput

Two words that sound the same but mean different things:

A truck with a 200 mph top speed and zero packages delivered has high throughput and zero goodput. A truck driving 60 mph and delivering every package on schedule has lower throughput and perfect goodput. Goodput is what you actually get paid for.

For an AI training cluster, goodput is the percentage of GPU time spent actually computing, not waiting on storage. A "95% goodput" cluster means 5% of the time the GPUs are idle — waiting for the next shuffle of training data, or waiting for a checkpoint to finish flushing to disk.

Why a few percentage points matter so much

A 256-GPU H100 cluster at $4/hr per GPU is $1,024/hr. Over a month (720 hours) that's $737,000.

If your goodput is 95% (5% idle), you waste ~$37,000/mo on idle GPU time.
If you can get to 99% goodput, you save ~$30,000/mo on the same hardware bill.
At 512 GPUs the same delta becomes ~$60,000/mo.

That's the whole idea in one paragraph: storage that keeps GPUs busy converts idle GPU dollars into billable training dollars. The faster and steadier the storage, the more of the GPU bill is doing useful work.
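
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch rather than the tool's own code: the function names are illustrative, and it uses the flat 720-hour month from this section (no utilization factor).

```python
HOURS_PER_MONTH = 720  # flat month length used in this section


def monthly_idle_cost(gpus: int, gpu_rate_per_hr: float, goodput_pct: float) -> float:
    """Dollars per month paid for GPU time spent waiting instead of computing."""
    monthly_bill = gpus * gpu_rate_per_hr * HOURS_PER_MONTH
    idle_fraction = 1.0 - goodput_pct / 100.0
    return monthly_bill * idle_fraction


def monthly_savings(gpus: int, gpu_rate_per_hr: float,
                    goodput_before_pct: float, goodput_after_pct: float) -> float:
    """Idle dollars recovered by moving from one goodput level to another."""
    before = monthly_idle_cost(gpus, gpu_rate_per_hr, goodput_before_pct)
    after = monthly_idle_cost(gpus, gpu_rate_per_hr, goodput_after_pct)
    return before - after


# 256 H100s at $4/hr: idle cost at 95% goodput, then the gain from reaching 99%.
print(f"idle at 95%:   ${monthly_idle_cost(256, 4.0, 95):,.0f}/mo")
print(f"95% -> 99%:    ${monthly_savings(256, 4.0, 95, 99):,.0f}/mo")
print(f"same, 512 GPU: ${monthly_savings(512, 4.0, 95, 99):,.0f}/mo")
```

The unrounded results are about $36,900, $29,500, and $59,000 per month, which the text above rounds to ~$37,000, ~$30,000, and ~$60,000.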

When does goodput actually start to matter?

Two conditions have to be true at the same time. If either one isn't, the goodput math reduces to zero (and Standard B2 is the right answer).

Threshold | Where it tips
Cluster size large enough that 1% idle = real money | ~32+ GPUs (~$100/hr cluster cost). Below this, even significant goodput improvements are too small to justify a tier upgrade.
Working set large enough that storage actually stalls the cluster | ~3+ PB hot working set, OR bursty checkpoints ≥ 100 GB on a 64+ GPU cluster. Below this, Standard B2's 50 Gbps already keeps the GPUs fed.

Below both thresholds — Profile 1 territory in the demo — Standard B2 is the right answer and the goodput tiles correctly read $0. The tool will recommend Standard when you click "Auto-select optimal tier" in this regime.

Above both thresholds — Profile 2 and Profile 3 territory — the goodput story dominates. The Net ROI math turns positive for an Overdrive tier sized to the cluster's demand.

2. The opportunity — who has this problem

The four customer segments this is built for

Segment | What they sell | Their pain | How Overdrive helps
GPU cloud providers ("Neoclouds" renting GPU capacity by the hour) | GPU capacity by the hour to AI builders | Margin is GPU $/hr minus storage and ops. Every idle GPU minute is dead margin. Flash is expensive: paid for whether customers re-read it or not. | Higher goodput → more billable GPU hours from the same hardware. Flash displacement → smaller flash footprint to amortize.
GPU orchestration platforms (schedulers that pack training jobs across clusters) | Software that schedules and packs training jobs across clusters | Their value prop IS goodput: they promise "we make your GPUs busier." If the underlying storage stalls, the scheduler can't fix it. | Overdrive is the storage layer that makes their orchestration claims defensible end-to-end.
AI infrastructure providers (bare-metal training, fine-tune-as-a-service, inference platforms) | Compute platforms measured in tokens/sec or training $/epoch | SLA pressure from customers who measure everything. A 2-minute checkpoint stall × 4 jobs × 24 hours is hours of lost training per day. | Checkpoint stall elimination directly recovers customer-visible throughput.
Data providers for AI training (web-scale crawlers, media aggregators, dataset publishers) | Crawl, curate, and package petabytes of training data for downstream AI builders | Petabytes of cold data re-read on every training cycle. Flash for this is economically impossible; slow object storage stalls every downstream training run. | Overdrive serves PB-scale catalogs at training-cluster line rate, without the flash bill.

What they all share

The core idea: the conversation about AI infrastructure storage is moving away from storage line-item comparisons. The bigger question is whether storage can convert idle GPU dollars into billable GPU dollars and let you displace expensive flash. This demo models that math.

3. What this tool shows

The page has four big numbers across the top — the "hero tiles." Each is one piece of the goodput story.

Tile | What it answers | Who cares most
Goodput Recovered / mo | "How much of my GPU bill stops being wasted on I/O wait?" | CRO, CFO (direct revenue impact)
Checkpoint Stall Eliminated | "How much GPU time stops being burned during checkpoint flushes?" | Platform engineer, AI customers
Flash Footprint Displaced | "How much of my expensive flash can I retire because B2 can serve the hot working set?" | Infrastructure VP, finance
Revenue Unlocked / mo | "What can I charge customers for the flash I just freed up?" | CRO (the resale story)

Why these four, and not a simple storage cost comparison?

Storage is a small fraction of any AI training bill. Even cutting it 80% doesn't move the conversation. GPU spend is 10–100× the storage spend, which means a 1% goodput improvement is worth more than a 50% storage discount. This tool surfaces that math directly instead of burying it inside a per-TB rate sheet.

How to read the baseline

When the page loads on Standard B2 (50 Gbps), every goodput tile shows $0 with an explainer ("Standard tier — no goodput delta to claim"). That's intentional, not a bug. Standard B2 IS the baseline — there's no delta to compare against itself. Switch to an Overdrive tier (100, 200, 300, 400, or 800 Gbps) to see the math kick in and the dollar values change.

4. Demo Walkthrough

The fastest way to learn what the tool does is to click through it. The walkthrough below takes about five minutes.

  1. Open the page. Profile 2 (frontier pretraining: 25 PB dataset, 256 H100s) is the default scenario. The hero tiles show roughly $600K/mo in total impact on 300 Gbps Overdrive, which is the headline number this profile produces.
  2. Click the Standard B2 tier button. Every Overdrive-dependent tile drops to $0. This is the baseline — Standard B2 is the comparison point and has no "Overdrive delta" by definition.
  3. Click "⚡ Auto-select optimal tier." The tool picks the Overdrive tier with the highest Net ROI for the configured workload and shows the reasoning in the live event stream below.
  4. Switch to Profile 1 (inference / fine-tune — 1–3 PB workload). The optimizer will recommend Standard B2, not Overdrive. At this scale the math doesn't justify the Overdrive premium — and the tool says so honestly.
  5. Switch to Profile 3 (video / multimodal at scale — 100 PB, 512 GPUs). Flash Footprint Displaced jumps to about 24 PB. This profile illustrates the flash-displacement story for very large data libraries.
  6. Click "Run." The full 8-stage pipeline executes against the demo bucket. Hero tiles fill in progressively as each stage completes; the "Open in B2 console" link on each finished stage card surfaces the actual object in the bucket so you can verify nothing is simulated. Prefer to skip the wait? Click the small "skip" link next to the Run button — tiles populate from the math immediately.

Tuning the scenario to your workload

Open the Configure panel to adjust every input directly — cluster size, GPU $/hr, checkpoint cadence, hot working set %, flash $/TB-mo, and I/O wait %. The hero tiles and Tier Comparison panel update live as you change any field. The Tier Comparison panel breaks every tier down (goodput / stall / flash / premium / net ROI) so you can compare them side-by-side.

Switching to Full pipeline mode

The Run button executes all 8 pipeline stages end-to-end in sequence: Data Lake → Training Prep → Checkpoints → Model Registry → RAG Knowledge → KV Cache → Inference → Exhaust. Five of the stages carry the goodput / stall / flash-displacement headline economics; the other three (RAG Knowledge, KV Cache, Exhaust) demonstrate the full set of B2 patterns in a production AI pipeline. Individual stage cards also have their own Run / Re-run buttons if you want to demo a single pattern in isolation.

5. Every field, explained — and what we left out

Fields included

Workload profile

Field | Default | Why it's here
Profile 1: inference / fine-tune | 1–3 PB, 16 GPUs | Credibility profile. Math correctly recommends Standard B2 here.
Profile 2: frontier pretraining | 25 PB, 256 H100s (default) | The scenario where Overdrive economics typically carry. Frontier LLM pretraining is the canonical fit.
Profile 3: video / multimodal at scale | 100 PB, 512 GPUs | The flash-displacement story for the largest accounts.

Flash cost model

Field | Default | Why it's here
Drive price ($/TB hardware) | $490 (QLC) / $570 (TLC) | Q1 2026 reference pricing for 30.72 TB enterprise drives at volume.
Amortization (months) | 36 | Standard 3-year refresh cycle most CFOs assume.
DC overhead (%) | 50 | Power + cooling + fabric + ops markup on raw drive cost. 30–60% is typical for high-density flash.
Sell price ($/TB-mo to customers) | $100 | What a neocloud charges its customers for flash. Used in Revenue Unlocked math.
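
To see how these four fields relate, here is a rough sketch of a monthly flash cost basis. The guide lists the inputs but not the formula that combines them, so the linear amortization and the multiplicative overhead markup below are assumptions, and the function name is illustrative.

```python
def flash_cost_per_tb_month(drive_price_per_tb: float,
                            amortization_months: int,
                            dc_overhead_pct: float) -> float:
    """Assumed model: straight-line amortization, then a percentage markup
    for power, cooling, fabric, and ops."""
    amortized = drive_price_per_tb / amortization_months
    return amortized * (1.0 + dc_overhead_pct / 100.0)


SELL_PRICE = 100.0  # $/TB-mo default from the table above

for label, price in (("QLC", 490.0), ("TLC", 570.0)):
    cost = flash_cost_per_tb_month(price, 36, 50)
    print(f"{label}: cost ≈ ${cost:.2f}/TB-mo, "
          f"gap to sell price ≈ ${SELL_PRICE - cost:.2f}/TB-mo")
```

Under those assumptions the defaults work out to roughly $20/TB-mo (QLC) and $24/TB-mo (TLC) of cost against a $100/TB-mo sell price.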

Goodput scenario

Field | Default | Why it's here
Cluster size (GPUs) | 256 | Drives cluster $/hr in every goodput formula.
GPU $/hr | $4.00 | H100 on-demand rate. Reserved instances are lower; CFO will tune.
Model size | 70B | Auto-fills checkpoint size (~13 bytes/param for full training state per MLCommons MLPerf Storage benchmark).
Checkpoint size (GB) | 800 | Hourly checkpoint of a 70B model. MLPerf anchor buttons next to the input auto-fill canonical sizes: 8B → 105 GB, 70B → 912 GB, 405B → 5.29 TB, 1T → 15 TB (MLCommons MLPerf Storage).
Checkpoint cadence | every 1 hr | Frontier pretraining typical. Faster cadence = more stall payoff for Overdrive.
Concurrent jobs | 4 | Multi-tenant clusters checkpoint each job independently. Multiplies the stall payoff linearly.
Hot working set (% of dataset) | 32% | Fraction of the dataset that has to stay on flash for active shuffled reads.
Flash $/TB-mo (displacement) | $100 | The flash cost basis Overdrive is replacing. Mirrors the sell-price field above.
Std / Overdrive I/O wait % | 4.0% / 0.8% | Working defaults. Validate against your cluster's telemetry before treating as authoritative.
Overdrive flash-replace % | 80% | Fraction of hot working set Overdrive can serve at line rate (so flash isn't needed for it).

Fields we deliberately left out

What we removed | Why
"Training runs / month" | An earlier version multiplied per-run load-time savings by runs/month and produced implausible $30M+ Net ROI numbers. Real training I/O isn't sequential per run; it's continuous shuffled reads plus checkpoint flushes. Removing this knob keeps the numbers honest.
"S3 egress fees" / "S3 storage rate" | The old pitch lived on the S3 comparison. The reframe puts the GPU dollar conversation first; comparing storage line items is a distractor at this stage of the pitch.
"Cold tier vs warm tier vs hot tier" sliders | Too much detail for a 5-minute opener. Rolled into a single "Hot working set %" input. Engineers who want the breakdown can read the stage flash-freed fractions in the source.
"$/TB·mo for B2 Overdrive" as an input | Hardcoded per tier. These are published Backblaze rates, not tunable inputs. The Standard rate is $6.95/TB·mo; Overdrive tiers are listed in §6.5.
"Per-GPU bandwidth need" as a UI knob | Set as a constant (1.0 Gbps/GPU, the conservative lower bound from published H100 training measurements). Heavier or lighter workloads can override the constant in the source; the assumption is called out in §7 so it can be validated against real telemetry.
A separate "Calculator" panel | Used to have its own inputs duplicating Configure. Removed for clarity: one set of inputs drives one set of outputs.

6. The math — formulas and derivation

Every formula below is what the tool computes. Each is presented with a worked example using Profile 2 defaults so the chain of reasoning is reproducible by hand.

The premise behind the math. Public sources establish that storage performance materially affects GPU economics. The math below converts these established storage-side levers into per-tier dollar impact for the operator's economic model. Defaults are positioned mid-range conservative; the Configure panel lets you tune every input to your fleet's measured telemetry.

6.1 — Goodput Recovered / mo

cluster_$_per_hr      = cluster_size × gpu_$_per_hr
io_wait_delta         = max(0, (io_wait_std% − io_wait_od%) / 100)
training_hrs_per_mo   = 720 × cluster_utilization        // default 80% → 576 hrs
goodput_$_per_mo      = cluster_$_per_hr × io_wait_delta × tier_capability × training_hrs_per_mo

What it measures

The dollar value of GPU cluster time that's no longer spent in I/O wait. If Standard B2 leaves the cluster waiting on storage 4% of the time, and Overdrive brings that to 0.8%, then 3.2% of the cluster's hourly bill becomes productive work instead of idle GPUs — applied only during the 576 effective training hours each month (a production cluster spends ~20% of every month idle: maintenance, queue gaps, restart windows).

Worked example (Profile 2 on 300 Gbps Overdrive)

cluster_$/hr        = 256 × $4    = $1,024
delta               = (4.0 − 0.8) / 100  = 0.032
capability          = min(1, 300 / (256 × 1)) = 1.0
training_hrs_per_mo = 720 × 0.80 = 576
goodput             = 1,024 × 0.032 × 1.0 × 576
                    ≈ $18,874 / mo
Why the capability cap? A 100 Gbps tier can't fully feed a 256-GPU cluster (256 GPUs × 1 Gbps/GPU = 256 Gbps required). The smaller tier only serves 100/256 ≈ 39% of the demand, so it only delivers 39% of the I/O wait improvement. Without the cap, the model would credit any Overdrive tier with full goodput recovery regardless of size — an obvious overstatement once you do the bandwidth math.
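
The 6.1 formula translates directly into Python. This is a sketch for checking numbers by hand, not the tool's source; the function and parameter names are illustrative, while the constants (720 hours, 80% utilization, 1.0 Gbps/GPU) are the defaults stated in this guide.

```python
def tier_capability(tier_gbps: float, cluster_gpus: int,
                    gbps_per_gpu: float = 1.0) -> float:
    """Fraction of cluster read demand the tier can serve, capped at 1."""
    return min(1.0, tier_gbps / (cluster_gpus * gbps_per_gpu))


def goodput_recovered(cluster_gpus: int, gpu_rate_per_hr: float,
                      io_wait_std_pct: float, io_wait_od_pct: float,
                      tier_gbps: float, utilization: float = 0.80,
                      gbps_per_gpu: float = 1.0) -> float:
    """Monthly dollars of GPU time no longer spent in I/O wait."""
    cluster_per_hr = cluster_gpus * gpu_rate_per_hr
    io_wait_delta = max(0.0, (io_wait_std_pct - io_wait_od_pct) / 100.0)
    training_hrs = 720 * utilization
    capability = tier_capability(tier_gbps, cluster_gpus, gbps_per_gpu)
    return cluster_per_hr * io_wait_delta * capability * training_hrs


# Profile 2 defaults on two different tiers.
for gbps in (300, 100):
    value = goodput_recovered(256, 4.0, 4.0, 0.8, gbps)
    print(f"{gbps} Gbps: ${value:,.0f}/mo")
```

On 300 Gbps this gives about $18,874/mo, matching the worked example; on 100 Gbps the cap scales it to about $7,373/mo.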

6.2 — Checkpoint Stall Eliminated

Why checkpoint stall is a real economic lever, not a model artifact. Public sources confirm checkpoint overhead is one of the largest underclaimed sources of GPU idle time in production training. The default checkpoint sizes used in the math are from the MLCommons MLPerf Storage benchmark: 8B → 105 GB, 70B → 912 GB, 405B → 5.29 TB, 1T → 15 TB. The Configure panel exposes these as model-size anchor buttons.
stall_std_min          = (ckpt_GB × 8) / standard_gbps / 60
stall_od_min           = (ckpt_GB × 8) / tier_gbps     / 60
delta_min              = max(0, stall_std_min − stall_od_min)
ckpts_per_hr           = 60 / cadence_min
training_hrs_per_mo    = 720 × cluster_utilization              // default 80% → 576 hrs
hours_recovered_per_mo = (delta_min / 60) × ckpts_per_hr × training_hrs_per_mo × concurrent_jobs
stall_$_per_mo         = hours_recovered_per_mo × cluster_$_per_hr

What it measures

Every training run flushes checkpoints periodically. While flushing, the cluster stalls: every GPU waits on the checkpoint write to complete. Standard B2 at 50 Gbps takes about 2.1 minutes to flush an 800 GB checkpoint. Overdrive at 200 Gbps does it in 32 seconds. That 1.6-minute delta, repeated hourly across 4 concurrent training jobs over the 576 effective training hours, adds up.

Worked example (Profile 2 on 300 Gbps)

stall_std           = (800 × 8) / 50  / 60 = 2.133 min
stall_od            = (800 × 8) / 300 / 60 = 0.356 min
delta               = 1.778 min
ckpts/hr            = 60 / 60 = 1
training_hrs_per_mo = 720 × 0.80 = 576
hours/mo            = (1.778/60) × 1 × 576 × 4 jobs
                    ≈ 68.3 hours
stall_$/mo          = 68.3 × $1,024
                    ≈ $69,900 / mo
Multi-tenant cluster modeling: the tool multiplies stall recovery by concurrent_jobs. In a multi-tenant cluster each job checkpoints on its own cadence, so the payoff scales linearly with the number of concurrent flushers. Single-tenant clusters should set concurrent_jobs = 1 in the Configure panel for an unmultiplied result.
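
A minimal Python sketch of the 6.2 formula follows, useful for re-running the example with your own checkpoint size or cadence. The function name and return shape are illustrative; the 50 Gbps Standard baseline and 80% utilization are the defaults from this guide.

```python
def checkpoint_stall_recovered(ckpt_gb: float, tier_gbps: float,
                               cadence_min: float, concurrent_jobs: int,
                               cluster_per_hr: float,
                               standard_gbps: float = 50.0,
                               utilization: float = 0.80):
    """Return (cluster hours recovered per month, dollars per month)."""
    stall_std_min = (ckpt_gb * 8) / standard_gbps / 60  # GB -> gigabits -> s -> min
    stall_od_min = (ckpt_gb * 8) / tier_gbps / 60
    delta_min = max(0.0, stall_std_min - stall_od_min)
    ckpts_per_hr = 60 / cadence_min
    training_hrs = 720 * utilization
    hours = (delta_min / 60) * ckpts_per_hr * training_hrs * concurrent_jobs
    return hours, hours * cluster_per_hr


# Profile 2 on 300 Gbps: 800 GB checkpoint, hourly, 4 jobs, 256 GPUs at $4/hr.
hours, dollars = checkpoint_stall_recovered(800, 300, 60, 4, 256 * 4.0)
print(f"{hours:.2f} cluster-hours/mo, ${dollars:,.0f}/mo")

# Single-tenant variant: same inputs with concurrent_jobs = 1.
hours_1, dollars_1 = checkpoint_stall_recovered(800, 300, 60, 1, 256 * 4.0)
print(f"{hours_1:.2f} cluster-hours/mo, ${dollars_1:,.0f}/mo")
```

Carrying unrounded intermediates, the Profile 2 case comes to roughly 68.3 cluster-hours and about $69,900 per month; the single-tenant case is one quarter of that.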

6.3 — Flash Footprint Displaced

Hot/cold split is the lever VAST and WekaIO have monetized for years. Their published whitepapers and customer references repeatedly cite hot working sets in the 20–40% range of total training corpus depending on epoch size, shuffle pattern, and dataset scale. Cold tail data (60–80%) is the displaceable portion the Overdrive flash-replace % math operates on. The default 32% hot working set is mid-range conservative against those public ranges. Validate it against your fleet's actual hot/cold telemetry; instrumenting with object-access timestamps gives you your real number.
dataset_PB     = pipeline_training_TB / 1,000     (decimal PB, matches spec §4)
hot_PB         = dataset_PB × hot_working_set% / 100
displaced_PB   = hot_PB × overdrive_replace% / 100 × tier_capability
displaced_TB   = displaced_PB × 1,000
flash_$        = displaced_TB × flash_$_per_TB_per_mo
b2_overdrive_$ = displaced_TB × tier_storage_rate
net_savings    = max(0, flash_$ − b2_overdrive_$)

What it measures

The amount of flash storage a customer no longer needs to provision because Overdrive can serve the hot working set at line rate. For a 100 PB customer with a 30 PB hot tier, Overdrive eliminating 80% of that hot-tier flash provisioning is a 24 PB reduction. At $100/TB-mo flash and $19/TB-mo B2 Overdrive, that's ~$1.9M/mo of net storage savings.

Worked example (Profile 2 on 300 Gbps)

dataset       = 25,000 TB / 1,000 = 25 PB
hot           = 25 × 0.32        = 8 PB
displaced     = 8 × 0.80 × 1.0   = 6.4 PB (6,400 TB)
flash_$       = 6,400 × $100     = $640,000
b2_od_$       = 6,400 × $19      = $121,600
net           = $640,000 − $121,600
              ≈ $518,400 / mo
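
The same derivation as a short Python sketch, so the hot-set and replace percentages can be varied quickly. Names are illustrative; `capability` is the bandwidth cap from §6.6, passed in as a plain number here.

```python
def flash_displaced(dataset_tb: float, hot_working_set_pct: float,
                    overdrive_replace_pct: float, capability: float,
                    flash_rate_per_tb: float, tier_rate_per_tb: float):
    """Return (displaced TB, net monthly savings in dollars)."""
    dataset_pb = dataset_tb / 1000  # decimal PB
    hot_pb = dataset_pb * hot_working_set_pct / 100
    displaced_pb = hot_pb * overdrive_replace_pct / 100 * capability
    displaced_tb = displaced_pb * 1000
    flash_cost = displaced_tb * flash_rate_per_tb
    overdrive_cost = displaced_tb * tier_rate_per_tb
    return displaced_tb, max(0.0, flash_cost - overdrive_cost)


# Profile 2 on 300 Gbps: 25 PB dataset, 32% hot, 80% replace, full capability.
tb, net = flash_displaced(25_000, 32, 80, 1.0, 100.0, 19.0)
print(f"{tb:,.0f} TB displaced, net ≈ ${net:,.0f}/mo")
```

With the Profile 2 defaults this reproduces the 6,400 TB and ≈ $518,400/mo figures above. Lowering `capability` below 1.0 scales both results down proportionally.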

6.4 — Revenue Unlocked / mo

freed_TB        = Σ STAGE_FLASH_FREED_TB[stage] for stage in completed_stages
margin_per_TB   = flash_sell_price − b2_storage_rate
revenue_$_per_mo = max(0, freed_TB × margin_per_TB)

What it measures

The neocloud's resale opportunity. As each demo stage offloads its data tier to B2, that flash capacity is freed up. The neocloud can resell that flash to a new GPU customer at the going rate (~$100/TB-mo) while paying B2 only the storage rate (~$6.95–$19/TB-mo depending on tier). The margin is recurring revenue.

Per-stage flash-freed assumptions

Stage | Flash freed | What stays hot
Data Lake | 90% of training corpus | ~10% re-training cache
Checkpoints | 75% of checkpoint history | Active + last 2–3 checkpoints
Model Registry | 24% of model tier | Currently deployed model
RAG Knowledge | 7.5% of model tier | Vector index + embeddings
KV Cache | 8.25% of model tier | Hot KV blocks for inference
Exhaust | 70% of logs | 72-hour log window

These per-stage fractions are educated estimates. Validate against actual customer deployment patterns before externalizing as authoritative numbers.
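
As a sketch of the 6.4 formula: the per-stage terabyte figures in the example below are hypothetical placeholders (the guide gives fractions, not the tier sizes they apply to), and the function name is illustrative.

```python
def revenue_unlocked(freed_tb_by_stage: dict, completed_stages: list,
                     flash_sell_price: float, b2_storage_rate: float) -> float:
    """Monthly resale margin on flash freed by the completed stages."""
    freed_tb = sum(freed_tb_by_stage[stage] for stage in completed_stages)
    margin_per_tb = flash_sell_price - b2_storage_rate
    return max(0.0, freed_tb * margin_per_tb)


# Hypothetical freed capacity per stage, in TB (not figures from the tool).
freed = {"Data Lake": 900.0, "Checkpoints": 75.0, "Exhaust": 35.0}

# Revenue grows as more stages complete.
print(revenue_unlocked(freed, ["Data Lake"], 100.0, 19.0))
print(revenue_unlocked(freed, ["Data Lake", "Checkpoints", "Exhaust"], 100.0, 19.0))
```

Because the sum runs only over completed stages, the tile fills in progressively during a run, consistent with the walkthrough in §4.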

6.5 — Tier Premium and Net ROI (Tier Comparison table)

storage_TB         = training_TB + checkpts_TB + models_TB + logs_TB
standard_$_per_mo  = storage_TB × $6.95
tier_$_per_mo      = max(tier_min_monthly_commit,
                         storage_TB × tier_storage_rate + tier_network_fee)
tier_premium       = max(0, tier_$_per_mo − standard_$_per_mo)

net_ROI_per_mo     = goodput_$ + stall_$ + net_flash_$ − tier_premium

This is what the "Auto-select optimal tier" button uses to pick the best tier. The winner is whichever Overdrive tier maximizes net ROI for the configured workload. If no Overdrive tier produces positive ROI (small workloads where the tier premium exceeds the gains), it recommends Standard B2.
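
The selection logic described above might look like the following sketch. It is not the tool's implementation: the `Tier` fields and the example tier are hypothetical, since per-tier rates, network fees, and minimum commits are not listed in this text. Only the $6.95/TB·mo Standard rate and the fall-back-to-Standard rule come from the guide.

```python
from dataclasses import dataclass

STANDARD_RATE = 6.95  # $/TB-mo, Standard B2 rate from this guide


@dataclass
class Tier:
    name: str
    storage_rate: float        # $/TB-mo (hypothetical in the example below)
    network_fee: float         # $/mo (hypothetical)
    min_monthly_commit: float  # $/mo (hypothetical)


def tier_premium(storage_tb: float, tier: Tier) -> float:
    """Extra monthly spend for the tier compared with Standard B2."""
    standard_cost = storage_tb * STANDARD_RATE
    tier_cost = max(tier.min_monthly_commit,
                    storage_tb * tier.storage_rate + tier.network_fee)
    return max(0.0, tier_cost - standard_cost)


def auto_select(storage_tb: float, candidates: list):
    """candidates: list of (Tier, (goodput_$, stall_$, net_flash_$)).
    Returns (tier name, net ROI); Standard B2 if no tier has positive ROI."""
    best_name, best_roi = "Standard B2", 0.0
    for tier, (goodput, stall, net_flash) in candidates:
        roi = goodput + stall + net_flash - tier_premium(storage_tb, tier)
        if roi > best_roi:
            best_name, best_roi = tier.name, roi
    return best_name, best_roi


example = Tier("Overdrive (example)", 19.0, 0.0, 0.0)
print(auto_select(1000.0, [(example, (18_874.0, 69_905.0, 0.0))]))
print(auto_select(1000.0, [(example, (100.0, 100.0, 0.0))]))
```

The second call shows the small-workload case: the premium exceeds the gains, so the sketch returns Standard B2 with zero ROI.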

6.6 — The bandwidth-capability cap

cluster_demand_gbps = cluster_gpus × BANDWIDTH_PER_GPU_GBPS  (default: 1.0 Gbps/GPU)
tier_capability     = min(1, tier_gbps / cluster_demand_gbps)

An Overdrive tier that can't deliver enough bandwidth for the cluster can't fully deliver its benefits. The capability cap prorates Goodput Recovered and Flash Displaced by the fraction of cluster demand the tier can actually serve.

Examples:
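
The cap can be tabulated for any cluster size with a few lines of Python. This sketch uses the Overdrive tier sizes named in §3 and the 1.0 Gbps/GPU default; the function and list names are illustrative.

```python
BANDWIDTH_PER_GPU_GBPS = 1.0            # default from this guide
OVERDRIVE_TIERS_GBPS = [100, 200, 300, 400, 800]


def tier_capability(tier_gbps: float, cluster_gpus: int,
                    gbps_per_gpu: float = BANDWIDTH_PER_GPU_GBPS) -> float:
    """Fraction of cluster read demand the tier can serve, capped at 1."""
    cluster_demand_gbps = cluster_gpus * gbps_per_gpu
    return min(1.0, tier_gbps / cluster_demand_gbps)


for gpus in (256, 512):
    print(f"{gpus} GPUs ({gpus * BANDWIDTH_PER_GPU_GBPS:.0f} Gbps demand):")
    for tier in OVERDRIVE_TIERS_GBPS:
        print(f"  {tier:>3} Gbps tier -> {tier_capability(tier, gpus):.0%}")
```

For a 256-GPU cluster, 100 Gbps serves about 39% of demand, 200 Gbps about 78%, and 300 Gbps and above serve 100%. For 512 GPUs, only the 800 Gbps tier reaches 100%.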

Why 1.0 Gbps/GPU? Published H100 training measurements report roughly 2–4 GB/s aggregate read bandwidth for 8 H100s on shuffled training data. That is 16–32 Gbps across the 8 GPUs, or 2–4 Gbps per GPU at the upper bound. We picked 1.0 as a conservative lower bound because real training I/O overlaps with compute (the GPU isn't blocking on every read; the data loader prefetches). For more bandwidth-intensive workloads (e.g., very small shards, low-cache-locality reads) this can be tuned upward in the source.
Why this number differs from the GenAI tool's 0.3 Gbps/GPU default. The two calculators answer different questions and therefore use different capability-cap derivations. Both arrive at a defensible cap; they start from different observable inputs because their audiences plan against different constraints. If you ever need apples-to-apples comparison, override the constant in either tool's source to match the other.

7. Assumptions to validate against your workload

The numbers in this tool are conservative defaults pending pilot validation. Before relying on any of these numbers for a real budget decision, validate the inputs below against your own telemetry. Every Configure field can be tuned to match your environment.
Assumption | Default | Notes | How to validate
Cluster utilization % | 80% | Production-reservation typical | Fraction of the month the cluster is actively training. 720 hr/mo × 80% = 576 effective training hours. Production reservations often run higher (90%+) on reserved capacity; research clusters can run lower. The math multiplies goodput recovered + stall recovery by this, so overstating utilization overstates monthly impact linearly.
Standard I/O wait % | 4.0% | Working default | Pull cluster utilization telemetry from a representative training run. GPU SM efficiency in nvidia-smi or DCGM gives the inverse: the fraction of time GPUs were doing useful compute.
Overdrive I/O wait % | 0.8% | Working default | Same approach in an environment with sustained 200+ Gbps storage throughput. If you don't yet have one, treat this as the modeled outcome and re-measure after a pilot.
Per-GPU bandwidth demand | 1.0 Gbps/GPU | Conservative lower bound | Measure actual training-read GB/s, divide by GPU count. Heavier workloads (small shards, low cache locality) can be 2–4 Gbps/GPU.
Flash $/TB-mo (sell price basis) | $100 | Mid-market reference | Use your own flash sell price or internal chargeback rate. Common range is $80–$150.
Overdrive flash-replace % | 80% | Modeled outcome | This is a workload claim, not a pricing input. Validate by piloting Overdrive against a representative hot working set.
Stage flash-freed fractions | 90 / 75 / 40 / 30 / 55 / 70% | Educated estimates from typical AI pipeline patterns | Compare against your retention policy. Numbers vary by training cadence and how much old data stays warm.
QLC flash hardware cost | $490/TB | Public reference (2026 Q1, 30.72 TB drives in volume) | Use your own quote. Sustained-volume agreements often come in lower.

Where the I/O wait baselines come from

The 4.0% (Standard) and 0.8% (Overdrive) defaults sit in the middle of public ranges reported by storage and ML infrastructure vendors. They are not measurements of your specific workload — they describe what well-tuned vs un-tuned training-storage stacks typically look like in the industry.

Source | What it reports
AWS S3 Mountpoint & S3 Express One Zone performance documentation | Cold-stream I/O wait on training workloads in the high-single-digit range without local NVMe staging. Throughput improves dramatically with prefetch + parallel range reads (the pattern this tool models).
MLPerf Storage benchmark (MLCommons) | Uses I/O wait as the proxy for "good vs bad storage." The 2–8% envelope separates well-tuned from un-tuned object-storage stacks across published submissions.
VAST Data & WekaIO public whitepapers + production telemetry | Cite measured I/O wait reductions on un-tuned object-storage backed training in the 3–10% band depending on shard pattern.
Hyperscaler + academic telemetry (AWS reports, Azure reports, Meta SPEC paper, Google Pathways) | I/O wait can swing 1–15% depending on dataset size, shard pattern, dataloader concurrency, and checkpoint cadence. Azure specifically reports checkpoint overhead averaging 12% of total training time, up to 43% in some large-model scenarios.
What we don't have: Backblaze does not yet publish a customer-pilot data point of measured baseline I/O wait. The defaults are positioned in the middle of the public envelope as conservative starting points, but the only way to validate the number for your specific workload is to instrument your training run. The Configure panel is where you override the defaults with measured data when available.

8. FAQ

Where do the 4% / 0.8% I/O wait numbers come from?

They are working defaults pending validation against telemetry from a representative training run. The "Where the I/O wait baselines come from" subsection in §7 above cites the public source ranges. To produce numbers you can budget against, take a measurement of your own cluster (GPU SM efficiency from nvidia-smi or DCGM gives the inverse of I/O wait %) and update the Configure fields accordingly.

Why does Profile 2 default to 300 Gbps, not 200 Gbps?

The optimizer computes Net ROI for every tier and picks the highest. For a 256-GPU cluster with 1 Gbps/GPU demand, 300 Gbps is the first tier where capability reaches 100% — bigger tiers add tier premium without proportional benefit. Adjust your cluster size or per-GPU bandwidth demand and the optimizer will choose differently.

Why isn't there an object-storage vs object-storage comparison?

The headline math here measures GPU-productivity impact, not storage line-item savings. A 5% improvement on a multi-hundred-thousand-dollar monthly GPU bill is worth more than even a large percentage discount on a storage line item. The tool is intentionally focused on the GPU-economic story.

What if my workload's training I/O is heavier than 1 Gbps/GPU?

The BANDWIDTH_PER_GPU_GBPS constant in the source is the conservative lower bound. For heavier workloads (small shards, low cache locality), tune it upward and the math will recompute. The Tier Comparison "Capability" column shows whether a given tier can keep up with the resulting demand.

The demo only uploads ~200 MB. Why are you projecting petabyte numbers?

The live stage runs demonstrate the patterns end-to-end — multipart uploads, parallel range reads, mid-run checkpointing, model registry, inference cold-start — using small data so the demo finishes in under a minute. The dollar projections run against the configured workload (Profile 1/2/3 in the workload preset). Each completed stage card explicitly says "↓ Projects to X PB freed" to flag the scale shift.

Why does the page open with Configure expanded?

So you can see — and change — every input that drives the headline numbers. Nothing is hidden behind a magic button. If you want a cleaner view, click the Configure summary to collapse it; the hero tiles stay populated.

Can you explain the Capability column in the Tier Comparison table?

It's the fraction of the cluster's read demand the tier can actually serve. Yellow (under 100%) = the tier is undersized for this cluster; goodput and flash benefits are prorated. Green (100%) = the tier saturates the workload; full benefit. The optimizer prefers the smallest 100%-capable tier because anything larger adds tier premium without adding benefit.

Why doesn't the Goodput tile change when I switch between Overdrive tiers?

It does, but only when the bandwidth capability changes. On a 256-GPU cluster (256 Gbps of demand at 1 Gbps/GPU), the 300, 400, and 800 Gbps tiers all deliver the same goodput recovery because each saturates the workload at 100% capability. Smaller tiers (100 Gbps, 200 Gbps) show lower goodput because they're undersized. This is the model behaving honestly: once a tier is big enough to feed the cluster, more bandwidth doesn't recover more goodput.

What does "Standard rates — recommend Overdrive to recover goodput" mean? Is Standard bad?

No. For small workloads Standard is the right answer (Profile 1 is built around exactly that case). The note is just saying Standard is the baseline by definition, so there's no delta to recover when comparing it to itself. At Overdrive tiers the math measures the delta from the Standard baseline.

What does the Clean up button do? Is it safe?

Cleanup deletes only objects under the configured DEMO_PREFIX (default goodput-demo/). The confirmation prompt names the exact bucket and prefix being wiped. Other prefixes in the same bucket are untouched. Even so — use a dedicated demo bucket if you're running this against your own credentials.


Live demo: https://genai.backblazedemos.xyz/goodput/
Author: Kevin Lott · klott@backblaze.com