Backblaze B2 — Goodput Calculator (GenAI) · Guide

How this demo works

From plain English to the math. Built for GenAI teams that pay for flash they don't own. Designed so anyone — a finance lead, a CTO, or a platform engineer — can follow the story and verify the numbers.

The 30-second version

Want the math? Jump to §6. Want to validate against your own workload? §7. Quick walkthrough of the demo? §4.

What's in here

  1. Start here — you don't own the flash
  2. The opportunity — who has this problem
  3. What this tool shows
  4. Demo walkthrough
  5. Every field, explained — and what we left out
  6. The math — formulas and derivation
    1. Goodput recovered
    2. Hot-storage $ avoided
    3. Migration egress covered (one-time)
    4. Total tenant $ saved
    5. Rehydration $ reclaimed (run-start)
    6. Per-tier ROI math + recommendation
    7. Why 5 PB is the gate
  7. Assumptions to validate against your workload
  8. FAQ — what your engineers and CFO will ask
  9. How B2 fits into the inference layer

1. Start here — you don't own the flash

Source-backed evidence anchors (workload reality check). The public record supports the problem narrative this calculator addresses, but does not validate any single B2-specific number as an "industry default." Defensible anchors from public disclosures:

What the public record does NOT directly validate: a fixed B2/Overdrive-specific I/O wait uplift, a universal "rehydration % of premium working set" default, or a uniform checkpoint cadence. These are pilot-tunable inputs in the tool — substantiate against your own telemetry before relying on them for budget decisions.

The rented-vault story

Imagine renting a 50 TB safe-deposit vault for everything you own — including your winter coat, tax returns from 2015, and twelve boxes labeled "miscellaneous, someday." The bank charges you premium-vault rates every month whether you visit or not. You'd never tier your possessions this way if you owned the vault yourself — you'd put the active stuff in the vault and the rest in a $6/month storage unit nearby that you can reach in an hour.

That's most GenAI teams' storage today. Your GPU cloud (or hyperscaler) charges you for persistent flash by the TB-month at premium rates. You can't amortize it like an owner would. Your bill doesn't distinguish hot data you're actively training on from cold data you read once a quarter. So most teams leave everything on the hot tier "just in case."

At PB scale, storage isn't a footnote — it's six-figure-monthly money

Storage costs scale linearly with dataset size, and at petabyte scale the line item gets very loud:

Where 5 PB lives | Rate | Monthly bill
AWS S3 Standard | $23 / TB·mo | $115,000 / mo
Neocloud (blended) | $75 / TB·mo | $375,000 / mo
Neocloud premium NVMe only | $100 / TB·mo | $500,000 / mo
Backblaze B2 | $6.95 / TB·mo | $34,750 / mo

At 25 PB on a neocloud the storage line is about $1.9 million per month at the blended rate and $2.5 million on premium NVMe, bigger than the GPU bill for a 256-GPU cluster. Storage is not a rounding error here. It's the headline number.
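The table's arithmetic can be reproduced with a few lines of Python. This is a minimal sketch, not the dashboard's code: the rates are the list rates quoted in the table above, 1 PB is taken as 1,000 TB, and the function name is illustrative.

```python
# List rates quoted in the table above, in $ per TB-month.
RATES_PER_TB_MONTH = {
    "AWS S3 Standard": 23.00,
    "Neocloud (blended)": 75.00,
    "Neocloud premium NVMe only": 100.00,
    "Backblaze B2": 6.95,
}

def monthly_storage_bill(dataset_pb, rate_per_tb_month):
    """Dataset size in PB (decimal, 1 PB = 1,000 TB) times the $/TB-month rate."""
    return dataset_pb * 1_000 * rate_per_tb_month

for name, rate in RATES_PER_TB_MONTH.items():
    bill = monthly_storage_bill(5, rate)
    print(f"{name:28s} 5 PB: ${bill:>10,.0f} / mo")
```

Changing the first argument to 25 shows how the same rates scale linearly with dataset size.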

The GPU bill is the other half of the story

An H100 GPU rents for $3 to $4 per hour. A 64-GPU cluster at $4/hr is $184,000/mo. A 256-GPU cluster is over $737,000/mo. Whenever those GPUs are stalling on I/O instead of computing, the meter keeps running.

A 64-GPU H100 cluster at $4/hr is $256/hr — $184,000/mo.

If 5% of that time is GPU-idle while waiting on I/O, that's ~$9,200/mo of fully-billed compute doing nothing.
Drop I/O wait to 1.5% and you reclaim ~$6,450/mo on the same hardware.
At 256 GPUs the same delta becomes ~$25,800/mo.
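The idle-GPU figures above follow from one multiplication. A minimal sketch, assuming a 720-hour month (the convention the figures above imply); function names are illustrative and not taken from the dashboard source.

```python
HOURS_PER_MONTH = 720  # 30-day month, matching the $184,000/mo figure above

def cluster_cost_per_month(gpus, dollars_per_gpu_hour):
    """Fully billed cluster cost for one month."""
    return gpus * dollars_per_gpu_hour * HOURS_PER_MONTH

def idle_cost_per_month(gpus, dollars_per_gpu_hour, io_wait_pct):
    """Dollars billed while GPUs wait on I/O, given I/O wait as a percent."""
    return cluster_cost_per_month(gpus, dollars_per_gpu_hour) * io_wait_pct / 100

def reclaimed_per_month(gpus, dollars_per_gpu_hour, wait_before_pct, wait_after_pct):
    """Idle dollars recovered when I/O wait drops from 'before' to 'after'."""
    before = idle_cost_per_month(gpus, dollars_per_gpu_hour, wait_before_pct)
    after = idle_cost_per_month(gpus, dollars_per_gpu_hour, wait_after_pct)
    return before - after

for gpus in (64, 256):
    print(gpus, "GPUs:",
          f"bill ${cluster_cost_per_month(gpus, 4):,.0f}/mo,",
          f"reclaimed ${reclaimed_per_month(gpus, 4, 5.0, 1.5):,.0f}/mo")
```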

Storage that keeps GPUs fed converts idle GPU dollars into billable training dollars. The faster and steadier the storage pattern, the more of the GPU bill is doing useful work.

Egress is the lock-in tax — and it's the biggest one

If your training data lives on a hyperscaler (AWS S3, Azure Blob, Google Cloud Storage), every byte you ship out costs $0.08–$0.12/GB. At a representative $0.09/GB, moving 5 PB once costs $450,000. At 50 PB it's $4.5 million. At 100 PB, $9 million.

Your dataset isn't on AWS because their GPUs are best.
It's on AWS because moving it costs more than buying a year of GPU time. That's lock-in by economics, and it's why customers stay long after the math stopped working in their favor.

Neoclouds have figured this out. CoreWeave's Zero Egress Migration program covers your hyperscaler exit fee because they know the lock-in is the only thing keeping you from leaving. Backblaze B2 takes it one step further: $0 ongoing egress to verified compute partners (CoreWeave, Equinix Metal, Vultr, phoenixNAP, plus the CDN partners Cloudflare, Fastly, bunny.net, CacheFly). Move once to B2, and your data is permanently unlocked. You can shop for more cost-effective GPUs every quarter without paying ransom to switch.

The unlock is the value prop. More cost-effective storage matters. Reclaimed GPU time matters. But the line that changes the business is "your data is no longer the reason you can't move." Once that's true, every other lever — storage rate, GPU price, contract terms — becomes negotiable.

When does any of this start to matter?

The tool's math degrades gracefully: if your situation doesn't carry the savings, every relevant tile reads small and Section 4's tier breakdown table recommends Standard B2. There are three thresholds, any one of which makes the math interesting:

Lever | Where it tips
Hot-storage spend large enough that tiering matters | You're paying $50+/TB·mo for hot storage and have a multi-TB dataset that's mostly cold. Any neocloud premium NVMe, any S3 Standard at scale.
GPU spend large enough that 1% idle = real money | ~16+ GPUs (~$60/hr cluster). Below this, even significant I/O wait reductions are small absolute dollars, though they may still matter to your training velocity.
Dataset large enough to consider Overdrive throughput | ~5 PB. Below this, Standard B2 is the right answer (it's the picker's hard gate). See §6.7.

If none of these is true for you — small inference team, modest dataset, modest GPU bill — Standard B2 still helps, but the dashboard will show modest numbers. We won't manufacture a story.

2. The opportunity — who has this problem

Four customer shapes this tool was built for

Segment | Their reality | Where B2 helps
Model labs (training frontier or near-frontier models) | Multi-PB dataset, 64–512 rented GPUs. Both a big storage bill and a big GPU bill. Hourly checkpoints producing 100+ GB artifacts. | All four tiles carry meaningful numbers. Most likely candidate for the 5+ PB Overdrive conversation.
Fine-tune startups (taking open-source models, fine-tuning on customer data) | 1–5 PB dataset, 16–64 GPUs. Heavy storage spend because customer data accumulates; GPU spend moderate. | Hot-storage tiering dominates. Egress matters if customer data sits on a hyperscaler.
Inference platforms (serving model APIs at scale) | Many model versions, KV caches, embedding stores. Smaller datasets but lots of artifacts. GPU spend depends on QPS. | Model registry + KV cache stories. Storage tiering. Cold-start improvement from B2 + parallel range reads.
AI app companies with heavy data pipelines (RAG, agent platforms, vertical AI products) | Less GPU spend, but evolving dataset and embedding stores re-built often. | The "pack large objects, ship once" pattern, plus B2 list-rate storage at $6.95/TB·mo.

What they share

The core idea: tenants can't fix the flash bill by tiering harder — their provider doesn't sell tiers that way. But they can change which storage holds which data. Move the cold portion to B2 and the math improves immediately, without changing GPU provider or framework.

3. What this tool shows

This is a Story Mode demo — not a calculator. The hero tiles don't pre-fill from your inputs and sit there waiting to be read. They start blank. When you click "Calculate my savings," the math is derived line by line in a narration panel while the actual pipeline stages run against your B2 bucket. Each tile fills in only after the relevant stages finish — and gets a "validated by Stage X" annotation so a CFO can see exactly which real B2 operation backed each number.

Why not show the numbers immediately? Numbers shown before they're earned feel like marketing. Numbers built up in front of you, tied to real operations, feel like science. CFOs trust the second kind. The ~60-second reveal trades about a minute of patience for a number you can defend.

The four hero tiles, and what validates each

Tile | What it answers | Validated by | Who cares most
Goodput recovered / mo | "How much of my GPU bill stops being burned on I/O wait?" | Stages 1 (Data Lake) + 2 (Training Prep): observed multipart upload + parallel range-read throughput against your bucket. Validates the read pattern works; doesn't prove your real workload will hit the projected I/O wait. | CTO, infra, CFO
Hot-storage $ avoided / mo | "What's the cold portion of my dataset costing me on premium tier?" | Stages 3 (Checkpoints) + 4 (Model Registry) + 5 (Inference): durable object storage for checkpoints, model artifacts, and inference-load files, so these don't need hot-tier replicas. | CFO, platform team
Migration egress covered | "What does 'move once, train anywhere' save me?" | Math only: egress is a hyperscaler line item, not a B2 operation. The tile is honestly labeled "math only · not a B2 stage" so you know this number isn't earned the same way. | CTO, compute strategy
Total tenant $ saved · Year 1 | The single CFO number: recurring × 12 + one-time migration, in one figure. | Sum of the three above, filled last. | CFO, CRO

The narration panel

The largest area below the tiles streams the story as it builds. Three line types:

The throughput tier selector (inline at bottom of workload box)

At the bottom of Section 2 (Your workload), below the cross-cloud toggle and Advanced assumptions section, a compact row of pills lets you model the workload at any B2 throughput tier: Standard, 100, 200, 300, 400, or 800 Gbps. The page auto-selects the best-Year-1 tier on load and re-snaps when inputs change. Below 5 PB, the picker is gated to Standard regardless of the math. At 5+ PB, the picker compares Year-1 customer value across all tiers and recommends whichever wins. The recommended tier is outlined in green with a ★; the active tier is highlighted in cyan. Clicking a tier sticks: input edits no longer override your manual choice (until you switch presets).

The per-tier breakdown table (Section 4, appears after the run)

After the analysis completes, Section 4 shows a side-by-side comparison of all six tiers — Storage rate, Storage $ avoided, Goodput recovered, Stall recovered, Rehydration $, and Year-1 customer value for each. The recommended row is highlighted in green; the currently-active selection has a cyan left-border. Click any row to switch the hero tiles and Year-1 Total to that tier without re-running the demo.

The dashboard tells the truth at every tier. If no Overdrive option carries positive Year-1 over Standard, Standard is recommended and the dashboard says so. The 5 PB gate keeps the recommendation conservative below that threshold; above it, the breakdown table is the actual answer at your specific workload.
Why Overdrive auto-selects at 5+ PB: a 256-GPU H100 cluster reads at roughly 256 Gbps to stay fed. Standard B2's 50 Gbps physically can't deliver that throughput, so the cluster sits storage-bound — paying for GPU hours while waiting on I/O. The math now reflects this: at higher cluster sizes, Standard's GPU-goodput savings shrink (capability-capped at ~20% for 256 GPUs), and Overdrive naturally wins.
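The two picker rules described above (a hard gate below 5 PB, then best Year-1 value) can be sketched as follows. This is an illustration of the stated rules, not the dashboard's JavaScript: the function name, the dictionary shape, and the example dollar values are all hypothetical.

```python
GATE_PB = 5.0  # working-set threshold quoted in the guide

def recommend_tier(working_set_pb, year1_value_by_tier):
    """Return the tier the picker would recommend under the two stated rules.

    year1_value_by_tier maps tier name -> Year-1 customer value in dollars
    and must include a "Standard" entry.
    """
    if working_set_pb < GATE_PB:
        return "Standard"  # hard gate: per-tier math is not consulted
    best = "Standard"
    for tier, value in year1_value_by_tier.items():
        # A tier must strictly beat the current best to displace it,
        # so Standard wins ties.
        if value > year1_value_by_tier[best]:
            best = tier
    return best

# Hypothetical Year-1 values, for illustration only (not dashboard output).
example = {
    "Standard": 900_000,
    "100 Gbps": 1_050_000,
    "200 Gbps": 1_200_000,
    "800 Gbps": 700_000,
}
print(recommend_tier(3, example))   # below the gate
print(recommend_tier(25, example))  # above the gate
```

The strict comparison reflects the statement that Standard is recommended when no Overdrive option carries positive Year-1 value over it.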

Why this layout, not a simple storage cost comparison?

A traditional storage calculator only shows the storage line item. But at PB scale your storage bill, your GPU bill, and your egress lock-in tax are all six-figure-or-more numbers — and all three move when you put your dataset on B2. A per-TB rate sheet hides the GPU goodput recovery, hides the egress freedom, and lets you keep believing the storage line is small. The Story Mode reveal surfaces all three side by side, with each tile either validated by a real B2 operation or honestly labeled "math only" so a CFO can trace every number to either a measurement or a defensible formula.

How to read the data-location toggle

The three-button toggle (Hyperscaler / Neocloud / On-prem) selects the customer's environment. Picking a mode then reveals a row of storage-tier sub-chips with verified per-tier $/TB·mo rates that auto-fill the Hot rate. Pick Custom to enter a contracted rate.

Egress field default: $0/GB across all modes. Egress only fires when you flip on the "Cross-cloud or cross-region training?" toggle — same-region S3 → EC2 is free, neoclouds are $0 in-cluster, on-prem is local. See §Cross-cloud / cross-region toggle below for the rate sub-choice.

If you'll stay fully inside your hyperscaler same-region (data on S3, training on EC2/EKS/SageMaker, with no cross-region or cross-cloud movement), the Hyperscaler-mode math intentionally shows $0 egress. The cleanest math for an all-AWS same-region team is the storage delta plus goodput, not a migration unlock.

If you're paying AWS egress today to read data across regions or into a different cloud (CoreWeave, Lambda, etc.), turn on the Cross-cloud or cross-region training? toggle to model that ongoing cost. That's the scenario where Hyperscaler mode's egress value prop genuinely lands.

The cross-cloud / cross-region toggle

Below the workload inputs there's an opt-in checkbox: "Cross-cloud or cross-region training? Data and GPUs split across regions/clouds — egress applies. Toggle on only if that's you." It's off by default because most teams train where their data lives (same region, same cloud = $0 egress).

When toggled on, two sub-chip options appear:

The choice auto-fills the Egress field. Override it with your contracted rate (multi-PB customers commonly negotiate 30–50% below list). A "Full dataset reads / mo" input also appears so the math can model recurring egress.

Quick check: where is your training compute relative to your data? (1) Same region and cloud → leave the toggle off. (2) Same cloud, different region → toggle on, pick Cross-region. (3) Different cloud → toggle on, pick Cross-cloud. Then set the "Full dataset reads / mo" input to match how often training reads the whole dataset.
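The toggle logic above can be sketched in Python. The per-mode rates are the sub-choice defaults listed in the Egress rate row of §5. The recurring-cost formula (working set × reads per month × rate) is an assumption about how the "Full dataset reads / mo" input is applied; the guide does not spell it out. All names are illustrative.

```python
# Sub-choice default rates in $/GB, as listed in the §5 Egress rate field.
EGRESS_RATE_PER_GB = {
    ("hyperscaler", "cross-region"): 0.02,
    ("hyperscaler", "cross-cloud"): 0.09,
    ("neocloud", "cross-region"): 0.00,
    ("neocloud", "cross-cloud"): 0.09,
    ("on-prem", "cross-region"): 0.02,
    ("on-prem", "cross-cloud"): 0.05,
}

def egress_rate(mode, toggle_on, sub_choice=None, contracted_rate=None):
    """Egress $/GB: 0 when the toggle is off, else contracted or list rate."""
    if not toggle_on:
        return 0.0
    if contracted_rate is not None:
        return contracted_rate  # user override
    return EGRESS_RATE_PER_GB[(mode, sub_choice)]

def recurring_egress_per_month(working_set_pb, reads_per_month, rate_per_gb):
    """ASSUMED formula: full-dataset reads per month times the per-GB rate."""
    return working_set_pb * 1_000_000 * reads_per_month * rate_per_gb

rate = egress_rate("hyperscaler", True, "cross-cloud")
print(f"${recurring_egress_per_month(5, 1, rate):,.0f} / mo")
```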

4. Demo walkthrough

The whole experience is one big button. Configure your workload first (or use a preset), then click "Calculate my savings" and let the ~60-second reveal play. The path below is what to expect.

Before the run

  1. Pick a data-location mode. The default is Hyperscaler. Switch if your training data lives somewhere else.
  2. Pick a workload preset or edit the eight inputs directly. The defaults are conservative and defensible (see §7).
  3. Hero tiles read "—". That's intentional — they fill in as the analysis runs.

During the run (~60 seconds)

Click "▶ Calculate my savings." The narration panel below starts streaming and the pipeline stages on the right animate in sequence. Roughly this beat structure:

~Time | What happens | What fills in
0–3s | Narration: establishing shot, with three cost buckets (storage, compute, migration egress) translated into plain English. | (none)
3–12s | Stage 1 (Data Lake) runs. Narration shows observed multipart upload throughput against the bucket. | (none)
12–20s | Stage 2 (Training Prep) runs. Parallel range-read observed throughput. | Goodput recovered tile fills in with a count-up animation. Annotation: "validated by Stages 1+2."
20–45s | Stages 3 (Checkpoints) + 4 (Model Registry) + 5 (Inference): real PyTorch training, mid-run checkpoint, versioned bundle, cold-start inference. | Hot-storage $ avoided tile fills in. Annotation: "validated by Stages 3+4+5."
45–50s | Egress math narrated. In Hyperscaler mode: one-time provider exit egress that Backblaze covers for qualifying migrations. In Neocloud mode: the line acknowledges "$0 — egress already free." | Migration egress covered (one-time) tile fills in (or shows "$0" muted on neocloud).
50–55s | Year-1 Total computed and announced. | Total tenant $ saved · Year 1 tile fills in. Animation done.
55s+ | Section 4 tier breakdown table appears with all six tiers side-by-side and the recommended row highlighted in green. | "Show the math for the active tier" expander reveals the formula breakdown inline.

After the run

Tiles stay populated. The Calculate my savings button changes to Recalculate. Edit any input — the run resets, blanks the tiles, and a fresh reveal plays. Useful for showing "what if your dataset doubles?" or "what if you moved to a neocloud?" in real time during a customer briefing.

Things to try in the dashboard

  1. Start here: Hyperscaler mode + Mid-scale tenant preset. This is the standard arc and the default opening view.
  2. What if you were on a neocloud? Switch to Neocloud mode and Recalculate. The default sub-chip switches to DFS / persistent $70 and the Hot rate auto-jumps from $23 to $70/TB·mo. Pick Lambda Filesystem $200 to model a premium-only customer, or Custom to enter a contracted blended rate.
  3. Does this apply at smaller scale? Click the Inference + fine-tune preset and Recalculate. The total drops to small numbers; the tier breakdown recommends Standard B2. The tool stays conservative below 5 PB.
  4. See where Overdrive carries: Click Large tenant cluster or Enterprise tenant preset and Recalculate. The tier table shows a higher Overdrive tier recommended (highlighted in green) with a per-tier breakdown of cold storage + GPU + stall + hot freed − premium across all six options.
  5. Test the conservative gate: Set dataset to 3 PB and cluster to 8 GPUs manually and Recalculate. The 5 PB gate triggers Standard regardless of any per-tier math. The breakdown table still shows the per-tier numbers, but the recommendation stays Standard below the threshold.

Verifying nothing is simulated

Every stage event in the narration corresponds to a real B2 operation against your configured bucket. The "✓ Real B2 multipart upload: X MB in Y s → Z Gbps" lines show observed wall-clock numbers, not canned text. You can open the B2 web console while the demo runs and watch the objects land in real time. The hero tile projections apply your inputs through the formulas in §6; the throughputs and byte counts in the narration are measurements from the demo run.

5. Every field, explained — and what we left out

Fields included

Data-location mode

Where your training data lives today. Determines which tile leads and what egress rate gets used.

Workload inputs (8)

Field | What it is | Defaults
Cluster size (GPUs) | Number of GPUs in your training/inference cluster. Multiplied by GPU $/hr to get cluster cost. | 8 / 32 / 128 / 512 / 1024 by preset
GPU $/hr | What you actually pay per GPU per hour. H100 typical: $3–4 on-demand, lower on reserved. Use your real rate. | $3.50–$4.00
Premium working set (PB) | TOTAL premium-flash footprint: raw corpus + processed shards + intermediate features + checkpoints + active model versions + KV cache + hot logs. NOT just the training token corpus, which is usually much smaller (FineWeb is 15T tokens at ~44 TB; LLaMA 1 disclosed ~4.7 TB of curated text). | 0.1 / 1 / 5 / 25 / 100 by preset
Checkpoint size (GB) | One complete checkpoint: weights + optimizer state + meta. MLPerf anchor buttons next to the input auto-fill: 8B → 105 GB, 70B → 912 GB, 405B → 5.29 TB, 1T → 15 TB (MLCommons MLPerf Storage benchmark; ~13 bytes/param). | 14 (7B) / 80 / 800 (70B) / 1600 (405B)
Checkpoint cadence (min) | How often each job flushes a checkpoint. Frontier pretraining: 30–60 min. Safety-critical runs: 15 min. Inference: rarely. | 60 (hourly)
Hot-tier rate ($/TB·mo) | Auto-filled by the storage-tier sub-chip you pick in Box 1. Verified list rates (2026-05-28): Hyperscaler S3 $23 · EBS gp3 $80 · FSx Lustre $140. Neocloud DFS $70 · Lambda FS $200 · AI Object $30. On-prem flash $50 · hybrid $30. Pick Custom to type a contracted/blended rate (multi-PB customers typically negotiate 30–50% below list). | $23 (hyperscaler S3) / $70 (neocloud DFS) / $50 (on-prem flash)
Egress rate ($/GB) | Default $0 across all modes (same-region S3→EC2 is free; neoclouds are $0 in-cluster; on-prem is local). Only fires when the cross-cloud / cross-region toggle is on. The toggle's sub-choice fills the rate: hyperscaler cross-region $0.02 · cross-cloud $0.09; neocloud cross-region $0 · cross-cloud $0.09; on-prem cross-region $0.02 · cross-cloud $0.05. Override with your contracted rate. | $0 (default) / sub-choice-set when toggle on
Cross-cloud or cross-region training? (toggle) | Opt-in. Off by default. Flip on only when your training compute and data are split across regions or clouds (e.g. data on AWS, GPUs on CoreWeave). A two-chip sub-choice (Cross-region vs Cross-cloud) then appears so you pick the right egress rate. | off
Full dataset reads / mo (shown when toggle is on) | How often training touches the entire dataset. Default 1 = single monthly pass. Weekly iteration ≈ 4. Daily ≈ 30. Subset training counts as a fraction. The input most worth measuring against your actual cross-cloud egress invoice. | 1

Advanced assumptions (collapsed by default — click to reveal)

Every previously-hidden modeling knob is now an editable input. Default values are conservative; expose them when you want to verify how the math holds up.

Field | What it is | Default
Concurrent training jobs | Independent jobs sharing the storage. Multiplies stall recovery. Frontier labs 4–8; small shops 1. | 4
Cluster utilization % | Fraction of the month the cluster is actively training. 80% is industry typical for production reservations. Affects Goodput recovered proportionally. | 80%
I/O wait today % | Your current I/O wait. Defaults by mode (5% hyperscaler, 2% neocloud, 3% on-prem) but should be measured against your own telemetry to validate for your environment. | per mode
I/O wait with B2 % | Modeled I/O wait after migrating to B2 + demo pattern. Default 1.5%, the conservative end of pre-staged workflows pending pilot telemetry. | 1.5%
Training run duration (days) | How long a single training run takes. Combined with epochs/run + working set, sets required B2 prefetch bandwidth via required = working_set × epochs × 8 / (run_days × 86400). Replaces the old static "B2 Gbps per GPU" knob; physics-derived demand scales with workload size. | 14 days
Epochs through working set / run | How many times the working set passes through B2 per training run. Modern LLM training: 1–3 typical (Llama 3 reportedly ~2 passes on 15T tokens); fine-tunes 5–10+. Drives total prefetch volume per run. | 2
Training runs / month | Distinct training runs per month. Each kicks off with a cold-to-hot rehydration phase. Frontier labs 1–4/mo; fine-tune teams 20+/mo. Drives rehydration math. | 4
Cold % rehydrated / run | Fraction of dataset that needs to be rehydrated from B2 to local NVMe at the start of each run (the cold portion that wasn't warm from the previous run). | 5%
Flash retained (serving + pre-stage) % | Persistent flash for inference serving + KV cache + active logs, plus pre-staged training shards rotating through flash. Only applies to flash sub-chips (EBS gp3, FSx Lustre, neocloud DFS / Lambda Filesystem, on-prem premium flash, hybrid tiered). Hidden and pinned to 0% for object sub-chips (S3 Standard, AI Object warm); those customers have no flash to retain. Raise above 20% if heavy serving or many concurrent runs. | 20% (flash) / 0% (object)
Overdrive uplift (pt) [pilot] | Optional additional I/O wait reduction at Overdrive tiers. Defaults to 0. Bandwidth-side benefits already flow through cap, stall, and rehydration. Set above 0 ONLY when pilot telemetry from a specific customer shows tail-latency / RPS / burst improvements not captured by bandwidth alone. Click the ⓘ icon on the dashboard input for the full rationale + pilot-measurement protocol. | 0 pt
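The required-bandwidth formula in the Training run duration row can be checked with a short sketch. One assumption is made explicit here: the working set is expressed in decimal gigabytes, so that multiplying by 8 and dividing by seconds yields Gbps. The guide's formula does not state its units, and the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400

def required_prefetch_gbps(working_set_pb, epochs_per_run, run_days):
    """required = working_set x epochs x 8 / (run_days x 86400).

    ASSUMPTION: working set converted to decimal GB (1 PB = 1,000,000 GB),
    so GB x 8 gives gigabits and the result is in Gbps.
    """
    working_set_gb = working_set_pb * 1_000_000
    total_gigabits = working_set_gb * epochs_per_run * 8
    return total_gigabits / (run_days * SECONDS_PER_DAY)

# Defaults from the table above: 2 epochs, 14-day run, 5 PB working set.
demand = required_prefetch_gbps(5, 2, 14)
print(f"required prefetch bandwidth: {demand:.1f} Gbps")
```

Under these defaults a 5 PB working set needs roughly 66 Gbps sustained, which is above the 50 Gbps figure the guide quotes for Standard B2 and consistent with 5 PB being where the Overdrive comparison opens up.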

Fields we deliberately left out

Not modeled | Why
Engineering time to migrate | Real and significant. Depends on your stack, framework, and codebase. We don't claim to know yours. Subtract it from the one-time migration savings in your own analysis.
Framework changes / dataloader refactor | Some patterns (random-access tensor reads) need NVMe. The B2 demo pattern assumes you can pack and parallel-range-read, which is true for most training workloads but not for every research code path.
Per-job concurrency variance | Demo assumes one steady scenario; real fleets have peak/trough. Use your average, not your peak.
Latency claims for compute partners | B2's verified partner egress is free, but latency depends on routing. We don't quote sub-10 ms unless your specific path has Private Network Interconnect.
S3 IA / Glacier / Wasabi comparisons | Different tradeoffs around retrieval time, minimum storage duration, and per-request fees. B2 is the closest one-for-one Standard-tier swap; the other archival tiers fit different shapes.
Compliance, sovereignty, contract overhead | Real factors that may favor or disfavor B2 in specific regulated environments. Out of scope for a cost calculator.

6. The math — formulas and derivation

All numbers in the dashboard derive from your inputs through the formulas below. Every formula here matches the JavaScript in the dashboard exactly — you can verify by reading the source.

The premise behind the math. Public sources establish that storage performance materially affects GPU economics. The math below converts these established storage-side levers into per-tier dollar impact for your training pipeline. Defaults are positioned mid-range conservative; the Configure panel lets you tune every input to your fleet's measured telemetry.

6.1 — Goodput recovered / mo

baseline_delta  = max(0, io_wait_baseline − io_wait_with_B2) / 100
overdrive_extra = (tier > Standard ? overdrive_uplift_pt / 100 : 0)   [defaults to 0; pilot/SE-adjusted; see note]
total_delta     = baseline_delta + overdrive_extra
gpu_reclaimed   = cluster_GPUs × $/hr × total_delta × cap × training_hrs_per_mo

What it measures

GPU cluster time that stops being paid for while idle, because B2's "pack large objects, parallel-range-read to local NVMe" pattern keeps the GPUs fed instead of waiting on naive object streams.

About the Overdrive uplift (defaults to 0): the math has an optional additional I/O wait reduction term that applies only at Overdrive tiers — exposed in the Advanced inputs. It defaults to 0. The reason: Overdrive's bandwidth-side benefits are already captured by the capability cap (higher tier → higher cap → more goodput recovered), the stall recovery term (faster checkpoint flush), and the rehydration term (faster cold-to-hot pull). Adding an uplift on top would double-count.

What WOULD justify a non-zero uplift is RPS / tail-latency / burst behavior that bandwidth alone doesn't capture — for example, pilot measurements showing Overdrive delivers tighter p99 read latency, fewer rate-limit backoffs, or better multi-tenant isolation than Standard. This shows up most in inference fleets, fine-tunes on many-small-file datasets, and metadata-heavy workflows. For bulk LLM pretraining, leave it at 0.

How to set from pilot data: customer instruments goodput on their existing storage, swaps to the Overdrive tier under evaluation, runs the same workload pattern, and measures I/O wait % before vs after. The delta (if positive) is the uplift to enter. Cite the customer (under NDA) when defending it externally.

Public sources (AWS, Azure, VAST telemetry) establish that storage performance materially affects GPU economics — they do not validate a universal Overdrive-vs-Standard uplift percentage. The 0 default keeps the math honest until pilot data exists.

Baseline I/O wait values

Scenario | Baseline I/O wait % | With B2 pattern | Why this number
Hyperscaler-resident dataset (naive S3 streaming) | 5.0% | 1.5% | Mid-range of the 3–8% envelope reported across published sources (see below). Modeled outcome, not measured on your specific workload; instrument your real wait via iostat / NCCL profiling / framework profiler to validate for your environment.
Neocloud-resident dataset (already on fast PV) | 2.0% | 1.5% | Premium persistent volumes start better, so the delta is smaller. The pattern still helps but the dollar value is small; storage and egress dominate this audience's economics.
On-prem flash (well-tuned local storage) | 3.0% | 1.5% | Modeled outcome, between hyperscaler streaming and neocloud premium PV. Same measurement caveat applies.

Where the baseline ranges come from

The defaults above sit in the middle of public ranges reported by storage and ML infrastructure vendors. They are not measurements of your specific workload — they describe what well-tuned vs un-tuned training-storage stacks typically look like in the industry.

Source | What it reports
AWS S3 Mountpoint & S3 Express One Zone performance documentation | Cold-stream I/O wait on training workloads in the high-single-digit range without local NVMe staging. Throughput improves dramatically with prefetch + parallel range reads (the B2 pattern this tool models).
MLPerf Storage benchmark (MLCommons) | Uses I/O wait as the proxy for "good vs bad storage." The 2–8% envelope is what separates well-tuned from un-tuned object-storage stacks across published submissions.
VAST Data & WekaIO public whitepapers + production telemetry | Cite measured I/O wait reductions on un-tuned object-storage backed training, with starting points in the same 3–10% band depending on shard pattern.
Hyperscaler + academic telemetry (AWS reports, Azure reports, Meta SPEC paper, Google Pathways) | Show I/O wait can swing 1–15% depending on dataset size, shard pattern, dataloader concurrency, and checkpoint cadence. Azure specifically reports checkpoint overhead averaging 12% of total training time, up to 43% in some large-model scenarios.
What we don't have: Backblaze does not yet publish a customer-pilot data point of measured baseline I/O wait. The defaults are positioned in the middle of the public envelope as conservative starting points, but the only way to validate the number for your specific workload is to instrument your training run. The Advanced input — and the visible amber "default — measure yours" hint on the dashboard — is where you override the default with measured data when available.
Sensitivity range in the executive brief. Because the I/O wait baseline is unmeasured per-customer, the brief.html artifact shows Year-1 customer value as a range across the 3–8% public envelope (low end / mid / high end) in addition to the single point estimate at the configured baseline. A CFO reading the brief sees honest variance rather than a precise-looking single number — your real number sits somewhere in the range, and the brief explains how to measure it.

Worked example — Mid-scale tenant on hyperscaler

64 GPUs × $4/hr × ((5 − 1.5) / 100) × cap × 720 hr/mo    [this example takes cap = 1 and the full 720-hour month]
  = $256/hr × 0.035 × 720 hr
  = $6,451 / mo
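The §6.1 formula and the worked example can be reproduced in a few lines, along with the low / mid / high sweep across the 3–8% public envelope that the executive brief reports. This is a sketch under stated assumptions, not the dashboard source: the Overdrive uplift is treated as an input in percentage points defaulting to 0 (per the Advanced assumptions table), and cap = 1 with 720 training hours is used to match the worked example. Parameter names are illustrative.

```python
def goodput_recovered_per_month(
    gpus,
    dollars_per_gpu_hour,
    io_wait_baseline_pct,
    io_wait_with_b2_pct=1.5,
    cap=1.0,                       # capability cap; 1.0 means uncapped
    training_hours_per_month=720,  # worked example uses the full month
    is_overdrive=False,
    overdrive_uplift_pt=0.0,       # defaults to 0 per the guide
):
    """GPU dollars per month no longer spent waiting on I/O."""
    baseline_delta = max(0.0, io_wait_baseline_pct - io_wait_with_b2_pct) / 100
    overdrive_extra = overdrive_uplift_pt / 100 if is_overdrive else 0.0
    total_delta = baseline_delta + overdrive_extra
    return gpus * dollars_per_gpu_hour * total_delta * cap * training_hours_per_month

# Mid-scale tenant: 64 GPUs at $4/hr, swept across the public I/O wait envelope.
for label, baseline in (("low", 3.0), ("mid", 5.0), ("high", 8.0)):
    value = goodput_recovered_per_month(64, 4.0, baseline)
    print(f"{label:4s} baseline {baseline:.1f}% -> ${value:,.0f} / mo")
```

Passing `training_hours_per_month=720 * 0.80` would apply the 80% default utilization instead of the full month.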

6.2 — Storage $ avoided / mo

moved_fraction  = 1 − flash_retained% / 100
storage_avoided = max(0, dataset_TB × moved_fraction × ($hot_rate − $tier_rate))   [rates in $/TB·mo]

What it measures

The bulk of your workload migrates to B2 at the tier the picker recommends — Standard, or one of the five Overdrive tiers. No cold/hot split inside B2. One bucket, one rate, one bill — for the migrated portion.

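flash_retained is conditional on the sub-chip storage type: it defaults to 20% for flash sub-chips and is pinned to 0% for object sub-chips (see the Flash retained field in §5).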

Numerical effect at 5 PB Mid-scale:

The flash-displacement story is much bigger when the customer is paying for actual flash. For S3-resident customers, the story is "cheaper object storage + free egress to compute partners + recovered goodput" — not flash displacement.

Why no cold/hot split inside B2. The migrated workload uses one B2 rate end-to-end; modeling separate rates for hot vs cold slices would add complexity without changing how the migration actually happens. The calculator stays single-rate for the migrated portion.

Worked example — Mid-scale tenant on hyperscaler, Standard tier (20% retained)

moved           = 5,000 × 0.80 = 4,000 TB
delta_rate      = $23 − $6.95 = $16.05 / TB·mo
storage_avoided = 4,000 × $16.05 = $64,200 / mo

Worked example — Mid-scale tenant on hyperscaler, 200 Gbps Overdrive (20% retained)

moved           = 4,000 TB
delta_rate      = $23 − $17 = $6 / TB·mo
storage_avoided = 4,000 × $6 = $24,000 / mo

Worked example — same workload on neocloud (blended $75/TB), 200 Gbps Overdrive (20% retained)

moved           = 4,000 TB
delta_rate      = $75 − $17 = $58 / TB·mo
storage_avoided = 4,000 × $58 = $232,000 / mo

Note that Standard always produces the biggest storage savings (it has the cheapest per-TB rate). Overdrive tiers win on Year-1 customer value when their GPU goodput + stall + rehydration benefits exceed the storage-savings gap to Standard. The picker computes this balance and recommends accordingly — see §6.6.

If the B2 rate is higher than your current hot rate (e.g., 800 Gbps at $29 vs hyperscaler $23), storage_avoided clamps to $0 — paying more per TB than the status quo doesn't produce storage savings.
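As a minimal sketch, the three worked examples and the clamp case can be checked in Python. The function name is illustrative; the rates ($6.95, $17, $29, $23, $75 per TB·mo) are the ones quoted in this section.

```python
def storage_avoided(dataset_tb, flash_retained_pct, hot_rate, tier_rate):
    """Monthly storage $ avoided; hot_rate and tier_rate are in $/TB·mo."""
    moved_tb = dataset_tb * (1 - flash_retained_pct / 100)
    # Clamp at zero: a B2 tier priced above the current hot rate yields no savings.
    return max(0.0, moved_tb * (hot_rate - tier_rate))

examples = {
    "hyperscaler -> Standard": storage_avoided(5000, 20, 23, 6.95),
    "hyperscaler -> 200 Gbps Overdrive": storage_avoided(5000, 20, 23, 17),
    "neocloud -> 200 Gbps Overdrive": storage_avoided(5000, 20, 75, 17),
    "hyperscaler -> 800 Gbps Overdrive": storage_avoided(5000, 20, 23, 29),
}
for label, value in examples.items():
    print(f"{label}: ${value:,.0f} / mo")
```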

6.3 — Migration egress covered (one-time)

egress_one_time = dataset_TB × 1,000 GB/TB × $egress_rate

What it measures

The one-time cost of egressing the dataset from the current provider, calculated as working_set_PB × 1,000,000 × egress_rate_per_GB (the same quantity as the dataset_TB formula above, expressed in PB). For qualifying migrations, Backblaze covers this amount as a switching incentive, so the customer's cash cost for migration egress is modeled as $0. The tool labels this tile "Migration egress covered (one-time)": it is value the customer receives, but it is not recurring storage savings and should not be presented as such.

Concept names (Rev 2):
provider_exit_egress = working_set_PB × 1,000,000 × egress_rate_per_GB
migration_egress_covered = provider_exit_egress // Backblaze-covered for qualifying migrations
customer_cash_migration = max(0, provider_exit_egress − migration_egress_covered) = $0
year1_recurring_savings = recurring_monthly × 12
year1_customer_value = year1_recurring_savings + migration_egress_covered

Worked example — 5 PB hyperscaler migration at AWS rates

provider_exit_egress     = 5 × 1,000,000 × $0.09 = $450,000
migration_egress_covered = $450,000  (Backblaze covers for qualifying migrations)
customer_cash_migration  = $0
Why this isn't $450,000/mo: the migration happens once. Going forward, training reads from B2 to verified compute partners (CoreWeave, Equinix Metal, Vultr, phoenixNAP, etc.) are $0/GB. The recurring benefit — "I don't have egress fees in my forecast anymore" — is hard to put in a tile but easy to put in a CFO conversation.
One-time migration egress coverage: the covered amount is shown as "Migration egress covered (one-time)" and may be included in "Year-1 customer value incl. migration coverage," but it is not recurring storage savings.
Migration project costs such as engineering time, validation, dual-running, security review, or retraining remain outside the model. The Year-1 customer value tile is gross of these costs, not net.
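The concept names above map onto a few lines of Python. This is a sketch, not the dashboard's code; the `qualifies` flag is a hypothetical parameter standing in for "qualifying migrations," whose criteria this guide does not define.

```python
def migration_egress(working_set_pb, egress_rate_per_gb, qualifies=True):
    """One-time provider exit egress and how much of it the customer pays in cash."""
    provider_exit_egress = working_set_pb * 1_000_000 * egress_rate_per_gb
    # Assumption: coverage is all-or-nothing depending on whether the migration qualifies.
    covered = provider_exit_egress if qualifies else 0.0
    customer_cash = max(0.0, provider_exit_egress - covered)
    return {
        "provider_exit_egress": provider_exit_egress,
        "migration_egress_covered": covered,
        "customer_cash_migration": customer_cash,
    }

aws_5pb = migration_egress(5, 0.09)
print(aws_5pb)
```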

All modes — egress default is $0

The Egress field defaults to $0 across Hyperscaler, Neocloud, and On-prem modes. Same-region S3 → EC2 is free; neoclouds are $0 in-cluster; on-prem is local. The Migration egress covered tile mutes to $0 in this default configuration.

Egress only fires when the "Cross-cloud or cross-region training?" toggle is on. The toggle's two sub-chips auto-fill the rate: Cross-region (hyperscaler $0.02 · neocloud $0 · on-prem $0.02) or Cross-cloud (hyperscaler $0.09 · neocloud $0.09 · on-prem $0.05). Override either rate with the customer's contracted egress.

Cross-cloud / cross-region training (opt-in)

If data and GPUs are split across regions or clouds (e.g. data on AWS, training on CoreWeave, or data in us-east-1 and GPUs in us-west-2), you pay recurring egress every time training reads the dataset. Flipping the toggle reveals a region/cloud sub-choice plus a "Full dataset reads / mo" input, and the math becomes:

ongoing_egress_per_mo = dataset_TB × 1,000 GB/TB × cycles_per_mo × $egress_rate

Worked example — Mid-scale tenant on AWS S3, training on CoreWeave (cross-cloud), 4 reads/mo

5,000 TB × 1,000 GB/TB × 4 cycles × $0.09/GB
  = $1,800,000 / mo recurring

This adds to the recurring monthly savings (alongside GPU goodput and hot storage), and the Year-1 total then includes the ongoing egress × 12 plus the one-time migration coverage. At this scale, egress alone contributes ($1.8M × 12) + $450K = $22.05M in Year 1.

Worked example — same workload but cross-region (us-east-1 → us-west-2) instead of cross-cloud

5,000 TB × 1,000 GB/TB × 4 cycles × $0.02/GB
  =   $400,000 / mo recurring

Cross-region is 4.5× cheaper than cross-cloud per GB, but it's still real recurring egress — and still meaningful at multi-PB scale.
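Both recurring-egress examples, and the Year-1 egress figure quoted above, follow from one function. The sketch below uses illustrative names; the $450,000 one-time figure is the 5 PB migration coverage from the earlier example.

```python
def ongoing_egress_per_mo(dataset_tb, cycles_per_mo, egress_rate_per_gb):
    """Recurring monthly egress when training reads the full dataset across a boundary."""
    return dataset_tb * 1_000 * cycles_per_mo * egress_rate_per_gb

cross_cloud = ongoing_egress_per_mo(5000, 4, 0.09)    # AWS S3 -> CoreWeave
cross_region = ongoing_egress_per_mo(5000, 4, 0.02)   # us-east-1 -> us-west-2
year1_egress = cross_cloud * 12 + 450_000             # recurring plus one-time coverage
print(f"${cross_cloud:,.0f}/mo, ${cross_region:,.0f}/mo, ${year1_egress:,.0f} Year 1")
```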

Don't flip this on without asking. Most customers train where their data lives. Surfacing a $20M+/yr egress savings to a customer who doesn't actually do cross-cloud or cross-region training discredits the rest of the dashboard. The toggle is opt-in for a reason — confirm the scenario in discovery first.

6.4 — Year-1 customer value (incl. migration coverage)

year1_recurring_savings = recurring_monthly × 12
year1_customer_value    = year1_recurring_savings + migration_egress_covered

Year-1 customer value puts the recurring savings (storage, goodput, stall, rehydration) into a single annual number a CFO can drop into a forecast row, plus the one-time Backblaze-covered migration egress that lands the customer at B2 in the first place.

Not "net." The tile is labeled "Year-1 customer value · incl. migration coverage" — NOT "Year-1 net" (the legacy Rev 1 label). It does not subtract customer-paid migration project costs (engineering time, validation, dual-running, security review, retraining). Those costs are real but outside the model. Finance reviewers should treat this as gross customer value, not a P&L line.

Worked example — Mid-scale tenant (128/5 PB) on Hyperscaler, picker → Overdrive 200 Gbps

storage_avoided       = 4,000 TB × ($23 − $17)                = $24,000 / mo
goodput_recovered     = $512/hr × 3.5pt × 1.0 × 576 ≈ $10,322 / mo  (uplift=0)
stall                                                          ≈ $31,437 / mo
rehydration_reclaimed                                          ≈ $17,062 / mo
recurring_monthly                                              ≈ $82,821 / mo
year1_recurring_savings  = recurring × 12                       ≈ $993,852
migration_egress_covered = 5 × 1,000,000 × $0.09               = $450,000
year1_customer_value     = year1_recurring + migration_covered ≈ $1.44M ✅

Year 2 onward is the recurring number alone (~$1.0M/yr). The migration egress coverage doesn't repeat, but the freedom it bought — switching GPU providers any time at $0/GB to verified partners — does.

Worked example — same workload on Neocloud (blended $75), picker → Overdrive 200 Gbps

storage_avoided          = 4,000 TB × ($75 − $17)              = $232,000 / mo
goodput_recovered        = $512/hr × 0.5pt × 1.0 × 576 ≈ $1,475 / mo  (uplift=0)
stall                                                           ≈ $31,437 / mo
rehydration_reclaimed                                           ≈ $17,062 / mo
recurring_monthly                                               ≈ $281,974 / mo
year1_recurring_savings  = recurring × 12                       ≈ $3,383,688
migration_egress_covered = 5 × 1,000,000 × $0                  = $0  (neocloud free egress)
year1_customer_value                                            ≈ $3.38M ✅

The neocloud case has no migration egress (the provider doesn't charge to leave), so Year-1 customer value is the recurring monthly stream × 12 alone. The bigger storage gap to B2 (the customer was paying $75/TB·mo before) drives the larger result.

6.5 — Rehydration $ reclaimed (run-start cold-to-hot pull)

cold_rehydrate_GB        = working_set_TB × 1000 × (rehydrate_pct / 100)
rehydrate_std_hrs        = (cold_rehydrate_GB × 8) / 50 / 3600
rehydrate_tier_hrs       = (cold_rehydrate_GB × 8) / tier_gbps / 3600
rehydrate_delta_hrs      = max(0, rehydrate_std_hrs − rehydrate_tier_hrs)
raw_rehydrate_month_hrs  = rehydrate_delta_hrs × runs_per_month
rehydrate_month_hrs      = min(raw_rehydrate_month_hrs, training_hrs_per_month × 0.30)
rehydrate_reclaimed      = rehydrate_month_hrs × cluster_hourly       // $/mo

The cap (30% of monthly training hours) prevents unrealistic claims where Standard rehydration would exceed a feasible monthly operating window.

What it measures

Every training run starts by pulling its cold portion of the dataset from durable storage into local NVMe so GPUs can read it at line rate. During that pull, GPUs are allocated and billed but idle. Standard B2 (50 Gbps) takes a long time on big datasets; Overdrive at 200–800 Gbps cuts that wait dramatically. This formula values the idle GPU $ avoided per month.

Worked example — Large tenant cluster (256 GPUs, 25 PB, 4 runs/mo, 5% rehydrated)

cold_GB_per_run    = 25,000 × 1000 × 0.05 = 1,250,000 GB (1.25 PB)
rehydrate_std_hrs  = (1,250,000 × 8) / 50  / 3600 ≈ 55.6 hr/run
rehydrate_300_hrs  = (1,250,000 × 8) / 300 / 3600 ≈ 9.3 hr/run
delta_hrs          = 55.6 − 9.3 = 46.3 hr/run
rehydrate_$_per_mo = 46.3 × 4 runs × $1,024/hr
                  ≈ $189,600 / mo
Why this changes the picker. Without this term, the picker's tier choice depends almost entirely on hot-displacement math, which at hyperscaler hot rates becomes net-negative above 100 Gbps. The picker would land at 100 Gbps even for huge clusters because the model didn't see the run-startup idle cost. Adding rehydration makes bandwidth requirements scale with dataset, not just GPU count — matching real-world experience.
Backblaze production reference for the rehydration tier math. The Overdrive tiers' deliverable throughput is grounded in production customer references: Backblaze has customers driving 100+ Gbps sustained to B2 buckets using the same operational pattern this tool models (packed large objects, parallel range reads to local NVMe). The rehydration formula above assumes the configured tier delivers its rated throughput at sustained line rate — backed by that production reference, not by demo-bench numbers.
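The rehydration formula, including the 30% cap, can be sketched in Python as below. The function name is illustrative. The Large tenant example does not state training_hrs_per_month, so 720 is assumed here; the second call shows that with 576 training hours the cap (172.8 hr) would bind for the same workload.

```python
def rehydration_reclaimed(working_set_tb, rehydrate_pct, tier_gbps, runs_per_month,
                          cluster_hourly, training_hrs_per_month,
                          standard_gbps=50, cap_fraction=0.30):
    """Monthly idle-GPU $ avoided by pulling the cold slice faster than Standard."""
    cold_gb = working_set_tb * 1000 * (rehydrate_pct / 100)
    std_hrs = cold_gb * 8 / standard_gbps / 3600    # GB -> Gb, then seconds -> hours
    tier_hrs = cold_gb * 8 / tier_gbps / 3600
    delta_hrs = max(0.0, std_hrs - tier_hrs)
    raw_month_hrs = delta_hrs * runs_per_month
    capped_hrs = min(raw_month_hrs, training_hrs_per_month * cap_fraction)
    return capped_hrs * cluster_hourly, raw_month_hrs, capped_hrs

# Large tenant: 256 GPUs x $4/hr = $1,024/hr, 25 PB, 5% rehydrated, 4 runs/mo, 300 Gbps
dollars_720, raw_720, capped_720 = rehydration_reclaimed(25_000, 5, 300, 4, 1024, 720)
dollars_576, raw_576, capped_576 = rehydration_reclaimed(25_000, 5, 300, 4, 1024, 576)
print(f"720 hr/mo: ${dollars_720:,.0f}  (cap binding: {raw_720 > capped_720})")
print(f"576 hr/mo: ${dollars_576:,.0f}  (cap binding: {raw_576 > capped_576})")
```

Returning the raw and capped hours alongside the dollar figure makes it easy to see when a result is being limited by the cap rather than by tier throughput.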

Assumptions to validate

6.6 — Per-tier ROI math + recommendation

The dashboard's Section 4 (Box 4) shows a side-by-side breakdown of all six tiers — Standard plus five Overdrive options — and highlights the recommended tier in green. The recommendation is whichever tier carries the highest Year-1 customer value at your configured workload. The math is honest at every tier: if no Overdrive tier nets positive at your scale, Standard wins and the dashboard says so.

The four levers per Overdrive tier

required_aggregate_gbps = working_set_GB × epochs_per_run × 8 / (run_duration_days × 86400)
cap = min(1, tier.gbps / required_aggregate_gbps)

storage_avoided    = max(0, dataset_TB × moved_fraction × ($hot_rate − tier.storagePerTB))
gpu_goodput        = cluster_$/hr × wait_delta × cap × training_hrs_per_month
stall              = (Δ_ckpt_min / 60) × ckpts/hr × concurrent × training_hrs_per_month × cluster_$/hr   [Overdrive only]
rehydrate_reclaimed = Δ_rehydrate_hrs × runs/mo × cluster_$/hr

recurring_monthly = storage_avoided + gpu_goodput + stall + rehydrate_reclaimed
                  + cross_cloud_egress (if any)
year1_customer_value = recurring_monthly × 12 + migration_egress_covered
About the capability cap derivation — and why this differs from the Neocloud tool. This tenant tool derives the bandwidth demand from your training pipeline (working_set × epochs × 8 / run_duration_seconds) — i.e., the throughput required to finish your run in the time you budgeted. It uses 0.3 Gbps/GPU as the underlying default per-GPU constant in the source, reflecting pre-staged workflows where the GPU reads from local NVMe most of the time and only occasionally pulls from B2.

The Neocloud operator tool answers a different question — "can this tier feed my whole cluster at line rate?" — and uses cluster_gpus × 1.0 Gbps/GPU as a conservative worst-case capacity envelope. Both arrive at a defensible cap; they start from different observable inputs because their audiences plan against different constraints (your pipeline runtime vs. their fleet capacity).
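A minimal Python sketch of the tenant-side cap derivation follows. The function name is illustrative, and the inputs in the example (one epoch over 5 PB in a 7-day run) are hypothetical values chosen only to exercise the formula; they are not defaults from the tool.

```python
def capability_cap(working_set_gb, epochs_per_run, run_duration_days, tier_gbps):
    """Return (required aggregate Gbps, fraction of that demand the tier can serve)."""
    run_seconds = run_duration_days * 86_400
    required_gbps = working_set_gb * epochs_per_run * 8 / run_seconds
    cap = min(1.0, tier_gbps / required_gbps)
    return required_gbps, cap

# Hypothetical pipeline: 5 PB working set, 1 epoch, 7-day run
required, cap_standard = capability_cap(5_000_000, 1, 7, tier_gbps=50)
_, cap_overdrive_200 = capability_cap(5_000_000, 1, 7, tier_gbps=200)
print(f"required {required:.1f} Gbps, cap @50 = {cap_standard:.2f}, cap @200 = {cap_overdrive_200:.2f}")
```

Under these assumed inputs the 50 Gbps tier covers roughly three quarters of the demand, while the 200 Gbps tier is capped at 100%.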

Worked example — Mid-scale tenant on Neocloud at $75/TB blended, Overdrive 200 Gbps

Inputs: 128 GPUs, $4/hr, 5 PB working set, $75 hot rate, 20% flash retained,
         4 runs/mo, 5% rehydrated, 800 GB ckpt @ 60 min, OD uplift 0 (default)
moved_TB                 = 5,000 × 0.80                       = 4,000
storage_avoided          = 4,000 × ($75 − $17)                = $232,000 / mo
gpu_goodput              = baseline 0.5pt × cap 100% × hrs   ≈ $1,475 / mo  (uplift=0)
stall                    = (1.6/60) × 1 × 576 × 4 × $512     ≈ $31,437 / mo
rehydration_reclaimed    = capped at 30% × 576 (not binding)  ≈ $17,062 / mo
recurring_monthly                                              ≈ $281,974 / mo
year1_recurring_savings  = recurring × 12                      ≈ $3,383,688
migration_egress_covered = 5 × 1,000,000 × $0                  = $0  (neocloud)
year1_customer_value     = year1_recurring + migration_covered ≈ $3.38M ✅
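The roll-up in this example is a plain sum, which the sketch below reproduces from the component figures listed above. The function name and dictionary keys are illustrative.

```python
def year1_customer_value(components_monthly, migration_egress_covered=0):
    """Sum monthly components, annualize, and add the one-time migration coverage."""
    recurring_monthly = sum(components_monthly.values())
    year1_recurring = recurring_monthly * 12
    return recurring_monthly, year1_recurring, year1_recurring + migration_egress_covered

neocloud_od200 = {
    "storage_avoided": 232_000,
    "gpu_goodput": 1_475,
    "stall": 31_437,
    "rehydration_reclaimed": 17_062,
}
monthly, recurring_year1, value_year1 = year1_customer_value(neocloud_od200)
print(f"${monthly:,}/mo, ${recurring_year1:,} recurring, ${value_year1:,} Year-1 value")
```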

Worked example — Sub-5 PB workload at neocloud → picker recommends Standard (gate)

Inputs: 32 GPUs, $3.50/hr, 1 PB working set, $75 hot rate, 20% flash retained
moved_TB                 = 800
storage_avoided @ Std    = 800 × ($75 − $6.95)                = $54,440 / mo
gpu_goodput              = baseline only (no OD uplift)       ≈ $403 / mo
stall                    = $0 (Standard tier)
rehydration_reclaimed    = $0 (Standard tier — baseline)
recurring_monthly                                              ≈ $54,843 / mo
year1_recurring_savings                                        ≈ $658,116
migration_egress_covered = $0  (neocloud)
year1_customer_value                                           ≈ $658K ✅
Picker recommendation: Standard B2 (gated by 5 PB threshold).
   Below 5 PB, the picker recommends Standard regardless of the per-tier math.

The picker applies two gates (the 5 PB working-set threshold and a 50% capability floor) and then optimizes for the highest Year-1 customer value among eligible tiers. At small workloads, Standard naturally wins because cold-storage savings dominate and Overdrive's higher per-TB rate erodes them.

6.7 — Why 5 PB is the gate

The picker applies a hard gate (dataset < 5 PB → Standard) before running the per-tier math. At 5 PB and above, the picker compares Year-1 customer value across all six tiers and recommends the highest. Below 5 PB, no tier-vs-Standard math is consulted: the recommendation is Standard regardless.

This is an SE-judgment threshold, not a derivation from pricing math. Below 5 PB, the cold-storage savings dominate any per-tier differences and the marginal Year-1 gains rarely justify the operational complexity of moving to an Overdrive contract. Above 5 PB, the throughput-driven benefits (GPU goodput uplift, checkpoint stall, hot-data displacement) typically start outpacing the per-TB premium on the hot working set, and the math sorts out which specific tier wins.

Section 4's breakdown table still shows the math at every tier — including below 5 PB — so you can see the marginal Year-1 differences for yourself. But the recommended pill stays on Standard until the dataset crosses 5 PB.

The 5 PB gate is a guardrail against overstating value. Some workloads under 5 PB might net positive at Overdrive 100 (the math will show this in the breakdown table). The gate exists to keep recommendations conservative at smaller scales, where the migration + ops cost would likely consume the marginal gain. If you want to model an Overdrive tier anyway, click the tier pill manually — sticky overrides survive.
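The gate-then-optimize logic can be sketched in Python as follows. This is an interpretation rather than the dashboard's code: it assumes the 50% capability floor excludes any Overdrive tier whose cap is below 0.5, that Standard is always eligible, and that a manual tier click simply overrides the result. The tier names and Year-1 values in the example are placeholders, not tool output.

```python
GATE_PB = 5.0            # SE-judgment threshold from this section
CAPABILITY_FLOOR = 0.5   # assumed meaning: tier must serve at least 50% of demand

def recommend_tier(working_set_pb, tiers, manual_override=None):
    """tiers maps tier name -> {'cap': float, 'year1_value': float}."""
    if manual_override is not None:
        return manual_override           # sticky manual selection wins
    if working_set_pb < GATE_PB:
        return "Standard"                # gate: per-tier math is not consulted
    eligible = {
        name: t for name, t in tiers.items()
        if name == "Standard" or t["cap"] >= CAPABILITY_FLOOR
    }
    return max(eligible, key=lambda name: eligible[name]["year1_value"])

# Placeholder figures for illustration only
tiers = {
    "Standard": {"cap": 0.4, "year1_value": 900_000},
    "Overdrive 100": {"cap": 0.45, "year1_value": 1_500_000},  # fails the floor
    "Overdrive 200": {"cap": 1.0, "year1_value": 1_200_000},
}
print(recommend_tier(1.0, tiers), recommend_tier(5.0, tiers))
```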

7. Assumptions to validate against your workload

Every projection in the tool is grounded in a defensible assumption. Before you defend the numbers externally (to your CFO, to a partner, or to a procurement team), validate each assumption in the table below.

Assumption | Default | How to verify | If different
I/O wait baseline (% of GPU time) | 5% (hyperscaler) / 2% (neocloud) | iostat, nvidia-smi dmon, NCCL profiling, or your framework's profiler. Watch for "data loader wait." | Scale the GPU tile by the change in the wait delta (baseline minus the 1.5% B2 figure), not by the raw baseline ratio. If your real hyperscaler wait is 2% rather than 5%, the delta falls from 3.5 pt to 0.5 pt and the projected number drops to about one-seventh.
I/O wait with B2 pattern | 1.5% | Modeled outcome pending pilot telemetry. Run the demo against your bucket with a representative shard size. The terminal shows real bytes/s. | Same scaling — adjust the delta in the formula. Below 1.5% is rare for any object-storage-backed pattern even with pre-staging.
Flash retained for serving + pre-stage % | 20% (flash sub-chips) / 0% (object sub-chips) | Persistent flash for inference serving + KV cache + active logs, plus pre-staged training shards. Only applies to flash sub-chips (EBS, FSx, neocloud DFS / Lambda FS, on-prem flash, hybrid). For object sub-chips (S3 Standard, AI Object warm), the calculator pins to 0% because the customer has no flash to retain — 100% of the working set migrates. | Big lever for flash customers. The retained slice doesn't move to B2 and produces no savings — it stays at your current rate. For object customers the input is hidden because it doesn't apply.
Hot-tier $/TB·mo (BLENDED) | $23 hyperscaler / $75 neocloud / $50 on-prem | BLENDED weighted-average across your current tiers, NOT premium-only list. Look at your bill. Use the line-item rate, not list. | Linear effect on storage tile. If you're already on an IA tier ($12/TB·mo) the delta is smaller; the migration-coverage story still stands.
Overdrive uplift (pt) pilot | 0 pt (default) — set above 0 only when pilot data quantifies it | Additional I/O wait reduction at Overdrive tiers. NOT industry-validated — substantiate against your own telemetry. Set 0 for the conservative case. | Linear effect on Goodput tile at Overdrive tiers; zero at Standard.
Rehydration cap | 30% of monthly training hours | Caps the per-month rehydration recovery so the "Standard would take this long" comparison can't exceed a feasible operating window. | Bounds the Rehydration tile in worst-case scenarios. Binding only at very large working sets.
GPU $/hr | $3.50–$4.00 | Your invoice. Reserved 1-year H100 commitments are often half on-demand. | Linear effect on GPU tile.
Full dataset reads / mo (cross-cloud only) | 1 (single monthly pass) | Your actual cross-cloud egress invoice is the gold standard — divide it by (dataset_GB × $egress_rate) to back out the real number. Otherwise: 4 ≈ weekly retraining, 30 ≈ daily, fractional for subset training. | Linear effect on the cross-cloud egress tile. This is the most assumption-heavy input in the model and the most worth measuring.
Defaults are intentionally cautious (5% I/O wait, 32% hot, $4/hr on-demand H100). If your real numbers are better, the tile values will scale up — that's a more durable conversation than dialing the inputs aggressively at the start.
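Backing out "Full dataset reads / mo" from an invoice, as the last table row suggests, is a one-line division. The sketch below uses an illustrative function name and reuses the $1,800,000 cross-cloud figure from §6.3 as a stand-in invoice.

```python
def implied_reads_per_month(monthly_egress_invoice, dataset_tb, egress_rate_per_gb):
    """Full-dataset reads per month implied by an actual egress invoice."""
    dataset_gb = dataset_tb * 1_000
    return monthly_egress_invoice / (dataset_gb * egress_rate_per_gb)

reads = implied_reads_per_month(1_800_000, dataset_tb=5_000, egress_rate_per_gb=0.09)
print(f"{reads:.2f} full reads / mo")
```

A fractional result is expected when training reads only a subset of the dataset each cycle.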

8. FAQ — what your engineers and CFO will ask

Why does the tool make me wait 60 seconds before showing the numbers?

So you can see them being earned. The math is narrated step by step while real B2 operations validate the pattern claims (pack large objects, parallel range reads, checkpoint to object storage, cold-start from B2). Each tile fills in only after the stages that back it have finished. The cost is ~60 seconds of patience; the benefit is a number a CFO can trace to a real operation, not a vendor sheet. If you want the calculator-only experience, ask Backblaze marketing about the standalone tool — it strips the demo execution and is meant for top-of-funnel landing-page use.

Are these numbers real or simulated?

Both, honestly labeled. The B2 operations are real — actual multipart uploads, actual parallel range reads, actual PyTorch training, real bytes-moved cost meter. The narration shows observed throughput (e.g. "150 MB in 1.2s → 1.0 Gbps"). The dollar values in the hero tiles are projections — your inputs (dataset PB, cluster, rates) applied through the formulas in §6. Three of the four tiles get "validated by Stage X" annotations because real ops back the pattern claims. The Egress tile is annotated "math only · not a B2 stage" because egress is a hyperscaler line item, not something B2 demonstrates by executing.

Can I re-run with different inputs without reloading the page?

Yes. After a run completes, the button changes to "Recalculate." Change any input or switch modes and click again — the tiles blank, the stages re-queue, and the reveal plays fresh. This is how you do "what if we were on a neocloud?" or "what if the dataset doubled?" comparisons live in front of a customer.

I'm already on a neocloud. Egress is free for me. Is there anything for me here?

Yes — the storage-tiering story. Neoclouds charge for persistent NVMe at $50–$100+ per TB·mo. Even a 1 PB dataset with 70% cold-eligible data (700 TB) is ~$35K–$70K/mo of premium storage you don't need to keep hot. Switch the toggle to Neocloud on the dashboard; the egress tile dims and the storage tile carries the story. We don't manufacture an egress story you can't claim.

Why does the tool sometimes recommend Standard even at 5 PB+?

Because the 5 PB gate just unlocks Overdrive consideration — the actual per-tier math then has to validate it. If your cluster is small, your checkpoint cadence is infrequent, or your dataset is mostly cold, the per-TB Overdrive premium on displaced hot data outpaces the throughput uplift it buys back. Section 4's breakdown table shows this explicitly — each tier's storage cost, GPU $/mo, stall $/mo, hot-freed $/mo, premium, and Year-1 customer value all visible side-by-side. Whichever tier carries the highest Year-1 customer value is the recommendation. We'd rather lose an oversold deal than win one we can't defend three months later.

What about S3 IA, Glacier, or Wasabi? Aren't those also cost-effective?

The closest one-for-one swap for Standard-tier training storage is B2. The others fit different shapes.

The demo only uploads a few hundred MB. Why are you projecting petabyte numbers?

Because the bytes that flow during the live demo are real — multipart uploads, parallel range reads — but the data volume would be impractical at full scale during a 60-second demo. The point is to validate the pattern works against your bucket. The projection then applies your real PB/GPUs/rates through the formulas in §6. You can verify each formula by reading the dashboard's source — it's about 50 lines of plain JavaScript.

What's the migration cost that isn't shown here?

Engineering time. Refactoring your dataloader to use parallel range reads instead of a naive S3 stream. Possibly changing checkpoint format to consolidate per-rank files into a single tarball before upload. For most teams this is one engineer-week — meaningful, but small compared to the migration egress fee on a multi-PB dataset. We don't model it because it varies wildly by stack.

Does this work with PyTorch / JAX / TensorFlow / DeepSpeed?

Yes — the demo uses PyTorch but the pattern (pack large objects, parallel range read, work on local NVMe) is framework-agnostic. Your dataloader changes; nothing else does.
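The parallel range read part of that pattern can be sketched without any framework. In the Python below, `read_range` is a hypothetical callable standing in for whatever ranged GET your storage client provides (inclusive byte offsets, as in an HTTP Range header); the in-memory `fake_read_range` exists only so the sketch runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor

def byte_ranges(object_size, part_size):
    """Split [0, object_size) into inclusive (start, end) ranges of at most part_size bytes."""
    return [(start, min(start + part_size, object_size) - 1)
            for start in range(0, object_size, part_size)]

def parallel_range_read(read_range, object_size, part_size, workers=8):
    """Fetch all ranges concurrently and reassemble them in order."""
    ranges = byte_ranges(object_size, part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, so a plain join is safe.
        parts = pool.map(lambda r: read_range(*r), ranges)
        return b"".join(parts)

# Stand-in for a packed object in a bucket
blob = bytes(range(256)) * 1000

def fake_read_range(start, end):
    return blob[start:end + 1]

result = parallel_range_read(fake_read_range, len(blob), part_size=64_000)
print(len(result) == len(blob))
```

In a real dataloader the reassembled bytes would typically be written to local NVMe rather than held in memory.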

Can I trust the I/O wait baselines (5% / 2%)?

They're conservative defaults — many real workloads stall worse, especially on hyperscalers with naive single-stream object reads. If your real wait is 8%, your reclaimed GPU $ scales up linearly. If yours is 2% on hyperscaler (well-optimized dataloader), scale it down. The dashboard inputs let you set this; we suggest measuring before relying on the result for budget decisions.
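One rough way to measure before relying on the default is to time how long the training loop blocks on its data iterator. The Python sketch below is a host-side proxy only: it assumes a synchronous loop, does not account for prefetching or GPU-side overlap, and uses a fake loader and a sleep in place of a real training step.

```python
import time

def timed_batches(loader, stats):
    """Yield batches from loader while accumulating time spent waiting for each one."""
    iterator = iter(loader)
    while True:
        start = time.perf_counter()
        try:
            batch = next(iterator)
        except StopIteration:
            return
        stats["wait_s"] += time.perf_counter() - start
        yield batch

def fake_loader(batches, delay_s):
    """Stand-in data source that takes delay_s to produce each batch."""
    for i in range(batches):
        time.sleep(delay_s)
        yield i

stats = {"wait_s": 0.0}
run_start = time.perf_counter()
for batch in timed_batches(fake_loader(5, 0.02), stats):
    time.sleep(0.02)  # stand-in for the forward/backward step
total_s = time.perf_counter() - run_start
wait_pct = 100 * stats["wait_s"] / total_s
print(f"data-loader wait: {wait_pct:.0f}% of wall time")
```

Treat the result as a first approximation to feed into the baseline input; a framework profiler gives a more faithful picture.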

What does the Clean up button do? Is it safe?

It deletes every object under the configured DEMO_PREFIX in the bucket and resets the dashboard state. Use it between customer demos so cost numbers start fresh. It will not touch anything outside that prefix. Recommended: point the demo at a dedicated bucket, not one with production data.

How does this relate to the goodput tool at /goodput/?

The goodput tool is the same kit aimed at infrastructure companies that own the flash (neoclouds, GPU clouds, hyperscalers building AI platforms). Their economic levers are different: they amortize flash hardware, sell tiering capacity to their tenants, and can buy multi-PB Overdrive tiers commercially. This tool is for the tenants on the other side of that relationship — the GenAI teams paying for that flash by the GB-month. Same B2; different math.

Is there a "just the calculator" version for our marketing website?

Yes — a stripped-down, embeddable version of the same math is being built as a separate marketing asset. No B2 backend, no stage execution, no 60-second reveal — just inputs → numbers → "talk to sales" CTA. Right for top-of-funnel pages where a prospect needs an instant directional projection before they're ready for a real demo. This tool (Story Mode) is for the deeper conversation that comes after.

9. How Backblaze B2 fits into the inference layer

If you're building or operating an inference service for AI models, this section explains where Backblaze B2 belongs in your architecture — and where it doesn't.

The core point

Backblaze B2 is not the live GPU KV cache. It does not sit in the token-by-token serving path. The hot inference path is latency-sensitive — every millisecond matters — and requires GPU memory, CPU memory, Redis-like in-memory systems, or local NVMe.

B2 sits one layer outside the hot path. It is durable object storage for the artifacts that feed and rebuild the faster cache tiers: model bundles, prewarm data, cache snapshots, manifests, common prompt prefixes, serialized prefix-cache artifacts (if your inference stack supports them), and disaster-recovery copies of cache-related data.

The layers, from hottest to coldest

Request → Load balancer → Inference workers
                            ├─ GPU KV cache:        hottest, fastest
                            ├─ CPU / local NVMe:    warm tier
                            ├─ Cluster cache:       shared warm tier
                            └─ Backblaze B2:        durable cold tier
                                                    prewarm source, DR copy

Each layer further from the GPU is slower but more durable and lower-cost. B2 sits at the bottom of that stack — durable, geographically distributed, and meant to be read at startup or recovery time, not on every token.

B2's role during normal inference

B2 is read at scale-out, startup, or recovery — never in the per-request critical path.

KV cache prewarm

Cold-starts hurt most when the most common prompts haven't been seen yet. B2 helps by keeping the source of truth for prewarm material:

  1. Store common system prompts, tool schemas, RAG documents, tenant-specific context bundles, and a prewarm manifest in B2.
  2. When a node or cluster starts, it downloads the manifest from B2.
  3. The inference workers run prefill jobs for the most common prefixes listed in the manifest.
  4. The resulting KV cache lands in GPU / CPU / local NVMe cache — not back in B2.
  5. Cold-start pain on common prompts is reduced because the cluster is warm before it accepts traffic.

The output of prewarm lives in the hot tier. B2 supplies the inputs.
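The five steps above can be sketched in Python as follows. Everything named here is hypothetical: the manifest layout (a JSON list of prefixes with a hit count), the object keys, and the two injected callables `fetch_object` (returns bytes for a key in the bucket) and `run_prefill` (hands a prompt to your inference workers). No specific serving framework or storage SDK is assumed.

```python
import json

def prewarm(fetch_object, run_prefill, manifest_key="prewarm/manifest.json", top_n=None):
    """Download the manifest, then prefill the most common prefixes first."""
    manifest = json.loads(fetch_object(manifest_key))
    prefixes = sorted(manifest["prefixes"], key=lambda p: p["hits"], reverse=True)
    warmed = []
    for entry in prefixes[:top_n]:          # top_n=None means every listed prefix
        prompt = fetch_object(entry["key"]).decode("utf-8")
        run_prefill(prompt)                 # KV cache lands in the hot tiers, not in B2
        warmed.append(entry["key"])
    return warmed

# In-memory stand-ins so the sketch runs on its own
bucket = {
    "prewarm/manifest.json": json.dumps({"prefixes": [
        {"key": "prompts/support.txt", "hits": 120},
        {"key": "prompts/system.txt", "hits": 900},
    ]}).encode("utf-8"),
    "prompts/support.txt": b"You are a support agent.",
    "prompts/system.txt": b"You are a helpful assistant.",
}
prefilled = []
warmed_keys = prewarm(bucket.__getitem__, prefilled.append, top_n=1)
print(warmed_keys, prefilled)
```

Ordering by hit count means a node that is interrupted mid-prewarm has still covered the prompts most likely to arrive first.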

For inference stacks that support exporting KV blocks

Some serving frameworks can serialize and restore KV cache blocks. This pattern is framework-specific — vLLM has discussed it, others vary. When it's available:

For disaster recovery

When an inference fleet, region, or provider goes down, B2 is where you recover from:

New fleet nodes coming up after a failure pull from B2 to rebuild their warm tiers. The model loads from B2. The prewarm material loads from B2. The KV cache repopulates as the prefill jobs run.

What B2 is and isn't, plainly

B2 is:
B2 is not:
B2 reduces: