Engineering metric

Uptime %. The promise you made customers — measured in nines.

Uptime is the percentage of time your service is actually available. It looks like a simple reliability number, but it's really a promise: the one you wrote into your SLA, and the one customers quietly judge you against every time the product is slow or down. The numbers sound close — 99%, 99.9%, 99.99% — but the gaps between them are enormous, and every minute of downtime is customer trust spent. Understanding what those nines actually cost, and setting one you can keep, matters more than chasing the highest number you can put on a page.

What it is

The percentage of time your service is available, over a period. The promise in your SLA, made measurable. Usually expressed in "nines" — 99.9% is "three nines" — because each additional nine is roughly a 10x reduction in allowed downtime.

Measurement period

Monthly / rolling.

Measured against your SLA window, monthly or rolling. The "nines" translate to concrete downtime budgets: 99.9% is about 8.8 hours a year (roughly 43 minutes in a 30-day month), and 99.99% is about 53 minutes a year (roughly 4 minutes in a 30-day month).

Formula
Uptime % = (available time ÷ total time in period) × 100

Each extra nine is ~10x less downtime. The leap from 99% to 99.99% is far larger than it looks.
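
As a rough illustration of the formula, here is a minimal Python sketch. The function name `uptime_percent` and the 43-minute example are illustrative assumptions, not taken from any particular monitoring tool.

```python
from datetime import timedelta


def uptime_percent(downtime: timedelta, period: timedelta) -> float:
    """Uptime % = available time / total time in period x 100."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    if downtime < timedelta(0) or downtime > period:
        raise ValueError("downtime must be between zero and the period length")
    available = period - downtime
    # Dividing one timedelta by another yields a plain float ratio.
    return available / period * 100


# Hypothetical example: 43 minutes of downtime in a 30-day month.
print(f"{uptime_percent(timedelta(minutes=43), timedelta(days=30)):.2f}%")  # 99.90%
```

The 30-day month is a simplification; a real SLA would define the exact window and what counts as "available".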

When to review

Monthly.

Review against your SLA monthly — and watch incidents in real time. A breached SLA is customer trust spent and, often, contractual credits owed.
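
One way a monthly review could be computed from an incident log is sketched below. It assumes incidents are recorded as (start, end) pairs, which is a hypothetical format, and it merges overlapping incidents so the same minute of downtime is not counted twice. The function names and the sample dates are illustrative only.

```python
from datetime import datetime, timedelta


def downtime_in_window(incidents, window_start, window_end):
    """Total downtime inside the window, counting overlapping incidents once."""
    # Clip each incident to the window and drop those entirely outside it.
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in incidents
        if start < window_end and end > window_start
    )
    total = timedelta(0)
    current_start = current_end = None
    for start, end in clipped:
        if current_end is None or start > current_end:
            # Close out the previous merged interval before starting a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching incident: extend the merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def meets_sla(incidents, window_start, window_end, sla_percent):
    """Return (met, uptime_percent) for the window."""
    period = window_end - window_start
    downtime = downtime_in_window(incidents, window_start, window_end)
    uptime = (period - downtime) / period * 100
    return uptime >= sla_percent, uptime


# Hypothetical incident log for a 30-day month; the first two incidents overlap.
incidents = [
    (datetime(2024, 6, 3, 10, 0), datetime(2024, 6, 3, 10, 20)),
    (datetime(2024, 6, 3, 10, 10), datetime(2024, 6, 3, 10, 30)),
    (datetime(2024, 6, 20, 2, 0), datetime(2024, 6, 20, 2, 20)),
]
met, uptime = meets_sla(incidents, datetime(2024, 6, 1), datetime(2024, 7, 1), 99.9)
print(met, round(uptime, 3))
```

In this made-up month the merged downtime is 50 minutes, which is more than the roughly 43 minutes a 99.9% SLA allows over 30 days, so the check reports a breach.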

Why it matters

Every minute of downtime is trust spent.

Uptime is the most visceral promise a SaaS company makes. When your product is down, your customer's work stops. Unlike a missing feature or a slow response, downtime is felt immediately, by everyone, at once. It's the fastest way to spend the trust you took years to build, and for customers who run their business on your product, repeated downtime is one of the most reliable reasons they start looking for an alternative. The number is engineering's to defend, but its consequences land squarely in retention.

The deceptive part is the math. The difference between 99% and 99.9% sounds trivial, less than one percentage point, but 99% allows about 3.65 days of downtime a year, while 99.9% allows under nine hours. 99.99% allows under an hour. Each additional nine is a tenfold reduction in tolerable downtime, and typically a step change in the engineering investment required to achieve it. That's why uptime is a business decision as much as a technical one: more nines cost real money and effort, and the right target is the one that matches what your customers actually need and what you can reliably deliver.

99% sounds close to 99.9% — but one allows three and a half days of downtime a year, the other under nine hours. Each nine is a 10x leap, in both reliability and cost.
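
The downtime figures above follow from simple arithmetic, sketched here in Python. The function name `allowed_downtime_minutes` is an illustrative assumption, and the calculation uses a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def allowed_downtime_minutes(target_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime budget for an availability target over a period."""
    return period_minutes * (1 - target_percent / 100)


for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}%: {minutes / 60:.1f} hours/year ({minutes:.0f} minutes)")
```

Run as written, this gives about 87.6 hours (3.65 days) at 99%, about 8.8 hours at 99.9%, and about 53 minutes at 99.99%. Passing `period_minutes=30 * 24 * 60` gives the budget for a 30-day SLA window instead.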

Benchmarks

The nines — and what each one costs in downtime.

These are the standard availability tiers, with the annual downtime each allows. For a $1–10M SaaS, three nines (99.9%) is a credible, defensible target; four nines is strong but expensive. Set the SLA you can actually keep — a promise you breach costs more trust than a humbler one you honor.

At risk: Below 99%
More than three and a half days of downtime a year. For a product customers run their business on, this is a churn driver and a trust problem. Below 99% means availability isn't being engineered for; it's being left to chance, and customers feel it.
Watch: 99% · two nines
About 3.65 days of downtime a year. Tolerable for low-stakes or internal tools, but thin for SaaS that customers depend on daily. A reasonable starting point, but most SMB SaaS should be working toward three nines.
Healthy: 99.9% · three nines
Under nine hours of downtime a year: the credible, defensible standard for most SMB SaaS. Customers feel the product is reliable, and the SLA is achievable without heroic infrastructure spend. The right target for most $1–10M companies.
Strong: 99.99%+ · four nines
Under 53 minutes of downtime a year: genuinely strong, and genuinely expensive. Worth it for mission-critical products and enterprise contracts that demand it; over-investment for many SMB tools. Don't promise four nines you can't afford to engineer.
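
A scorecard could map a measured uptime figure onto these tiers along the following lines. This is a sketch under one assumption that the tiers above do not spell out: a value between two thresholds (say 99.5%) is placed in the highest tier whose threshold it meets. The function name `availability_tier` is hypothetical.

```python
def availability_tier(uptime_percent: float) -> str:
    """Map a measured uptime % to the benchmark tiers.

    Assumption: a value between thresholds falls into the highest
    tier whose threshold it meets.
    """
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    if uptime_percent >= 99.99:
        return "Strong"
    if uptime_percent >= 99.9:
        return "Healthy"
    if uptime_percent >= 99.0:
        return "Watch"
    return "At risk"


print(availability_tier(99.95))  # Healthy
```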

Protecting uptime

Three plays that actually move it.

Uptime is won by preventing incidents and recovering fast when they happen — and by promising only what you can keep. The plays run in that order.

— 01 Set an SLA you can actually keep

A humble promise honored beats a bold one breached.

The most common uptime mistake happens before any downtime: promising more nines than you can deliver. A customer told 99.9% who gets it trusts you; one promised 99.99% who gets 99.9% feels let down by the identical performance. Set the availability target you can reliably hit, write it into the SLA honestly, and over-communicate during incidents. Reliability of the promise matters as much as the uptime itself.

— 02 Prevent the incidents you can

Most downtime traces to a change you shipped.

A large share of outages come from deployments — which is exactly why a low change failure rate and small, frequent, easily-rolled-back deploys protect uptime directly. Stable releases, good test coverage, and the ability to undo a bad change in minutes are the unglamorous engineering work that keeps the nines intact. Uptime isn't mostly about exotic infrastructure; it's about not breaking your own product.

— 03 Recover fast when it breaks

Downtime budget is spent in minutes — so minutes matter.

When an incident does happen, your time to restore determines how much of your downtime budget it consumes. With under nine hours a year to spend at three nines (about 43 minutes in a 30-day month), a single incident that takes hours to resolve can breach a monthly SLA outright and consume much of the annual budget. Fast detection, clear runbooks, and quick rollback turn a potential SLA breach into a blip customers barely notice.
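
To make the budget framing concrete, the sketch below estimates what share of a downtime budget a single incident consumes. The function names and the three-hour incident are hypothetical, chosen only for illustration.

```python
from datetime import timedelta


def downtime_budget(target_percent: float, period: timedelta) -> timedelta:
    """Allowed downtime for an availability target over a period."""
    return period * (1 - target_percent / 100)


def budget_share(incident: timedelta, target_percent: float,
                 period: timedelta) -> float:
    """Fraction of the period's downtime budget used by one incident."""
    return incident / downtime_budget(target_percent, period)


incident = timedelta(hours=3)  # hypothetical time to restore
print(f"annual:  {budget_share(incident, 99.9, timedelta(days=365)):.0%}")
print(f"monthly: {budget_share(incident, 99.9, timedelta(days=30)):.0%}")
```

Under these assumptions, a three-hour incident uses roughly a third of a three-nines annual budget and more than four times the budget for a 30-day window, which is why time to restore matters as much as incident count.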

Common mistakes operators make with Uptime.

Promising more nines than you can keep.
An SLA is a promise, and a breached promise costs more trust than a humbler one honored. Don't write 99.99% into a contract you can only deliver 99.9% against — you'll owe credits and lose confidence for performance that would have looked great under an honest SLA. Promise the availability you can reliably hit, then defend it.
Underestimating the gap between the nines.
99% and 99.9% sound nearly identical and aren't — one allows three and a half days of downtime a year, the other under nine hours. Each additional nine is roughly a 10x reduction in tolerable downtime and a stepwise increase in cost. Treating the jump as trivial leads to either under-investing in reliability or over-promising on the SLA.
Treating uptime as purely an infrastructure problem.
Most downtime traces to a change you shipped, not to a data-center failure. That makes uptime as much about deployment discipline — low change failure rate, small deploys, fast rollback — as about redundant infrastructure. Chasing more nines through hardware while shipping unstable releases is solving the wrong half of the problem.
Chasing more nines than customers need.
Each additional nine costs real money and engineering effort, and not every product needs four of them. Over-investing in availability your customers don't require is capital and attention spent where it won't move retention. Match the target to what your customers actually need to run their business — for most SMB SaaS, three nines is the right, defensible place.

Read alongside

Most downtime is a deploy that went wrong.

Uptime and change failure rate are two sides of the same coin — a large share of outages come from changes you shipped. Keep the failure rate low and deploys small, and the nines mostly take care of themselves.

Change Failure Rate guide

How Upbeat helps

The SLA promise, tracked against the trust it protects.

Uptime is engineering's to defend, but its consequences land in retention. Upbeat keeps uptime against your SLA on the leadership scorecard next to churn and the other reliability signals — so a slipping availability trend is visible as a business risk, not just an engineering metric, before it costs you the customers it's quietly frustrating.
