Deferred CRAC Maintenance and the Compounding Risk to Data Centre Uptime

Skipping a single CRAC service visit saves $800 to $1,500. A cooling failure costs $250,000 or more. The maths should be obvious, yet deferred maintenance remains the leading contributor to precision cooling failures in Australian data centres.

The false economy of skipping services

Every facilities manager has been asked to defer maintenance to save budget. The logic seems reasonable: the CRAC units are running fine, temperatures are stable, and the next quarterly service can wait another month or two. The savings are real and immediate. The risk is invisible until it is not.

The problem is that CRAC maintenance is not like painting a building, where deferral simply means a cosmetic decline. Precision cooling systems are closed-loop mechanical systems where small issues compound into large failures at a rate that catches people off guard.

A filter that is two months overdue restricts airflow across the evaporator coil. The coil runs colder than designed, forming ice in humid conditions. The compressor cycles more frequently to compensate. Each additional start cycle stresses the motor windings and scroll mechanism. Compressor run hours accumulate faster, pushing the unit closer to its next major service interval. And because the filter was not changed, condensate drainage is also affected, creating water management issues that would not exist with a clean filter.

One deferred filter change has now created three additional failure pathways, none of which are visible on the BMS.

How risk compounds

The Uptime Institute has tracked data centre outage causes for over two decades. Its 2024 survey found that 70% of significant outage incidents involved power or cooling systems. Of the cooling-related incidents, the majority traced to one of three root causes:

Standby units that failed when activated due to deferred maintenance (low refrigerant, stuck valves, degraded compressors that had not been tested under load)
Undetected refrigerant leaks that accumulated over multiple skipped service intervals
Sensor calibration drift that masked real temperature conditions until a thermal event was already underway

All three are caught by routine maintenance tasks. All three of those tasks are routinely deferred.

The compounding nature of cooling risk is what makes it different from other building systems. A deferred electrical inspection does not make the next inspection harder. But a deferred CRAC service means the next service takes longer, costs more, and is more likely to uncover problems that have progressed from minor to major.

The numbers behind a thermal event

The financial case for maintenance is straightforward once you quantify the downside.

A quarterly CRAC service visit for a typical DX unit costs $800 to $1,500, depending on unit type and location. Annual maintenance cost per unit: $3,200 to $6,000.

A thermal event at a colocation facility in Australia:

Direct cooling repair: $5,000 to $25,000 (emergency compressor replacement, refrigerant recharge, after-hours callout premiums)
IT equipment damage: $50,000 to $500,000+ (servers, storage, and network equipment exposed to temperatures above 35°C for more than 15 minutes begin to experience component failures)
Downtime cost: At $9,000 per minute (2025 industry average), a 30-minute thermal event costs $270,000 in direct business losses
SLA penalties: Colocation providers face contractual penalties of 5x to 10x the monthly service fee for each hour of downtime
Insurance implications: Insurers are increasingly requesting maintenance records. A claim following a cooling failure with documented deferred maintenance may be partially or fully declined

The total exposure from a single thermal event ranges from $300,000 to over $1 million. Against annual maintenance costs of $3,200 to $6,000 per unit, the ratio of exposure to maintenance spend is roughly 50:1 to 300:1.
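The arithmetic behind these figures can be reproduced in a few lines. The sketch below uses only the dollar amounts quoted above; the function and variable names are illustrative, and SLA penalties and insurance effects are left out because no single dollar figure is given for them.

```python
# Figures quoted in the article (dollars); names are illustrative.
SERVICE_VISIT = (800, 1_500)            # cost per quarterly visit, low/high
VISITS_PER_YEAR = 4
DOWNTIME_PER_MINUTE = 9_000
EVENT_EXPOSURE = (300_000, 1_000_000)   # quoted total exposure range


def annual_maintenance(visit_cost, visits=VISITS_PER_YEAR):
    """Annual maintenance cost per unit for a given per-visit cost."""
    return visit_cost * visits


def downtime_cost(minutes, rate=DOWNTIME_PER_MINUTE):
    """Direct business loss for an outage of the given length."""
    return minutes * rate


low_maint = annual_maintenance(SERVICE_VISIT[0])    # 3,200
high_maint = annual_maintenance(SERVICE_VISIT[1])   # 6,000

# Most conservative ratio: smallest exposure against highest maintenance cost.
ratio_low = EVENT_EXPOSURE[0] / high_maint    # 50.0
# Most favourable ratio: largest exposure against lowest maintenance cost.
ratio_high = EVENT_EXPOSURE[1] / low_maint    # 312.5

print(f"Annual maintenance per unit: ${low_maint:,} to ${high_maint:,}")
print(f"30-minute event downtime cost: ${downtime_cost(30):,}")
print(f"Exposure-to-maintenance range: {ratio_low} to {ratio_high}")
```

The upper bound works out at 312.5, which is rounded to 300:1 in the text. The comparison sets the cost of one event against one unit's annual maintenance; it does not weight the event by how likely it is in any given year.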

Refrigerant: the slow-motion risk

Refrigerant leaks are the most insidious form of deferred maintenance risk. A small leak (50 to 100 grams per year) is undetectable without pressure testing or electronic leak detection. The unit continues to operate, but with progressively less refrigerant charge.

At 90% charge, cooling capacity drops approximately 5%. At 80% charge, the compressor runs hotter and longer, accelerating wear. At 70% charge, the expansion valve cannot maintain proper superheat: the evaporator is starved, suction gas returns to the compressor too hot, and the compressor risks overheating damage.
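These thresholds can be read as a simple lookup from measured charge to expected condition. The sketch below is illustrative only: the function name, the stage wording, and the treatment of values between the quoted points (each threshold is assumed to apply at or below its level) are assumptions, not manufacturer guidance.

```python
def charge_stage(charge_fraction):
    """Map remaining refrigerant charge (0.0 to 1.0) to the stage described above.

    Assumption: each quoted threshold applies at or below that charge level.
    """
    if not 0.0 <= charge_fraction <= 1.0:
        raise ValueError("charge_fraction must be between 0.0 and 1.0")
    if charge_fraction <= 0.70:
        return "superheat control lost: compressor damage risk"
    if charge_fraction <= 0.80:
        return "compressor running hotter and longer: accelerated wear"
    if charge_fraction <= 0.90:
        return "cooling capacity down roughly 5%"
    return "within normal range"


# A unit measured at 75% charge falls in the accelerated-wear band.
print(charge_stage(0.75))
```

A check like this is only as good as the charge measurement feeding it, which is why the quarterly pressure check and leak scan matter more than the lookup itself.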

A technician performing a quarterly service catches low charge early, before it cascades. A technician arriving after two skipped services finds a unit at 75% charge with a compressor running 15% over rated amperage, and a repair bill three to four times what a timely top-up and leak repair would have cost.

Under the Australian HFC phase-down (Ozone Protection and Synthetic Greenhouse Gas Management Act 1989, amended), refrigerant costs are rising. R410A prices have increased roughly 40% since 2020, and R407C is increasingly difficult to source. Every kilogram lost to a slow leak costs more to replace than it did last year.

The standby maintenance paradox

Data centre cooling designs build in redundancy: N+1, N+2, or 2N configurations where standby units cover for any primary unit failure. This redundancy is the last line of defence against thermal events.

But redundancy only works if the standby units actually function when called. And standby units are the first to have maintenance deferred, because they are not actively cooling the room and their absence from the rotation is not immediately felt.

The paradox: the units you maintain least are the ones you depend on most during a crisis.

Standby CRAC units should receive the same maintenance frequency as active units. They should also be rotated into active service at regular intervals (monthly at minimum) and verified under full load during each rotation. A standby unit that has not run at full capacity for three months is an unknown quantity, not a safety net.
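The rotation rule is easy to check automatically if the date of each unit's last verified full-load run is recorded. The following is a minimal sketch under stated assumptions: the StandbyUnit record, the unit names, the dates, and the 31-day window standing in for "monthly" are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class StandbyUnit:
    name: str
    last_full_load_run: date  # most recent verified run at full load


def overdue_units(units, today, max_days=31):
    """Return names of standby units not verified under full load within max_days."""
    cutoff = today - timedelta(days=max_days)
    return [u.name for u in units if u.last_full_load_run < cutoff]


# Hypothetical fleet: one unit rotated recently, one not run for over three months.
units = [
    StandbyUnit("CRAC-05", date(2025, 5, 20)),
    StandbyUnit("CRAC-06", date(2025, 2, 14)),
]
print(overdue_units(units, today=date(2025, 6, 1)))
```

Here only CRAC-06 is flagged. Any unit on that list should be treated as unproven capacity until it has been run and verified under load.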

Building a maintenance case for management

Facilities teams often struggle to secure maintenance budgets because the ROI is invisible. Nothing happened, therefore the spending looks unnecessary. This is the prevention paradox at work.

Four approaches that help:

Track and report near-misses: Every time a service visit catches low refrigerant, a failing fan bearing, or a drifting sensor, document it as an avoided incident with estimated cost-of-failure. Over 12 months, these add up to a compelling narrative.

Benchmark against industry data: The Uptime Institute, ASHRAE, and the Australian Data Centre Association all publish reliability data. A data centre spending less on cooling maintenance than the industry median is statistically more likely to experience a thermal event.

Insurance alignment: Many data centre insurance policies now include maintenance compliance requirements. Demonstrating adherence to manufacturer-recommended service intervals can reduce premiums and, more importantly, protect claims eligibility.

Energy cost tracking: Deferred maintenance increases energy consumption. A unit with dirty coils and low refrigerant uses 15 to 25% more electricity than a well-maintained unit. Tracking kWh per unit before and after each service visit demonstrates direct operational savings that offset the service cost.
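Two of these approaches, the near-miss register and energy tracking, reduce to simple bookkeeping. The sketch below shows one way to total them; every entry, reading, cost estimate, and the electricity tariff are hypothetical placeholders, and the function names are illustrative.

```python
def avoided_cost_total(near_misses):
    """Sum the estimated cost-of-failure across logged near-misses."""
    return sum(cost for _finding, cost in near_misses)


def service_energy_saving(kwh_before, kwh_after, tariff_per_kwh):
    """Dollar value of energy saved between two equal-length metering periods."""
    return (kwh_before - kwh_after) * tariff_per_kwh


# Hypothetical register entries: (finding, estimated cost of failure in dollars).
near_misses = [
    ("low refrigerant charge on unit 3", 12_000),
    ("failing fan bearing on unit 1", 4_500),
    ("drifting return-air sensor on unit 2", 2_000),
]
print(f"Avoided cost this year: ${avoided_cost_total(near_misses):,}")

# Hypothetical readings: 10,000 kWh in the month before a service visit,
# 8,500 kWh in the month after, at an assumed tariff of $0.25 per kWh.
saving = service_energy_saving(10_000, 8_500, 0.25)
print(f"Monthly energy saving after service: ${saving:,.2f}")
```

For the comparison to be fair, the before and after periods should be the same length and see similar IT load and outdoor conditions.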

What a proper CRAC maintenance programme includes

For Australian data centres, a proper programme covers:

Quarterly visits (all units, active and standby):

Filter inspection and replacement
Condenser coil inspection and cleaning
Refrigerant pressure check and leak scan
Compressor amperage measurement against baseline
Supply and return air temperature verification
Condensate drain inspection
Belt tension and fan bearing check
Control system alarm log review
Sensor calibration spot-check

Annual visits (in addition to quarterly):

Full refrigerant circuit leak test (electronic detection)
Compressor oil analysis
Electrical connection torque check
Control system firmware review
Full sensor calibration
Standby unit full-load test
Condenser deep clean (chemical wash if required)

Biennial visits:

Compressor valve plate or scroll inspection (where accessible)
Expansion valve test and replacement if required
Full electrical safety test
Condition assessment report with remaining-life estimate
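The three tiers above can be kept as data so that a scheduling tool can work out what is due at each visit. This is a sketch only: it assumes months are counted from the start of the programme and that biennial work stacks on top of the annual and quarterly tasks, and the task lists are abbreviated for illustration.

```python
# Interval in months for each visit tier, as listed above.
TIERS = {"quarterly": 3, "annual": 12, "biennial": 24}

# Abbreviated task lists; the full lists are in the programme above.
TASKS = {
    "quarterly": ["Filter inspection and replacement",
                  "Refrigerant pressure check and leak scan"],
    "annual": ["Full refrigerant circuit leak test",
               "Standby unit full-load test"],
    "biennial": ["Full electrical safety test",
                 "Condition assessment report"],
}


def tiers_due(month):
    """Return the tiers that fall due at a given month since programme start."""
    if month <= 0:
        raise ValueError("month must be a positive integer")
    return [tier for tier, interval in TIERS.items() if month % interval == 0]


def tasks_due(month):
    """Flatten the task lists for every tier due in the given month."""
    return [task for tier in tiers_due(month) for task in TASKS[tier]]


for m in (3, 12, 24):
    print(m, tiers_due(m))
```

Under these assumptions month 12 combines the quarterly and annual work, and month 24 combines all three tiers, which is worth allowing for when booking technician time.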

We provide scheduled CRAC maintenance programmes for all major brands (Vertiv/Liebert, Stulz, Uniflair, Daikin Applied, Climaveneta, Mitsubishi Heavy) across Brisbane, Sydney, and Melbourne. Contact us to review your current maintenance schedule against best practice.