Ask an experienced reliability engineer what federal facilities maintenance culture looks like from the outside, and the answer is rarely flattering. Some teams are candid about their challenges and genuinely looking for better approaches. Others present a picture that doesn’t reflect what’s happening on the floor. That gap can exist within the same organization, even between facilities doing nearly identical work.
These cultures share a maintenance framework that wasn’t designed for the challenges federal facilities face today. At NASA, 83% of facilities have surpassed their expected lifespan and the estimated annual maintenance gap is nearly $300 million, and the problem is common across the federal sector. Reliability-centered maintenance (RCM) provides a solution, aligning maintenance with how assets support the mission rather than with how long it has been since they were last serviced.
The reactive maintenance trap
When facility managers are asked about obstacles to their maintenance programs, they often cite funding issues. While funding gaps are genuine, pouring more money into a flawed maintenance system only wastes resources. Factors such as budget constraints, data quality and organizational culture all contribute to reactive maintenance. Inaccurate or poorly structured data, or data collected without a clear purpose, limits the effectiveness of RCM from the outset.
Culture is the most difficult barrier to overcome. Maintenance teams often patch recurring failures because they lack the time or authority to investigate root causes. Additionally, the institutional knowledge of how a facility functions is lost when experienced maintainers leave or retire. Many managers abandon RCM just before it begins to show results.
Start with function, not failure
RCM begins with a simple question: What must this asset do, how well must it perform and in what context does it operate? Not what the original equipment manufacturer recommends or what an arbitrary schedule dictates, but what happens to operations, safety and mission if this system cannot fulfill its intended function. This question changes how resources get deployed.
That understanding drives a criticality-based prioritization that sorts assets into tiers. At the top sit systems where failure creates safety hazards or shuts down critical operations:
- HVAC serving secure or environmentally controlled spaces
- Primary electrical infrastructure
- Utilities feeding mission-essential processes
These justify condition-based monitoring, targeted inspections for early signs of deterioration and pre-stocked spare parts. Below these sit support systems where failure causes manageable disruption, and at the lowest tier, allowing assets to run to failure becomes a deliberate resource decision.
Teams are often surprised during a criticality assessment to discover that assets they assumed were identical require fundamentally different strategies. For example, a facility may have 120 overhead roll-up doors, with some being low-criticality assets and others among the highest, depending on the building they serve and what is behind them.
Air handlers follow a similar pattern: crucial in laboratories but less critical in warehouses. Applying a fixed maintenance schedule to all assets of the same type wastes resources on equipment that doesn’t require it, while truly critical systems might fail at the worst times. When everything is treated as critical, the term loses its meaning.
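The idea that two assets of the same type can land in different tiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NASA's or any agency's actual method: the `Asset` fields, the three tier labels and the rules inside `assign_tier` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical asset record; field names are illustrative only."""
    asset_id: str
    asset_type: str
    safety_hazard_on_failure: bool     # failure would endanger people
    stops_critical_operations: bool    # failure would halt mission-essential work
    disrupts_support_operations: bool  # failure causes manageable disruption


def assign_tier(asset: Asset) -> str:
    """Rank an asset by the consequence of its failure, not by its type."""
    if asset.safety_hazard_on_failure or asset.stops_critical_operations:
        return "critical"
    if asset.disrupts_support_operations:
        return "support"
    return "run_to_failure"


# Two roll-up doors of the same type, serving very different spaces.
lab_door = Asset("DOOR-007", "roll-up door", False, True, True)
shed_door = Asset("DOOR-112", "roll-up door", False, False, False)

for door in (lab_door, shed_door):
    print(door.asset_id, door.asset_type, "->", assign_tier(door))
```

The function never looks at `asset_type`, which is the point: the tier follows from what the asset protects or enables, so identical equipment can warrant different strategies.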
NASA’s tiered maintenance program applies this approach at scale. Facilities are evaluated based on mission relevance and assigned to one of four tiers, with maintenance goals ranging from sustaining or enhancing the condition of mission-critical infrastructure to providing minimum service levels for non-essential buildings. To define these goals, engineers conducted in-person assessments at every NASA center.
Matching strategy to consequence
Once assets are prioritized by importance, maintenance strategies can be matched to each tier. Critical systems receive condition-based monitoring, inspections for early signs of deterioration and pre-stocked spare parts. If failures occur despite these measures, root cause analysis is used to prevent recurrence.
Mid-tier assets are monitored less often and undergo simpler procedures; the goal is to keep risk within acceptable limits rather than to prevent every failure. For lower-tier assets, reactive maintenance is often the most cost-effective approach because failures have minimal impact and proactive maintenance provides little economic value.
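One way to make the tier-to-strategy match explicit is a small lookup table. The sketch below is an assumption about how such a table might be written down, with tier names and dictionary keys invented for illustration; the activities themselves paraphrase the description above.

```python
from enum import Enum


class Tier(Enum):
    CRITICAL = "critical"
    MID = "mid"
    LOWER = "lower"


# Illustrative mapping only; the keys and wording are hypothetical.
STRATEGY_BY_TIER = {
    Tier.CRITICAL: {
        "approach": "condition-based monitoring",
        "activities": [
            "inspections for early signs of deterioration",
            "pre-stocked spare parts",
        ],
        "on_failure": "root cause analysis",
    },
    Tier.MID: {
        "approach": "less frequent monitoring",
        "activities": ["simpler procedures"],
        "on_failure": "repair and keep risk within acceptable limits",
    },
    Tier.LOWER: {
        "approach": "run to failure",
        "activities": [],
        "on_failure": "reactive repair",
    },
}


def maintenance_plan(tier: Tier) -> dict:
    """Return the strategy entry for a tier."""
    return STRATEGY_BY_TIER[tier]


print(maintenance_plan(Tier.CRITICAL)["approach"])
print(maintenance_plan(Tier.LOWER)["approach"])
```

Keeping the mapping in one place makes the lowest tier's run-to-failure policy a visible, deliberate choice rather than an accident of neglect.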
Salas O’Brien supported NASA in developing and implementing these criticality assessments across its centers, evaluating more than 320,000 equipment assets to define maintenance priorities based on mission relevance. The early results are measurable. Tiered maintenance implementation has already produced more than $50 million in agency-wide cost avoidance. Over the next ten years, the agency projects a 75% reduction in its maintenance funding gap and $810 million in reduced deferred maintenance obligations by allocating existing resources where they have the greatest impact.
Getting started without shutting down
Facilities operating 24/7 often assume RCM requires long shutdowns, but it doesn’t. For most facilities, the foundation is existing data, primarily from the computerized maintenance management system (CMMS). Before any meaningful analysis can occur, however, asset naming must be consistent. Inconsistencies, like an air handler logged as “AH12” in one record and “Air Handling Unit-12” in another, make pattern detection impossible.
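A naming cleanup of this kind can start small. The following Python sketch maps the two spellings from the example onto one identifier; the alias table, the canonical `AHU-<number>` format and the function name are all assumptions for illustration, and a real CMMS export would need a much longer alias list reviewed by the people who know the equipment.

```python
import re

# Hypothetical alias table: lower-cased name prefixes -> canonical code.
PREFIX_ALIASES = {
    "ah": "AHU",
    "ahu": "AHU",
    "air handler": "AHU",
    "air handling unit": "AHU",
}

# A name is treated as: a text prefix, optional separators, then a number.
_NAME_PATTERN = re.compile(
    r"^\s*(?P<prefix>[A-Za-z ]+?)[\s\-_#]*(?P<number>\d+)\s*$"
)


def normalize_asset_name(raw: str) -> str:
    """Return a canonical name, or the stripped input if it is not recognized."""
    match = _NAME_PATTERN.match(raw)
    if match is None:
        return raw.strip()
    # Collapse repeated spaces and ignore case before the alias lookup.
    prefix = " ".join(match.group("prefix").lower().split())
    canonical = PREFIX_ALIASES.get(prefix)
    if canonical is None:
        return raw.strip()
    # int() drops leading zeros, so "AH012" and "AH12" become the same asset.
    return f"{canonical}-{int(match.group('number'))}"


for name in ("AH12", "Air Handling Unit-12", "Pump 7"):
    print(repr(name), "->", normalize_asset_name(name))
```

Unrecognized names are returned unchanged rather than guessed at, so they can be flagged for manual review instead of being silently merged with the wrong asset.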
Once the data is cleaned, work orders, maintenance logs and operator notes reveal patterns indicating which assets fail repeatedly. That record supports a criticality assessment conducted in a conference room, not on the plant floor. Cross-functional teams evaluate what failure would mean for each major system in terms of operations, without taking anything offline. Starting with a pilot on ten to fifteen critical systems before expanding is the most practical approach. NASA followed this model, initially at three centers, before scaling across all eleven agency locations.
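Finding the assets that fail repeatedly can be as simple as counting corrective work orders per asset once names are consistent. The sketch below uses made-up records and a hypothetical two-field layout (`asset_id`, `work_type`); actual CMMS exports differ in structure and in how they label corrective work.

```python
from collections import Counter

# Hypothetical cleaned work-order records: (asset_id, work_type).
work_orders = [
    ("AHU-12", "corrective"),
    ("AHU-12", "corrective"),
    ("PUMP-3", "preventive"),
    ("AHU-12", "corrective"),
    ("DOOR-41", "corrective"),
    ("PUMP-3", "corrective"),
]


def repeat_failures(orders, min_failures=2):
    """List (asset_id, count) for assets with at least min_failures corrective orders."""
    counts = Counter(
        asset_id for asset_id, work_type in orders if work_type == "corrective"
    )
    # most_common() sorts by count, highest first.
    return [(asset, n) for asset, n in counts.most_common() if n >= min_failures]


print(repeat_failures(work_orders))
```

A list like this is only a starting point for the conference-room discussion: a high count shows where failures recur, while the criticality assessment decides whether that recurrence matters.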
The cultural shift is the hardest part
Skepticism about RCM is natural, especially for facilities that have seen initiatives come and go without results. Instead of relying solely on theoretical arguments to differentiate this approach, consider what peer agencies have already proven. Facilities facing similar budget constraints and aging infrastructure have adopted RCM, lowering failure rates, reducing deferred maintenance and cutting emergency repair expenses. The NASA program described above is one such case.
The harder internal challenge is tribal knowledge: the accumulated understanding of how a facility operates that resides with individual maintainers rather than in any documented system. That expertise is real and worth preserving, and RCM doesn’t ask organizations to abandon it.
Instead, RCM urges leadership to commit to the process beyond the point of discomfort. During the first six to twelve months, maintenance hours and costs usually increase as persistent problems are properly addressed. By eighteen to twenty-four months, failure rates decrease, and the investment becomes worthwhile. Programs that exit before reaching that inflection point never realize the return.
Invest in the people first
More than any specific framework or methodology, what truly makes RCM effective is the commitment of dedicated personnel. This starts by empowering maintenance teams to document issues whenever they notice them, not only during scheduled inspections but also during normal operations. It also requires dedicating time to identify root causes rather than repeatedly patching the same problem, and paying attention when a technician reports a slow leak before it escalates. The best part is that these actions are inexpensive. They can transform a team that merely fixes issues into one that deeply understands the facility it oversees.
Maintenance staff who feel genuine ownership of the assets in their care, are trusted to identify problems and are given time to solve them are the foundation of any effective program. The agencies that manage aging infrastructure most effectively over the next decade won’t necessarily be the ones with the largest budgets. They’ll be the ones who invested in their people first, stopped treating assets as equal, and managed them based on what truly matters.
Aaron Thompson is a reliability engineer at Salas O’Brien.
© 2026 Federal News Network. All rights reserved.

