High Reliability Requires More Than Providing Spares
Abstract
It is sometimes optimistically hoped that a space life support system can be kept working throughout a long-duration mission by repairing failed components, as long as sufficient spares are flown. It is usually assumed that the components have constant known failure rates. The number of spares needed can then be computed to achieve any desired probability that all failed components can be replaced by available spares. This approach can provide high reliability if its favorable assumptions, including constant known failure rates, are satisfied. Other favorable assumptions are that the failures are statistically independent, that repair will be successful without causing further failures, and that all failures are due to internal component failures. These assumptions are not usually justified. The failure rates may be estimates that are inadequately verified because of insufficient testing. Failure rates may change due to materials substitutions, manufacturing changes, redesigns to fix failures, and new failures caused by redesigns. Failures that are not statistically independent may result from one common cause, such as a design or manufacturing error, or from a cascade of cause and effect, possibly triggered by an external event such as a power outage. Repair may be unsuccessful or cause damage. Many failures occur at component interfaces or at the overall system level, not within isolated components. Other failure causes are completely external to the system, due to assembly, maintenance, and operational errors or to unexpected environmental challenges. Replacement with sufficient spares can compensate for expected internal component failures but may not be able to cope with unpredictable design and manufacturing flaws, human errors, and environmental impacts. Reliability estimates based on providing sufficient spares to compensate for expected failures may be far too high. They are essentially upper bounds on reliability that might be approached if many frequent but often unconsidered failure causes can be eliminated.
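The spares calculation described in the abstract can be illustrated with a minimal sketch. It assumes, as one common reading of "constant known failure rates", that the number of failures of a component over the mission follows a Poisson distribution with mean equal to failure rate times mission duration. The function names and all numerical values below are hypothetical and are not taken from the paper. The last step shows how the computed reliability degrades if the true failure rate is twice the assumed one, which is one of the ways the favorable assumptions can fail.

```python
import math


def poisson_cdf(n, mean):
    """Return P(N <= n) for a Poisson-distributed count N with the given mean."""
    term = math.exp(-mean)  # P(N = 0)
    total = term
    for k in range(1, n + 1):
        term *= mean / k    # P(N = k) computed from P(N = k - 1)
        total += term
    return total


def spares_needed(failure_rate, duration, target):
    """Smallest number of spares n such that P(failures <= n) >= target.

    Assumes a constant failure rate and independent failures (Poisson model).
    """
    mean = failure_rate * duration
    n = 0
    while poisson_cdf(n, mean) < target:
        n += 1
    return n


# Hypothetical example values, chosen only for illustration.
assumed_rate = 1e-4      # failures per hour (assumed known and constant)
mission_hours = 10_000   # mission duration in hours
target = 0.999           # desired probability that spares suffice

n_spares = spares_needed(assumed_rate, mission_hours, target)
nominal = poisson_cdf(n_spares, assumed_rate * mission_hours)

# Same number of spares, but the true failure rate is double the estimate.
degraded = poisson_cdf(n_spares, 2 * assumed_rate * mission_hours)

print(f"spares needed: {n_spares}")
print(f"reliability at assumed rate: {nominal:.4f}")
print(f"reliability if rate is doubled: {degraded:.4f}")
```

Even this sketch only covers an error in the estimated rate. Common cause failures, unsuccessful repairs, and external causes are outside the Poisson model entirely, which is why the computed value is better read as an upper bound.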
Description
ICES511: Reliability for Space Based Systems
The 49th International Conference on Environmental Systems was held in Boston, Massachusetts, USA on 07 July 2019 through 11 July 2019.