Heroic Reliability Improvement in Manned Space Systems
System reliability can be significantly improved by a strong, continued effort to identify and remove all the causes of actual failures. Newly designed systems often have unexpectedly high failure rates, which can be reduced by successive design improvements until the final operational system has an acceptable failure rate. There are many causes of failures and many ways to remove them. New systems may have poor specifications, design errors, or mistaken operations concepts. Correcting unexpected problems as they occur can produce large early gains in reliability. Improved technology in materials, components, and design approaches can also increase reliability. Reliability growth is achieved by repeatedly operating the system until it fails, identifying the failure cause, and fixing the problem. The failure rate reduction that can be obtained depends on the number and the failure rates of the correctable failures. Under the strong assumption that the failure causes can be removed, the decline in overall failure rate can be predicted. If a failure occurs at the rate of x per unit time, the expected time before the failure occurs and can be corrected is 1/x, the Mean Time Before Failure (MTBF). Finding and fixing a less frequent failure with the rate of x/2 per unit time requires twice as long, a time of 2/x. Cutting the failure rate in half therefore requires doubling the test and redesign time, as well as finding and eliminating the failure causes. Reducing the failure rate significantly requires a heroic reliability improvement effort.
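
The relationship between test time and the remaining failure rate can be illustrated with a minimal Python sketch. It is not taken from the paper. It assumes, as a simplification, that each failure mode first appears exactly at its expected time of 1/x, that every fix is perfect, and that fixes introduce no new failure modes. The function names and the list of rates are hypothetical and chosen only for illustration.

```python
def expected_time_to_find(rate):
    """Expected time before a failure with the given rate first occurs (1/rate)."""
    return 1.0 / rate


def residual_failure_rate(rates, test_time):
    """Sum the rates of failure modes not yet expected to have appeared.

    Simplifying assumption: a mode with rate x is found and perfectly
    fixed once the accumulated test time reaches 1/x.
    """
    return sum(x for x in rates if expected_time_to_find(x) > test_time)


# Hypothetical failure modes, each with half the rate of the previous one.
rates = [1.0, 0.5, 0.25, 0.125]

for test_time in [0.0, 1.0, 2.0, 4.0, 8.0]:
    remaining = residual_failure_rate(rates, test_time)
    print(f"test time {test_time:4.1f} -> remaining failure rate {remaining:.3f}")
```

Under these assumptions, halving the rate of a failure mode doubles the expected time needed to observe it, so each further reduction in the remaining failure rate costs roughly twice the test time of the previous one.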