36 years ago on January 28th, the ill-fated STS-51-L mission launched and, 73 seconds into flight, Space Shuttle Challenger broke apart due to an O-ring seal failure in the right Solid Rocket Booster (SRB). 17 years later, nearly to the day, the Space Shuttle Columbia broke apart upon re-entry after suffering a foam debris strike to one of its wings during launch. Both incidents resulted in the deaths of all seven crew members aboard.
Before I get into the lessons learned from the investigations of both tragedies, I want to first honor the courage of the brave astronauts of these missions by remembering them below. They sacrificed their lives in the pursuit of advancing space flight, scientific study, and exploration.
| STS-51-L (Challenger) Crew | STS-107 (Columbia) Crew |
| --- | --- |
| Francis Scobee, Commander | Rick Husband, Commander |
| Michael Smith, Pilot | William C. McCool, Pilot |
| Ellison Onizuka, Mission specialist | Michael P. Anderson, Mission specialist |
| Judith Resnik, Mission specialist | Kalpana Chawla, Mission specialist |
| Ronald McNair, Mission specialist | David M. Brown, Mission specialist |
| Gregory Jarvis, Payload specialist | Laurel Clark, Mission specialist |
| Christa McAuliffe, Payload specialist (part of the Teacher in Space Project) | Ilan Ramon, Payload specialist |
On the surface, these two events appear to be independent engineering accidents, caused by the failure of different components in a very complex system. The reality, however, is much simpler: “In both cases, engineers initially presented concerns as well as possible solutions… Management did not listen to what their engineers were telling them”. That quote is taken directly from the Columbia Accident Investigation Board (CAIB) report. In both instances, engineers presented concerns about a new situation they believed posed a risk to safety. The NASA culture, however, prioritized launch timeline, budget, and chain of command, often normalizing risks that were not completely understood in order to “stay on schedule”. Managers listened through this filter, and through that of past experience, trivializing the risks presented because O-ring erosion and foam strikes had been discussed and approved for flight before. They weighed the safety risk against cost and schedule. Lacking the curiosity and courage to gather more information, ask more questions, and make a better-informed decision (possibly at the expense of costly delays), they elected to push forward, a choice that ultimately cost the lives of 14 crew members.
Challenger Disaster
Why did NASA managers conclude that the O-ring seal erosion was acceptable in the flight readiness review and ignore the warnings of engineers to delay the launch due to cold temperatures that could impact O-ring integrity? Management accepted the success of previous flights that had flown with O-ring erosion as evidence of safety. Richard Feynman, the renowned physicist, details the flaws in this conclusion in his investigative report following the accident.
The Rogers Commission was tasked with investigating the cause of the Challenger accident and recommending steps to prevent such a disaster from ever happening again. Feynman, the only scientist assigned to the Commission, discovered that there were “enormous differences of opinion as to the probability of a failure with loss of vehicle and of human life.” He found that working engineers estimated a roughly “1 in 100” chance of failure, whereas managers estimated as low as a “1 in 100,000” chance of failure. This stark contrast can be attributed to a failure by management to listen to the engineers.
Feynman notes, “in determining if flight 51-L was safe to fly in the face of ring erosion in flight 51-C, it was noted that the erosion depth was only one-third of the radius. It had been noted in an experiment cutting the ring that cutting it as deep as one radius was necessary before the ring failed. Instead of being very concerned that variation of poorly understood conditions might reasonably create a deeper erosion this time, it was asserted, there was ‘a safety factor of three.’” Management’s conclusion from the engineers’ erosion findings was that, since the erosion depth in a single mission was only one-third of the radius, the ring could fly three missions before it failed. Feynman, however, takes issue with their “strange use of the engineer’s term ‘safety factor.’” He provides a clear example of the typical use of a “safety factor” when designing a bridge: “If a bridge is built to withstand a certain load without beams permanently deforming, cracking, or breaking, it may be designed for the materials used to actually stand up under three times the load. This ‘safety factor’ is to allow for uncertain excesses of load, or unknown extra loads, or weaknesses in the material that might have unexpected faults, etc. If now the expected load comes on to the new bridge and a crack appears in a beam, this is a failure of the design. There was no safety factor at all; even though the bridge did not actually collapse because the crack only went one-third of the way through the beam. The O-rings of the Solid Rocket Boosters were not designed to erode. Erosion was a clue that something was wrong. Erosion was not something from which safety can be inferred.” As Feynman indicates, if a crack appeared in a beam on the bridge it would be repaired or replaced, not assumed capable of taking further load before failing completely.
However, it was not just in hindsight that these issues were discovered. Roger Boisjoly, a booster rocket engineer at NASA contractor Morton Thiokol in Utah, wrote a memo to management in which he predicted “a catastrophe of the highest order” involving “loss of human life” should they launch in the unusually cold, untested Florida temperatures of that morning. At first, Thiokol managers agreed with Boisjoly and the other engineers and formally recommended a launch delay. But NASA officials on a conference call challenged that recommendation. Under pressure from NASA, the Thiokol managers overruled the engineers and told NASA to go ahead and launch. Boisjoly was forced to watch the launch, praying he was wrong. When the rocket successfully left the launch pad he was relieved, having predicted the booster would fail on the pad. 73 seconds later, his worst fears were realized.
Columbia Disaster
Why did NASA managers conclude that the foam debris strike 81.9 seconds into Columbia’s flight was not a threat to the safety of the mission, despite the concerns of their engineers? The management had already concluded that foam strikes were not a concern. In fact, the CAIB notes in their report that “Both Challenger and Columbia engineering teams were held to the usual quantitative standard of proof. But it was a reverse of the usual circumstance: instead of having to prove it was safe to fly, they were asked to prove that it was unsafe to fly.”
Similar to the O-ring erosion, management was inclined to dismiss the foam impact as harmless, despite lacking an understanding of the size or location of the debris strike. Again, with a lack of information, they assumed that all foam debris strikes were incidental and did not pose a significant risk to the mission or the vehicle.
The engineers, however, sought to gain a better understanding of the specifics of the debris impact to more accurately assess the safety risk. Going against their management, the engineers continued to work the problem. A request was placed for Department of Defense assistance with retrieving ground or satellite images of the Shuttle wing to determine the exact location and size of any potential damage. Within 90 minutes, management had cancelled the request for imagery.
In an e-mail that he did not send but instead printed out and shared with a colleague, NASA Engineer Rodney Rocha wrote, “In my humble technical opinion, this is the wrong (and bordering on irresponsible) answer … not to request additional imaging help from any outside source. I must emphasize (again) that severe enough damage … combined with the heating and resulting damage to the underlying structure at the most critical location … could present potentially grave hazards. The engineering team will admit it might not achieve definitive high confidence answers without additional images, but, without action to request help to clarify the damage visually, we will guarantee it will not … Remember the NASA safety posters everywhere around stating, ‘If it’s not safe, say so’? Yes, it’s that serious.”
Management instead elected to rely upon a flawed model analysis of the possible damage. The tool, typically used for assessing much smaller ice debris strikes, underestimated the size of the debris and therefore the resulting impact and damage. As a result, on February 1, while Columbia was reentering the atmosphere, the damage caused by the foam debris impact allowed hot atmospheric gases to penetrate the heat shield and destroy the internal wing structure, causing the spacecraft to become unstable and break apart.
Conclusions
The Space Shuttle program was brought to an end in 2011, less than a decade after Columbia. The program launched 135 missions, 2 of which resulted in complete loss of vehicle and crew. That is roughly a 1-in-68 failure rate, worse even than the “1 in 100” estimate Feynman reported from the conservative working engineers.
As has been shown, it was not strictly engineering failures that resulted in the deaths of the 14 crew members. As the CAIB report indicates, “these engineers could not prove that foam strikes and cold temperatures were unsafe, even though the previous analyses that declared them safe had been incomplete and were based on insufficient data and testing. Engineers’ failed attempts were not just a matter of psychological frames and interpretations. The obstacles these engineers faced were political and organizational.” A failure to listen to such concerns had been baked into the NASA culture. “The organizational structure and hierarchy blocked effective communication of technical problems. Signals were overlooked, people were silenced, and useful information and dissenting views on technical issues did not surface at higher levels.” While their engineers were speaking to safety, management was listening for timeline and budget impacts.
As leaders, it is our responsibility to be open and curious about the concerns raised by team members. We have a responsibility to those we lead to hear and value the information they bring to the table. Most teams are not dealing with the dire, life-or-death consequences that the Shuttle engineers grappled with day in and day out. Even so, we should learn from these tragic lessons and help construct our organizations, environments, and cultures to center around successful and intentional listening.
In organizations, information lives at the lowest levels while authority sits at the top. In addition to improving listening skills, another way to combat this problem is to push authority down into the lower levels of the organization. Simon Sinek provides an example epitomizing this challenge, and the effective results of this approach, in his retelling of the story of Naval Captain David Marquet.
As leaders, if we listen with intention and with purpose to our teams, we can better learn the risks and opportunities that surround us. If we then also give authority to those team members who hold the information, we enable them to act upon it. Perhaps then a launch or descent would have been delayed and lives saved. Perhaps our teams will feel safer expressing concerns, feel their input is valued, and continue to share and act upon the information that will make the team most successful. Perhaps then, most importantly, we earn and keep trust: the cornerstone of high-performing teams.