Metres and milliseconds: an almost catastrophic near-hit


A near collision between two Japanese airliners is a chilling reminder of the importance of situational awareness, at every level. Adrian Park analyses a multi-level breakdown that almost killed at least 664 people.

On 31 January 2001, the passengers of two Japan Airlines (JAL) aircraft came terrifyingly close to total obliteration in the sky. While people on board JAL 907 (a Boeing 747) and JAL 958 (a McDonnell Douglas DC-10) enjoyed the pleasantries of inflight refreshments, their respective aircraft were climbing, turning and then descending at the request of air traffic control (ATC).

The manoeuvres had put them on dangerously convergent flight paths. As the passengers sipped quietly away at their drinks, decisions at the front of the aircraft as well as decisions on the ground were being made on their behalf.

The decisions were flawed, the expressions of a flawed perception of reality, and 91 people would be injured as a result. In the aftermath further decisions, systems-wide decisions, were also made. These too were faulty, made on a faulty systems-wide perception of reality. As a result 71 people, mostly school children, would die in a later, almost identical accident.

‘Situational awareness’ (SA) is not normally a term associated with fare-paying passengers. It is, after all, a little hard to have ‘good SA’ when one is packaged into a cramped airline seat which may or may not have a window. But none of the people involved in this near-tragedy had sufficient SA to foresee the terrifying events about to occur:

  1. At 3:46pm Japanese time Flight 907 was cleared by ATC to climb to Flight Level 390.
  2. The air traffic controller was a task-saturated trainee. At the time of the accident he had been attempting to handle 14 different aircraft and had made 37 radio transmissions in nine minutes. Some of these transmissions were misunderstood, or went unheard, by the various aircraft above and below.
  3. The supervising air traffic controller was slow to intervene. When he finally did, he misidentified Flight 907’s call-sign. He then instructed Flight 907 to descend while its traffic alert and collision avoidance system (TCAS) demanded a climb. This caused the Boeing 747 to match almost perfectly the flight path of the converging DC-10 for 20 horrifying seconds as it too descended. The two aircraft were in effect racing downwards: the DC-10 crew had engaged the speed brakes in response to the TCAS’s stentorian ‘Descend, Descend!’ command.
  4. A catastrophic collision was avoided only by the 747 pilot’s aggressive last-moment manoeuvring. The pilot put the 747 into a savage bunt, which sent galley carts, cabin attendants and unrestrained passengers into the ceiling. Of the 427 people on board, 91 were injured, nine of them seriously.

A mere 10 metres—the height of a suburban tree—was all that separated the two aircraft as JAL 907 initiated its last-second manoeuvring. Passengers and crew of JAL 907 later reported the shocking proximity of JAL 958 in various ways, all chilling: ‘the other aircraft was so close I thought its tail would snag our aircraft’; ‘it appeared to fill the whole window’; and ‘I have never seen a plane fly so close—I thought we were going to crash’.

If JAL 907 had collided with JAL 958 the result would have been 664 deaths.

It would have been the worst aviation accident ever—surpassing even the infamous Tenerife accident in which 583 people died. Thankfully both JAL 907 and 958 later landed, with JAL 958 undamaged and JAL 907 experiencing minor damage. While this breakdown in localised SA was bad enough, its severity would pale compared with the systems-wide SA failure that was beginning to unfold.

To understand why SA can be applied to awareness at a systems level (and how it can fail at a systems level) one has to understand the basics of SA. There are many technical definitions of SA. That of Mica R. Endsley, chief scientist of the US Air Force, is probably the most widely used:
‘the perception of elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future’.

But E. C. Adam’s is the most practical: ‘knowing what is going on so you can figure out what to do’. At its best, SA is a relatively accurate and workable construct of reality. At its worst it is a delusion. Delusions are obviously a problem for captains or controllers in command of high-capacity heavy jets. Nor are they desirable for boards and committees expected to maintain collective SA at a systems-wide level.

Localised situational ‘unawareness’—situational ignorance—is as obvious as a galley cart hitting the cabin ceiling; that is bad enough. But at a systems level such ignorance is far more severe. Here the processing demands required by genuine SA (and its potential effects) have moved beyond single aircraft to whole fleets. They have moved to massive systems of interlacing regulation, support structures and complex curricula. Here innumerable organisational, bureaucratic and institutional ‘inputs’ have to be usefully processed into what we could call a ‘far horizon’ SA construct. But even at this level the essence of systems SA is the same as localised SA—to truly know what is going on so something can be done about it.

To borrow from meteorology, the difference between localised SA and systems SA is the difference between weather and climate. Weather is an event happening right now. Climate considers localised events as part of a pattern over an extended time. Knowing the current weather situation is knowing a local snapshot. Knowing the climate is knowing a globe’s worth of seasons over many years. The difference between a situational weather threat and a climatological threat is the difference between parking undercover during a storm and reducing carbon emissions over 20 years. Both rely on SA and both have consequences, but on radically different scales.

There are important SA lessons to learn from JAL 907 in the local context.

One obvious lesson is to obey TCAS resolution advisories over conflicting ATC instructions. Another is to understand the difficulties of assessing and then initiating timely avoidance manoeuvres at high altitude in a heavy jet. Another is the need for adequate management of ATC workload. But these are all lessons of what we could call ‘near horizon SA’. There are also important ‘climatological’ lessons to learn—especially with regard to how systems SA can be detrimentally affected.
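Before turning to those climatological lessons, the first of the near-horizon lessons can be stated as a simple priority rule. The sketch below is a hypothetical, highly simplified illustration in Python; it is not real avionics logic and is not drawn from the JAL procedures, and the function name and values are invented for illustration. It shows only the rule later codified after Überlingen: an active TCAS resolution advisory, not a conflicting ATC instruction, determines the vertical manoeuvre.

```python
from typing import Optional


def vertical_command(active_ra: Optional[str], atc_instruction: Optional[str]) -> Optional[str]:
    """Return the vertical command the crew should follow.

    active_ra:        current TCAS resolution advisory, e.g. 'CLIMB' or 'DESCEND', or None
    atc_instruction:  vertical instruction from ATC, e.g. 'CLIMB' or 'DESCEND', or None
    """
    if active_ra is not None:
        # An active resolution advisory overrides any conflicting ATC vertical instruction.
        return active_ra
    # With no advisory active, the crew follows ATC as normal.
    return atc_instruction


# JAL 907's dilemma: ATC said 'descend' while TCAS demanded 'climb'.
print(vertical_command("CLIMB", "DESCEND"))  # -> CLIMB
```

The point of the sketch is the asymmetry: ATC is the normal source of separation, but once TCAS has issued a resolution advisory the two SA constructs can conflict, and the advisory must win.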

When JAL 907 landed, police were waiting to escort the crew from the aircraft. Criminal charges were later laid against the pilots and the air traffic controllers, triggering a 10-year legal process of trials, appeals and re-trials. The pilots were cleared; the case against the controllers proceeded. In the world of aviation safety, where good systems-wide SA requires good systems-wide reporting, this was a troubling development. Relatively recent literature has noted the increasing impost of judicial proceedings during aviation investigations (28 criminal cases between 2000 and 2009). Criminalisation of human error casts a blame shadow over crews. Crews concerned about litigation are less likely to report, and unreported mistakes, which are uncontrolled hazards, hide in judicial shadows. To put this in SA terms, far-horizon SA is reduced to the interior of a courtroom, where client rights rather than accident prevention are in view. When the gavel falls, litigants may draw satisfaction from seeing ‘justice’ done, lawyers may be paid and the media may have a story, but the collateral damage is ignorance and shrunken SA.

Adding to poor systems SA was a slow investigative and regulatory response. While lawyers argued back and forth in court, systems-wide SA was further degraded as the Japanese Aircraft and Railway Accidents Investigation Commission (ARAIC) sluggishly conducted its investigation and ICAO made an even more sluggish response. The investigation alone took 18 months of laborious examination, eventually producing a 267-page tome of analyses, findings and recommendations. A significant number of these findings were directed at ICAO, recommending, amongst other things, legislation to give TCAS resolution advisories priority over conflicting ATC instructions. In the time it took to produce the investigation findings, four more near-collisions were reported involving similar disregard for TCAS primacy. It would be another four years after this before the recommendations were enacted.

Situational awareness requires the timely gathering of accurate information.

In the cockpit of JAL 907, information regarding the proximity of the convergent aircraft was confused by contradictory commands. The controller, handling 14 different aircraft, issued a ‘descend’ command to the crew of JAL 907 while TCAS issued a ‘climb’ command. One SA construct was now interfering with another. Which one was reality? Which one was the delusion? While the crew figured it out, the delay in appropriate action nearly killed 664 people. But while this was serious enough, delays at the systemic level were about to result in something worse than a near collision.

As the investigative and regulatory wheels ground slowly along, an accident eerily similar to the JAL 907 incident unfolded. On 1 July 2002, a Boeing 757 freighter and a Bashkirian Airlines Tupolev Tu-154M collided over Überlingen, Germany. All 71 people on board the two aircraft were killed. The Tupolev pilots were found to have followed ATC instructions that conflicted with their on-board TCAS advisory. Eleven days after the Überlingen collision the Japanese report into the JAL 907 incident was released, with the TCAS recommendation as one of its key action items. (It would be many more years before the legislation was finally codified in the relevant regulations.)

When JAL 907 nearly collided with JAL 958 the global system of aviation received an alert. Something was wrong procedurally. Something was wrong systemically. Something appropriate and timely needed to be done. Instead the sluggishness of regulatory change meant crews over Germany repeated the same mistake made some 18 months earlier—this time with deadly consequences.

A situational awareness breakdown at the crew level had caused a near-hit; the breakdown at the systems level caused a collision. In the JAL criminal trial, witnesses were called, questions asked, facts gathered, pronouncements made, the ‘guilty’ charged, acquitted, re-charged on appeal and eventually handed prison sentences. But in all the investigating and judicial questioning only one question really mattered: would any of it prevent an accident? The answer had already been made horribly clear in a crash at night over Germany—no.

Suggested reading

  • Toward a theory of situation awareness in dynamic systems, M.R. Endsley, 1995.
  • Fighter cockpits of the future. Proceedings of 12th IEEE/AIAA Digital Avionics Systems Conference (DASC), E.C. Adam, 1993.
  • Just Culture: Balancing Safety and Accountability, Sidney Dekker, 2007.
  • Flying in the Face of Criminalization: the Safety Implications of Prosecuting Aviation Professionals for Accidents, Sofia Michaelides-Mateou & Andreas Mateou, 2010.

1 COMMENT

  1. Excellent article … but there remains another problem of S/A … why did the controller instruct a descent when the TCAS instructed the 747 to climb?
    There is a sub-conscious tendency, when one realises one has made a mistake, to want to undo it. So when the ‘sphincter’ factor occurs, a mistaken climb instruction automatically becomes a ‘descent’ instruction. Unfortunately, too much time may have passed, and because the TCAS is re-calculating solutions it is likely the controller will be wrong more often than not. In this case, whilst we don’t know what the DC10 was doing or had been instructed to do, the descent ‘worked’.
    Controllers need a TCAS alert function on their SSR displays so they know what the pilot is being instructed to do by the equipment.
