International Journal on Marine Navigation and Safety of Sea Transportation

Volume 3, Number 3, September 2009

1 INTRODUCTION

The bathtub curve is a widely used figure for describing the failure rate of a product during its lifetime. The curve consists of three phases: the "infant mortality" period, the "normal operating" period and the "wear out" period (see Figure 1). This kind of failure curve is typical of complicated technical systems such as cars, consumer electronics and computer hardware. The high failure rate at the beginning of the lifetime of a complicated automation system is mainly explained by the existence of latent software errors. The frequency of software-based failures is high in the beginning, but it decreases throughout the lifetime of the software product, as illustrated by the yellow line in Figure 1. The explanation is that a piece of software does not wear out or break down; instead, all its failures or malfunctions are caused by latent software errors, or bugs. There are more bugs in a new software product, but as the product gets older, the latent errors are gradually found and corrected. Provided that the software updates are made correctly, i.e. new errors are not created when software bugs are eliminated, the failure rate steadily decreases.
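The shape described above can be illustrated numerically. The following is a minimal sketch, not a model taken from this paper: it assumes that the failure rate is the sum of a decreasing Weibull hazard (infant mortality), a constant rate (random failures) and an increasing Weibull hazard (wear out). The function names and all parameter values are hypothetical and chosen only to produce a bathtub-like curve over a 30-year lifetime.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t: float) -> float:
    """Illustrative failure rate (failures per year) at age t > 0 years.

    All parameter values below are assumptions made for illustration only.
    """
    infant = weibull_hazard(t, shape=0.5, scale=2.0)     # shape < 1: decreasing rate
    random_failures = 0.05                               # constant rate
    wear_out = weibull_hazard(t, shape=5.0, scale=30.0)  # shape > 1: increasing rate
    return infant + random_failures + wear_out


if __name__ == "__main__":
    for age in (0.1, 1.0, 5.0, 15.0, 25.0, 30.0):
        print(f"age {age:5.1f} y: {bathtub_hazard(age):.3f} failures/year")
```

With these assumed parameters the rate falls steeply during the first years, stays low in mid-life and rises again towards the end of the lifetime.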

It can be assumed that the failure rate curve of an Integrated Navigation System (INS) of a ship also has the shape of a bathtub during its lifetime. There is some evidence of higher failure rates in new INSs, although extensive statistical data on this matter does not seem to be available. After 1994, there have been several groundings in Finnish waters caused by a failure or a malfunction of the navigation and steering system of the ship [1]. Almost all the systems involved were new, and the failures fell into the category of infant mortality failures.

Figure 1. The bathtub curve (failure rate versus time: infant mortality period with early "infant mortality" failures, software bugs etc.; normal operation period with random failures; wear out period with ageing, "wear out")

The Problem of "Infant Mortality" Failures of Integrated Navigation Systems

S. Ahvenjarvi

Satakunta University of Applied Sciences, Rauma, Finland

ABSTRACT: This paper deals with the problem of the high failure rate often experienced on ships that are equipped with a new integrated navigation system. These "infant mortality" failures of the navigation system can pose a significant risk to the safety of the ship, as history has shown. Some accidents caused by this type of failure are briefly discussed. The paper highlights some factors that promote this problem. One of the most important factors is the low degree of standardisation of bridge systems. Another factor is the incompleteness of the self diagnostics of a new system. The role of the self diagnostics is crucial in coping with failures, because the redundancy of navigation systems is typically based on manual activation of the back-up device or function. The necessary corrective action by the user can be delayed too much if the self diagnostics of the system is not able to detect the failure. Proper testing of the new system during the harbour trials and the sea trials, as well as the use of efficient failure analysis techniques, is important for reducing the safety risk caused by infant mortality failures. At the end of the paper, some practical experiences of using FMECA and HAZOP analysis in the development of the integrated navigation system of a large cruise vessel are presented.

There are factors that seem to promote the infant mortality problem of INSs of ships. The first one is the lack of standardisation. The INSs of ships are typically tailor-made. The risk of unknown failure modes and unknown software errors in such systems is higher than in standardised, mature systems. Gradually, as the unknown failure modes are found and eliminated, the failure rate decreases. When the ship and its navigation equipment have passed the infant mortality period, the probability of an accident due to unknown dangerous faults decreases.

2 SOME ACCIDENT CASES

A fault in a critical component of the navigation system of the ship was the initial cause of the following five real accident cases: the grounding of the passenger ferry M/S Silja Europa in the Swedish archipelago close to Stockholm in January 1995, the grounding of the tanker M/T Natura off Sköldvik in October 1998, the grounding of the ro-ro passenger ferry M/S Finnfellow in Åland in April 2000, the grounding of the passenger ferry M/S Isabella in Åland in December 2001, and the grounding of M/S Royal Majesty close to the east coast of the USA in June 1995 (OTK, 1995; NTSB, 1997; OTK, 1998, 2000 and 2001).

A remarkable feature of these five cases is that the failed equipment was rather new. M/S Silja Europa was constructed less than two years before the accident. M/S Royal Majesty was constructed three years prior to the accident. M/T Natura was constructed five years prior to the accident. The compass system of M/S Finnfellow was upgraded only 13 months before the accident. The INS of M/S Isabella had been renewed around six years prior to the accident. The average age of the failed equipment was thus around 3.5 years, which is not much compared with the typical lifetime of a ship, 25 to 30 years. The critical faults in the five accident cases were therefore not caused by ageing or "wear out", but by the "infant mortality" of the equipment. This applies even to the Royal Majesty case: although the original fault, i.e. separation of the signal cable from the GPS antenna, can be considered a random failure, the other factors fall into the "infant mortality" class.
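The average age quoted above can be checked with a few lines of arithmetic. The sketch below uses the ages given in the text; entering "less than two years" as 2.0 and "around six years" as 6.0 are simplifying assumptions, and the names `EQUIPMENT_AGE_YEARS` and `mean_age` are introduced here only for illustration.

```python
from typing import Mapping

# Approximate age of the failed equipment at the time of each accident, in years.
# 2.0 is an upper bound for "less than two years"; 6.0 stands for "around six years".
EQUIPMENT_AGE_YEARS = {
    "M/S Silja Europa": 2.0,
    "M/S Royal Majesty": 3.0,
    "M/T Natura": 5.0,
    "M/S Finnfellow": 13 / 12,  # 13 months
    "M/S Isabella": 6.0,
}


def mean_age(ages: Mapping[str, float]) -> float:
    """Arithmetic mean of the equipment ages, in years."""
    return sum(ages.values()) / len(ages)


average = mean_age(EQUIPMENT_AGE_YEARS)
# Share of a typical ship lifetime of 25 to 30 years.
share_of_30_years = average / 30
share_of_25_years = average / 25
print(f"mean age: {average:.2f} years")
print(f"share of lifetime: {share_of_30_years:.0%} to {share_of_25_years:.0%}")
```

Under these assumptions the mean is about 3.4 years, in line with the rounded figure in the text, which is roughly 11 to 14 % of a 25 to 30 year lifetime.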

3 FACTORS THAT PROMOTE THE PROBLEM OF "INFANT MORTALITY" FAILURES

A critical fault in the INS of a ship represents a high safety risk, especially in restricted waters and in areas with high traffic density. The south-west coast of Finland, for instance, is surrounded by a wide archipelago area. Navigation in these waters is very demanding. Figure 2 shows a sample from the sea chart of the archipelago area close to the city of Turku. The fairways are winding, with a minimum fairway breadth of only ca. 150 metres.

Figure 2. A sample of the archipelago of the south-west coast of Finland

In this area, the available time margin to avoid a

grounding after a critical failure can be only a few

seconds. For instance, the grounding of the ro-ro

passenger ship M/S Finnfellow in April 2000 took

place only 85 seconds after a fatal gyro compass

failure. Even though the deck officers noticed the

abnormal turning of the ship 30 seconds after the

failure, it was too late to avoid the grounding (OTK,

2000). Figure 3 shows the position of the ship only

90 seconds after the failure!

Figure 3. Position of M/S Finnfellow 90 seconds after the critical compass failure (OTK, 2000)

The risk of an accident caused by unknown failure modes is high when recovery from the failure depends on proper corrective action by the user. This is still the case in most new INS installations. The user has to activate the back-up device or function if the active unit fails. In order to be able to do so, the user has to be aware of the operational status and condition of the system and its critical components. The self diagnostics of the system is crucial for the user to maintain situation awareness and to be able to react quickly and correctly to failures. Past accidents show quite clearly that the user of the INS can have serious difficulty in noticing a failure in the system if the system does not give any alarm about the situation. The failure detection delay can in some cases be extremely long, as it was in the M/S Royal Majesty case.
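To make the idea of self diagnostics concrete, the following is a minimal sketch of one possible check: comparing two independent heading sources and raising an alarm when the data is stale or the sources disagree. It is not taken from any of the systems discussed in this paper. The names `HeadingSample` and `check_heading` and the threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeadingSample:
    heading_deg: float  # heading in degrees, 0 <= heading < 360
    timestamp_s: float  # time at which the sample was produced, in seconds


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two headings, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def check_heading(
    primary: HeadingSample,
    secondary: HeadingSample,
    now_s: float,
    max_age_s: float = 2.0,     # assumed limit for data freshness
    max_diff_deg: float = 5.0,  # assumed limit for disagreement
) -> Optional[str]:
    """Return an alarm text, or None if this check detects no fault."""
    if now_s - primary.timestamp_s > max_age_s:
        return "primary heading data is stale"
    if now_s - secondary.timestamp_s > max_age_s:
        return "secondary heading data is stale"
    if angular_difference(primary.heading_deg, secondary.heading_deg) > max_diff_deg:
        return "heading sources disagree"
    return None


if __name__ == "__main__":
    gyro = HeadingSample(heading_deg=90.0, timestamp_s=100.0)
    backup = HeadingSample(heading_deg=120.0, timestamp_s=100.0)
    print(check_heading(gyro, backup, now_s=100.5))
```

A check of this kind only reports that something is wrong. With two sources it cannot tell which one has failed, so the corrective action still rests with the user.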

The user's dependency on the self diagnostics makes unknown "infant mortality" failure modes especially dangerous, because they have not been anticipated by the software engineer who designed the self diagnostics of the system. In other words, an unknown "infant mortality" failure will probably not cause an immediate alarm. It is worth noting that the designers of the system actually make a double error when they do not recognise a dangerous failure mode: no measures will be taken to eliminate the failure mode in question, and it will not be ensured that the self diagnostics of the system is able to detect the failure.

Poor or incomplete self diagnostics is a typical problem of new technical systems. Development of proper self diagnostics is expensive, and it may be one of the last things to be developed for a new product. The weaknesses of the self diagnostics may become apparent to the user and to the manufacturer only years after the commissioning of the system. Nancy Leveson states that "the carefulness in designing and testing is too often directed to the normal operation of the system, while the unexpected and erroneous states get much less attention" (Leveson 1995, p. 400). The development of this important area of system safety needs new regulations about self diagnostics, for instance demonstration of the completeness of the self diagnostics as a part of the type approval procedures.

The inability of the system to detect failures and malfunctions, i.e. deficient self diagnostics, is a serious weakness of new INSs. The increasing complexity of the systems and the relatively short lifetime of product generations seem to promote this weakness. Another factor is the lack of standardisation. In order to make the detection of abnormalities quick and reliable, there has to be extensive knowledge about the structure and operation of the individual parts and devices of the system, and a lot of practical knowledge about the use and operation of the system. Fulfilling these requirements is difficult without better standardisation of INSs. It is a well-known fact that INSs are too often tailor-made entities. Even sister ships are too often equipped with different INS setups. In order to reduce the safety risk caused by "infant mortality" failures, that kind of tailoring should be stopped. From this point of view, it would be ideal if there were only very few alternative system setups available on the market. Moreover, the INS manufacturers should make the lifetime of a product generation as long as possible. Unfortunately, due to competition, manufacturers tend to introduce new product generations more and more frequently. That is the wrong strategy for reducing the safety risk caused by "infant mortality" failures.

4 WHAT CAN WE DO ABOUT IT?

Alternative methods to reduce the safety risk caused by "infant mortality" failures of INSs would be:

1 to increase the lifetime of INS product generations
2 to improve standardisation of INSs
3 to require better testing of new products from INS manufacturers, including the demonstration of completeness of the self diagnostics
4 to make the INSs more fault tolerant
5 to apply different failure analysis methods to new systems prior to commissioning

The first method is difficult to accomplish. Competition on the market seems to force manufacturers to introduce new, innovative system generations every other year. As customers, are we happy with this situation? Would we prefer a fully tested, reliable INS instead of a brand new system with all the latest features, and with all those "infant mortality" problems? Customers must realise the importance of this matter and ask for reliability rather than for new architecture or new functions. Would it be a good idea to establish a web-based failure register for the INS products on the market? The database could be maintained by all users of INSs. It could give customers some idea of the reliability of the different products on the market and hence make reliability more important for the manufacturers as well.
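As a rough illustration of what such a failure register could compute, the sketch below counts reported failures per product and divides them by the accumulated operating time of that product. The record layout, the names `FailureReport` and `failures_per_system_year`, and the product identifiers in the usage example are all hypothetical; no existing register or data format is implied.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping


@dataclass(frozen=True)
class FailureReport:
    product: str             # hypothetical product identifier
    system_age_years: float  # age of the installation when the failure occurred
    description: str         # free-text description entered by the user


def failures_per_system_year(
    reports: Iterable[FailureReport],
    exposure_years: Mapping[str, float],
) -> Dict[str, float]:
    """Reported failures per year of operation, per product.

    exposure_years maps each product to its total operating time,
    summed over all installations, in years.
    """
    counts: Dict[str, int] = defaultdict(int)
    for report in reports:
        counts[report.product] += 1
    return {
        product: counts[product] / years
        for product, years in exposure_years.items()
        if years > 0
    }


if __name__ == "__main__":
    reports = [
        FailureReport("INS-A", 0.5, "autopilot mode change without alarm"),
        FailureReport("INS-A", 1.2, "position sensor fallback not indicated"),
        FailureReport("INS-B", 4.0, "display unit restart"),
    ]
    print(failures_per_system_year(reports, {"INS-A": 10.0, "INS-B": 20.0}))
```

The recorded system age would also allow the reports to be grouped by age, which is what would be needed to see whether a product actually shows an infant mortality period.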

The second and the third method would require new regulations from the international shipping community. The new concept of e-navigation should be used for this purpose. Thorough failure mode testing and demonstration of the completeness of the self diagnostics should be included in the type approval test requirements. Introducing new system generations would become more difficult, which would also support longer product lifetimes.

The fourth method would consist of automatic recovery functions in fault situations. This is the most powerful method of reducing the risk of accidents due to a single failure in the system, regardless of whether it is an "infant mortality" failure or something else. Automatic redundancy has been successfully applied in many areas of safety-critical automation, such as dynamic positioning of offshore vessels and automatic flight management of modern passenger aircraft.
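One common pattern for automatic redundancy is to use three independent sensors and take the median of their readings, so that a single faulty sensor cannot corrupt the output and no manual switch-over is needed. The sketch below illustrates this pattern only; it does not describe any particular INS or dynamic positioning product. The function name `voted_value` and the deviation limit are hypothetical, and the quantity is assumed to have no wrap-around (for example speed in knots, not heading in degrees).

```python
from statistics import median
from typing import List, Sequence, Tuple


def voted_value(readings: Sequence[float], max_deviation: float) -> Tuple[float, List[int]]:
    """Median voting over three redundant readings.

    Returns the median and the indices of the sensors whose readings deviate
    from the median by more than max_deviation (candidates for an alarm).
    """
    if len(readings) != 3:
        raise ValueError("exactly three readings are expected")
    voted = median(readings)
    suspects = [i for i, r in enumerate(readings) if abs(r - voted) > max_deviation]
    return voted, suspects


if __name__ == "__main__":
    # The third speed sensor has failed; the voted value is unaffected.
    value, suspects = voted_value([12.1, 12.0, 3.5], max_deviation=0.5)
    print(value, suspects)
```

The voted value stays usable after a single sensor fault, and the list of suspect sensors can still be shown to the user as an alarm, so recovery does not depend on the user noticing the fault first.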

The fifth method is already in use. The difficulty in making a proper failure analysis of a new INS is that the manufacturer has the best and most important information about the system. It is well known that all failure analysis methods, such as the Failure Mode, Effect and Criticality Analysis (FMECA), are very much dependent on the quality of the data about the technical structure and the software of the analysed system. In practice, the manufacturer is the only party that possesses this information and can thus make a good and comprehensive failure analysis of the product. The author of this paper has recently coordinated two failure analysis projects for large INSs of passenger cruise ships (see Ahvenjärvi, 2005). These projects confirmed that the manufacturer of the system indeed plays the key role in the analysis of a new product. It turned out that an FMECA made by the manufacturer(s) and commented on by the shipyard / the owner of the ship, combined with a Hazard and Operability Analysis (HAZOP), can give useful results for reducing the risk of an accident due to unknown failure modes. The problem with these methods is that one can never know whether all failure modes, or even most of them, have been detected in the analysis. Actually, it is unrealistic to assume that all possible failure modes have been found by using these techniques. Suokas et al. (1988) studied the validity of different methods of identifying accident contributors in process industry systems. The study showed relatively low validity figures for FMEA: only 17 % of the contributors to hazards could be identified by applying FMEA. Other methods were not better than FMEA. Thus it can be assumed that even the combined use of FMECA and HAZOP would cover less than half of all potential failure modes, i.e. more than half of the "infant mortality" failures would remain unpredicted.
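The "less than half" estimate can be made plausible with a simple calculation. The sketch below assumes, purely for illustration, that each method finds a given fraction of the failure modes and that the methods miss failure modes independently of each other; this independence assumption is not made in the cited study. The function name `combined_coverage` is hypothetical, and the 17 % figure for FMEA is used as an upper bound for the second method, since the other methods were reported to be no better.

```python
from typing import Iterable


def combined_coverage(coverages: Iterable[float]) -> float:
    """Fraction of failure modes found by at least one method.

    Assumes that the methods miss failure modes independently, so the
    fraction missed by all of them is the product of (1 - coverage).
    """
    missed = 1.0
    for coverage in coverages:
        if not 0.0 <= coverage <= 1.0:
            raise ValueError("coverage must be between 0 and 1")
        missed *= 1.0 - coverage
    return 1.0 - missed


if __name__ == "__main__":
    # Two methods, each assumed to identify 17 % of the contributors.
    print(f"{combined_coverage([0.17, 0.17]):.2f}")  # about 0.31
```

Under these assumptions two methods together would identify about 31 % of the failure modes, which leaves roughly 69 % undetected.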

5 CONCLUSIONS

A brand new INS with an updated architecture and new software with the latest innovations is not necessarily the best choice for a ship, especially if it will be sailing in areas with narrow fairways or dense traffic. A new system suffers from the "infant mortality" failure phenomenon discussed in this paper. The problem is a combination of three factors: an increased failure rate (due to hardware failures and software errors) at the beginning of the operational life of the system, unknown failure modes, and incompleteness of the self diagnostics of the system. As a result, the user may lose control of the situation if a failure hits the system and the system is not capable of giving an alarm about it. The risk of an accident is high if the time margin for corrective action is short. Several accidents have taken place due to this kind of "infant mortality" failure.

Obviously, the most powerful methods of reducing the risk of this kind of accident are to make the lifetime of product generations longer and to place stricter requirements on the testing of new systems before they can be taken into use. Standardisation would also be a useful way to limit the number of different types of INSs and hence to reduce the risk of unknown failure modes. These methods, however, require international cooperation and new regulations. Perhaps a web-based failure database could also be useful for encouraging system manufacturers to put a higher priority on reliability and safety than on introducing new features and new designs as frequently as possible. Risk evaluation techniques such as FMECA and HAZOP can also be used to analyse potential failures of a new INS, but it should be realised that even a good analysis will cover only a fraction of all possible unknown failure modes.

REFERENCES

Ahvenjärvi, S. (2005). Failure Analysis of The Navigation and Steering System of Freedom of the Seas, paper at the 125th Anniversary Conference of Maritime Training in Rauma, October 6-7, 2005.

Leveson, N. (1995). Safeware, Addison-Wesley Pub Co. USA.

National Transportation Safety Board, NTSB (1997). Grounding of the Panamanian passenger ship Royal Majesty on Rose and Crown shoal near Nantucket, MA, June 10, 1995 (Marine accident report NTSB/MAR-97/01). Washington DC: NTSB.

Onnettomuustutkintakeskus, 'OTK' (1995). The Grounding of the M/S SILJA EUROPA at Furusund in the Stockholm Archipelago on 13 January 1995. Report N:o 1/1995. Onnettomuustutkintakeskus, Helsinki.

Onnettomuustutkintakeskus, 'OTK' (1998). M/T NATURAN karilleajo Emäsalon edustalla 13.10.1998. Report C 8/1998. Onnettomuustutkintakeskus, Helsinki. (in Finnish)

Onnettomuustutkintakeskus, 'OTK' (2000). M/S FINNFELLOW, Grounding near Överö in Åland, April 2, 2000. Report B 2/2000 M. Onnettomuustutkintakeskus, Helsinki.

Onnettomuustutkintakeskus, 'OTK' (2001). Matkustaja-autolautta ISABELLA, pohjakosketus Staholmin luona Ahvenanmaalla 20.12.2001. Report B 1/2001. Onnettomuustutkintakeskus, Helsinki. (in Finnish)

Palady, P. (1995). Failure Modes and Effects Analysis, PT Publications Inc, West Palm Beach, USA.

Suokas, J. & Pyy, P. (1988). Evaluation of the validity of four hazard identification methods with event descriptions. Valtion Teknillinen Tutkimuskeskus (VTT). Espoo, Finland.