Firstly, we need to make sure that procedures and techniques for maintenance, inspection and testing are suitable for SIL 3.
Next, we need to confirm that the devices are suitable for SIL 3 service. The preferred approach is to establish suitability through evidence of prior use, in accordance with IEC 61511-1 §11.5.3 to §11.5.6. A sufficient volume of evidence from operating experience with this make and model of device would be needed.
Systematic capability could be considered as an alternative to prior use because the devices are claimed to be compliant with IEC 61508. An increase of +1 in SC is allowable only if there is sufficient independence between the elements. These two sensors cannot be sufficiently independent because they are identical. At least one sensor would need to be replaced with a different device that complies with IEC 61508 and has SC 3 capability. The two sensors would need to be sufficiently independent with respect to common cause failures unless both are upgraded to SC 3.
The complicated way uses the formula PFDG ≈ ((1-β).λDU.T1)²/3 + β.λDU.T1/2
≈ (0.9 × 1.3 × 10⁻² × 1)²/3 + 0.1 × 1.3 × 10⁻² × 1/2
≈ 4.6 × 10⁻⁵ + 6.5 × 10⁻⁴
≈ 7.0 × 10⁻⁴
You can ignore the (1-β) term because it makes no significant difference:
PFDG ≈ (λDU.T1)²/3 + β.λDU.T1/2
≈ (1.3 × 10⁻² × 1)²/3 + 0.1 × 1.3 × 10⁻² × 1/2
≈ 5.6 × 10⁻⁵ + 6.5 × 10⁻⁴
≈ 7.1 × 10⁻⁴
You can never ignore the β.λDU.T1/2 term.
If we approximated λDU ≈ 0.01 pa:
β.λDU.T1/2 ≈ 0.1 × 0.01 × 1/2 ≈ 5 × 10⁻⁴
The easy way uses the simple approximation PFDG ≈ 2/3 β.λDU.T1
≈ 0.7 × 0.1 × 1.3 × 10⁻² × 1 ≈ 9 × 10⁻⁴
Using λDU ≈ 0.01 pa gives an answer that is close enough:
≈ 0.7 × 0.1 × 0.01 × 1 ≈ 7 × 10⁻⁴
Remember that failure rates always vary, so PFD estimates are never precise.
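The three 1oo2 estimates can be compared side by side with a few lines of code. This is only a sketch: the function names are made up for illustration, and the inputs (common cause factor beta = 0.1, λDU = 0.013 pa, T1 = 1 year) are taken from the worked figures.

```python
def pfd_1oo2_full(lam_du, t1, beta):
    """1oo2 PFD including the (1 - beta) correction on the independent term."""
    independent = ((1 - beta) * lam_du * t1) ** 2 / 3
    common_cause = beta * lam_du * t1 / 2
    return independent + common_cause


def pfd_1oo2_without_correction(lam_du, t1, beta):
    """Same formula with the (1 - beta) factor dropped."""
    independent = (lam_du * t1) ** 2 / 3
    common_cause = beta * lam_du * t1 / 2
    return independent + common_cause


def pfd_1oo2_easy(lam_du, t1, beta):
    """Simple approximation: two thirds of beta x lambda_DU x T1."""
    return 2 / 3 * beta * lam_du * t1


# Inputs from the worked example: rates per annum, test interval in years.
LAM_DU, T1, BETA = 0.013, 1.0, 0.1

for name, func in [
    ("full formula", pfd_1oo2_full),
    ("without (1 - beta)", pfd_1oo2_without_correction),
    ("easy approximation", pfd_1oo2_easy),
]:
    print(f"{name:20s} PFD = {func(LAM_DU, T1, BETA):.1e}")
```

All three results land between roughly 7 × 10⁻⁴ and 9 × 10⁻⁴, which is well inside the uncertainty of the failure rate itself.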
Analyse the root cause of the failures and determine remedial actions to keep failure rates below the target.
Interfaces between the SIS and any other systems may have adverse impact on safety functions. Interfaces can lead to systematic failures.
Non-safety functions may interfere with safety functions. External interfaces may increase vulnerability to cyber-attack.
In practical terms the volume of operating experience needed for both the Route 2H and IEC 61511 methods is about the same. Enough device operating time is needed to measure 2 or 3 similar failures at the claimed failure rate. If the claimed rate is 0.01 pa (1 failure in 100 years, or about 10^6 hours) then about 200 to 300 device-years of experience should be enough to demonstrate that the rate is realistic and achievable.
Route 2H requires failure rates to be estimated with a 90% statistical confidence level, assuming that electronic components have true and fixed values of failure rates that can be measured accurately and repeatably.
The IEC 61511 method is based on Route 2H but allows 70% confidence levels, the same as used for Route 1H.
If only 3 failures have been measured, then a 90% confidence level estimate is 1.4 x higher than a 70% confidence level estimate.
Strictly speaking, confidence levels can only be applied to estimates of parameters that have a true value that can be measured. IEC 61511 states that failure rates can vary widely between applications. IEC 61511 requires credible and traceable reliability data measured in a similar operating environment.
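The factor of 1.4 between the 90% and 70% estimates can be checked numerically. The sketch below assumes a constant failure rate and the usual one-sided upper confidence bound on a Poisson count, which is equivalent to the chi-squared estimate with 2n + 2 degrees of freedom. The text does not say which estimator was used, so that choice is an assumption, and the 300 device-years is simply the upper figure quoted above.

```python
import math


def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu."""
    return sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(n + 1))


def upper_bound_expected_failures(n_failures, confidence):
    """Find mu such that P(N <= n_failures | mu) = 1 - confidence, by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(n_failures, mid) > 1 - confidence:
            lo = mid  # mu is still too small
        else:
            hi = mid
    return (lo + hi) / 2


def failure_rate_upper_bound(n_failures, device_years, confidence):
    """Upper-bound failure rate (per annum) at the given confidence level."""
    return upper_bound_expected_failures(n_failures, confidence) / device_years


DEVICE_YEARS = 300  # assumed operating experience
lam_70 = failure_rate_upper_bound(3, DEVICE_YEARS, 0.70)
lam_90 = failure_rate_upper_bound(3, DEVICE_YEARS, 0.90)
print(f"70% estimate: {lam_70:.4f} pa")
print(f"90% estimate: {lam_90:.4f} pa")
print(f"ratio 90% / 70%: {lam_90 / lam_70:.2f}")
```

The ratio depends only on the number of failures counted, not on the amount of operating time.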
The HFT = 1 for 2oo3 (HFT = N-M), so type B devices must be shown to have an SFF ≥ 90% (e.g. using FMEA) and a safety manual will be needed to show compliance with IEC 61508.
Route 1H applies to devices that comply with IEC 61508.
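The Route 1H check can be written as a simple table lookup. The sketch below encodes the maximum SIL by SFF band and HFT as I understand IEC 61508-2 Tables 2 and 3. The table values and function names here are assumptions for illustration and should be checked against the standard before use.

```python
# Maximum claimable SIL, indexed by SFF band (rows) and HFT 0, 1, 2 (columns).
# None means the combination is not allowed.
# SFF bands: < 60%, 60% to < 90%, 90% to < 99%, >= 99%.
ROUTE_1H_MAX_SIL = {
    "A": [(1, 2, 3), (2, 3, 4), (3, 4, 4), (3, 4, 4)],
    "B": [(None, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 4)],
}


def hft_moon(m, n):
    """Hardware fault tolerance of an MooN architecture: HFT = N - M."""
    return n - m


def sff_band(sff):
    """Return the row index for a safe failure fraction given as 0..1."""
    if sff < 0.60:
        return 0
    if sff < 0.90:
        return 1
    if sff < 0.99:
        return 2
    return 3


def max_sil_route_1h(device_type, sff, hft):
    """Look up the maximum SIL for a type 'A' or 'B' element."""
    row = ROUTE_1H_MAX_SIL[device_type][sff_band(sff)]
    return row[min(hft, 2)]


# Type B sensors voted 2oo3: SFF of at least 90% is needed for SIL 3.
print(max_sil_route_1h("B", 0.92, hft_moon(2, 3)))
print(max_sil_route_1h("B", 0.85, hft_moon(2, 3)))
```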
For example, the 3 components could be any of these: proximity sensor, sensor mounting bracket, junction box, signal cable, power supply, marshalling box, high speed counter card.
The failure modes could be stuck on, stuck off, open circuit, short circuit, electronics failure, and so on.
In this example all these failure modes would be dangerous failures because they prevent a high speed from being sensed correctly.
These failure modes could all be detected by sensor comparison or by plausibility checking (i.e. turbine is running but zero speed is sensed).
2oo2 1E-02
1oo1 5E-03
2oo3 1E-03
1oo2 7E-04
2oo4 4E-04
1oo3 3E-04
Studies into the causes of major accident events over the past 30 or 40 years consistently show that they result from multiple systematic problems in the way safety systems are engineered, operated, maintained, and managed.
Yes, safety functions may be designed to prevent a hazardous event from occurring or to mitigate the effects of an event by limiting the consequences.
For example, safety functions may be applied in fire and gas detection to shutdown ventilation systems, isolate fuel sources or to initiate fire suppression systems.
HFT is the ability of a component or subsystem to continue to be able to undertake the required safety instrumented function in the presence of one or more dangerous faults in hardware.
HFT is required to compensate for uncertainty in design assumptions and uncertainty in failure rate data.
The standards were developed in response to increasing complexity of safety related systems.
The complexity increases the risk of systematic failure; more than 90% of the failures are systematic in nature and can be prevented or controlled through quality techniques, procedures and practices.
Functional safety standards apply quality management techniques in a deliberate and detailed manner to achieve and maintain risk reduction, and the level of effectiveness is increased in proportion to the risk reduction required.
Safety functions reduce the risk of a specific hazardous event by at least an order of magnitude.
Safety functions implement a specific safety action either to put equipment into a safe state in response to a detected hazard (demand mode) or to keep equipment in a safe state (continuous mode).
Safety functions always have 3 subsystems: sensor, logic and final element.
≈ 4 × 10⁻⁵ (sensor) + 9 × 10⁻⁶ (logic solver, analog in) + 7 × 10⁻⁴ (valves)
≈ 7.5 × 10⁻⁴
≈ 8 × 10⁻⁴
Do not show two significant figures because that implies better precision than is credible. It would be very misleading to say that the answer is 7.49 × 10⁻⁴.
In reality the uncertainty in the failure rates is something like ±50% at best.
It would be just as valid to estimate PFDG ≈ 10⁻³
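A small helper makes the point about precision concrete: add the subsystem PFDs, then report only one significant figure. Rounding up rather than to nearest is an assumption made here because it is the conservative choice; the helper name is illustrative.

```python
import math


def round_up_one_sig_fig(value):
    """Round a positive value up to one significant figure."""
    exponent = math.floor(math.log10(value))
    mantissa = value / 10 ** exponent
    # Small tolerance so that floating-point noise (7.0000000001) is not rounded up to 8.
    return math.ceil(mantissa - 1e-9) * 10.0 ** exponent


subsystem_pfd = {"sensor": 4e-5, "logic solver": 9e-6, "valves": 7e-4}
total = sum(subsystem_pfd.values())

print(f"sum of subsystem PFDs: {total:.2e}")
print(f"value to report:       {round_up_one_sig_fig(total):.0e}")
```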
443 out of 4,000 in one year is close enough to 1 in 10 per annum.
In semi-quantitative analysis we choose between values in orders of magnitude: 0.01 pa, 0.1 pa or 1 pa.
The rate of process control failures as causal event is usually taken as 0.1 pa, and that would be appropriate in this case.
One significant figure of precision is appropriate because we can only estimate hazardous event rates and hazardous consequences to the nearest order (or perhaps half-order) of magnitude.
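Choosing the nearest order of magnitude can be done by rounding on a logarithmic scale. This is a sketch of one reasonable way to do it; rounding in log space is an assumption, not something the text prescribes.

```python
import math


def nearest_order_of_magnitude(rate_pa):
    """Round a positive rate (per annum) to the nearest power of ten."""
    return 10 ** round(math.log10(rate_pa))


observed = 443 / 4000  # failures per device per year
print(f"observed rate: {observed:.3f} pa")
print(f"rate to use:   {nearest_order_of_magnitude(observed)} pa")
```

The same helper applied to one event in six years (1/6 pa) also returns 0.1 pa rather than 1 pa.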
The rate of causal events is not pre-determined by any fixed parameters.
Causal events are never purely random.
The rates depend on human behaviour, equipment condition and on environmental factors.
The consequence of an event may depend on many factors and cannot be predicted with accuracy.
See 61511-1 5.2.6.1.4, Note 2:
– Stage 1 – After the hazard and risk assessment has been carried out, the required protection layers have been identified and the safety requirement specification has been developed.
– Stage 2 – After the safety instrumented system has been designed.
– Stage 3 – After the installation, pre-commissioning and final validation of the safety instrumented system has been completed and operation and maintenance procedures have been developed.
– Stage 4 – After gaining experience in operating and maintenance.
– Stage 5 – After modification and prior to decommissioning of a safety instrumented system.
Note that IEC 61511 §17.2.6 requires the Stage 5 FSA before modification activity begins on the system. In practice Stage 5 FSA starts before the modification begins and finishes when records are available to show that the modification has been successfully completed and validated.
λDU = 6 × 10⁻⁸/h in a 2oo3 arrangement
PFDG ≈ 6.((1-βD).λDD + (1-β).λDU)².(λDU/λD).(T1/2).(λDU/λD).(T1/3) + β.λDU.(T1/2)
PFDG ≈ λDU².T1² + β.λDU.(T1/2)
≈ (6 × 10⁻⁸ × 8760)² + 0.15 × 6 × 10⁻⁸ × (8760/2)
(note that β is multiplied by 1.5 for 2oo3 voting)
≈ 3 × 10⁻⁷ + 4 × 10⁻⁵
≈ 4 × 10⁻⁵ (notice that the β.λDU.T1/2 term strongly dominates the result, again)
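The simplified 2oo3 calculation is easy to reproduce and shows how small the independent term is. The sketch assumes a base common cause factor (beta) of 0.1, which the 1.5 multiplier for 2oo3 voting turns into the 0.15 used above; the function name is illustrative.

```python
HOURS_PER_YEAR = 8760


def pfd_2oo3_simplified(lam_du_per_hour, t1_hours, beta):
    """Return (independent term, common cause term) of the simplified 2oo3 PFD."""
    independent = (lam_du_per_hour * t1_hours) ** 2
    # The common cause factor is multiplied by 1.5 for 2oo3 voting.
    common_cause = 1.5 * beta * lam_du_per_hour * t1_hours / 2
    return independent, common_cause


independent, common_cause = pfd_2oo3_simplified(6e-8, HOURS_PER_YEAR, beta=0.1)
total = independent + common_cause

print(f"independent term:  {independent:.1e}")
print(f"common cause term: {common_cause:.1e}")
print(f"total PFD:         {total:.1e}")
print(f"common cause share: {common_cause / total:.1%}")
```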
The RRF ≈ 800 or ≈ 1,000 depending on how you choose to apply the rounding. I would class this as SIL 2. It is almost good enough for SIL 3 but close to the borderline.
We would need to improve the RRF to be confident of achieving SIL 3, for example we could reduce the inspection and test interval.
It should be planned in advance, taking into account the systematic capability required, the degree of novelty, complexity and familiarity and also the degree of risk.
See IEC 61511-1 §7.2.6 NOTE 1:
Selection of techniques and measures for the verification process and the degree of independence depends upon a number of factors including degree of complexity, novelty of design, novelty of technology and required SIL.
IEC 61511-1 §12.5.2 requires the application program and documentation to be reviewed by a competent person not involved in its original development.
See 61511-1 5.2.7: the stage at which formal configuration control is to be implemented needs to be specified in planning
In principle change control should be applied as soon as items are released for use or released for testing.
Refer to IEC 61511 section 5.2.7
5.2.7.1.1 Procedures for configuration management of the SIS during the SIS and software safety life-cycle phases shall be available; in particular, the following should be specified:
• the stage at which formal configuration control is to be implemented;
• the procedures to be used for uniquely identifying all constituent parts of an item (hardware and software);
• the procedures for preventing unauthorized items from entering service.
Configuration management applies to any item that is subject to version changes, including hardware, software, application program, firmware, programming tools and utilities.
The RRF needed is the intermediate event frequency divided by the tolerable frequency: RRF = 10⁻³ pa / 10⁻⁵ pa = 100, i.e. SIL 2 (or arguably SIL 1)
10⁻¹ pa × 0.01 × 1 × 1 × 1 = 10⁻³ pa
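The same arithmetic can be expressed as two small functions. The names are hypothetical, and the list of factors simply mirrors the multipliers in the line above without assigning a meaning to each one.

```python
import math


def intermediate_event_frequency(initiating_rate_pa, factors):
    """Initiating event rate multiplied by every probability factor."""
    return initiating_rate_pa * math.prod(factors)


def required_rrf(ief_pa, tolerable_pa):
    """Risk reduction factor needed to bring the IEF down to the tolerable frequency."""
    return ief_pa / tolerable_pa


ief = intermediate_event_frequency(0.1, [0.01, 1, 1, 1])
rrf = required_rrf(ief, 1e-5)

print(f"intermediate event frequency: {ief:.0e} pa")
print(f"required RRF: {rrf:.0f}")
```

An RRF of exactly 100 sits on the boundary between SIL 1 and SIL 2, which is why either classification can be argued.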
Nil because the IEF = tolerable frequency
2 or 3 fatalities = Extensive: 10⁻⁵ pa
Once in the past 6 years is closer to 1 in 10 years than to once every year.
It is not frequent enough for ‘Frequent’, 1 pa.
It is more appropriate to class it as ‘High’, 10⁻¹ pa.
SFF = (λS + λDD) / (λS + λDD + λDU)
= (500 + 200) / (500 + 200 + 1500) = 700 / 2200 = 32%
DO NOT INCLUDE ‘no effect’ failures
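As a quick check of the arithmetic, the safe failure fraction can be computed directly. The function name is illustrative; the three rates are the ones used above, and 'no effect' failures are deliberately left out of the inputs.

```python
def safe_failure_fraction(lam_s, lam_dd, lam_du):
    """SFF = (safe + dangerous detected) / (safe + dangerous detected + dangerous undetected)."""
    return (lam_s + lam_dd) / (lam_s + lam_dd + lam_du)


sff = safe_failure_fraction(lam_s=500, lam_dd=200, lam_du=1500)
print(f"SFF = {sff:.0%}")
```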
IEC 61511-1 does not say who is responsible. It needs to be planned:
5.2.2.1 Persons, departments, organizations or other units which are responsible for carrying out and reviewing each of the SIS safety life-cycle phases shall be identified and be informed of the responsibilities assigned to them.
5.2.4 Planning
Safety planning shall take place to define the activities that are required to be carried out along with the persons, departments, organizations or other units responsible to carry out these activities. This planning shall be updated as necessary throughout the entire SIS safety life-cycle (see Clause 6) and carried out to a detailed activity level commensurate with the role the individual or organization is performing in the SIS safety life-cycle.
The main objective is to make a judgement as to the functional safety and safety integrity achieved by every SIF of the SIS.
Refer to IEC 61511-1 §5.2.6.1.5
Prior to hazards being introduced confirm:
Further FSA planned
The FSA should also judge the systematic integrity. FSA should review evidence of appropriate functional safety management and evidence of sufficient systematic integrity, such as records of verification and compliance with appropriate techniques and measures.
According to IEC 61511-1:
5.2.6.1.2 The membership of the FSA team shall include at least one senior competent person not involved in the project design team (for stages 1, 2 and 3) or not involved in the operation and maintenance of the SIS (for stages 4 and 5).
NOTE When the assessment team is large, consideration should be given to having more than one senior competent individual on the team who is independent from the project team.
According to IEC 61508-1:
8.2.15 The minimum level of independence of those carrying out a functional safety assessment shall be as specified in Tables 4 and 5. Product and application sector international standards may specify, with respect to compliance to their standards, different levels of independence to those specified in Tables 4 and 5. The tables shall be interpreted as follows:
– X: the level of independence specified is the minimum for the specified consequence (Table 4) or safety integrity level/systematic capability (Table 5). If a lower level of independence is adopted, then the rationale for using it shall be detailed.
– X1 and X2: see 8.2.16.
– Y: the level of independence specified is considered insufficient for the specified consequence (Table 4) or safety integrity level/ systematic capability (Table 5).
8.2.16 In the context of Tables 4 and 5, only cells marked X, X1, X2 or Y shall be used as a basis for determining the level of independence. For cells marked X1 or X2, either X1 or X2 is applicable (not both), depending on a number of factors specific to the application. The rationale for choosing X1 or X2 should be detailed. Factors that will make X2 more appropriate than X1 are:
– lack of previous experience with a similar design;
– greater degree of complexity;
– greater degree of novelty of design;
– greater degree of novelty of technology.
NOTE 1 Depending upon the company organization and expertise within the company, the requirement for independent persons and departments may have to be met by using an external organization. Conversely, companies that have internal organizations skilled in risk assessment and the application of safety-related systems, that are independent of and separate (by ways of management and other resources) from those responsible for the main development, may be able to use their own resources to meet the requirements for an independent organization.
NOTE 2 See 3.8.11, 3.8.12 and 3.8.13 of IEC 61508-4 for definitions of independent person, independent department, and independent organization respectively.
NOTE 3 Those carrying out a functional safety assessment should be careful in offering advice on anything within the scope of the assessment, since this could compromise their independence. It is often appropriate to give advice on aspects that could incur a judgement of inadequate safety, such as a shortfall in evidence, but it is usually inappropriate to offer advice or give recommendations for specific remedies for these or other problems.
8.2.17 In the context of Table 4, the consequence values for the specified level of independence are:
– Consequence A: minor injury (for example temporary loss of function);
– Consequence B: serious permanent injury to one or more persons, death to one person;
– Consequence C: death to several people;
– Consequence D: very many people killed.
The consequences specified in Table 4 are those that would arise in the event of failure of all the risk reduction measures including the E/E/PE safety-related systems.
IEC 61511-1 §5.2.6.1.5
Prior to the identified hazards being present (i.e., Stage 3) the FSA team shall confirm that:
Due diligence in our duty of care means we need to:
Audit examines compliance with procedures, processes and practices.
Functional safety assessment takes into account audits but goes much further. It makes an overall judgement about the functional safety achieved by the system.
Most failures can be anticipated given the condition of the equipment, its age and its environment (process conditions and ambient conditions).
Both age and wear can usually be monitored by measuring and trending condition indicators. Typical indicators include:
• Actuator force or stem torque for a valve
• Leakage rate
• Voltage versus current for a transmitter (corrosion and increasing impedance in circuits)
• Transmitter response time, spectral characteristics of process signals (indicating changes in sensor systems such as contamination)
• Temperature of components, enclosures
• Moisture content
• Salinity
• Vibration levels and spectral characteristics
• Cracking, discolouration due to heat or radiation
• Brittleness
• Clearances
• Backlash
• Spring tension
• Wall thickness
• Number, extent and duration of process perturbations
• Alarm frequency and duration
• Alarm suppression frequency and duration
All failures and all demands must be analysed.
Performance measurement validates the assumptions in the design:
• Demand rates
• Failure rates
Unexpected behaviour must be analysed
Those responsible for O&M also need to review assumptions regarding factors such as occupancy and corrosion.
IEC 61511-1 §19.2.9,
See also 12.4.2, 14.2.4, 15.2.6, 16.2.2 e and 16.3.3
• H&RA
• Details of equipment used for SIFs and SRS
• Organisation for maintaining FS
• Procedures to achieve and maintain FS
• Modification information
• Safety manuals
• Records of the design, implementation, test and validation
• Application program documentation
• Installation and commission records
• Records of proof test and inspection
The SRS and probability of failure (PF) quantification set benchmarks for proof test interval and proof test coverage.
Proof test and inspection plans should be based on FMEDA studies (or similar) to achieve the required coverage, given knowledge of the anticipated failure modes and the diagnostics that have been implemented.
The plans need to take into account accessibility for testing.
The likelihood or rate of ‘never detected’ failures should be minimised by design, ensuring that all anticipated failures can be revealed by diagnostics, inspection or testing.
Plans should consider staggered test intervals.
Independence of maintainers may be needed to improve systematic integrity and to reduce common cause failure.
Based on IEC 61511-1 Clause 5:
Organisation, responsibilities
Competence
Management of recommendations
Performance management
Assessment and audit
Revision and change control
Configuration management
Based on IEC 61511-1 Clause 6:
Safety lifecycle – definition of activities, outputs and responsibilities (including verification activities for each output)
Based on IEC 61511-1 Clause 7:
Verification plan
Based on IEC 61511-1 Clause 16:
Procedures for both routine and abnormal activities
Preventive and breakdown maintenance activities
Procedures, techniques and measures
Response to faults and failures
Operation of bypasses
Monitoring compliance
Analysis of performance and unexpected behaviour
Collection of failure rate and demand data
Inspection and proof testing procedures
Records that need to be kept
Timing of activities
Based on IEC 61511-1 Clause 17:
Management of modifications
Based on IEC 61511-1 Clause 19:
Information management
Use templates based on IEC 61511-1 clauses 11 and 12, and verify using checklists based on those clauses.
Demonstrate compliance to IEC 61508-2 and/or -3 and provide the information equivalent to a safety manual in accordance with the two Annex D lists,
OR provide evidence of prior use, IEC 61511-1 § 11.5.3 through to 11.5.6
Confirm that the devices are suitable for the operating environment.
Refer to IEC 61508-2 §7.4.9.3 and Annex D,
and/or for software IEC 61508-3 §7.4.2.12 and Annex D
Demonstrate compliance to IEC 61508-2 and/or -3 and provide a safety manual, OR provide evidence of prior use, IEC 61511-1 § 11.5.3 through to 11.5.6
Confirm that the devices are suitable for the operating environment.
The design should be traceable to the requirements of the SRS, the APSRS and to the requirements of IEC 61511.
IEC 61508-4, 3.6.20
process safety time: period of time between a failure, that has the potential to give rise to a hazardous event, occurring in the EUC or EUC control system and the time by which action has to be completed in the EUC to prevent the hazardous event occurring
The Application Program Safety Requirements Specification is derived from the SRS, adding sufficient detail to allow the software design and implementation to achieve the required safety integrity and to allow an assessment of functional safety to be carried out.
Requirements tracking is required to ensure that all of the safety requirements are addressed in the design and all of the requirements are demonstrated objectively through the validation inspection and testing process.
Forwards traceability is concerned with ensuring that every objective requirement is addressed in the subsequent detailed design documents and testing specifications and it enables users to find where requirements have been addressed so that impact of changes to requirements can be managed.
Backwards traceability is broadly concerned with checking that every implementation decision (interpreted in a broad context, and not confined to code implementation) is clearly justified by some requirement.
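Forwards and backwards traceability can both be checked mechanically once requirements and design items carry identifiers. The sketch below uses made-up identifiers (SRS-001, DES-010 and so on) purely for illustration; a real project would load them from its requirements management records.

```python
# Hypothetical requirement identifiers from the SRS.
requirements = {"SRS-001", "SRS-002", "SRS-003"}

# Hypothetical design or test items, each listing the requirements it addresses.
design_items = {
    "DES-010": {"SRS-001"},
    "DES-011": {"SRS-001", "SRS-002"},
    "DES-012": set(),  # an implementation decision with no stated requirement
}


def forward_gaps(requirements, design_items):
    """Requirements that no design or test item addresses."""
    covered = set().union(*design_items.values())
    return requirements - covered


def backward_gaps(requirements, design_items):
    """Design items that are not justified by any known requirement."""
    return {item for item, reqs in design_items.items() if not (reqs & requirements)}


print("not yet addressed:", sorted(forward_gaps(requirements, design_items)))
print("not justified:    ", sorted(backward_gaps(requirements, design_items)))
```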
10.3.2
To provide a complete and consistent summary of the user’s safety requirements as a basis for the design, implementation, testing and maintenance of the system.
There is no such thing as the ‘best’ architecture.
The different architectures have different advantages and disadvantages.
The choice of architecture depends on the requirements of each individual end user.
Primarily for efficiency, to avoid wasted effort in a ‘journey of discovery’.
Avoid re-design.
Very early in the design – when the concept is being developed.
Generally this should be as the P&IDs are being developed and well before the SRS is developed.
We should have a pretty good idea of which SIFs are likely to be SIL 2 or SIL 3.
We should have a pretty good idea of which SIFs are likely to need on-line maintenance access.
Balance between risk reduction and process sensitivity (cost of downtime, and cost of spurious trips):
Risk / SIL – what PFD or PFH must be achieved?
Process sensitivity – what is the target for spurious trip rate?
Cost to install
Operability and maintainability – can we easily get access for maintenance on-line without process downtime?
Response to detected failure:
• Trip?
• Bypass?
• Compensating measures?
• Dependability of response?
See IEC 61511-1 11.3.1
When a dangerous fault in an SIS has been detected (by diagnostic tests, proof tests or by any other means) then
compensating measures shall be taken to maintain safe operation.
If safe operation cannot be maintained, a specified action to achieve or maintain a safe state of the process shall be taken.
Consider the process safety time and the ability of the operator to react promptly and dependably.
In IEC 61511-1 Edition 1 sub-clause 11.4 included a note explaining that ‘The minimum hardware fault tolerance has been defined to alleviate potential shortcomings in SIF design that may result due to the number of assumptions made in the design of the SIF, along with uncertainty in the failure rate of components or subsystems used in various process applications.’
In Edition 2 the note has been moved into part 2, IEC 61511-2 §11.4.1
Fault tolerance is defined as:
Shared or common hardware and software elements shall conform to the highest safety integrity (and systematic capability) level.
The fundamental requirement is that non-safety functions must not compromise safety
Non-safety functions that are not separated must be treated as if they were safety functions – subjected to the same rigorous practices to eliminate faults.
Simpler interfaces
Common vendor
Lower capital cost
Lower training costs
Shared tools
Shared hardware
Easier data exchange, information management
Easier to manage
To avoid common cause, common mode and dependent failures
Temperature (specify conservatively, protect, insulate, relocate)
Vibration (isolate, relocate)
Contamination (appropriate process connection design, sensor type)
Corrosion (materials compatibility design)
EMI (source identification and risk assessment, shielding, segregation)
Power supply quality (specification, filtering, monitoring)
Air / hydraulic fluid quality (specification, filtering, monitoring)
Errors in design/selection/software/maintenance (appropriate checking, review, audit and inspection)
IEC 61508 compliant safety manuals for the devices, or a dossier with the equivalent information.
Data may be available from certificates, from industry databases (exida, OREDA, SINTEF).
Data may be available from prior use.
We need to ensure that the failure rates are credible, traceable, achievable and dependable.
For the sensors HFT =1, and the confidence level is 90% so SIL 3 can be claimed.
The logic solver is certified SIL 3
We can’t apply Route 2H for the valves unless we have enough information to estimate the failure rate of the valves with 90% confidence level.
λ90% could be as much as 50% higher than λ70%, so the RRF might be closer to 1,000, which is marginal for SIL 3.
If we apply Route 1H we can claim SIL 2.
Alternatively we could apply the IEC 61511 Table 6 HFT requirements and claim SIL 3 because HFT =1. Either way we need to show that we have credible and traceable failure rate data based on operating experience with that type of equipment.
If PFDG ≈ 7 × 10⁻⁴ then RRF ≈ 1400, SIL 3.
If PFDG ≈ 10⁻³ then RRF ≈ 1000, SIL 3 borderline.
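The conversion from average PFD to RRF and SIL band can be sketched as follows, using the low-demand bands from IEC 61508 and IEC 61511. The function names are illustrative. A value sitting exactly on a boundary is reported as the lower SIL in this sketch, which is one way of seeing why 10⁻³ has to be treated as borderline.

```python
def rrf_from_pfd(pfd_avg):
    """Risk reduction factor is the reciprocal of the average PFD."""
    return 1 / pfd_avg


def sil_band(pfd_avg):
    """Low-demand SIL band for an average PFD, or None if outside SIL 1 to SIL 4."""
    bands = [(1e-5, 1e-4, 4), (1e-4, 1e-3, 3), (1e-3, 1e-2, 2), (1e-2, 1e-1, 1)]
    for lower, upper, sil in bands:
        if lower <= pfd_avg < upper:
            return sil
    return None


for pfd in (7e-4, 1e-3):
    print(f"PFD = {pfd:.0e}: RRF = {rrf_from_pfd(pfd):.0f}, SIL band = {sil_band(pfd)}")
```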
Solenoid valve:
λDU = 200 × 10⁻⁹ per hour × 0.9 × 10⁴ hours per year ≈ 2 × 10² × 10⁻⁹ × 10⁴
≈ 2 × 10⁻³ pa
Ball valve and actuator:
λDU = 1230 × 10⁻⁹ per hour × 0.9 × 10⁴ hours per year ≈ 1.1 × 10³ × 10⁻⁹ × 10⁴
≈ 1.1 × 10⁻² pa
The combined failure rate λDU ≈ 0.013 pa or ≈ 0.01 pa
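The unit conversion and the combination of the two devices can be sketched as below. It assumes the solenoid valve and the ball valve with actuator act in series, so their dangerous undetected failure rates add, and it uses 8760 hours per year where the hand calculation rounds to 0.9 × 10⁴.

```python
HOURS_PER_YEAR = 8760


def per_hour_to_per_year(rate_per_hour):
    """Convert a failure rate from per hour to per annum."""
    return rate_per_hour * HOURS_PER_YEAR


solenoid = per_hour_to_per_year(200e-9)
ball_valve = per_hour_to_per_year(1230e-9)
combined = solenoid + ball_valve  # series elements: rates add

print(f"solenoid valve:         {solenoid:.1e} pa")
print(f"ball valve and actuator: {ball_valve:.1e} pa")
print(f"combined:               {combined:.1e} pa")
```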
The sensor and logic solver are both certified for SIL 3 in this configuration, so presumably they both have SFF high enough for SIL 3.
We need to analyse the HFT requirements for the final element subsystem.
We might arguably claim that the valves are Type A. SFF < 60% and HFT = 1 so only SIL 2 could be claimed if we apply Route 1H.
If we had enough information to establish a 90% confidence level in the valve failure rate data we could claim SIL 3 according to Route 2H.
Alternatively we could apply the IEC 61511 Table 6 HFT requirements and claim SIL 3 because HFT =1. We would need to show that we have credible and traceable failure rate data based on operating experience with that type of equipment.
However if we are told ‘No other information is available about failure rates for the valves and actuators’ then we have no evidence that the devices are suitable for use in SIS service. We cannot make a claim for prior use or for IEC 61508 compliance. We cannot claim any SIL at all because the systematic integrity (systematic capability) is not established.
PFD is improved by roughly x 0.5, i.e. ≈ 0.0007
RRF ≈ 1500, SIL 3
We should also think about how we can reduce λDU and β.
Logic solver has analog inputs (a multilevel coded mA signal) and a digital output so the PFD is 9 × 10⁻⁶
≈ 4 × 10⁻⁴ for the LSHH + 9 × 10⁻⁶ for the PLC + 9 × 10⁻⁴ for the valves
≈ 1.3 × 10⁻³
≈ 0.0013 or roughly ≈ 0.001
PFDG ≈ ((1-β).λDU.T1)²/3 + β.λDU.T1/2
≈ (0.02 pa × 1)²/3
≈ 1 × 10⁻⁴
A whole order of magnitude difference! The common cause failures CANNOT be neglected.
PFDG ≈ ((1-β).λDU.T1)²/3 + β.λDU.T1/2
≈ (0.9 × 0.02 pa × 1)²/3 + 0.1 × 0.02 pa × 1/2
≈ 1 × 10⁻⁴ + 1 × 10⁻³ ≈ 10⁻³
If you insist on working with unwarranted precision you will get the same result:
≈ (0.9 × 0.017 pa × 1)²/3 + 0.1 × 0.017 pa × 1/2
≈ 8 × 10⁻⁵ + 8.5 × 10⁻⁴
≈ 9 × 10⁻⁴, round up to 1 × 10⁻³
Solenoid valve:
λDU = 230 × 10⁻⁹ per hour × 0.9 × 10⁴ hours per year ≈ 2 × 10² × 10⁻⁹ × 10⁴
≈ 2 × 10⁻³ pa
Ball valve and actuator:
λDU = 1.7 × 10⁻⁶ per hour × 0.9 × 10⁴ hours per year ≈ 1.5 × 10⁻⁶ × 10⁴
≈ 1.5 × 10⁻² pa
The combined failure rate λDU ≈ 0.002 + 0.015 = 0.017 pa,
we should approximate that to 0.02 pa
DO NOT IMAGINE THE ANSWER IS PRECISE!
λDU = 90 FITs = 90 failures per 10⁹ hours, i.e. 90 × 10⁻⁹ failures per hour
Convert to failures per year by multiplying failures per hour × hours per year
≈ 90 × 10⁻⁹ per hour × 8760 hours per year ≈ 0.9 × 10² × 10⁻⁹ × 9 × 10³ pa
≈ 8 × 10⁻⁴ pa
PFDG = λDU × T1/2
≈ 8 × 10⁻⁴ × 1/2
≈ 4 × 10⁻⁴
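The FIT conversion and the single-channel PFD can be reproduced in a few lines. This is a sketch with illustrative function names; it assumes a proof test interval of 1 year, as in the hand calculation.

```python
HOURS_PER_YEAR = 8760


def fit_to_per_year(fit):
    """Convert FITs (failures per 10^9 hours) to failures per annum."""
    return fit * 1e-9 * HOURS_PER_YEAR


def pfd_1oo1(lam_du_pa, t1_years):
    """Average PFD of a single channel: lambda_DU x T1 / 2."""
    return lam_du_pa * t1_years / 2


lam_du = fit_to_per_year(90)
pfd = pfd_1oo1(lam_du, t1_years=1)

print(f"lambda_DU = {lam_du:.0e} pa")
print(f"PFD       = {pfd:.0e}")
```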
(Based on IEC 61511-1 §15.2)
• Definition of validation activities with respect to SRS
• Procedures for follow up and resolution of recommendations
• Consideration of all process operation modes
• Techniques and measures to be used (considering risk of hazards), technical strategies
• Timing and sequence of activities
• Responsibilities, levels of independence
• Information against which validation is to be carried out (traceability to specifications and SRS)
• Identification of items and application program subject to validation
• Test environment, tools, equipment
• Acceptance criteria
• Procedures for managing failures and discrepancies
• Calibration requirements
• Documentation to be produced
• Records to be kept
Keep all records of verification:
• What was checked
• How was it checked
• What basis was it checked against
• How were discrepancies identified and resolved
Verification records are essential for demonstrating systematic integrity and for demonstrating due diligence in complying with the appropriate standards and practices
Verification is about checking lifecycle phase outputs with respect to inputs. It involves analysis and/or tests to demonstrate that, for specific inputs, the outputs meet in all respects the objectives and requirements set for the specific phase. It applies to every output of every phase.
Validation is of the end product after installation with respect to requirements. Validation means demonstrating that the SIF(s) and SIS after installation meet the SRS in all respects.
IEC 61511-1 §5.2.2.2 says that all parties involved in SIS shall be competent to carry out the activities for which they are accountable.
IEC 61508-1 §6.2.13 says ‘all persons with responsibilities [for safety lifecycle activities] shall have the appropriate competence […] relevant to the specific duties that they have to perform.’
The IEC 61508 requirement is broader than the one in IEC 61511 because people can be responsible for something without being accountable. Accountability usually sits higher in the chain of command.
See 5.2.5:
a) hazard analysis and risk assessment;
b) assurance activities;
c) verification activities;
d) validation activities;
e) FSAs;
f) functional safety audits;
g) post-incident and post-accident activities.
Sub-clause 5.2.5.3 addresses performance measurement and corrective actions related to failures and demands.
Refer to IEC 61511-1 §17.
Prior to any modification to the SIS, procedures must be in place for identifying and requesting the work, identifying hazards that may be affected and for authorising and controlling the changes. The concern is that a modification may increase hazard rate or consequence, or it may reduce effectiveness of risk reduction. Modifications or changes may have unintended consequences and may introduce new hazards.
Key elements in the modification process are:
• Identify and request the work to be done
• Assess the impact on safety
• Plan the change, update documentation
• Independent functional safety assessment before modification work begins
• Obtain authorisation
• Revalidate after implementation
• Notify personnel affected by the change
• Maintain records
5.2.6.2.3 Management of change procedures shall be in place to initiate, document, review, implement and approve changes to the SIS other than replacement in kind (i.e. like for like).
17 SIS modification
17.2 Requirements
17.2.1 Prior to carrying out any modification to a SIS, procedures for authorizing and controlling changes shall be in place.
17.2.2 The procedures shall include a clear method of identifying and requesting the work to be done and the hazards that may be affected.
17.2.3 Prior to carrying out any modification to a SIS (including the application program) an analysis shall be carried out to determine the impact on functional safety as a result of the proposed modification. When the analysis shows that the proposed modification could impact safety then there shall be a return to the first phase of the SIS safety life-cycle affected by the modification.
17.2.4 Safety planning for the modification and re-verification shall be available. Modifications and re-verifications shall be carried out in accordance with the planning.
17.2.5 All documentation affected by the modification shall be updated.
17.2.6 Modification activity shall not begin until a FSA is completed in accordance with 5.2.6.1.9 and after proper authorisation.
17.2.7 Appropriate information shall be maintained for all changes to the SIS. The information shall include:
a) a description of the modification or change;
b) the reason for the change;
c) identified hazards and SIFs which may be affected;
d) an analysis of the impact of the modification activity on the SIS;
e) all approvals required for the changes;
f) tests used to verify that the change was properly implemented and the SIS performs as required;
g) details of all SIS modification activities (e.g., a modification log);
h) appropriate configuration history;
i) tests used to verify that the change has not adversely impacted parts of the SIS which were not modified.
17.2.8 Modification shall be performed with qualified personnel who have been properly trained. All affected and appropriate personnel should be notified of the change and trained with regard to the change.
Anybody with responsibility for one or more phases in a safety lifecycle is responsible for managing their own scope, the scope of their suppliers, and for managing interfaces with the client and other parties.
Ultimately the end user has to take responsibility for ensuring that management responsibilities are clearly defined and understood for each package of work and across all organisational boundaries.
The safety lifecycle plan outlines the phases for the SIS project, defining each phase with:
• inputs and outputs,
• responsibilities
• verification activities
It provides clarity to the team regarding the necessary activities and each person’s responsibilities.
A safety lifecycle plan can take the form of a table of the activities and outputs for each phase.
Clarity of information, removal of distractions and uncertainty – e.g. implement alarm management and ‘ASM’ graphics
Training
Drilling (i.e. regular repeated practice)
An increase of +1 is allowable provided that the system designer provides justification that there is sufficient independence between the elements through common cause failure analysis.
No, IEC 61511-1 §6.2.3 requires planning for the techniques, measures, procedures and responsible organisation for all safety lifecycle phases.
IEC 61511-1 §12.6.2 requires selection of methods, techniques and tools for each lifecycle phase for the application program.
0.1 at best, given sufficient information to recognise the hazards, familiarity with the scenario and enough time in which to respond.
IEC 61508-4
3.5.9
systematic capability
measure (expressed on a scale of SC 1 to SC 4) of the confidence that the systematic safety integrity of an element meets the requirements of the specified SIL, in respect of the specified element safety function, when the element is applied in accordance with the instructions specified in the compliant item safety manual for the element
3.5.6
systematic safety integrity
part of the safety integrity of a safety-related system relating to systematic failures in a dangerous mode of failure
NOTE Systematic safety integrity cannot usually be quantified (as distinct from hardware safety integrity which usually can).
IEC 61511-1
3.2.80
systematic capability
measure (expressed on a scale of SC 1 to SC 4) of the confidence that the systematic safety integrity of a device meets the requirements of the specified SIL, in respect of the specified safety function, when the device is applied in accordance with the instructions specified in the device safety manual
3.2.82
systematic safety integrity
part of the safety integrity of the SIS relating to systematic failures in a dangerous mode of failure
Define the lifecycle phases by defining the specific outputs to be produced (e.g. documents, data, equipment items, software code modules) and define the inputs that the outputs are to be based on.
For each output define who is responsible for preparing, verifying and approving the output.
Define the method of verification and the verification records that are to be kept.
Define specific techniques, measures, guidelines or templates to be used.
Policy and Strategy
Responsibilities
Competency
Hazard and Risk Analysis
Follow up and resolution of recommendations
Supplier Quality
Supplier FSMS
Performance Evaluation
Assessment and Auditing
Management of Changes
Configuration Management
And then in section 6, Life-cycle and document planning,
and in section 7, Verification planning
λT / 2
≈ 2 × 10⁻⁷ h⁻¹ × 8,760 h / 2
≈ 9 × 10⁻⁴ or about 10⁻³
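The arithmetic above can be checked with a minimal sketch. It assumes the simplified single-channel approximation PFDavg ≈ λDU × T / 2 and a proof test interval of one year; the function name is illustrative only.

```python
HOURS_PER_YEAR = 8760

def pfd_avg_1oo1(lambda_du_per_hour, test_interval_hours):
    """Simplified average PFD for a single channel: lambda_DU * T / 2."""
    return lambda_du_per_hour * test_interval_hours / 2

# Values from the worked answer: 2e-7 per hour, tested once a year
pfd = pfd_avg_1oo1(2e-7, HOURS_PER_YEAR)
print(f"PFDavg = {pfd:.2e}")  # prints PFDavg = 8.76e-04
```

The unrounded result of 8.76 × 10⁻⁴ rounds to the 9 × 10⁻⁴ quoted above.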
The dangerous undetected failures are:
Dangerous undetected failure | Failure rate (FITS)
Undetected fouling of remote seal causes high reading | 200
Undetected sensor failure causing high reading | 180
Other undetected dangerous failure | 50
The total λDU is 200 + 180 + 50 = 430 FITS
λDU = 430 × 10⁻⁹ per hour = 4.3 × 10⁻⁷ per hour
8,760 hours per year ≈ 0.9 × 10⁴ hours per year
λDU ≈ 0.9 × 10⁴ × 4.3 × 10⁻⁷ per year
≈ 4 × 10⁻³ per year
≈ 0.004 pa
This corresponds to a MTBFDU of about 250 years.
λ = 1000 FITS = λDU + λDD + λS
Therefore λDD + λS = 1000 – 430 FITS = 570 FITS
The SFF is (λDD + λS) / (λDU + λDD + λS) = 570/1000 = 57%
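As a rough cross-check, the FITS conversion and the safe failure fraction can be reproduced in a few lines. This is a sketch using only the figures quoted in the answer; the dictionary keys and variable names are illustrative.

```python
HOURS_PER_YEAR = 8760
FIT = 1e-9  # one FIT is one failure per 10^9 hours

# Dangerous undetected contributions from the answer, in FITS
du_fits = {
    "remote seal fouling": 200,
    "sensor failure (high reading)": 180,
    "other dangerous undetected": 50,
}
total_fits = 1000  # lambda = lambda_DU + lambda_DD + lambda_S

lambda_du_fits = sum(du_fits.values())                    # 430 FITS
lambda_du_per_year = lambda_du_fits * FIT * HOURS_PER_YEAR
mtbf_du_years = 1 / lambda_du_per_year
sff = (total_fits - lambda_du_fits) / total_fits          # (DD + S) / total

print(f"lambda_DU = {lambda_du_per_year:.4f} per year")
print(f"MTBF_DU   = {mtbf_du_years:.0f} years")
print(f"SFF       = {sff:.0%}")
```

Without the intermediate rounding the MTBFDU comes out nearer 265 years than 250. The difference is well inside the uncertainty of the failure rate data, so the rounded hand calculation is adequate.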
Random
Systematic
Systematic
Random? Maybe, but unlikely to be purely random
Systematic
Systematic
Systematic
It is acceptable to use generic data in Route 1H:
7.4.9.5 The estimated failure rates, due to random hardware failures, for elements (see 7.4.9.4 a) and c)) can be determined either
NOTE 1 Any failure rate data used should have a confidence level of at least 70 %. The statistical determination of confidence level is defined in reference [9] of the Bibliography. For an equivalent term: “significance level”, see reference [10].
NOTE 2 If site-specific failure data are available then this is preferred. If this is not the case then generic data may have to be used.
Refer to IEC 61508-2 §7.4.4.1.
The difference between Type A and Type B is essentially to do with whether or not:
a) the failure modes of all constituent components are well defined; and
b) the behaviour of the element under fault conditions can be completely determined;
For both Type A and Type B IEC 61508-2 §7.4.4.1 requires:
c) there is sufficient dependable failure data to show that the claimed rates of failure for detected and undetected dangerous failures are met.
‘Critical’ and ‘possible’ puts us in Risk Class II – the middle zone of the ALARP triangle, so we need to implement further risk reduction unless the cost is disproportionately high.
The risk exposure is approximately:
1 fatality in 20 years, i.e. 0.05 fatalities y⁻¹ and
$100M / 20 years, which is $5M y⁻¹
Reducing the frequency to 1 in 200 years reduces the risk to
0.005 fatalities y⁻¹ and
$0.5M y⁻¹.
Over 20 years we can expect to save
1 life (20 x 0.045) and
20 x $4.5M = $90M.
We could justify spending something in the range $10M to $100M because of the value of the damage. It would be hard to justify spending much more than $100M.
Considering the loss of life alone we might be able to justify spending $1M to $3M to avert a fatality.
$10M could be justified if the risk were toward the top of Risk Class II
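The risk exposure figures above can be sketched as follows. The sketch assumes one fatality and $100M of damage per event, as implied by the answer, and a 20 year evaluation period; the function name is illustrative.

```python
def annual_exposure(event_frequency_per_year, fatalities_per_event, damage_per_event):
    """Return (fatalities per year, damage cost per year) for one scenario."""
    return (event_frequency_per_year * fatalities_per_event,
            event_frequency_per_year * damage_per_event)

# Assumed from the answer: 1 fatality and $100M damage per event
before = annual_exposure(1 / 20, 1, 100e6)    # once in 20 years
after = annual_exposure(1 / 200, 1, 100e6)    # once in 200 years

years = 20
lives_saved = years * (before[0] - after[0])       # 20 x 0.045 = 0.9
damage_avoided = years * (before[1] - after[1])    # 20 x $4.5M = $90M

print(f"Lives saved over {years} years: {lives_saved:.1f}")
print(f"Damage avoided over {years} years: ${damage_avoided / 1e6:.0f}M")
```

The expected saving of 0.9 lives is rounded to 1 life in the answer.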
Failure rates from different sources vary over a range of 1 or 2 orders of magnitude.
Additional 0.1 for alarm, therefore need only RRF 10, SIL 1
We would need to be confident that the alarm will be treated as a safety critical alarm. How can we be sure that we can depend on the operator responding correctly? How are safety critical alarms defined and managed?
Initiating frequency 10⁻¹ pa; process design factor 0.01; BPCS factor 1, because it is the BPCS that has failed so it cannot be counted in risk reduction; alarm factor 1, because there is no independent alarm. 0.01 for the PSV can be claimed.
10⁻¹ pa × 0.01 × 1 × 1 × 0.01 = 10⁻⁵ pa
Start with the initiating event of once in 10 years or 0.1 pa.
Multiply by the probability of failure of the two existing risk controls, x 0.1 for the operator failing to respond successfully to the alarm and x 0.01 for the PSV failing.
That gives us:
0.1 pa x 0.1 x 0.01 = 0.0001 pa.
It may be easier to work this out using scientific notation, simply add the exponents:
10⁻¹ pa × 10⁻¹ × 10⁻² = 10⁻⁴ pa
RRF = consequence frequency / tolerable frequency
= 10⁻⁴ pa / 10⁻⁵ pa = 10
PFDAVG = tolerable frequency / consequence frequency
= 10⁻⁵ pa / 10⁻⁴ pa = 0.1
Lower end of SIL 1 range
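The layer-by-layer multiplication above can be sketched as a small calculation. The sketch assumes the layers are independent, so their probabilities of failure simply multiply; the function names are illustrative and not taken from any standard.

```python
from math import prod

def mitigated_frequency(initiating_frequency, layer_pfds):
    """Multiply the initiating event frequency by each layer's probability of failure."""
    return initiating_frequency * prod(layer_pfds)

def required_rrf(consequence_frequency, tolerable_frequency):
    """Risk reduction factor still needed from the SIF."""
    return consequence_frequency / tolerable_frequency

# 0.1 pa initiating event; operator response to alarm (0.1) and PSV (0.01)
frequency = mitigated_frequency(0.1, [0.1, 0.01])
rrf = required_rrf(frequency, 1e-5)
pfd_target = 1 / rrf

print(f"Consequence frequency without SIF = {frequency:.0e} pa")
print(f"RRF = {rrf:.0f}, target PFDavg = {pfd_target:.1f}")
```

The same function reproduces the earlier case with no alarm credit: `mitigated_frequency(0.1, [0.01, 1, 1, 0.01])` gives 10⁻⁵ pa.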
The consequence frequency without the SIF is 0.1 pa × 10 % = 10⁻² pa.
The RRF needed is the consequence frequency divided by tolerable frequency:
RRF = 10⁻² pa / 10⁻⁵ pa = 1,000
i.e. we want to reduce the consequence frequency by a factor of 1,000 to reach a tolerable level.
This is on the border of SIL 2 and SIL 3. It would usually be classed as SIL 3.
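The borderline treatment can be made explicit in a short sketch. It assumes the usual demand mode band edges at RRF 10, 100, 1,000 and 10,000, and uses a hypothetical `conservative` flag to choose whether an RRF exactly on an edge is placed in the higher band. It does not check for an RRF beyond the SIL 4 range.

```python
def sil_for_rrf(rrf, conservative=True):
    """Map a required RRF to a demand mode SIL (0 means no SIL requirement).

    conservative=True places an RRF exactly on a band edge in the higher SIL.
    """
    edges = [10, 100, 1_000, 10_000]
    sil = 0
    for edge in edges:
        if rrf > edge or (conservative and rrf == edge):
            sil += 1
    return sil

print(sil_for_rrf(1000))                      # prints 3
print(sil_for_rrf(1000, conservative=False))  # prints 2
```

Whichever convention is chosen, the RRF itself should still be stated in the SRS so that the PFD target is unambiguous.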
= λDD / (λDD + λDU)
= 500 / (500 + 1500) = 500 / 2000 = 25%
Maybe, but unlikely. Can you find dependable data with a high confidence level? How could you justify it?
IEC 61508-2 §7.4.4.3.3
If Route 2H is selected, then the reliability data used when quantifying the effect of random hardware failures (see 7.4.5) shall be:
a) based on field feedback for elements in use in a similar application and environment; and,
b) based on data collected in accordance with international standards (e.g., IEC 60300-3-2 or ISO 14224); and,
c) evaluated according to:
i) the amount of field feedback; and,
ii) the exercise of expert judgement; and where needed,
iii) the undertaking of specific tests;
in order to estimate the average and the uncertainty level (e.g., the 90 % confidence interval or the probability distribution (see Note 2)) of each reliability parameter (e.g., failure rate) used in the calculations.
NOTE 1 End-users are encouraged to organize relevant component reliability data collections as described in published standards.
NOTE 2 The 90 % confidence interval of a failure rate λ is the interval [λ5 %, λ95 %] in which its actual value has a probability of 90 % to belong to. λ has a probability of 5 % to be better than λ5 % and worse than λ95 %. On a pure statistical basis, the average of the failure rate may be estimated by using the “maximum likelihood estimate” and the confidence bounds (λ5 %, λ95 %) may be calculated by using the χ² function. The accuracy depends on the cumulated observation time and the number of failures observed. The Bayesian approach may be used to handle statistical observations, expert judgement and specific test results. This can be used to fit relevant probabilistic distribution functions for further use in Monte Carlo simulation.
If route 2H is selected, then the reliability data uncertainties shall be taken into account when calculating the target failure measure (i.e. PFDavg or PFH) and the system shall be improved until there is a confidence greater than 90 % that the target failure measure is achieved.
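To illustrate how the confidence bound depends on the amount of field feedback, here is a minimal sketch of a one-sided upper bound on a constant failure rate. It assumes a constant failure rate with failures counted over a fixed cumulative operating time, and it solves the Poisson relationship numerically, which is equivalent to the χ² quantile with 2n + 2 degrees of freedom divided by 2T. The function names and the example data (2 failures in 10⁶ device-hours) are hypothetical.

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total

def upper_bound_failure_rate(failures, cumulative_time, confidence=0.9):
    """One-sided upper confidence bound on a constant failure rate.

    Finds the expected count mu where P(N <= failures | mu) = 1 - confidence
    by bisection, then divides by the cumulative observation time.
    """
    target = 1 - confidence
    lo, hi = 0.0, 1.0
    while poisson_cdf(failures, hi) > target:   # expand until the root is bracketed
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(failures, mid) > target:
            lo = mid
        else:
            hi = mid
    return hi / cumulative_time

# Hypothetical field data: 2 dangerous undetected failures in 1e6 device-hours
mle = 2 / 1e6
upper = upper_bound_failure_rate(2, 1e6, confidence=0.9)
print(f"Maximum likelihood estimate: {mle:.2e} per hour")
print(f"90 % upper bound:            {upper:.2e} per hour")
```

With so few failures the upper bound is well over twice the maximum likelihood estimate, which is why a small data set rarely supports a confident claim.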
IEC 61511-1 §11.9.3 does allow generic data to be used:
Route 2H depends on the availability of dependable failure rate information with a data confidence level of 90%
Route 1H depends on safe failure fraction, which can be improved by increased diagnostic coverage. With Route 1H it is possible to compensate for lack of dependability in failure rate data by providing diagnostic coverage.
Under Route 1H it is possible to justify SIL 3 with no hardware fault tolerance if the diagnostic coverage is sufficiently high. Route 2H always requires fault tolerance for SIL 3 and for continuous mode SIL 2.
In practice almost all failures are mostly systematic in nature, not purely random. The failure rate depends very heavily on how much effort is put into prevention of failure.
Failure rates from different sources may also vary due to the size of the data sets and due to the decisions made regarding which failures should be excluded. Some of the independent certifying authorities exclude systematic failures, some do not.
Speed control on a steam turbine with no other overspeed protection, or reactant ratio control in a process reactor so that if the function fails a hazardous situation could immediately occur.
A permissive interlock may be considered as a continuous mode SIF. The interlock maintains equipment in a safe state. If the interlock fails a hazard may immediately result.
Probability of failure per hour rather than probability of failure on demand. In a continuous mode SIF it is the failure of the SIF itself that is the cause of the hazardous event. That is why we characterise it by a failure rate.
It is on the borderline between SIL 1 and SIL 2. It could be classified as SIL 1 provided that the RRF is specified in the SRS.
In semi-quantitative methods and in qualitative methods a RRF of 100 (i.e. 2 orders of magnitude in risk reduction) would usually be classified as SIL 2.
SIL 2 functions provide RRF of at least 100.
The target for PFD will remain the same at 0.01 (1/100), regardless of whether we classify it as SIL 1 or SIL 2. But if we classify it as SIL 2 we need to demonstrate systematic capability SC 2. More attention will need to be paid to quality control. It can be argued that the systematic integrity is far more important than the PFD, so it is better to classify a RRF of 100 as SIL 2 rather than SIL 1 to be conservative.
Some organisations are conservative and round the RRF up, so a RRF of 90 would be classed SIL 2. Other organisations are less risk averse and would insist on RRF 100 being classed as SIL 1.
If a continuous mode SIF fails dangerously a potentially hazardous situation will occur unless action is taken to prevent it. A continuous mode SIF acts to maintain a safe state.
A demand mode SIF may fail dangerously but a potentially hazardous situation will not occur until there is a failure in the process or in the BPCS. A demand mode SIF takes no action until a demand is detected. It then acts to put the equipment into a safe state.
At least 4 orders of magnitude of risk reduction are needed to reduce the risk from ‘severe’ to ‘medium’. We need to reduce both likelihood (through prevention) and consequence (through mitigation).
Reducing likelihood alone would only achieve ‘high’ risk at best, cell A+.0 is in the orange zone.
IEC 61508 is the more general standard that covers safety related systems in all industry sectors. IEC 61511 is a specific application of IEC 61508 to the process industry sector.
IEC 61508 covers the design and manufacture of equipment and components for safety systems. IEC 61511 is limited to the application of the equipment and components.
A good example of ‘wilful blindness’ is the tacit acceptance of a sub-standard situation because of a perceived lack of funding or due to inappropriate management priorities. ‘Learned helplessness’ afflicts employees who learn that the managers do not or cannot respond to issues that affect the employees’ safety.
Managers need to have clear policies and strategies in place to achieve safety and they need to communicate them effectively. They need to have the means to evaluate the achievement of their policies and strategies.
ICAF = Cost / Number of lives saved
$100k / [5 y × (5 × 10⁻⁴ pa – 1 × 10⁻⁵ pa)]
≈ $100k / [5 y × (5 × 10⁻⁴ pa)] ≈ $100k / (25 × 10⁻⁴)
= $4 × 10³ / 10⁻⁴
= $4 × 10⁷ = $40M per life, which is disproportionately high. No, the work cannot be easily justified on this basis.
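The ICAF arithmetic can be sketched without the simplifying approximation. The sketch assumes one fatality per event, which the answer implies but does not state; the function and parameter names are illustrative.

```python
def icaf(cost, years, frequency_before, frequency_after, fatalities_per_event=1):
    """Implied cost of averting a fatality: cost / expected number of lives saved."""
    lives_saved = years * (frequency_before - frequency_after) * fatalities_per_event
    return cost / lives_saved

# $100k spent to reduce the frequency from 5e-4 pa to 1e-5 pa over 5 years
value = icaf(100_000, 5, 5e-4, 1e-5)
print(f"ICAF = ${value / 1e6:.0f}M per life")
```

Keeping the 1 × 10⁻⁵ pa term gives about $41M rather than $40M, so the approximation does not change the conclusion.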
People should not usually be expected to take more risk in their workplace than in their private lives, but an increased level of harm may be tolerated if it is in proportion to the perceived benefit to society. For instance, deep sea divers, underground miners, firefighters, police and soldiers may be exposed to higher risk.
The risk needs to be identified, assessed and managed to a level that is as low as is reasonably practicable. This means that the cost of further risk reduction would be disproportionately high compared to the benefit gained in safety.
Identify appropriate standards or work practices
Take reasonable steps to apply the standards or practices
Demonstrate compliance
Monitor compliance
Functional Safety refers to the application of safety instrumented functions to provide a defined degree of risk reduction in a hazardous facility.
Demand mode safety functions take action on demand to achieve a safe state in response to detection of a developing hazard.
Continuous mode safety functions take continuous action to maintain a safe state, preventing a hazard from occurring.
Not necessarily. The requirement stems from duty of care because the standards are well established and widely applied.
Ignoring the standards could be deemed to be negligence – unless you can find and apply some similar well-established standard.
Application of IEC 61511 is required by some other standards – such as AS 3814, and these may be referenced in legislation.
In some jurisdictions codes of practice may specifically refer to the standards.