Introduction: Why Failure Rates Matter in Functional Safety
Failure rates sit at the center of how we evaluate, verify, and maintain safety instrumented systems (SIS) under IEC 61511. They show up in SIL verification, PFDavg and STR calculations, equipment selection, and proof test strategy. If you understand the failure‑rate categories and how to obtain them correctly, you can avoid many of the mistakes that derail SIL verification or misrepresent SIS performance.
This article explains the four failure‑rate categories, where the values come from, how to interpret them, and how a functional safety engineer uses them in practice.
The Big Picture: What We Mean by a “Failure Rate”
In functional safety, failure rate is represented by λ (lambda), typically shown in units of 1/hour or in FITs (failures per 1E9 hours). It represents the frequency of random hardware failures—the only failures that can be mathematically modeled.
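As a quick illustration, the conversion between FIT and failures per hour can be sketched as follows; the 250 FIT figure is hypothetical, not taken from any particular certificate:

```python
# Convert between FIT (failures per 1e9 hours) and failures per hour.
# Illustrative values only, not from any specific device certificate.

def fit_to_per_hour(fit: float) -> float:
    """Convert a failure rate in FIT to failures per hour."""
    return fit / 1e9

def per_hour_to_fit(lam: float) -> float:
    """Convert a failure rate in failures per hour to FIT."""
    return lam * 1e9

lambda_du = fit_to_per_hour(250)   # 250 FIT -> 2.5e-7 failures per hour
print(lambda_du)                   # 2.5e-07
print(per_hour_to_fit(2.5e-7))     # 250.0
```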
Random vs. Systematic Failures
Random hardware failures are the only failures that can be described by a rate. Systematic failures absolutely matter, but because they arise from design or process weaknesses, they cannot be represented by λ. You must manage them through quality processes and functional safety management—not statistics.
Constant Failure Rate Assumption
IEC 61511 modeling assumes a constant failure rate. Real‑world devices follow a classic bathtub curve: higher failures early in life (infant mortality), a long flat useful‑life period, and then increasing failures late in life. Failure‑rate data used for SIL verification assumes you are in that useful‑life region.
Statistical Nature of λ Values
Certification bodies and data handbooks treat λ values as statistical estimates with confidence bounds. The SIL certificate condenses this into a single number, so remember that every published λ carries uncertainty, even though most of that statistical work stays behind the scenes for functional safety engineers.
The Four Failure Rate Categories in Functional Safety
Failure modes in functional safety fall into four buckets based on whether the failure is safe or dangerous, and whether it is detected or undetected by diagnostics:
- λSD – Safe Detected
- λSU – Safe Undetected
- λDD – Dangerous Detected
- λDU – Dangerous Undetected
These categories determine whether the failure affects safety, reliability, or uptime—and how it appears in SIL and STR calculations. Note that “detected” means detected by diagnostics, not by proof tests.
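To make the four buckets concrete, here is a hypothetical split of a device's total failure rate, assuming an illustrative safe-failure fraction and diagnostic coverages (none of these numbers come from a real certificate):

```python
# Split a device's total failure rate into the four categories, given an
# assumed safe-failure fraction and diagnostic coverages.
# All numbers are hypothetical, for illustration only.

lambda_total = 500 / 1e9          # 500 FIT total, in failures per hour
safe_fraction = 0.6               # assumed fraction of failures that are safe
dc_dangerous = 0.7                # assumed diagnostic coverage of dangerous failures
dc_safe = 0.5                     # assumed diagnostic coverage of safe failures

lambda_s = lambda_total * safe_fraction
lambda_d = lambda_total - lambda_s

rates = {
    "lambda_SD": lambda_s * dc_safe,
    "lambda_SU": lambda_s * (1 - dc_safe),
    "lambda_DD": lambda_d * dc_dangerous,
    "lambda_DU": lambda_d * (1 - dc_dangerous),
}

# The four categories always sum back to the total rate.
assert abs(sum(rates.values()) - lambda_total) < 1e-18
```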
How the Four Failure Rates Feed Functional Safety Calculations
λDU is the largest driver of safety risk. It represents failures that prevent the SIF from acting and are not discovered by diagnostics. This value is always used in PFDavg. λDD may also enter PFDavg if the detected failure only raises a notification (does not trip the SIF).
λSU always contributes to spurious trip rate (STR). λDD and λSD contribute to STR if the control logic forces a safe‑state action when diagnostics detect a failure. See this article on STR for more background.
λSU and λSD influence reliability and uptime but do not affect PFDavg.
Finally, proof tests exist specifically to reveal undetected dangerous failures—the λDU portion. Some engineers misunderstand this and assume proof tests simply “detect failures,” but in functional safety terms, proof tests are how you manage the DU accumulation. See this article on proof testing.
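As a rough illustration of how λDU drives PFDavg, the widely used simplified 1oo1 low-demand approximation PFDavg ≈ λDU × TI / 2 can be sketched with hypothetical numbers (real verification uses the full IEC 61508 equations and accounts for repair time, proof-test coverage, and common cause):

```python
# Simplified 1oo1 low-demand approximation: PFDavg ≈ lambda_DU * TI / 2.
# Ignores repair time, proof-test coverage, and common cause.
# Hypothetical values, for illustration only.

lambda_du = 2.5e-7        # dangerous undetected failures per hour
ti_hours = 8760           # one-year proof-test interval

pfd_avg = lambda_du * ti_hours / 2
print(pfd_avg)            # ~1.1e-3, in the SIL 2 band for this single element
```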
Where Failure Rates Come From
Where the Functional Safety Engineer Actually Gets Failure Rates
In real SIS design work, most failure‑rate data comes from certified products, where a certification body (CB) has already performed a detailed IEC 61508 assessment. The FS engineer reads the SIL certificate and the Safety Manual, which provide the extracted λDU, λDD, λSU, and λSD values. These two documents are the authoritative sources for day‑to‑day engineering. The underlying FMEDA exists, but it is not normally reviewed or needed by the practitioner.
When a certified device is not available, several alternate data routes exist. Each route has tradeoffs and requires engineering judgment:
- Manufacturer‑supplied reliability data – useful when transparent and well‑supported, but assumptions must be confirmed.
- Validated site or company datasets – often the most realistic if maintenance and failure tracking are strong.
- User‑generated field data – applicable for legacy equipment with a long operating history.
- Industry sources such as OREDA – helpful when carefully matched to device type, service, and environment.
These alternatives are less typical in functional safety practice, and they require more scrutiny than certified data.
How Failure Rates Are Determined (Typical Scenario)
For certified devices, IEC 61508 defines the process for establishing failure rates. Behind the scenes, the CB reviews or performs:
- FMEDA (failure‑mode analysis and diagnostic modeling)
- Test campaigns and empirical validation
- Diagnostic behavior evaluation
- Environmental and installation assumption checks
The FS engineer does not redo this work. Instead, their responsibility is to:
- Use the published λ values correctly
- Ensure the application matches the assumptions in the Safety Manual
- Integrate diagnostics the way the certification expects
This is where many real‑world errors occur—not because the values are wrong, but because the application does not match the assumptions behind them.
Broader Reliability Concepts
Failure rates are not standalone constants; they are shaped by reliability principles that sit behind the λ numbers. A functional safety engineer must understand these broader ideas to avoid misusing published data.
Systematic failures are not described by λ values. Random hardware failures can be modeled with rates; systematic failures cannot. They arise from design issues, configuration errors, software defects, or procedure gaps. They must be controlled through functional safety management (as required by IEC 61508), not reliability math.
Failure‑rate uncertainty is always present. λ values are statistical estimates derived from limited testing, modeling, or field data. Certification bodies select a representative value for the SIL certificate, but there is natural variability behind every λ. The published number is not a perfect truth—it is a useful engineering approximation.
Application and environment can change the true failure rate. A device used in corrosive service, high vibration, or aggressive cycling may experience a higher effective λ than the certified value. Likewise, poor installation, improper mounting, or low‑quality air supply (for valves) can shift failure behavior. The published λ applies only when the Safety Manual conditions are met.
The Safety Manual controls the validity of the data. A λ value is only valid if the equipment is installed, wired, maintained, and operated according to the Safety Manual. If diagnostics are not used, if limits are exceeded, or if maintenance intervals differ from expectations, the certified failure rates no longer describe the real system.
Automated Valve Assemblies and How Their Failure Rates Combine
Automated valves used as final elements are not single devices—they are assemblies made of several components, each with their own failure behavior. A functional safety engineer must gather λ values for each sub‑component and understand how they combine to represent the full final element.
Typical valve‑assembly components include the valve body, actuator, solenoid, positioner, and any boosters or air relays. Each component contributes its own λDU, λDD, λSU, and λSD. Because the SIF fails if any one of these components cannot perform its intended action, the failure rates are combined using Boolean OR logic:
λ_total ≈ λ₁ + λ₂ + λ₃ + …
In practice, most λDU comes from mechanical components such as the actuator and valve body. These parts typically lack meaningful diagnostics, so λDU remains the dominant contributor for the final element. Electronic components—like smart positioners—may reduce λDD by improving diagnostics, but they seldom reduce λDU in a significant way.
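The OR-logic combination above can be sketched numerically; the per-component λDU figures below are hypothetical, for illustration only:

```python
# Combine per-component lambda_DU values for a valve assembly.
# Because any single component failure defeats the safety function,
# the rates add in series (Boolean OR logic).
# Component values are hypothetical, in FIT.

components_du_fit = {
    "valve_body": 300,
    "actuator": 500,
    "solenoid": 150,
    "positioner": 50,
}

lambda_du_total = sum(components_du_fit.values()) / 1e9  # per hour
print(lambda_du_total)   # 1e-06 failures per hour (1000 FIT)
```

Note how the mechanical items (valve body and actuator) dominate the total, consistent with the point above.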
Some manufacturers have begun certifying complete valve assemblies under IEC 61508. When available, this simplifies the engineer’s task: the assembly‑level λ values are already validated and consolidated under a single device boundary. See this example from Emerson: https://www.emerson.com/en-us/automation/valves/controlvalves/digital-isolation-solutions. We at SIL Safe expect and hope this trend continues.
Practical Examples
Sensor Example: How the Engineer Obtains λ Values
A functional safety engineer begins by locating the λDU, λDD, λSU, and λSD values published in the device’s SIL certificate or Safety Manual. These documents reflect the certification body’s IEC 61508 assessment and define how the device behaves under expected diagnostic, installation, and environmental conditions. The engineer then confirms that the plant’s SIS logic and wiring actually use the diagnostic features assumed in the certification. Once these steps are complete, the engineer has the correct λ values that will be applied in later PFDavg or STR calculations.
Final Element Example: How the Engineer Obtains λ Values
For an automated valve assembly, the process is more involved because a final element is made of multiple components that must all function correctly. The engineer identifies each sub-component—such as the valve body, actuator, solenoid, positioner, and boosters—and retrieves λ values from each component’s SIL certificate or Safety Manual. The installation and diagnostic assumptions must match the application for the values to be valid. Because a failure in any single sub-component prevents the valve from performing its safety function, the engineer combines the λ values using OR logic to produce the total assembly failure rate. This assembled λ dataset will be used in downstream PFDavg and STR calculations.
Common Mistakes Engineers Make with Failure Rates
Even experienced engineers can misapply failure‑rate data if the context behind the numbers is not fully understood. A few issues show up repeatedly in real SIS design and verification work.
Misinterpreting λDU vs. λDD. These two values behave very differently. λDU always goes into PFDavg because it represents failures that diagnostics cannot find. λDD may or may not impact STR or PFDavg depending on how diagnostics are integrated. Treating DD like DU—or assuming DD never matters—produces incorrect verification results.
Using generic values without validating assumptions. Generic data tables, old spreadsheets, or handbook values can be misleading if the assumptions behind them do not match your application. Certified values come with defined conditions; generic values usually do not.
Ignoring diagnostics. Sometimes diagnostics exist on the device but do not make it into the SIS logic or maintenance workflow. If a diagnostic bit is unwired, unmapped, filtered out, or simply ignored in operations, detected dangerous failures behave like undetected failures. In this case, λDD effectively becomes λDU.
Treating λ values as universal constants. A λ value from a certificate is not automatically valid everywhere. Installation, environment, cycling, mounting, and maintenance determine whether the published λ truly reflects your plant’s conditions. Failure rates must be applied with engineering judgment, not copied blindly.
When Diagnostics Exist but Are Not Used
Diagnostics only add value when the SIS actually acts on them. A device may have excellent internal diagnostics, but if they are not used by the controls, the failure behaves as if it were undetected. In this situation, λDD is effectively added to λDU for purposes of SIL verification because the SIF remains impaired until someone actively responds, and this can significantly worsen PFDavg.
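A hypothetical sketch of the impact: if λDD must be treated as λDU because nobody acts on the diagnostics, PFDavg grows accordingly (simplified 1oo1 approximation, illustrative numbers):

```python
# When diagnostics are not acted on, detected dangerous failures behave
# like undetected ones for SIL verification purposes.
# Hypothetical values, simplified 1oo1 low-demand approximation.

lambda_du = 2.0e-7        # dangerous undetected, per hour
lambda_dd = 6.0e-7        # dangerous detected, per hour
ti_hours = 8760           # annual proof test

pfd_with_diagnostics = lambda_du * ti_hours / 2
pfd_without_action = (lambda_du + lambda_dd) * ti_hours / 2  # DD treated as DU

print(pfd_with_diagnostics)   # ~8.8e-4
print(pfd_without_action)     # ~3.5e-3, four times worse
```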
This scenario is more common in brownfield facilities, older installations, poorly integrated SIS/BPCS architectures, or sites where diagnostics alarm but no work process exists to ensure timely repair. The lesson for the engineer is simple: diagnostics only help if the entire chain—from device to logic to maintenance—uses them correctly.
Where Failure Rates Influence SIS Design Decisions
Failure‑rate data influences several real‑world engineering choices throughout the SIS life‑cycle. Understanding how λ values behave helps the engineer select architectures, manage proof‑test strategies, and apply diagnostics intentionally—not blindly.
Architecture selection. If λDU is high, additional redundancy may be required to achieve the target SIL. Failure‑rate data helps determine whether 1oo1, 1oo2, or 2oo3 architectures are appropriate for the SIF.
Choosing the proof‑test interval (TI). Proof tests exist to reveal λDU—the part diagnostics cannot see. A higher λDU or lower proof‑test coverage (Cpt) typically requires a shorter TI. Failure‑rate data directly shapes the proof‑test strategy. See this other article about CPT.
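One common simplified way to include proof-test coverage in the PFDavg estimate, using hypothetical numbers (the exact form depends on the chosen PFDavg model; real verification follows the full IEC 61508 equations):

```python
# Simplified PFDavg with imperfect proof-test coverage (Cpt):
#   PFDavg ≈ Cpt * lambda_DU * TI / 2 + (1 - Cpt) * lambda_DU * LT / 2
# The uncovered fraction of lambda_DU is only "reset" at end of mission life.
# Hypothetical values, for illustration only.

lambda_du = 2.5e-7        # dangerous undetected failures per hour
cpt = 0.9                 # assumed proof-test coverage
ti = 8760                 # proof-test interval, hours (1 year)
lt = 8760 * 15            # mission time, hours (15 years)

pfd_avg = cpt * lambda_du * ti / 2 + (1 - cpt) * lambda_du * lt / 2
print(pfd_avg)            # the uncovered fraction dominates as mission time grows
```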
Partial‑stroke testing for final elements. For valves that dominate λDU, partial‑stroke testing may reduce the exposure time of dangerous failures. This decision depends on understanding which failure modes are found by diagnostics versus proof tests.
Diagnostic selection and integration. λDD and λSD only help if diagnostics are wired, mapped, and acted on. Understanding the diagnostic coverage of a device and the assumptions in the Safety Manual helps engineers design logic and maintenance workflows that truly reduce risk.
Summary: A Practical Way to Think About Failure Rates
Failure rates are the foundation of how we model and manage random hardware failures in functional safety. λDU represents the portion of failures that silently erode the ability of a SIF to act when needed. λDD and λSD describe failures that diagnostics can reveal and, depending on how the logic responds, can contribute to spurious trips. λSU always contributes to the spurious trip rate and affects uptime, but does not influence PFDavg.
These four failure‑rate categories show up throughout the IEC 61511 safety life‑cycle: in equipment selection, architectural decisions, proof‑test strategy, diagnostic design, and SIL verification. If the failure‑rate assumptions in the SIL certificate and Safety Manual are respected—and if diagnostics are used correctly—then λ values become powerful tools for designing and maintaining a dependable SIS.
More Help
For more help applying failure‑rate data correctly—or for third‑party SIS verification—reach out through SIL Safe’s contact page.
- Explore the SIL Safe glossary for clear explanations of related terms.
- See ISA’s main guidelines
- Fully certified automated valves
- Deeper blog article on proof testing
Q&A Section
- What’s the practical difference between λDU and λDD?
λDU drives PFDavg because the failure is both dangerous and undetected. λDD is dangerous but detected, so it typically results in an alarm or forced-safe trip and may contribute to STR depending on configuration.
- Do failure rates change over time in my facility?
Yes. The published λ values assume controlled conditions, but real‑world factors like environment, cycling, installation quality, and maintenance can shift the true failure rate up or down. For modeling purposes, however, we assume a constant rate.
- Why do certified products help for accurate λ values?
Certified devices provide validated λ values and clearly defined DU/DD/SU/SD splits under IEC 61508. This reduces interpretation errors and ensures the assumptions behind the numbers are understood and controlled.
- Is it okay to use generic failure rates?
Only if you confirm that they match your device and application. Generic values may not reflect your environment, proof-test strategy, or diagnostic coverage.
- What if the device doesn’t have a SIL certificate?
You can still use it, but you must rely on credible manufacturer reliability data, validated site history, or reputable sources such as OREDA. These represent the alternate data routes when a certified data path is not available. Assumptions must match the application, and justification must be documented.
- Why do final elements almost always have higher λDU than sensors?
Most λDU in a final element comes from mechanical components like the actuator and valve body, which lack strong diagnostics. Sensors generally have better diagnostic coverage and fewer mechanical wear points, so their λDU values are typically much lower.
- I understand why λDU impacts PFDavg, but why would λDD impact PFDavg?
It depends on how the controls and diagnostics are set up. In many low-demand configurations, only λDU enters PFDavg. But if a dangerous detected failure leaves the SIF unable to perform its function—and the system does not act on or repair that diagnostic—then that portion of λDD effectively behaves like λDU and may need to be included in the PFDavg analysis.
- How are λ values used differently in high-demand or continuous modes?
Failure rate is highly relevant but used differently. In low-demand mode (most common), we use λDU to calculate PFDavg. In continuous or high-demand modes, PFDavg is not used. Instead, SIL is based on PFH — the rate of dangerous failure per hour. In these cases, λDU is used to calculate PFH through a different series of equations than PFDavg.
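For a simple 1oo1 element in high-demand mode, a common first-order approximation takes PFH ≈ λDU; the value below is hypothetical, for illustration only:

```python
# In high-demand/continuous mode, SIL targets are expressed as PFH
# (average frequency of dangerous failure per hour). For a simple 1oo1
# element, PFH is approximately lambda_DU, ignoring any DD portion that
# is not restored quickly. Hypothetical value, for illustration only.

lambda_du = 2.5e-7
pfh = lambda_du           # simple 1oo1 approximation

# IEC 61508 PFH bands: SIL 2 corresponds to 1e-7 <= PFH < 1e-6.
print(pfh)                # 2.5e-07 -> SIL 2 band for this single element
```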
