Rogue Units: Focus on cost containment

Oct. 1, 2001

By Thomas Carroll

Thomas Carroll is the Manager of Reliability Engineering at US Airways at Pittsburgh International Airport, Pittsburgh, PA. During his 28 years in aircraft maintenance, he has worked the line, shop, maintenance control and component reliability engineering disciplines.

The industry is in trouble, and everybody is focused on cost containment. Grasping for the "low hanging fruit," operators have implemented a number of painful initiatives with limited success. However, the lowest hanging, pain-free fruit has typically been overlooked: the rogue unit. Its impact is felt across the entire organization, both for the operator and the repair facility, and it’s huge.
Not only does it affect the direct operating cost, but it also generates a lot of unnecessary studies and initiatives, hamstrings the airline’s operational performance, and demoralizes the front line and repair facility troops.

What is it?
A rogue unit is like the proverbial "rogue elephant" in that it is an individual component gone awry. It repeatedly experiences short service periods and manifests the same system fault every time, and replacing it resolves the system malfunction. The problem is that when it is sent in for repair, the standard bench or overhaul testing can’t identify its unusual failure mode.

How does it happen?
Bench testing does not address 100 percent of what a component does, where it lives, or how it operates. It’s not intentional, it’s just a fact of life that the shop is not the same as the aircraft. Also, bench tests are designed to identify anticipated failures — checking things that are expected to fail. So a unit that fails in an unaddressed or unanticipated way will never be resolved — a rogue is born.
Any component can become rogue. We’ve even identified a rogue pilot’s seat (the actual aircraft seat itself – not the pilot’s seat!). So it’s important to remember that rogue units aren’t isolated to the "high tech" world.

"Natural selection"
It would be bad enough if rogue units just existed somewhere in the inventory. But there’s a phenomenon that catapults their effect throughout the entire airline’s operation. It’s like Darwinian "natural selection," just in reverse: survival of the worst, rather than the fittest, in that rogue components will displace serviceable spares.
Here’s how it happens: Let’s say there are four serviceable units in the spare pool. A unit in service fails in a way that starts its rogue career, so it is replaced with one of the spares, goes to the shop and comes back to the pool. Now, there are three serviceable spares and the rogue unit. Over time, there is a good chance another unit in service will develop a rogue failure. It will also make the trip to the shop and back to the spare pool. Now there will be two serviceable spares and two rogue units in stock.
As long as there are no problems with the parts in service, the rogue units will sleep peacefully in the spare pool. But, as soon as there is a problem in service, there’s a 50/50 chance that a spare will be pulled from the pool that has a problem of its own. As bad as that sounds, if the rogue units aren’t identified and resolved, the rogue population will continue to grow, displacing more spares. Think what would happen if all of them became rogue units!
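The four-spare example above can be sketched in a few lines of Python. This is only an illustration: the function name `rogue_pull_chance` is hypothetical, and it assumes every spare on the shelf is equally likely to be pulled.

```python
from fractions import Fraction


def rogue_pull_chance(pool_size: int, rogues_in_pool: int) -> Fraction:
    """Chance that the next spare pulled is a rogue.

    Assumes each spare in the pool is equally likely to be pulled.
    """
    if pool_size <= 0 or not 0 <= rogues_in_pool <= pool_size:
        raise ValueError("need a non-empty pool and 0 <= rogues <= pool size")
    return Fraction(rogues_in_pool, pool_size)


# The four-spare pool from the text, as rogues displace serviceable spares.
for rogues in range(5):
    chance = rogue_pull_chance(4, rogues)
    print(f"{rogues} rogue(s) among 4 spares: {float(chance):.0%} chance of pulling a rogue")
```

With two rogues among four spares, the sketch gives the 50/50 chance described above. With all four spares gone rogue, every replacement brings a problem of its own.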

The rogue unit effect
As the "Natural Selection" phenomenon continues, the negative effect will be felt throughout the airline operation: operational and dispatch reliability, aircraft systems, maintenance effectiveness, airline/OEM engineering, maintenance support, training programs, the repair facility, spare levels, component "quarantine" programs, and other components as well.
Here’s what usually happens when a rogue unit is installed in an effort to correct a system malfunction that only manifests itself during flight. The problem continues (since a defective part has been introduced), so the troubleshooting tree directs the replacement of other system parts in succession. Then, as the troubleshooting tree runs out of options, bizarre troubleshooting methods are employed, and wiring or plumbing is checked and shaken down. As the chronic problem persists, there is also an increase in MEL activity and in delays or cancellations. Finally, when logic is thrown out the window, all of the parts are replaced again in the hope of making the problem just go away. Mercifully and inexplicably, the problem eventually disappears. The rogue units make their trip to the repair facility and back to the spare pool, waiting to do it all over again.
Is maintenance elated that they’ve defeated the chronic problem? Usually they’re just glad that it’s gone away, so there’s no real victory to improve morale. In fact, morale typically suffers even more as fingers are pointed and armchair quarterbacking takes place later. Even the repair facility feels frustrated by all the "no fault founds" flowing through the shop.
Additionally, the problems generated by the rogue units will appear to be caused by other drivers (design flaws, human factors, training, Built-In Test programs, or a host of other issues), which generates a bunch of fruitless investigations and initiatives.

Rogue impact on MTBUR
While everybody wants to see costs go down, they’d all love to see component MTBUR (Mean Time Between Unscheduled Removals) go up. Of course, all the reliability modifications, training and other initiatives are working to improve the MTBUR, and they are typically considered successful if there is an improvement of 10 percent to 20 percent year over year. An effective rogue unit identification and resolution program will do much more than that. As the chart of a real-life component shows (please see pg. 22), it is possible (and quite probable) to see sustainable improvements of over 100 percent when the rogue units are targeted and resolved.
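To see why rogue units weigh so heavily on this number, recall that MTBUR is commonly computed as unit operating hours divided by the number of unscheduled removals. The sketch below uses that common definition with made-up fleet figures. None of the numbers come from the chart or from any real component.

```python
def mtbur(unit_hours: float, unscheduled_removals: int) -> float:
    """Mean Time Between Unscheduled Removals, as hours per unscheduled removal."""
    if unscheduled_removals <= 0:
        raise ValueError("need at least one unscheduled removal")
    return unit_hours / unscheduled_removals


# Hypothetical figures, for illustration only (not from the article).
fleet_unit_hours = 600_000
removals_with_rogues = 120
rogue_driven_removals = 65  # assumed number of removals traced to rogue units

before = mtbur(fleet_unit_hours, removals_with_rogues)
after = mtbur(fleet_unit_hours, removals_with_rogues - rogue_driven_removals)
improvement = (after - before) / before

print(f"MTBUR before: {before:.0f} hours")
print(f"MTBUR after:  {after:.0f} hours")
print(f"Improvement:  {improvement:.0%}")
```

Because removals sit in the denominator, halving the removal count at constant hours doubles the MTBUR. An improvement of over 100 percent therefore implies that more than half of the removals were eliminated, which a 10 to 20 percent gain from a typical initiative comes nowhere near.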

Then, there’s the direct cost
[Chart: the impact of rogue units on MTBUR.]

Any time parts are replaced, it costs money. There is a whole infrastructure built within the maintenance organization to facilitate the movement and tracking of unserviceable units to the shop and refilling the holes on the spare shelf with serviceable ones. Additionally, any time maintenance is performed on the aircraft, there is the direct cost of the technician, along with a host of support equipment required to facilitate the repair and system checkout. Even the system checkout generates a cost, in that the system components are exercised, which shortens their useful lives in service.
So, consider the typical life of a rogue unit: usually six more installations after becoming a rogue (before some sort of "accident" befalls it) and the associated "no fault found" costs; the additional system parts replaced needlessly, with their own "no fault found" costs; and the hours spent troubleshooting the system wiring and plumbing unnecessarily. The average cost of each rogue unit comes out to around $50,000 for the maintenance division. And this figure doesn’t even include delays, cancellations and additional inventory impacts!

Is there hope?
In a word, yes. However, a good identification and resolution program requires the development of a few key elements. The most important element is the tracking of each component by serial number. This would include date on/off, aircraft number, position, reason for removal, time since installed/overhauled, etc. Without this critical information, there is no hope of controlling rogue units.
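As a rough illustration of what serial number tracking involves, the fields listed above could be held in a record like the one below. This is a minimal sketch, not a description of any airline's actual system. The class name, field names and sample records are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class InstallationRecord:
    """One installation of one serialized component (hypothetical layout)."""
    serial_number: str
    aircraft_number: str
    position: str
    date_on: date
    date_off: Optional[date] = None               # None while still installed
    removal_reason: Optional[str] = None
    hours_since_installed: Optional[float] = None
    hours_since_overhaul: Optional[float] = None


def history_by_serial(records: List[InstallationRecord]) -> Dict[str, List[InstallationRecord]]:
    """Group records by serial number, oldest installation first."""
    history: Dict[str, List[InstallationRecord]] = {}
    for record in sorted(records, key=lambda r: r.date_on):
        history.setdefault(record.serial_number, []).append(record)
    return history


# Made-up sample records, for illustration only.
records = [
    InstallationRecord("SN-001", "N101", "left", date(2001, 3, 1), date(2001, 4, 10), "system fault", 310.0),
    InstallationRecord("SN-002", "N102", "right", date(2001, 1, 15)),
    InstallationRecord("SN-001", "N103", "left", date(2001, 1, 5), date(2001, 2, 20), "system fault", 420.0),
]
for serial, installs in history_by_serial(records).items():
    print(serial, [r.aircraft_number for r in installs])
```

Keying the history on the serial number rather than the part number is the point: a rogue only shows up when the same individual unit is followed from aircraft to aircraft.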
The next element is a process that identifies the potential rogues as they develop, such as by flagging three consecutive installations of less than 1,000 hours each. Then an assessment must be made to sort out the real rogues from the ones that just appear to be rogue. Over the years, our confirmation rate has hovered around 66 percent, which means the other 34 percent were victims of circumstance rather than bona fide rogue units. Without such a screening process, how would the shop know which 3 units to ignore out of every 10 that appear to be rogue?
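The example screening rule (three consecutive installations of less than 1,000 hours each) might be sketched as follows. The function name, default parameters and sample histories are hypothetical, and the thresholds would need tuning for each component type.

```python
from typing import List


def is_potential_rogue(install_hours: List[float],
                       run_length: int = 3,
                       threshold_hours: float = 1000.0) -> bool:
    """Flag a unit whose history holds `run_length` consecutive short installations.

    `install_hours` lists the hours of each installation, oldest first.
    """
    run = 0
    for hours in install_hours:
        # Extend the run on a short installation, reset it on a normal one.
        run = run + 1 if hours < threshold_hours else 0
        if run >= run_length:
            return True
    return False


# Made-up installation histories (hours per installation), keyed by serial number.
history = {
    "SN-001": [4200, 650, 300, 120],  # three short installations in a row
    "SN-002": [900, 5100, 800, 700],  # short installations, but never three in a row
    "SN-003": [3800, 4500],
}
flagged = [serial for serial, hours in history.items() if is_potential_rogue(hours)]
print(flagged)  # ['SN-001']
```

A flag from a rule like this only marks a candidate. As noted above, roughly a third of the units that appear to be rogue turn out to be victims of circumstance, so each flagged unit still needs an assessment before the shop is asked to chase it.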
Finally, open communication needs to be established between the maintenance operation and the repair facility. Rogue units require new and unique testing to identify and resolve the oddball failure, so the shop needs to understand how and where the unit operates in service in order to mimic those conditions.

Benefit for all
Why go through all this? It’s obvious that there is a big payoff for the operator in cost avoidance. There’s also great incentive for the repair facility to focus on rogue resolution. The OEM’s name is all over their parts, and their reputation is a direct reflection of the component’s performance. Other repair facilities, whether in-house or out, are also resting their reputation on the reliability of their product. Without a doubt, rogue units will cause irreparable damage to a reputation.
Additionally, rogues will not go away. When one odd failure is resolved, another will develop over time. If constant surveillance isn’t maintained, then the rogue population will continue to develop, grow and wreak havoc on an even grander scale.

The bottom line
In this world of cost containment, the lowest hanging, ripest fruit to be picked is the rogue unit. Controlling rogues not only cuts the direct costs incurred, it also facilitates the resolution of design or human factors issues. Nowadays, the paycheck is no longer an option to improve morale, so rogue control would be the next best thing. Line maintenance will no longer be frustrated by an inability to fix things logically and expediently, and the repair facility will take pride in having resolved a rogue unit.
All in all, when looking to control costs — before cutting service or personnel, or pushing for concessions or productivity increases — focus on the rogue unit. The return on investment for the simple surveillance and resolution processes is incredible.