cars: Zero Defect Initiative - Could be Better

This initiative at Daimler-Chrysler seeks to eliminate defects in software/hardware systems. It is an excellent goal, and many of the methods espoused in the article, and in the somewhat related article Focus on: Specification (Daimler-Chrysler Hightech Report 2/2006), are about increasing communication and documentation to achieve coordination among apparently disparate applications. The approach, in both cases, is to follow the lead of IT projects, which are among the most complex projects in existence and therefore necessarily use the leading edge of project management. It should be noted that IT projects are widely renowned for their high failure rate; that rate is a result of their very high complexity, and is what has driven the development of project management techniques. Managing complexity is very difficult and expensive. In the short term, applying strong project management methodologies and discipline to problems with irreducible complexity, as described in the reports, is likely the most effective answer.

In the longer term, however, one ideally wants to reduce complexity, so that there is less of it to manage. How does one make complexity reducible? Electronics and software technology have used the same method for decades: standardization. Standardization is mentioned as one component among many in the drive towards reliability, but its importance does not seem to be fully appreciated. It is viewed as a cost-cutter, while its importance as a simplifying factor in specifications and aftermarket diagnostics goes undescribed.

Engineers are facing ever more complex features, and the number of tests to perform is going up as a geometric function of the number of components to interconnect:
Despite simplifications in the assumptions we made, we arrived at 10^180 potential test conditions for a single vehicle model. If you wanted to examine all these as a simulation, you'd have to book several decades of computing time on a supercomputer.

. . .

... Standardization instead of ad-hoc solutions.

Using many computers with simple, standard interfaces, instead of a few large ones with very complex behaviours, reduces the complexity of interactions and simplifies testing methodologies. This is somewhat analogous to the way a multi-stage crossbar switch, compared with a single-stage switch, reduces the number of connections required.
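A rough way to see the effect is to count the interactions that have to be tested. The numbers below are illustrative arithmetic only, not figures from the article: N modules wired directly to one another produce N*(N-1)/2 pairwise links, while N modules behind a standard hub interface produce only N.

```python
# Illustrative only: compare the number of interconnections (and hence
# interaction tests) for N modules in a full mesh versus N modules
# behind a single standard hub interface. The module counts are
# hypothetical, chosen just to show the growth rates.

def full_mesh_links(n: int) -> int:
    """Every module talks directly to every other: n*(n-1)/2 links."""
    return n * (n - 1) // 2

def hub_links(n: int) -> int:
    """Every module talks only to a central hub: n links."""
    return n

for n in (10, 50, 100):
    print(f"{n} modules: mesh={full_mesh_links(n)} links, hub={hub_links(n)} links")
```

The mesh count grows quadratically while the hub count grows linearly, which is the sense in which a standard interface makes the testing burden reducible rather than merely manageable.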

Individual computers control a small number of sensors: say, one computer per corner of the automobile, handling lights, tire rotational speed, tire pressure, brake hydraulic pressure, RADAR sensor data, video data, etc. Each performs basic analysis and data reduction, and then feeds standardized data to management computers, which deal with no raw sensor data, only pre-reduced data. The corner computers have multiple redundant sensors (two or three per function), so that sensor failures are easy to detect and correct. The computers can run diagnostics on the components for which they are responsible and report summaries back to the management computer. Communication with the management computer uses the TCP/IP protocol, so that checksumming and guarantees of data integrity are improved. The management computer can interrogate the corner computers and request self-checks. Again, redundant processing means extensive built-in self-diagnostics.
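The corner-computer idea above can be sketched briefly. This is a minimal illustration under assumed names and thresholds (the 5% outlier tolerance, the field names, and the corner identifier are all hypothetical): three redundant readings per function are median-voted, any outlier is flagged as a suspected sensor fault, and only the reduced summary is serialized for the management computer.

```python
# Sketch of a corner computer's data reduction: median-vote redundant
# sensor readings, flag suspected sensor faults, and emit only the
# pre-reduced summary. Field names and the tolerance are hypothetical.
import json
import statistics

def reduce_readings(readings, tolerance=0.05):
    """Median-vote redundant readings; return the voted value and the
    indices of readings that disagree with it by more than `tolerance`."""
    voted = statistics.median(readings)
    faults = [i for i, r in enumerate(readings)
              if abs(r - voted) > tolerance * max(abs(voted), 1e-9)]
    return voted, faults

def corner_summary(corner_id, wheel_speed_rpm, tire_pressure_kpa):
    speed, speed_faults = reduce_readings(wheel_speed_rpm)
    pressure, pressure_faults = reduce_readings(tire_pressure_kpa)
    # Only reduced data and fault flags go to the management computer,
    # never the raw sensor streams.
    return json.dumps({
        "corner": corner_id,
        "wheel_speed_rpm": speed,
        "tire_pressure_kpa": pressure,
        "sensor_faults": {"speed": speed_faults, "pressure": pressure_faults},
    })

msg = corner_summary("front-left", [812.0, 810.5, 655.0], [220.1, 219.8, 220.3])
```

In this example the third wheel-speed sensor disagrees with the voted median and is flagged, while the corner still reports a valid speed: exactly the easy detect-and-correct behaviour that redundancy buys.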

Total wiring is reduced because, rather than wiring each subsystem individually, power becomes a bus that distributes to each corner, and distribution within each corner is handled by the computers there. Reducing the total wiring reduces the number of connections to test and troubleshoot.

While such an architecture is a large change at first, once in place it requires very little ongoing testing to maintain, and the system itself is very modular and independent of changes to individual components.

Standardization is mentioned as one element to assist in reducing defects, but instead of applying the lessons of standardization and upward compatibility to their components, to address the rapid product cycle of the electronics industry, the choice is to force the electronics groups to slow down:
In Wolfsried’s estimation, however, the benefit of such rapid development - smaller components or more power for the same size - are totally outweighed by the risks involved for dependability. Ultimately, each time the model changes, the part must be tested and approved with regard to its ability to withstand vibration and fluctuations in temperature, for example - and that’s an elaborate and costly process. “So in the future we’re going to work with only those semiconductor makers who will guarantee not only the necessary standards of reliability for the parts but also the availability of certain elements throughout the lifecycle of a vehicle.”
The problem with this approach is that it solves the wrong problem. The five-year development cycle and intensive testing requirement are the problem to address, not the short life cycle of the electronic components. In the computer industry, short product life cycles are normal, and asking suppliers to commit to longer ones just forces up costs. The automotive industry has been using custom equipment, which was the norm in embedded systems because the field was so immature. In the computing industry, there have basically been two generations of multi-vendor "standard" hardware communications interfaces over the past 20 years: formerly RS-232/RS-422 serial, and now Ethernet with RJ-45 connectors, plus USB.

That is the available level of hardware standardization. Clearly, physical connectors from commodity computing cannot be used without modification in the harsh environment of automobiles, but hardware durability is really the only thing that is special about this environment. There is a great deal of common ground between MIL-SPEC-type robustness requirements and the automotive environment. Testing by automotive groups could concentrate on physical reliability (loose connections, corrosion, dirt, etc.).

If connectors are standardized, and computer architecture coalesces on a PC-style arrangement (Intel/AMD-based systems running Linux), then, as in computer applications, one can take modern hardware, install modern systems software, and run the old software on this improved, more robust base. This is the principle behind upward compatibility.

The communications standards and other software infrastructure, such as operating systems, can be taken directly from the commodity industries. Using many more, simpler computers will simplify testing. A typical automobile would have many Gumstix-style computers, communicating with each other using TCP/IP over Ethernet and running a standard operating system such as Linux.

With such an environment, replacing a processor from 2005 with a processor from 2010 would mean running the software appropriate for the 2005 model on the new platform. Testing of the component would cover only hardware durability, since the connectors and software would be identical. Vendors would not have to stock the five-year-old computer, since components from this year's model would be able to replace the original equipment. This can only happen if the automotive industry uses operating systems and drivers that are standardized and abstracted away from the automotive application itself. (The software that operates the vehicle's sensors and components, and the software that operates the computer running the application, need to be separable, so that one can change the computer without modifying the vehicle.)
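The separability argued for here is essentially a hardware abstraction layer. A minimal sketch, with entirely hypothetical class names: the vehicle application is written against a stable sensor interface, and the board-specific driver underneath can be swapped (say, a 2005 board for a 2010 board) without touching the application at all.

```python
# Minimal sketch of separating the vehicle application from the
# platform driver. All class names are hypothetical; the point is only
# that the application depends on the interface, not the board.
from abc import ABC, abstractmethod

class WheelSpeedSensor(ABC):
    """Stable interface the vehicle application is written against."""
    @abstractmethod
    def rpm(self) -> float: ...

class Board2005Driver(WheelSpeedSensor):
    def rpm(self) -> float:
        return 800.0  # stand-in for reading the 2005 board's hardware

class Board2010Driver(WheelSpeedSensor):
    def rpm(self) -> float:
        return 800.0  # same contract, newer silicon underneath

def vehicle_app(sensor: WheelSpeedSensor) -> str:
    # Application logic never mentions which board it is running on.
    return "ok" if sensor.rpm() < 3000 else "overspeed"

# Swapping the hardware driver is a one-line change; the application
# code and its test results are unchanged.
assert vehicle_app(Board2005Driver()) == vehicle_app(Board2010Driver())
```

With this split, re-qualifying a replacement computer reduces to testing the new driver and the hardware's durability; the vehicle application itself is untouched.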

So the first defect is under-appreciation of the benefits of standardization, and the perception that electronics must be forced onto a slower rate of advancement, rather than capitalizing on upward compatibility to add features at ever-reduced cost, in a judo-like taking of the energy of a trend and channeling it to one's own advantage.

Another weakness is the near-total emphasis on pre-market testing. Aftermarket diagnostics and follow-up are also important. Right now, on-board computer information is typically read by the mechanic and then reset and erased before the vehicle is returned to the client. Manufacturers should consider this information a goldmine of reliability data. Learning how components actually fail, coupled with GPS and weather data, would improve data gathering on the use of these systems. Have on-board systems accumulate logs, and let maintenance personnel retrieve them without erasing them, like odometers (say, a write-once medium for diagnostics). Even better would be to include the work done on the vehicle (parts replacements, etc.) to provide a complete log to subsequent owners as well as the manufacturer. Such information could be used to improve diagnostics of common problems in older cars and thereby improve the ownership experience for owners of older vehicles.
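The write-once log suggested above can be sketched as an append-only record file. The record fields and file format here are hypothetical; the essential property is that events (faults, parts replacements) are only ever appended, never rewritten, so the vehicle carries its full history from birth to death.

```python
# Sketch of an append-only diagnostic log: each fault or service event
# is appended as a timestamped record and never modified, mimicking a
# write-once medium. Field names and format are hypothetical.
import json
import time

def append_event(log_path, event_type, detail):
    record = {"t": time.time(), "type": event_type, "detail": detail}
    # Append-only: mode "a" only ever adds records to the end.
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")

def read_events(log_path):
    """Retrieve the full history, e.g. for a mechanic or a buyer."""
    with open(log_path) as f:
        return [json.loads(line) for line in f]
```

A mechanic's scan tool would call append_event for each fault read and each repair performed, instead of erasing the codes, leaving the accumulated record available to subsequent owners and to the manufacturer's quality process.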

Maintaining vehicles past a million kilometers is a point of pride for the Mercedes brand. Making diagnostics and repairs cheaper and faster can only be achieved by studying the failures of these vehicles in the field. There are many good reasons to do this: running vehicles for more kilometers improves the manufacturer's reputation and feeds back into the quality process in future years, potentially reducing some of the need for long-term pre-market testing. Such work also reduces the replacement rate of automobiles, which is good for the planet.

So, to increase quality over the longer term, one should look at reducing complexity through the use of components that are standard for their industry, with standard interfaces, so that individual components can evolve compatibly over time. And one should learn about the failure modes of aging components by maintaining birth-to-death data logs on vehicles.

