Castigated Engineer Sees the Light

For several years, we had supplied a machine-tool builder with interface cards to link early BBC and PC computers to a range of small but very capable lathes and milling machines. The main market was for schools and training establishments, and we had already sold a couple thousand sets of cards.

We were commissioned to design a "mark 2" version with higher cutting speed and greater flexibility. We had gone past the inevitable teething troubles and bugs with "mark 2," and already had several dozen machines working perfectly in the field when the customer told us about an export order for 250 machines. It seemed like Christmas, so we supplied all the PCBs as fast as we could and the customer began to build the machines.

The first part of the consignment was due to be shipped on a Monday, but on the Saturday before I received an urgent call from the customer -- they had been working overtime with all their mechanical engineers, but all the machines were failing their final machining test.

"We've checked everything else so it must be your PCBs and/or firmware," the customer said. When I got to the factory it was like being summoned to the headmaster's office. All the directors were there, and it was made clear to me that the whole of the £2 million (US$3 million) order was stopped, and it was entirely my fault.

I was shown a machine connected to a PC with a whole row of test pieces. The final test was to run a CNC program to turn a short spindle with various diameter steps and shoulders to prove dimensional accuracy, which had to be ±10 microns (0.00039 inches) using several different tools. Nearly all the test pieces had random errors at least 10 times bigger than the allowable tolerance.

Curiously, some would have one or two dimensions correct and the very next piece would have a different combination of good and bad dimensions. Furthermore, they had fitted clock gauges to check that the carriage returned to the "park" position, which it did with perfect reliability in both X and Z axes. It was totally baffling.

The most common causes of dimensional inaccuracy were either misalignment, which caused repeatable errors, or stiff slides, which caused the stepper motors to lose lock. If the latter happened, you could usually hear a "twink" noise or a shriek, and the slides wouldn't return to "park" accurately. Not only that, once one motor slipped, subsequent pieces would always be wrong until the machine was put through the "homing" procedure.

Over the next several hours I sat with the customer's technicians, running test after test, changing various parameters, and adding extra test routines in the software to try to find some kind of discrepancy. I tried everything I could think of, from strictly logical and well-reasoned tests to just plain hunches. The PC was sending out the correct stream of data, my interface card was working perfectly, the lathe carriages were moving correctly, but the random errors remained.

The mechanical designers and fitters were still adamant that it was my problem because they had checked everything else already and nothing else was different from the machines that worked perfectly.

Around 3 a.m. I was ready to give up. I sat yet again in front of the machine, put my elbows on the edge of the cabinet, and put my head in my hands. After a few seconds of utter black despair, silently praying for Scotty to beam me up, I opened my eyes and raised my head. Now, the lathe was a miniature version of a professional slant-bed turning center, and by chance my line of sight happened to coincide exactly with a perpendicular from the bed of the machine through the Dickson Quick-change tool-post.

I could see a glint of light down one of the V-grooves. If I had been a few millimeters to the left or right, up or down, I wouldn't have spotted it, and my misery would have continued.

Calm now, I casually asked for the clamp key and unlocked the tool-holder, lifted it off, and then refitted it. This time, when I tightened the clamp, the daylight appeared at the other V-groove. With a bit of jiggling I could even clamp it with a tiny bit of daylight down both V-grooves. I tried a different tool-holder with similar results. Bingo -- I had the answer. The angles on the V-grooves were fractionally wrong. Instead of the clamp drawing the Vs into perfect engagement on both sides, it would slip to one side or the other, thus offsetting the tool tip from its correct position.

I called over the managing director, who was almost apoplectic by now, and showed him what I had found. It transpired that they had purchased all the tool-posts and tool-holders from a new supplier in Eastern Europe instead of the previous UK supplier, hoping to save a few pounds on each machine. They had to return the whole lot and revert to the UK supplier. A classic example of "spoiling the ship for a ha'porth of tar," and risking a valuable contract by using untested components from an unknown supplier.

As usual, it was the electronic/software guy who solved the problem, leaving the mechanicals looking a bit foolish. Later machines had an eight-station tool turret so the manual Dickson tool-posts were phased out for all but the cheapest models. I still keep in touch with that customer, and more than 20 years later I have just supplied a batch of "mark 2" boards for spares and repairs for the nearly 10,000 machines that were subsequently made and are still running.

This entry was submitted by Rod Hine and edited by Jennifer Campbell.

Rod Hine graduated from Churchill College in Cambridge, England. He worked in satellite communications, meteorological telecoms, and then general automation, machine tools, and industrial control systems. He has also lectured in electronic engineering and cybernetics.

Great story. This is the kind of tale that illustrates how much chance enters into problem solving. I wonder what the odds are against all those conditions lining up perfectly just so the real problem could actually be perceived, let alone what Rod then figured out to solve it.

"The mechanical designers and fitters were still adamant that it was my problem because they had checked everything else already and nothing else was different from the machines that worked perfectly."

As a test engineer, whenever I was called to the line to try to figure out what was wrong with a test set I had built, I would always ask if the operator had run the "golden" units we used for calibration, to see if the test set was working properly and the data was accurate. The answer was often no. It never occurred to them when parts started failing that their process could have shifted - it MUST be something wrong with the test set!

The feeling of despair when a part you made might be causing a major problem in the field is a bad one. Proving to the customer that your product is not at fault is not an easy task. Good job finding the fault under pressure.

In a crisis you can tell who is the most valuable employee: he's the one who isn't pointing fingers but has rolled up his sleeves and is going over schematics, checking out the equipment, and paying no attention to the positioning going on in the rest of the group.

I admire the castigated engineer's diligence and sacrifice in getting to the site and doing whatever it took to get to the bottom of things. I don't think I would have prayed to Scotty, myself, as I have another God, but I guess Scotty sometimes answers prayers, too, it seems. :-)

Unfortunately, staying out of the political fray can sometimes cost one his job or advancement. Large companies can often lose sight of the ones actually making things a success.

I was also impressed with the relationship that lasted 20 years. That is how real business is done!

Nancy, those "golden" parts have certainly been lifesavers for me on a few occasions. Several organizations do have a protocol in place to stop production whenever there are three failures in a row. That does make sense because if the process drifts, why make bad parts, and if the tester has drifted, why ship parts of unknown quality? The operator would note the failed parameter and then run the golden parts while the production folks would check the line. Usually the problem was a process issue on the line.
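The stop rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any organization's actual procedure: the names `LineMonitor`, `record`, and `diagnose` are hypothetical, and the decision logic is reduced to "if the golden parts still pass, suspect the process; otherwise suspect the tester."

```python
from dataclasses import dataclass


@dataclass
class LineMonitor:
    """Counts consecutive test failures (hypothetical helper for illustration)."""
    stop_after: int = 3            # "three failures in a row", per the comment above
    consecutive_failures: int = 0

    def record(self, passed: bool) -> bool:
        """Record one test result; return True when production should stop."""
        if passed:
            self.consecutive_failures = 0   # any pass resets the streak
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.stop_after


def diagnose(golden_results: list[bool]) -> str:
    """After a stop, re-run the known-good 'golden' parts and pick a suspect."""
    if not golden_results:
        raise ValueError("run at least one golden part before diagnosing")
    if all(golden_results):
        return "process"   # tester still passes known-good parts
    return "tester"        # tester rejects known-good parts, so it has drifted


if __name__ == "__main__":
    monitor = LineMonitor()
    for result in [True, False, False, True, False, False, False]:
        if monitor.record(result):
            print("Stop the line; suspect:", diagnose([True, True, True]))
```

Resetting the counter on every pass is what makes this a "three in a row" rule rather than "three in total"; a real line would presumably also log which parameter failed, as the comment notes.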

The reality today is that even when you make a similar effort to solve a problem, you need to prepare to lose a contract or get fired, no matter if you show OTHER PEOPLE to be the cause of a failure or not!

That has been the reality for over 10 years now, and my offspring is in the same type of situation, where the Team Leader (the position) and the Team are blamed for the customer ignoring the requirements in a contract and getting predictable results. Even careful backup notes don't change the management attitude.

The company is a major DR firm handling customer backup issues.

In my personal career, the last place I had a similar situation happen with a positive outcome was at Cray Research 25 years ago.

Now, the people behind the engineer sent to fix the problem get a threat, and the management types at the client company will FIRE the engineer AFTER a problem gets fixed and report that the firing was the solution. I've run into that same situation 3 times in the last 25 years, so when I go out into the Real World to a customer site, I've already updated my resume.
