When M. Elizabeth H. Hammond, MD, talks about the newly released guidelines to enhance the accuracy of HER2 testing for patients with breast cancer, she talks about technical aspects like validation, proficiency testing, and accreditation. But perhaps her most telling comment is this: “Patients are counting on us.” Every pathologist looking through a microscope at tissue on a slide will know what Dr. Hammond is talking about.

Evaluating a tissue section for HER2 positivity is more like doing a frozen section than a special stain, says Dr. Hammond, who is professor and former chair of pathology at LDS Hospital of Intermountain Health Care and professor of pathology at the University of Utah. “A single observation leads to a decision,” she points out. “The choice of a treatment can be based on one reading on one slide, so that reading had better be good.” A positive HER2 test is required to qualify a breast cancer patient for treatment with the anti-HER2 monoclonal antibody trastuzumab (Herceptin, Genentech), which improves survival in patients with HER2-positive tumors but is expensive and can be cardiotoxic (Tan-Chiu E, et al. J Clin Oncol. 2005;23:7811–7819).

Unfortunately, the accuracy of HER2 testing often falls short: It has been estimated that up to 25 percent of such tests are incorrect. To improve this situation, the CAP and the American Society of Clinical Oncology, or ASCO, collaborated on a set of guidelines for HER2 testing that were posted this month on the Web sites of the Journal of Clinical Oncology and Archives of Pathology & Laboratory Medicine and that will appear in print simultaneously in the two journals in January 2007. Overall, the guidelines contain two important messages for pathologists, says Dr. Hammond, who was a co-chair of the panel that drafted the document.

First, she says, “The definitions of positive, equivocal, and negative test results have been refined,” both for immunohistochemistry (IHC) and fluorescence in situ hybridization (FISH). Second, “The way in which quality assurance for HER2 testing will be assessed has been strengthened and made much more specific.” At a symposium at CAP ’06 in September, Dr. Hammond spelled out these changes in detail.

At a separate session at CAP ’06, Antonio C. Wolff, MD, associate professor in the Breast Cancer Program of the Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins in Baltimore, also discussed the guidelines. In an interview with CAP TODAY, he reprised his comments. “The main thrust behind the joint guidelines project was the realization that, at a time when cancer treatment has become so complex, and we are trying to individualize patient treatment, multidisciplinary collaboration among professional societies is critical, in addition to the usual collaboration among individual physicians in daily clinical practice,” said Dr. Wolff, who was a co-chair of the guidelines panel.

Behind the guideline effort, Dr. Wolff says, was “the realization by all of us that for one specific treatment—trastuzumab—we were relying on a single measurement that was being used as the sole determinant for its selection.” Even before trastuzumab was approved in 1998, he says, oncologists and pathologists realized that “accurate testing for HER2 is essential to avoid exposing patients with false-positive results to a costly and potentially toxic placebo while avoiding denying potential life-saving therapy to patients who have a false-negative result.” Many laboratories are testing for HER2 and using various assays to do so. “Until now physicians have not paid sufficient attention to the need for standardization and reproducibility and, most important, accuracy,” Dr. Wolff continues. “The whole process of validation of assays and proficiency testing has not been attended to.” Ultimately these concerns drove the guidelines project.

(A less-detailed document that Dr. Hammond calls “harmonious” with the ASCO/CAP guideline was produced by a panel organized by the National Comprehensive Cancer Network; Drs. Hammond and Wolff sat on that panel also [Carlson RW, et al. J Natl Compr Canc Netw. 2006; 4(suppl 3): S1–S22].)

“These guidelines are a big step forward,” says Allen M. Gown, MD, medical director and chief pathologist at PhenoPath Laboratories, Seattle, who was not on the panel. They are unique in two ways, he says. First, pathologists formulated them in partnership with oncologists. “These kinds of tests require close collaboration between pathologists and oncologists,” Dr. Gown says. “We have known this for many years, but the guidelines put it in a formalized way.”

Second, Dr. Gown says, “One of the biggest factors that affects HER2 testing is preanalytical variables. Probably the most critical is fixation time. The guidelines require six to 24 hours of fixation. This is a historic first for pathologists.” In no other situation are pathologists required to observe a pre-specified fixation time. “It will have a salutary effect on other kinds of testing as well,” Dr. Gown predicts, such as estrogen receptor, or ER, testing. There are good indications that similar error rates are occurring in ER testing in the U.S., he says, adding that both ER and HER2 testing are affected by inadequate fixation and that six hours is the minimum fixation time needed for ER testing as well.

Speaking at the CAP ’06 session, Dr. Hammond noted that underlying the guidelines was the success of trastuzumab therapy. “Trastuzumab is a great example of targeted therapy,” she said. “It is based on a test that selects patients most likely to benefit. In this way, it foreshadowed other cancer drugs like imatinib.” Initially, trastuzumab was approved for treating patients with metastatic breast cancer, both as second-line monotherapy after failure of standard therapy, and as first-line therapy with paclitaxel (Slamon DJ, et al. N Engl J Med. 2001;344:783–792). More recently, results from adjuvant breast cancer trials demonstrated the benefit of trastuzumab for early-stage breast cancer patients (Romond EH, et al. N Engl J Med. 2005;353:1673–1684).

Dr. Wolff noted that five clinical trials evaluating trastuzumab in an adjuvant setting, in addition to local therapy, have had positive outcomes. “We can and are using it in the adjuvant setting in addition to local therapy with surgery and possibly radiation in patients with larger tumors or node-positive, early-stage HER2-positive breast cancer,” he told CAP TODAY. He finds that in most cases this use is being reimbursed. Genentech announced Nov. 16 that the FDA had approved this application.

Thus, the promise of anti-HER2 therapy: “Trastuzumab decreases the relative risk of recurrence by about 50 percent when added to adjuvant cytotoxic chemotherapy in patients with HER2-positive breast cancer,” Dr. Wolff concludes. However, if this promise is to be realized, testing has to be accurate. Emphasizing the importance of the pathologist’s work, Dr. Wolff says the sequence of evaluating breast cancer patients has reversed somewhat. “We now first consider predictive markers—ER/PR and HER2—to determine whether the patient could benefit from specific therapies. Then we decide whether to administer those treatments based on traditional features, such as nodal status, tumor size, and tumor grade.” ASCO and the National Comprehensive Cancer Network already recommend HER2 testing for all newly diagnosed breast cancers. Expansion of trastuzumab therapy makes accurate HER2 testing even more important.

Unfortunately, Dr. Hammond said, the clinical trials that showed the benefit of trastuzumab in early breast cancer also showed significant variation in results for both IHC and FISH in community laboratories, with many false-positive results (Perez EA, et al. J Clin Oncol. 2006;24:3032–3038; Reddy JC, et al. Clin Breast Cancer. 2006;7:153–157). “A problem is caused by having many testing methods for both gene amplification and cell membrane protein,” Dr. Hammond said. Four FDA-approved methods are primarily used—IHC with Dako’s HercepTest or Ventana Pathway and FISH with Vysis’ PathVysion or Ventana Inform. These methods provide specific protocols and reagents for HER2 testing. There are countless antibodies available for IHC and some probes for FISH that are not FDA approved and not configured in reagent sets to allow easy standardized testing. Therefore, she noted, “there is lots of variation in how we pathologists are doing these tests.”

One of the reasons pathologists have had difficulties with these tests, Dr. Hammond told CAP TODAY, is that many have not recognized how different they are from usual IHC testing used as a special stain, where non-FDA-approved antibody use is commonly acceptable. “Testing for HER2 is much more like doing a frozen section on a piece of tissue than like doing a special stain, because the result stands alone. We are not used to doing that,” Dr. Hammond says.

Lack of appreciation for the need for a higher level of accuracy in HER2 testing is reflected in the data from community laboratories: The average false-positive rate for HER2 protein overexpression by IHC is 18 percent (with a range of three percent to 50 percent); for HER2 gene amplification by FISH it is 13 percent (range, five percent to 23 percent), according to Dr. Hammond. Error rates are even higher in small-volume laboratories and laboratories not using an automated stainer or not using an FDA-approved kit. For instance, Perez and colleagues found a 25 percent false-positive rate for laboratories using an IHC method other than HercepTest. Patients with false-positive results are getting a costly drug ($40,000 to $100,000 per patient) that has potential cardiotoxicity, with no chance of benefit. And the smaller fraction of patients with false-negative results are not getting a drug with a high probability of benefit.

Even among laboratories using an FDA-approved kit, about half say they vary the method from the approved version. “If you vary the FDA method,” Dr. Hammond pointed out, “it is no longer FDA-approved and becomes a test whose accuracy is the responsibility of the laboratory director. According to CLIA regulations, before you offer a test, including an FDA-approved test, you must validate its accurate performance in your laboratory. In addition, whenever you modify a test, you must prove that your modification has not altered its accuracy. Most pathologists who are varying FDA-approved tests have not done the necessary validation to show that is the case.”

Pathologists are varying from FDA-approved methods in two predominant ways, according to Dr. Hammond. First, many are not using formalin fixation, as specified in package inserts. “The problem of formalin fixation is the biggest issue,” she says. “There are a number of proprietary fixatives where we don’t actually know what is in them. If you use a fixative other than formalin, you must show that it performs the same as formalin-fixed tissue.” To show that their fixative works, some laboratories are automatically saving a small sample of each breast tumor in formalin and sending the rest through their normal process. Another significant problem is the emergence of new automated instruments for rapid tissue processing that rely on proprietary fixatives and rapid processing times. “Companies claim that performance on their instruments is the same as formalin-based tissue processing with conventional processing times,” Dr. Hammond says. “But the lab director is responsible for proving that.” In addition, some laboratories are using antigen retrieval that is more aggressive than, or significantly different in methodology from, that specified in the FDA-approved protocol. Such modifications must also be validated to show equivalence with the FDA-approved method.

Validation data documenting initial validation and validation of any modifications must be kept for inspection. Another option is to send out HER2 tests. “I think some pathologists will decide that the aggravation associated with validating these tests is too high,” Dr. Hammond told CAP TODAY.

Validation can be done using tissue microarrays. Dr. Hammond and her colleagues created an array with tissue from 200 cases of breast cancer. “We know the amplification status of those samples and we know their staining pattern with a validated FDA-approved method,” she says. “We could use that array to validate a new method of HER2 testing in our lab if we chose to do so.”

Persistent problems in HER2 testing demonstrated that existing quality assurance systems were not adequate. Laboratory accreditation standards for HER2 testing were primarily educational (though generic requirements for IHC and FISH have been in place), and proficiency testing has been voluntary—only 25 percent to 30 percent of laboratories performing HER2 testing participate. “Our QA procedures are not solving the problem,” Dr. Hammond says.

But they could. Longitudinal data from proficiency testing show that participating laboratories achieve consistent improvement with time. “The number of people achieving satisfactory concordance increased every year of the survey,” Dr. Hammond said. “So we believe if proficiency testing is mandatory, there will be marked improvement.”

Which brings us to the specific elements of the new guideline. There are three categories: an algorithm is provided for choosing and performing a test; QA elements are specified and monitored; and laboratories and pathologists will be evaluated continually with an upgraded, mandated accreditation process that includes the possibility of suspension. (These elements are listed in tables in the online document. Many elements, including reporting requirements, are also in the summary of a 2002 Strategic Science Symposium on this topic [Zarbo RJ, Hammond ME. Arch Pathol Lab Med. 2003;127:549–553].)

In the algorithm, either IHC or FISH assays are acceptable, as long as they are validated FDA-approved kits done according to the approved protocol or modified methods validated against FDA-approved methods in the laboratory doing them. Each type of test has three categories of results: positive, equivocal, or negative. For IHC, “positive” is defined as 3+ expression; for FISH, it is a HER2 gene/CEP17 probe ratio of greater than 2.2. “Equivocal” means IHC 2+ or FISH ratio of 1.8–2.2. “Negative” is defined as IHC 0 or 1+ or FISH ratio <1.8. (For FISH systems without an internal control probe, the respective values are HER2 gene copy number >6, 4–6, and <4 signals/nucleus.) For an IHC equivocal result, the specimen must be retested with a validated assay for gene amplification. For a FISH equivocal result, the laboratory must either count additional cells or retest with a validated IHC assay.
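The cutoffs in the algorithm can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted above, not software from the guideline: the names `Her2Result`, `classify_ihc`, `classify_fish_ratio`, and `classify_fish_copy_number` are hypothetical, and the sketch assumes the 1.8 and 2.2 ratio boundaries (and the 4 and 6 copy-number boundaries) fall inside the equivocal range.

```python
from enum import Enum


class Her2Result(Enum):
    POSITIVE = "positive"
    EQUIVOCAL = "equivocal"
    NEGATIVE = "negative"


def classify_ihc(score: int) -> Her2Result:
    """Map an IHC score (0, 1, 2, or 3) to a result category."""
    if score not in (0, 1, 2, 3):
        raise ValueError("IHC score must be 0, 1, 2, or 3")
    if score == 3:
        return Her2Result.POSITIVE
    if score == 2:
        return Her2Result.EQUIVOCAL
    return Her2Result.NEGATIVE  # 0 or 1+


def classify_fish_ratio(ratio: float) -> Her2Result:
    """Classify a HER2/CEP17 ratio: >2.2 positive, 1.8-2.2 equivocal, <1.8 negative."""
    if ratio < 0:
        raise ValueError("ratio cannot be negative")
    if ratio > 2.2:
        return Her2Result.POSITIVE
    if ratio >= 1.8:  # assumption: both boundaries count as equivocal
        return Her2Result.EQUIVOCAL
    return Her2Result.NEGATIVE


def classify_fish_copy_number(signals_per_nucleus: float) -> Her2Result:
    """For FISH without a control probe: >6 positive, 4-6 equivocal, <4 negative."""
    if signals_per_nucleus < 0:
        raise ValueError("copy number cannot be negative")
    if signals_per_nucleus > 6:
        return Her2Result.POSITIVE
    if signals_per_nucleus >= 4:  # assumption: both boundaries count as equivocal
        return Her2Result.EQUIVOCAL
    return Her2Result.NEGATIVE
```

An equivocal result from either function is not an endpoint: as described above, an IHC equivocal case goes on to a validated gene amplification assay, and a FISH equivocal case requires counting additional cells or retesting with validated IHC.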

One caveat to these definitions is that interpretation of IHC 3+ requires continuous dark membrane staining in at least 30 percent of cells, rather than the initial FDA level of 10 percent (the FDA has accepted this revision). Another caveat: The scheme assumes that a laboratory has established a 95 percent concordance rate between FISH and IHC for both positive and negative categories. “You must validate your method in your lab,” Dr. Hammond emphasized, even if you are performing the exact FDA-approved protocol with an approved kit. If the laboratory cannot achieve 95 percent concordance for any category, that category must be interpreted as “equivocal” and sent to FISH. (Concordance rates are shown in the guideline.)
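A minimal Python sketch of the concordance caveat follows. It assumes, for illustration only, that concordance for a category is the fraction of a laboratory's IHC calls in that category that FISH confirms on the same cases; the guideline's own tables define the actual calculation. The function names and the string labels are hypothetical.

```python
from typing import Sequence


def concordance_rate(ihc_calls: Sequence[str], fish_calls: Sequence[str],
                     category: str) -> float:
    """Fraction of cases called `category` by IHC that FISH also calls `category`.

    Assumption: calls are paired case by case and use the same labels,
    e.g. "positive" or "negative".
    """
    if len(ihc_calls) != len(fish_calls):
        raise ValueError("IHC and FISH call lists must be paired case by case")
    paired = [f for i, f in zip(ihc_calls, fish_calls) if i == category]
    if not paired:
        raise ValueError(f"no IHC calls in category {category!r}")
    agreeing = sum(1 for f in paired if f == category)
    return agreeing / len(paired)


def reportable_category(category: str, rate: float,
                        threshold: float = 0.95) -> str:
    """Downgrade a category to 'equivocal' when concordance misses the threshold."""
    return category if rate >= threshold else "equivocal"
```

For example, a laboratory whose IHC-positive calls are confirmed by FISH in 18 of 20 validation cases has 90 percent concordance for that category, so under this sketch its IHC-positive results would be reported as equivocal and sent to FISH.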

A third caveat is that genomic heterogeneity on FISH still needs to be sorted out. “Not all tumor cells in a sample may show the same level of amplification of the HER2 gene,” Dr. Hammond explains. “When you do the counting, you need to have specific rules to make sure ratios come out the same for every observer.” How do you choose areas to count? Do you choose only areas that appear amplified? Or random areas? Do you pick “hot spots”? “The College has agreed to hold a consensus conference to try to iron out that issue,” Dr. Hammond says.

That there is an equivocal category for FISH is based on extensive clinical evidence. Still, the specific range called equivocal in the guideline raises a problem. In the FDA’s approval of trastuzumab, treatment was approved for ratios of 2.0 or above, yet the guideline includes the 2.0–2.2 region as “equivocal.” This puts clinicians in a difficult position. “We specifically said in the guideline that the package insert stands,” Dr. Hammond says. “The data we have about low ratios is very poor,” she continues. “We want to draw attention to the fact that there are still gaps in the evidence.”

Dr. Hammond is actually concerned that there is a bigger category of uncertainty for FISH testing than is reflected in the guideline, based on the coefficient of variation of FISH at lower numbers. “We might end up with an equivocal range from 1.8 to 3.0,” she speculates. “We have asked clinicians who have data from clinical trials to go back and look at FISH ratios and outcomes to see if the equivocal category should be larger than we specified. We don’t know if they will follow through.” Uncertainty exists also about the outcomes of patients with discordant IHC/FISH results.

A number of exclusion criteria are specified in the guideline. For IHC, for instance, an assay would be rejected if controls are not as expected or if there is strong membrane staining of normal internal ducts in the tumor specimen. For FISH, an assay would be rejected if controls are not as expected, if the observer can’t find and count at least two areas of invasive tumor, or if more than 25 percent of signals are unscorable due to weak staining.

In the quality assurance section, standard criteria seek to specify and control elements that cause variation. For instance, the guideline mandates formalin fixation for six to 48 hours. Time to fixation of tissue and duration of fixation should be recorded if available. “Pathologists don’t typically do this now,” Dr. Hammond noted.

There will be two mandatory proficiency testing events per year, consisting of 20-case tissue microarrays for FISH and 80-case arrays for IHC. Satisfactory performance will require ≥90 percent correct responses on graded challenges. Unsatisfactory performance will result in suspension of HER2 testing accreditation if the testing problem is not corrected. Asked from the floor about the source of the CAP’s authority to suspend laboratories’ HER2 testing accreditation, Dr. Hammond said it derives from the College’s deemed status with the Centers for Medicare and Medicaid Services. “At the panel meetings, FDA agreed,” she added.
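The 90 percent bar translates into simple arithmetic. The Python sketch below is illustrative only: `is_satisfactory` is a hypothetical name, and it assumes every case on an array is a graded challenge, which the article does not state.

```python
def is_satisfactory(correct: int, graded: int, required_percent: int = 90) -> bool:
    """Return True when correct responses reach the required share of graded challenges.

    Integer arithmetic avoids floating-point rounding at the exact cutoff.
    """
    if graded <= 0:
        raise ValueError("there must be at least one graded challenge")
    if not 0 <= correct <= graded:
        raise ValueError("correct responses must be between 0 and graded")
    return correct * 100 >= required_percent * graded
```

Under that assumption, a laboratory would need at least 18 correct responses on a 20-case FISH array and at least 72 on an 80-case IHC array.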

These measures will be introduced gradually. “It will take some time to train inspectors and make sure that labs being inspected know what they will be required to do,” Dr. Hammond told CAP TODAY. She advised interested laboratorians to watch the CAP Web site for information.

Speaking at the CAP ’06 session on how to improve one’s ability to interpret IHC HER2 tests, Kenneth J. Bloom, MD, described the underlying biology of HER2 protein. It is made in the cytoplasm and transported to the membrane, so cytoplasmic staining is not an artifact, said Dr. Bloom, who is chief medical officer and medical director of Clarient Inc., Aliso Viejo, Calif. At the membrane in normal breast epithelial cells, HER2 aggregates in clusters of about a thousand receptors, which are located predominantly in the basolateral aspect of the cells. As HER2 is upregulated, these aggregates become larger and eventually coalesce, giving the appearance of linear membrane expression (Nagy P, et al. J Cell Sci. 1999;112:1733–1741). In IHC, it is these membrane aggregates that are scored. Intensity and distribution of HER2 immunostain correlate with the number of receptors on the cell surface; thus, staining is a surrogate for protein distribution. A positive IHC result requires circumferential staining. Dr. Bloom cautioned that cells cut tangentially can produce what looks like circumferential staining. To guard against this, he advised, “Look for the nucleolus.”

In the clinical trials that led to approval of trastuzumab, measurement of HER2 overexpression was demonstrated to predict a higher response rate. An IHC assay was used to qualify patients, with scoring criteria that Dr. Bloom said were “arbitrarily developed” but based on preclinical models. The IHC results obtained with this scoring system correlated with clinical efficacy. Relative risk of progression was 0.42 for IHC 3+ specimens and 0.82 for IHC 2+ samples when compared against similar patients who did not receive trastuzumab. When the same tissue specimens were scored retrospectively with FISH, relative risk was 0.44 for FISH-positive samples and 0.66 for those that were FISH negative. Dr. Bloom called results with the two assays “virtually identical” and said “either test is adequate for qualifying these cases if done well.” Correlations with response rates were even more striking when trastuzumab was used as single-agent, first-line therapy: 35 percent and 34 percent for IHC 3+ and FISH-positive specimens, respectively, but zero percent and seven percent for IHC 2+ and FISH-negative specimens (Vogel CL, et al. J Clin Oncol. 2002;20:719–726).

To take advantage of these correlations, each laboratory needs to calibrate the FDA-approved scoring system on specimens with known amplification status, Dr. Bloom emphasized, because accuracy requires placing the cutoffs between 1+/2+ and between 2+/3+ correctly, or at least consistently. “That’s why we must use the FDA approach,” he said. “Different staining conditions may produce different relative curves.” For instance, it has been shown that different antibodies will likely lead to different staining patterns (Press M, et al. Canc Res. 1994;54:2771–2777).

To interpret the HercepTest assay accurately, Dr. Bloom advised first reviewing the H&E slide to make sure preanalytic conditions are met. Choose a block with well-preserved invasive tumor and surrounding benign tissue. For this purpose, Dr. Bloom said, “Give me an interface block.” Review positive and negative controls. Are 0, 1+, and 3+ cell lines stained as expected? Are benign lobular ductal units negative or weakly 1+? (The exception is apocrine metaplasia.) If the slide appears brown, the score will most likely be 3+ or 1+. The lower score is more likely if nuclear grade is low and there are well-formed glands.

Next, look for chickenwire appearance under 10x to verify complete membrane staining. “Look for uniformity,” Dr. Bloom said. “Strong staining on 10 percent of cells is too low for 3+. You need a strong similar staining pattern throughout the tumor.” The combination of chickenwire appearance and uniform staining predicts gene amplification with 98 percent accuracy, while chickenwire appearance without uniformity has an accuracy of only 68 percent.

Commenting on the exclusion criteria, Dr. Bloom noted that the time limit for fixation means no more “Friday cases”—cases cut on Friday and left over the weekend in fixative. He also pointed out that needle core biopsy cases can be “particularly problematic,” because all three IHC artifacts—edge, crush, and retraction—are especially prominent in needle core biopsies.

“IHC and FISH are both excellent tests, but they are complementary and do not replace each other,” Dr. Bloom said. For HER2 evaluation, accurate IHC is critical, he said, and added, “Think of it as a frozen section rather than a standard IHC test.”

Dr. Gown spoke with CAP TODAY about IHC testing for HER2 from the viewpoint of a reference laboratory that does a lot of both IHC and FISH on samples from outside institutions that have different fixation protocols. “Even a six- to 24-hour window means some people fix for eight hours and some for 23 hours and there will be big differences,” Dr. Gown says. “And there will still be differences in what people call standard neutral buffered formalin.”

To accommodate these differences, Dr. Gown began advocating in 1999 a normalization procedure in which he scores the staining on invasive tissue, then subtracts staining on normal breast epithelium, if it is present. “What we have found,” Dr. Gown says, “is extremely high concordance between IHC and FISH when we do this normalization.” He and his colleagues published comparative data on 100 cases from one institution in 1999 (Jacobs TW, et al. J Clin Oncol. 1999;17:1983–1987). At the San Antonio Breast Cancer Symposium earlier this month, they presented similar data on almost 7,000 cases from more than 100 institutions across the United States. “Using a non-normalized scoring system, 32 percent of IHC 3+ cases were FISH-non-amplified, whereas using the normalized scoring system, only six percent of IHC 3+ cases proved to be FISH-non-amplified,” they reported. Almost all specimens in this latter subgroup had HER2/CEP17 ratios <3.

Though this normalization procedure works in a single reference laboratory, Dr. Hammond says, its use is not advocated in the new guideline. “If a laboratory chooses to try normalization,” she says, “it must document that results of this process are 95 percent concordant with routine validated methods performed on formalin-fixed tissue and interpreted without normalization.”

Practical ideas for performing FISH measurement of HER2 amplification were presented by David G. Hicks, MD, vice chair in the Department of Pathology and Laboratory Medicine and director of anatomic pathology at the Roswell Park Cancer Institute. “HER2 testing begins at the time the tissue is removed from the patient,” Dr. Hicks said. At Roswell Park, they do an immediate gross assessment and rapid fixation. “It took some doing to get there,” Dr. Hicks said. He noted that the same processing factors optimize FISH as IHC. “All levels of standardization apply equally to FISH testing as well as to IHC,” he said. He has published technical considerations of doing FISH (Hicks DG, Tubbs RR. Hum Pathol. 2005;36:250–261).

To score a sample with FISH, first confirm that probe signals are present in at least 75 percent of tumor cell nuclei, Dr. Hicks said, to ensure adequate hybridization and enzymatic digestion of tissue. Count 20 to 30 randomly selected nuclei from two areas of invasive tumor. “Only invasive tumor cell nuclei are evaluated for FISH,” he stressed. Cells must be non-overlapping with good nuclear borders. Calculate the average number of HER2 genes per tumor cell or the ratio of HER2 genes to CEP17 probes. (Use of the cutoffs in the guideline leads to almost identical classification of tumors with single- or dual-color FISH assays.) A pathologist doesn’t need to do the counting, but a pathologist must confirm that cells used are in a region of invasive cancer.
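The two calculations Dr. Hicks describes can be sketched in Python as follows. This is an illustration under stated assumptions, not a validated scoring tool: the `NucleusCount` type and function names are hypothetical, the ratio is taken as total HER2 signals divided by total CEP17 signals across the counted nuclei, and the 20-nucleus minimum reflects the lower end of the range he gave.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class NucleusCount:
    """Signal counts for one non-overlapping invasive tumor cell nucleus."""
    her2_signals: int
    cep17_signals: int


def _check_minimum(nuclei: Sequence[NucleusCount], minimum: int) -> None:
    if len(nuclei) < minimum:
        raise ValueError(f"count at least {minimum} nuclei; got {len(nuclei)}")


def her2_cep17_ratio(nuclei: Sequence[NucleusCount], minimum: int = 20) -> float:
    """Ratio of total HER2 signals to total CEP17 signals (dual-color assay)."""
    _check_minimum(nuclei, minimum)
    total_cep17 = sum(n.cep17_signals for n in nuclei)
    if total_cep17 == 0:
        raise ValueError("no CEP17 signals counted; hybridization may have failed")
    return sum(n.her2_signals for n in nuclei) / total_cep17


def mean_her2_copies(nuclei: Sequence[NucleusCount], minimum: int = 20) -> float:
    """Average HER2 signals per nucleus (single-color assay)."""
    _check_minimum(nuclei, minimum)
    return sum(n.her2_signals for n in nuclei) / len(nuclei)
```

For instance, 20 nuclei that each show four HER2 signals and two CEP17 signals give a ratio of 2.0 and a mean of 4.0 HER2 copies per nucleus, both of which land in the guideline's equivocal range.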

FISH requires a dark-field microscope, under which landmarks that pathologists depend on are less clearly visualized, Dr. Hicks acknowledged. However, he said, “With practice, topographic and architectural features of tissue sections viewed by fluorescence microscopy will become familiar. In other words, we don’t have to be afraid of the dark.” He believes there is a potential role for image analysis in FISH as well.

Experience with HER2 testing has shown that “quantitative assessment of assay results, especially for prognosis and predicting therapeutic response, will be increasingly important,” Dr. Hicks said. In this capacity, he thinks FISH has the edge because it is quantitative. However, he conceded, image analysis decreases subjectivity with IHC and improves reproducibility and concordance with FISH results. Another lesson is that standardization of tissue handling is critical for biomarker detection.

Dr. Hicks foresees the possibility that, at some time in the future, a panel of FISH biomarkers will be used in evaluating the newly diagnosed breast cancer patient. Data presented at the San Antonio symposium last year showed that tumors from many HER2-positive patients also demonstrate co-amplification of the C-MYC oncogene. These patients appear to have a more aggressive clinical course to their disease and to gain the greatest benefit from cytotoxic chemotherapy in combination with trastuzumab. In another report from the symposium, co-amplification of the HER2 and TOP2A genes also appeared to be predictive for those patients who will have a good response to trastuzumab in combination with anthracyclines.

FISH assays can be multiplexed and can include as many as four or five differently colored probes on one tissue section, he noted.

The CAP and ASCO realize that HER2 is “just the beginning,” Dr. Wolff told CAP TODAY. “We will have an increasing number of predictive assays used as the sole determinant of therapy selection in the clinic. It is our responsibility that they be done in the most accurate way possible.” Ultimately, he says, the HER2 project can serve as a template for how these new assays should be brought into clinical practice. If the tests are to be effective, oncologists and pathologists will have to collaborate in developing standard methods and quality assurance tools. The collaboration between ASCO and CAP is potentially just starting.

The most immediate application of the HER2 guideline process may be to immunohistochemistry assays for estrogen and progesterone receptors. “These assays have now been available for 15 years or so,” Dr. Wolff says. “Concern about their accuracy remains a major issue. It is interesting that it took the determination of the clinical approval of trastuzumab and the use of such a costly and potentially toxic drug to trigger an effort that should have happened in the early 1990s. We may go back to ER/PR before we go forward with newer assays.”