Breast cancer signatures

Abstract

The invention relates to the identification and use of gene expression profiles, or patterns, suitable for identification of breast cancer patient populations with different survival outcomes. The gene expression profiles may be embodied in nucleic acid expression, protein expression, or other expression formats, and may be used in the study and/or determination of the prognosis of a patient, including breast cancer survival.

Claims

8. A method to determine the prognosis or clinical course and aggressiveness of breast cancer of a subject comprising assaying for the expression level(s) of one or more genes in Table 2, 3, 4, 6, 7, 8, or 9 from a breast cancer cell sample from the subject.

9. The method of claim 8 wherein said assaying comprises preparing RNA, optionally labeled, from said sample and optionally converting said RNA into cDNA, optionally labeled.

10. The method of claim 9 wherein said RNA is not labeled and used for quantitative PCR.

11. The method of claim 9 wherein said assaying comprises using an array.

12. The method of claim 8 wherein said sample is a ductal lavage or fine needle aspiration or FFPE breast tissue sample.

13. The method of claim 12 wherein said sample is microdissected to isolate one or more cells that are breast cancer cells or suspected of being breast cancer cells.

14. The method of claim 10 wherein genes from Table 4 are used and further comprising determination of the ratio of the expression of an underexpressed gene to the expression of an overexpressed gene as an indicator of prognosis or clinical course and aggressiveness of breast cancer in said subject.

15. A method of determining prognosis of a subject having breast cancer, said method comprising: assaying for the expression level(s) of one or more genes in Table 2, 3, 4, 6, 7, 8, or 9 from a breast cancer cell sample from said subject.

16. The method of claim 15 wherein said assaying comprises preparing RNA, optionally labeled, from said sample and optionally converting said RNA into cDNA, optionally labeled.

17. The method of claim 16 wherein said RNA is not labeled and used for quantitative PCR.

18. The method of claim 15 wherein said assaying comprises using an array.

19. The method of claim 15 wherein said sample is a ductal lavage or fine needle aspiration or FFPE breast tissue sample.

20. The method of claim 19 wherein said sample is microdissected to isolate one or more cells that are breast cancer cells or suspected of being breast cancer cells.

21. The method of claim 17 wherein genes from Table 4 are used and further comprising determination of the ratio of the expression of an underexpressed gene to the expression of an overexpressed gene as an indicator of prognosis in said subject.

22. A method to determine the survival outcome of a breast cancer afflicted subject comprising assaying a sample of breast cancer cells of said subject for the expression level(s) of one or more genes listed in Table 2, 3, 4, 6, 7, 8, or 9.

23. The method of claim 22 wherein said assaying comprises preparing RNA, optionally labeled, from said sample and optionally converting said RNA into cDNA, optionally labeled.

24. The method of claim 23 wherein said RNA is not labeled and used for quantitative PCR.

25. The method of claim 22 wherein said assaying comprises using an array.

26. The method of claim 22 wherein said sample is a ductal lavage or fine needle aspiration or FFPE breast tissue sample.

27. The method of claim 26 wherein said sample is microdissected to isolate one or more cells that are breast cancer cells or suspected of being breast cancer cells.

28. The method of claim 24 wherein genes from Table 4 are used and further comprising determination of the ratio of the expression of an underexpressed gene to the expression of an overexpressed gene as an indicator of prognosis in said subject.

Description

RELATED APPLICATIONS

[0001] This application claims benefit of priority from U.S. Provisional Patent application No. 60/453,006, filed Mar. 7, 2003, which is hereby incorporated by reference in its entirety as if fully set forth.

FIELD OF THE INVENTION

[0002] The invention relates to the identification and use of gene expression profiles, or patterns, with clinical relevance to breast cancer. In particular, the invention provides the identities of genes that are correlated with breast cancer recurrence, cancer metastasis, and patient survival. The gene expression profiles, whether embodied in nucleic acid expression, protein expression, or other expression formats, may be used to predict breast cancer recurrence and survival of subjects afflicted with breast cancer. The profiles may also be used in the study and/or diagnosis of breast cancer cells and tissue as well as for the study and/or determination of prognosis of a patient. When used for diagnosis or prognosis, the profiles are used to determine the treatment of breast cancer based upon the likelihood of recurrence, metastases, and life expectancy.

BACKGROUND OF THE INVENTION

[0003] Breast cancer is by far the most common cancer among women. Each year, more than 180,000 and 1 million women in the U.S. and worldwide, respectively, are diagnosed with breast cancer. Breast cancer is the leading cause of death for women between ages 50-55, and is the most common non-preventable malignancy in women in the Western Hemisphere. An estimated 2,167,000 women in the United States are currently living with the disease (National Cancer Institute, Surveillance Epidemiology and End Results (NCI SEER) program, Cancer Statistics Review (CSR), www-seer.ims.nci.nih.gov/Publications/CSR1973 (1998)). Based on cancer rates from 1995 through 1997, a report from the National Cancer Institute (NCI) estimates that about 1 in 8 women in the United States (approximately 12.8 percent) will develop breast cancer during her lifetime (NCI's Surveillance, Epidemiology, and End Results Program (SEER) publication SEER Cancer Statistics Review 1973-1997). Breast cancer is the second most common form of cancer, after skin cancer, among women in the United States. An estimated 250,100 new cases of breast cancer are expected to be diagnosed in the United States in 2001. Of these, 192,200 new cases of more advanced (invasive) breast cancer are expected to occur among women (an increase of 5% over last year), 46,400 new cases of early stage (in situ) breast cancer are expected to occur among women (up 9% from last year), and about 1,500 new cases of breast cancer are expected to be diagnosed in men (Cancer Facts & Figures 2001 American Cancer Society). An estimated 40,600 deaths (40,300 women, 400 men) from breast cancer are expected in 2001. Breast cancer ranks second only to lung cancer among causes of cancer deaths in women. Nearly 86% of women who are diagnosed with breast cancer are likely to still be alive five years later, though 24% of them will die of breast cancer after 10 years, and nearly half (47%) will die of breast cancer after 20 years.

[0004] Every woman is at risk for breast cancer. Over 70 percent of breast cancers occur in women who have no identifiable risk factors other than age (U.S. General Accounting Office. Breast Cancer, 1971-1991: Prevention, Treatment and Research. GAO/PEMD-92-12; 1991). Only 5 to 10% of breast cancers are linked to a family history of breast cancer (Henderson IC, Breast Cancer. In: Murphy G P, Lawrence W L, Lenhard R E (eds). Clinical Oncology. Atlanta, Ga.: American Cancer Society; 1995:198-219).

[0005] Each breast has 15 to 20 sections called lobes. Within each lobe are many smaller lobules. Lobules end in dozens of tiny bulbs that can produce milk. The lobes, lobules, and bulbs are all linked by thin tubes called ducts. These ducts lead to the nipple in the center of a dark area of skin called the areola. Fat surrounds the lobules and ducts. There are no muscles in the breast, but muscles lie under each breast and cover the ribs. Each breast also contains blood vessels and lymph vessels. The lymph vessels carry colorless fluid called lymph, and lead to the lymph nodes. Clusters of lymph nodes are found near the breast in the axilla (under the arm), above the collarbone, and in the chest.

[0006] Breast tumors can be either benign or malignant. Benign tumors are not cancerous, they do not spread to other parts of the body, and are not a threat to life. They can usually be removed, and in most cases, do not come back. Malignant tumors are cancerous, and can invade and damage nearby tissues and organs. Malignant tumor cells may metastasize, entering the bloodstream or lymphatic system. When breast cancer cells metastasize outside the breast, they are often found in the lymph nodes under the arm (axillary lymph nodes). If the cancer has reached these nodes, it means that cancer cells may have spread to other lymph nodes or other organs, such as bones, liver, or lungs.

[0007] Major and intensive research has been focussed on early detection, treatment and prevention. This has included an emphasis on determining the presence of precancerous or cancerous ductal epithelial cells. These cells are analyzed, for example, for cell morphology, for protein markers, for nucleic acid markers, for chromosomal abnormalities, for biochemical markers, and for other characteristic changes that would signal the presence of cancerous or precancerous cells. This has led to various molecular alterations that have been reported in breast cancer, few of which have been well characterized in human clinical breast specimens. Molecular alterations include presence/absence of estrogen and progesterone steroid receptors, HER-2 expression/amplification (Mark H F, et al. HER-2/neu gene amplification in stages I-IV breast cancer detected by fluorescent in situ hybridization. Genet Med; 1(3):98-103 1999), Ki-67 (an antigen that is present in all stages of the cell cycle except G0 and used as a marker for tumor cell proliferation), and prognostic markers (including oncogenes, tumor suppressor genes, and angiogenesis markers) like p53, p27, Cathepsin D, pS2, multi-drug resistance (MDR) gene, and CD31.

[0008] van't Veer et al. (Nature 415:530-536, 2002) describe gene expression profiling of clinical outcome in breast cancer. They identified genes expressed in breast cancer tumors, the expression levels of which correlated either with patients afflicted with distant metastases within 5 years or with patients that remained metastasis-free after at least 5 years.

[0009] Ramaswamy et al. (Nature Genetics 33:49-54, 2003) describe the identification of a molecular signature of metastasis in primary solid tumors. The genes of the signature were identified based on gene expression profiles of 12 metastatic adenocarcinoma nodules of diverse origin (lung, breast, prostate, colorectal, uterus) compared to expression profiles of 64 primary adenocarcinomas representing the same spectrum of tumor types from different individuals. A 128 gene set was identified.

[0010] Both of the above described approaches, however, utilize heterogeneous populations of cells found in a tumor sample to obtain information on gene expression patterns. The use of such populations may result in the inclusion or exclusion of multiple genes that are differentially expressed in cancer cells. The gene expression patterns observed by the above described approaches may thus provide little confidence that the differences in gene expression are meaningfully associated with breast cancer recurrence or survival.

[0011] Citation of documents herein is not intended as an admission that any is pertinent prior art. All statements as to the date or representation as to the contents of documents are based on the information available to the applicant and do not constitute any admission as to the correctness of the dates or contents of the documents.

SUMMARY OF THE INVENTION

[0012] The present invention relates to the identification and use of gene expression patterns (or profiles or "signatures") which are clinically relevant to breast cancer. In particular, the identities of genes that are correlated with breast cancer recurrence, cancer metastasis, and patient survival are provided. The gene expression profiles, whether embodied in nucleic acid expression, protein expression, or other expression formats, may be used to predict breast cancer recurrence and survival of subjects afflicted with breast cancer.

[0013] The invention thus provides for the identification and use of gene expression patterns (or profiles or "signatures") which correlate with (and are thus able to discriminate between) patients with good or poor survival outcomes. In one embodiment, the invention provides patterns that are able to distinguish patients with estrogen receptor (ER) positive breast tumors into those with poor survival outcomes, similar to that of patients with ER negative breast tumors, and those with a better survival outcome. These patterns are thus able to distinguish patients with ER positive breast tumors into at least two subtypes. Other patterns are capable of identifying subjects with ER negative tumors, and the survival outcomes associated therewith, as well as survival outcomes for some breast cancer subjects independent of the ER status of their tumors.

[0014] The invention also provides for the identification and use of gene expression patterns which correlate with the recurrence of breast cancer in the form of metastases. The patterns are able to distinguish patients with breast cancer into at least those with good or poor survival outcomes.

[0015] The present invention provides a non-subjective means for the identification of patients with breast cancer as likely to have a good or poor survival outcome by assaying for the expression patterns disclosed herein. Thus where subjective interpretation may have been previously used to determine the prognosis and/or treatment of breast cancer patients, the present invention provides objective gene expression patterns, which may be used alone or in combination with subjective criteria to provide a more accurate assessment of breast cancer patient outcomes. The expression patterns of the invention thus provide a means to determine breast cancer prognosis. Furthermore, the expression patterns can also be used as a means to assay small, node negative tumors that are not readily assayed by other means.

[0016] The gene expression patterns comprise one or more than one gene capable of discriminating between breast cancer survival outcomes with significant accuracy. The gene(s) are identified as correlated with various breast cancer survival outcomes such that the levels of their expression are relevant to a determination of the survival, and thus preferred treatment protocols, of a breast cancer patient. Thus in one aspect, the invention provides a method to determine the survival outcome of a subject afflicted with, or suspected of having, breast cancer by assaying a cell containing sample from said subject for expression of one or more than one gene disclosed herein as correlated with breast cancer survival outcomes.

[0017] Gene expression patterns of the invention are identified as described below. Generally, a broad gene expression profile of a sample is obtained by quantifying the expression levels of mRNA corresponding to many genes. This profile is then analyzed to identify genes, the expression of which is positively or negatively correlated with breast cancer survival outcomes. An expression profile of a subset of human genes may then be identified by the methods of the present invention as correlated with a particular breast cancer survival outcome. The use of multiple samples increases the confidence with which a gene may be believed to be correlated with a particular survival outcome. Without sufficient confidence, it remains unpredictable whether a particular gene is actually correlated with breast cancer survival outcomes and also unpredictable whether a particular gene may be successfully used to identify the survival outcome for a breast cancer patient.
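As a minimal sketch of the analysis outlined in paragraph [0017], the following Python ranks genes by how strongly their expression tracks survival across multiple samples. The Pearson coefficient, the function names, and the toy data are assumptions made purely for illustration; the specification does not name a particular statistic at this point.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(vx * vy)

def rank_genes(expression, survival_months):
    """Return (gene, r) pairs sorted by |r|, strongest first.

    expression: dict mapping gene name -> expression levels, one per sample
    survival_months: survival times, one per sample, in the same order
    """
    scored = [(gene, pearson(levels, survival_months))
              for gene, levels in expression.items()]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical toy data: three genes measured in five samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with survival
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls with survival
    "GENE_C": [2.0, 2.0, 2.0, 2.0, 2.0],  # uninformative
}
survival_months = [12, 24, 36, 48, 60]
ranking = rank_genes(expression, survival_months)
```

Adding more samples to each list is what lets the coefficient separate a reproducible association from noise, which is the point the paragraph makes about confidence.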

[0018] A profile of genes that are highly correlated with one survival outcome relative to another may be used to assay a sample from a subject afflicted with, or suspected of having, breast cancer to predict the survival outcome of the subject from whom the sample was obtained. Such an assay may be used as part of a method to determine the therapeutic treatment for said subject based upon the breast cancer survival outcome identified.

[0019] The correlated genes may be used singly with significant accuracy or in combination to increase the ability to accurately discriminate between various stages and/or grades of breast cancer. The present invention thus provides means for correlating a molecular expression phenotype with breast cancer survival outcomes. This correlation is a way to molecularly provide for the determination of survival outcomes as disclosed herein. Additional uses of the correlated gene(s) are in the classification of cells and tissues; determination of diagnosis and/or prognosis; and determination and/or alteration of therapy.

[0020] An assay of the invention may utilize a means related to the expression level of the sequences disclosed herein as long as the assay reflects, quantitatively or qualitatively, expression of the sequence. Preferably, however, a quantitative assay means is used. The ability to discriminate is conferred by the identification of expression of the individual genes as relevant and not by the form of the assay used to determine the actual level of expression. An assay may utilize any identifying feature of an identified individual gene as disclosed herein as long as the assay reflects, quantitatively or qualitatively, expression of the gene. Identifying features include, but are not limited to, unique nucleic acid sequences used to encode (DNA), or express (RNA), said gene or epitopes specific to, or activities of, a protein encoded by said gene. Alternative means include detection of nucleic acid amplification as indicative of increased expression levels and nucleic acid inactivation, deletion, or methylation, as indicative of decreased expression levels. Stated differently, the invention may be practiced by assaying one or more aspects of the DNA template(s) underlying the expression of the disclosed sequence(s), of the RNA used as an intermediate to express the sequence(s), or of the proteinaceous product expressed by the sequence(s), as well as proteolytic fragments of such products. As such, the detection of the presence of, amount of, stability of, or degradation (including rate) of, such DNA, RNA and proteinaceous molecules may be used in the practice of the invention. As such, all that is required is the identity of the gene(s) necessary to discriminate between breast cancer survival outcomes and an appropriate cell containing sample for use in an expression assay.

[0021] In one aspect, the invention provides for the identification of the gene expression patterns by analyzing global, or near global, gene expression from single cells or homogeneous cell populations which have been dissected away from, or otherwise isolated or purified from, contaminating cells beyond that possible by a simple biopsy. Because the expression of numerous genes fluctuates between cells from different patients as well as between cells from the same patient sample, multiple data from expression of individual genes and gene expression patterns are used as reference data to generate models which in turn permit the identification of individual gene(s), the expression of which is most highly correlated with particular breast cancer survival outcomes.

[0022] In a further aspect, the gene sequence(s) capable of discriminating between breast cancer survival outcomes based on cell or tissue samples may be used to determine the likely outcome of a patient from whom the sample was obtained. Preferably, the sample is isolated via non-invasive means. The expression of said gene(s) in said sample may be determined and compared to the expression of said gene(s) in reference data of gene expression patterns as disclosed herein. Alternatively, the expression level may be compared to expression levels in normal or non-cancerous cells, such as, but not limited to, those from the same sample or subject. In embodiments of the invention utilizing quantitative PCR, the expression level may be compared to expression levels of reference genes in the same sample or a ratio of expression levels may be used. The invention provides for ratios of the expression level of a sequence that is underexpressed to the expression level of a sequence that is overexpressed as an indicator of survival outcome or cancer recurrence, including metastatic cancer. The use of a ratio can reduce comparisons with normal or non-cancerous cells.
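For the quantitative PCR embodiments of paragraph [0022], normalization against a reference gene in the same sample might be sketched as follows. The sketch assumes threshold-cycle (Ct) readouts and perfect doubling of product per cycle (the common 2^-ΔCt approximation); neither assumption comes from the specification, and the function names and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Expression of a target gene relative to a reference gene in the same sample.

    Assumes each PCR cycle multiplies product by `efficiency` (2.0 = perfect
    doubling), so a lower Ct means more starting template.
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    return efficiency ** (ct_reference - ct_target)

def expression_ratio(ct_under, ct_over, efficiency=2.0):
    """Ratio of an underexpressed gene's level to an overexpressed gene's level.

    Both genes are measured in the same sample, so the reference gene cancels
    out of the ratio.
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    return efficiency ** (ct_over - ct_under)

# Hypothetical Ct values from one sample.
ct_reference = 20.0  # reference gene
ct_over = 22.0       # gene overexpressed in the outcome of interest
ct_under = 27.0      # gene underexpressed in the outcome of interest

over_rel = relative_expression(ct_over, ct_reference)    # 2 ** -2
under_rel = relative_expression(ct_under, ct_reference)  # 2 ** -7
ratio = expression_ratio(ct_under, ct_over)              # 2 ** -5
```

Because the reference term cancels, the ratio depends only on the two measured genes, which is one way to read the statement that a ratio can reduce comparisons with normal or non-cancerous cells.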

[0023] One advantage provided by the present invention is that contaminating, non-breast cells (such as infiltrating lymphocytes or other immune system cells) are not present to possibly affect the genes identified or the subsequent analysis of gene expression to identify the survival outcomes of patients with breast cancer. Such contamination is present where a biopsy is used to generate gene expression profiles.

[0024] While the present invention has been described mainly in the context of human breast cancer, it may be practiced in the context of breast cancer of any animal known to be potentially afflicted by breast cancer. Preferred animals for the application of the present invention are mammals, particularly those important to agricultural applications (such as, but not limited to, cattle, sheep, horses, and other "farm animals") and for human companionship (such as, but not limited to, dogs and cats).

BRIEF DESCRIPTION OF THE DRAWINGS

[0025] FIG. 1 is a clinical outcome (overall survival) plot of two subtypes based on expression of 864 genes as listed in Tables 2 and 3.

[0026] FIG. 2 is a plot of a 297 gene signature (identities of the genes are presented in Table 5) which segregates the survival data of a patient population into "long" and "short" groups with significantly different overall survival curves. FIG. 2 also shows the comparison of this 297 gene set with that of a set of 17 genes correlated with metastasis described by Ramaswamy et al. (supra, see Table 1 therein).

[0027] FIG. 3 is a plot of clinical outcomes for four breast cancer subtypes provided by the instant invention.

DETAILED DESCRIPTION OF THE SPECIFIC EMBODIMENTS

[0028] Definitions of Terms as used Herein:

[0029] A gene expression "pattern" or "profile" or "signature" refers to the relative expression of a gene between two or more breast cancer survival outcomes which is correlated with being able to distinguish between said outcomes.

[0030] A "gene" is a polynucleotide that encodes a discrete product, whether RNA or proteinaceous in nature. It is appreciated that more than one polynucleotide may be capable of encoding a discrete product. The term includes alleles and polymorphisms of a gene that encodes the same product, or a functionally associated (including gain, loss, or modulation of function) analog thereof, based upon chromosomal location and ability to recombine during normal mitosis.

[0031] A "sequence" or "gene sequence" as used herein is a nucleic acid molecule or polynucleotide composed of a discrete order of nucleotide bases. The term includes the ordering of bases that encodes a discrete product (i.e. "coding region"), whether RNA or proteinaceous in nature, as well as the ordered bases that precede or follow a "coding region". Non-limiting examples of the latter include 5' and 3' untranslated regions of a gene. It is appreciated that more than one polynucleotide may be capable of encoding a discrete product. It is also appreciated that alleles and polymorphisms of the disclosed sequences may exist and may be used in the practice of the invention to identify the expression level(s) of the disclosed sequences or the allele or polymorphism. Identification of an allele or polymorphism depends in part upon chromosomal location and ability to recombine during mitosis.

[0032] The terms "correlate" or "correlation" or equivalents thereof refer to an association between expression of one or more genes in a breast cancer cell or tissue sample and the survival outcome of the subject from whom the sample was obtained. Genes expressed at higher levels and correlated with the survival outcomes disclosed herein are provided. The invention provides for the correlation between increases, as well as decreases, in expression of gene sequences and survival outcomes and cancer recurrence, including cancer metastases, in patients. Increases and decreases may be readily expressed in the form of a ratio between expression in a non-normal cell and a normal cell such that a ratio of one (1) indicates no difference while ratios of two (2) and one-half indicate twice as much, and half as much, expression in the non-normal cell versus the normal cell, respectively. Expression levels can be readily determined by quantitative methods as described below.

[0033] For example, increases in gene expression can be indicated by ratios of or about 1.1, of or about 1.2, of or about 1.3, of or about 1.4, of or about 1.5, of or about 1.6, of or about 1.7, of or about 1.8, of or about 1.9, of or about 2, of or about 2.5, of or about 3, of or about 3.5, of or about 4, of or about 4.5, of or about 5, of or about 5.5, of or about 6, of or about 6.5, of or about 7, of or about 7.5, of or about 8, of or about 8.5, of or about 9, of or about 9.5, of or about 10, of or about 15, of or about 20, of or about 30, of or about 40, of or about 50, of or about 60, of or about 70, of or about 80, of or about 90, of or about 100, of or about 150, of or about 200, of or about 300, of or about 400, of or about 500, of or about 600, of or about 700, of or about 800, of or about 900, or of or about 1000. A ratio of 2 is a 100% (or a two-fold) increase in expression. Decreases in gene expression can be indicated by ratios of or about 0.9, of or about 0.8, of or about 0.7, of or about 0.6, of or about 0.5, of or about 0.4, of or about 0.3, of or about 0.2, of or about 0.1, of or about 0.05, of or about 0.01, of or about 0.005, of or about 0.001, of or about 0.0005, of or about 0.0001, of or about 0.00005, of or about 0.00001, of or about 0.000005, or of or about 0.000001.
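The ratio convention of paragraphs [0032] and [0033] can be illustrated with a short Python sketch that converts a ratio into a signed percent change. The function names are hypothetical; the arithmetic follows the text, where a ratio of 1 means no difference and a ratio of 2 is a 100% increase.

```python
def percent_change(ratio):
    """Convert an expression ratio (non-normal / normal) into a signed percent change."""
    if ratio <= 0:
        raise ValueError("an expression ratio must be positive")
    return (ratio - 1.0) * 100.0

def describe(ratio):
    """Label a ratio as an increase, a decrease, or no difference."""
    if ratio > 1.0:
        return "increase"
    if ratio < 1.0:
        return "decrease"
    return "no difference"

# Walk through a few of the ratios discussed in the text.
for r in (1.0, 2.0, 0.5, 1.5):
    print(f"ratio {r}: {describe(r)}, {percent_change(r):+.0f}%")
```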

[0034] In some embodiments of the invention, such as those related to survival, cancer recurrence, or metastasis as possible outcome phenotypes, a ratio of the expression of a gene sequence expressed at increased levels in correlation with an outcome to the expression of a gene sequence expressed at decreased levels in correlation with the outcome may also be used as an indicator of the phenotype. As a non-limiting example, one cancer survival outcome may be correlated with increased expression of a gene sequence overexpressed in a sample of cancer cells as well as decreased expression of another gene sequence underexpressed in those cells. Therefore, a ratio of the expression levels of the underexpressed sequence to the expression levels of the overexpressed sequence may be used as an indicator or predictor of the outcome.
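The two-gene indicator of paragraph [0034] might be sketched as below. The cutoff value, the function name, and the expression levels are hypothetical and are not taken from the specification; in practice a cutoff would have to be derived from reference data.

```python
def outcome_indicator(under_level, over_level, cutoff):
    """Flag a sample whose under/over expression ratio falls at or below a cutoff.

    under_level: expression of the sequence underexpressed in the outcome
    over_level: expression of the sequence overexpressed in the outcome
    cutoff: hypothetical decision threshold, not taken from the specification
    """
    if under_level < 0 or over_level <= 0:
        raise ValueError("under_level must be >= 0 and over_level must be > 0")
    ratio = under_level / over_level
    return ratio, ratio <= cutoff

# Hypothetical expression levels (arbitrary units) for two samples.
ratio_a, flagged_a = outcome_indicator(under_level=5.0, over_level=50.0, cutoff=0.5)
ratio_b, flagged_b = outcome_indicator(under_level=40.0, over_level=20.0, cutoff=0.5)
```

Sample A shows the expected pattern for the outcome (low underexpressed gene, high overexpressed gene) and is flagged; sample B shows the opposite pattern and is not.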

[0035] A "polynucleotide" is a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule. Thus, this term includes double- and single-stranded DNA and RNA. It also includes known types of modifications including labels known in the art, methylation, "caps", substitution of one or more of the naturally occurring nucleotides with an analog, and internucleotide modifications such as uncharged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), as well as unmodified forms of the polynucleotide.

[0036] The term "amplify" is used in the broad sense to mean creating an amplification product that can be made enzymatically with DNA or RNA polymerases. "Amplification," as used herein, generally refers to the process of producing multiple copies of a desired sequence, particularly those of a sample. "Multiple copies" mean at least 2 copies. A "copy" does not necessarily mean perfect sequence complementarity or identity to the template sequence.

[0037] By "corresponding" is meant that a nucleic acid molecule shares a substantial amount of sequence identity with another nucleic acid molecule. Substantial amount means at least 95%, usually at least 98% and more usually at least 99%, and sequence identity is determined using the BLAST algorithm, as described in Altschul et al. (1990), J. Mol. Biol. 215:403-410 (using the published default setting, i.e. parameters w=4, t=17). Methods for amplifying mRNA are generally known in the art, and include reverse transcription PCR (RT-PCR) and those described in U.S. patent application Ser. No. 10/062,857 (filed on Oct. 25, 2001), as well as U.S. Provisional Patent Applications 60/298,847 (filed Jun. 15, 2001) and 60/257,801 (filed Dec. 22, 2000), all of which are hereby incorporated by reference in their entireties as if fully set forth. Another method which may be used is quantitative PCR (or Q-PCR). Alternatively, RNA may be directly labeled as the corresponding cDNA by methods known in the art.
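The identity thresholds in paragraph [0037] can be illustrated with a simple percent-identity check. This sketch is not BLAST: it assumes the two sequences have already been aligned to equal length (for example, by a BLAST run), and the function names and sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns in which both sequences carry the same base.

    Both inputs must already be aligned to equal length; '-' marks a gap and
    never counts as a match.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
                  if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)

def corresponds(aligned_a, aligned_b, threshold=95.0):
    """True when identity meets the threshold (95%, 98%, or 99% in the text)."""
    return percent_identity(aligned_a, aligned_b) >= threshold

# Hypothetical 20-base sequences that differ at the final position only.
seq_a = "ATGCGTACGTTAGCCTAGGA"
seq_b = "ATGCGTACGTTAGCCTAGGT"
identity = percent_identity(seq_a, seq_b)
```

With one mismatch in 20 columns, the pair meets the 95% threshold but not the stricter 98% or 99% thresholds.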

[0038] A "microarray" is a linear or two-dimensional array of preferably discrete regions, each having a defined area, formed on the surface of a solid support such as, but not limited to, glass, plastic, or synthetic membrane. The density of the discrete regions on a microarray is determined by the total numbers of immobilized polynucleotides to be detected on the surface of a single solid phase support, preferably at least about 50/cm², more preferably at least about 100/cm², even more preferably at least about 500/cm², but preferably below about 1,000/cm². Preferably, the arrays contain fewer than about 500, about 1000, about 1500, about 2000, about 2500, or about 3000 immobilized polynucleotides in total. As used herein, a DNA microarray is an array of oligonucleotides or polynucleotides placed on a chip or other surfaces used to hybridize to amplified or cloned polynucleotides from a sample. Since the position of each particular group of primers in the array is known, the identities of a sample's polynucleotides can be determined based on their binding to a particular position in the microarray.

[0039] Because the invention relies upon the identification of genes that are over- or under-expressed, one embodiment of the invention involves determining expression by hybridization of mRNA, or an amplified or cloned version thereof, of a sample cell to a polynucleotide that is unique to a particular gene sequence. Preferred polynucleotides of this type contain at least about 20, at least about 22, at least about 24, at least about 26, at least about 28, at least about 30, or at least about 32 consecutive basepairs of a gene sequence that is not found in other gene sequences. The term "about" as used in the previous sentence refers to an increase or decrease of 1 from the stated numerical value. Even more preferred are polynucleotides of at least or about 50, at least or about 100, at least or about 150, at least or about 200, at least or about 250, at least or about 300, at least or about 350, or at least or about 400 basepairs of a gene sequence that is not found in other gene sequences. The term "about" as used in the preceding sentence refers to an increase or decrease of 10% from the stated numerical value. Such polynucleotides may also be referred to as polynucleotide probes that are capable of hybridizing to sequences of the genes, or unique portions thereof, described herein. Preferably, the sequences are those of mRNA encoded by the genes, the corresponding cDNA to such mRNAs, and/or amplified versions of such sequences. In preferred embodiments of the invention, the polynucleotide probes are immobilized on an array, other devices, or in individual spots that localize the probes.
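The two meanings of "about" in paragraph [0039] can be made concrete with a small Python sketch. It reflects one reading of the text, in which "at least about N" sets the lower bound at N minus the stated tolerance; the function names are hypothetical.

```python
def minimum_probe_length(stated, tolerance_percent=None):
    """Smallest length satisfying "at least about <stated>" under one reading.

    With tolerance_percent=None, "about" means plus or minus 1 basepair
    (the rule given for the 20 to 32 basepair values); otherwise it means
    plus or minus that percentage of the stated value (10% for the 50 to
    400 basepair values).
    """
    if tolerance_percent is None:
        return stated - 1
    return stated - stated * tolerance_percent / 100.0

def meets_minimum(length, stated, tolerance_percent=None):
    """True when a probe of `length` basepairs satisfies the stated minimum."""
    return length >= minimum_probe_length(stated, tolerance_percent)

short_ok = meets_minimum(19, 20)       # 19 is within 1 of 20
long_ok = meets_minimum(45, 50, 10)    # 45 is within 10% of 50
long_bad = meets_minimum(44, 50, 10)   # 44 falls below the 10% tolerance
```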

[0040] In another embodiment of the invention, all or part of a disclosed sequence may be amplified and detected by methods such as the polymerase chain reaction (PCR) and variations thereof, such as, but not limited to, quantitative PCR (Q-PCR), reverse transcription PCR (RT-PCR), and real-time PCR (including as a means of measuring the initial amounts of mRNA copies for each sequence in a sample), optionally real-time RT-PCR or real-time Q-PCR. Such methods would utilize one or two primers that are complementary to portions of a disclosed sequence, where the primers are used to prime nucleic acid synthesis. The newly synthesized nucleic acids are optionally labeled and may be detected directly or by hybridization to a polynucleotide of the invention. The newly synthesized nucleic acids may be contacted with polynucleotides (containing sequences) of the invention under conditions which allow for their hybridization. Additional methods to detect the expression of expressed nucleic acids include RNAse protection assays, including liquid phase hybridizations, and in situ hybridization of cells.

[0041] Alternatively, and in yet another embodiment of the invention, gene expression may be determined by analysis of expressed protein in a cell sample of interest by use of one or more antibodies specific for one or more epitopes of individual gene products (proteins), or proteolytic fragments thereof, in said cell sample or in a bodily fluid of a subject. The cell sample may be one of breast cancer epithelial cells enriched from the blood of a subject, such as by use of labeled antibodies against cell surface markers followed by fluorescence activated cell sorting (FACS). Such antibodies are preferably labeled to permit their easy detection after binding to the gene product. Detection methodologies suitable for use in the practice of the invention include, but are not limited to, immunohistochemistry of cell containing samples or tissue, enzyme linked immunosorbent assays (ELISAs) including antibody sandwich assays of cell containing tissues or blood samples, mass spectroscopy, and immuno-PCR.

[0042] The term "label" refers to a composition capable of producing a detectable signal indicative of the presence of the labeled molecule. Suitable labels include radioisotopes, nucleotide chromophores, enzymes, substrates, fluorescent molecules, chemiluminescent moieties, magnetic particles, bioluminescent moieties, and the like. As such, a label is any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means.

[0043] The term "support" refers to conventional supports such as beads, particles, dipsticks, fibers, filters, membranes and silane or silicate supports such as glass slides.

[0044] As used herein, a "breast tissue sample" or "breast cell sample" refers to a sample of breast tissue or fluid isolated from an individual suspected of being afflicted with, or at risk of developing, breast cancer. Such samples are primary isolates (in contrast to cultured cells) and may be collected by any non-invasive means, including, but not limited to, ductal lavage, fine needle aspiration, needle biopsy, the devices and methods described in U.S. Pat. No. 6,328,709, or any other suitable means recognized in the art. Alternatively, the "sample" may be collected by an invasive method, including, but not limited to, surgical biopsy. A sample of the invention may also be one that has been formalin fixed and paraffin embedded (FFPE) or fresh frozen.

[0045] "Expression" and "gene expression" include transcription and/or translation of nucleic acid material.

[0046] As used herein, the term "comprising" and its cognates are used in their inclusive sense; that is, equivalent to the term "including" and its corresponding cognates.

[0047] Conditions that "allow" an event to occur or conditions that are "suitable" for an event to occur, such as hybridization, strand extension, and the like, or "suitable" conditions are conditions that do not prevent such events from occurring. Thus, these conditions permit, enhance, facilitate, and/or are conducive to the event. Such conditions, known in the art and described herein, depend upon, for example, the nature of the nucleotide sequence, temperature, and buffer conditions. These conditions also depend on what event is desired, such as hybridization, cleavage, strand extension or transcription.

[0048] Sequence "mutation," as used herein, refers to any sequence alteration in the sequence of a gene of interest disclosed herein in comparison to a reference sequence. A sequence mutation includes single nucleotide changes, or alterations of more than one nucleotide in a sequence, due to mechanisms such as substitution, deletion or insertion. Single nucleotide polymorphism (SNP) is also a sequence mutation as used herein. Because the present invention is based on the relative level of gene expression, mutations in non-coding regions of genes as disclosed herein may also be assayed in the practice of the invention.

[0049] "Detection" includes any means of detecting, including direct and indirect detection of gene expression and changes therein. For example, "detectably less" products may be observed directly or indirectly, and the term indicates any reduction (including the absence of detectable signal). Similarly, "detectably more" product means any increase, whether observed directly or indirectly.

[0050] Increases and decreases in expression of the disclosed sequences are defined in the following terms based upon percent or fold changes over expression in normal cells. Increases may be of 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, or 200% relative to expression levels in normal cells. Alternatively, fold increases may be of 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, or 10 fold over expression levels in normal cells. Decreases may be of 10, 20, 30, 40, 50, 55, 60, 65, 70, 75, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 99 or 100% relative to expression levels in normal cells.
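The percent and fold measures above can be illustrated with a minimal sketch. The function names and the expression values are hypothetical, and "fold" is assumed here to mean the simple ratio of sample expression to normal expression, which is one possible reading of the text.

```python
def percent_change(sample, normal):
    """Percent change of sample expression relative to normal expression.

    Positive values are increases, negative values are decreases.
    """
    if normal <= 0:
        raise ValueError("normal expression must be positive")
    return (sample - normal) / normal * 100.0


def fold_over_normal(sample, normal):
    """Ratio of sample expression to normal expression (assumed meaning of 'fold')."""
    if normal <= 0:
        raise ValueError("normal expression must be positive")
    return sample / normal


# Hypothetical expression values in arbitrary units.
increase = percent_change(150.0, 100.0)   # a 50% increase
decrease = percent_change(25.0, 100.0)    # a 75% decrease (reported as -75.0)
fold = fold_over_normal(300.0, 100.0)     # 3 fold over normal
```

Under this reading, a 100% decrease corresponds to no detectable expression in the sample.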

[0051] Unless defined otherwise all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this invention belongs.

[0052] Specific Embodiments

[0053] The present invention relates to the identification and use of gene expression patterns (or profiles or "signatures") which discriminate between (or are correlated with) breast cancer survival outcomes in a subject. Such patterns may be determined by the methods of the invention by use of a number of reference cell or tissue samples, such as those reviewed by a pathologist of ordinary skill in the pathology of breast cancer, which reflect breast cancer cells as opposed to normal or other non-cancerous cells. Because the overall gene expression profile differs from person to person, cancer to cancer, and cancer cell to cancer cell, correlations between certain cells and overexpressed genes may be made as disclosed herein to identify genes that are capable of discriminating between breast cancer survival outcomes.

[0054] The present invention may be practiced with any number of the genes believed, or likely to be, differentially expressed with respect to breast cancer survival outcomes. The identification may be made by using expression profiles of various homogenous breast cancer cell populations, which were isolated by microdissection, such as, but not limited to, laser capture microdissection (LCM) of 100-1000 cells. The expression level of each gene of the expression profile may be correlated with a particular survival outcome. Alternatively, the expression levels of multiple genes may be clustered to identify correlations with particular survival outcomes.

[0055] Genes with significant correlations to breast cancer survival outcomes may be used to generate models of gene expressions that would maximally discriminate between survival outcomes. Alternatively, genes with significant correlations may be used in combination with genes with lower correlations without significant loss of ability to discriminate between survival outcomes. Such models may be generated by any appropriate means recognized in the art, including, but not limited to, cluster analysis, support vector machines, neural networks or other algorithms known in the art. The models are capable of predicting the classification of an unknown sample based upon the expression of the genes used for discrimination in the models. "Leave one out" cross-validation may be used to test the performance of various models and to help identify weights (genes) that are uninformative or detrimental to the predictive ability of the models. Cross-validation may also be used to identify genes that enhance the predictive ability of the models.
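As an illustration only, "leave one out" cross-validation can be sketched as follows. The nearest-centroid classifier is a stand-in chosen for brevity and is not one of the algorithms named above; the function names and the two-gene reference profiles are hypothetical.

```python
import math


def centroid(profiles):
    """Mean expression profile of a list of equal-length profiles."""
    count = len(profiles)
    return [sum(column) / count for column in zip(*profiles)]


def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict(training, sample):
    """Assign the sample to the outcome whose centroid is closest."""
    by_label = {}
    for profile, label in training:
        by_label.setdefault(label, []).append(profile)
    centroids = {label: centroid(rows) for label, rows in by_label.items()}
    return min(centroids, key=lambda label: distance(centroids[label], sample))


def leave_one_out_accuracy(data):
    """Hold out each sample in turn, train on the rest, and score the prediction."""
    correct = 0
    for index, (profile, label) in enumerate(data):
        training = data[:index] + data[index + 1:]
        if predict(training, profile) == label:
            correct += 1
    return correct / len(data)


# Hypothetical reference data: (expression of two genes, known outcome).
reference = [
    ([2.0, 0.1], "good"), ([1.8, 0.3], "good"), ([2.2, 0.2], "good"),
    ([0.2, 1.9], "poor"), ([0.1, 2.1], "poor"), ([0.3, 1.7], "poor"),
]
accuracy = leave_one_out_accuracy(reference)
```

Repeating the procedure with one gene removed from every profile, and comparing the resulting accuracy, is one way to flag genes that are uninformative or detrimental.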

[0056] The gene(s) identified as correlated with particular breast cancer survival outcomes by the above models provide the ability to focus gene expression analysis to only those genes that contribute to the ability to identify a subject as likely to have a particular survival outcome relative to another. The expression of other genes in a breast cancer cell would be relatively unable to provide information concerning, and thus assist in the discrimination of, breast cancer survival outcome.

[0057] As will be appreciated by those skilled in the art, the models are highly useful with even a small set of reference gene expression data and can become increasingly accurate with the inclusion of more reference data although the incremental increase in accuracy will likely diminish with each additional datum. The preparation of additional reference gene expression data using genes identified and disclosed herein for discriminating between different survival outcomes in breast cancer is routine and may be readily performed by the skilled artisan to permit the generation of models as described above to predict the status of an unknown sample based upon the expression levels of those genes.

[0058] To determine the (increased or decreased) expression levels of genes in the practice of the present invention, any method known in the art may be utilized. In one preferred embodiment of the invention, expression based on detection of RNA which hybridizes to the genes identified and disclosed herein is used. This is readily performed by any RNA detection or amplification+detection method known or recognized as equivalent in the art such as, but not limited to, reverse transcription-PCR, the methods disclosed in U.S. patent application Ser. No. 10/062,857 (filed on Oct. 25, 2001) as well as U.S. Provisional Patent Applications No. 60/298,847 (filed Jun. 15, 2001) and 60/257,801 (filed Dec. 22, 2000), and methods to detect the presence, or absence, of RNA stabilizing or destabilizing sequences.

[0059] Alternatively, expression based on detection of DNA status may be used. Detection of the DNA of an identified gene as methylated or deleted may be used for genes that have decreased expression in correlation with survival outcomes. This may be readily performed by PCR based methods known in the art, including, but not limited to, quantitative PCR (Q-PCR). Conversely, detection of the DNA of an identified gene as amplified may be used for genes that have increased expression in correlation with survival outcomes. This may be readily performed by PCR based, fluorescent in situ hybridization (FISH) and chromosome in situ hybridization (CISH) methods known in the art.

[0060] Expression based on detection of a presence, increase, or decrease in protein levels or activity may also be used. Detection may be performed by any immunohistochemistry (IHC) based, blood based (especially for secreted proteins), antibody (including autoantibodies against the protein) based, exfoliated cell (from the cancer) based, mass spectroscopy based, and image (including use of labeled ligand) based method known in the art and recognized as appropriate for the detection of the protein. Antibody and image based methods are additionally useful for the localization of tumors after determination of cancer by use of cells obtained by a non-invasive procedure (such as ductal lavage or fine needle aspiration), where the source of the cancerous cells is not known. A labeled antibody or ligand may be used to localize the carcinoma(s) within a patient.

[0061] A preferred embodiment using a nucleic acid based assay to determine expression is by immobilization of one or more sequences of the genes identified herein on a solid support, including, but not limited to, a solid substrate as an array or to beads or bead based technology as known in the art. Alternatively, solution based expression assays known in the art may also be used. The immobilized gene(s) may be in the form of polynucleotides that are unique or otherwise specific to the gene(s) such that the polynucleotide would be capable of hybridizing to a DNA or RNA corresponding to the gene(s). These polynucleotides may be the full length of the gene(s) or be short sequences of the genes (up to one nucleotide shorter than the full length sequence known in the art by deletion from the 5' or 3' end of the sequence) that are optionally minimally interrupted (such as by mismatches or inserted non-complementary basepairs) such that hybridization with a DNA or RNA corresponding to the gene(s) is not affected. Preferably, the polynucleotides used are from the 3' end of the gene, such as within about 350, about 300, about 250, about 200, about 150, about 100, or about 50 nucleotides from the polyadenylation signal or polyadenylation site of a gene or expressed sequence. Polynucleotides containing mutations relative to the sequences of the disclosed genes may also be used so long as the presence of the mutations still allows hybridization to produce a detectable signal.

[0062] Alternatively, amplification of such sequences from the 3' end of genes by methods such as quantitative PCR may be used to determine the expression levels of the sequences. The Ct values generated by such methods may be used as indicators of expression levels.
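By way of a hedged illustration, Ct values may be converted into a relative expression level as sketched below. The function name and the Ct values are hypothetical, and the calculation assumes that both amplicons amplify with the same efficiency (2.0 meaning perfect doubling per cycle), which must be verified for any real assay.

```python
def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Expression of a target gene relative to a reference gene from Ct values.

    A lower Ct means more starting template, so the exponent is the
    reference Ct minus the target Ct. Equal amplification efficiency for
    both amplicons is an assumption of this sketch.
    """
    return efficiency ** (ct_reference - ct_target)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier.
ratio = relative_expression(ct_target=22.0, ct_reference=25.0)  # 8.0
```

The same function can be applied to a pair of disclosed genes to express the level of one relative to the other.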

[0063] The immobilized gene(s) may be used to determine the state of nucleic acid samples prepared from sample breast cell(s) for which the survival outcome of the sample's subject (e.g. patient from whom the sample is obtained) is not known or for confirmation of an outcome that is already assigned to the sample's subject. Without limiting the invention, such a cell may be from a patient suspected of being afflicted with, or at risk of developing, breast cancer. The immobilized polynucleotide(s) need only be sufficient to specifically hybridize to the corresponding nucleic acid molecules derived from the sample. While even a single correlated gene sequence may be able to provide adequate accuracy in discriminating between two breast cancer survival outcomes, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, or eleven or more of the genes identified herein may be used in combination, as a subset capable of discriminating, to increase the accuracy of the method. The invention specifically contemplates the selection of more than one, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, or eleven or more of the genes disclosed in the tables and figures herein for use as a subset in the identification of breast cancer survival outcome.

[0064] Of course 15 or more, 20 or more, 30 or more, 40 or more, 50 or more, 60 or more, 70 or more, 80 or more, 90 or more, 100 or more, 150 or more, 200 or more, 250 or more, 300 or more, 350 or more, 400 or more, 450 or more, 500 or more, 600 or more, 700 or more, 800 or more, 900 or more, 1000 or more, 1100 or more, 1200 or more, or all the genes provided in Tables 2, 3, and/or 4 below may be used. "CloneID" as used in the context of the Tables herein as well as the present invention refers to the IMAGE Consortium clone ID number of each gene, the sequences of which are hereby incorporated by reference in their entireties as they are available from the Consortium at image.llnl.gov/ as accessed on the filing date of the present application. "GeneID" as used in the context of the Tables herein as well as the present invention refers to the GenBank accession number of a sequence of each gene, the sequences of which are hereby incorporated by reference in their entireties as they are available from GenBank as accessed on the filing date of the present application.

[0065] P value refers to values assigned as described in the Example below. The indication "E-xx", where "xx" is a two digit number, refers to an alternative notation for exponential figures in which "E-xx" is "10.sup.-xx". Thus, in combination with the numbers to the left of "E-xx", the value being represented is the numbers to the left times 10.sup.-xx. Chromosome Location refers to the human chromosome to which the gene has been assigned, and Description provides a brief identifier of what the gene encodes.

[0066] The invention may also be practiced with all or a portion of the gene sequences disclosed in Tables 6, 7, 8, and 9 herein. The gene sequences of each of these tables define one of four breast cancer subtypes based upon increased expression in correlation with particular survival outcomes as shown in FIG. 3. Therefore, the increased expression of sequences of 2 or more, 4 or more, 6 or more, 8 or more, 10 or more, 12 or more, 14 or more, 16 or more, 18 or more, 20 or more, 22 or more, 24 or more, 26 or more, 28 or more, 30 or more, 32 or more, 34 or more, 36 or more, 38 or more, 40 or more, 42 or more, 44 or more, 46 or more, 48 or more, or all 50 genes in each table can be used in the practice of the invention as indicators of a breast cancer survival outcome. Of course sequences of the 25 possible odd numbers of these genes may also be used.

[0067] Genes with a correlation identified by a p value below or about 0.02, below or about 0.01, below or about 0.005, below or about 0.001, below or about 1.times.10.sup.-4, below or about 1.times.10.sup.-5, below or about 1.times.10.sup.-6, below or about 1.times.10.sup.-7, below or about 1.times.10.sup.-8, below or about 1.times.10.sup.-9, below or about 1.times.10.sup.-10, below or about 1.times.10.sup.-11, below or about 1.times.10.sup.-12, below or about 1.times.10.sup.-13, below or about 1.times.10.sup.-14, below or about 1.times.10.sup.-15, below or about 1.times.10.sup.-16, below or about 1.times.10.sup.-17, below or about 1.times.10.sup.-18, below or about 1.times.10.sup.-19, or about 1.times.10.sup.-20 are preferred for use in the practice of the invention. The present invention includes the use of genes that identify different ER.alpha. (estrogen receptor alpha) positive subtypes and breast cancer recurrence/metastases together to permit simultaneous identification of breast cancer survival outcome of a patient based upon assaying a breast cancer sample from said patient.
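The selection of genes by p value, including p values written in the "E-xx" notation used in the Tables, can be sketched as follows. The function name, the gene identifiers, and the p values are hypothetical placeholders and are not taken from the Tables.

```python
def select_genes(p_values, threshold=0.001):
    """Return gene identifiers with a p value at or below the threshold.

    `p_values` maps an identifier to the p value as written in a table,
    for example "3.2E-07" (3.2 times ten to the minus 7) or "0.015".
    Results are ordered from the smallest p value to the largest.
    """
    parsed = {gene: float(text) for gene, text in p_values.items()}
    kept = [gene for gene, p in parsed.items() if p <= threshold]
    return sorted(kept, key=lambda gene: parsed[gene])


# Hypothetical table excerpt.
table = {"GENE_A": "3.2E-07", "GENE_B": "0.015", "GENE_C": "4.0E-04"}
strict = select_genes(table)            # ["GENE_A", "GENE_C"]
relaxed = select_genes(table, 0.02)     # ["GENE_A", "GENE_C", "GENE_B"]
```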

[0068] In some embodiments of the invention, the genes used will not include HRAS-like suppressor (UNIGENE ID Hs.36761; CloneID 950667; GenBank accession # NM.sub.--020386; and GeneSymbol HRASLS) and/or origin recognition complex, subunit 6 (yeast homolog)-like, (UNIGENE ID Hs.49760; CloneID 306318; GenBank accession # NM.sub.--014321; and GeneSymbol ORC6L) as disclosed by van't Veer et al. (supra).

[0069] In embodiments where only one or a few genes are to be analyzed, the nucleic acid derived from the sample breast cancer cell(s) may be preferentially amplified by use of appropriate primers such that only the genes to be analyzed are amplified to reduce contaminating background signals from other genes expressed in the breast cell. Alternatively, and where multiple genes are to be analyzed or where very few cells (or one cell) are used, the nucleic acid from the sample may be globally amplified before hybridization to the immobilized polynucleotides. Of course, RNA, or the cDNA counterpart thereof, may be directly labeled and used, without amplification, by methods known in the art.

[0070] The invention is preferably practiced with unique sequences present within the gene sequences disclosed herein. The uniqueness of a disclosed gene sequence refers to the portions or entireties of the sequences which are found in each gene to the exclusion of other genes. Such unique sequences include those found at the 3' untranslated portion of the genes. Preferred unique sequences for the practice of the invention are those which contribute to the consensus sequences for each gene such that the unique sequences will be useful in detecting expression in a variety of individuals rather than being specific for a polymorphism present in some individuals. Alternatively, sequences unique to an individual or a subpopulation may be used. The preferred unique sequences are preferably of the lengths of polynucleotides of the invention as discussed herein.
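A simplified sketch of locating candidate unique sequences is given below. It treats a sequence as unique when it has no exact match in any other supplied sequence; it ignores reverse complements, mismatches, and hybridization behavior, all of which matter for a real probe design. The function name and the toy sequences are hypothetical, and the window length of 4 is used only to keep the example readable (the preferred lengths discussed herein are far longer).

```python
def unique_windows(target, others, length):
    """Return (offset, subsequence) pairs of `target` absent from all `others`.

    Only exact matches on the given strand are checked.
    """
    hits = []
    for start in range(len(target) - length + 1):
        window = target[start:start + length]
        if not any(window in other for other in others):
            hits.append((start, window))
    return hits


# Toy sequences for illustration only.
candidates = unique_windows("ACGTACGGA", ["TTACGTACC"], length=4)
# candidates == [(4, "ACGG"), (5, "CGGA")]
```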

[0071] In particularly preferred embodiments of the invention, polynucleotides having sequences present in the 3' untranslated and/or non-coding regions of the disclosed gene sequences are used to detect expression levels in breast cells. Such polynucleotides may optionally contain sequences found in the 3' portions of the coding regions of the disclosed sequences. Polynucleotides containing a combination of sequences from the coding and 3' non-coding regions preferably have the sequences arranged contiguously, with no intervening heterologous sequence(s).

[0072] Alternatively, the invention may be practiced with polynucleotides having sequences present in the 5' untranslated and/or non-coding regions of gene sequences in breast cells to detect their levels of expression. Such polynucleotides may optionally contain sequences found in the 5' portions of the coding regions. Polynucleotides containing a combination of sequences from the coding and 5' non-coding regions preferably have the sequences arranged contiguously, with no intervening heterologous sequence(s). The invention may also be practiced with sequences present in the coding regions of disclosed sequences.

[0073] Preferred polynucleotides contain sequences from 3' or 5' untranslated and/or non-coding regions of at least about 16, at least about 18, at least about 20, at least about 22, at least about 24, at least about 26, at least about 28, at least about 30, at least about 32, at least about 34, at least about 36, at least about 38, at least about 40, at least about 42, at least about 44, or at least about 46 consecutive nucleotides. The term "about" as used in the previous sentence refers to an increase or decrease of 1 from the stated numerical value. Even more preferred are polynucleotides containing sequences of at least or about 50, at least or about 100, at least or about 150, at least or about 200, at least or about 250, at least or about 300, at least or about 350, or at least or about 400 consecutive nucleotides. The term "about" as used in the preceding sentence refers to an increase or decrease of 10% from the stated numerical value.

[0074] Sequences from the 3' or 5' end of the above described coding regions as found in polynucleotides of the invention are of the same lengths as those described above, except that they would naturally be limited by the length of the coding region. The 3' end of a coding region may include sequences up to the 3' half of the coding region. Conversely, the 5' end of a coding region may include sequences up to the 5' half of the coding region. Of course the above described sequences, or the coding regions and polynucleotides containing portions thereof, may be used in their entireties.

[0075] Polynucleotides combining the sequences from a 3' untranslated and/or non-coding region and the associated 3' end of the coding region are preferably at least or about 100, at least or about 150, at least or about 200, at least or about 250, at least or about 300, at least or about 350, or at least or about 400 consecutive nucleotides. Preferably, the polynucleotides used are from the 3' end of the gene, such as within about 350, about 300, about 250, about 200, about 150, about 100, or about 50 nucleotides from the polyadenylation signal or polyadenylation site of a gene or expressed sequence. Polynucleotides containing mutations relative to the sequences of the disclosed genes may also be used so long as the presence of the mutations still allows hybridization to produce a detectable signal.

[0076] In another embodiment of the invention, polynucleotides containing deletions of nucleotides from the 5' and/or 3' end of the above disclosed sequences may be used. The deletions are preferably of 1-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-125, 125-150, 150-175, or 175-200 nucleotides from the 5' and/or 3' end, although the extent of the deletions would naturally be limited by the length of the disclosed sequences and the need to be able to use the polynucleotides for the detection of expression levels.

[0077] Other polynucleotides of the invention from the 3' end of the above disclosed sequences include those of primers and optional probes for quantitative PCR. Preferably, the primers and probes are those which amplify a region less than about 350, less than about 300, less than about 250, less than about 200, less than about 150, less than about 100, or less than about 50 nucleotides from the polyadenylation signal or polyadenylation site of a gene or expressed sequence.
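As a rough illustration of restricting an amplicon to the region near the polyadenylation signal, the sketch below extracts the nucleotides immediately upstream of the last AATAAA hexamer in a sequence. Only the canonical AATAAA signal is searched; variant signals and the polyadenylation site itself are ignored. The function name and the toy sequences are hypothetical.

```python
POLYA_SIGNAL = "AATAAA"  # canonical signal only; variants are not handled


def region_before_polya(sequence, window):
    """Return up to `window` nucleotides directly upstream of the last
    AATAAA in `sequence`, or None when no signal is found."""
    position = sequence.upper().rfind(POLYA_SIGNAL)
    if position == -1:
        return None
    return sequence[max(0, position - window):position]


# Toy sequence for illustration; a real transcript would be far longer.
region = region_before_polya("GGGGCCCCTTTTAATAAAGG", window=6)  # "CCTTTT"
```

Primers and probes would then be chosen within the returned region by whatever design procedure is preferred.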

[0078] In yet another embodiment of the invention, polynucleotides containing portions of the above disclosed sequences including the 3' end may be used in the practice of the invention. Such polynucleotides would contain at least or about 50, at least or about 100, at least or about 150, at least or about 200, at least or about 250, at least or about 300, at least or about 350, or at least or about 400 consecutive nucleotides from the 3' end of the disclosed sequences.

[0079] The above assay embodiments may be used in a number of different ways to identify or detect the breast cancer stage and/or grade, if any, of a breast cancer cell sample from a patient as well as the likely survival outcome of said patient. In many cases, this would reflect a secondary screen for the patient, who may have already undergone mammography or physical exam as a primary screen. If positive, the subsequent needle biopsy, ductal lavage, fine needle aspiration, or other analogous methods may provide the sample for use in the above assay embodiments. The present invention is particularly useful in combination with non-invasive protocols, such as ductal lavage or fine needle aspiration, to prepare a breast cell sample.

[0080] The present invention provides a more objective set of criteria, in the form of gene expression profiles of a discrete set of genes, to discriminate (or delineate) between breast cancer survival outcomes. In particularly preferred embodiments of the invention, the assays are used to discriminate between good and poor outcomes within 5, or about 5, years after surgical intervention to remove breast cancer tumors or within about 95 months after surgical intervention to remove breast cancer tumors. Comparisons that discriminate between outcomes after about 10, about 20, about 30, about 40, about 50, about 60, about 70, about 80, about 90, about 100, about 110, about 120, about 130, about 140, or about 150 months may also be performed.

[0081] While good and poor survival outcomes may be defined relatively in comparison to each other, a "good" outcome may be viewed as a better than 50% survival rate after about 60 months post surgical intervention to remove breast cancer tumor(s). A "good" outcome may also be a better than about 60%, about 70%, about 80% or about 90% survival rate after about 60 months post surgical intervention. A "poor" outcome may be viewed as an about 60% or less, or about 50% or less, survival rate after about 40 or about 50 or about 60 months post surgical intervention to remove breast cancer tumor(s). A "poor" outcome may also be about a 70% or less survival rate after about 40 months, or about an 80% or less survival rate after about 20 months, post surgical intervention.

[0082] In one embodiment of the invention, the isolation and analysis of a breast cancer cell sample may be performed as follows:

[0083] (1) Ductal lavage or other non-invasive procedure is performed on a patient to obtain a sample.

[0084] (2) Sample is prepared and coated onto a microscope slide. Note that ductal lavage results in clusters of cells that are cytologically examined as stated above.

[0085] (3) Pathologist or image analysis software scans the sample for the presence of non-normal and/or atypical cells.

[0086] (4) If non-normal and/or atypical cells are observed, those cells are harvested (e.g. by microdissection such as LCM).

[0087] (5) RNA is extracted from the harvested cells.

[0088] (6) RNA is purified, amplified, and labeled.

[0089] (7) Labeled nucleic acid is contacted with a microarray containing polynucleotides of the genes identified herein as correlated to discriminations between breast cancer survival outcomes under hybridization conditions, then processed and scanned to obtain a pattern of intensities of each spot (relative to a control for general gene expression in cells) which determine the level of expression of the gene(s) in the cells.

[0090] (8) The pattern of intensities is analyzed by comparison to the expression patterns of the genes in known samples of breast cancer cells correlated with survival outcomes (relative to the same control).

[0091] A specific example of the above method would be performing ductal lavage following a primary screen, observing and collecting non-normal and/or atypical cells for analysis. The comparison to known expression patterns, such as that made possible by a model generated by an algorithm (such as, but not limited to nearest neighbor type analysis, SVM, or neural networks) with reference gene expression data for the different breast cancer survival outcomes, identifies the cells as being correlated with subjects with good outcomes. Another example would be taking a breast tumor removed from a subject after surgical intervention, isolation and preparation of breast cancer cells for determination/identification of atypical, non-normal, or cancer cells, and isolation of said cells followed by steps 5 through 8 above.
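The comparison of a sample's pattern of intensities to known expression patterns can be sketched with a nearest neighbor type analysis. Pearson correlation is used here as the similarity measure, which is an assumption of this sketch; the function names and the three-gene profiles are hypothetical, and a profile with no variation across genes would cause a division by zero.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    count = len(a)
    mean_a = sum(a) / count
    mean_b = sum(b) / count
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a)
    spread_b = sum((y - mean_b) ** 2 for y in b)
    return covariance / math.sqrt(spread_a * spread_b)


def nearest_outcome(sample, references):
    """Return the outcome label of the reference profile most correlated
    with the sample profile."""
    best = max(references, key=lambda reference: pearson(sample, reference[0]))
    return best[1]


# Hypothetical reference profiles (intensities relative to the same control).
references = [
    ([1.0, 2.0, 3.0], "good"),
    ([3.0, 2.0, 1.0], "poor"),
]
outcome = nearest_outcome([1.1, 1.9, 3.2], references)  # "good"
```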

[0092] Alternatively, the sample may permit the collection of both normal as well as cancer cells for analysis. The gene expression patterns for each of these two samples will be compared to each other as well as the model and the normal versus individual comparisons therein based upon the reference data set. This approach can be significantly more powerful than the cancer-cells-only approach because it utilizes significantly more information from the normal cells and the differences between normal and non-normal or atypical or cancer cells (in both the sample and reference data sets) to determine the likely survival outcome of the patient based on gene expression in the cancer cells from the sample.

[0093] With use of the present invention, skilled physicians may prescribe treatments based on prognosis determined via non-invasive samples that they would have prescribed for a patient which had previously received a diagnosis via a solid tissue biopsy.

[0094] The above discussion is also applicable where a palpable lesion is detected followed by fine needle aspiration or needle biopsy of cells from the breast. The cells are plated and reviewed by a pathologist or automated imaging system which selects cells for analysis as described above.

[0095] The present invention may also be used, however, with solid tissue biopsies. For example, a solid biopsy may be collected and prepared for visualization followed by determination of expression of one or more genes identified herein to determine the breast cancer survival outcome. One preferred means is by use of in situ hybridization with polynucleotide or protein identifying probe(s) for assaying expression of said gene(s).

[0096] In an alternative method, the solid tissue biopsy may be used to extract molecules followed by analysis for expression of one or more gene(s). This makes it possible to omit visualization and the collection of only cancer cells or cells suspected of being cancerous. This method may of course be modified such that only cells that have been positively selected are collected and used to extract molecules for analysis. This would require visualization and selection as a prerequisite to gene expression analysis.

[0097] In a further modification of the above, both normal cells and cancer cells are collected and used to extract molecules for analysis of gene expression. The approach, benefits and results are as described above using non-invasive sampling.

[0098] The genes identified herein may be used to generate a model capable of predicting the breast cancer survival outcome for an unknown breast cell sample based on the expression of the identified genes in the sample. Such a model may be generated by any of the algorithms described herein or otherwise known in the art, as well as those recognized as equivalent in the art, using gene(s) (and subsets thereof) disclosed herein for the identification of breast cancer outcomes. The model provides a means for comparing expression profiles of gene(s) of the subset from the sample against the profiles of reference data used to build the model. The model can compare the sample profile against each of the reference profiles or against model-defining delineations made based upon the reference profiles. Additionally, relative values from the sample profile may be used in comparison with the model or reference profiles.
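
By way of illustration only, the comparison of a sample profile against each reference profile can be sketched as a nearest neighbor lookup. The sketch below is not the disclosed model: the use of Pearson correlation as the similarity measure, the function names, and all expression values are assumptions chosen for the example.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def nearest_reference(sample, references):
    """Return (label, correlation) for the reference profile most
    correlated with the sample. `references` is a list of
    (outcome_label, profile) pairs."""
    best = max(references, key=lambda ref: pearson(sample, ref[1]))
    return best[0], pearson(sample, best[1])

# Hypothetical expression values for four genes (not taken from any table).
references = [
    ("good outcome", [2.0, 1.5, 0.2, 0.1]),
    ("poor outcome", [0.3, 0.2, 1.8, 2.2]),
]
sample = [1.8, 1.4, 0.5, 0.3]

label, r = nearest_reference(sample, references)
print(label, round(r, 3))
```

A practical model would typically use many reference profiles per outcome and a validated similarity threshold; the single-profile-per-outcome setup here is only meant to show the shape of the comparison.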

[0099] In a preferred embodiment of the invention, breast cell samples identified as normal and cancerous from the same subject may be analyzed for their expression profiles of the genes used to generate the model. This provides an advantageous means of identifying survival outcomes based on relative differences from the expression profile of the normal sample. These differences can then be used in comparison to differences between normal and individual cancerous reference data which was also used to generate the model.
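
As a rough sketch of the paired normal/cancer comparison described above, the following computes a per-gene difference between the cancer and normal profiles of one subject and assigns the outcome whose reference difference profile is closest. The Euclidean distance, the function names, and every number shown are hypothetical and serve only to illustrate the idea.

```python
from math import dist  # available in Python 3.8+

def relative_profile(cancer, normal):
    """Per-gene expression difference (cancer minus normal) for one subject."""
    return [c - n for c, n in zip(cancer, normal)]

def closest_outcome(sample_diff, reference_diffs):
    """Return the outcome label whose reference difference profile lies
    nearest (Euclidean distance) to the sample's difference profile."""
    return min(reference_diffs,
               key=lambda label: dist(sample_diff, reference_diffs[label]))

# Hypothetical reference data: (cancer, normal) log-expression for three genes.
reference_diffs = {
    "good outcome": relative_profile([5.1, 3.0, 2.2], [5.0, 3.1, 2.0]),
    "poor outcome": relative_profile([7.4, 1.2, 4.0], [5.1, 3.0, 2.1]),
}

# Hypothetical paired sample from a single subject.
sample_diff = relative_profile([7.0, 1.5, 3.6], [5.0, 3.2, 2.0])
outcome = closest_outcome(sample_diff, reference_diffs)
print(outcome)
```

Working with differences rather than raw values means that subject-specific baseline expression cancels out, which is one way to read the stated advantage of using the matched normal sample.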

[0100] The detection of gene expression from the samples may be by use of a single microarray able to assay gene expression from some or all genes disclosed herein for convenience and accuracy.

[0101] Other uses of the present invention include providing the ability to identify breast cancer cell samples as correlated with particular breast cancer survival outcomes for further research or study. This provides a particular advantage in many contexts requiring the identification of cells based on objective genetic or molecular criteria.

[0102] The materials for use in the methods of the present invention are ideally suited for preparation of kits produced in accordance with well-known procedures. The invention thus provides kits comprising agents for the detection of expression of the disclosed genes for identifying breast cancer survival outcomes. Such kits, optionally comprising the agent with an identifying description or label or instructions relating to their use in the methods of the present invention, are provided. Such a kit may comprise containers, each with one or more of the various reagents (typically in concentrated form) utilized in the methods, including, for example, pre-fabricated microarrays, buffers, the appropriate nucleotide triphosphates (e.g., dATP, dCTP, dGTP and dTTP; or rATP, rCTP, rGTP and UTP), reverse transcriptase, DNA polymerase, RNA polymerase, and one or more primer complexes of the present invention (e.g., appropriate length poly(T) or random primers linked to a promoter reactive with the RNA polymerase). A set of instructions will also typically be included.

[0103] The methods provided by the present invention may also be automated in whole or in part. All aspects of the present invention may also be practiced such that they consist essentially of a subset of the disclosed genes to the exclusion of material irrelevant to the identification of breast cancer survival outcomes via a cell containing sample.

[0104] Having now generally described the invention, the same will be more readily understood through reference to the following examples which are provided by way of illustration, and are not intended to be limiting of the present invention, unless specified.

