WO2006088208A1 - Method for evaluating a physiological change in a living body, and apparatus - Google Patents
- Publication number
- WO2006088208A1 (PCT/JP2006/303083)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- physiological change
- expression level
- gene
- individuals
- physiological
Classifications
- C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q2600/158: Expression markers
- G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
- G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/10: Signal processing, e.g. from mass spectrometry [MS] or from PCR
Definitions
- The present invention relates to a technique for predicting physiological changes in a living body based on gene expression data.
- Samples used for predicting physiological changes in a living body have been collected from the site where the physiological change occurs, that is, the site where a disease would appear if the change occurred.
- Biological tissue from that site, or a biological sample prepared from it, is used (Non-Patent Document 1).
- It is also known to compare the gene expression levels of a biological group having an element that induces a certain physiological change with those of a group lacking that element, and to select marker genes on that basis (Non-Patent Document 3).
- Non-Patent Document 1: Biochemical and Biophysical Research Communications 315, 1088-1096 (2004)
- Non-Patent Document 2: Nature 415, 530-536 (2002)
- Non-Patent Document 3: New England Journal of Medicine 347, 1999-2009 (2002)
- An object of the present invention is to provide a technique for predicting a physiological change with high accuracy using a sample collected from a site different from the site in the living body where the physiological change occurs.
- A method for predicting physiological changes in a living body according to the present invention comprises the steps of: detecting the expression levels of a plurality of genes in biological tissue collected, from a plurality of individuals that develop the physiological change and a plurality of individuals that do not, at a site different from the target site of the prediction; selecting, as a marker gene group, genes whose expression levels differ statistically between the individuals that develop the physiological change and those that do not; performing multivariate analysis on the expression levels of the marker gene group in the two sets of individuals and, based on those expression levels, generating a discrimination criterion for determining whether the change has occurred; detecting, for an individual to be predicted, the expression levels of genes including at least the marker gene group in biological tissue collected from a site different from the target site; and applying the discrimination criterion to those expression levels to predict the presence or absence of the physiological change at the target site.
- Thus, the presence or absence of a physiological change can be predicted with high accuracy from biological tissue collected at a site different from the site where the change occurs.
- The method for predicting physiological changes in a living body comprises the steps of detecting, for an individual to be predicted, the expression levels of genes including at least a marker gene group in biological tissue collected from a site different from the target site of the prediction, and applying a discrimination criterion to those expression levels to predict the presence or absence of a physiological change at the target site.
- The marker gene group is obtained by detecting the expression levels of a plurality of genes in biological tissue collected, from a plurality of individuals that develop the physiological change and a plurality that do not, at a site different from the target site of the prediction, and selecting the genes whose expression levels differ statistically between the two sets of individuals.
- The discrimination criterion is created by performing multivariate analysis on the expression levels of the marker gene group in individuals that develop the physiological change and individuals that do not, on the basis of those expression levels.
- A method for generating a discrimination criterion used for predicting physiological changes in a living body according to the present invention targets a plurality of individuals that develop the physiological change and a plurality of individuals that do not.
- The method for selecting marker genes according to the present invention comprises the steps of detecting the expression levels of a plurality of genes in biological tissue collected, from a plurality of individuals that develop a physiological change and a plurality that do not, at a site different from the target site; and selecting, as a marker gene group, the genes whose expression levels differ statistically between the individuals that develop the physiological change and those that do not.
- Thus, marker genes can be selected with high accuracy on the basis of biological tissue collected at a site different from the site where the physiological change occurs.
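As a minimal sketch of the selection step, the snippet below ranks genes by Welch's t-statistic between individuals with and without the physiological change and keeps those whose absolute value reaches a threshold. The source only requires that a difference be observed statistically. The choice of Welch's t-test, the `t_threshold` parameter, and the function names are assumptions made for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t-statistic for the difference between two group means.

    Assumes at least two individuals per group and non-zero spread.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def select_markers(
    expr: dict[str, tuple[list[float], list[float]]], t_threshold: float
) -> list[str]:
    """Return genes with |t| >= t_threshold, strongest difference first.

    expr maps a gene ID to (levels in individuals with the change,
    levels in individuals without the change).
    """
    scores = {gene: welch_t(pos, neg) for gene, (pos, neg) in expr.items()}
    hits = [gene for gene, t in scores.items() if abs(t) >= t_threshold]
    return sorted(hits, key=lambda gene: abs(scores[gene]), reverse=True)
```

In practice the threshold would usually come from a p-value with a multiple-testing correction. This sketch leaves that step out.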
- The method for predicting physiological changes in a living body according to the present invention is characterized in that the gene expression levels are detected using a gene expression detection element.
- The discrimination-criterion creation program causes a computer to execute a step of detecting the expression levels of a plurality of genes in biological tissue collected, from a plurality of individuals that develop a physiological change and a plurality that do not, at a site different from the target site, and a step of storing the detected expression levels of each individual as basic data associated with the presence or absence of the physiological change.
- Genes whose expression levels differ statistically between individuals that develop the physiological change and those that do not are taken as marker genes.
- Thus, a criterion for predicting the presence or absence of a physiological change with high accuracy can be created from biological tissue collected at a site different from the site where the change occurs.
- The marker gene selection program according to the present invention causes a computer to execute the steps of detecting the expression levels of a plurality of genes in biological tissue collected, from a plurality of individuals that develop a physiological change and a plurality that do not, at a site different from the target site, and selecting, as a marker gene group, the genes whose expression levels differ statistically between the individuals that develop the physiological change and those that do not.
- Thus, marker genes can be selected with high accuracy on the basis of biological tissue collected at a site different from the site where the physiological change occurs.
- The prediction program according to the present invention targets, for an individual to be predicted, genes including at least a marker gene group in biological tissue collected from a site different from the target site of the prediction.
- It causes a computer to execute the steps of detecting the gene expression levels and applying a discrimination criterion to the expression levels of the genes including the marker gene group of the individual to be predicted, thereby predicting the presence or absence of a physiological change at the target site.
- The marker gene group is obtained by detecting the expression levels of a plurality of genes in biological tissue collected, from a plurality of individuals that develop the physiological change and a plurality that do not, at a site different from the site where the change occurs, and selecting the genes whose expression levels differ statistically between the two sets of individuals.
- The discrimination criterion is created by performing multivariate analysis on the expression levels of the marker gene group in individuals that develop the physiological change and individuals that do not, on the basis of those expression levels.
- The program according to the present invention detects the gene expression levels using a gene expression detection element.
- The gene expression detection element comprises a substrate and probes formed on the substrate for detecting the expression level of each gene in the marker gene group. The marker gene group is selected by detecting the expression levels of a plurality of genes in tissue collected, from a plurality of individuals that develop the physiological change and a plurality that do not, at a site different from the target site of the prediction, and choosing the genes whose expression levels differ statistically between the two sets of individuals.
- A prediction device according to the present invention predicts physiological changes in a living body and comprises:
- a conversion unit that converts the gene expression levels captured by the probes into electrical signals; and
- a prediction unit that receives the electrical signal corresponding to the expression level of each gene and predicts the presence or absence of a physiological change on the basis of a discrimination criterion.
- The marker gene group consists of genes selected, from among a plurality of genes whose expression levels were detected in biological tissue collected at a site different from the target site of the prediction in a plurality of individuals that develop the physiological change and a plurality that do not, as those whose expression levels differ statistically between the two sets of individuals.
- The discrimination criterion is created by performing multivariate analysis on the expression levels of the marker gene group in individuals that develop the physiological change and individuals that do not, on the basis of those expression levels.
- In the program according to the present invention, the gene expression detection element is a DNA chip or a DNA array.
- In the prediction device, the expression level of a gene in living tissue is detected using the living tissue itself or a biological sample prepared from it.
- In the prediction device according to the present invention, the living tissue is skin tissue or mucosal tissue.
- In the prediction device according to the present invention, the biological sample is fibroblasts.
- In the prediction device according to the present invention, the biological sample is fibroblast-derived RNA.
- In the prediction device according to the present invention, the site of onset is the brain.
- In the prediction device according to the present invention, the physiological change is the onset of a disease.
- In the prediction device according to the present invention, the disease is a central nervous system disease.
- In the prediction device, the central nervous system disease is dementia, Parkinson's disease, amyotrophic lateral sclerosis, or prion disease (Creutzfeldt-Jakob disease).
- In the prediction device according to the present invention, the dementia is Alzheimer's disease or frontotemporal dementia.
- The element that induces the physiological change is one or more elements selected from the Swedish mutation, the Arctic mutation, and the H136Y mutation of the presenilin 1 gene.
- In the prediction apparatus according to the present invention, the multivariate analysis is an analysis method that includes principal component analysis and linear discriminant analysis.
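The two analyses can be chained as in the sketch below, which is one possible reading rather than the patent's specified procedure. Marker expression levels are standardized, the first principal component is found by power iteration on the covariance matrix of the standardized data, and a one-dimensional linear discriminant is fitted to the component scores. Several details are assumptions: keeping only one component, assuming equal priors and a shared variance, and the dictionary layout of the returned criterion. Under those assumptions the LDA boundary is the midpoint between the two group means.

```python
import math
from statistics import mean, pstdev


def standardize(samples):
    """Z-score each gene (column) across all training individuals."""
    cols = list(zip(*samples))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero for constant genes
    z = [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in samples]
    return z, mus, sds


def first_principal_component(z, iters=500):
    """Leading eigenvector of the covariance matrix, by power iteration."""
    n, p = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]
    v = [1.0 / (k + 1) for k in range(p)]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def fit_criterion(samples, labels):
    """PCA (one component) followed by 1-D LDA on the component scores.

    samples: one row of marker expression levels per individual.
    labels: 1 = physiological change present, 0 = absent.
    """
    z, mus, sds = standardize(samples)
    loadings = first_principal_component(z)
    scores = [sum(a * b for a, b in zip(row, loadings)) for row in z]
    m1 = mean([s for s, y in zip(scores, labels) if y == 1])
    m0 = mean([s for s, y in zip(scores, labels) if y == 0])
    # Equal priors and a shared variance: the LDA boundary is the midpoint.
    return {
        "mus": mus,
        "sds": sds,
        "loadings": loadings,
        "threshold": (m0 + m1) / 2,
        "sign": 1.0 if m1 >= m0 else -1.0,  # orient so the 'change' group scores high
    }
```

Because the stored means, standard deviations and loadings are all that is needed, the resulting criterion can later be applied to a new individual's marker levels without the training data.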
- In the prediction apparatus according to the present invention, the genes whose expression levels differ are selected on the basis of an information criterion.
- In the prediction device, the information criterion is Allen's cross-validation criterion.
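Allen's cross-validation criterion is commonly associated with PRESS. PRESS is the sum of squared prediction errors obtained when each individual is left out in turn and predicted from a model fitted to the rest. The sketch below assumes a model that the source does not state: a least-squares line predicting the 0/1 group label from a single gene's expression level. It ranks genes so that the lowest PRESS comes first. The function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def press(xs, ys):
    """PRESS: sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i in range(len(xs)):
        # Refit without individual i, then predict individual i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total


def rank_by_press(expr, labels):
    """Order gene IDs by PRESS of a line predicting the 0/1 label, lowest first."""
    y = [float(v) for v in labels]
    return sorted(expr, key=lambda gene: press(expr[gene], y))
```

A multi-gene version would apply the same leave-one-out idea to candidate gene subsets, for example in a forward selection. That extension is not shown here.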
- The gene expression levels are detected by detecting a change in optical properties caused by a labeled gene bound by hybridization to a probe of the gene expression detection element.
- Alternatively, the gene expression levels are detected by detecting a change in electrical properties caused by a gene bound by hybridization to a probe of the gene expression detection element.
- A gene expression detection element according to the present invention is used for predicting physiological changes in a living body.
- The marker gene group is selected from genes whose expression levels were detected in tissue collected, from a plurality of individuals that develop the physiological change and a plurality that do not, at a site different from the target site of the prediction. The selected genes are those whose expression levels differ statistically between the two sets of individuals.
- For the probe of each marker gene, principal component analysis is performed on the expression levels of the marker gene group in individuals that develop the physiological change and individuals that do not. The detection sensitivity of the probe corresponding to each gene is then set according to that gene's coefficient in the composite variable of the principal component.
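The source does not say how principal component coefficients map to probe sensitivity. The sketch below follows one hypothetical reading: genes with larger absolute coefficients get proportionally more replicate probe spots, and every marker keeps at least one spot. The replicate-spot mechanism and the `max_spots` parameter are assumptions, and other ways of setting sensitivity would fit the text equally well.

```python
import math


def spots_per_probe(loadings: dict[str, float], max_spots: int = 8) -> dict[str, int]:
    """Allocate replicate probe spots in proportion to |PC coefficient|.

    The gene with the largest absolute coefficient gets max_spots spots;
    every marker keeps at least one spot so that it is still measured.
    """
    peak = max(abs(c) for c in loadings.values())
    return {
        gene: max(1, math.ceil(abs(c) / peak * max_spots))
        for gene, c in loadings.items()
    }
```

The sign of a coefficient only says in which direction a gene moves the score, so the sketch uses its absolute value.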
- A system for predicting physiological changes in a living body according to the present invention comprises a server device and a terminal device.
- The terminal device comprises transmission means for transmitting information indicating the gene expression levels detected, for an individual to be predicted, for genes including at least a marker gene group in biological tissue collected from a site different from the target site of the prediction; reception means for receiving prediction result data from the server device; and output means for outputting the received prediction result data.
- The server device comprises reception means for receiving the information indicating the gene expression levels from the terminal device; prediction means for applying a discrimination criterion to the gene expression levels to predict the presence or absence of a physiological change at the target site; and transmission means for transmitting the prediction result data produced by the prediction means to the terminal device.
- The marker gene group consists of genes selected, from among a plurality of genes whose expression levels were detected in biological tissue collected at a site different from the target site of the prediction in a plurality of individuals that develop the physiological change and a plurality that do not, as those whose expression levels differ statistically between the two sets of individuals.
- The discrimination criterion is created by performing multivariate analysis on the expression levels of the marker gene group in individuals that develop the physiological change and individuals that do not, on the basis of those expression levels.
- A server device according to the present invention comprises reception means for receiving, from the terminal device, information indicating the gene expression levels detected, for an individual to be predicted, for genes including at least a marker gene group in biological tissue collected from a site different from the target site of the prediction; prediction means for applying a discrimination criterion to the gene expression levels to predict the presence or absence of a physiological change at the target site; and transmission means for transmitting the prediction result data produced by the prediction means to the terminal device.
- The marker gene group consists of genes selected, from among a plurality of genes whose expression levels were detected in biological tissue collected at a site different from the target site of the prediction in a plurality of individuals that develop the physiological change and a plurality that do not, as those whose expression levels differ statistically between the two sets of individuals.
- The discrimination criterion is created by performing multivariate analysis on the expression levels of the marker gene group in individuals that develop the physiological change and individuals that do not, on the basis of those expression levels.
- Thus, the presence or absence of a physiological change can be predicted with high accuracy from biological tissue collected at a site different from the site where the change occurs.
- A terminal device according to the present invention comprises transmission means for transmitting information indicating the gene expression levels detected, for an individual to be predicted, for genes including at least a marker gene group in biological tissue collected from a site different from the target site of the prediction; reception means for receiving prediction result data from the server device; and output means for outputting the received prediction result data.
- The marker gene group consists of genes selected, from among a plurality of genes whose expression levels were detected in biological tissue collected at a site different from the target site of the prediction in a plurality of individuals that develop the physiological change and a plurality that do not, as those whose expression levels differ statistically between the two sets of individuals.
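The division of work between terminal and server can be illustrated with the in-process sketch below, in which JSON strings stand in for the network messages and the transport layer is omitted. The message fields and gene IDs are assumptions, and so is the representation of the criterion as a linear score (a bias plus a weighted sum of marker levels). A criterion built by standardizing levels, projecting onto principal component loadings and thresholding reduces to that linear form.

```python
import json

# Hypothetical linear discrimination criterion held only on the server:
# score = bias + sum(weight[g] * level[g]); score > 0 means change predicted.
CRITERION = {"bias": -3.0, "weights": {"GENE_A": 0.5, "GENE_B": 0.25}}


def terminal_request(individual_id, levels):
    """Terminal side: serialise the detected marker expression levels."""
    return json.dumps({"id": individual_id, "levels": levels})


def server_handle(message):
    """Server side: apply the criterion and return the prediction result."""
    req = json.loads(message)
    weights = CRITERION["weights"]
    missing = sorted(set(weights) - set(req["levels"]))
    if missing:
        return json.dumps({"id": req["id"], "error": f"missing markers: {missing}"})
    score = CRITERION["bias"] + sum(w * req["levels"][g] for g, w in weights.items())
    return json.dumps({"id": req["id"], "score": score, "change_predicted": score > 0})


def terminal_output(message):
    """Terminal side: render the received prediction result for output."""
    res = json.loads(message)
    if "error" in res:
        return f"{res['id']}: {res['error']}"
    verdict = "change predicted" if res["change_predicted"] else "no change predicted"
    return f"{res['id']}: {verdict} (score {res['score']:.2f})"
```

Keeping the criterion on the server means the terminal only needs to send measured levels and display the reply.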
- The marker genes are specified by accession numbers in GenBank, the gene information database of the National Center for Biotechnology Information (NCBI), and include at least the following 51 genes:
- A “physiological change of a living body” refers to an observable change occurring in part of an organism, such as a cell, tissue, or organ, or in the entire individual. The concept includes changes in shape, color, size, temperature, energy consumption, substance production, movement, and behavior, as well as the onset of disease.
- An “element that induces a physiological change” includes any material or non-material matter capable of inducing a physiological change in the living body. Specific examples include, but are not limited to, genes, environment (temperature, water temperature, humidity, osmotic pressure, sound, vibration, etc.), nutritional status, drug administration, stress, personality, character, and preferences. “Element that causes a physiological change” is used synonymously.
- “Prediction of a physiological change” covers not only predicting a physiological change that will occur in the future but also predicting the presence or absence of a current physiological change at a site that is difficult to observe directly.
- A “biological physiological change prediction marker” is used, directly or indirectly, to predict physiological changes in a living body. It includes genes, nucleotides, polynucleotides, proteins, and polypeptides whose expression in the body varies in relation to physiological changes, as well as nucleotides, polynucleotides, and antibodies capable of specifically recognizing and binding to them. Owing to these properties, the nucleotides, polynucleotides, and antibodies can be used as probes for detecting the genes and proteins expressed in vivo, the nucleotides and polynucleotides can be used as primers for amplifying the genes expressed in vivo, and the proteins can be used effectively for screening substances that bind to them. “Physiological change prediction marker”, “prediction marker”, and “marker” are synonymous.
- A “gene” includes genetic information represented by a base sequence, such as RNA or DNA, as well as orthologous genes conserved among species such as humans, mice, and rats. Besides genes that encode proteins, a gene may function as RNA or DNA itself. A gene generally encodes a protein according to its base sequence, but it may instead encode a protein with an equivalent biological function (for example, a homologue such as a splice variant, a mutant, or a derivative). For example, the gene may have a base sequence that hybridizes with the complementary sequence of the base sequence given by the genetic information and encode a protein differing slightly from the one indicated by that information.
- “DNA” includes each single strand, such as the sense strand or the antisense strand, that constitutes double-stranded DNA.
- “DNA” includes not only double-stranded DNA such as human genomic DNA, but also single-stranded DNA (positive strand) such as cDNA, single-stranded DNA (complementary strand) having a sequence complementary to the positive strand, and fragments of any of these.
- “DNA” also includes functional regions such as expression control regions, coding regions, exons, and introns, and covers cDNA, genomic DNA, synthetic DNA, and the like.
- “RNA” includes single-stranded RNA, single-stranded RNA having a sequence complementary to it, and double-stranded RNA composed of these.
- It also includes total RNA, mRNA, and rRNA.
- A “gene expression detection element” is an element that detects the presence or absence of gene expression, or the expression level, by converting it into a physical quantity. It includes elements that detect the expression level electrically as well as those that detect it optically.
- This concept includes DNA chips and DNA arrays, including those in which probe DNA is placed on a glass surface, in plastic wells, on the side or bottom surfaces of tubes, or on the surface of microbeads.
- A “DNA chip” or “DNA array” has probe DNA arranged on a substrate and measures the expression of a plurality of genes by hybridization. It includes devices that output the expression level electrically as well as those that measure it optically.
- For example, GeneChip (trademark) manufactured by Affymetrix can be used as the “DNA chip”.
- The CodeLink Expression Bioarray (trademark) from Amersham Biosciences can also be used.
- DNA arrays include not only DNA microarrays but also DNA macroarrays.
- “Expression level” includes not only values obtained by directly measuring the expression of a gene but also values calculated from them by a predetermined calculation or a statistical technique.
- “Gene expression level”, “expression signal”, “gene expression signal”, “expression signal value”, “gene expression signal value”, “gene expression data”, “expression data”, and similar terms are used synonymously to denote values that reflect the expression level.
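As one example of such a calculation, the sketch below rescales each chip so that its median signal equals a common target and then takes log2, which makes values from different chips comparable. This particular normalization is chosen here for illustration; the source does not name one. The `target` value and the floor of 1.0 before the logarithm are arbitrary assumptions.

```python
import math
from statistics import median


def normalise_chip(raw: dict[str, float], target: float = 100.0) -> dict[str, float]:
    """Scale one chip so its median signal equals `target`, then take log2.

    Signals are floored at 1.0 after scaling so that log2 stays defined.
    """
    factor = target / median(raw.values())
    return {probe: math.log2(max(v * factor, 1.0)) for probe, v in raw.items()}
```

Two chips whose raw signals differ only by an overall brightness factor give identical normalized values, and a doubling of signal becomes a difference of 1 on the log2 scale.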
- “Gene expression” refers to the state of gene expression in a living body as represented by gene expression levels. It covers both the expression level of a single gene and the expression levels of a plurality of genes. “Expression” is used synonymously.
- “Detecting the expression level of a gene in living tissue” includes not only detecting it using the living tissue itself but also detecting it using a biological sample prepared from the living tissue.
- A “biological sample” is a sample prepared from collected tissue, such as cells, fibroblasts, erythrocytes, leukocytes, lymphocytes, nucleic acids, or fibroblast-derived RNA.
- “Program” refers not only to a program that can be executed directly by a CPU but also to source-format programs, compressed programs, encrypted programs, and the like.
- FIGS. 1 and 2 show the flow of processing in a method for predicting physiological changes in a living body according to an embodiment of the present invention.
- FIG. 1 shows the generation of discrimination criteria.
- FIG. 2 shows prediction using the discrimination criteria.
- First, a biological group that develops the physiological change (the first biological group) and a biological group that does not (the second biological group) are prepared (step P1).
- Biological tissue is collected from each individual belonging to the first and second biological groups (step P2).
- The tissue is collected from a site different from the target site of the prediction. For example, when physiological changes in the brain are to be predicted, tissue such as skin from the human upper arm is collected.
- A sample is prepared from the biological tissue collected from each individual (step P3).
- Fibroblasts, for example, are prepared from the collected biological tissue.
- Hybridization using a DNA chip is then performed with this sample (step P4).
- mRNA is extracted from the sample, and cDNA (complementary DNA) of this mRNA is synthesized.
- The cDNA is fluorescently labeled.
- An aqueous solution containing the fluorescently labeled cDNA is dropped onto the probes of the DNA chip to perform hybridization (a duplex-forming reaction).
- The DNA chip has a large number of probe regions arranged in rows and columns, and each probe region carries a large number of DNA probes.
- The DNA probes have a different base sequence in each probe region.
- The fluorescently labeled cDNA hybridizes with the DNA probes having the corresponding base sequence, so the expression level of each mRNA can be detected by measuring the color density of each probe region.
- The hybridized DNA chip is imaged with a scanner to obtain an image whose color density corresponds to the mRNA expression levels.
- Image analysis software then obtains density data for each probe region from this image as gene expression data (step P5).
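The density-reading step can be sketched as averaging pixel intensities within each probe region. The sketch assumes a hypothetical layout: the scanned image is a 2-D list of intensities, and the probe regions are equal squares on a regular grid. Actual image analysis software also locates spots and subtracts background, which is omitted here.

```python
from statistics import mean


def region_intensities(image, region_size):
    """Mean pixel intensity of each square probe region on a regular grid.

    Returns a dict keyed by (row index, column index) of the probe region.
    """
    rows, cols = len(image), len(image[0])
    result = {}
    for r0 in range(0, rows - region_size + 1, region_size):
        for c0 in range(0, cols - region_size + 1, region_size):
            pixels = [
                image[r][c]
                for r in range(r0, r0 + region_size)
                for c in range(c0, c0 + region_size)
            ]
            result[(r0 // region_size, c0 // region_size)] = mean(pixels)
    return result
```

Each value in the returned dictionary then plays the role of the density data for one probe region.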
- to select a marker gene group, the gene expression data of the first biological group and the second biological group, obtained for each individual as described above, are compared, and genes whose expression data differ significantly between the two groups are selected (step P6).
- the gene group selected in this way is defined as a marker gene group.
- for example, a marker gene group can be selected using an information criterion such as the cross-validation criterion.
- multivariate analysis is performed on the gene expression data of the marker gene group in each individual to generate a criterion for discriminating between the first biological group and the second biological group.
- a principal component analysis can be performed to obtain a discrimination criterion.
- a discrimination criterion used for predicting physiological changes can be generated.
- FIG. 2 shows the prediction of physiological changes.
- a biological tissue is collected from the individual that is the prediction target.
- a biological tissue of a site different from the target site for which physiological change is predicted is collected.
- a biological sample is prepared from the collected biological tissue (step P12). Preferably, the biological sample is of the same type, and derived from tissue collected from the same site, as the samples used to create the discrimination criterion.
- hybridization using a DNA chip is performed (step P13). For example, mRNA is extracted from the sample, and its complementary DNA (cDNA) is synthesized. This cDNA is fluorescently labeled. An aqueous solution containing the fluorescently labeled cDNA is then dropped onto the probes of the DNA chip, and hybridization (duplex formation reaction) is performed.
- the DNA chip used here may be the same DNA chip used to generate the discrimination criterion, but a dedicated DNA chip carrying only probes corresponding to the marker gene group is preferred.
- the hybridized DNA chip is imaged with a scanner to obtain an image whose color density corresponds to the expression level of each mRNA. Image analysis software then acquires density data for each probe region from this image as gene expression data (step P14). Subsequently, the discrimination criterion is applied to the acquired gene expression data to predict the presence or absence of the physiological change (step P15), and a prediction result is obtained (step P16). In this way, physiological changes in a living body can be predicted.
- the marker gene selection method, the discrimination criterion creation method, and the prediction method based on the discrimination criterion can be performed independently.
- for example, only the creation of the discrimination criterion shown in FIG. 1 may be performed according to the present invention, and the resulting criterion may be used in other prediction methods or for purposes other than prediction.
- likewise, only the marker gene selection method of steps P1 to P6 in FIG. 1 may be performed; a discrimination criterion may then be generated from the selected marker genes by a method other than that of the present invention, or the selected marker genes may be used for purposes other than generating a discrimination criterion.
- the generation of the discrimination criterion shown in FIG. 1 and the prediction of physiological changes shown in FIG. 2 can be performed without a computer. However, given the large amount of data to be processed, they are preferably implemented as devices, as described below.
- Figure 3 shows a functional block diagram of the discrimination criterion generator.
- the expression level detection means 22 obtains expression level data for each gene from the hybridized DNA chips D1, D2, ..., Dn. The basic data generation means 24 then generates basic data by combining the expression level data with data on the presence or absence of the physiological change in each individual.
- the marker selection means 26 selects the marker gene by comparing the expression data of the first biological group and the second biological group.
- the discrimination criterion generation means 28 performs multivariate analysis based on the expression level data of the marker gene group and calculates a criterion for discriminating between the first biological group and the second biological group. The resulting discrimination criterion 30 is recorded.
- FIG. 4 shows the hardware configuration when the discrimination criterion generator is realized by a computer.
- a display 4, a scanner 6, a memory 8, a CD-ROM drive 10, and a hard disk 12 are connected to the CPU 2.
- the scanner 6 reads the probe area of the DNA chip on which hybridization has been performed as an image.
- the scanner 6 is connected to the CPU 2 and can directly capture data.
- alternatively, the image data read by the scanner 6 may be recorded on a portable recording medium (CD-RW, etc.) and read via the CD-ROM drive 10.
- Memory 8 is used as a work area of CPU2.
- the operating system 16 and the discrimination criterion generation program 18 are recorded on the hard disk 12. These programs are recorded on the CD-ROM 14 and installed on the hard disk 12 via the CD-ROM drive 10.
- the discrimination criterion generation program 18 performs its function in cooperation with the operating system 16. Note that the discrimination criterion generation program 18 may be a program that functions alone.
- FIGS. 5 and 6 show flowcharts of the discrimination criterion generation program.
- in the following, Alzheimer's disease is described as an example of the physiological change of the living body to be predicted.
- fibroblasts were isolated and cultured by the method described in Neuroscience Letters, 220, 9-12 (1996), and 3 to 10 million fibroblasts per sample were used as a biological sample. Total RNA was then extracted from these fibroblasts; the RNeasy Mini Kit (Qiagen, Valencia, CA) can be used for the extraction.
- in step S1, the CPU 2 reads an image of the DNA chip that is set in the scanner 6 and has been hybridized with the biological sample of the first individual. This image has a fluorescence intensity corresponding to the expression level of each gene.
- the CPU 2 generates expression level data for each gene based on the fluorescence intensity of each probe region of the image, thereby acquiring the expression level data (step S2). The method described in JP-A-2003-169867 can also be used for these steps. The expression level data acquisition can also be implemented with Affymetrix's analysis software Microarray Suite version 5.0.
- the CPU 2 acquires data on whether or not Alzheimer's disease develops for the individual (step S3).
- an individual who has already developed Alzheimer's disease, or who is certain to develop it in the future, was treated as an individual who “develops Alzheimer's disease”.
- This may be input from a keyboard or the like (not shown), or may be acquired from data recorded in advance on the hard disk 12 for each individual. In the latter case, it is advisable to record the data by attaching an ID to each individual so that the corresponding data can be obtained by inputting the ID when reading the image of the DNA chip.
- for the target individual, the presence or absence of Alzheimer's disease and the expression level of each gene are recorded on the hard disk 12 as basic data (step S4).
- CPU 2 determines whether or not the above processing has been performed for all individuals (step S5). If there is an unprocessed individual, the above steps S1 to S5 are repeated.
- Figure 7 shows a part of the basic data recorded on the hard disk 12.
- the top row shows the individual ID.
- the DNA chip (HG-U133A) used in this embodiment has 22,283 types of probes. Therefore, in this embodiment, the number n of genes recorded on the hard disk 12 is 22,283.
- CPU 2 selects a marker gene based on the basic data recorded on hard disk 12.
- CPU2 excludes genes that are not expressed and genes with low expression levels (signal less than 44).
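A minimal sketch of this filtering step, assuming an expression matrix with one row per individual and one column per gene. The source gives the cutoff (signal below 44) but not the statistic it is applied to; here, as an assumption, a gene is dropped when its highest signal across all individuals is below the cutoff. `filter_low_expression` is a hypothetical name.

```python
import numpy as np

SIGNAL_THRESHOLD = 44.0  # cutoff stated in the embodiment

def filter_low_expression(expr: np.ndarray, gene_ids: list[str],
                          threshold: float = SIGNAL_THRESHOLD):
    """Drop genes whose signal stays below the threshold in every individual.

    expr: shape (n_individuals, n_genes). Returns the filtered matrix and IDs.
    Assumption: 'low expression' is judged on the per-gene maximum signal.
    """
    keep = expr.max(axis=0) >= threshold
    return expr[:, keep], [g for g, k in zip(gene_ids, keep) if k]
```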
- the first gene is set as the target gene (step S6), and the maximum and minimum expression levels of the target gene over all individuals are extracted (step S7). The intermediate value between the maximum $x_{\max}$ and the minimum $x_{\min}$ is calculated as $m = (x_{\max} + x_{\min})/2$.
- using this intermediate value as the boundary, the individuals are divided into a high-expression group and a low-expression group (step S8). They are further divided into two groups according to whether each individual has Alzheimer's disease (step S9).
- the expression level data of the target gene are thus divided into four groups, as shown in FIG. Region (1, 1) is “onset” and “high expression”, region (1, 2) is “onset” and “low expression”, region (2, 1) is “no onset” and “high expression”, and region (2, 2) is “no onset” and “low expression”.
- the CPU 2 then evaluates how well the data fit a model in which the expression level of this gene is unrelated to “onset” versus “no onset” (independent model) and a model in which it is related (dependent model).
- the maximum log likelihood Lde of the dependent model and the maximum log likelihood Lin of the independent model are calculated according to the following equation based on Allen's cross-validation (CV) criterion (step S10).
- statistical analysis software “Visual Mining Studio ver. 3.0” (Mathematical Systems Inc.) can be used.
- n is the number of samples (total number of individuals), and n(i, j) is the number of samples (individuals) that fall in region (i, j) in FIG.
- CPU2 records the calculated CV value in the hard disk 12 in association with the gene.
- CPU 2 determines whether or not CV values have been calculated for all genes constituting the probe of the DNA chip (step S12). If there is an uncalculated gene, the next gene is the target gene (step S14), and step S7 and subsequent steps are repeated.
- a predetermined number of genes with large CV values are selected as marker genes (step S13).
- the top 200 genes having a CV value of 3 or more were selected as marker genes.
- marker genes may be selected using both the CV value and the number of genes, as in this embodiment, or using only the CV value or only the number of genes.
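The per-gene loop of steps S6 to S13 can be sketched as follows. The midpoint split and the 2 × 2 table follow the text; the scoring function does not reproduce the Allen CV equation, which is not given here. Instead, as a swapped-in information criterion, it uses Akaike's information criterion (AIC), one of the alternatives named in this description: the score is AIC(independent) minus AIC(dependent), so larger values favour the dependent model. The defaults `min_score=3.0` and `top_k=200` mirror the embodiment's numbers for illustration only, since they were chosen for the CV value rather than for AIC. All function names are hypothetical.

```python
import math
import numpy as np

def contingency_2x2(levels: np.ndarray, onset: np.ndarray) -> np.ndarray:
    """Counts n(i, j): row 0 = onset, row 1 = no onset; col 0 = high, col 1 = low.

    High/low is split at the midpoint of the maximum and minimum level
    (steps S7 to S9); a value exactly at the midpoint is counted as low.
    """
    mid = (levels.max() + levels.min()) / 2.0
    table = np.zeros((2, 2))
    for is_onset, level in zip(onset, levels):
        table[0 if is_onset else 1, 0 if level > mid else 1] += 1
    return table

def _xlog(count: float, prob: float) -> float:
    # count * log(prob), using the convention 0 * log(0) = 0
    return count * math.log(prob) if count > 0 else 0.0

def aic_score(table: np.ndarray) -> float:
    """AIC(independent) - AIC(dependent) for a 2 x 2 table (larger = stronger link)."""
    n = table.sum()
    rows, cols = table.sum(axis=1), table.sum(axis=0)
    cells = [(i, j) for i in range(2) for j in range(2)]
    l_de = sum(_xlog(table[i, j], table[i, j] / n) for i, j in cells)
    l_in = sum(_xlog(table[i, j], rows[i] * cols[j] / n**2) for i, j in cells)
    # Free parameters: 3 cell probabilities (dependent) vs. 1 + 1 margins (independent)
    return (-2.0 * l_in + 2 * 2) - (-2.0 * l_de + 2 * 3)

def select_markers(expr, onset, gene_ids, min_score=3.0, top_k=200):
    """Score each gene (column of expr), keep scores >= min_score, return top_k IDs."""
    scores = [aic_score(contingency_2x2(expr[:, g], onset)) for g in range(expr.shape[1])]
    ranked = sorted(zip(scores, gene_ids), key=lambda pair: pair[0], reverse=True)
    return [gene for score, gene in ranked if score >= min_score][:top_k]
```

With one gene that separates onset from no onset perfectly and one that does not, only the first passes the cutoff.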
- the CPU 2 may vary the threshold CV value and, for each value, evaluate the resulting marker genes by leave-one-out cross-validation (LOOCV) with a support vector machine (SVM).
- the CV value that maximizes the LOOCV accuracy may then be selected. In LOOCV, one sample (one individual) is removed from all samples (all individuals), and discriminant analysis by SVM is performed on the remaining samples using the expression levels of the selected marker genes, giving a discriminant plane between the first group and the second group, which differ in the presence or absence of onset. The removed sample is then projected onto the discriminant space using its own expression levels, and it is checked whether it is classified correctly. This procedure is repeated for every sample, removing a different sample each time, and the accuracy (correct answer rate) is calculated.
- the LOOCV with SVM can be carried out using the statistical analysis software “R” and the “R” package “e1071” (http://www.cran.us.r-project.org/).
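The threshold search described above can be sketched as a loop over candidate cutoffs, each scored by leave-one-out accuracy. To stay dependency-free, this sketch swaps the SVM for a simple nearest-centroid classifier, so it is not the SVM procedure run in R/e1071; in practice an SVM such as e1071's `svm` or scikit-learn's `sklearn.svm.SVC` would take its place. Function names are hypothetical, `y` is assumed to be a boolean onset array, and each class is assumed to keep at least one member in every training fold.

```python
import numpy as np

def nearest_centroid_predict(train_x: np.ndarray, train_y: np.ndarray, x: np.ndarray) -> bool:
    """Stand-in classifier (not an SVM): predict onset if x is closer to the onset mean."""
    c_onset = train_x[train_y].mean(axis=0)
    c_none = train_x[~train_y].mean(axis=0)
    return bool(np.linalg.norm(x - c_onset) < np.linalg.norm(x - c_none))

def loocv_accuracy(x: np.ndarray, y: np.ndarray) -> float:
    """Leave-one-out: train on n - 1 individuals, classify the one held out, repeat."""
    n = len(y)
    correct = 0
    for k in range(n):
        keep = np.arange(n) != k
        correct += nearest_centroid_predict(x[keep], y[keep], x[k]) == bool(y[k])
    return correct / n

def best_threshold(expr, y, scores, thresholds):
    """Return (threshold, accuracy) for the cutoff whose marker set scores best."""
    best = None
    for t in thresholds:
        cols = np.flatnonzero(scores >= t)   # genes passing this cutoff
        if cols.size == 0:
            continue
        acc = loocv_accuracy(expr[:, cols], y)
        if best is None or acc > best[1]:
            best = (t, acc)
    return best
```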
- CPU 2 standardizes the expression levels $x_1, x_2, \ldots, x_n$ of the marker genes according to the following equation (step S15): $D_i = \dfrac{x_i - \mu_i}{\sigma_i} \quad (i = 1, \ldots, n)$ (Equation 3).
- $\mu_1, \mu_2, \ldots, \mu_n$ are the mean values, over all individuals, of the expression levels $x_1, x_2, \ldots, x_n$ of the marker genes.
- $\sigma_1, \sigma_2, \ldots, \sigma_n$ are the standard deviations, over all individuals, of the expression levels $x_1, x_2, \ldots, x_n$ of the marker genes. In this embodiment, $n$ is 200.
- CPU 2 performs principal component analysis on the standardized expression levels $D_1, D_2, \ldots, D_n$ of the marker genes calculated above and calculates the first principal component X, the second principal component Y, and the fourth principal component Z (step S16): $X = \sum_{i=1}^{n} P_{1i} D_i$, $Y = \sum_{i=1}^{n} P_{2i} D_i$, $Z = \sum_{i=1}^{n} P_{4i} D_i$ (Equation 4).
- $P_{1i}$ is the eigenvector component for the i-th marker gene of the first principal component.
- $P_{2i}$ is the eigenvector component for the i-th marker gene of the second principal component.
- $P_{4i}$ is the eigenvector component for the i-th marker gene of the fourth principal component.
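Steps S15 and S16 can be sketched with NumPy. The sketch assumes the sample standard deviation (ddof=1) for $\sigma_i$, which the source does not specify, and takes the principal axes as eigenvectors of the covariance matrix of the standardized data (equivalently, the correlation matrix of the raw levels). Eigenvector signs are arbitrary, so X, Y and Z may come out negated relative to another tool; this is harmless as long as the discriminant is fitted on the same scores. Function names are hypothetical.

```python
import numpy as np

def standardize(expr: np.ndarray):
    """Equation-3-style z-scores per marker gene (column): (x_i - mu_i) / sigma_i."""
    mu = expr.mean(axis=0)
    sigma = expr.std(axis=0, ddof=1)          # sample SD (assumption)
    return (expr - mu) / sigma, mu, sigma

def principal_axes(d: np.ndarray):
    """Eigenvalues and eigenvectors of cov(d), sorted by eigenvalue, largest first."""
    vals, vecs = np.linalg.eigh(np.cov(d, rowvar=False))
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]        # column k holds the loadings P_(k+1)i

def component_scores(d: np.ndarray, vecs: np.ndarray, which=(0, 1, 3)) -> np.ndarray:
    """Scores X, Y, Z = sum_i P_ki D_i for the 1st, 2nd and 4th components."""
    return d @ vecs[:, list(which)]
```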
- from the first, second, and fourth principal components of the first group, which develops Alzheimer's disease, and those of the second group, which does not, the CPU 2 calculates a discriminant for discriminating between the two groups by linear discriminant analysis. Specifically, the coefficients a, b, c, and d of the following formula are calculated: $A = aX + bY + cZ + d$.
- if the value A of this formula is greater than 0, the individual can be predicted to develop Alzheimer's disease, and if it is less than 0, the individual can be predicted not to develop it.
- although three principal components are used here, two or fewer, or four or more, principal components may be used.
- in this embodiment, the first, second, and fourth principal components are used because this gave higher prediction accuracy than using the first, second, and third principal components.
- in general, however, prediction accuracy is often higher when the first, second, and third principal components are used.
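One way to obtain a, b, c and d is Fisher's linear discriminant with a midpoint threshold, sketched below. The source only says "linear discriminant analysis", so the pooled within-class scatter and the equal-prior midpoint are assumptions. The returned vector holds (a, b, c) and the scalar is d; the name `fisher_lda` is hypothetical.

```python
import numpy as np

def fisher_lda(scores: np.ndarray, onset: np.ndarray):
    """Return (w, d) such that A = w @ s + d is positive on the onset side.

    scores: (n_individuals, 3) array of X, Y, Z; onset: boolean array.
    """
    s1, s0 = scores[onset], scores[~onset]
    m1, m0 = s1.mean(axis=0), s0.mean(axis=0)
    # Pooled within-class scatter matrix
    sw = (s1 - m1).T @ (s1 - m1) + (s0 - m0).T @ (s0 - m0)
    w = np.linalg.solve(sw, m1 - m0)          # Fisher direction: (a, b, c)
    d = -w @ (m1 + m0) / 2.0                  # boundary halfway between class means
    return w, d
```

Because the scatter matrix is positive definite, $w \cdot (m_1 - m_0) > 0$, so the onset-group mean always lands on the positive side of the boundary.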
- Figure 11 shows the functional block diagram of the prediction device.
- the above discriminant is recorded as a discrimination criterion.
- the expression level detection means 32 obtains the expression level data for each gene of the individual to be predicted from the DNA chip D that has been hybridized.
- in this embodiment, a DNA chip having only probes corresponding to the marker genes is used, but a DNA chip that also carries probes for other genes may be used.
- the prediction means 34 calculates the numerical value A from the recorded discriminant and predicts that the individual will develop Alzheimer's disease if A is greater than 0, and will not develop it if A is smaller than 0.
- the output means 36 outputs this prediction result to a display, a printer or the like.
- Figure 12 shows the hardware configuration when the prediction device is implemented by a computer.
- a display 4, a scanner 6, a memory 8, a CD-ROM drive 10, and a hard disk 12 are connected to the CPU 2.
- the scanner 6 reads the probe area of the DNA chip that has been hybridized as an image.
- the scanner 6 is connected to the CPU 2 and can directly capture data.
- the image data read by the scanner 6 may be recorded on a portable recording medium (CD-RW, etc.) and read from the CD-ROM drive 10.
- the memory 8 is used as a work area of the CPU2.
- the operating system 16, the prediction program 17, and the discriminant 19 are recorded on the hard disk 12.
- the discriminant 19 may be written as part of the prediction program 17.
- These programs are recorded on the CD-ROM 14 and installed on the hard disk 12 via the CD-ROM drive 10.
- the prediction program 17 performs its function in cooperation with the operating system 16.
- the prediction program 17 may instead be a program that functions on its own.
- FIG. 13 shows a flowchart of the prediction program.
- in the following, Alzheimer's disease is described as an example of the physiological change of the living body to be predicted.
- in step S51, the CPU 2 reads an image of the DNA chip that is set in the scanner 6 and has been hybridized with the biological sample of the individual to be predicted. This image has a fluorescence intensity corresponding to the expression level of each marker gene. Next, the CPU 2 generates expression level data for each marker gene based on the fluorescence intensity of each probe region of the image, thereby acquiring the expression level data (step S52).
- the CPU 2 reads the discriminant recorded on the hard disk 12 (Equation 6 above) and calculates the numerical value A from the expression levels of the marker genes (step S53). If the calculated value A is smaller than 0, the CPU 2 predicts that the individual will not develop Alzheimer's disease and records this prediction result on the hard disk 12 (step S55). If A is greater than or equal to 0, it predicts that the individual will develop Alzheimer's disease and records this prediction result on the hard disk 12 (step S56).
- the CPU 2 outputs the numerical value A and the prediction result to the display 4 or a printer (not shown) (step S57).
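Steps S53 to S56 reduce to a few array operations once the discriminant is stored. The `Discriminant` container below is a hypothetical way to hold the recorded constants (means, standard deviations, loadings for the first, second and fourth components, and a, b, c, d); the rule "A ≥ 0 means onset" follows step S56.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Discriminant:
    """Hypothetical container for the recorded discrimination criterion."""
    mu: np.ndarray        # mean of each marker gene (Equation 3)
    sigma: np.ndarray     # standard deviation of each marker gene (Equation 3)
    loadings: np.ndarray  # shape (n_markers, 3): columns P_1i, P_2i, P_4i
    coef: np.ndarray      # a, b, c
    intercept: float      # d

def predict(expr: np.ndarray, disc: Discriminant) -> tuple[float, bool]:
    """Compute A for one individual and predict onset when A >= 0."""
    d = (expr - disc.mu) / disc.sigma          # standardize (Equation 3)
    xyz = d @ disc.loadings                    # X, Y, Z (Equation 4)
    a_value = float(xyz @ disc.coef + disc.intercept)
    return a_value, a_value >= 0.0
```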
- Figure 15 shows the configuration of the prediction system performed via the network.
- terminal device 50: computer
- server device 54: computer
- the hardware configuration of the server device 54 is shown in FIG.
- it has almost the same configuration as the prediction device in FIG. 12, except that the scanner 6 is not provided.
- a communication circuit 7 for communicating with the terminal device 50 via the Internet 52 is provided.
- FIG. shows the hardware configuration of the terminal device 50, which has almost the same configuration as the prediction device of FIG., except that a communication circuit 7 is provided for communicating with the server device 54 via the Internet 52.
- a data acquisition program 15 is recorded on the hard disk 12.
- the DNA chip D hybridized with respect to the biological sample of the individual to be predicted is read by the scanner of the terminal device 50.
- the CPU 2 executes step S51 in FIG.
- the CPU 2 transmits this image data to the server device 54 through the communication circuit 7.
- the CPU 2 of the server device 54 executes steps S52 to S57 of FIG. 13 according to the prediction program 17. That is, it acquires expression level data from this image data and predicts the presence or absence of onset.
- in step S57, the CPU 2 of the server device 54 transmits the numerical value A and the prediction result to the terminal device 50.
- the terminal device 50 receives this and displays it on the display 4.
- thus, the presence or absence of onset can be predicted without a prediction program on the terminal device 50.
- although the image data is transmitted to the server device 54 here, the terminal device 50 may instead obtain the expression level data and transmit it to the server device 54.
- a prediction device 40 in which a processing circuit 42 for prediction and a display device 44 for displaying a determination result are incorporated in a DNA chip can be constructed.
- in a probe region 46, probes corresponding to the marker genes are provided.
- Each probe emits an electrical signal when bound to a biological sample.
- This electrical signal is amplified by a transistor or the like and output as an expression level signal.
- this expression level signal is supplied to the processing circuit 42. For details of such electronic DNA chips, see Analytical and Bioanalytical Chemistry, 377(3), 521-527, 2003, and The Analyst, 130(5), 687-693, 2005.
- the processing circuit 42 includes a CPU and a memory, and holds a program for executing steps S52 to S57 in FIG. 13.
- a display 44 such as an LCD is connected to the CPU of the processing circuit 42, and in step S57 the CPU displays the numerical value A on the display 44. The determination result may also be displayed. With this DNA chip type prediction device, prediction can be performed easily.
- although a CPU is used for the processing circuit 42 here, the processing circuit 42 may instead be a hardware circuit that executes the discriminant operation, as shown in FIG. 14B.
- the expression level signal from each probe is converted into digital data by an A/D converter (not shown), and the resulting expression level data $x_1, x_2, \ldots, x_n$ are supplied to the subtractor 62 via the multiplexer 60.
- the constant data $\mu_1, \mu_2, \ldots, \mu_n$ (the mean values in Equation 3 above) are also supplied to the subtractor 62, via the multiplexer 64.
- the multiplexers 60 and 64 switch the expression level data $x_1, x_2, \ldots, x_n$ and the constant data $\mu_1, \mu_2, \ldots, \mu_n$ in step with the timing pulses TP1, TP2, ..., TPn and supply them to the subtractor 62. The subtractor 62 therefore receives, in sequence, the pair $x_1$ and $\mu_1$, the pair $x_2$ and $\mu_2$, ..., and the pair $x_n$ and $\mu_n$, and sequentially computes $x_1 - \mu_1$, $x_2 - \mu_2$, ..., $x_n - \mu_n$. Each subtraction result is supplied to the multipliers 66, 68, and 70 in step with the timing pulses.
- the multiplier 66 sequentially multiplies these results by $P_{11}/\sigma_1$, $P_{12}/\sigma_2$, ..., $P_{1n}/\sigma_n$ (see Equations 3 and 4).
- the output is supplied to the adder 72 in step with the timing pulses.
- the adder 72 cumulatively adds the results it receives. Accordingly, when the timing pulses have advanced from TP1 to TPn, the adder 72 outputs the first principal component data X. Similarly, the adders 74 and 76 output the second principal component data Y and the fourth principal component data Z.
- the first principal component data X is multiplied by the coefficient a (see Equation 5) in the multiplier 78, the second principal component data Y by the coefficient b in the multiplier 80, and the fourth principal component data Z by the coefficient c in the multiplier 82, and the three products are added by the adder 84. The numerical value A is therefore obtained from the adder 84.
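The circuit's data flow can be modelled in software to check that it computes the same value as the equations. The model assumes that the multipliers 68 and 70 apply $P_{2i}/\sigma_i$ and $P_{4i}/\sigma_i$ (the text gives the coefficients only for the multiplier 66) and, following the text, adds no constant term d at the adder 84. `pipeline` is a hypothetical name.

```python
def pipeline(x, mu, sigma, p1, p2, p4, a, b, c):
    """Software model of the FIG. 14B circuit: one gene per timing pulse TP1..TPn."""
    acc_x = acc_y = acc_z = 0.0                 # adders 72, 74, 76
    for k in range(len(x)):                     # timing pulse TP(k + 1)
        diff = x[k] - mu[k]                     # subtractor 62 (fed by multiplexers 60, 64)
        acc_x += diff * (p1[k] / sigma[k])      # multiplier 66
        acc_y += diff * (p2[k] / sigma[k])      # multiplier 68 (assumed coefficients)
        acc_z += diff * (p4[k] / sigma[k])      # multiplier 70 (assumed coefficients)
    return a * acc_x + b * acc_y + c * acc_z    # multipliers 78, 80, 82 and adder 84
```

Processing one gene per pulse lets a single subtractor and three multiply-accumulate chains serve any number of marker genes.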
- with this configuration, prediction can be performed even where no computer or expensive scanner is available, and manufacturing costs can be kept relatively low.
- a DNA chip may also be provided for each required principal component, for example a DNA chip for the first principal component, a DNA chip for the second principal component, and a DNA chip for the fourth principal component.
- standardized data $D'_1, D'_2, \ldots, D'_n$ are obtained by the conversion shown in Equation 8 below.
- FIG. 18 shows the probe regions of the DNA chip for the first principal component in this embodiment. As shown in the figure, probe regions 1 to n are provided.
- the probes in the different probe regions do not all have the same sensitivity.
- the sensitivity of the probes in each region is adjusted according to the coefficients $\sigma_i$ and $P_{1i}$ of the gene corresponding to that region.
- specifically, the sensitivity is adjusted so that a fluorescence density corresponding to the expression level multiplied by $P_{1i}/\sigma_i$ is detected.
- this adjustment can be made by changing the number of probe RNA or probe DNA molecules provided in each probe region. The number of probes is preferably determined by measuring the relationship between fluorescence density and number of probes in advance.
- the DNA chip for the second principal component and the DNA chip for the fourth principal component are formed in the same manner.
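Assuming the standardization $D_i = (x_i - \mu_i)/\sigma_i$ and the score $X = \sum_i P_{1i} D_i$ used for the first principal component, the weighted chip can be understood through the following rearrangement. It is a sketch of the idea, not a reconstruction of Equations 8 to 10:

$$
X = \sum_{i=1}^{n} P_{1i}\,\frac{x_i - \mu_i}{\sigma_i}
  = \sum_{i=1}^{n} \frac{P_{1i}}{\sigma_i}\,x_i \;-\; \sum_{i=1}^{n} \frac{P_{1i}\,\mu_i}{\sigma_i}
$$

If the sensitivity of probe region i is scaled by $P_{1i}/\sigma_i$, the total signal of the chip corresponds to the first sum, while the second sum is a constant that can be computed once from the training data. The chips for Y and Z work the same way with $P_{2i}$ and $P_{4i}$.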
- the summation (Σ) in Equation 9 below is thereby performed automatically.
- the readings of the DNA chip for the first principal component, the DNA chip for the second principal component, and the DNA chip for the fourth principal component are each obtained and recorded, and a prediction can be made by applying Equation 10 below to them manually (using a calculator or the like).
- prediction can be easily performed without a computer or without an expensive scanner.
- the invention can be applied not only to discriminating Alzheimer's disease but also to other central nervous system diseases such as frontotemporal dementia, dementia, Parkinson's disease, amyotrophic lateral sclerosis, and prion disease. It can also be used to predict diseases that develop at sites other than the brain.
- although skin tissue is collected in the above embodiment, another tissue such as mucosal tissue or blood may be used, as long as it is taken from a site other than the one where the disease occurs.
- in the above embodiment, Allen's cross-validation criterion is used as the “comparative analysis”.
- “comparative analysis” refers to an analysis method that compares the gene expression data of two groups and evaluates the difference in gene expression between them.
- besides methods based on an information criterion, any analysis method that can evaluate the difference in gene expression between two groups when applied to the expression data of a single gene may be used, such as the t-test, F-test, χ² test, or rank-sum test; the method is not limited to these.
- the analysis method may be constituted by a plurality of analysis methods rather than only one analysis method.
- the combination of multiple analysis methods may be, for example, a parallel configuration in which the results obtained by the multiple analysis methods are combined into a final analysis result.
- a serial configuration may be used in which an analysis result obtained by one analysis method is used as a variable, and an analysis result obtained by applying another analysis method is used as a final analysis result.
- the strength of the association between gene expression and the factor that induces the physiological change, as obtained by comparative analysis, may be expressed, for example, as a p-value or test statistic, or as a ratio or difference of the mean, median, or variance of expression signals; it is not limited to these, as long as the difference in gene expression between the groups can be evaluated as a continuous quantity, a discrete quantity, a ranking, or the like.
- the organisms to which the method is applied are usually a population that can be classified into two groups, such as a group having an element that induces a physiological change and a group not having such an element.
- by evaluating, through comparative analysis, the difference in gene expression between the groups and thereby the strength of the association between gene expression and the physiological change, a marker gene for predicting the physiological change that distinguishes the groups can be selected.
- by setting a criterion for the strength of the association between gene expression and the element that induces the physiological change, genes that meet the criterion can be selected.
- the criterion for the strength of this association and the number of genes to be selected as markers for predicting the physiological change are not limited, and can be chosen and adjusted as appropriate.
- the “information criterion” is a criterion for evaluating the strength of the association between a variable and the element that divides organisms into two groups; here it is used to evaluate the strength of the association between the expression of an individual gene and the element that induces the physiological change.
- the organisms are classified in two ways: into a group with a high expression level and a group with a low expression level, and into a group with and a group without the element that induces the physiological change.
- a 2-row by 2-column contingency table containing the number of organisms that meet each combination of criteria is created.
- methods for classifying organisms into a high-expression group and a low-expression group include, but are not limited to, splitting at the mean value, splitting at the midpoint between the maximum and minimum values, and a method using the χ² test.
- to judge whether the expression level differs between the group with and the group without the element, the pattern in which the organisms fall into the four cells is fitted both with a statistical model that assumes some relationship between expression and the element (dependent model) and with a statistical model that assumes no relationship (independent model), and the two fits are compared.
- the better a gene fits the dependent model relative to the independent model, the more closely its expression is associated with the element that induces the physiological change.
- the comparison of the information criterion that represents the fitness to the dependent model and the information criterion that represents the fitness to the independent model can be made, for example, by taking a ratio or difference.
- the comparison is not limited to these, as long as the result can be evaluated as a continuous quantity, a discrete quantity, a ranking, or the like.
- examples of the information criterion include Akaike's information criterion (AIC), the Bayesian information criterion (BIC), the Minimum Description Length (MDL) criterion, and Allen's cross-validation criterion.
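For the 2 × 2 table used in marker selection, the maximum log-likelihoods of the two models have closed forms, and with AIC as the criterion the comparison becomes the expression below. Here n(i, j) are the cell counts, n(i, ·) and n(·, j) the row and column totals, and 0 log 0 is taken as 0. These are plain maximum log-likelihoods, not the cross-validated values Lde and Lin of the embodiment.

$$
\ell_{de} = \sum_{i=1}^{2}\sum_{j=1}^{2} n(i,j)\,\log\frac{n(i,j)}{n},
\qquad
\ell_{in} = \sum_{i=1}^{2}\sum_{j=1}^{2} n(i,j)\,\log\frac{n(i,\cdot)\,n(\cdot,j)}{n^{2}}
$$

$$
\mathrm{AIC}_{in} - \mathrm{AIC}_{de}
= \bigl(-2\ell_{in} + 2\cdot 2\bigr) - \bigl(-2\ell_{de} + 2\cdot 3\bigr)
= 2\,(\ell_{de} - \ell_{in}) - 2
$$

A positive value favours the dependent model. For a perfectly separated table with n(1,1) = n(2,2) = 3 and n(1,2) = n(2,1) = 0, this gives $2 \cdot 6\log 2 - 2 \approx 6.32$.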
- Multivariate analysis in the above embodiment is a general term for statistical analysis methods that simultaneously analyze a plurality of variables, and refers to an analysis method that simultaneously analyzes expression data of a plurality of genes.
- Multivariate analysis includes analysis methods such as principal component analysis, factor analysis, self-organizing map, cluster analysis, discriminant analysis, multiple regression analysis, and canonical correlation analysis. Any analysis technique can be used as long as it can discriminate gene expression between the two groups by applying to the above expression data.
- the analysis method described above may be constituted by a plurality of analysis methods, not just those constituted by one analysis method.
- the configuration of multiple analysis methods may be, for example, a parallel configuration in which each analysis result obtained by multiple analysis methods is combined into a final analysis result. It may be a serial configuration in which an analysis result obtained by one analysis method is a variable, and an analysis result obtained by applying another analysis method is a final analysis result.
- the criterion for discriminating the two groups obtained by multivariate analysis may be a relational expression representing the characteristics of one group, a relational expression representing the characteristics of the other group, or a point, curve, straight line, curved surface, plane, hyperplane, or the like representing the boundary between the two groups; it is not limited to these, as long as individual organisms can be projected into the space using their gene expression data.
- a living body to which the prediction method of the present invention is applied is usually a living body that can be classified into two groups, a living body group having an element that expresses a certain physiological change and a living body group having no such element.
- the gene expression between the individual groups is determined separately by multivariate analysis, and each group It is possible to obtain discrimination criteria corresponding to physiological changes between groups by defining discrimination criteria between living body groups that have such elements and living body groups that do not have such elements. It is.
- Principal component analysis in the above embodiment is an analysis method for characterizing the relationship between individual samples using a principal component that is a new variable synthesized from a plurality of variable parameters. It is used to obtain a variable that can more clearly discriminate between a biological group having an element that induces a physical change and a biological group having no such element.
- the "linear discriminant method" in the above embodiment is an analysis method for obtaining a boundary between two groups of samples using a plurality of variables, and discriminates whether or not a biological change will occur in the future. It is used to define a boundary between a living body group having an element that induces the physiological change as a reference and a living body group having no such element.
- By applying the linear discriminant method to the principal components that the principal component analysis selected as variables for discriminating between the two groups, a point, straight line, plane, or hyperplane serving as the criterion for distinguishing the two groups can be obtained.
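The principal-component-then-linear-discriminant procedure described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each gene is standardized before PCA, Fisher's linear discriminant stands in for "the linear discriminant method", and the function names `pca_lda_criterion` and `discriminant_score` are hypothetical. The returned boundary is the hyperplane w·s + b = 0 in principal-component space (a point with one component, a straight line with two).

```python
import numpy as np

def pca_lda_criterion(X, y, n_components=2):
    """Hypothetical helper. X: (n_individuals, n_genes) expression matrix,
    y: 0/1 group labels. Returns standardization parameters, PC loadings P,
    and a Fisher linear boundary (w, b) in principal-component space."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                      # guard against constant genes
    Z = (X - mu) / sd                      # standardize each gene
    # Right singular vectors of the standardized data are the eigenvectors
    # of the correlation matrix, ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                # (n_genes, n_components)
    S = Z @ P                              # principal-component scores
    S0, S1 = S[y == 0], S[y == 1]
    m0, m1 = S0.mean(axis=0), S1.mean(axis=0)
    # Pooled within-group scatter matrix (atleast_2d handles one component).
    Sw = (np.atleast_2d(np.cov(S0, rowvar=False)) * (len(S0) - 1)
          + np.atleast_2d(np.cov(S1, rowvar=False)) * (len(S1) - 1))
    w = np.linalg.solve(Sw, m1 - m0)       # Fisher discriminant direction
    b = -w @ (m0 + m1) / 2                 # boundary midway between group means
    return mu, sd, P, w, b

def discriminant_score(X, mu, sd, P, w, b):
    """Positive score: group-1 side of the boundary; negative: group-0 side."""
    return ((np.asarray(X, dtype=float) - mu) / sd) @ P @ w + b
```

For a new individual, the same `mu`, `sd`, and `P` from the training groups are reused, so the individual is projected into the space in which the boundary was defined.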
- Besides Allen's cross-validation (CV) criterion, other information criteria may be used.
- the Akaike information criterion, Bayesian information criterion, and Minimum Description Length (MDL) criterion may be used.
- the information criterion is a criterion for evaluating the magnitude of the relationship between the variable and the elements that classify the two groups.
- A comparative analysis other than an information criterion may also be used.
- For example, an analysis method that evaluates the difference in gene expression between two groups when applied to the expression data of one gene at a time, such as the t test, F test, χ² test, or rank-sum test, can be used.
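As an example of such a single-gene comparison, the sketch below ranks genes by Welch's t statistic, a variant of the t test that does not assume equal group variances; choosing it over the other tests listed is an assumption. It uses only the Python standard library, returns test statistics rather than p-values, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic comparing one gene's expression in two groups."""
    va, vb = variance(a), variance(b)      # sample variances (n - 1 denominator)
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))

def rank_genes(expr, labels):
    """expr: {gene_id: [value per individual]}, labels: 0/1 per individual.
    Returns gene IDs sorted by |t|, largest group difference first."""
    scores = {}
    for gene, values in expr.items():
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        scores[gene] = welch_t(g1, g0)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
```

The top-ranked genes would be candidates for the marker gene set; how many to keep is a separate choice not fixed by this sketch.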
- the analysis method may be a combination of a plurality of analysis methods.
- Fibroblasts were obtained by the method described in Neuroscience Letters, 220, 9-12 (1996).
- An Affymetrix oligonucleotide DNA chip, GeneChip HG-U133A Array, was used to measure gene expression. Specifically, by a method similar to that described in Japanese Patent Application Laid-Open No. 2003-169687, the following steps were performed in order: preparation of cDNA from total RNA, preparation of labeled cRNA from the cDNA, fragmentation of the labeled cRNA, hybridization of the fragmented cRNA to the DNA chip, fluorescent staining of the hybridized cRNA, reading of the fluorescence on the DNA chip, and measurement of the gene expression level. Finally, the gene expression level was obtained by analyzing the fluorescence image of the HG-U133A Array with the analysis software Microarray Suite version 5.0.
- The expression level data thus obtained are shown in FIGS. 19a to 19f.
- The top row is the individual ID.
- The leftmost column is the gene ID.
- The gene ID is indicated by an Affymetrix probe set number.
- FIGS. 19a to 19c show the gene expression levels of individuals with Alzheimer's disease (carriers of the etiological gene).
- FIGS. 19d to 19f show the gene expression levels of individuals without Alzheimer's disease (non-carriers of the etiological gene).
- Of the 22,238 genes, only the marker gene data are shown in the figures; the other genes are omitted.
- The CV criterion of the dependent model and the CV criterion of the independent model were calculated from the number of samples in each cell of the contingency table based on the following formula.
- The CV criteria were calculated using the commercially available statistical analysis software "Visual Mining Studio ver. 3.0" (Mathematical Systems Inc.).
- n is the total number of samples.
- n(i, j) is the number of samples in the cell in the i-th row and j-th column.
- n(i) is the number of samples in the i-th row.
- n(j) is the number of samples in the j-th column.
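The CV formula referred to above is not reproduced in this text, so the sketch below substitutes the Akaike information criterion (AIC), which the description lists as an alternative criterion, to compare the dependent and independent models of a contingency table using the same counts n, n(i, j), n(i), and n(j). The high/low expression threshold and all function names are hypothetical; under this sketch a gene is a marker candidate when the dependent model has the lower (better) score.

```python
import math

def xlogx_ratio(count, total):
    # count * log(count / total), using the convention 0 * log 0 = 0
    return count * math.log(count / total) if count > 0 else 0.0

def aic_dependent(table):
    """AIC of the model in which row and column categories are dependent."""
    n = sum(sum(row) for row in table)
    n_params = len(table) * len(table[0]) - 1          # free cell probabilities
    loglik = sum(xlogx_ratio(c, n) for row in table for c in row)
    return -2 * loglik + 2 * n_params

def aic_independent(table):
    """AIC of the model in which row and column categories are independent."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]            # n(i)
    col_totals = [sum(col) for col in zip(*table)]      # n(j)
    loglik = (sum(xlogx_ratio(c, n) for c in row_totals)
              + sum(xlogx_ratio(c, n) for c in col_totals))
    return -2 * loglik + 2 * (len(row_totals) + len(col_totals) - 2)

def contingency_table(has_gene, expression, threshold):
    """2 x 2 table: rows = carrier / non-carrier, columns = high / low."""
    table = [[0, 0], [0, 0]]
    for carrier, value in zip(has_gene, expression):
        table[0 if carrier else 1][0 if value >= threshold else 1] += 1
    return table

def is_marker_candidate(table):
    return aic_dependent(table) < aic_independent(table)
```

A perfectly separating table such as [[15, 0], [0, 15]] favors the dependent model, while a balanced table such as [[8, 7], [7, 8]] favors the independent one.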
- Probe sets corresponding to the marker genes are shown in FIGS. 20a and 20b.
- Item A is the probe set identification number; information on the corresponding gene is available from Affymetrix (http://www.affymetrix.com/index.affx).
- Item B gives the values of the CV criterion for the dependent model and for the independent model.
- Leave-one-out cross-validation was performed on the 200 genes described above using a support vector machine (SVM). Specifically, one sample was removed from the 30 samples, and discriminant analysis by SVM was performed on the expression signal values of the 200 probe sets of the remaining 29 samples to obtain a discriminant surface between the group carrying and the group not carrying the etiological gene of familial Alzheimer's disease (see FIG. 21). The removed sample was then projected onto the discriminant space based on its expression signal values, and it was verified whether its carrier status for the familial Alzheimer's disease etiological gene was correctly discriminated.
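The leave-one-out procedure can be sketched with NumPy as below. It illustrates the validation loop under assumptions and is not the software used here: the SVM is a linear soft-margin SVM fitted with Pegasos-style stochastic subgradient descent (a swapped-in solver), features are standardized with training-fold statistics, and `train_linear_svm`, `loo_cv_accuracy`, and the regularization value `lam` are hypothetical. The text does not state the kernel or SVM parameters.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Linear soft-margin SVM via Pegasos-style stochastic subgradient descent.
    y must contain -1/+1; a bias is learned through a constant column in X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)            # Pegasos step size
            margin = y[i] * (X[i] @ w)       # margin under the current weights
            w *= 1.0 - eta * lam             # shrinkage from the L2 penalty
            if margin < 1.0:                 # hinge loss active: move toward x_i
                w += eta * y[i] * X[i]
    return w

def loo_cv_accuracy(X, labels, **svm_kwargs):
    """Hold out each sample, train on the other n - 1 samples, and check
    whether the held-out sample falls on the correct side of the surface."""
    X = np.asarray(X, dtype=float)
    y = np.where(np.asarray(labels) == 1, 1.0, -1.0)
    n = len(y)
    correct = 0
    for k in range(n):
        train = np.arange(n) != k
        mu = X[train].mean(axis=0)           # training-fold statistics only
        sd = X[train].std(axis=0, ddof=1)
        sd[sd == 0] = 1.0
        Xa = np.hstack([(X - mu) / sd, np.ones((n, 1))])  # append bias column
        w = train_linear_svm(Xa[train], y[train], **svm_kwargs)
        correct += int(np.sign(Xa[k] @ w) == y[k])
    return correct / n
```

With 30 samples this fits 30 separate models; the returned fraction is the proportion of held-out samples whose carrier status was discriminated correctly.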
- An onset prediction formula was set as the criterion for determining whether an individual will develop Alzheimer's disease in the future.
- In the onset prediction formula, X, Y, and Z are principal component scores computed over the m probe sets, where m is 200:
- First principal component: $X = \sum_{i=1}^{m} p_{1i}\,(x_i - \mu_i)/\sigma_i$
- Second principal component: $Y = \sum_{i=1}^{m} p_{2i}\,(x_i - \mu_i)/\sigma_i$
- Fourth principal component: $Z = \sum_{i=1}^{m} p_{4i}\,(x_i - \mu_i)/\sigma_i$
- $x_i$ is the expression signal value of probe set i.
- $p_{1i}$, $p_{2i}$, and $p_{4i}$ are the eigenvector elements for the individual probe sets that make up the marker gene set.
- $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the expression values of the 30 samples for each probe set. Specifically, the values were as shown in FIGS. 22a to 22d.
- If A > 0, Alzheimer's disease is predicted to develop; if A ≤ 0, it is predicted not to develop.
- fibroblasts are obtained from the skin tissue provided by the subject, RNA is further extracted, and the expression level is measured by GeneChip HG-U133A Array.
- Diagnosis is based on the values of X, Y, and Z, from which A is computed. If A ≤ 0, the subject is predicted not to develop Alzheimer's disease in the near future; if A > 0, the subject is predicted to develop it.
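The diagnostic computation can be sketched in plain Python. Each principal-component variable is a weighted sum of standardized expression signals, using the per-probe-set means, standard deviations, and eigenvector elements tabulated in the figures. The expression for A in terms of these scores is not recoverable from this text, so the sketch assumes A is a linear combination with hypothetical coefficients `coef` and `intercept` (for example, as produced by a linear discriminant). Function names and the toy numbers in the usage note are illustrative only.

```python
def pc_scores(x, mu, sd, eigvecs):
    """For each eigenvector p_k, return sum_i p_ki * (x_i - mu_i) / sd_i,
    where x holds one subject's expression signals per probe set."""
    z = [(xi - m) / s for xi, m, s in zip(x, mu, sd)]
    return [sum(p * zi for p, zi in zip(vec, z)) for vec in eigvecs]

def predict_onset(x, mu, sd, eigvecs, coef, intercept):
    """Combine the scores linearly into A (hypothetical coefficients) and apply
    the decision rule: A > 0 -> onset predicted, A <= 0 -> not predicted."""
    scores = pc_scores(x, mu, sd, eigvecs)
    a = intercept + sum(c * s for c, s in zip(coef, scores))
    return a, a > 0
```

For example, with two probe sets, `predict_onset([2, 4], [1, 2], [1, 2], [[0.6, 0.8], [0.8, -0.6]], [1, -1], -1)` standardizes both signals to 1, gives scores of about 1.4 and 0.2, and returns A of about 0.2, so onset is predicted.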
- Fibroblasts were isolated and cultured by the method described in Neuroscience Letters, 220, 9-12 (1996), and 3 to 10 million fibroblasts were obtained per sample.
- the expression level of each gene was measured using total RNA extracted from fibroblasts.
- An oligonucleotide DNA chip, GeneChip HG-U133A Array, manufactured by Affymetrix was used to measure gene expression. Specifically, by a method similar to that described in Japanese Patent Application Laid-Open No. 2003-169687, the following steps were performed in order: preparation of cDNA from total RNA, preparation of labeled cRNA from the cDNA, fragmentation of the labeled cRNA, hybridization of the fragmented cRNA to the DNA chip, fluorescent staining of the hybridized cRNA, reading of the fluorescence on the DNA chip, and measurement of the gene expression level. Finally, the gene expression level was obtained by analyzing the fluorescence image of the HG-U133A Array with the analysis software Microarray Suite version 5.0.
- A 2 × 2 contingency table, shown in FIG. 8, was created based on the presence or absence of the familial Alzheimer's disease etiological gene and the expression levels.
- The CV criterion of the dependent model and the CV criterion of the independent model were calculated from the number of samples in each cell of the contingency table based on the following formula.
- The CV criteria were calculated using the commercially available statistical analysis software "Visual Mining Studio ver. 3.0" (Mathematical Systems Inc.).
- FIG. 23 shows the probe sets corresponding to the marker genes selected in this way.
- Item A is the probe set identification number; the corresponding gene information is available from Affymetrix (http://www.affymetrix.com/index.affx).
- Item B is the value of the CV criterion for the dependent model.
- The Affymetrix probe set identification numbers correspond to the accession numbers of GenBank, the gene information database of the National Center for Biotechnology Information (NCBI), as shown in FIG. 24.
- An onset prediction formula was set as the criterion for determining whether an individual will develop Alzheimer's disease in the future.
- The onset prediction formula was obtained by the following formula.
- $x_i$ is the expression signal value of each probe set.
- $p$ denotes an eigenvector element for each probe set constituting the marker gene set.
- $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the expression values of the 30 samples for each probe set. Specifically, the values shown in FIG. 26 were obtained.
- $x_i$ is obtained from skin-tissue fibroblasts of the person whose onset of Alzheimer's disease is to be predicted. Expression signal values are obtained by hybridization with the DNA chip GeneChip HG-U133A Array, using the same method as for the 30 samples used to extract the marker gene set and set the onset prediction formula. The expression signal values of the 51 probe sets included in the marker gene set are input to obtain the values of X, Y, and Z, and then the value of A. If A > 0, Alzheimer's disease is predicted to develop; if A ≤ 0, it is predicted not to develop.
- Of the 30 tissue sample donors in Example 4, 18 provided skin tissue, and fibroblasts were isolated from the dermis by the method described in Example 4.
- RNA was extracted, and the expression level was measured with the GeneChip HG-U133A Array.
- Using the expression signal values of the 51 probe sets of the marker gene set, together with the eigenvector elements, mean expression values, and standard deviations of the individual probe sets constituting the marker gene set shown in FIG. 26 of Example 5, the onset prediction formula (Formulas 11 and 12) was applied to the tissue sample donors of Example 4. A donor with A ≤ 0 was predicted not to carry the etiological gene of familial Alzheimer's disease, and a donor with A > 0 was predicted to carry it.
- Figure 27 shows the prediction results.
- Provided are a method for predicting physiological changes in a living body based on gene expression data obtained from a sample collected from a site different from the site where the change is expressed, and a method for selecting marker genes that can be used effectively for predicting such physiological changes.
- FIG. 1 is a diagram showing a flow of discrimination criterion creation processing according to an embodiment of the present invention.
- FIG. 2 is a diagram showing a flow of a physiological change prediction process according to an embodiment of the present invention.
- FIG. 3 is a functional block diagram of a discrimination criterion creating apparatus according to an embodiment of the present invention.
- FIG. 4 shows the hardware configuration when the device of FIG. 3 is implemented using a CPU.
- FIG. 5 is a flowchart of a discrimination criterion generation program.
- FIG. 6 is a flowchart of a discrimination criterion generation program.
- FIG. 7 is a diagram for showing the data structure of recorded expression data.
- FIG. 9 is a diagram showing a data structure of CV values recorded for each gene.
- FIG. 11 is a functional block diagram of a prediction device.
- FIG. 12 shows the hardware configuration when the device of FIG. 11 is implemented using a CPU.
- FIG. 13 is a flowchart of a prediction program.
- FIG. 14A is a cross-sectional view of a DNA chip according to another embodiment.
- FIG. 14B is a diagram showing details of the processing circuit.
- FIG. 15 is a configuration diagram of a prediction system.
- FIG. 16 shows the hardware configuration of the server device.
- FIG. 18 shows a probe
- FIG. 19a is a diagram showing expression data.
- FIG. 19b is a diagram showing expression data.
- FIG. 19c is a diagram showing expression data.
- FIG. 19d is a diagram showing expression data.
- FIG. 19e is a diagram showing expression data.
- FIG. 19f is a diagram showing expression data.
- FIG. 20a shows data on the relationship between the probe sets and the CV values.
- FIG. 20b shows data on the relationship between the probe sets and the CV values.
- FIG. 21 is a diagram showing a boundary surface based on a discriminant equation.
- FIG. 22a is a diagram showing the mean expression values, standard deviations σ, eigenvectors P1, P2, P4, etc. of the marker gene group.
- FIG. 22b is a diagram showing the mean expression values, standard deviations σ, eigenvectors P1, P2, P4, etc. of the marker gene group.
- FIG. 22c is a diagram showing the mean expression values, standard deviations σ, eigenvectors P1, P2, P4, etc. of the marker gene group.
- FIG. 22d is a diagram showing the mean expression values, standard deviations σ, eigenvectors P1, P2, P4, etc. of the marker gene group.
- FIG. 23 shows data on the relationship between the probe sets and the CV values.
- FIG. 24 is a diagram showing the correspondence between NCBI GenBank accession numbers and Affymetrix probe set numbers.
- FIG. 25 shows the results of principal component analysis.
- FIG. 26 is a diagram showing the mean expression values, standard deviations σ, eigenvectors P1, P2, P3, etc. of the marker gene group.
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Genetics & Genomics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Chemical & Material Sciences (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Biology (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Artificial Intelligence (AREA)
- Analytical Chemistry (AREA)
- Software Systems (AREA)
- Public Health (AREA)
- Bioethics (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Zoology (AREA)
- Databases & Information Systems (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Wood Science & Technology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Microbiology (AREA)
- Immunology (AREA)
- Pathology (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- Apparatus Associated With Microorganisms And Enzymes (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The problem addressed by this invention is to evaluate a physiological change in a living body with high accuracy using, for example, a sample collected from a site different from the site showing the physiological change. In the proposed solution, living tissue is collected from each individual belonging to a first living-body group (a group of individuals showing the physiological change) and a second living-body group (a group of individuals not showing the change) (Step P2). In this step, the tissue is collected from a site different from the site evaluated as showing the physiological change. Next, hybridization is performed on DNA chips using the samples collected from the individuals (Step P4). The hybridized DNA chips are scanned, and the density data in each probe region are obtained as gene expression data (Step P5). On the basis of the gene expression data obtained for each individual, genes showing a marked difference in expression data between the first and second living-body groups are selected (Step P6). Multivariate analysis is performed on the gene expression data of the marker genes of each individual to establish criteria for distinguishing the first living-body group from the second.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2007503790A JPWO2006088208A1 (ja) | 2005-02-21 | 2006-02-21 | Method and apparatus for predicting physiological changes in a living body |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2005-044776 | 2005-02-21 | ||
JP2005044776 | 2005-02-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006088208A1 (fr) | 2006-08-24 |
Family
ID=36916605
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2006/303083 | WO2006088208A1 (fr): Method for evaluating a physiological change in a living body, and apparatus | 2005-02-21 | 2006-02-21 |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPWO2006088208A1 (fr) |
WO (1) | WO2006088208A1 (fr) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1994010205A1 (fr) * | 1992-10-23 | 1994-05-11 | Fujisawa Pharmaceutical Co., Ltd. | Alzheimer's disease-specific protein and method for diagnosing the disease by detecting said protein |
JPH11342000A (ja) * | 1998-02-09 | 1999-12-14 | Affymetrix Inc | Computer-aided visualization of expression comparisons |
WO2002072828A1 (fr) * | 2001-03-14 | 2002-09-19 | Dna Chip Research Inc. | Method for predicting cancer |
WO2003072065A2 (fr) * | 2002-02-28 | 2003-09-04 | Iconix Pharmaceuticals, Inc. | Drug signatures |
WO2003085548A1 (fr) * | 2002-04-04 | 2003-10-16 | Ishihara Sangyo Kaisha, Ltd. | Data analysis device and method |
JP2004208547A (ja) * | 2002-12-27 | 2004-07-29 | Hitachi Ltd | Method for evaluating depression |
JP2004355174A (ja) * | 2003-05-28 | 2004-12-16 | Ishihara Sangyo Kaisha Ltd | Data analysis method and system therefor |
-
2006
- 2006-02-21 JP JP2007503790A patent/JPWO2006088208A1/ja active Pending
- 2006-02-21 WO PCT/JP2006/303083 patent/WO2006088208A1/fr active Application Filing
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010064413A1 (fr) * | 2008-12-01 | 2010-06-10 | 国立大学法人山口大学 | System for predicting the effects and adverse effects of a drug, and program therefor |
WO2014007363A1 (fr) * | 2012-07-05 | 2014-01-09 | 独立行政法人科学技術振興機構 | Cell typing program and method, and cell typing device |
JP2014139787A (ja) * | 2013-01-21 | 2014-07-31 | International Business Machines Corporation | Feature selection method, information processing system, and computer program for efficient modeling of epistasis for phenotype prediction |
JP2022547771A (ja) * | 2019-07-30 | 2022-11-16 | アリファックス ソチエタ レスポンサビリタ リミタータ | Method and system for identifying microorganisms |
JP7499795B2 (ja) | 2019-07-30 | 2024-06-14 | アリファックス ソチエタ レスポンサビリタ リミタータ | Method and system for identifying microorganisms |
CN115828093A (zh) * | 2022-11-02 | 2023-03-21 | 四川帕诺米克生物科技有限公司 | Omics sample analysis method and apparatus, electronic device, and storage medium |
CN115828093B (zh) * | 2022-11-02 | 2024-04-05 | 四川帕诺米克生物科技有限公司 | Omics sample analysis method and apparatus, electronic device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
JPWO2006088208A1 (ja) | 2008-07-10 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | |
WWE | WIPO information: entry into national phase | Ref document number: 2007503790; Country of ref document: JP |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: PCT application non-entry in European phase | Ref document number: 06714223; Country of ref document: EP; Kind code of ref document: A1 |