WO2015042980A1 - Method, system, and computer-readable medium for determining snp information in a predetermined chromosomal region - Google Patents


Info

Publication number
WO2015042980A1
Authority
WO
WIPO (PCT)
Prior art keywords
snp
sequencing
haplotype
embryo
snp information
Prior art date
Application number
PCT/CN2013/084783
Other languages
French (fr)
Chinese (zh)
Inventor
李剑
张现东
李金良
刘赛军
叶敏兰
Original Assignee
深圳华大基因科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大基因科技有限公司 filed Critical 深圳华大基因科技有限公司
Priority to CN201380079613.2A priority Critical patent/CN106029899B/en
Priority to PCT/CN2013/084783 priority patent/WO2015042980A1/en
Priority to CN201480050879.9A priority patent/CN105555970B/en
Priority to PCT/CN2014/081672 priority patent/WO2015043278A1/en
Publication of WO2015042980A1 publication Critical patent/WO2015042980A1/en
Priority to HK16109816.5A priority patent/HK1221745A1/en

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6827: Hybridisation assays for detection of mutation or polymorphism

Definitions

  • The present invention relates to the field of biomedicine and, in particular, to a method, system and computer-readable medium for determining SNP information in a predetermined region of a chromosome.
  • Background art
  • the World Health Organization's 2012 Global Birth Defect Prevention Report shows that the global incidence of birth defects is 3%, with 3.2 million birth defects per year, of which 270,000 newborns die from birth defects. Studies have shown that most of the birth defects are related to genetic factors, and chromosomal abnormalities and monogenic genetic diseases are two important reasons. Among them, there are many types of monogenic genetic diseases, and the incidence rates are different, and most of these diseases cannot be cured, which brings a heavy economic and psychological burden to the whole society and families. Therefore, prevention of the occurrence of children with monogenic genetic diseases and reduction of the birth of children with genetic diseases are the focus of prevention and control of hereditary birth defects.
  • Preimplantation Genetic Diagnosis (PGD) technology can block the occurrence and transmission of genetic diseases from the roots, and advance the prevention of birth defects to the embryonic stage.
  • At present, preimplantation genetic diagnosis (PGD) of monogenic diseases has not been widely applied; only a few thousand cases have been reported worldwide so far. The main reason is the small amount of specimen (only 1 to 2 cells), which easily leads to allele drop-out (ADO) and contamination and makes detection difficult. Existing detection techniques cannot fully meet the clinical requirements of preimplantation diagnosis of monogenic diseases.
  • the haplotype analysis before embryo implantation is the main method for the detection of monogenic diseases before implantation.
  • This method determines the mutant haplotype by detecting the mutation site together with multiple STRs (or SNPs) linked to it, reducing the effects of preferential allele amplification, ADO, and contamination.
  • Multiplex PCR (MF-PCR)
  • the linkage markers used in MF-PCR are often far from the pathogenic site and may have a risk of misdiagnosis due to chromosomal recombination events.
  • SNP-array analysis examines SNP loci across the whole genome; the SNP density is high and the number of SNPs is large.
  • the advantage of this method is that it is suitable for haplotype analysis of all samples, and no pre-test is required to select molecular markers for individual samples.
  • The chip can detect multiple diseases at the same time. However, the chip can only detect a disease indirectly, through haplotype analysis; it cannot directly detect the pathogenic site.
  • the present invention aims to solve at least one of the technical problems existing in the prior art.
  • the present invention aims to propose a method for efficiently determining SNP information in a chromosome, particularly a predetermined region of an embryonic chromosome.
  • the invention proposes a method of determining SNP information in a predetermined region of a chromosome.
  • The method comprises: constructing a sequencing library for at least a portion of a chromosome; screening the sequencing library with a probe, wherein the probe specifically recognizes at least one of the known SNP sites in the predetermined region, so as to obtain a target capture fragment comprising a SNP site; sequencing the screened sequencing library to obtain a sequencing result; and determining SNP information in the predetermined region based on the sequencing result.
  • With the method for determining SNP information in a predetermined region of a chromosome of the present invention, it is possible to efficiently and accurately determine SNP information in a predetermined region of a chromosome, for example, information on mutation sites associated with a pathogenic gene of a sample. The information can in turn be used to determine whether the genetic state of a subject is normal, carrier, or affected, thereby providing a basis for clinical disease detection or treatment.
  • the invention also provides a method of determining SNP information in a predetermined region of an embryonic chromosome.
  • The method comprises: acquiring a whole genome of the embryo; and determining, for the whole genome of the embryo, SNP information in a predetermined region of the embryo chromosome according to the method for determining SNP information in a predetermined region of a chromosome as described above.
  • The method for determining SNP information in a predetermined region of an embryo chromosome can effectively and accurately determine SNP information in a predetermined region of an embryo chromosome. The information can in turn be used to determine whether the embryo's genetic state is normal, carrier, or affected, and can therefore provide a basis for pre-implantation monogenic disease detection, prenatal diagnosis of pregnant women, or clinical disease treatment.
  • the present invention also provides an apparatus for determining SNP information in a predetermined region of a chromosome.
  • The apparatus comprises: a library construction device adapted to construct a sequencing library for at least a portion of a chromosome; a library screening device coupled to the library construction device and adapted to screen the sequencing library with a probe, wherein the probe specifically recognizes at least one of the known SNP sites in the predetermined region, so as to obtain a target capture fragment comprising a SNP site; a sequencing device coupled to the library screening device and adapted to sequence the screened sequencing library to obtain a sequencing result; and an analysis device coupled to the sequencing device and adapted to determine SNP information in the predetermined region based on the sequencing result.
  • With this apparatus, the above-described method for determining SNP information in a predetermined region of a chromosome can be effectively implemented, enabling efficient and accurate determination of SNP information in a predetermined region of a chromosome, for example, information on mutation sites associated with a pathogenic gene of a sample. The information can in turn be used to determine whether a subject's genetic state is normal, carrier, or affected, thereby providing a basis for clinical disease detection or treatment.
  • the invention also proposes a system for determining SNP information in a predetermined region of an embryonic chromosome.
  • The system comprises: a first whole genome acquisition device adapted to acquire a whole genome of the embryo; and a SNP information determining device connected to the first whole genome acquisition device so as to determine SNP information in a predetermined region of the embryo chromosome, wherein the SNP information determining device is the apparatus for determining SNP information in a predetermined region of a chromosome as described above.
  • With this system, the above-described method of determining SNP information in a predetermined region of a chromosome can be efficiently implemented, thereby effectively determining SNP information in a predetermined region of the chromosome. The information can in turn be used to determine whether the genetic state of the fetus is normal, carrier, or affected, which can provide a basis for pre-implantation monogenic disease detection, prenatal diagnosis of pregnant women, or clinical disease treatment.
  • the invention also provides a computer readable medium.
  • The computer-readable medium stores instructions adapted to be executed by a processor to determine SNP information in a predetermined region of a chromosome based on a sequencing result, wherein the sequencing result is obtained through the following steps: constructing a sequencing library for at least a part of a chromosome; screening the sequencing library with a probe, wherein the probe specifically recognizes at least one of the known SNP sites in the predetermined region, so as to obtain a target capture fragment comprising a SNP site; and sequencing the screened sequencing library to obtain the sequencing result.
  • Using this computer-readable medium, SNP information in a predetermined region of a chromosome can be efficiently determined, for example, information on mutation sites associated with a pathogenic gene of a sample. The information can in turn be used to determine whether a subject's genetic state is normal, carrier, or affected, thus providing a basis for clinical disease detection or treatment.
  • The computer-readable medium stores instructions adapted to be executed by a processor to determine, for a whole genome of the embryo, SNP information in a predetermined region of the embryo's chromosome.
  • the invention also proposes an apparatus for determining SNP information in a predetermined region of a chromosome.
  • the apparatus comprises: a sequencing device; and the aforementioned computer-readable medium storing instructions adapted to be executed by the processor to determine SNP information in a predetermined region of the chromosome based on the sequencing result.
  • The apparatus of the present invention can accurately and efficiently determine SNP information in a predetermined region of a chromosome, for example, information on mutation sites associated with a pathogenic gene of a sample. The information can in turn be used to determine whether the genetic state of a subject is normal, carrier, or affected, which can provide a basis for clinical disease detection or treatment.
  • the invention also provides a system for determining SNP information in a predetermined region of an embryonic chromosome.
  • the system comprises: a sequencing device; and the aforementioned computer readable medium storing instructions adapted to be executed by the processor to determine SNP information in a predetermined region of the fetal chromosome for the whole genome of the embryo.
  • The system of the present invention can accurately and efficiently determine SNP information in a predetermined region of an embryonic chromosome. The information can in turn be used to determine whether the genetic state of the embryo is normal, carrier, or affected, thereby providing a basis for pre-implantation monogenic disease detection, prenatal diagnosis of pregnant women, or clinical disease treatment.
  • the haplotype analysis method of the present invention can not only indirectly detect a target site, but also directly detect a target site.
  • The SNP loci selected in the present invention are concentrated within 1 Mb of the target gene; their density is high and their linkage is tight, which can greatly improve the sensitivity and accuracy of SNP detection in the target region and can reduce the detection cost.
  • The invention concentrates multiple target detection sites on one chip and, based on the obtained SNP information, can detect multiple mutations of various diseases simultaneously. There is no need to design a separate experimental scheme for each person, which shortens the detection cycle and reduces the cost of testing.
  • The invention uses a chip comprising a plurality of target detection sites to test a plurality of samples simultaneously, so the detection throughput is greatly improved. This provides strong technical support for the large-scale application of PGD in the future.
  • the method of the present invention in addition to being capable of being used for single-gene genetic disease detection, is capable of simultaneously performing HLA typing and aneuploidy detection, and realizing multiple tests of a single sample, and providing personalized services for related IVF patients.
  • Figure 1 shows a flow chart of an analysis of embryo haplotypes in accordance with one embodiment of the present invention
  • FIG. 2 is a schematic diagram showing a method of determining distinguishing SNPs according to an embodiment of the present invention
  • Figure 3 shows the results of 2100 detection of a constructed library in accordance with one embodiment of the present invention
  • Figure 4 shows a simulation of a haplotype construction in accordance with one embodiment of the present invention
  • Figure 5 is a schematic flow chart showing analysis of embryo haplotype and embryo genetic condition according to one embodiment of the present invention.
  • FIG. 6 is a flow chart showing a method of determining SNP information in a predetermined region of a chromosome according to an embodiment of the present invention
  • Figure 7 is a flow chart showing a method of determining SNP information in a predetermined region of an embryonic chromosome according to an embodiment of the present invention
  • FIG. 8 shows the structure of an apparatus for determining SNP information of a predetermined region of a chromosome according to an embodiment of the present invention.
  • Figure 9 is a diagram showing the structure of a system for determining SNP information in a predetermined region of an embryonic chromosome according to an embodiment of the present invention.
  • Detailed description of the invention
  • The terms “first” and “second” are used for descriptive purposes only, and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined by “first” or “second” may explicitly or implicitly include one or more of the features. Further, in the description of the present invention, “multiple” means two or more unless otherwise stated.
  • the invention proposes a method of determining SNP information in a predetermined region of a chromosome.
  • the method includes:
  • the chromosome is a whole genome of embryonic cells obtained by whole genome amplification.
  • The method of performing whole genome amplification is not particularly limited. According to some specific examples of the present invention, whole genome amplification is performed by at least one method selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA, and MDA. Thereby, a small amount of embryonic cells can be efficiently amplified, yielding more embryonic whole genome sample.
  • the probe specifically recognizes at least one of known SNP sites in the predetermined region to obtain a target capture segment, the target capture segment comprising a SNP site.
  • the predetermined region comprises a target gene region and a SNP-marker region.
  • the target gene region comprises at least a portion of an exon and an exon adjacent region of the gene associated with the target disease.
  • The exon adjacent region comprises a region of 50 bp upstream of the 5' end of the exon and a region of 50 bp downstream of the exon; the SNP-marker region comprises a range of 1 Mb upstream and downstream of the target gene.
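The region definitions above can be illustrated with a small sketch. It assumes 1-based inclusive coordinates on a single chromosome and reads "1 M" as 1,000,000 bp; the function names and the overlap-merging step are illustrative choices, not part of the described method.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive (assumption)


def exon_target_regions(exons: List[Interval], flank: int = 50) -> List[Interval]:
    """Pad each exon by `flank` bp on both sides and merge overlapping intervals."""
    padded = sorted((max(1, start - flank), end + flank) for start, end in exons)
    merged: List[Interval] = []
    for start, end in padded:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def snp_marker_region(gene_start: int, gene_end: int, flank: int = 1_000_000) -> Interval:
    """Return the gene span extended by `flank` bp upstream and downstream."""
    return (max(1, gene_start - flank), gene_end + flank)


def snps_in_region(snp_positions: List[int], region: Interval) -> List[int]:
    """Keep the known SNP positions that fall inside the region."""
    low, high = region
    return [pos for pos in snp_positions if low <= pos <= high]
```

Candidate probe targets would then be the known SNP positions returned by `snps_in_region` for the marker region, together with the padded exon intervals.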
  • the probe has a length of 20 to 200 nt.
  • the length of the probe is 60 to 80 nt.
  • the capture efficiency of the target SNP can be effectively improved.
  • The probe is provided in the form of a chip. By using a chip that can include a plurality of target detection sites, multiple mutations of various diseases can be detected simultaneously, and there is no need to design a separate experimental scheme for each person, which shortens the detection period and reduces the detection cost. The chip can also test multiple samples at the same time, so the detection throughput is greatly improved.
  • The sequencing is performed using at least one selected from the group consisting of the Illumina Hiseq 2000, Genome Analyzer and Miseq sequencing series, the Life Technologies SOLiD sequencing system, the Ion Torrent sequencing system, and the Roche 454 sequencing system.
  • Determining SNP information in the predetermined region based on the sequencing result further comprises: aligning the sequencing result with a reference sequence to obtain uniquely aligned sequences; and using SNP analysis software to obtain SNP information in the predetermined region from the uniquely aligned sequences.
  • the alignment is performed using a BWA software package in accordance with an embodiment of the present invention.
  • Thereby, the alignment can be achieved quickly and accurately.
  • After the uniquely aligned sequences are obtained, the method further comprises removing PCR duplicate sequences from the uniquely aligned sequences. This facilitates subsequent SNP analysis.
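The two post-alignment steps described above (keeping uniquely aligned reads, then removing PCR duplicates) can be sketched on simplified alignment records. This is a toy illustration only: in practice these steps are carried out by dedicated tools working on alignment files. The `Alignment` record, the use of a mapping-quality threshold as a proxy for unique alignment, and the duplicate key (chromosome, position, strand) are all assumptions made for this sketch.

```python
from typing import Dict, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    """Hypothetical, simplified alignment record."""
    read_id: str
    chrom: str
    pos: int
    strand: str
    mapq: int


def unique_alignments(alignments: List[Alignment], min_mapq: int = 20) -> List[Alignment]:
    """Keep reads whose mapping quality suggests a unique placement (assumed proxy)."""
    return [a for a in alignments if a.mapq >= min_mapq]


def remove_pcr_duplicates(alignments: List[Alignment]) -> List[Alignment]:
    """Keep one read per (chrom, pos, strand), preferring the highest mapping quality."""
    best: Dict[Tuple[str, int, str], Alignment] = {}
    for aln in alignments:
        key = (aln.chrom, aln.pos, aln.strand)
        if key not in best or aln.mapq > best[key].mapq:
            best[key] = aln
    return sorted(best.values(), key=lambda a: (a.chrom, a.pos, a.strand))
```

Reads that start at the same position on the same strand are treated as copies of one original fragment, which is the usual heuristic behind duplicate removal.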
  • the kind of SNP analysis software that can be employed according to an embodiment of the present invention is not particularly limited.
  • the SNP analysis software is at least one selected from the group consisting of SAMtools and GATK. Thereby, SNP analysis can be performed quickly and accurately.
  • The method further includes filtering the obtained SNP information.
  • The filtering removes any SNP that satisfies one of the following conditions: the SNP sequencing depth is less than 10X, preferably less than 20X; or the difference between the sequencing depths of the two bases in a heterozygous SNP is above 20%, preferably above 10%, more preferably above 5%.
  • In general, the higher the sequencing depth, the closer the depth ratio of the two bases of a heterozygous SNP is to 1:1. The specific values of the sequencing depth and the depth difference in the SNP filtering conditions depend on the sample, the sequencing depth, and the sequencing quality at the time of implementation, and can be adjusted according to actual needs.
  • In one example, the embryo's genetically related individuals have a sequencing depth of 50X and the embryo sample has a sequencing depth of 100X; the sequencing quality is good, so strict filtering is applied to ensure that the remaining SNPs agree with the actual SNPs.
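A minimal sketch of the filtering rule follows. The text does not state how the percentage difference between the two base depths is computed; the sketch assumes it is the absolute depth difference divided by the sum of the two depths. The function name and parameters are hypothetical.

```python
from typing import Optional, Tuple


def passes_filter(
    total_depth: int,
    het_allele_depths: Optional[Tuple[int, int]] = None,
    min_depth: int = 10,
    max_het_diff: float = 0.20,
) -> bool:
    """Return True if a SNP survives the depth and allele-balance filters.

    total_depth       -- sequencing depth at the SNP site
    het_allele_depths -- depths of the two bases, given only for heterozygous SNPs
    min_depth         -- SNPs with depth below this are removed
    max_het_diff      -- maximum allowed relative depth difference (assumed definition)
    """
    if total_depth < min_depth:
        return False
    if het_allele_depths is not None:
        depth_a, depth_b = het_allele_depths
        allele_total = depth_a + depth_b
        if allele_total == 0:
            return False
        if abs(depth_a - depth_b) / allele_total > max_het_diff:
            return False
    return True
```

Stricter settings from the text correspond to `min_depth=20` and `max_het_diff=0.10` or `0.05`.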
  • The method for determining SNP information in a predetermined region of a chromosome of the present invention can efficiently and accurately determine SNP information in a predetermined region of a chromosome, for example, information on mutation sites related to a pathogenic gene of a sample. The information can in turn be used to determine whether a subject's genetic state is normal, carrier, or affected, thereby providing a basis for clinical disease detection or treatment.
  • the invention also provides a method of determining SNP information in a predetermined region of an embryonic chromosome.
  • The method comprises: acquiring a whole genome of the embryo; and determining, for the whole genome of the embryo, SNP information in a predetermined region of the embryo chromosome according to the method for determining SNP information in a predetermined region of a chromosome as described above.
  • The method for determining SNP information in a predetermined region of an embryonic chromosome of the present invention specifically includes the following steps: acquiring a whole genome of the embryo; constructing a sequencing library for the whole genome of the embryo; screening the sequencing library with a probe to obtain a target capture fragment; sequencing the screened sequencing library to obtain a sequencing result; and determining, based on the sequencing result, SNP information in a predetermined region of the embryo chromosome.
  • The method for determining SNP information in a predetermined region of an embryonic chromosome of the present invention can effectively and accurately determine SNP information in a predetermined region of an embryonic chromosome. The information can in turn be used to determine whether the genetic state of the fetus is normal, carrier, or affected, and can therefore provide a basis for pre-implantation monogenic disease detection, prenatal diagnosis of pregnant women, or clinical disease treatment.
  • the whole genome of the embryo is obtained by whole genome amplification of embryonic cells.
  • the specific implementation method of whole genome amplification is not particularly limited.
  • Whole genome amplification is carried out by at least one method selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA and MDA. Thereby, a small amount of embryonic cells can be efficiently amplified, yielding a larger whole genome sample of the embryonic cells.
  • the method for determining SNP information in a predetermined region of an embryonic chromosome of the present invention further comprises:
  • The embryo's genetically related individuals include the father, the mother, and a proband of the embryo.
  • The term "proband" refers to a patient diagnosed as carrying the disease-causing gene and exhibiting the symptoms of the disease. The proband is an organism genetically related to the aforementioned embryo, and may be an embryo or fetus, or an individual after birth.
  • Based on the whole genomes of the embryo's genetically related individuals, the father's SNP information, the mother's SNP information, and the proband's SNP information are determined, respectively.
  • Distinguishing SNPs are determined based on the SNP information of the father and the SNP information of the mother.
  • The term "distinguishing SNP" (also rendered "differentiated SNP") as used herein refers to a base that can effectively distinguish a parental haplotype: at a given (autosomal) position, one of the four parental bases differs from the other bases at that position, so that this base identifies exactly one of the four parental haplotypes. For example, if the parental genotypes at a position are AA and AG, the G base is a distinguishing SNP, because G identifies a single haplotype at this position, whereas A is present in the other three haplotypes and cannot identify a unique haplotype.
  • Figure 2 shows a schematic diagram of the method for determining distinguishing SNPs based on Mendelian genetic principles.
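The definition of a distinguishing SNP can be sketched in a few lines. The sketch assumes each parental genotype is given as a two-character string of bases at one autosomal position, and it treats any base that occurs exactly once among the four parental bases as distinguishing; the function name is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def distinguishing_alleles(father_gt: str, mother_gt: str) -> List[Tuple[str, str]]:
    """Return (parent, base) pairs for bases seen exactly once among the four parental bases."""
    counts = Counter(father_gt + mother_gt)
    result: List[Tuple[str, str]] = []
    for parent, genotype in (("father", father_gt), ("mother", mother_gt)):
        for base in genotype:
            if counts[base] == 1:
                result.append((parent, base))
    return result
```

For the example in the text, genotypes AA and AG yield G as the only distinguishing base, carried by the AG parent. A site where both parents are AG yields none, because neither base identifies a single haplotype.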
  • A father SNP haplotype and a mother SNP haplotype are determined based on the distinguishing SNPs and the SNP information of the proband. That is, based on the distinguishing SNPs and the proband SNPs, a first father haplotype and a second father haplotype, and a first mother haplotype and a second mother haplotype, are constructed for the two chromosomes corresponding to the predetermined region in the father's and the mother's genomes, respectively; these are used for the subsequent determination of embryo haplotypes.
  • The father SNP haplotype comprises a first father haplotype and a second father haplotype, and the mother SNP haplotype comprises a first mother haplotype and a second mother haplotype; the first father haplotype, the second father haplotype, the first mother haplotype, and the second mother haplotype are each composed of the distinguishing SNPs.
  • The parental SNP haplotypes can be constructed according to Mendelian genetic principles and the law of linkage and crossing-over, combining the parental SNP loci with the proband's SNP information; the construction principle is shown in FIG. 4.
  • Each SNP haplotype consists entirely of bases at distinguishing SNP positions; each haplotype contains a plurality of distinguishing SNPs, and the distinguishing SNPs in one haplotype distinguish it from the other haplotypes.
  • For example, if the parental genotypes at a certain position are AA and AG, G is a distinguishing SNP and A is a non-distinguishing SNP; A and G are the bases of the haplotypes at that position. Since the two haplotypes of the proband are inherited from the parents, the haplotype on which the mutation is located can be determined according to the disease.
  • In one case, the haplotype the proband inherited from the father is the haplotype carrying the pathogenic mutation; for a recessive genetic disease in which both parents are carriers, both haplotypes of the (affected) proband are haplotypes carrying the pathogenic mutation.
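The haplotype construction described above can be sketched as follows, under simplifying assumptions that are not from the text: no recombination within the region, error-free genotypes, and genotypes given as two-character strings. A distinguishing base that the proband carries is placed on that parent's haplotype transmitted to the proband; otherwise it is placed on the parent's other haplotype. The labels "transmitted" and "untransmitted" and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Site = Tuple[int, str, str, str]  # (position, father_gt, mother_gt, proband_gt)


def distinguishing_alleles(father_gt: str, mother_gt: str) -> List[Tuple[str, str]]:
    """Bases seen exactly once among the four parental bases, with their parent."""
    counts = Counter(father_gt + mother_gt)
    return [
        (parent, base)
        for parent, genotype in (("father", father_gt), ("mother", mother_gt))
        for base in genotype
        if counts[base] == 1
    ]


def phase_parents(sites: List[Site]) -> Dict[str, Dict[str, Dict[int, str]]]:
    """Build two haplotypes per parent from distinguishing SNPs and the proband genotype."""
    haplotypes: Dict[str, Dict[str, Dict[int, str]]] = {
        "father": {"transmitted": {}, "untransmitted": {}},
        "mother": {"transmitted": {}, "untransmitted": {}},
    }
    for position, father_gt, mother_gt, proband_gt in sites:
        for parent, base in distinguishing_alleles(father_gt, mother_gt):
            # A distinguishing base is unique to one parental haplotype, so its
            # presence in the proband shows which haplotype was passed on.
            label = "transmitted" if base in proband_gt else "untransmitted"
            haplotypes[parent][label][position] = base
    return haplotypes
```

For a recessive disease with an affected proband, both "transmitted" haplotypes would be the mutation-carrying ones, which matches the reasoning in the text.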
  • Thereby, the father SNP haplotype and the mother SNP haplotype can be effectively determined, and the SNP haplotype of the embryo can be efficiently determined based on the SNP information of the embryo, the father SNP haplotype, and the mother SNP haplotype.
  • The combination of father SNP haplotype and mother SNP haplotype present in the embryo is determined to obtain the SNP haplotype of the embryo. That is, the SNP types in the predetermined region of the fetal chromosome are determined based on the SNP information of the embryo and the aforementioned first father haplotype, second father haplotype, first mother haplotype, and second mother haplotype, and the SNP haplotype of the embryo is thereby determined.
  • The SNP haplotype of the embryo is obtained by taking the father haplotype that is significantly supported by the embryo's SNP information as the paternal haplotype of the embryo, and taking the mother haplotype that is significantly supported by the embryo's SNP information as the maternal haplotype of the embryo.
  • Support by no fewer than 10 distinguishing SNPs is taken as an indication of significant support.
  • The embryo's SNP information can be analyzed to determine which combination of two parental haplotypes the embryo carries, as shown in FIG. 4.
  • the statistical calculation of the number of distinguishing SNPs can be used, and the embryo haplotype is determined according to the numerical value.
  • the specific process is shown in FIG. 5 .
  • If the number of distinguishing SNPs supporting a haplotype is not less than 10, the haplotype can be determined to be one of the haplotypes of the embryo; if the number of distinguishing SNPs supporting a haplotype is less than 4, the apparent support can be judged to be caused by SNP errors.
  • A correct haplotype is required to be supported by no fewer than 10 SNPs, and an erroneous haplotype by no more than 3. These values are used because the SNP filtering conditions set earlier are strict; that is, the SNPs used in haplotype construction have a high accuracy, and the number of candidate SNPs is large.
  • Test data show that the number of SNPs supporting a correct haplotype is far greater than 10, while the number of SNPs supporting an erroneous haplotype is generally zero.
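The support-counting rule can be sketched as below. Each parental haplotype is assumed to be a mapping from position to its distinguishing base, and the embryo's genotypes a mapping from position to a two-character string; the "ambiguous" outcome for counts between the two thresholds is an addition of this sketch, since the text does not say how such cases are handled.

```python
from typing import Dict


def haplotype_support(haplotype: Dict[int, str], embryo_gts: Dict[int, str]) -> int:
    """Count distinguishing SNPs of the haplotype that the embryo carries."""
    return sum(
        1 for position, base in haplotype.items()
        if base in embryo_gts.get(position, "")
    )


def call_haplotypes(
    parent_haplotypes: Dict[str, Dict[int, str]],
    embryo_gts: Dict[int, str],
    min_support: int = 10,
    max_error: int = 3,
) -> Dict[str, str]:
    """Label each parental haplotype by how strongly the embryo's SNPs support it."""
    calls: Dict[str, str] = {}
    for name, haplotype in parent_haplotypes.items():
        support = haplotype_support(haplotype, embryo_gts)
        if support >= min_support:
            calls[name] = "inherited"
        elif support <= max_error:
            calls[name] = "not inherited"  # residual support attributed to SNP errors
        else:
            calls[name] = "ambiguous"      # outcome not specified in the text
    return calls
```

With the thresholds from the text, a haplotype supported by 12 distinguishing SNPs is called inherited, while one supported by a single stray SNP is treated as an error.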
  • It has been verified that, for an autosomal disease, only two haplotypes satisfying the requirements are obtained per embryo by the method of the present invention; for an X-chromosome disease, one (male) or two (female) haplotypes satisfying the requirements are obtained.
  • Thereby, the SNP haplotype of the embryo can be accurately and efficiently determined, and the genetic state of the embryo can be effectively determined. That is, according to the constructed parental haplotypes, the method can effectively determine whether the embryo has inherited a pathogenic haplotype from a parent, and thereby judge whether the embryo's genetic state is normal, carrier, or affected.
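The final judgment can be illustrated with a short sketch. It assumes the set of pathogenic parental haplotypes is already known from the proband analysis, and it uses a simplified autosomal model: for a recessive disease, zero, one, or two inherited pathogenic haplotypes map to normal, carrier, or affected; for a dominant disease, any inherited pathogenic haplotype maps to affected. The haplotype labels and the function name are hypothetical.

```python
from typing import Iterable


def genetic_state(
    inherited: Iterable[str],
    pathogenic: Iterable[str],
    mode: str = "recessive",
) -> str:
    """Classify an embryo from the haplotypes it inherited (simplified autosomal model)."""
    n_pathogenic = len(set(inherited) & set(pathogenic))
    if mode == "recessive":
        return ("normal", "carrier", "affected")[min(n_pathogenic, 2)]
    if mode == "dominant":
        return "affected" if n_pathogenic else "normal"
    raise ValueError(f"unsupported inheritance mode: {mode}")
```

For example, with pathogenic haplotypes `{"F1", "M1"}`, an embryo that inherited `F1` and `M2` would be classified as a carrier under the recessive model.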
  • the present invention also provides an apparatus for determining SNP information in a predetermined region of a chromosome.
  • the apparatus 1000 includes a library construction device 100, a library screening device 200, a sequencing device 300, and an analysis device 400.
  • The library construction device 100 is adapted to construct a sequencing library for at least a portion of a chromosome. The library screening device 200 is coupled to the library construction device 100 and is adapted to screen the sequencing library with a probe, wherein the probe specifically recognizes at least one of the known SNP sites in the predetermined region, so as to obtain a target capture fragment comprising the SNP site. The sequencing device 300 is coupled to the library screening device 200 and is adapted to sequence the screened sequencing library to obtain a sequencing result. The analysis device 400 is coupled to the sequencing device 300 and is adapted to determine SNP information in the predetermined region based on the sequencing result.
  • With this apparatus, the above-described method for determining SNP information in a predetermined region of a chromosome can be effectively implemented, enabling efficient and accurate determination of SNP information in a predetermined region of a chromosome, for example, information on mutation sites associated with a pathogenic gene of a sample. The information can in turn be used to determine whether the subject's genetic state is normal, carrier, or affected, thereby providing a basis for clinical disease detection or treatment.
  • the predetermined area comprises a target gene region and a SNP-marker region.
  • the target gene region comprises at least a portion of an exon and an exon adjacent region of a gene associated with the target disease.
  • the exon adjacent region comprises a region 50 bp upstream of the 5′ end of the exon and a region 50 bp downstream of the exon;
  • the SNP-marker region includes a range of 1 M upstream and downstream of the target gene.
  • the probe has a length of 20 to 200 nt.
  • the length of the probe is 60 to 80 nt.
  • the probe is provided in the form of a chip.
  • the chromosome preparation device is connected to the library construction device 100 and is adapted to obtain an embryonic cell whole genome by whole genome amplification, the whole genome of the embryonic cells constituting at least a portion of the chromosome.
  • the chromosome preparation device is adapted to perform the whole genome amplification by at least one selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.
  • the DNA extraction device is connected to the library construction device 100 and is adapted to extract DNA from peripheral blood of the organism to obtain at least a portion of the chromosome.
  • the sequencing device 300 is at least one selected from the group consisting of Illumina Hiseq 2000, Genome Analyzer, Miseq sequencing series, Life technologies' SOLiD sequencing system, Ion Torrent sequencing system and Roche's 454 sequencing system.
  • the analysis device 400 further includes: a comparison unit adapted to compare the sequencing result with a reference sequence to obtain a unique alignment sequence; and a SNP information acquiring unit connected to the comparison unit and adapted to acquire SNP information in the predetermined region from the unique alignment sequence by using SNP analysis software.
  • the comparison unit is adapted to perform the comparison using a BWA software package.
  • the analysis device 400 further comprises a unit adapted to remove PCR duplicate sequences from the unique alignment sequence.
  • the SNP analysis software is at least one selected from the group consisting of SAMtools and GATK.
  • the analysis device 400 further comprises means adapted to filter the obtained SNP information.
  • the filtering removes any SNP that satisfies one of the following conditions: the SNP sequencing depth is less than 10×, preferably less than 20×; and the difference in sequencing depth between the two bases of a heterozygous SNP is higher than 20%, preferably higher than 10%, more preferably higher than 5%.
  • each unit of the apparatus can implement the corresponding step of the method for determining SNP information in a predetermined region of a chromosome of the present invention; the foregoing description of the advantages and effects of that method also applies to the apparatus and is not repeated here.
  • the invention also proposes a system for determining SNP information in a predetermined region of an embryonic chromosome.
  • the system 10000 includes: a first whole genome acquisition device 2000, and a SNP information determination device 1000, the first whole genome acquisition device 2000 being adapted to acquire a whole genome of the embryo;
  • the SNP information determining device 1000 is connected to the first whole genome acquisition device 2000 and is used for determining SNP information in a predetermined region of the fetal chromosome, wherein the SNP information determining device 1000 is the apparatus 1000 for determining SNP information in a predetermined region of a chromosome as described above.
  • the above-described method of determining SNP information in a predetermined region of a chromosome can be efficiently implemented, thereby enabling effective and accurate determination of SNP information in a predetermined region of an embryonic chromosome. Further, the information can be effectively used to determine whether the genetic status of the fetus is normal, carrier, or affected, which can provide a basis for pre-implantation monogenic disease detection, prenatal diagnosis of pregnant women or clinical disease treatment.
  • the first whole genome acquisition device 2000 is adapted to obtain a whole genome of the embryo by whole genome amplification of the embryonic cells.
  • the first whole genome acquisition device 2000 is adapted to obtain the whole genome of the embryo by at least one selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.
  • the system 10000 further includes: a second whole genome acquisition device (not shown) adapted to acquire a whole genome of each embryo genetically related individual, wherein the embryo genetically related individuals include a father, a mother, and a proband of the embryo; a distinguishing SNP determining device (not shown) adapted to determine distinguishing SNPs based on the father's SNP information and the mother's SNP information; a first haplotype determining device (not shown) adapted to determine a father SNP haplotype and a mother SNP haplotype based on the distinguishing SNPs and the SNP information of the proband; and a second haplotype determining device (not shown) adapted to determine, based on the SNP information of the embryo, the father SNP haplotype and the mother SNP haplotype, the combination of the father SNP haplotype and the mother SNP haplotype, so as to obtain the SNP haplotype of the embryo.
  • the second haplotype determining device further comprises: a unit adapted to determine the father haplotype significantly supported by the SNP information of the embryo as the paternal haplotype of the embryo; and a unit adapted to determine the mother haplotype significantly supported by the SNP information of the embryo as the maternal haplotype of the embryo.
  • the number of distinguishing SNPs of not less than 10 is an indication of significant support.
  • each device included in the above system can implement the corresponding step of the method for determining SNP information in a predetermined region of a chromosome of the present invention; the foregoing description of the advantages and effects of the method for determining SNP information in a predetermined region of an embryonic chromosome also applies to this system and is not repeated here.
  • Computer readable medium
  • the invention also provides a computer readable medium.
  • the computer readable medium stores instructions adapted to be executed by a processor to determine SNP information in a predetermined region of the chromosome based on the sequencing result. It is to be understood that all or part of the steps of determining SNP information in a predetermined region of a chromosome, including a predetermined region of an embryonic chromosome, may be performed by a program instructing related hardware, and that the computer readable medium may include a read only memory, a random access memory, a magnetic disk, an optical disk, or the like.
  • the sequencing result is obtained by: constructing a sequencing library for at least a part of the chromosome; screening the sequencing library with a probe, wherein the probe specifically recognizes at least one of the known SNP sites in the predetermined region to obtain a target capture fragment comprising the SNP site; and sequencing the screened sequencing library to obtain the sequencing result.
  • the predetermined area comprises a target gene region and a SNP-marker region.
  • the target gene region comprises at least a portion of an exon and an exon adjacent region of the gene associated with the target disease.
  • the exon adjacent region includes a range of 50 bp upstream and downstream of the exon; and the SNP-marker region includes a range of 1 M upstream and downstream of the target gene.
  • the probe has a length of 20 to 200 nt. Preferably, the length of the probe is 60 to 80 nt.
  • the probe is provided in the form of a chip.
  • At least a portion of the chromosome is a whole genome of embryonic cells obtained by whole genome amplification.
  • whole genome amplification is performed by at least one of PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.
  • At least a portion of the chromosome is obtained by DNA extraction of peripheral blood of the organism.
  • the sequencing was performed according to an embodiment of the invention using Illumina Hiseq 2000, Genome Analyzer, Miseq sequencing series, Life technologies' SOLiD sequencing system, Ion Torrent sequencing system, Roche's 454 sequencing system.
  • determining SNP information in the predetermined region based on the sequencing result further comprises: comparing the sequencing result with a reference sequence to obtain a unique alignment sequence; and using SNP analysis software to acquire SNP information in the predetermined region from the unique alignment sequence.
  • the alignment is performed using a BWA software package.
  • after the unique alignment sequence is obtained, the method further comprises removing PCR duplicate sequences from the unique alignment sequence.
  • the SNP analysis software is at least one selected from the group consisting of SAMtools and GATK.
  • the filtering of the obtained SNP information is further included.
  • the filtering removes any SNP that satisfies one of the following conditions: the SNP sequencing depth is less than 10×, preferably less than 20×; and the difference in sequencing depth between the two bases of a heterozygous SNP is higher than 20%, preferably higher than 10%, more preferably higher than 5%.
  • the embryo genetically related individual has a sequencing depth of 50 X
  • the embryo sample has a sequencing depth of 100 X
  • the sequencing quality is good, so that the remaining SNPs are accurately aligned with the actual SNP, and strictly filtered.
  • filter can be set higher than 20% heterozygous SNP.
  • At least a portion of the chromosome is a whole genome of an embryo such that SNP information in a predetermined region of the fetal chromosome is determined for the whole genome of the fetus.
  • the instructions are further adapted to be executed by a processor to: acquire a whole genome of each embryo genetically related individual, wherein the embryo genetically related individuals comprise a father, a mother, and a proband of the embryo; determine the SNP information of the father, the SNP information of the mother and the SNP information of the proband based on the whole genomes of the embryo genetically related individuals; determine distinguishing SNPs based on the SNP information of the father and the SNP information of the mother; determine a father SNP haplotype and a mother SNP haplotype based on the distinguishing SNPs and the SNP information of the proband; and determine the combination of the father SNP haplotype and the mother SNP haplotype based on the SNP information of the embryo, the father SNP haplotype, and the mother SNP haplotype, so as to obtain the SNP haplotype of the embryo.
  • the SNP haplotype of the embryo is obtained by: determining the father haplotype significantly supported by the SNP information of the embryo as the paternal haplotype of the embryo; and determining the mother haplotype significantly supported by the SNP information of the embryo as the maternal haplotype of the embryo.
  • the number of the distinguishing SNPs is not less than 10, which is an indication of significant support.
  • the invention also proposes an apparatus for determining SNP information in a predetermined region of a chromosome.
  • the apparatus comprises: a sequencing device; and the aforementioned computer-readable medium storing instructions adapted to be executed by the processor to determine SNP information in a predetermined region of the chromosome based on the sequencing result.
  • the apparatus of the present invention can accurately and efficiently determine SNP information in a predetermined region of a chromosome, for example, information on mutation sites associated with a pathogenic gene of a sample. Further, the information can be effectively used to determine whether the genetic state of a subject is normal, carrier, or affected, which can provide a basis for clinical disease detection or treatment.
  • the computer readable medium stores instructions adapted to be executed by a processor to determine, for the whole genome of the fetus, SNP information in a predetermined region of the fetal chromosome.
  • the invention also provides a system for determining SNP information in a predetermined region of an embryonic chromosome.
  • the system comprises: a sequencing device; and the aforementioned computer readable medium storing instructions adapted to be executed by the processor to determine SNP information in a predetermined region of the fetal chromosome for the whole genome of the fetus.
  • the system of the invention can accurately and effectively determine the SNP information in the predetermined region of the embryo chromosome, and further, the information can be effectively used to determine that the genetic state of the fetus is normal, carried or pathogenic, thereby enabling preimplantation of a single gene for the embryo.
  • the capture chip designed by the present invention comprises two parts, one part is a target gene region; the other part is a SNP-marker area.
  • the target gene region is mainly the exon and the exon-intron junction region, which covers most of the pathogenic mutations and can be used for direct detection of disease mutations.
  • the SNP-marker region is the region upstream and downstream of the target gene region, which contains thousands of high-frequency SNPs (that is, SNPs with a frequency greater than 0.3 in the 1000 Genomes database). This region is used to detect parental distinguishing SNPs, which are combined with the SNP information of the family's proband to construct the haplotype of the disease-causing gene.
  • the SNP-haplotype of the gene is affected.
  • the recombination rate is less than 1% (the human recombination rate is 1% per 1M area).
  • the range of SNP-marker regions contained in the chip capture can be determined based on the general recombination rate of the human genome.
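The relationship between flank size and recombination risk can be illustrated with a short calculation. The sketch below assumes that "1M" means one megabase and uses a simple linear approximation (rate times distance); the function names and the linear model are illustrative assumptions, not part of the described method.

```python
def recombination_fraction(distance_bp: int, rate_per_mb: float = 0.01) -> float:
    """Approximate crossover probability between a marker and the target gene.

    Linear approximation (rate x distance), capped at 0.5, the value for
    unlinked loci. Only reasonable for short distances.
    """
    if distance_bp < 0:
        raise ValueError("distance_bp must be non-negative")
    return min(0.5, rate_per_mb * distance_bp / 1_000_000)


def max_flank_bp(max_fraction: float, rate_per_mb: float = 0.01) -> int:
    """Largest marker distance (bp) whose approximate recombination
    fraction stays within max_fraction."""
    return round(max_fraction / rate_per_mb * 1_000_000)
```

Under these assumptions, a marker 1,000,000 bp from the target gene has an approximate recombination fraction of 1%, which matches the rate quoted in the text.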
  • the range of upstream and downstream of the target gene region is generally selected to be small, and the captured SNP is accurate, but the number is small, and the range of selection is large.
  • the SNP-marker region is limited to 1 M upstream and downstream of the target gene, thereby reducing the probability of recombination between the target gene region and the SNP-marker region to one in ten thousand.
  • hg19 is used as the reference sequence to determine the location of the target gene and to finalize the capture region.
  • the SNP loci with higher frequency in the population were selected within 1M distance from the upstream and downstream of the position. Having the selected SNP site located in the middle of the target capture segment is advantageous for increasing the probability of the SNP being captured.
  • the fragments captured by the capture probe are mainly about 200 bp in size.
  • the SNP-marker capture region is the region covering each of these SNP sites and about 100 bp upstream and downstream of it (so that the selected SNP lies at the midpoint of the 200 bp fragment).
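The marker selection and window layout described above can be sketched as follows. This is a minimal illustration, assuming 1-based coordinates and assuming that markers inside the gene itself are excluded; the `Snp` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Snp:
    pos: int     # 1-based chromosome coordinate (assumed convention)
    freq: float  # population allele frequency


def select_marker_snps(snps: List[Snp], gene_start: int, gene_end: int,
                       flank: int = 1_000_000, min_freq: float = 0.3) -> List[Snp]:
    """Keep high-frequency SNPs within `flank` bp of the gene, outside the gene."""
    lo, hi = gene_start - flank, gene_end + flank
    return [s for s in snps
            if lo <= s.pos <= hi
            and not (gene_start <= s.pos <= gene_end)
            and s.freq > min_freq]


def capture_window(snp_pos: int, half_width: int = 100) -> Tuple[int, int]:
    """Window of about 2 * half_width bp with the SNP at its midpoint."""
    return max(1, snp_pos - half_width), snp_pos + half_width
```

Centering the window on the SNP follows the stated rationale that a SNP in the middle of the capture fragment is more likely to be captured.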
  • probe specificity is evaluated with the Sequence Search and Alignment by Hashing Algorithm (SSAHA), and the chip is synthesized after the evaluation is passed.
  • Embryonic cells were collected and whole-genome amplification (WGA) of the embryonic cells was performed using the PEP-PCR, DOP-PCR, OmniPlex WGA or MDA (multiple displacement amplification) method; DNA was extracted from the peripheral blood of the parents and the proband (or samples of other family members were collected, depending on the type of disease).
  • the libraries obtained above were mixed, and the mixed library was hybridized with the designed capture probe; hybridization followed the protocol provided by the chip synthesis service company.
  • Sequencing was performed using the Illumina Hiseq2000, Genome Analyzer, Miseq sequencing series, Life technologies' SOLiD sequencing system, Ion Torrent sequencing system or Roche's 454 sequencing system.
  • the analysis process includes:
  • the low-quality sequencing data are filtered out and sequences containing the library adapter are removed. The sequencing data are then aligned to the human reference genome using analysis software such as the BWA (Burrows-Wheeler Aligner) software package with the default optimized parameters (-1 -i 15 -L -k 2 -1 31 -t 4). Reads in the alignment result that align uniquely to the chip target region are retained, and PCR duplicate sequences are removed with SAMtools, for subsequent analysis.
  • SNP analysis software such as SAMtools and GATK are used for analysis to obtain all SNP information in the target area.
  • the SNP obtained above is filtered under certain conditions to improve the accuracy of the SNP.
  • the filtration removes any SNP that meets either of the following conditions: 1. the SNP sequencing depth is less than 10×; 2. the difference in sequencing depth between the two bases of a heterozygous SNP is higher than 10%.
  • the reasons are that a low sequencing depth may cause one of the two bases of some heterozygous SNPs to go undetected, and that a heterozygous SNP whose two bases differ greatly in depth may not be correctly distinguished from a sequencing error. Filtering by the above conditions can remove potentially erroneous SNPs.
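The two filtering conditions can be expressed as a small predicate. In this sketch the "difference in sequencing depth" is assumed to mean the absolute difference between the two allele depths divided by their sum; the text does not define the measure, so that definition and the function name are assumptions.

```python
def passes_snp_filter(depth_a: int, depth_b: int, heterozygous: bool,
                      min_depth: int = 10, max_imbalance: float = 0.10) -> bool:
    """Return True if a SNP call survives the depth and allele-balance filters.

    depth_a, depth_b: read depths of the two observed bases (depth_b is 0
    for a homozygous call with a single observed base).
    """
    total = depth_a + depth_b
    if total < min_depth:
        return False  # condition 1: depth below 10x
    if heterozygous:
        # Assumed measure of imbalance: |a - b| / (a + b)
        imbalance = abs(depth_a - depth_b) / total
        if imbalance > max_imbalance:
            return False  # condition 2: the two bases differ by more than 10%
    return True
```

For example, a heterozygous call with depths 26 and 24 passes, while one with depths 35 and 15 is removed under this reading.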
  • a distinguishing SNP means that, at a given (autosomal) position, one of the four bases carried by the two parents differs from all the other bases at that position, so that this base identifies one of the four haplotypes of the two parents.
  • a specific example is shown in Figure 2. According to the requirements of the figure, the parental distinguishing SNPs can be selected according to the Mendelian genetic principle.
  • 6.5. Building parental haplotypes
  • the parental SNP loci and the proband SNPs were combined to construct the parental SNP-haplotypes.
  • the construction principle is shown in Figure 4: first, the parental SNP locus information and the proband SNP information are combined to construct the parental haplotypes according to the basic Mendelian genetic principle and the law of linkage and crossing-over; then the parental haplotype results and the embryonic SNP information are combined to predict the embryo haplotypes.
  • the red-marked base letters indicate the father's distinguishing SNPs; the yellow-marked base letters indicate the mother's distinguishing SNPs; the italicized and underlined base letters indicate sites at which ADO occurred during WGA; G* indicates the pathogenic mutant base; -- indicates a site where detection failed.
  • the SNP-haplotype consists entirely of distinguishing SNP position bases, each of which contains a plurality of distinguishing SNPs, and the distinguishing SNPs in the haplotype can be distinguished from other haplotypes.
  • for example, if the parental genotypes at a certain position are AA and AG, G is a distinguishing SNP base and A is a non-distinguishing base, and A and G are the bases at that position in the respective haplotypes.
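The definition of a distinguishing SNP can be checked with a few lines of code. This is a minimal sketch for a single autosomal position, assuming genotypes are given as two-letter strings; the function name is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def distinguishing_alleles(father: str, mother: str) -> List[Tuple[str, str]]:
    """Return (parent, base) pairs for bases that occur exactly once among
    the four parental bases at one autosomal position."""
    counts = Counter(father + mother)
    result = []
    for parent, genotype in (("father", father), ("mother", mother)):
        for base in genotype:
            if counts[base] == 1:
                result.append((parent, base))
    return result
```

With the genotypes AA and AG from the example above, only the G carried by the second parent is returned; with AG and AG, no base is distinguishing.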
  • the haplotype carrying the disease-causing mutation can be determined according to the disease. For a dominant genetic disease in which the father is affected and the mother is normal, the haplotype inherited by the proband from the father is the haplotype carrying the disease-causing mutation; for a recessive genetic disease in which both parents are carriers, both haplotypes of the (affected) proband carry the disease-causing mutation.
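One way to read the haplotype construction step is sketched below for a single parent. It assumes no recombination within the captured region and uses the proband only to decide which parental base was transmitted; the data layout (dictionaries keyed by position) and the function name are illustrative assumptions, not the patented procedure itself.

```python
from typing import Dict, Tuple


def phase_parent(parent_genotypes: Dict[int, str],
                 distinguishing: Dict[int, str],
                 proband_genotypes: Dict[int, str]) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Split one parent's bases into the haplotype transmitted to the proband
    and the untransmitted haplotype, using that parent's distinguishing bases.

    parent_genotypes:  {position: two-letter genotype}, heterozygous at each
                       distinguishing position
    distinguishing:    {position: distinguishing base of this parent}
    proband_genotypes: {position: two-letter genotype}; missing = no call
    """
    transmitted, untransmitted = {}, {}
    for pos, d_base in distinguishing.items():
        proband = proband_genotypes.get(pos)
        if proband is None:
            continue  # no usable proband call at this site
        genotype = parent_genotypes[pos]
        other = genotype[0] if genotype[1] == d_base else genotype[1]
        if d_base in proband:
            # The distinguishing base can only have come from this parent.
            transmitted[pos], untransmitted[pos] = d_base, other
        else:
            transmitted[pos], untransmitted[pos] = other, d_base
    return transmitted, untransmitted
```

For a recessive disease with an affected proband, the transmitted haplotype returned here would correspond to the haplotype carrying the disease-causing mutation, in line with the rule stated above.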
  • based on the embryonic SNP information combined with the parental SNP-haplotypes, the analysis determines which combination of two haplotypes the embryo carries.
  • the analysis principle is shown in Fig. 4.
  • the number of differentiated SNPs can be statistically calculated, and the embryo haplotype is determined according to the numerical value, as shown in Fig. 5.
  • if a haplotype is supported by no fewer than 10 distinguishing SNPs, the haplotype is one of the haplotypes of the embryo; if a haplotype is supported by fewer than 4 distinguishing SNPs, the support can be attributed to SNP errors;
  • the number of SNPs supporting a correct haplotype is set to be no less than 10, and the number of SNPs supporting an incorrect haplotype to be no more than 3.
  • because the SNP filtration conditions set in step 6.3 are stringent, the SNPs used in haplotype construction have a high accuracy, and the number of candidate SNPs is large.
  • the actual test data indicate that the number of SNPs supporting a correct haplotype is much higher than 10, and that the number of SNPs supporting an incorrect haplotype is usually 0.
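The support-counting rule can be sketched as follows. Each parental haplotype is assumed to be represented only by the distinguishing bases it carries, and a haplotype with an intermediate count (4 to 9) is labelled "ambiguous"; that label, the data layout and the function names are assumptions added for illustration.

```python
from typing import Dict


def count_support(embryo_genotypes: Dict[int, str],
                  haplotypes: Dict[str, Dict[int, str]]) -> Dict[str, int]:
    """Count, per parental haplotype, the distinguishing bases seen in the embryo.

    haplotypes: {haplotype name: {position: distinguishing base}}
    embryo_genotypes: {position: genotype string}; missing = no call
    """
    return {
        name: sum(1 for pos, base in markers.items()
                  if base in embryo_genotypes.get(pos, ""))
        for name, markers in haplotypes.items()
    }


def call_haplotypes(support: Dict[str, int],
                    min_support: int = 10, max_error: int = 3) -> Dict[str, str]:
    """Apply the thresholds: at least 10 supporting SNPs means inherited,
    at most 3 is treated as SNP error."""
    calls = {}
    for name, n in support.items():
        if n >= min_support:
            calls[name] = "inherited"
        elif n <= max_error:
            calls[name] = "not inherited"
        else:
            calls[name] = "ambiguous"  # assumed label; not defined in the text
    return calls
```

A haplotype supported by 12 distinguishing SNPs would be called inherited, while one supported by a single SNP would be treated as the result of a SNP error.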
  • For an autosomal disease after this process analysis, only 2 haplotypes can be obtained for each embryo; for an X-chromosome disease, one (male) or two (female) can be obtained through this procedure. The required haplotype.
  • the genetic state of the embryo is judged to be normal, carrier, or affected depending on whether the embryo inherits the pathogenic haplotype of a parent.
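The final judgment can be illustrated for autosomal diseases. This sketch covers only the autosomal recessive and autosomal dominant cases and assumes the inherited haplotypes have already been determined; X-linked inheritance is not handled, and the function name is hypothetical.

```python
def classify_embryo(paternal_pathogenic: bool, maternal_pathogenic: bool,
                    dominant: bool = False) -> str:
    """Classify an embryo from the pathogenic status of its two inherited
    haplotypes (autosomal inheritance only)."""
    n = int(paternal_pathogenic) + int(maternal_pathogenic)
    if n == 0:
        return "normal"
    if dominant or n == 2:
        return "affected"
    return "carrier"
```

For a recessive disease, an embryo that inherits the pathogenic haplotype from only one parent is a carrier; for a dominant disease, a single pathogenic haplotype is enough for the embryo to be affected.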
  • the general method and detection procedure were used to test samples from a (classic) phenylketonuria family (family one, autosomal recessive inheritance) and a Duchenne progressive muscular dystrophy (DMD) family (family two, X-chromosome recessive inheritance).
  • the couple of family one obtained 7 embryos by IVF and used the MF-PCR method for PAH gene detection; 2 normal embryos were selected for implantation, and a baby girl was finally obtained. Umbilical cord blood gene testing confirmed that the baby girl is normal.
  • the couple of family two obtained 9 embryos through IVF and used the MF-PCR method to carry out PGD of the DMD gene. Three normal embryos were identified, and two of them were implanted. Finally, a male baby was obtained (one of the embryos did not develop), and umbilical cord blood genetic testing confirmed that the baby was normal.
  • the family one samples included peripheral blood of the parents and of the affected daughter (proband), and 7 embryo blastomere single cells.
  • the father is a carrier of the PAH gene R243Q (c.728G>A) mutation.
  • the mother is a carrier of the PAH gene V399V (c.1197A>T) mutation.
  • the proband carries the compound mutation of PAH gene R243Q (c.728G>A) with V399V (c.1197A>T) and presents with phenylketonuria.
  • Seven embryo blastomere single cells (labeled E11, E12, E13, E14, E15, E16, E17, respectively) were tested by multiplex PCR after WGA. The results are shown in Table 1.
  • the second family sample included parents, daughter (normal phenotype) peripheral blood and 9 embryo blastomere single cells.
  • the father was normal, and the mother and daughter were carriers of the DMD gene R2905X (c.8713C>T) mutation.
  • Nine embryonic blastomeres (labeled E21, E22, E23, E24, E25, E26, E27, E28, E29) were tested by multiplex PCR after WGA. The results are shown in Table 2.
  • the above samples were retrospectively tested by using the technical scheme and the detection procedure of the present invention, and the obtained test results were consistent with the MF-PCR detection results, and the result coincidence rate was 100%.
  • the results show that the technology of the present invention can accurately detect the SNP information of the predetermined region of the embryo chromosome, and further detect the embryo genotype to guide the embryo implantation based on the obtained SNP information, and has a short detection period (11 days), high throughput, low cost.
  • the specific implementation is as follows:
  • the DNA samples and WGA products obtained above were first fragmented with a Covaris™ shearing instrument into fragments of about 200 bp, and libraries were then constructed according to the requirements of the Illumina® HiSeq2000™ sequencer. The specific steps are as follows:
  • 37.5 μl of the purified product was used for the end-repair reaction, and the system was as follows (reagents were purchased from Enzymatics):
  • the reaction conditions were: incubation in a Thermomixer at 20 °C for 30 min.
  • the reaction product was recovered with the Qiagen DNA Purification Kit and dissolved in 32 μl of EB.
  • the reaction conditions were: Thermomixer at 37 ° C for 30 min.
  • reaction product was recovered and purified by Qiagen DNA Purification Kit (QIAGEN) and dissolved in 38 ⁇ l of EB.
  • the reaction conditions were as follows: incubation in a Thermomixer at 16 °C for 16 h.
  • the reaction product was purified with 60 μl of Ampure Beads (Beckman Coulter Genomics) and dissolved in 20 μl.
  • Hybridization of the mixed library was performed using NimbleGen's custom-made liquid phase chip SeqCap EZ Choice XL Library (see the NimbleGen SeqCap EZ Exome Capture operating instructions for specific procedures). After 72 hours of hybridization, elution was performed using the NimbleGen wash kit according to the instructions. The final eluted product was subjected to enrichment detection, qPCR and 2100 detection.
  • the sequencing data were first quality-filtered and adapter-contaminated sequences were removed to obtain high-quality sequencing reads, on which the following analysis was performed:
  • the sequencing reads were aligned to the human reference genome (HG19, NCBI release GRCh37) using the alignment software BWA (version 0.5.10) with the parameters set to (-1 -i 15 -L -k 2 -1 31 -t 4); reads in the alignment result that aligned uniquely to the target region of the chip were retained, and PCR duplicate sequences were removed with SAMtools, for subsequent analysis.
  • the amount of data obtained by sequencing is shown in (Table 4).
  • peripheral blood samples of the parents and probands were sequenced to a depth of approximately 100×, and the embryonic cell WGA samples were sequenced to a depth of approximately 50×. SNP and indel analysis was then performed on each sample using the Genome Analysis Toolkit (GATK) software package to obtain the genotype of each sample. Genotypes for part of the gene region are shown in Table 5 and Table 6:
  • SNP information corresponds to the antisense strand of the reference genome. - Indicates that SNP is not available at this point (no data coverage or depth is too low), and italics indicate disease-causing mutations.
  • the 103237426 coordinates and the 103246707 coordinates in the table correspond to the V399V (c.1197A>T) and R243Q (c.728G>A) sites in the PAH database.
  • the antisense strand information of the two mutation sites has been changed to the formal representation of the corresponding sense strand.
  • Parental haplotypes can be constructed according to the SNP information of parents and probands according to the method shown in Figure 4 above, including the haplotypes in which the disease-causing mutations are located.
  • Tables 7 and 8 show the haplotype construction for the PAH and DMD genes, respectively.
  • F-Hap1 and F-Hap2 represent the father's two haplotypes, respectively.
  • M-Hap1 and M-Hap2 represent the mother's two haplotypes, respectively.
  • This SNP information corresponds to the negative strand of the reference genome.
  • - Indicates that no SNP is available at this point (no data coverage or depth too low), and italics indicate disease-causing mutations.
  • the 103237426 coordinates and 103246707 coordinates in the table correspond to the V399V (c.1197A>T) and R243Q (c.728G>A) sites in the PAH database.
  • the antisense strand information of the two mutation sites has been changed to the form representation of the corresponding sense strand.
  • the F-Hap in the table indicates the father's haplotype (a male has only one X chromosome); M-Hap1 and M-Hap2 indicate the mother's two haplotypes, respectively.
  • the italic is the pathogenic mutation.
  • the coordinates of 32456388 in the table correspond.
  • the distinguishing SNPs of each embryo were counted according to the method shown in Fig. 4, the embryo haplotypes were then determined according to the number of SNPs supporting each haplotype, and the haplotypes were used to determine whether the embryo is affected. For autosomes, an embryo has only 2 haplotypes, and generally only two haplotypes have SNP support; occasionally a third or fourth haplotype appears to be supported, which is due to SNP errors.
  • such erroneous SNPs account for less than 5% of the total SNPs. Furthermore, because of ADO and sequencing errors, individual SNPs may be lost or wrong in the embryonic SNP data.
  • a correct haplotype is supported by at least 10 distinguishing SNPs.
  • the large amount of data in this embodiment shows that wrong haplotypes are supported by no more than three distinguishing SNPs, whereas correct haplotypes are supported by more than 20 distinguishing SNPs, indicating that individual errors do not affect the embryo haplotype judgment. Therefore, in order to ensure accurate results, the present invention defines the number of SNPs supporting a correct haplotype to be no less than 10, and the number of SNPs supporting a wrong haplotype to be no more than three. The specific analysis process is shown in Figure 5.
  • Figure 5 shows the embryonic state analysis process for an autosomal recessive genetic disease in which each parent's Hap1 is the haplotype carrying the disease-causing mutation.
  • in individual embryos shown in the figure, some SNPs support a third haplotype, but the number of supporting SNPs is very small and does not affect the judgment of the results.
  • the embryo status can be judged from the above analysis results, as shown in Table 9. This result is consistent with the results of the traditional MF-PCR method, and the coincidence rate is 100%.
  • the above process is completed automatically by purpose-developed software.
  • the method, system and computer readable medium of the present invention for determining SNP information in a predetermined region of an (embryo) chromosome can be effectively used to determine SNP information in a predetermined region of a chromosome, such as SNP information in a predetermined region of an embryonic chromosome. The information is highly accurate and can be effectively used to determine whether the genetic status of the fetus is normal, carrier, or affected, which can provide a basis for pre-implantation monogenic disease detection, prenatal diagnosis of pregnant women or clinical disease treatment.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention provides a method, system, and computer-readable medium for determining SNP information in a predetermined chromosomal region. The method for determining SNP information in a predetermined chromosomal region comprises: constructing a sequencing library for at least one part of a chromosome; using a probe to screen said sequencing library; the probe specifically identifying at least one of the known SNP sites in said predetermined region so as to obtain a target capture fragment, said target capture fragment including the SNP site; sequencing the screened sequencing library so as to obtain sequencing results; and determining on the basis of the sequencing results the SNP information in said predetermined region.

Description

Method, system and computer-readable medium for determining SNP information in a predetermined region of a chromosome
Priority information
None
Technical field
The present invention relates to the field of biomedicine and, in particular, to a method, system and computer-readable medium for determining SNP information in a predetermined region of a chromosome.
Background
According to the World Health Organization's 2012 global report on the prevention and control of birth defects, the overall incidence of birth defects worldwide is 3%: 3.2 million children are born with birth defects each year, and 270,000 newborns die of them. Studies have shown that the great majority of birth defects are related to genetic factors, with chromosomal abnormalities and monogenic diseases being two major causes. Monogenic diseases are numerous, vary in incidence, and are for the most part incurable, placing a heavy economic and psychological burden on families and on society as a whole. Preventing monogenic disease in children and reducing the birth of affected children is therefore the focus of the prevention and control of hereditary birth defects. Preimplantation genetic diagnosis (PGD) can block the occurrence and transmission of genetic disease at its source, moving the prevention of birth defects forward to the embryonic stage. However, preimplantation diagnosis of monogenic diseases has not been widely applied, and to date only a few thousand cases have been reported worldwide. The main reasons are that the amount of sample is very small (only 1 to 2 cells), allele drop-out (ADO) and contamination occur readily, and detection is difficult, so existing detection techniques cannot fully meet the clinical need for preimplantation diagnosis of monogenic diseases.
Preimplantation haplotype analysis is currently the main method for preimplantation testing of monogenic diseases. It determines the haplotype linked to the mutation by detecting the mutation site together with multiple STRs (or SNPs) linked to it, which reduces the impact of unbalanced allele amplification, ADO and contamination. Multiplex fluorescent PCR (MF-PCR) is the technique most commonly used for this approach. Because it combines the high sensitivity of fluorescent PCR with haplotype analysis of the mutation site using multiple linked STRs, it was once regarded as the gold standard for preimplantation diagnosis of monogenic diseases. However, the method uses too few linked markers, and in individual clinical cases there may be no usable linked marker at all. A pilot experiment is therefore needed before each clinical test to find and select suitable molecular markers for the patient. In addition, the linked markers used in MF-PCR are usually relatively far from the disease-causing site, which carries a risk of misdiagnosis due to chromosomal recombination.
SNP-array analysis examines SNP sites across the whole genome, with high SNP density and a large number of sites. The advantage of this approach is that it is suitable for haplotype analysis of almost all samples, with no pilot experiment needed to select molecular markers for individual samples. In addition, the array can test for multiple diseases at the same time. However, the array can only detect disease indirectly through haplotype analysis and cannot directly detect the disease-causing site.
Thus, current methods for determining SNP information in a predetermined region of a chromosome, particularly an embryonic chromosome, still need improvement.
Summary of the invention
The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, it proposes a method capable of effectively determining SNP information in a predetermined region of a chromosome, particularly an embryonic chromosome.
In one aspect, the present invention provides a method of determining SNP information in a predetermined region of a chromosome. According to an embodiment of the invention, the method comprises: constructing a sequencing library for at least a portion of a chromosome; screening the sequencing library with a probe, wherein the probe specifically recognizes at least one of the known SNP sites in the predetermined region, so as to obtain target capture fragments containing the SNP sites; sequencing the screened sequencing library to obtain sequencing results; and determining the SNP information in the predetermined region based on the sequencing results. With this method, SNP information in a predetermined region of a chromosome, for example mutation-site information related to a disease-causing gene of the test sample, can be determined efficiently and accurately. This information can in turn be used to determine whether the genetic status of the subject is normal, carrier or affected, providing a basis for clinical disease testing or treatment.
In another aspect, the present invention provides a method of determining SNP information in a predetermined region of an embryonic chromosome. According to an embodiment of the invention, the method comprises: obtaining the whole genome of the embryo; and, for the whole genome of the embryo, determining the SNP information in the predetermined region of the embryonic chromosome according to the method described above for determining SNP information in a predetermined region of a chromosome. With this method, SNP information in a predetermined region of an embryonic chromosome can be determined effectively and accurately. This information can in turn be used to determine whether the genetic status of the embryo is normal, carrier or affected, providing a basis for preimplantation monogenic disease testing, prenatal diagnosis or clinical disease treatment.
In yet another aspect, the present invention provides an apparatus for determining SNP information in a predetermined region of a chromosome. According to an embodiment of the invention, the apparatus comprises: a library construction device adapted to construct a sequencing library for at least a portion of a chromosome; a library screening device connected to the library construction device and adapted to screen the sequencing library with a probe, wherein the probe specifically recognizes at least one of the known SNP sites in the predetermined region, so as to obtain target capture fragments containing the SNP sites; a sequencing device connected to the library screening device and adapted to sequence the screened sequencing library to obtain sequencing results; and an analysis device connected to the sequencing device and adapted to determine the SNP information in the predetermined region based on the sequencing results. This apparatus can effectively carry out the method described above, so that SNP information in a predetermined region of a chromosome, for example mutation-site information related to a disease-causing gene of the test sample, can be determined efficiently and accurately. This information can in turn be used to determine whether the genetic status of the subject is normal, carrier or affected, providing a basis for clinical disease testing or treatment.
In a further aspect, the present invention provides a system for determining SNP information in a predetermined region of an embryonic chromosome. According to an embodiment of the invention, the system comprises: a first whole-genome acquisition device adapted to obtain the whole genome of the embryo; and an SNP information determining device connected to the first whole-genome acquisition device and used to determine the SNP information in the predetermined region of the embryonic chromosome, wherein the SNP information determining device is the apparatus described above for determining SNP information in a predetermined region of a chromosome. This system can efficiently carry out the method described above and thereby effectively determine the SNP information in the predetermined region of the chromosome. This information can in turn be used to determine whether the genetic status of the fetus is normal, carrier or affected, providing a basis for preimplantation monogenic disease testing, prenatal diagnosis or clinical disease treatment.
In another aspect, the present invention provides a computer-readable medium. According to an embodiment of the invention, the computer-readable medium stores instructions adapted to be executed by a processor so as to determine SNP information in a predetermined region of a chromosome based on sequencing results, wherein the sequencing results are obtained by the following steps: constructing a sequencing library for at least a portion of a chromosome; screening the sequencing library with a probe, wherein the probe specifically recognizes at least one of the known SNP sites in the predetermined region, so as to obtain target capture fragments containing the SNP sites; and sequencing the screened sequencing library to obtain the sequencing results. With this computer-readable medium, SNP information in a predetermined region of a chromosome, for example mutation-site information related to a disease-causing gene of the test sample, can be determined effectively. This information can in turn be used to determine whether the genetic status of the subject is normal, carrier or affected, providing a basis for clinical disease testing or treatment. When the at least a portion of the chromosome is the whole genome of an embryo, the instructions stored on the computer-readable medium are adapted to be executed by a processor so as to determine, for the whole genome of the embryo, the SNP information in the predetermined region of the embryonic chromosome.
In yet another aspect, the present invention provides an apparatus for determining SNP information in a predetermined region of a chromosome. According to an embodiment of the invention, the apparatus comprises: a sequencing device; and the computer-readable medium described above, which stores instructions adapted to be executed by a processor so as to determine SNP information in a predetermined region of a chromosome based on sequencing results. This apparatus can accurately and effectively determine SNP information in a predetermined region of a chromosome, for example mutation-site information related to a disease-causing gene of the test sample. This information can in turn be used to determine whether the genetic status of the subject is normal, carrier or affected, providing a basis for clinical disease testing or treatment.
In a further aspect, the present invention provides a system for determining SNP information in a predetermined region of an embryonic chromosome. According to an embodiment of the invention, the system comprises: a sequencing device; and the computer-readable medium described above, which stores instructions adapted to be executed by a processor so as to determine, for the whole genome of an embryo, SNP information in a predetermined region of the fetal chromosome. This system can accurately and effectively determine SNP information in a predetermined region of an embryonic chromosome. This information can in turn be used to determine whether the genetic status of the embryo is normal, carrier or affected, providing a basis for preimplantation monogenic disease testing, prenatal diagnosis or clinical disease treatment.
It should be noted that the means provided by the present invention for determining SNP information in a predetermined region of a chromosome, which are based on high-throughput target-region capture sequencing, have at least the following advantages over the prior art:
1. The present invention can detect the target site not only indirectly, through haplotype analysis, but also directly.
2. The SNP sites selected by the present invention are concentrated within 1 Mb of the target gene. They are dense and tightly linked, which greatly improves the sensitivity and accuracy of SNP detection in the target region while reducing the cost of testing.
3. The present invention places multiple target detection sites on a single chip, so that multiple mutations of multiple diseases can be detected simultaneously from the SNP information obtained. There is no need to design a separate experimental scheme for each patient, which shortens the testing cycle and reduces the cost of testing.
4. Using a chip that contains multiple target detection sites, the present invention can test multiple samples at the same time, which greatly increases throughput. This provides strong technical support for the future large-scale application of PGD.
5. In addition to testing for monogenic diseases, the method of the present invention can perform HLA typing and aneuploidy detection at the same time. Multiple tests are thus carried out on a single sample, allowing a personalized service for the relevant IVF patients.
Additional aspects and advantages of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Figure 1 shows a flow chart of embryo haplotype analysis according to one embodiment of the present invention;
Figure 2 shows a schematic diagram of a method of determining distinguishing SNPs according to one embodiment of the present invention;
Figure 3 shows the 2100 detection results of a library constructed according to one embodiment of the present invention;
Figure 4 shows a simulated diagram of haplotype construction according to one embodiment of the present invention;
Figure 5 shows a schematic flow chart of the analysis of embryo haplotype and embryo genetic status according to one embodiment of the present invention;
Figure 6 shows a schematic flow chart of a method of determining SNP information in a predetermined region of a chromosome according to one embodiment of the present invention;
Figure 7 shows a schematic flow chart of a method of determining SNP information in a predetermined region of an embryonic chromosome according to one embodiment of the present invention;
Figure 8 shows a schematic structural diagram of an apparatus for determining SNP information in a predetermined region of a chromosome according to one embodiment of the present invention; and
Figure 9 shows a schematic structural diagram of a system for determining SNP information in a predetermined region of an embryonic chromosome according to one embodiment of the present invention.
Detailed description of the invention
Embodiments of the present invention are described in detail below. Examples of the embodiments are illustrated in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly specifying the number of technical features indicated. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. Further, in the description of the present invention, "multiple" means two or more unless otherwise stated.
Method
In one aspect, the present invention provides a method of determining SNP information in a predetermined region of a chromosome. According to an embodiment of the invention, and with reference to Figure 6, the method comprises:
Constructing a sequencing library for at least a portion of a chromosome
According to an embodiment of the invention, the at least a portion of the chromosome is the whole genome of embryonic cells obtained by whole-genome amplification. The way in which whole-genome amplification is carried out is not particularly limited; in some specific examples of the invention it is performed by at least one method selected from PEP-PCR, DOP-PCR, OmniPlex WGA and MDA. A small number of embryonic cells can thus be amplified effectively to yield a larger whole-genome sample.
Screening the sequencing library with a probe to obtain target capture fragments
According to an embodiment of the invention, the probe specifically recognizes at least one of the known SNP sites in the predetermined region, so as to obtain target capture fragments containing the SNP sites. The predetermined region comprises a target gene region and an SNP-marker region. The target gene region comprises at least part of the exons, and of the regions adjacent to the exons, of a gene associated with the target disease, where the region adjacent to an exon comprises the 50 bp upstream of the 5' end of the exon and the 50 bp downstream of the exon. The SNP-marker region comprises the range within 1 Mb upstream and downstream of the target gene. The influence of genetic recombination on the screening is thereby effectively reduced; the probability of recombination between the target gene region and the SNP-marker region can even be lowered to one in ten thousand, which ensures the accuracy of subsequent detection.
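The region definitions above can be turned into genomic intervals in a few lines. The following is a minimal sketch only, not the applicant's implementation: the coordinate convention (1-based, inclusive), the function names and the reading of "1M" as 1,000,000 bp are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed convention: 1-based, inclusive (start, end)

EXON_FLANK_BP = 50            # flank on each side of an exon, as stated in the text
MARKER_FLANK_BP = 1_000_000   # "1M" around the target gene, read here as 1 Mb (assumption)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def target_gene_region(exons: List[Interval], flank: int = EXON_FLANK_BP) -> List[Interval]:
    """Pad each exon by `flank` bp on both sides (clipped at position 1) and merge."""
    padded = [(max(1, start - flank), end + flank) for start, end in exons]
    return merge_intervals(padded)


def snp_marker_region(gene_start: int, gene_end: int, flank: int = MARKER_FLANK_BP) -> Interval:
    """Window extending `flank` bp upstream and downstream of the target gene."""
    return (max(1, gene_start - flank), gene_end + flank)


def snps_in_region(snp_positions: List[int], region: Interval) -> List[int]:
    """Keep the known SNP positions that fall inside the region."""
    start, end = region
    return [pos for pos in snp_positions if start <= pos <= end]
```

Known SNP positions selected with `snps_in_region` would then be the candidates for probe design; how probes are actually laid out on the chip is not specified in the text and is not modeled here.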
According to an embodiment of the invention, the probe is 20 to 200 nt long, preferably 60 to 80 nt, which effectively improves the capture efficiency for the target SNPs. According to one embodiment, the probes are provided in the form of a chip. Because a chip can contain multiple target detection sites, multiple mutations of multiple diseases can be detected simultaneously without designing a separate experimental scheme for each patient, which shortens the testing cycle and reduces the cost of testing. A chip can also test multiple samples at the same time, which greatly increases throughput.
Sequencing the screened sequencing library to obtain sequencing results
According to an embodiment of the invention, the sequencing is performed using at least one selected from the Illumina HiSeq 2000, Genome Analyzer and MiSeq sequencing series, the SOLiD sequencing system of Life Technologies, the Ion Torrent sequencing system and the Roche 454 sequencing system. The efficiency and throughput of sequencing are thereby effectively improved.
Determining the SNP information in the predetermined region based on the sequencing results
According to an embodiment of the invention, determining the SNP information in the predetermined region based on the sequencing results further comprises: aligning the sequencing results to a reference sequence to obtain uniquely aligned sequences; and using SNP analysis software to obtain the SNP information in the predetermined region from the uniquely aligned sequences. According to an embodiment of the invention, the alignment is performed with the BWA software package, which allows fast and accurate alignment. According to an embodiment of the invention, after the uniquely aligned sequences are obtained, the method further comprises removing PCR-duplicate sequences from them, which benefits the subsequent SNP analysis. The kind of SNP analysis software that can be used is not particularly limited; in some embodiments it is at least one selected from SAMtools and GATK, which allows fast and accurate SNP analysis.
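The two clean-up steps between alignment and SNP calling (keeping uniquely aligned reads, then removing PCR duplicates) can be illustrated on simplified read records. This is a hedged sketch, not the behavior of BWA, SAMtools or GATK: the `AlignedRead` record, the use of a mapping-quality threshold as a stand-in for "uniquely aligned", and the duplicate key (chromosome, position, strand) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical, simplified view of one aligned read."""
    name: str
    chrom: str
    pos: int            # leftmost aligned position
    strand: str         # "+" or "-"
    mapq: int           # mapping quality reported by the aligner
    mean_baseq: float   # mean base quality of the read


def unique_alignments(reads: List[AlignedRead], min_mapq: int = 1) -> List[AlignedRead]:
    """Keep reads whose mapping quality suggests a single best placement.

    Assumption: a mapping quality of 0 marks an ambiguous alignment.
    """
    return [r for r in reads if r.mapq >= min_mapq]


def remove_pcr_duplicates(reads: List[AlignedRead]) -> List[AlignedRead]:
    """Among reads with the same chromosome, start and strand, keep the best one."""
    best: Dict[Tuple[str, int, str], AlignedRead] = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in best or read.mean_baseq > best[key].mean_baseq:
            best[key] = read
    return sorted(best.values(), key=lambda r: (r.chrom, r.pos, r.strand))
```

In practice these steps are carried out by the alignment and post-processing tools themselves on SAM/BAM files; the sketch only makes the selection logic explicit.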
According to an embodiment of the invention, the method further comprises filtering the SNP information obtained. In some embodiments, the filter removes any SNP that satisfies one of the following conditions: the sequencing depth of the SNP is below 10×, preferably below 20×; or, for a heterozygous SNP, the difference in sequencing depth between the two bases is greater than 20%, preferably greater than 10%, more preferably greater than 5%. The filtered SNP information is thus accurate and reliable. It should be noted that, in theory, the higher the sequencing depth, the closer the depth ratio of the two bases of a heterozygous SNP is to 1:1. The specific depth and depth-difference thresholds depend on the sample, the sequencing depth and the sequencing quality of a given run, and can be adjusted as needed. In one embodiment of the invention, the individuals genetically related to the embryo are sequenced to a depth of 50× and the embryo sample to a depth of 100×, with good sequencing quality. To ensure that only accurately sequenced, genuine SNPs remain, strict filtering is applied: SNPs with a depth below 10× are removed, as are heterozygous SNPs whose depth difference exceeds 10%, which removes a large number of heterozygous SNPs. With deeper sequencing (> 100×), if strict filtering is still wanted to keep the remaining SNPs accurate, SNPs below, for example, 20× and heterozygous SNPs with a difference above, for example, 5% can be removed. Conversely, for relatively low-depth sequencing data, the filter can be set to remove only heterozygous SNPs with a difference above 20%.
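The two filtering conditions can be written as a small predicate. The sketch below is illustrative only. The text does not define how the "difference in sequencing depth" of the two bases is computed, so the formula |d1 - d2| / (d1 + d2) is an assumption, as are the `SnpCall` record and the function names; the default thresholds (10× and 10%) follow the embodiment described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical record for one called SNP."""
    chrom: str
    pos: int
    genotype: str                   # two bases, e.g. "AA" or "AG"
    depth: int                      # total sequencing depth at the site
    allele_depths: Tuple[int, int]  # depth supporting each base of the genotype


def allele_imbalance(d1: int, d2: int) -> float:
    """Relative depth difference of the two bases (assumed definition)."""
    total = d1 + d2
    return abs(d1 - d2) / total if total else 1.0


def passes_filter(call: SnpCall, min_depth: int = 10, max_imbalance: float = 0.10) -> bool:
    """Return False for SNPs that the filter described in the text would remove."""
    if call.depth < min_depth:
        return False
    is_heterozygous = call.genotype[0] != call.genotype[1]
    if is_heterozygous and allele_imbalance(*call.allele_depths) > max_imbalance:
        return False
    return True


def filter_snps(calls: List[SnpCall], min_depth: int = 10,
                max_imbalance: float = 0.10) -> List[SnpCall]:
    return [c for c in calls if passes_filter(c, min_depth, max_imbalance)]
```

For deeper data the stricter setting mentioned in the text would correspond to `filter_snps(calls, min_depth=20, max_imbalance=0.05)`, and for low-depth data to `max_imbalance=0.20`.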
The inventors have found that, with the method of the present invention for determining SNP information in a predetermined region of a chromosome, SNP information in the predetermined region, for example mutation-site information related to a disease-causing gene of the test sample, can be determined efficiently and accurately. This information can in turn be used to determine whether the genetic status of the subject is normal, carrier or affected, providing a basis for clinical disease testing or treatment.
In another aspect, the present invention provides a method of determining SNP information in a predetermined region of an embryonic chromosome. According to an embodiment of the invention, the method comprises: obtaining the whole genome of the embryo; and, for the whole genome of the embryo, determining the SNP information in the predetermined region of the embryonic chromosome according to the method described above for determining SNP information in a predetermined region of a chromosome.
根据本发明的另一些实施例, 参照图 7, 本发明的确定胚胎染色体预定区域中 SNP信 息的方法具体包括以下步骤: 获取所述胚胎的全基因组; 针对所述胚胎的全基因组, 构建 测序文库; 利用探针对所述测序文库进行筛选, 以便获得目标捕获片段; 对经过筛选的测 序文库进行测序, 以便获得测序结果; 基于所述测序结果, 确定所述胚胎染色体预定区域 中的 SNP信息。 利用本发明的确定胚胎染色体预定区域中 SNP信息的方法, 能够有效、 准 确地确定胚胎染色体预定区域中 SNP信息, 进而, 该信息能够有效地用于确定胎儿的遗传 状态是正常、 携带或致病, 从而能够为胚胎植入前单基因病检测、 孕妇产前诊断或临床疾 病治疗提供依据。  According to still another embodiment of the present invention, referring to FIG. 7, the method for determining SNP information in a predetermined region of an embryonic chromosome of the present invention specifically includes the following steps: acquiring a whole genome of the embryo; constructing a sequencing library for the whole genome of the embryo The sequencing library is screened by a probe to obtain a target capture fragment; the screened sequencing library is sequenced to obtain a sequencing result; based on the sequencing result, SNP information in a predetermined region of the embryo chromosome is determined. The method for determining SNP information in a predetermined region of an embryonic chromosome of the present invention can effectively and accurately determine SNP information in a predetermined region of an embryonic chromosome, and further, the information can be effectively used to determine whether the genetic state of the fetus is normal, carried or pathogenic Therefore, it can provide a basis for pre-implantation monogenic disease detection, prenatal diagnosis of pregnant women or clinical disease treatment.
According to an embodiment of the present invention, the whole genome of the embryo is obtained by whole genome amplification of embryonic cells. The specific whole genome amplification method is not particularly limited; according to some specific examples of the present invention, whole genome amplification is performed by at least one selected from PEP-PCR, DOP-PCR, OmniPlex WGA, and MDA. A small number of embryonic cells can thereby be amplified effectively to yield a larger amount of embryonic whole-genome sample.
According to an embodiment of the present invention, the method for determining SNP information in a predetermined region of an embryonic chromosome further comprises:
First, the whole genomes of individuals genetically related to the embryo are obtained, wherein these individuals include the father, the mother, and the proband of the embryo. It should be noted that the term "proband" as used herein refers to a patient who has been diagnosed as having inherited the disease-causing gene and who exhibits symptoms of the disease, and who is an organism genetically related to the embryo; the proband may be an embryo or fetus, or an individual after birth.
Next, based on the whole genomes of the genetically related individuals, the SNP information of the father, of the mother, and of the proband is determined respectively.
Then, distinguishing SNPs are determined based on the father's SNP information and the mother's SNP information. It should be noted that the term "distinguishing SNP" as used herein refers to a base that can effectively distinguish the parental haplotypes: at a given (autosomal) position, one of the four parental bases differs from all the other bases at that position, so that this base identifies exactly one of the four parental haplotypes. For example, if the parental genotypes at a position are AA and AG, then G is the distinguishing SNP, because at that position G identifies a unique haplotype, whereas A is present in the other three haplotypes and cannot identify a unique one. FIG. 2 is a schematic diagram of the method for determining parental distinguishing SNP sites according to the principles of Mendelian inheritance.
Next, the father's SNP haplotypes and the mother's SNP haplotypes are determined based on the distinguishing SNPs and the proband's SNP information. That is, based on the distinguishing SNPs and the proband's SNPs, a first paternal haplotype, a second paternal haplotype, a first maternal haplotype, and a second maternal haplotype are constructed for the two chromosomes in the father's genome and the two chromosomes in the mother's genome that correspond to the predetermined region, for use in the subsequent determination of the embryo's haplotypes. The father's SNP haplotypes comprise the first and second paternal haplotypes, the mother's SNP haplotypes comprise the first and second maternal haplotypes, and all four are composed of distinguishing SNPs. According to an embodiment of the present invention, the parental SNP haplotypes can be constructed from the parental distinguishing SNP sites and the proband's SNP information according to Mendelian inheritance and the law of linkage and crossing-over; the construction principle is shown in FIG. 4. Each SNP haplotype consists entirely of bases at distinguishing SNP positions and contains many distinguishing SNPs, which distinguish it from the other haplotypes. For example, if the parental genotypes at a position are AA and AG, G is the distinguishing SNP and A is a non-distinguishing SNP, and A and G are the bases at that position in the respective haplotypes. Because the proband's two haplotypes are inherited one from each parent, the haplotype carrying the disease-causing mutation can be identified from the disease status. For a dominant disorder in which the father is affected and the mother is normal, the haplotype the proband inherited from the father is the one carrying the causative mutation; for a recessive disorder in which both parents are carriers, both haplotypes of the (affected) proband carry the causative mutation. Thus, based on the distinguishing SNPs and the proband's SNP information, the father's and mother's SNP haplotypes can be determined effectively, and the embryo's SNP haplotypes can then be determined from the embryo's SNP information together with the paternal and maternal SNP haplotypes.
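The two steps above (finding distinguishing SNPs, then using the proband to assign bases to parental haplotypes) can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes genotypes are given as two-letter strings, accepts only the three-versus-one pattern from the AA/AG example, ignores crossing-over and genotyping errors, and uses the hypothetical labels F1/M1 for the haplotypes transmitted to the proband and F2/M2 for the untransmitted ones.

```python
from collections import Counter

def distinguishing_allele(father_gt, mother_gt):
    """Return (base, parent) when exactly one of the four parental bases is unique.

    Assumption: only the 3-versus-1 pattern from the text (e.g. AA x AG) is
    accepted; every other pattern returns None.
    """
    counts = Counter(father_gt + mother_gt)
    if len(counts) != 2 or min(counts.values()) != 1:
        return None
    base = min(counts, key=counts.get)          # the base seen exactly once
    parent = "father" if base in father_gt else "mother"
    return base, parent

def phase_parents(sites):
    """Assign bases at distinguishing sites to parental haplotypes.

    sites: iterable of (position, father_gt, mother_gt, proband_gt).
    F1/M1 are the haplotypes the proband inherited (hypothetical labels).
    Only the heterozygous parent's haplotypes receive an entry at each site.
    """
    haps = {"F1": {}, "F2": {}, "M1": {}, "M2": {}}
    for pos, father_gt, mother_gt, proband_gt in sites:
        hit = distinguishing_allele(father_gt, mother_gt)
        if hit is None:
            continue                            # not a distinguishing site
        base, parent = hit
        het_gt = father_gt if parent == "father" else mother_gt
        other = het_gt.replace(base, "", 1)     # base on the partner haplotype
        sent, kept = ("F1", "F2") if parent == "father" else ("M1", "M2")
        if base in proband_gt:                  # proband carries the unique base
            haps[sent][pos], haps[kept][pos] = base, other
        else:
            haps[sent][pos], haps[kept][pos] = other, base
    return haps
```

For example, with the father CT, the mother CC, and the proband CC at one position, T is the distinguishing base and is placed on the paternal haplotype that the proband did not inherit.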
Then, based on the embryo's SNP information and the paternal and maternal SNP haplotypes, the combination of paternal and maternal SNP haplotypes is determined so as to obtain the embryo's SNP haplotypes. That is, based on the embryo's SNP information and the first paternal, second paternal, first maternal, and second maternal haplotypes described above, the SNP types in the predetermined region of the fetal chromosome are determined, and from them the embryo's SNP haplotypes. According to an embodiment of the present invention, the embryo's SNP haplotypes are obtained by: determining the paternal haplotype significantly supported by the embryo's SNP information as the embryo's paternally derived haplotype; and determining the maternal haplotype significantly supported by the embryo's SNP information as the embryo's maternally derived haplotype. According to an embodiment of the present invention, a count of no fewer than 10 distinguishing SNPs indicates significant support. Specifically, because the embryo's two haplotypes are inherited one from each parent, the embryo's SNP information can be analyzed together with the parental SNP haplotypes to decide which two haplotypes the embryo's SNPs combine; the analysis principle is shown in FIG. 4. The analysis can count distinguishing SNPs and determine the embryo's haplotypes from the counts; the detailed procedure is shown in FIG. 5. According to an embodiment of the present invention, if the distinguishing SNP count for a haplotype is greater than 10, that haplotype can be determined to be one of the embryo's haplotypes; if the count is less than 4, the support can be attributed to SNP errors. According to some specific examples of the present invention, to ensure accuracy, the SNP support count for a correct haplotype is set at no fewer than 10 and that for an incorrect haplotype at no more than 3. This is because the SNP filtering conditions set earlier are strict, so the SNPs used in haplotype construction are highly accurate, and the number of candidate SNPs is large; actual test data show that the SNP support count for a correct haplotype is far above 10, while that for an incorrect haplotype is generally 0. According to some embodiments of the present invention, it has been verified that for an autosomal disease the method yields only two qualifying haplotypes per embryo, and that for an X-linked disease it yields one (male fetus) or two (female fetus) qualifying haplotypes.
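The counting rule described above can be sketched in Python as follows. This is a hedged illustration rather than the procedure of FIG. 5. It assumes each parental haplotype is represented by a dictionary mapping positions to the distinguishing base that haplotype carries, and that the embryo genotype at each position is a two-letter string. The names `count_support` and `call_haplotypes` and the "ambiguous" label for counts between 4 and 9 are assumptions of this sketch, since the text does not say how such counts are handled.

```python
SUPPORT_MIN = 10   # "no fewer than 10" distinguishing SNPs: significant support
ERROR_MAX = 3      # "no more than 3": attributed to SNP errors

def count_support(embryo, markers):
    """Count, per parental haplotype, the distinguishing bases seen in the embryo.

    embryo:  {position: genotype string such as "AG"}
    markers: {haplotype name: {position: distinguishing base on that haplotype}}
    """
    return {
        name: sum(1 for pos, base in sites.items() if base in embryo.get(pos, ""))
        for name, sites in markers.items()
    }

def call_haplotypes(support):
    """Classify each haplotype from its support count."""
    calls = {}
    for name, n in support.items():
        if n >= SUPPORT_MIN:
            calls[name] = "inherited"
        elif n <= ERROR_MAX:
            calls[name] = "not inherited"
        else:
            calls[name] = "ambiguous"   # assumption: range not covered by the text
    return calls
```

With these thresholds, a haplotype supported by 12 distinguishing SNPs is called as inherited, while a haplotype supported by a single SNP is treated as a sequencing or calling error.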
Thus, the embryo's SNP haplotypes can be determined accurately and effectively, and the embryo's genetic status can in turn be determined. That is, using the parental haplotypes constructed by simulation, the method can determine whether the embryo has inherited a disease-causing parental haplotype, and hence whether the embryo's genetic status is normal, carrier, or affected.
Apparatus and System
In yet another aspect, the present invention also provides an apparatus for determining SNP information in a predetermined chromosomal region. According to an embodiment of the present invention, referring to FIG. 8, the apparatus 1000 comprises a library construction device 100, a library screening device 200, a sequencing device 300, and an analysis device 400. According to an embodiment of the present invention, the library construction device 100 is adapted to construct a sequencing library from at least a portion of a chromosome. The library screening device 200 is connected to the library construction device 100 and is adapted to screen the sequencing library with probes, wherein the probes specifically recognize at least one known SNP site in the predetermined region, so as to obtain target capture fragments containing the SNP site. The sequencing device 300 is connected to the library screening device 200 and is adapted to sequence the screened sequencing library to obtain sequencing results. The analysis device 400 is connected to the sequencing device 300 and is adapted to determine the SNP information in the predetermined region based on the sequencing results. With the apparatus 1000, the method described above for determining SNP information in a predetermined chromosomal region can be carried out effectively, so that SNP information in the predetermined region, for example mutation-site information related to a disease-causing gene in the test sample, can be determined efficiently and accurately. This information can in turn be used to determine whether the subject's genetic status is normal, carrier, or affected, thereby providing a basis for clinical disease detection or treatment.
According to an embodiment of the present invention, the predetermined region comprises a target gene region and an SNP-marker region. According to an embodiment of the present invention, the target gene region comprises at least a portion of the exons and exon-adjacent regions of a gene associated with the target disease. According to an embodiment of the present invention, the exon-adjacent regions comprise the 50 bp region upstream of the 5' end of the exon and the 50 bp region downstream of the exon; the SNP-marker region comprises the range within 1 Mb upstream and downstream of the target gene.
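As a small worked illustration of the region definition above, the following Python sketch computes capture intervals from exon coordinates. It is only a sketch under stated assumptions: coordinates are 0-based, half-open positions on one chromosome; "1M" is read as 1,000,000 bp; and the function names `merge` and `capture_regions` are hypothetical. Because the flank is applied symmetrically, the result does not depend on the strand of the gene.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def capture_regions(exons, gene_start, gene_end, flank=50, marker_span=1_000_000):
    """Return (target gene intervals, SNP-marker interval).

    exons: list of (start, end) exon coordinates.
    flank: bases added on each side of every exon (50 bp in the text).
    marker_span: distance covered on each side of the gene (assumed 1,000,000 bp).
    """
    targets = merge([(max(0, s - flank), e + flank) for s, e in exons])
    marker = (max(0, gene_start - marker_span), gene_end + marker_span)
    return targets, marker
```

Exons closer together than twice the flank collapse into a single target interval, which avoids designing duplicate probes for the shared flank.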
According to an embodiment of the present invention, the probes are 20 to 200 nt in length, preferably 60 to 80 nt. According to one embodiment of the present invention, the probes are provided in the form of a chip.
According to an embodiment of the present invention, the apparatus further comprises a chromosome preparation device (not shown), which is connected to the library construction device 100 and is adapted to obtain the whole genome of embryonic cells by whole genome amplification, the whole genome of the embryonic cells constituting the at least a portion of the chromosome. According to an embodiment of the present invention, the chromosome preparation device is adapted to perform the whole genome amplification by at least one selected from PEP-PCR, DOP-PCR, OmniPlex WGA, and MDA.
According to an embodiment of the present invention, the apparatus further comprises a DNA extraction device (not shown), which is connected to the library construction device 100 and is adapted to extract DNA from the peripheral blood of an organism so as to obtain the at least a portion of the chromosome.
According to an embodiment of the present invention, the sequencing device 300 is at least one selected from the Illumina HiSeq 2000, Genome Analyzer, and MiSeq sequencing series, the SOLiD sequencing system from Life Technologies, the Ion Torrent sequencing system, and the Roche 454 sequencing system.
According to an embodiment of the present invention, the analysis device 400 further comprises: an alignment unit adapted to align the sequencing results to a reference sequence so as to obtain uniquely aligned sequences; and an SNP information acquisition unit connected to the alignment unit and adapted to obtain the SNP information in the predetermined region from the uniquely aligned sequences using SNP analysis software. According to an embodiment of the present invention, the alignment unit is adapted to perform the alignment using the BWA software package. According to an embodiment of the present invention, the analysis device further comprises a unit adapted to remove PCR duplicate sequences from the uniquely aligned sequences. According to an embodiment of the present invention, the SNP analysis software is at least one selected from SAMtools and GATK.
According to an embodiment of the present invention, the analysis device 400 further comprises a unit adapted to filter the obtained SNP information. According to an embodiment of the present invention, the filtering removes SNPs that meet either of the following conditions: the SNP sequencing depth is below 10×, preferably below 20×; or, for a heterozygous SNP, the difference in sequencing depth between the two bases is above 20%, preferably above 10%, more preferably above 5%.
It should be noted that each device of the apparatus can carry out the corresponding step of the method of the present invention for determining SNP information in a predetermined chromosomal region. The advantages and effects described above for that method apply equally to the apparatus and are not repeated here.
In a further aspect, the present invention also provides a system for determining SNP information in a predetermined region of an embryonic chromosome. According to an embodiment of the present invention, referring to FIG. 9, the system 10000 comprises a first whole genome acquisition apparatus 2000 and an SNP information determination apparatus 1000. The first whole genome acquisition apparatus 2000 is adapted to obtain the whole genome of the embryo. The SNP information determination apparatus 1000 is connected to the first whole genome acquisition apparatus and is used to determine the SNP information in the predetermined region of the fetal chromosome, wherein the SNP information determination apparatus 1000 is the apparatus 1000 described above for determining SNP information in a predetermined chromosomal region. With the system 10000, the method described above for determining SNP information in a predetermined chromosomal region can be carried out efficiently, so that SNP information in the predetermined region of an embryonic chromosome can be determined effectively and accurately. This information can in turn be used to determine whether the genetic status of the fetus is normal, carrier, or affected, thereby providing a basis for preimplantation testing for monogenic disease, prenatal diagnosis, or clinical treatment.
According to an embodiment of the present invention, the first whole genome acquisition apparatus 2000 is adapted to obtain the whole genome of the embryo by whole genome amplification of embryonic cells. According to an embodiment of the present invention, the first whole genome acquisition apparatus 2000 is adapted to obtain the whole genome of the embryo using at least one selected from PEP-PCR, DOP-PCR, OmniPlex WGA, and MDA.
According to an embodiment of the present invention, the system 10000 further comprises: a second whole genome acquisition apparatus (not shown) adapted to obtain the whole genomes of individuals genetically related to the embryo, wherein these individuals include the father, the mother, and the proband of the embryo; a distinguishing SNP determination apparatus (not shown) adapted to determine distinguishing SNPs based on the father's SNP information and the mother's SNP information; a first haplotype determination apparatus (not shown) adapted to determine the paternal SNP haplotypes and the maternal SNP haplotypes based on the distinguishing SNPs and the proband's SNP information; and a second haplotype determination apparatus (not shown) adapted to determine, based on the embryo's SNP information and the paternal and maternal SNP haplotypes, the combination of paternal and maternal SNP haplotypes so as to obtain the embryo's SNP haplotypes.
According to an embodiment of the present invention, the second haplotype determination apparatus further comprises: a unit for determining the paternal haplotype significantly supported by the embryo's SNP information as the embryo's paternally derived haplotype; and a unit for determining the maternal haplotype significantly supported by the embryo's SNP information as the embryo's maternally derived haplotype. According to an embodiment of the present invention, a count of no fewer than 10 distinguishing SNPs indicates significant support.
It should be noted that each apparatus included in the system can carry out the corresponding step of the method of the present invention for determining SNP information in a predetermined chromosomal region. The advantages and effects described above for the method for determining SNP information in a predetermined region of an embryonic chromosome apply equally to the system and are not repeated here.
Computer-Readable Medium
In another aspect, the present invention also provides a computer-readable medium. According to an embodiment of the present invention, the computer-readable medium stores instructions adapted to be executed by a processor so as to determine SNP information in a predetermined chromosomal region based on sequencing results. It will be understood that, when the program is executed, all or part of the steps of the method for determining SNP information in a predetermined region of a chromosome, including an embryonic chromosome, can be completed by instructing the relevant hardware. The computer-readable medium may include read-only memory, random-access memory, a magnetic disk, an optical disc, and the like. The sequencing results are obtained by: constructing a sequencing library from at least a portion of a chromosome; screening the sequencing library with probes, wherein the probes specifically recognize at least one known SNP site in the predetermined region, so as to obtain target capture fragments containing the SNP site; and sequencing the screened sequencing library to obtain the sequencing results.
According to an embodiment of the present invention, the predetermined region comprises a target gene region and an SNP-marker region. According to an embodiment of the present invention, the target gene region comprises at least a portion of the exons and exon-adjacent regions of a gene associated with the target disease. According to an embodiment of the present invention, the exon-adjacent regions comprise the range within 50 bp upstream and downstream of the exon; the SNP-marker region comprises the range within 1 Mb upstream and downstream of the target gene.
According to an embodiment of the present invention, the probes are 20 to 200 nt in length, preferably 60 to 80 nt. According to one embodiment of the present invention, the probes are provided in the form of a chip.
According to an embodiment of the present invention, the at least a portion of the chromosome is the whole genome of embryonic cells obtained by whole genome amplification. According to an embodiment of the present invention, whole genome amplification is performed by at least one of PEP-PCR, DOP-PCR, OmniPlex WGA, and MDA.
According to an embodiment of the present invention, the at least a portion of the chromosome is obtained by extracting DNA from the peripheral blood of an organism.
According to an embodiment of the present invention, the sequencing is performed using the Illumina HiSeq 2000, Genome Analyzer, or MiSeq sequencing series, the SOLiD sequencing system from Life Technologies, the Ion Torrent sequencing system, or the Roche 454 sequencing system.
According to an embodiment of the present invention, determining the SNP information in the predetermined region based on the sequencing results further comprises: aligning the sequencing results to a reference sequence so as to obtain uniquely aligned sequences; and obtaining the SNP information in the predetermined region from the uniquely aligned sequences using SNP analysis software. According to an embodiment of the present invention, the alignment is performed using the BWA software package. According to an embodiment of the present invention, after the uniquely aligned sequences are obtained, PCR duplicate sequences are further removed from them. According to an embodiment of the present invention, the SNP analysis software is at least one selected from SAMtools and GATK. According to an embodiment of the present invention, the obtained SNP information is further filtered. According to an embodiment of the present invention, the filtering removes SNPs that meet either of the following conditions: the SNP sequencing depth is below 10×, preferably below 20×; or, for a heterozygous SNP, the difference in sequencing depth between the two bases is above 20%, preferably above 10%, more preferably above 5%. It should be noted that, in theory, the higher the sequencing depth, the closer the ratio of the sequencing depths of the two bases at a heterozygous SNP is to 1:1. The specific values chosen for the depth threshold and the depth-difference threshold depend on the samples, the sequencing depth, and the sequencing quality at the time of implementation, and can be adjusted as needed. In one embodiment of the present invention, the individuals genetically related to the embryo are sequenced to a depth of 50×, the embryo sample is sequenced to a depth of 100×, and the sequencing quality is good. So that only accurately sequenced, genuine SNPs remain, strict filtering is applied: SNPs with a depth below 10× are removed, and heterozygous SNPs with a depth difference above 10% are also removed, which eliminates a large number of heterozygous SNPs. It will be understood that with deeper sequencing (>100×), if strict filtering is still desired to ensure that the remaining SNPs are genuine, SNPs with a depth below, for example, 20× and heterozygous SNPs with a difference above, for example, 5% can be removed; conversely, for relatively low-depth sequencing data, the filter can be set to remove heterozygous SNPs with a difference above 20%.
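The filtering conditions above can be expressed as a short Python sketch with adjustable thresholds. It is an illustration under stated assumptions, not the filter used in the embodiments. The text does not define how the "difference in sequencing depth" is computed, so this sketch assumes the relative difference |a - b| / (a + b) between the read counts of the two bases; the `SnpCall` record and the `keep` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class SnpCall:
    pos: int
    depth_a: int          # reads supporting the first base
    depth_b: int          # reads supporting the second base (0 if homozygous)
    heterozygous: bool    # zygosity as reported by the SNP caller

def keep(call, min_depth=10, max_het_diff=0.10):
    """Return True if the SNP passes the depth and allele-balance filters.

    Defaults follow the 10x / 10% example in the text; for deeper data the
    text suggests e.g. min_depth=20 and max_het_diff=0.05.
    """
    total = call.depth_a + call.depth_b
    if total < min_depth:
        return False                       # too few reads at this site
    if call.heterozygous:
        # Assumed definition of "depth difference": relative imbalance.
        diff = abs(call.depth_a - call.depth_b) / total
        if diff > max_het_diff:
            return False                   # allele depths too far from 1:1
    return True

calls = [
    SnpCall(101, 30, 28, True),    # balanced heterozygote
    SnpCall(202, 40, 20, True),    # imbalanced heterozygote
    SnpCall(303, 5, 3, True),      # low depth
    SnpCall(404, 50, 0, False),    # homozygote, balance check not applied
]
passed = [c.pos for c in calls if keep(c)]
```

Passing `min_depth` and `max_het_diff` as parameters reflects the statement that the thresholds should be tuned to the sample, sequencing depth, and sequencing quality.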
According to an embodiment of the present invention, the at least a portion of the chromosome is the whole genome of an embryo, so that the SNP information in the predetermined region of the fetal chromosome is determined for the whole genome of the fetus.
Accordingly, in an embodiment of the present invention, the instructions are further adapted to be executed by a processor so as to: obtain the whole genomes of individuals genetically related to the embryo, wherein these individuals include the father, the mother, and the proband of the embryo; determine, based on those whole genomes, the SNP information of the father, of the mother, and of the proband respectively; determine distinguishing SNPs based on the father's SNP information and the mother's SNP information; determine the paternal SNP haplotypes and the maternal SNP haplotypes based on the distinguishing SNPs and the proband's SNP information; and determine, based on the embryo's SNP information and the paternal and maternal SNP haplotypes, the combination of paternal and maternal SNP haplotypes so as to obtain the embryo's SNP haplotypes. According to an embodiment of the present invention, the embryo's SNP haplotypes are obtained by: determining the paternal haplotype significantly supported by the embryo's SNP information as the embryo's paternally derived haplotype; and determining the maternal haplotype significantly supported by the embryo's SNP information as the embryo's maternally derived haplotype. According to an embodiment of the present invention, a count of no fewer than 10 distinguishing SNPs indicates significant support.
In a further aspect, the invention provides an apparatus for determining SNP information in a predetermined chromosomal region. According to an embodiment of the invention, the apparatus comprises: a sequencing device; and the computer-readable medium described above, which stores instructions adapted to be executed by a processor to determine SNP information in a predetermined chromosomal region based on the sequencing results. The apparatus can determine SNP information in a predetermined chromosomal region accurately and efficiently, for example information on mutation sites associated with a disease-causing gene in the test sample. This information can in turn be used to determine whether the genetic status of the subject is normal, carrier, or affected, and thus provides a basis for clinical disease detection or treatment. When the at least a portion of the chromosome is the whole genome of an embryo, the instructions stored on the computer-readable medium are adapted to be executed by a processor to determine, from the whole genome of the fetus, the SNP information in the predetermined region of the fetal chromosome.
In yet another aspect, the invention provides a system for determining SNP information in a predetermined region of an embryo's chromosome. According to an embodiment of the invention, the system comprises: a sequencing device; and the computer-readable medium described above, which stores instructions adapted to be executed by a processor to determine, from the whole genome of the fetus, the SNP information in a predetermined region of the fetal chromosome. The system can determine SNP information in a predetermined region of the embryo's chromosome accurately and efficiently. This information can in turn be used to determine whether the genetic status of the fetus is normal, carrier, or affected, and thus provides a basis for preimplantation testing for monogenic diseases, prenatal diagnosis, or clinical disease treatment.
It should be noted that the advantages and effects of the computer-readable medium described above apply equally to the apparatus for determining SNP information in a predetermined chromosomal region and to the system for determining SNP information in a predetermined region of an embryo's chromosome, and are not repeated here.
The solution of the invention is explained below with reference to examples. Those skilled in the art will understand that the following examples only illustrate the invention and should not be taken as limiting its scope. Where specific techniques or conditions are not stated in the examples, they follow the techniques or conditions described in the literature of the field (for example, J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, translated by Huang Peitang et al., Science Press) or the product instructions. Reagents and instruments for which no manufacturer is given are conventional, commercially available products, obtainable for example from Illumina.
General method
Referring to Figure 1, the main steps in the following examples are as follows:
1. Design probes for the target region and customize the capture chip
The capture chip designed in the invention contains two parts: the target gene region and the SNP-marker region. The target gene region consists mainly of exons and exon-intron junctions; it covers the great majority of pathogenic mutations and can be used to detect disease mutations directly. The SNP-marker region comprises the regions upstream and downstream of the target gene region and contains thousands of high-frequency SNPs (that is, SNPs with a frequency greater than 0.3 in the 1000 Genomes database). This region is used to detect SNPs that differ between the parents and, combined with the SNP information of the proband in the family, to construct the haplotype of the disease-causing gene. Recombination between homologous chromosomes during meiosis can disrupt a gene's SNP haplotype. The smaller the distance between SNP markers, the lower the recombination rate; when the distance is less than 1 Mb, the recombination rate is below 1% (the human recombination rate is about 1% per 1 Mb). The extent of the SNP-marker region included on the chip can be chosen from a rough estimate based on the average recombination rate of the human genome. In general, a narrow range upstream and downstream of the target gene yields accurate but few SNPs, whereas a wide range yields more SNPs but a higher probability of recombination, and the larger number of SNPs makes design and synthesis more expensive. In one embodiment of the invention, to reduce the effect of recombination and ensure detection accuracy, the SNP-marker region is limited to within 1 Mb upstream and downstream of the target gene, which reduces the probability of recombination between the target gene region and the SNP-marker region to one in ten thousand.
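As a rough illustration of the distance trade-off described above, the following sketch applies the rule of thumb quoted in the text (about 1% recombination per 1 Mb) as a simple linear approximation. The function name and the linear model are assumptions made for illustration; they are not part of the described method.

```python
RECOMBINATION_RATE_PER_MB = 0.01  # rule of thumb quoted in the text: ~1% per 1 Mb


def approx_recombination_fraction(distance_bp: int) -> float:
    """Linear approximation of the recombination fraction between two loci.

    Only reasonable for short distances; capped at 0.5 (free recombination).
    """
    if distance_bp < 0:
        raise ValueError("distance_bp must be non-negative")
    return min(0.5, RECOMBINATION_RATE_PER_MB * distance_bp / 1_000_000)


# Markers closer to the target gene are less likely to be separated from it.
for distance in (100_000, 500_000, 1_000_000):
    print(distance, approx_recombination_fraction(distance))
```

Under this approximation, a marker at the edge of the 1 Mb flank has about a 1% chance of being separated from the gene by recombination, and markers closer to the gene proportionally less.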
1.1 Target gene capture chip design
First the target gene is determined, then its position is located using hg19 as the reference sequence, and finally the capture region is determined.
1.2 SNP-marker capture chip design
Based on the position of each target gene determined in 1.1, SNP sites with a high population frequency are selected within 1 Mb upstream and downstream of that position. Placing the selected SNP in the middle of the captured fragment increases the chance that the SNP is captured. In one embodiment of the invention, the constructed library has a fragment size of about 200 bp, so the fragments captured by the probes are mainly about 200 bp long. To improve the capture efficiency of the target SNPs, each selected SNP site together with about 100 bp on either side (so that the SNP lies roughly at the midpoint of the 200 bp fragment) is taken as the SNP-marker capture region.
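The selection rule in 1.2 can be sketched as follows. This is a minimal illustration under stated assumptions: the `Snp` record, the function name, and the example coordinates are hypothetical, the frequency cut-off of 0.3 is the one quoted for high-frequency SNPs, and SNPs inside the gene itself are left to the target gene region.

```python
from typing import List, NamedTuple, Tuple


class Snp(NamedTuple):
    pos: int     # 1-based coordinate on the reference (hg19 in the text)
    freq: float  # population allele frequency


def snp_marker_capture_regions(
    gene_start: int,
    gene_end: int,
    snps: List[Snp],
    flank_bp: int = 1_000_000,  # 1 Mb upstream and downstream of the gene
    min_freq: float = 0.3,      # "high-frequency" cut-off quoted in the text
    half_window_bp: int = 100,  # about 100 bp on either side of the SNP
) -> List[Tuple[int, int]]:
    """Return (start, end) capture windows centred on qualifying flanking SNPs."""
    regions = []
    for snp in sorted(snps):
        upstream = gene_start - flank_bp <= snp.pos < gene_start
        downstream = gene_end < snp.pos <= gene_end + flank_bp
        if (upstream or downstream) and snp.freq > min_freq:
            start = max(1, snp.pos - half_window_bp)
            regions.append((start, snp.pos + half_window_bp))
    return regions


# Hypothetical gene at 2,000,000-2,100,000 with five candidate SNPs.
candidates = [
    Snp(1_500_000, 0.45),  # upstream, frequent: kept
    Snp(2_050_000, 0.50),  # inside the gene: not a flanking marker
    Snp(2_600_000, 0.20),  # downstream but too rare
    Snp(3_000_000, 0.35),  # downstream, frequent: kept
    Snp(3_200_000, 0.40),  # more than 1 Mb away
]
print(snp_marker_capture_regions(2_000_000, 2_100_000, candidates))
```

Centring each window on the SNP mirrors the reasoning in the text: with fragments of about 200 bp, a SNP at the midpoint is the most likely to be covered by a captured fragment.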
1.3 Chip evaluation
After the chip design is complete, probe specificity is evaluated with dedicated software (Sequence Search and Alignment by Hashing Algorithm, SSAHA), and the chip is synthesized once it passes evaluation.
2. Family sample preparation
Embryonic cells are collected and subjected to whole-genome amplification (WGA) by PEP-PCR, DOP-PCR, OmniPlex WGA, or MDA (multiple displacement amplification), and DNA is extracted from the peripheral blood of the parents and the proband (or from samples of other affected family members, depending on the disease type).
3. Library preparation
According to the requirements of the chosen sequencing platform (Illumina HiSeq2000, Genome Analyzer, or MiSeq systems, the Life Technologies SOLiD system, the Ion Torrent system, or the Roche 454 system), libraries are constructed separately from the peripheral blood DNA of the parents and the proband and from the WGA products of the embryonic cell genomes. After construction, the libraries are checked on the Bioanalyzer 2100, by Q-PCR, and for enrichment.
4. Probe capture hybridization
The libraries obtained above are pooled, and the pooled library is hybridized with the designed capture probes, following the protocol provided by the chip synthesis service company.
5. High-throughput sequencing
Sequencing is performed on an Illumina HiSeq2000, Genome Analyzer, or MiSeq system, the Life Technologies SOLiD system, the Ion Torrent system, or the Roche 454 system, among others.
6. Data analysis
Referring to Figure 1, the analysis comprises:
6.1 Reference sequence alignment
Following the requirements of the sequencing platform, low-quality reads are filtered out and reads containing library adapter sequence are removed. The reads are then aligned to the human reference genome with alignment software such as the BWA (Burrows-Wheeler Aligner) package, using the default optimal parameters (-1 -i 15 -L -k 2 -1 31 -t 4). Reads that align to the chip target region are retained, PCR duplicates are removed with SAMtools, and the remaining reads are used for downstream analysis.
6.2 SNP calling
The resulting valid data are analyzed with SNP-calling software such as SAMtools and GATK to obtain all SNP information in the target region.
6.3 SNP filtering
The SNPs obtained above are filtered to improve their accuracy. A SNP is discarded if it meets either of the following conditions: (1) its sequencing depth is below 10×; (2) it is heterozygous and the sequencing depths of its two bases differ by more than 10%. Very low depth may leave one base of a heterozygous SNP undetected, and a large depth imbalance between the two bases of a heterozygous SNP means that the less-covered base cannot be reliably distinguished from sequencing error, so the site may be miscalled as homozygous. Filtering on these conditions removes potentially erroneous SNPs.
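The two filter conditions in 6.3 can be expressed as a small predicate. This is only a sketch: the function name and input layout are hypothetical, and because the text does not say how the depth difference is normalized, the code assumes it is measured relative to the combined depth of the two bases.

```python
from typing import Dict


def passes_snp_filter(
    genotype: str,
    allele_depths: Dict[str, int],
    min_depth: int = 10,              # condition 1: depth below 10x is rejected
    max_het_imbalance: float = 0.10,  # condition 2: >10% depth difference is rejected
) -> bool:
    """Return True if a called SNP survives both filter conditions.

    genotype      -- two-letter call such as "AG" or "AA"
    allele_depths -- read depth per base, e.g. {"A": 30, "G": 27}
    """
    depths = [allele_depths.get(base, 0) for base in set(genotype)]
    total = sum(depths)
    if total < min_depth:
        return False
    if len(depths) == 2:  # heterozygous call
        # Assumption: imbalance is the depth difference over the combined depth.
        imbalance = abs(depths[0] - depths[1]) / total
        if imbalance > max_het_imbalance:
            return False
    return True


print(passes_snp_filter("AG", {"A": 30, "G": 27}))  # balanced heterozygote
print(passes_snp_filter("AG", {"A": 40, "G": 20}))  # strongly imbalanced
print(passes_snp_filter("AA", {"A": 8}))            # too shallow
```

A different normalization (for example, relative to the deeper allele) would change which borderline sites are kept, so the threshold should be read together with whichever definition is actually used.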
6.4 Screening for bases that distinguish the parental haplotypes (distinguishing SNPs)
A distinguishing SNP is a site (on an autosome) at which one of the four bases carried by the two parents differs from each of the other bases at that site, so that this base identifies exactly one of the parents' four haplotypes. For example, if the parental genotypes at a site are AA and AG, G is the distinguishing base: G identifies a single haplotype, whereas A is present on the other three haplotypes and cannot identify any one of them. A specific example is shown in Figure 2. Following the scheme in the figure, the parental distinguishing SNP sites can be selected according to Mendelian inheritance.
6.5 Constructing the parental haplotypes
According to Mendelian inheritance and the law of linkage and crossing-over, the parental SNP haplotypes are constructed from the parental distinguishing SNP sites and the proband's SNP information. The principle is illustrated in Figure 4: first, the parental haplotypes are constructed from the parental distinguishing SNP sites and the proband's SNPs according to basic Mendelian inheritance and linkage; then the embryo's haplotypes are predicted from the parental haplotypes and the embryo's SNPs. In Figure 4, base letters marked in red denote the father's distinguishing SNP sites; base letters marked in yellow denote the mother's SNP sites; base letters in italics and underlined denote sites where ADO (allele dropout) occurred during WGA; G* denotes the pathogenic mutant base; and -- denotes a site where detection failed. A SNP haplotype consists entirely of the bases at distinguishing SNP positions; each haplotype contains many distinguishing SNPs, which set it apart from the other haplotypes. For example, if the parental genotypes at a site are AA and AG, G is the distinguishing SNP and A is a non-distinguishing SNP, and A and G are the bases of the respective haplotypes at that site. Because the proband's two haplotypes are inherited one from each parent, the haplotype carrying the pathogenic mutation can be identified from the disease status. For a dominant disease in which the father is affected and the mother is normal, the haplotype the proband inherited from the father is the one carrying the pathogenic mutation; for a recessive disease in which both parents are carriers, both haplotypes of the (affected) proband carry a pathogenic mutation.
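The definition of a distinguishing SNP in 6.4 and the construction step in 6.5 can be sketched together as follows. This is a simplified illustration, not the patented implementation: the function names and the haplotype labels ("proband" for the haplotype transmitted to the proband, "other" for the one not transmitted) are hypothetical, and the sketch ignores recombination, allele dropout, and genotyping errors. The three example sites use the father, mother, and proband genotypes from rows 103075083, 103075442, and 103075731 of Table 5.

```python
from collections import Counter
from typing import Dict, Tuple


def distinguishing_alleles(father_gt: str, mother_gt: str) -> Dict[str, str]:
    """Map each base that occurs exactly once among the four parental bases
    to the parent who carries it."""
    counts = Counter(father_gt + mother_gt)
    result = {}
    for parent, genotype in (("father", father_gt), ("mother", mother_gt)):
        for base in genotype:
            if counts[base] == 1:
                result[base] = parent
    return result


def build_parental_haplotypes(
    sites: Dict[int, Tuple[str, str, str]],
) -> Dict[Tuple[str, str], Dict[int, str]]:
    """sites maps position -> (father_gt, mother_gt, proband_gt).

    A distinguishing base seen in the proband is assigned to that parent's
    haplotype transmitted to the proband; otherwise to the parent's other one.
    """
    haplotypes: Dict[Tuple[str, str], Dict[int, str]] = {
        ("father", "proband"): {},
        ("father", "other"): {},
        ("mother", "proband"): {},
        ("mother", "other"): {},
    }
    for pos, (father_gt, mother_gt, proband_gt) in sorted(sites.items()):
        for base, parent in distinguishing_alleles(father_gt, mother_gt).items():
            which = "proband" if base in proband_gt else "other"
            haplotypes[(parent, which)][pos] = base
    return haplotypes


example_sites = {
    103075083: ("AC", "CC", "CC"),
    103075442: ("AA", "AT", "AT"),
    103075731: ("AA", "AT", "AA"),
}
for label, bases in build_parental_haplotypes(example_sites).items():
    print(label, bases)
```

In a recessive family such as the one described above, the two "proband" haplotypes are the ones carrying the pathogenic mutations, so each distinguishing base assigned to them becomes a marker for a disease haplotype.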
6.6 Analyzing the embryo haplotypes
Because the embryo's two haplotypes are inherited one from each parent, the embryo's SNP information can be analyzed together with the parental SNP haplotypes to determine which two haplotypes the embryo carries; the principle is shown in Figure 4. The analysis counts the distinguishing SNPs that support each haplotype and assigns the embryo's haplotypes according to these counts, as shown in Figure 5. If a haplotype is supported by more than 10 distinguishing SNPs, it is taken to be one of the embryo's haplotypes; if it is supported by fewer than 4, the support is attributed to SNP errors. In one embodiment of the invention, to ensure accuracy, a correct haplotype is required to have no fewer than 10 supporting SNPs and an incorrect haplotype no more than 3. Because the SNP filtering conditions set in step 6.3 are strict, the SNPs used for haplotype construction are highly accurate and the candidate SNPs are numerous; actual test data show that correct haplotypes are supported by far more than 10 SNPs, while incorrect haplotypes generally have 0. For an autosomal disease, this procedure yields exactly 2 qualifying haplotypes per embryo; for an X-linked disease, it yields one (male embryo) or two (female embryo) qualifying haplotypes.
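The counting rule in 6.6 can be sketched as below. The function names and the toy data are hypothetical, and the "ambiguous" outcome for counts between the two thresholds is an assumption, since the text only specifies what happens at 10 or more and at 3 or fewer supporting SNPs.

```python
from typing import Dict


def count_support(embryo_gts: Dict[int, str], haplotype: Dict[int, str]) -> int:
    """Count sites where the embryo carries the haplotype's distinguishing base."""
    return sum(
        1 for pos, base in haplotype.items() if base in embryo_gts.get(pos, "")
    )


def classify_haplotype(support: int, min_support: int = 10, max_error: int = 3) -> str:
    """Apply the thresholds from the text to a support count."""
    if support >= min_support:
        return "inherited"
    if support <= max_error:
        return "not inherited"
    return "ambiguous"  # assumption: the text does not cover counts of 4 to 9


# Toy data: 12 distinguishing sites per haplotype (positions are made up).
paternal_hap_1 = {pos: "G" for pos in range(100, 112)}
paternal_hap_2 = {pos: "T" for pos in range(200, 212)}
embryo = {pos: "AG" for pos in range(100, 111)}  # carries G at 11 sites
embryo[111] = "AA"                               # e.g. allele dropout at one site
embryo[200] = "AT"                               # a single spurious match

for name, hap in (("paternal_hap_1", paternal_hap_1), ("paternal_hap_2", paternal_hap_2)):
    support = count_support(embryo, hap)
    print(name, support, classify_haplotype(support))
```

Keeping a gap between the two thresholds means that an occasional sequencing error or dropout at a single site does not flip the call in either direction.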
6.7 Result analysis
The genetic status of the embryo is judged to be normal, carrier, or affected according to whether the embryo has inherited the parents' pathogenic haplotypes.
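For recessive diseases, the judgement in 6.7 reduces to counting how many pathogenic haplotypes the embryo inherited. The sketch below covers only autosomal recessive and X-linked recessive inheritance; the function name, argument names, and string labels are hypothetical, and dominant inheritance is deliberately left out.

```python
def genetic_status(
    n_pathogenic_haplotypes: int,
    inheritance: str = "autosomal_recessive",
    sex: str = "female",
) -> str:
    """Classify an embryo from the number of pathogenic haplotypes it inherited."""
    if inheritance not in ("autosomal_recessive", "x_linked_recessive"):
        raise ValueError(f"unsupported inheritance mode: {inheritance}")
    if sex not in ("female", "male"):
        raise ValueError(f"unsupported sex: {sex}")
    # A male embryo has a single X haplotype; every other case has two.
    hemizygous = inheritance == "x_linked_recessive" and sex == "male"
    max_haplotypes = 1 if hemizygous else 2
    if not 0 <= n_pathogenic_haplotypes <= max_haplotypes:
        raise ValueError("haplotype count out of range for this embryo")
    if n_pathogenic_haplotypes == 0:
        return "normal"
    if n_pathogenic_haplotypes == max_haplotypes:
        return "affected"
    return "carrier"


print(genetic_status(1))                                # autosomal: carrier
print(genetic_status(2))                                # autosomal: affected
print(genetic_status(1, "x_linked_recessive", "male"))  # hemizygous: affected
```

The sex argument matters only for the X-linked case, where a male embryo with one pathogenic haplotype is affected while a female embryo with one is a carrier.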
Example 1
In this example, the general method and workflow described above were applied to samples from a family with phenylketonuria (classic type; family 1, autosomal recessive) and a family with progressive muscular dystrophy (DMD; family 2, X-linked recessive). The couple in family 1 obtained 7 embryos by IVF; PAH gene testing by MF-PCR identified 2 normal embryos, which were transferred, and a girl was born whose normal status was confirmed by genetic testing of cord blood. The couple in family 2 obtained 9 embryos by IVF; PGD of the DMD gene by MF-PCR identified 3 normal embryos, 2 of which were transferred, and a boy was born (one of the transferred embryos failed to develop) whose normal status was confirmed by genetic testing of cord blood.
The family 1 samples comprised peripheral blood from the parents and the affected daughter (proband) and single blastomeres from the 7 embryos. PAH gene testing showed that the father carries the PAH mutation R243Q (c.728G>A), the mother carries the PAH mutation V399V (c.1197A>T), and the proband is compound heterozygous for R243Q (c.728G>A) and V399V (c.1197A>T) and presents with phenylketonuria. Single blastomeres from the 7 embryos (labeled E11, E12, E13, E14, E15, E16, and E17) were amplified by WGA and tested by multiplex PCR; the results are shown in Table 1.
Table 1. MF-PCR results for the 7 embryos of family 1
[Image in the original publication: imgf000018_0001]
The family 2 samples comprised peripheral blood from the parents and the daughter (phenotypically normal) and single blastomeres from the 9 embryos. DMD gene testing showed that the father is normal and that the mother and daughter carry the DMD mutation R2905X (c.8713C>T). Single blastomeres from the 9 embryos (labeled E21, E22, E23, E24, E25, E26, E27, E28, and E29) were amplified by WGA and tested by multiplex PCR; the results are shown in Table 2.
Table 2. MF-PCR results for the 9 embryos of family 2
Sample  Result
E21  Female, normal
E22  Female, R2905X (c.8713C>T) carrier
E23  Male, R2905X (c.8713C>T) mutation
E24  Female, R2905X (c.8713C>T) carrier
E25  Male, R2905X (c.8713C>T) mutation
E26  Female, normal
E27  Female, R2905X (c.8713C>T) carrier
E28  Male, normal
E29  Male, R2905X (c.8713C>T) mutation
The samples above were retested retrospectively using the technical solution and workflow of the invention; the results agreed with the MF-PCR results, with a concordance of 100%. This shows that the technique of the invention can accurately determine the SNP information of a predetermined region of the embryo's chromosome and, based on that information, determine the embryo's genotype to guide embryo transfer, with the advantages of a short turnaround (11 days), high throughput, and low cost. The procedure was carried out as follows:
1. Sample extraction and WGA (1 day)
DNA was extracted from the peripheral blood of the parents and probands with the QIAamp DNA Blood Midi Kit (Qiagen) according to the instructions and measured on a NanoDrop; concentrations were above 30 ng/μL. Single blastomeres from the 7 embryos were each subjected to whole-genome amplification with the REPLI-g® Single Cell WGA kit (Qiagen) according to the instructions, and the products were checked by agarose gel electrophoresis and quantified with Qubit. The samples were labeled F1, M1, P1, E11, E12, E13, E14, E15, E16, E17, F2, M2, P2, E21, E22, E23, E24, E25, E26, E27, E28, and E29.
2. Illumina HiSeq library construction (2 days)
The DNA samples and WGA products obtained above were first sheared to fragments of about 200 bp on a Covaris™ instrument, and libraries were then constructed according to the requirements of the Illumina® HiSeq2000™ sequencer, as follows:
2.1 Sample shearing
For each of the 22 genomic DNA samples and WGA products, a total of 3 μg was sheared in a Covaris microTube with AFA fiber and Snap-Cap on a Covaris S2 (Covaris). The shearing conditions were as follows:
[Image in the original publication: imgf000019_0001]
After shearing, the DNA was purified with the Qiagen DNA Purification Kit (Qiagen) and dissolved in 327.5 μL of EB.
2.2 End repair
37.5 μL of the purified product was used for the end-repair reaction in the following system (all reagents from Enzymatics):
Product of the previous step  37.5 μL
10x polynucleotide kinase buffer (B904)  5 μL
dNTP Solution Set (10 mM each)  2 μL
T4 DNA polymerase  2.5 μL
T4 polynucleotide kinase  2.5 μL
Klenow fragment  0.5 μL
[Image in the original publication: imgf000020_0001]
Reaction conditions: 20 °C for 30 min in a Thermomixer.
The reaction product was recovered and purified with the Qiagen DNA Purification Kit and dissolved in 32 μL of EB.
2.3 3'-end A-tailing
An A was added to the 3' ends of the DNA in the following system (all reagents from Enzymatics):
[Image in the original publication: imgf000020_0002]
Reaction conditions: 37 °C for 30 min in a Thermomixer.
The reaction product was recovered and purified with the Qiagen DNA Purification Kit (QIAGEN) and dissolved in 38 μL of EB.
2.4 Ligation of Illumina HiSeq adapters
Each of the 22 libraries received a different library index, and the correspondence between indexes and libraries was recorded. The system was as follows (all reagents from Illumina):
[Image in the original publication: imgf000020_0003]
Reaction conditions: 16 °C for 16 h in a Thermomixer.
The reaction product was purified with 60 μL of Ampure Beads (Beckman Coulter Genomics) and dissolved in 20 μL of EB.
2.5 After library construction, the fragment size distribution was checked on the Agilent® Bioanalyzer 2100 and met the requirements (Figure 3). The library concentrations measured by quantitative PCR (QPCR) are shown in Table 3:
Table 3. Relative library concentrations measured by QPCR
Sample  Library  QPCR concentration (nM)
F1  Library 1  66.14
M1  Library 2  53.62
P1  Library 3  47.35
E11  Library 4  76.30
E12  Library 5  53.77
E13  Library 6  90.65
E14  Library 7  78.46
E15  Library 8  47.86
E16  Library 9  71.87
E17  Library 10  51.92
F2  Library 11  60.54
M2  Library 12  63.42
P2  Library 13  57.65
E21  Library 14  67.35
E22  Library 15  54.76
E23  Library 16  70.66
E24  Library 17  75.26
E25  Library 18  57.14
E26  Library 19  72.07
E27  Library 20  56.91
E28  Library 21  71.87
E29  Library 22  61.94
3. Chip capture (3 days)
The 22 libraries were divided into 2 groups of 11, and each group was pooled in equal proportions into a mixed library totalling 500 ng. The pooled libraries were hybridized to a custom NimbleGen liquid-phase chip, SeqCap EZ Choice XL Library, according to the instructions (for details see the NimbleGen SeqCap EZ Exome Capture manual). After 72 hours of hybridization, the libraries were washed and eluted with the NimbleGen wash kit according to the instructions. The eluted product was then checked for enrichment, by QPCR, and on the Bioanalyzer 2100.
4. HiSeq2500 sequencing (3 days)
The hybridization products were sequenced on an Illumina® HiSeq2500™ sequencer in PE101 index mode (paired-end 101 bp reads with an index read). Instrument settings and operation followed the Illumina® manual (available at http://www.illumina.com/support/documentation.ilmn).
5. Result analysis (2 days)
After sequencing, the data were first quality-filtered and adapter-contaminated reads were removed; the high-quality reads were then analyzed as follows:
5.1 Overall data evaluation
During data analysis, the reads were aligned to the human reference genome (HG19, NCBI release GRCh37) with BWA (version 0.5.10) using the parameters (-1 -i 15 -L -k 2 -1 31 -t 4). Reads that aligned uniquely to the chip target region were retained, PCR duplicates were removed with SAMtools, and the remaining reads were used for downstream analysis. The amount of sequencing data obtained is shown in Table 4.
[Image in the original publication: imgf000022_0001]
The peripheral blood samples of the parents and probands were sequenced to a depth of about 100×, and the embryo WGA samples to about 50×. SNPs and indels were then called for each sample with the Genome Analysis Toolkit (GATK) to obtain each sample's genotypes. Genotypes for part of the gene regions are shown in Tables 5 and 6:
Table 5. Genotypes of each sample in part of the PAH gene region
Position Father Mother Proband E1 E2 E3 E4 E5 E6 E7
103075083 AC CC CC CC AC AC CC CC CC AC
103075442 AA AT AT AA AA AA AA AT AA AT
103075731 AA AT AA AT AT AT AT AA AT AA
103077486 CC CG CC CC CG CG CG CC CG CC
103099439 GG AG GG AG AG AG AG GG AG
103104834 TT AA AT AT AT AT AT AT AT AT
103106883 TT TG TT TG TG TG TG TT TG
103107367 GG TG TG GG GG TG GG TG
103110943 TC CC TC TC CC TC TC TC CC
103132740 AG AA AG AG AA AA AG AG AG AA
103140560 TT TC TC TT TT TT TT TC TT TC
103148974 TC TT CC TC TT TT TC TC TC TT
103152029 AC CC AC AC CC CC AC AC AC CC
103154308 AG AA AA AA AA AG AA AA AG
103164355 TC CC CC TC TC CC CC CC TC
103164544 AG AA AA AA AG AG AA AA AA AG
103174710 AC AA AA AA AC AC AA AA AA AC
103175259 CT CC CC CC CT CT CC CC CC CT
103176419 GC CC CC CC GC GC CC CC GC
103214192 CA AA AA AA CA CA AA AA AA CA
103237426 AA AT AT AA AA AA AT AA AT
103246707 GA GG GA GG GG GA GG
103246787 CG CC CG CG CC CC CG CG GG CC
103424228 TG TT TT TT TG TG TT TT TT TG
103425386 TG GG GG GG TG TG GG GG GG TG
103428340 AG AA AG AG AA AA AG AG AG AA
103428555 AA AG AA AG AG AG AG AA AG AA
103429407 GG TG GG TG TG TG GG TG GG
103432532 CC TC TC CC CC CC TC CC TC
103434254 AG AA AA AA AG AG AA AA AA AG
103443364 CT TT TT TT TT CT TT TT TT CT
103445655 CT CC CC CT CT CC CC CC CT
103448748 TC TT TC TC TT TT TC TC CC TT
103456084 AT AA AT AT AA AA AT AT TT AA
103456562 TT CT CT TT TT TT TT CT TT CT
103459335 CT TT TT TT CT CT TT TT TT CT
103460207 GT TT TT TT GT GT TT TT TT GT
103463741 AA AG AG AA AA AA AA AG AA AG
103488660 TT CC TC TC TC TC TT TC TC TC
103488841 CT TT TT TT CT CT TT TT TT CT
103491018 TG GG GG GG TG GG GG GG TG
103495380 AG GG GG GG AG AG GG GG GG
103496446 TT CT CT TT TT TT TT CT TT CT
103501101 AC AA AA AA AC AC AA AA AA AC
103501562 CC TC CC TC TC TC TC CC TC CC
103515016 TT AT TT AT AT AT AT TT AT TT
This SNP information corresponds to the antisense strand of the reference genome. "-" indicates that no SNP could be obtained at that position (no data coverage, or depth too low); italics indicate pathogenic mutations. Coordinates 103237426 and 103246707 in the table correspond to the V399V (c.1197A>T) and R243Q (c.728G>A) sites in the PAH database. For ease of understanding, the antisense-strand information at these two mutation sites has been converted to the corresponding sense-strand form.
Table 6. Genotypes of each sample in part of the DMD gene region
Position Father Mother Proband E21 E22 E23 E24 E25 E26 E27 E28 E29
31838359 T GT GT TT TG G TG G TT TG G
31859140 G AG GG AG GG G GG G AG GG A G
31859179 A AG AG AA AG G AG G AA AG A G
31860203 A AG AG AA AG G AG G AA AG A G
31863187 A AG AA AG AA A AA A AG AA G A
31863193 G AT AT GT AG A AG A GT AG T A
31863313 T TC TC TT TC C TC C TT TC T C
Figure imgf000025_0001
32889584 C TC CC TC CC C CC C TC CC T C
32889622 A AG AA AG AA A AA A AG AA G A
32889854 G AG GG AG GG G GG G AG GG A G
32890041 T GT TT TG TT T TT T TT G T
"-" indicates that no SNP could be obtained at that position (no data coverage, or depth too low); italics indicate pathogenic mutations. Coordinate 32456388 in the table corresponds to the R2905X (c.8713C>T) site in the DMD database.
5.2 Parental haplotype construction
Using the SNP information of the parents and the proband, the parental haplotypes, including the haplotype carrying the pathogenic mutation, can be constructed by the method shown in Figure 4 above. Tables 7 and 8 show the haplotype construction at some positions of the PAH and DMD genes, respectively.
Table 7. Parental haplotype construction for the PAH gene region
Position Father Mother Proband F-Hap1 F-Hap2 M-Hap1 M-Hap2
103075083 AC CC CC C A C C
103075442 AA AT AT A A T A
103075731 AA AT AA A A A T
103077486 CC CG CC C C C G
103099439 GG AG GG G G G A
103104834 TT AA AT T T A A
103106883 TT TG TT T T T G
103107367 GG TG TG G G T G
103110943 TC CC TC T C C C
103132740 AG AA AG G A A A
103140560 TT TC TC T T C T
103148974 TC TT CC C T T T
103152029 AC CC AC A C C C
103154308 AG AA AA A G A A
103164355 TC CC CC C T C C
103164544 AG AA AA A G A A
103174710 AC AA AA A C A A
103175259 CT CC CC C T C C
103176419 GC CC CC C G C C
103214192 CA AA AA A C A A
103237426 AA AT AT A A T A
103246707 GA GG GA A G G G
103246787 CG CC CG G C C C
103424228 TG TT TT T G T T
103425386 TG GG GG G T G G
103428340 AG AA AG G A A A
103428555 AA AG AA A A A G
103429407 GG TG GG G G G T
103432532 CC TC TC C C T C
103434254 AG AA AA A G A A
103443364 CT TT TT T C T T
103445655 CT CC CC C T C C
103448748 TT TC TT T T T C
103456084 AA TA TA A A T A
103456562 TT CT CT T T C T
103459335 CT TT TT T C T T
103460207 GT TT TT T G T T
103463741 AA AG AG A A G A
103488660 TT CC TC T T C C
103488841 CT TT TT T C T T
103491018 TG GG GG G T G G
103495380 AG GG GG G A G G
103496446 TT CT CT T T C T
103501101 AC AA AA A C A A
103501562 CC TC CC C C C T
103515016 TT AT TT T T T A
In the table, F-Hap1 and F-Hap2 denote the father's two haplotypes, and M-Hap1 and M-Hap2 denote the mother's two haplotypes. This SNP information corresponds to the negative strand of the reference genome. "-" indicates that no SNP could be obtained at that position (no data coverage, or depth too low); italics indicate pathogenic mutations. Coordinates 103237426 and 103246707 in the table correspond to the V399V (c.1197A>T) and R243Q (c.728G>A) sites in the PAH database. For ease of understanding, the antisense-strand information at these two mutation sites has been converted to the corresponding sense-strand form.
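The per-site logic behind Table 7 can be illustrated with a short sketch. It does not reproduce the method of Figure 4, which is not shown here; it assumes only the rule that the rows of Table 7 suggest, namely that Hap1 of each parent is the allele transmitted to the proband and Hap2 is the other allele. The function name `phase_site` is hypothetical, and sites that are Mendelian-inconsistent or ambiguous (both parents and the proband heterozygous) are simply left unphased.

```python
def phase_site(father, mother, proband):
    """Phase one autosomal SNP site from a father/mother/proband trio.

    Genotypes are two-letter strings such as "AC". Returns
    (F-Hap1, F-Hap2, M-Hap1, M-Hap2), where Hap1 is the allele passed
    to the proband (an assumption read off Table 7), or None when the
    site is ambiguous or inconsistent with Mendelian inheritance.
    """
    transmissions = set()
    for f_allele in father:
        for m_allele in mother:
            # Keep every (paternal, maternal) allele pair that explains the proband.
            if sorted(f_allele + m_allele) == sorted(proband):
                transmissions.add((f_allele, m_allele))
    if len(transmissions) != 1:
        return None  # zero explanations (error) or more than one (ambiguous)
    f_allele, m_allele = transmissions.pop()
    # The untransmitted allele is what remains after removing the transmitted one.
    f_other = father.replace(f_allele, "", 1)
    m_other = mother.replace(m_allele, "", 1)
    return (f_allele, f_other, m_allele, m_other)


# First two rows of Table 7.
print(phase_site("AC", "CC", "CC"))  # ('C', 'A', 'C', 'C')
print(phase_site("AA", "AT", "AT"))  # ('A', 'A', 'T', 'A')
```

A site where all three genotypes are the same heterozygote returns `None`, because either parent could have contributed either allele.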
Table 8. Parental haplotype construction for the DMD gene
Figure imgf000028_0001
32579849 C TC CC C C T
32580579 C TC TC C T C
32827465 A AG AG A G A
32858090 T TC TC T C T
32862539 G AG GG G G A
32886984 C CG CC C C G
32887091 T TC TT T T C
32887278 A AG AA A A G
32889584 C TC CC C C T
32889622 A AG AA A A G
32889854 G AG GG G G A
32890041 T GT TT T T G
In the table, F-Hap denotes the father's haplotype (a male has only one X chromosome), and M-Hap1 and M-Hap2 denote the mother's two haplotypes. Italics indicate pathogenic mutations. Coordinate 32456388 in the table corresponds to the R2905X (c.8713C>T) site in the DMD database.
5.3 Embryo haplotype analysis
Based on the embryo SNP information in Tables 5 and 6 and the parental haplotype information in Tables 7 and 8, the distinguishing SNPs of each embryo were counted by the method shown in Figure 4. The embryo haplotypes were then determined from the number of SNPs supporting each parental haplotype, and from these it was determined whether the embryo is affected. For an autosome, an embryo carries only two haplotypes, so generally only two haplotypes receive SNP support. Occasionally a third or fourth haplotype appears to be supported; this is caused by erroneous SNPs, which account for less than 5% of all SNPs. In addition, because of ADO (allele drop-out) and sequencing errors, individual embryo SNPs may be lost or wrong. To keep such errors from affecting the result, we require that a haplotype be supported by at least 10 distinguishing SNPs. The extensive data of this example show that a wrong haplotype is generally supported by no more than 3 distinguishing SNPs, whereas a correct haplotype is supported by more than 20, which indicates that individual errors do not affect the determination of embryo haplotypes. Therefore, to ensure accurate results, the present invention defines a correct haplotype as one supported by no fewer than 10 SNPs and a wrong haplotype as one supported by no more than 3 SNPs. The specific analysis workflow is shown in Figure 5, which depicts the embryo status analysis for an autosomal recessive disease, where Hap1 of each parent is the haplotype carrying the pathogenic mutation. In the figure, SNPs support a third haplotype in individual embryos, but so few SNPs do so that the result is not affected.
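The counting and thresholding described above can be sketched as follows. This is an illustrative reading rather than the software used in the example: it treats a site as distinguishing for one parent when that parent is heterozygous and the other is homozygous, and it applies the thresholds stated in the text (at least 10 supporting SNPs for a correct haplotype, at most 3 for a wrong one). The names `count_support`, `call_haplotype`, `MIN_SUPPORT` and `MAX_NOISE` are hypothetical.

```python
from collections import Counter

MIN_SUPPORT = 10  # a correct haplotype needs at least this many supporting SNPs
MAX_NOISE = 3     # a wrong haplotype is supported by at most this many SNPs


def count_support(sites):
    """Count distinguishing SNPs supporting each parental haplotype.

    `sites` is an iterable of ((F-Hap1, F-Hap2, M-Hap1, M-Hap2), embryo_genotype).
    Only sites where exactly one parent is heterozygous are used.
    """
    support = Counter()
    for (f1, f2, m1, m2), embryo in sites:
        if len(embryo) != 2:
            continue  # no call at this site
        if f1 != f2 and m1 == m2:
            known, labels = m1, {f1: "F-Hap1", f2: "F-Hap2"}
        elif m1 != m2 and f1 == f2:
            known, labels = f1, {m1: "M-Hap1", m2: "M-Hap2"}
        else:
            continue  # not a distinguishing site
        if known not in embryo:
            continue  # inconsistent genotype, e.g. allele drop-out
        # Remove the homozygous parent's allele; the rest came from the other parent.
        informative = embryo.replace(known, "", 1)
        if informative in labels:
            support[labels[informative]] += 1
    return support


def call_haplotype(support, hap_a, hap_b):
    """Return the haplotype that passes the thresholds, or None if neither does."""
    a, b = support[hap_a], support[hap_b]
    if a >= MIN_SUPPORT and b <= MAX_NOISE:
        return hap_a
    if b >= MIN_SUPPORT and a <= MAX_NOISE:
        return hap_b
    return None
```

With 12 sites supporting F-Hap1 and 2 erroneous sites supporting F-Hap2, `call_haplotype(support, "F-Hap1", "F-Hap2")` returns `"F-Hap1"`; if neither haplotype reaches the threshold, the call is left undetermined.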
The embryo status can be determined from the above analysis, as shown in Table 9. These results agree with those of the conventional MF-PCR method, with a concordance rate of 100%. The above workflow is carried out automatically by software developed for this purpose.
Table 9. Test results for each embryo
Sample Test result
E11 R243Q (c.728G>A) carrier
E12 Normal
E13 Normal
E14 R243Q (c.728G>A) carrier
E15 R243Q (c.728G>A) combined with V399V (c.1197A>T) mutation
E16 R243Q (c.728G>A) carrier
E17 V399V (c.1197A>T) carrier
E21 Female, normal
E22 Female, R2905X (c.8713C>T) carrier
E23 Male, R2905X (c.8713C>T) mutation
E24 Female, R2905X (c.8713C>T) carrier
E25 Male, R2905X (c.8713C>T) mutation
E26 Female, normal
E27 Female, R2905X (c.8713C>T) carrier
E28 Male, normal
E29 Male, R2905X (c.8713C>T) mutation
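The step from inherited haplotypes to the status labels in Table 9 can be illustrated as below. The sketch assumes, as stated for Figure 5, that Hap1 of a parent is the haplotype carrying the pathogenic mutation; for the X-linked DMD case it further assumes that only the mother's M-Hap1 carries the mutation. Both function names are hypothetical and are not taken from the software mentioned above.

```python
def autosomal_recessive_status(paternal_hap, maternal_hap):
    """Status for an autosomal recessive disease such as PAH deficiency.

    Assumes F-Hap1 and M-Hap1 are the mutation-carrying haplotypes.
    """
    mutated = [paternal_hap, maternal_hap].count("F-Hap1") \
        + [paternal_hap, maternal_hap].count("M-Hap1")
    return {0: "normal", 1: "carrier", 2: "affected"}[mutated]


def x_linked_recessive_status(sex, maternal_hap):
    """Status for an X-linked recessive disease such as DMD.

    Assumes only M-Hap1 carries the mutation; a male has a single X
    chromosome, so inheriting M-Hap1 makes him affected rather than a carrier.
    """
    if maternal_hap != "M-Hap1":
        return "normal"
    return "affected" if sex == "male" else "carrier"


print(autosomal_recessive_status("F-Hap1", "M-Hap2"))  # carrier
print(x_linked_recessive_status("male", "M-Hap1"))     # affected
```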
Industrial applicability
The method, system and computer-readable medium of the present invention for determining SNP information in a predetermined region of an (embryo) chromosome can be used effectively to determine SNP information in a predetermined chromosomal region, for example in a predetermined region of an embryo chromosome. The information is highly accurate and can be used effectively to determine whether the genetic status of the fetus is normal, carrier or affected, thereby providing a basis for preimplantation testing of monogenic diseases, prenatal diagnosis in pregnant women, or clinical disease treatment.
Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that, in light of all the teachings disclosed, various modifications and substitutions may be made to those details, and that all such changes fall within the scope of protection of the present invention. The full scope of the invention is given by the appended claims and any equivalents thereof.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "illustrative embodiment", "example", "specific example" or "some examples" means that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Claims

1. A method for determining SNP information in a predetermined region of a chromosome, characterized by comprising:
constructing a sequencing library for at least a portion of the chromosome;
screening the sequencing library with probes, wherein the probes specifically recognize at least one of the known SNP sites in the predetermined region, so as to obtain target capture fragments, the target capture fragments containing SNP sites;
sequencing the screened sequencing library so as to obtain a sequencing result; and
determining the SNP information in the predetermined region based on the sequencing result.
2. The method according to claim 1, characterized in that the predetermined region comprises a target gene region and a SNP-marker region.
3. The method according to claim 2, characterized in that the target gene region comprises at least a portion of the exons and exon-adjacent regions of the target gene.
4. The method according to claim 3, characterized in that the exon-adjacent regions comprise the 50 bp region upstream of the 5' end of the exon and the 50 bp region downstream of the exon.
5. The method according to claim 2, characterized in that the SNP-marker region comprises the range of 1M upstream and downstream of the target gene.
6. The method according to claim 1, characterized in that the probes are 20~200 nt in length.
7. The method according to claim 6, characterized in that the probes are 60~80 nt in length.
8. The method according to claim 1, characterized in that the probes are provided in the form of a chip.
9. The method according to claim 1, characterized in that the at least a portion of the chromosome is obtained by extracting DNA from the peripheral blood of an organism.
10. The method according to claim 1, characterized in that the sequencing is performed using at least one selected from Illumina Hiseq2000, Genome Analyzer, the Miseq sequencing series, the SOLiD sequencing system of Life Technologies, the Ion Torrent sequencing system and Roche's 454 sequencing system.
11. The method according to claim 1, characterized in that determining the SNP information in the predetermined region based on the sequencing result further comprises:
aligning the sequencing result to a reference sequence so as to obtain uniquely aligned sequences; and
obtaining the SNP information in the predetermined region from the uniquely aligned sequences using SNP analysis software.
12. The method according to claim 11, characterized in that the alignment is performed using the BWA software package.
13. The method according to claim 11, characterized by further comprising, after the uniquely aligned sequences are obtained, removing PCR duplicate sequences from the uniquely aligned sequences.
14. The method according to claim 11, characterized in that the SNP analysis software is at least one selected from SAMtools and GATK.
15. The method according to claim 11, characterized by further comprising filtering the obtained SNP information.
16. The method according to claim 15, characterized in that the filtering removes SNPs that satisfy one of the following conditions:
the sequencing depth of the SNP is lower than 10x, preferably lower than 20x; and
the difference in sequencing depth between the two bases of a heterozygous SNP is higher than 20%, preferably higher than 10%, more preferably higher than 5%.
17. A method for determining SNP information in a predetermined region of an embryo chromosome, characterized by comprising:
obtaining the whole genome of the embryo; and
determining, for the whole genome of the embryo, the SNP information in the predetermined region of the fetal chromosome by the method according to any one of claims 1 to 16.
18. The method according to claim 17, characterized in that the whole genome of the embryo is obtained by whole genome amplification of embryo cells.
19. The method according to claim 18, characterized in that the whole genome amplification is performed by at least one selected from PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.
20. The method according to claim 17, characterized by further comprising:
obtaining the whole genomes of individuals genetically related to the embryo, wherein the genetically related individuals comprise the father, the mother and the proband of the embryo; and
determining, based on the whole genomes of the genetically related individuals, the SNP information of the father, the SNP information of the mother and the SNP information of the proband, respectively;
determining distinguishing SNPs based on the SNP information of the father and the SNP information of the mother;
determining the paternal SNP haplotypes and the maternal SNP haplotypes based on the distinguishing SNPs and the SNP information of the proband; and
determining, based on the SNP information of the embryo, the paternal SNP haplotypes and the maternal SNP haplotypes, the combination of paternal SNP haplotype and maternal SNP haplotype, so as to obtain the SNP haplotypes of the embryo.
21. The method according to claim 20, characterized in that the SNP haplotypes of the embryo are obtained by the following steps:
determining the paternal haplotype significantly supported by the SNP information of the embryo to be the paternally derived haplotype of the embryo; and
determining the maternal haplotype significantly supported by the SNP information of the embryo to be the maternally derived haplotype of the embryo.
22. The method according to claim 21, characterized in that a number of distinguishing SNPs of no fewer than 10 is an indication of significant support.
23. A device for determining SNP information in a predetermined region of a chromosome, characterized by comprising:
a library construction apparatus adapted to construct a sequencing library for at least a portion of the chromosome;
a library screening apparatus connected to the library construction apparatus and adapted to screen the sequencing library with probes, wherein the probes specifically recognize at least one of the known SNP sites in the predetermined region, so as to obtain target capture fragments, the target capture fragments containing SNP sites;
a sequencing apparatus connected to the library screening apparatus and adapted to sequence the screened sequencing library so as to obtain a sequencing result; and
an analysis apparatus connected to the sequencing apparatus and adapted to determine the SNP information in the predetermined region based on the sequencing result.
24. The device according to claim 23, characterized in that the predetermined region comprises a target gene region and a SNP-marker region, the target gene region comprises at least a portion of the exons and exon-adjacent regions of the target gene, the exon-adjacent regions comprise the 50 bp region upstream of the 5' end of the exon and the 50 bp region downstream of the exon, and the SNP-marker region comprises the range of 1M upstream and downstream of the target gene.
25. The device according to claim 23, characterized in that the probes are 20~200 nt in length.
26. The device according to claim 25, characterized in that the probes are 60~80 nt in length.
27. The device according to claim 23, characterized in that the probes are provided in the form of a chip.
28. The device according to claim 23, characterized by further comprising a chromosome preparation apparatus connected to the library construction apparatus and adapted to obtain the whole genome of embryo cells by whole genome amplification, the whole genome of the embryo cells constituting the at least a portion of the chromosome.
29. The device according to claim 28, characterized in that the chromosome preparation apparatus is adapted to perform the whole genome amplification by at least one selected from PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.
30. The device according to claim 23, characterized by further comprising a DNA extraction apparatus connected to the library construction apparatus and adapted to extract DNA from the peripheral blood of an organism so as to obtain the at least a portion of the chromosome.
31. The device according to claim 23, characterized in that the sequencing apparatus is at least one selected from Illumina Hiseq2000, Genome Analyzer, the Miseq sequencing series, the SOLiD sequencing system of Life Technologies, the Ion Torrent sequencing system and Roche's 454 sequencing system.
32. The device according to claim 23, characterized in that the analysis apparatus further comprises:
an alignment unit adapted to align the sequencing result to a reference sequence so as to obtain uniquely aligned sequences; and
a SNP information acquisition unit connected to the alignment unit and adapted to obtain the SNP information in the predetermined region from the uniquely aligned sequences using SNP analysis software.
33. The device according to claim 32, characterized in that the alignment unit is adapted to perform the alignment using the BWA software package.
34. The device according to claim 32, characterized in that the analysis apparatus further comprises a unit adapted to remove PCR duplicate sequences from the uniquely aligned sequences.
35. The device according to claim 32, characterized in that the SNP analysis software is at least one selected from SAMtools and GATK.
36. The device according to claim 32, characterized in that the analysis apparatus further comprises a unit adapted to filter the obtained SNP information.
37. The device according to claim 36, characterized in that the filtering removes SNPs that satisfy one of the following conditions:
the sequencing depth of the SNP is lower than 10x, preferably lower than 20x; and
the difference in sequencing depth between the two bases of a heterozygous SNP is higher than 20%, preferably higher than 10%, more preferably higher than 5%.
38、 一种确定胚胎染色体预定区域中 SNP信息的系统, 其特征在于, 包括: 第一全基因组获取设备, 所述第一全基因组获取设备适于获取所述胚胎的全基因组; 以及 38. A system for determining SNP information in a predetermined region of an embryo's chromosomes, characterized by comprising: a first whole genome acquisition device, the first whole genome acquisition device being adapted to acquire the whole genome of the embryo; and
SNP信息确定设备, 所述 SNP信息确定设备与所述第一全基因组获取设备相连, 用于 确定所述胎儿染色体预定区域中的 SNP信息, 其中, 所述 SNP信息确定设备为权利要求 23~37任一项所述的设备。 SNP information determination device, the SNP information determination device is connected to the first whole genome acquisition device, and is used to determine SNP information in the predetermined region of the fetal chromosome, wherein the SNP information determination device is claimed in claims 23 to 37 any of the equipment described above.
39、 根据权利要求 38所述的系统, 其特征在于, 所述第一全基因组获取设备适于通过 对胚胎细胞进行全基因组扩增而获得所述胚胎的全基因组。 39. The system according to claim 38, wherein the first whole genome acquisition device is adapted to obtain the whole genome of the embryo by performing whole genome amplification of embryonic cells.
40. The system according to claim 39, wherein the first whole genome acquisition device is adapted to obtain the whole genome of the embryo using at least one selected from PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.
41. The system according to claim 38, further comprising:
a second whole genome acquisition device adapted to acquire the whole genomes of individuals genetically related to the embryo, wherein the individuals genetically related to the embryo include the father, the mother and the proband of the embryo;
a discriminating SNP determination device adapted to determine discriminating SNPs based on the SNP information of the father and the SNP information of the mother;
a first haplotype determination device adapted to determine a paternal SNP haplotype and a maternal SNP haplotype based on the discriminating SNPs and the SNP information of the proband; and
a second haplotype determination device adapted to determine, based on the SNP information of the embryo, the paternal SNP haplotype and the maternal SNP haplotype, the combination of the paternal SNP haplotype and the maternal SNP haplotype, so as to obtain the SNP haplotype of the embryo.
42. The system according to claim 41, wherein the second haplotype determination device further comprises:
a unit for determining the paternal haplotype significantly supported by the SNP information of the embryo as the paternally derived haplotype of the embryo; and
a unit for determining the maternal haplotype significantly supported by the SNP information of the embryo as the maternally derived haplotype of the embryo.
43. The system according to claim 42, wherein a number of discriminating SNPs of not less than 10 is an indication of significant support.
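The data flow recited in claims 41 to 43 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. Several points are assumptions made only for illustration: genotypes are stored as a dictionary mapping a position to a pair of alleles; a discriminating SNP is taken to be a site at which exactly one parent is heterozygous; the parental haplotype passed to the proband is labelled "H1" and the other "H2"; recombination and allele dropout are ignored. All function names are hypothetical. Only the threshold of 10 supporting discriminating SNPs comes from claim 43.

```python
from collections import Counter

MIN_SUPPORT = 10  # claim 43: at least 10 discriminating SNPs indicate significant support


def is_het(genotype):
    return genotype[0] != genotype[1]


def discriminating_snps(father, mother):
    """Map each site where exactly one parent is heterozygous to that parent (assumed definition)."""
    sites = {}
    for pos in father.keys() & mother.keys():
        f_het, m_het = is_het(father[pos]), is_het(mother[pos])
        if f_het != m_het:
            sites[pos] = "father" if f_het else "mother"
    return sites


def allele_from_het_parent(child_gt, hom_allele, het_gt):
    """Allele the child received from the heterozygous parent, or None if inconsistent."""
    alleles = list(child_gt)
    if hom_allele not in alleles:
        return None  # Mendelian inconsistency with the homozygous parent
    alleles.remove(hom_allele)  # one copy came from the homozygous parent
    return alleles[0] if alleles[0] in het_gt else None


def phase_parent(which, father, mother, proband, sites):
    """Return {pos: (H1 allele, H2 allele)}; H1 is the haplotype passed to the proband."""
    het, hom = (father, mother) if which == "father" else (mother, father)
    phased = {}
    for pos, parent in sites.items():
        if parent != which or pos not in proband:
            continue
        a = allele_from_het_parent(proband[pos], hom[pos][0], het[pos])
        if a is None:
            continue
        b = het[pos][1] if het[pos][0] == a else het[pos][0]
        phased[pos] = (a, b)
    return phased


def embryo_haplotype(which, father, mother, embryo, phased, min_support=MIN_SUPPORT):
    """Return 'H1' or 'H2' if one haplotype has enough supporting sites, else None."""
    het, hom = (father, mother) if which == "father" else (mother, father)
    support = Counter()
    for pos, (h1, h2) in phased.items():
        if pos not in embryo:
            continue
        a = allele_from_het_parent(embryo[pos], hom[pos][0], het[pos])
        if a == h1:
            support["H1"] += 1
        elif a == h2:
            support["H2"] += 1
    n1, n2 = support["H1"], support["H2"]
    if max(n1, n2) < min_support or n1 == n2:
        return None  # not significantly supported under the assumed rule
    return "H1" if n1 > n2 else "H2"


# Toy data: 12 sites, father heterozygous A/G, mother homozygous A/A.
father = {i: ("A", "G") for i in range(12)}
mother = {i: ("A", "A") for i in range(12)}
proband = {i: ("A", "G") for i in range(12)}  # received G from the father
embryo = {i: ("A", "A") for i in range(12)}   # received A from the father

sites = discriminating_snps(father, mother)
paternal = phase_parent("father", father, mother, proband, sites)
print(embryo_haplotype("father", father, mother, embryo, paternal))  # prints H2
```

In this toy case the embryo carries the paternal haplotype that the proband did not inherit. The same functions can be called with "mother" to handle sites at which only the mother is heterozygous.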
PCT/CN2013/084783 2013-09-30 2013-09-30 Method, system, and computer-readable medium for determining snp information in a predetermined chromosomal region WO2015042980A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201380079613.2A CN106029899B (en) 2013-09-30 2013-09-30 Method, system and computer readable medium for determining SNP information in predetermined region of chromosome
PCT/CN2013/084783 WO2015042980A1 (en) 2013-09-30 2013-09-30 Method, system, and computer-readable medium for determining snp information in a predetermined chromosomal region
CN201480050879.9A CN105555970B (en) 2013-09-30 2014-07-04 Method and system for simultaneous haplotyping and chromosomal aneuploidy detection
PCT/CN2014/081672 WO2015043278A1 (en) 2013-09-30 2014-07-04 Method and system for simultaneously performing target gene haplotype analysis and chromosomal aneuploidy detection
HK16109816.5A HK1221745A1 (en) 2013-09-30 2016-08-16 Method and system for simultaneously performing target gene haplotype analysis and chromosomal aneuploidy detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/084783 WO2015042980A1 (en) 2013-09-30 2013-09-30 Method, system, and computer-readable medium for determining snp information in a predetermined chromosomal region

Publications (1)

Publication Number Publication Date
WO2015042980A1 (en) 2015-04-02

Family

ID=52741899

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2013/084783 WO2015042980A1 (en) 2013-09-30 2013-09-30 Method, system, and computer-readable medium for determining snp information in a predetermined chromosomal region
PCT/CN2014/081672 WO2015043278A1 (en) 2013-09-30 2014-07-04 Method and system for simultaneously performing target gene haplotype analysis and chromosomal aneuploidy detection

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/081672 WO2015043278A1 (en) 2013-09-30 2014-07-04 Method and system for simultaneously performing target gene haplotype analysis and chromosomal aneuploidy detection

Country Status (3)

Country Link
CN (2) CN106029899B (en)
HK (1) HK1221745A1 (en)
WO (2) WO2015042980A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105046105B (en) * 2015-07-09 2018-02-02 天津诺禾医学检验所有限公司 The Haplotype map and its construction method of chromosome span
WO2018053761A1 (en) * 2016-09-22 2018-03-29 华为技术有限公司 Data processing method and device, and computing node
CN108220403B (en) * 2017-12-26 2021-07-06 北京科迅生物技术有限公司 Method and device for detecting specific mutation site, storage medium and processor
CN110628891B (en) * 2018-06-25 2024-01-09 深圳华大智造科技股份有限公司 Method for screening embryo genetic abnormality
CN111276189B (en) * 2020-02-26 2020-12-29 广州市金域转化医学研究院有限公司 Chromosome balance translocation detection and analysis system based on NGS and application thereof
CN113436680B (en) * 2020-05-22 2022-03-25 复旦大学附属妇产科医院 Method for simultaneously identifying chromosome structural abnormality and carrier state of pathogenic gene of embryo

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000050869A2 (en) * 1999-02-26 2000-08-31 Incyte Pharmaceuticals, Inc. Snp detection
WO2003065000A2 (en) * 2002-01-25 2003-08-07 Applera Corporation METHODS OF VALIDATING SNPs AND COMPILING LIBRARIES OF ASSAYS
WO2009106294A1 (en) * 2008-02-29 2009-09-03 Roche Diagnostics Gmbh Methods and systems for uniform enrichment of genomic regions
CN102061526A (en) * 2010-11-23 2011-05-18 深圳华大基因科技有限公司 DNA (deoxyribonucleic acid) library and preparation method thereof as well as method and device for detecting single nucleotide polymorphisms (SNPs)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006031745A2 (en) * 2004-09-10 2006-03-23 Sequenom, Inc. Methods for long-range sequence analysis of nucleic acids
EP2053132A1 (en) * 2007-10-23 2009-04-29 Roche Diagnostics GmbH Enrichment and sequence analysis of geomic regions
CN102559856B (en) * 2010-12-22 2014-03-12 深圳华大基因科技服务有限公司 Method for deleting vector segments in sequencing library
CN102952855B (en) * 2011-08-26 2015-05-20 深圳华大基因科技服务有限公司 Genetic map construction method and device, haplotype analytical method and device
CN103103624B (en) * 2011-11-15 2014-12-31 深圳华大基因科技服务有限公司 Method for establishing high-throughput sequencing library and application thereof
CN102839168A (en) * 2012-07-31 2012-12-26 深圳华大基因研究院 Nucleic acid probe, and preparation method and application thereof

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111373054A (en) * 2018-05-31 2020-07-03 深圳华大临床检验中心 Method, system and computer readable medium for determining the presence of triploids in a male test sample
WO2020257717A1 (en) * 2019-06-21 2020-12-24 Coopersurgical, Inc. System and method for determining genetic relationships between a sperm provider, oocyte provider, and the respective conceptus
WO2020257709A1 (en) * 2019-06-21 2020-12-24 Coopersurgical, Inc. Systems and methods for determining pattern of inheritance in embryos
JP2022537444A (en) * 2019-06-21 2022-08-25 クーパーサージカル・インコーポレイテッド Systems, computer program products and methods for determining genetic patterns in embryos
JP2022537445A (en) * 2019-06-21 2022-08-25 クーパーサージカル・インコーポレイテッド Systems, computer program products and methods for determining genetic relationships between sperm donors, oocyte donors and their respective conceptuses
AU2020296108B2 (en) * 2019-06-21 2023-08-03 Coopersurgical, Inc. Systems and methods for determining pattern of inheritance in embryos
AU2020296188B2 (en) * 2019-06-21 2023-08-24 Coopersurgical, Inc. System and method for determining genetic relationships between a sperm provider, oocyte provider, and the respective conceptus
JP7333838B2 (en) 2019-06-21 2023-08-25 クーパーサージカル・インコーポレイテッド Systems, computer programs and methods for determining genetic patterns in embryos
JP7362789B2 (en) 2019-06-21 2023-10-17 クーパーサージカル・インコーポレイテッド Systems, computer programs and methods for determining genetic relationships between sperm donors, oocyte donors and their respective conceptuses

Also Published As

Publication number Publication date
CN106029899A (en) 2016-10-12
CN105555970B (en) 2020-06-05
CN105555970A (en) 2016-05-04
HK1221745A1 (en) 2017-06-09
CN106029899B (en) 2021-08-03
WO2015043278A1 (en) 2015-04-02

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 13894841; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase
32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC, FORM 1205A DATED 17-08-2016)
122 Ep: PCT application non-entry in European phase (Ref document number: 13894841; Country of ref document: EP; Kind code of ref document: A1)