WO2015042980A1

WO2015042980A1 - Method, system, and computer-readable medium for determining snp information in a predetermined chromosomal region

Info

Publication number: WO2015042980A1
Application number: PCT/CN2013/084783
Authority: WO
Inventors: 李剑; 张现东; 李金良; 刘赛军; 叶敏兰
Original assignee: 深圳华大基因科技有限公司
Priority date: 2013-09-30
Filing date: 2013-09-30
Publication date: 2015-04-02
Also published as: CN106029899A; CN105555970B; CN105555970A; HK1221745A1; CN106029899B; WO2015043278A1

Abstract

The present invention provides a method, system, and computer-readable medium for determining SNP information in a predetermined chromosomal region. The method for determining SNP information in a predetermined chromosomal region comprises: constructing a sequencing library for at least one part of a chromosome; using a probe to screen said sequencing library; the probe specifically identifying at least one of the known SNP sites in said predetermined region so as to obtain a target capture fragment, said target capture fragment including the SNP site; sequencing the screened sequencing library so as to obtain sequencing results; and determining on the basis of the sequencing results the SNP information in said predetermined region.

Description

Method, system and computer readable medium for determining SNP information in a predetermined region of a chromosome

Priority information

None

Technical field

The present invention relates to the field of biomedicine and, in particular, to a method, system and computer readable medium for determining SNP information in a predetermined region of a chromosome.

Background art

The World Health Organization's 2012 Global Birth Defect Prevention Report shows that the global incidence of birth defects is 3%, with 3.2 million birth defects per year, of which 270,000 newborns die. Studies have shown that most birth defects are related to genetic factors, with chromosomal abnormalities and monogenic genetic diseases being two important causes. Monogenic genetic diseases are of many types with differing incidence rates, and most of them cannot be cured, which places a heavy economic and psychological burden on society and on families. Therefore, preventing the conception and reducing the birth of children with monogenic genetic diseases is the focus of the prevention and control of hereditary birth defects. Preimplantation genetic diagnosis (PGD) can block the occurrence and transmission of genetic diseases at the source and moves the prevention of birth defects forward to the embryonic stage. However, preimplantation diagnosis of monogenic diseases has not been widely applied, and so far only thousands of cases have been reported worldwide. The main reason is that the specimen is very small (only 1 to 2 cells), which easily leads to allele dropout (ADO) and contamination and makes detection difficult, so that existing detection technology cannot fully meet the clinical requirements of preimplantation diagnosis of monogenic diseases.

Preimplantation haplotype analysis is the main method for detecting monogenic diseases before implantation. This method determines the mutant haplotype by detecting the mutation site together with multiple STRs (or SNPs) linked to it, which reduces the effects of preferential allelic amplification, ADO and contamination. Multiplex fluorescent PCR (MF-PCR) is the most commonly used technique based on this method. Because fluorescent PCR is highly sensitive, and because multiple linked STRs are combined for haplotype analysis of the mutation site, it was once considered the gold standard for preimplantation diagnosis of monogenic diseases. However, this method uses too few linkage markers, and in individual clinical cases no linkage marker may be available at all. Therefore, a pre-test is needed before each clinical test to find and select suitable molecular markers for the patient. In addition, the linkage markers used in MF-PCR are often far from the pathogenic site, which carries a risk of misdiagnosis due to chromosomal recombination events.

SNP-array analyzes SNP loci across the whole genome, with a high SNP density and a large number of SNPs. The advantage of this method is that it is suitable for haplotype analysis of all samples, and no pre-test is required to select molecular markers for individual samples. In addition, the chip can detect multiple diseases at the same time. However, the chip can only detect indirectly, through haplotype analysis; it cannot directly detect the disease-causing site.

Thus, methods for determining SNP information in chromosomes, particularly in predetermined regions of the embryonic chromosome, have yet to be improved.

Summary of the invention

The present invention aims to solve at least one of the technical problems existing in the prior art. The present invention aims to propose a method for efficiently determining SNP information in a chromosome, particularly a predetermined region of an embryonic chromosome.

In one aspect of the invention, the invention proposes a method of determining SNP information in a predetermined region of a chromosome. According to an embodiment of the invention, the method comprises: constructing a sequencing library for at least a portion of a chromosome; screening the sequencing library with a probe, wherein the probe specifically recognizes at least one of the known SNP sites in the predetermined region so as to obtain a target capture fragment, the target capture fragment comprising a SNP site; sequencing the screened sequencing library to obtain a sequencing result; and determining SNP information in the predetermined region based on the sequencing result. With this method, SNP information in a predetermined region of a chromosome, for example information on mutation sites associated with a pathogenic gene of a sample, can be determined efficiently and accurately; further, the information can be effectively used to determine whether the genetic state of a subject is normal, carrier or affected, thereby providing a basis for clinical disease detection or treatment.

In another aspect of the invention, the invention also provides a method of determining SNP information in a predetermined region of an embryonic chromosome. According to an embodiment of the present invention, the method comprises: acquiring a whole genome of the embryo; and determining, for the whole genome of the embryo, SNP information in a predetermined region of the embryo chromosome according to the method for determining SNP information in a predetermined region of a chromosome as described above. This method can effectively and accurately determine SNP information in a predetermined region of an embryo chromosome; further, the information can be effectively used to determine whether the embryo's genetic state is normal, carrier or affected. It can therefore provide a basis for preimplantation monogenic disease detection, prenatal diagnosis of pregnant women or clinical disease treatment.

In still another aspect of the present invention, the present invention also provides an apparatus for determining SNP information in a predetermined region of a chromosome. According to an embodiment of the invention, the apparatus comprises: a library construction device adapted to construct a sequencing library for at least a portion of a chromosome; a library screening device coupled to the library construction device and adapted to screen the sequencing library with a probe, wherein the probe specifically recognizes at least one of the known SNP sites in the predetermined region so as to obtain a target capture fragment, the target capture fragment comprising a SNP site; a sequencing device coupled to the library screening device and adapted to sequence the screened sequencing library to obtain a sequencing result; and an analysis device coupled to the sequencing device and adapted to determine SNP information in the predetermined region based on the sequencing result. With the apparatus of the present invention, the above-described method for determining SNP information in a predetermined region of a chromosome can be effectively implemented, so that SNP information in a predetermined region of a chromosome, for example information on mutation sites associated with a pathogenic gene of a sample, can be determined efficiently and accurately; in turn, the information can be effectively used to determine whether a subject's genetic status is normal, carrier or affected, thereby providing a basis for clinical disease detection or treatment.

In still another aspect of the invention, the invention also proposes a system for determining SNP information in a predetermined region of an embryonic chromosome. According to an embodiment of the invention, the system comprises: a first whole genome acquisition device adapted to acquire a whole genome of the embryo; and a SNP information determining device connected to the first whole genome acquisition device and adapted to determine SNP information in a predetermined region of the embryo chromosome, wherein the SNP information determining device is the apparatus for determining SNP information in a predetermined region of a chromosome as described above. With the system of the present invention, the above-described method of determining SNP information in a predetermined region of an embryonic chromosome can be efficiently implemented, so that SNP information in a predetermined region of the embryo chromosome can be effectively determined; further, the information can be effectively used to determine whether the genetic state of the embryo is normal, carrier or affected, which can provide a basis for preimplantation monogenic disease detection, prenatal diagnosis of pregnant women or clinical disease treatment.

In another aspect of the invention, the invention also provides a computer readable medium. According to an embodiment of the invention, the computer readable medium stores instructions adapted to be executed by a processor to determine SNP information in a predetermined region of a chromosome based on a sequencing result, wherein the sequencing result is obtained through the following steps: constructing a sequencing library for at least a portion of a chromosome; screening the sequencing library with a probe, wherein the probe specifically recognizes at least one of the known SNP sites in the predetermined region so as to obtain a target capture fragment, the target capture fragment comprising a SNP site; and sequencing the screened sequencing library to obtain the sequencing result. With the computer readable medium of the present invention, SNP information in a predetermined region of a chromosome, for example information on mutation sites associated with a pathogenic gene of a sample, can be efficiently determined; further, the information can be effectively used to determine whether a subject's genetic status is normal, carrier or affected, thus providing a basis for clinical disease detection or treatment. When the at least a portion of the chromosome is a whole genome of an embryo, the computer readable medium stores instructions adapted to be executed by a processor to determine, for the whole genome of the embryo, SNP information in a predetermined region of the embryo's chromosome.

In still another aspect of the invention, the invention also proposes an apparatus for determining SNP information in a predetermined region of a chromosome. According to an embodiment of the invention, the apparatus comprises: a sequencing device; and the aforementioned computer readable medium storing instructions adapted to be executed by a processor to determine SNP information in a predetermined region of the chromosome based on the sequencing result. The apparatus of the present invention can accurately and efficiently determine SNP information in a predetermined region of a chromosome, for example information on mutation sites associated with a pathogenic gene of a sample; further, the information can be effectively used to determine whether the genetic state of a subject is normal, carrier or affected, which can provide a basis for clinical disease detection or treatment.

In still another aspect of the invention, the invention also provides a system for determining SNP information in a predetermined region of an embryonic chromosome. According to an embodiment of the invention, the system comprises: a sequencing device; and the aforementioned computer readable medium storing instructions adapted to be executed by a processor to determine, for the whole genome of the embryo, SNP information in a predetermined region of the embryo chromosome. The system of the present invention can accurately and efficiently determine SNP information in a predetermined region of an embryonic chromosome; further, the information can be effectively used to determine whether the genetic state of the embryo is normal, carrier or affected, thereby providing a basis for preimplantation monogenic disease detection, prenatal diagnosis of pregnant women or clinical disease treatment.

It should be noted that the above-mentioned means for determining SNP information in a predetermined region of a chromosome, which are based on the high-throughput target region capture sequencing technology provided by the present invention, have at least the following advantages over the prior art:

1. The haplotype analysis method of the present invention can not only indirectly detect a target site, but also directly detect a target site.

2. The SNP loci selected in the present invention are concentrated within the 1 Mb range around the target gene; their high density and tight linkage greatly improve the sensitivity and accuracy of SNP information detection in the target region and can reduce the detection cost.

3. The invention concentrates multiple target detection sites on one chip and, based on the obtained SNP information, can detect multiple mutations of various diseases simultaneously without designing a separate experimental scheme for each person, which shortens the detection cycle and reduces the cost of testing.

4. The invention uses a chip comprising a plurality of target detection sites to detect a plurality of samples simultaneously, which greatly improves the detection throughput and provides strong technical support for the large-scale application of PGD in the future.

5. The method of the present invention, in addition to being capable of being used for single-gene genetic disease detection, is capable of simultaneously performing HLA typing and aneuploidy detection, and realizing multiple tests of a single sample, and providing personalized services for related IVF patients.

The additional aspects and advantages of the invention will be set forth in part in the description which follows.

Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the description of the embodiments in conjunction with the following drawings, in which:

Figure 1 shows a flow chart of an analysis of embryo haplotypes in accordance with one embodiment of the present invention;

Figure 2 is a schematic diagram showing a method of determining distinguishing SNPs according to an embodiment of the present invention;

Figure 3 shows the results of 2100 detection of a constructed library in accordance with one embodiment of the present invention;

Figure 4 shows a simulation of a haplotype construction in accordance with one embodiment of the present invention;

Figure 5 is a schematic flow chart showing analysis of embryo haplotype and embryo genetic condition according to one embodiment of the present invention;

Figure 6 is a flow chart showing a method of determining SNP information in a predetermined region of a chromosome according to an embodiment of the present invention;

Figure 7 is a flow chart showing a method of determining SNP information in a predetermined region of an embryonic chromosome according to an embodiment of the present invention;

Figure 8 is a schematic structural diagram of an apparatus for determining SNP information in a predetermined region of a chromosome according to an embodiment of the present invention;

Figure 9 is a schematic structural diagram of a system for determining SNP information in a predetermined region of an embryonic chromosome according to an embodiment of the present invention.

Detailed description of the invention

The embodiments of the present invention are described in detail below, and the examples of the embodiments are illustrated in the drawings, wherein the same or similar reference numerals are used to refer to the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are intended to be illustrative of the invention and are not to be construed as limiting.

It should be noted that the terms "first" and "second" are used for descriptive purposes only, and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, features defining "first", "second" may explicitly or implicitly include one or more of the features. Further, in the description of the present invention, "multiple" means two or more unless otherwise stated.

Method

In one aspect of the invention, the invention proposes a method of determining SNP information in a predetermined region of a chromosome. According to an embodiment of the present invention, referring to FIG. 6, the method includes:

Constructing a sequencing library for at least a portion of a chromosome

According to an embodiment of the invention, the at least a portion of the chromosome is a whole genome of embryonic cells obtained by whole genome amplification. The method of whole genome amplification is not particularly limited; according to some specific examples of the present invention, whole genome amplification is performed by at least one method selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA and MDA. Thereby, a small number of embryonic cells can be efficiently amplified, yielding a larger amount of embryonic whole genome sample.

Screening the sequencing library with a probe to obtain a target capture fragment

According to an embodiment of the invention, the probe specifically recognizes at least one of the known SNP sites in the predetermined region so as to obtain a target capture fragment, the target capture fragment comprising a SNP site. According to an embodiment of the invention, the predetermined region comprises a target gene region and a SNP-marker region. The target gene region comprises at least a portion of the exons and exon-adjacent regions of the gene associated with the target disease, wherein an exon-adjacent region comprises the 50 bp upstream of the 5' end of the exon and the 50 bp downstream of the exon; the SNP-marker region comprises the 1 Mb upstream and the 1 Mb downstream of the target gene. Thereby, the influence of genetic recombination is effectively reduced in the screening process: the probability of recombination between the target gene region and the SNP-marker region can be reduced to one in ten thousand, which ensures the accuracy of subsequent detection.
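By way of illustration only, the region definitions above may be sketched in Python as follows. The function names, the 1-based inclusive coordinate convention and all coordinates used with them are hypothetical, and the "1 M" flank is read here as 1 Mb; this is a sketch under those assumptions, not the probe design procedure of the invention.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive; an assumed convention

EXON_FLANK_BP = 50            # exon-adjacent region stated in the text
MARKER_FLANK_BP = 1_000_000   # "1 M" in the text, read here as 1 Mb (assumption)


def exon_target_intervals(exons: List[Interval], flank: int = EXON_FLANK_BP) -> List[Interval]:
    """Pad every exon by `flank` bp on both sides and merge overlapping intervals."""
    padded = sorted((max(1, start - flank), end + flank) for start, end in exons)
    merged: List[Interval] = []
    for start, end in padded:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def marker_region(gene_start: int, gene_end: int, flank: int = MARKER_FLANK_BP) -> Interval:
    """Return the SNP-marker region: the gene extended by `flank` bp on each side."""
    return (max(1, gene_start - flank), gene_end + flank)


def snps_in_region(snp_positions: List[int], region: Interval) -> List[int]:
    """Keep the known SNP positions that fall inside `region`."""
    low, high = region
    return [pos for pos in snp_positions if low <= pos <= high]
```

For a hypothetical gene spanning positions 2,000,000 to 2,050,000, `marker_region` returns the interval from 1,000,000 to 3,050,000, and known SNP positions inside that interval would be candidates for probe coverage.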

According to an embodiment of the invention, the probe has a length of 20 to 200 nt. Preferably, the length of the probe is 60 to 80 nt. Thereby, the capture efficiency of the target SNPs can be effectively improved. According to an embodiment of the invention, the probe is provided in the form of a chip. By using a chip that can include a plurality of target detection sites, multiple mutations of various diseases can be detected simultaneously without designing a separate experimental scheme for each person, which shortens the detection cycle and reduces the detection cost; the chip can also detect multiple samples at the same time, which greatly improves the detection throughput.

Sequencing the screened sequencing library to obtain a sequencing result

According to an embodiment of the invention, the sequencing is performed using at least one selected from the group consisting of the Illumina HiSeq 2000, Genome Analyzer and MiSeq sequencing systems, the Life Technologies SOLiD and Ion Torrent sequencing systems, and the Roche 454 sequencing system. Thereby, the efficiency and throughput of sequencing can be effectively improved.

Determining SNP information in the predetermined region based on the sequencing result

According to an embodiment of the present invention, determining SNP information in the predetermined region based on the sequencing result further comprises: aligning the sequencing result to a reference sequence to obtain uniquely aligned sequences; and using SNP analysis software to obtain SNP information in the predetermined region from the uniquely aligned sequences. According to an embodiment of the present invention, the alignment is performed using the BWA software package, so that the alignment can be achieved quickly and accurately. According to an embodiment of the invention, after the uniquely aligned sequences are obtained, the method further comprises removing PCR duplicate sequences from the uniquely aligned sequences, which facilitates the subsequent SNP analysis. The kind of SNP analysis software that can be employed is not particularly limited. According to some embodiments of the invention, the SNP analysis software is at least one selected from the group consisting of SAMtools and GATK. Thereby, SNP analysis can be performed quickly and accurately.
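By way of illustration only, the two clean-up steps named above (keeping uniquely aligned reads, then removing PCR duplicates) may be sketched on toy alignment records as follows. The `Read` record, the use of a mapping-quality cut-off as a stand-in for "uniquely aligned", and the rule that reads sharing chromosome, start position and strand are duplicates are all simplifying assumptions for illustration; the sketch does not reproduce the behavior of BWA, SAMtools or GATK.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Read:
    """Toy alignment record (hypothetical; not a BAM/SAM parser)."""
    name: str
    chrom: str
    pos: int      # leftmost aligned position
    strand: str   # "+" or "-"
    mapq: int     # mapping quality reported by the aligner


def unique_deduplicated(reads: Iterable[Read], min_mapq: int = 1) -> List[Read]:
    """Keep confidently placed reads, then keep one read per (chrom, pos, strand).

    Assumption: a read with mapq below `min_mapq` is treated as not uniquely aligned.
    Among duplicates, the read with the highest mapq is kept (first seen wins ties).
    """
    best: Dict[Tuple[str, int, str], Read] = {}
    for read in reads:
        if read.mapq < min_mapq:
            continue  # ambiguous placement: drop
        key = (read.chrom, read.pos, read.strand)
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    return sorted(best.values(), key=lambda r: (r.chrom, r.pos, r.strand))
```

In practice this step would be delegated to established alignment post-processing tools; the sketch only makes the order of operations explicit.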

According to an embodiment of the invention, the method further comprises filtering the obtained SNP information. According to some embodiments of the present invention, the filtering removes any SNP that satisfies one of the following conditions: the sequencing depth of the SNP is less than 10×, preferably less than 20×; or, for a heterozygous SNP, the difference between the sequencing depths of the two bases is above 20%, preferably above 10%, more preferably above 5%. Thus, the filtered SNP information is accurate and reliable. It should be noted that, in theory, the higher the sequencing depth, the closer the depth ratio of the two bases of a heterozygous SNP is to 1:1. The specific values of the sequencing depth and of the depth difference in the filtering conditions depend on the sample, the sequencing depth and the sequencing quality at the time of implementation, and can be adjusted according to actual needs. In one embodiment of the present invention, the individuals genetically related to the embryo are sequenced to a depth of 50×, the embryo sample is sequenced to a depth of 100×, and the sequencing quality is good. To ensure that the remaining SNPs agree with the actual SNPs, strict filtering is applied: SNPs with a depth below 10× are removed, as are heterozygous SNPs whose two bases differ in sequencing depth by more than 10%, which removes a large number of heterozygous SNPs. Understandably, with higher-depth sequencing (>100×), stricter filtering can be used to ensure the accuracy of the remaining SNPs, for example removing SNPs below 20× and heterozygous SNPs with a difference of more than 5%. Conversely, for relatively low-depth sequencing data, the filter can be set to remove heterozygous SNPs with a difference above 20%.
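By way of illustration only, the filtering conditions described above may be sketched as follows. The text does not define how the depth difference of a heterozygous SNP is computed; the sketch assumes it is the absolute difference between the two allele depths divided by their sum. The `SnpCall` record and the `keep_snp` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    """Toy SNP record (hypothetical): read depth supporting each of two alleles."""
    pos: int
    depth_a: int  # reads supporting the first base
    depth_b: int  # reads supporting the second base (0 for a homozygous call)


def keep_snp(call: SnpCall, min_depth: int = 10, max_het_diff: float = 0.10) -> bool:
    """Return True if the SNP passes the depth and allele-balance filters.

    Assumption: the depth difference of a heterozygous SNP is
    |depth_a - depth_b| / (depth_a + depth_b).
    """
    total = call.depth_a + call.depth_b
    if total < min_depth:
        return False  # too shallow to trust
    is_heterozygous = call.depth_a > 0 and call.depth_b > 0
    if is_heterozygous:
        difference = abs(call.depth_a - call.depth_b) / total
        if difference > max_het_diff:
            return False  # the two bases are too unbalanced
    return True
```

The defaults correspond to the 10× and 10% example in the text; the stricter setting mentioned for higher-depth data would be `keep_snp(call, min_depth=20, max_het_diff=0.05)`.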

The inventors have found that the method for determining SNP information in a predetermined region of a chromosome of the present invention can efficiently and accurately determine SNP information in a predetermined region of a chromosome, for example information on mutation sites related to a pathogenic gene of a sample; further, the information can be effectively used to determine whether a subject's genetic status is normal, carrier or affected, thereby providing a basis for clinical disease detection or treatment.

In another aspect of the invention, the invention also provides a method of determining SNP information in a predetermined region of an embryonic chromosome. According to an embodiment of the present invention, the method comprises: acquiring a whole genome of the embryo; and determining, for the whole genome of the embryo, SNP information in a predetermined region of the embryo chromosome according to the method for determining SNP information in a predetermined region of a chromosome as described above.

According to still another embodiment of the present invention, referring to FIG. 7, the method for determining SNP information in a predetermined region of an embryonic chromosome of the present invention specifically includes the following steps: acquiring a whole genome of the embryo; constructing a sequencing library for the whole genome of the embryo; screening the sequencing library with a probe to obtain a target capture fragment; sequencing the screened sequencing library to obtain a sequencing result; and determining, based on the sequencing result, SNP information in a predetermined region of the embryo chromosome. This method can effectively and accurately determine SNP information in a predetermined region of an embryonic chromosome; further, the information can be effectively used to determine whether the genetic state of the embryo is normal, carrier or affected. It can therefore provide a basis for preimplantation monogenic disease detection, prenatal diagnosis of pregnant women or clinical disease treatment.

According to an embodiment of the invention, the whole genome of the embryo is obtained by whole genome amplification of embryonic cells. The specific method of whole genome amplification is not particularly limited. According to some specific examples of the present invention, whole genome amplification is performed by at least one method selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA and MDA. Thereby, a small number of embryonic cells can be efficiently amplified, yielding a larger amount of embryonic whole genome sample.

According to an embodiment of the present invention, the method for determining SNP information in a predetermined region of an embryonic chromosome of the present invention further comprises:

First, whole genomes of individuals genetically related to the embryo are obtained, wherein the genetically related individuals include the father, the mother and a proband of the embryo. It should be noted that the term "proband" as used herein refers to a patient who has been diagnosed as carrying the disease-causing gene and who exhibits the symptoms of the disease, and who is genetically related to the aforementioned embryo; the proband may be an embryo or fetus, or an individual after birth.

Next, based on the whole genomes of the genetically related individuals, the father's SNP information, the mother's SNP information and the proband's SNP information are determined, respectively.

Next, distinguishing SNPs are determined based on the SNP information of the father and the SNP information of the mother. It should be noted that the term "distinguishing SNP" as used herein refers to a base that can effectively distinguish a parental haplotype: among the four bases carried by the two parents at a given (autosomal) position, one base differs from the other bases at that position, so that this base identifies exactly one of the four parental haplotypes. For example, if the parental genotypes at a position are AA and AG, then the G base is a distinguishing SNP, because G at this position identifies a single haplotype, whereas A is present in the other three haplotypes and cannot identify a unique haplotype. Figure 2 shows a schematic diagram of the method for determining parental distinguishing SNPs based on Mendelian genetic principles.

Next, a father SNP haplotype and a mother SNP haplotype are determined based on the distinguishing SNPs and the SNP information of the proband. That is, based on the distinguishing SNPs and the proband's SNPs, a first father haplotype and a second father haplotype, and a first mother haplotype and a second mother haplotype, are constructed for the two chromosomes corresponding to the predetermined region in the father's genome and in the mother's genome, respectively, for use in the subsequent determination of the embryo haplotypes. The father SNP haplotype comprises the first father haplotype and the second father haplotype, the mother SNP haplotype comprises the first mother haplotype and the second mother haplotype, and the first father haplotype, the second father haplotype, the first mother haplotype and the second mother haplotype are composed of the distinguishing SNPs.

According to an embodiment of the present invention, the parental SNP haplotypes can be constructed according to the Mendelian genetic principle and the law of linkage and crossing-over, by combining the parental SNP loci with the proband's SNP information; the construction principle is shown in FIG. 4. A SNP haplotype consists of the bases at the distinguishing SNP positions; each haplotype contains a plurality of distinguishing SNPs, and the distinguishing SNPs in a haplotype distinguish it from the other haplotypes. For example, if the parental genotypes at a certain position are AA and AG, then G is a distinguishing SNP and A is a non-distinguishing SNP, and both A and G are bases that make up the haplotypes. Since the two haplotypes of the proband are inherited from the parents, the haplotype on which the mutation is located can be determined according to the disease. For a dominant genetic disease in which the father is affected and the mother is normal, the haplotype that the proband inherited from the father is the haplotype carrying the disease-causing mutation; for a recessive genetic disease in which both parents are carriers, both haplotypes of the (affected) proband are haplotypes carrying the disease-causing mutation. Thus, based on the distinguishing SNPs and the SNP information of the proband, the father SNP haplotype and the mother SNP haplotype can be effectively determined, and, based on the SNP information of the embryo, the father SNP haplotype and the mother SNP haplotype, the SNP haplotype of the embryo can be efficiently determined.
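By way of illustration only, the determination of distinguishing SNPs at a single autosomal position, and their assignment to a parental haplotype using the proband, may be sketched as follows. Genotypes are written as two-letter strings; the function names are hypothetical. The sketch reads "distinguishing" as "a base that occurs exactly once among the four parental bases", and it ignores recombination and genotyping errors.

```python
from collections import Counter
from typing import List, Tuple


def distinguishing_bases(father_gt: str, mother_gt: str) -> List[Tuple[str, str]]:
    """Return (parent, base) pairs for bases occurring exactly once among the four
    parental bases at one position, e.g. father "AA", mother "AG" -> [("mother", "G")].
    """
    counts = Counter(father_gt + mother_gt)
    found = []
    for parent, genotype in (("father", father_gt), ("mother", mother_gt)):
        for base in genotype:
            if counts[base] == 1:
                found.append((parent, base))
    return found


def phase_with_proband(father_gt: str, mother_gt: str,
                       proband_gt: str) -> List[Tuple[str, str, str]]:
    """Assign each distinguishing base to the parental haplotype that was, or was
    not, transmitted to the proband.

    A distinguishing base exists in only one parental haplotype, so if the proband
    carries it, it lies on the haplotype that parent transmitted to the proband;
    otherwise it lies on that parent's other haplotype.
    """
    phased = []
    for parent, base in distinguishing_bases(father_gt, mother_gt):
        status = "transmitted" if base in proband_gt else "untransmitted"
        phased.append((parent, base, status))
    return phased
```

Repeating this over all filtered SNP positions in the predetermined region, and grouping the bases by parent and transmission status, would yield the four parental haplotypes in the sense used above; which of them carries the mutation then follows from the inheritance pattern of the disease as described in the text.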

Then, based on the SNP information of the embryo, the father SNP haplotype and the mother SNP haplotype, the combination of father SNP haplotype and mother SNP haplotype carried by the embryo is determined so as to obtain the SNP haplotype of the embryo. That is, the SNP haplotype of the embryo in the predetermined region of the embryo chromosome is determined based on the SNP information of the embryo and the aforementioned first father haplotype, second father haplotype, first mother haplotype and second mother haplotype. According to an embodiment of the invention, the SNP haplotype of the embryo is obtained by: determining the father haplotype that is significantly supported by the SNP information of the embryo as the paternal haplotype of the embryo; and determining the mother haplotype that is significantly supported by the SNP information of the embryo as the maternal haplotype of the embryo. According to an embodiment of the present invention, a number of supporting distinguishing SNPs of not less than 10 is an indication of significant support. Specifically, since the two haplotypes of the embryo are inherited one from each parent, the combination of the two haplotypes of the embryo can be analyzed according to the embryo's SNP information, as shown in FIG. 4. In the analysis, the number of distinguishing SNPs can be counted and the embryo haplotype determined from the count; the specific process is shown in FIG. 5. According to an embodiment of the present invention, if the number of distinguishing SNPs supporting a haplotype is greater than 10, that haplotype can be determined to be one of the haplotypes of the embryo; if the number of distinguishing SNPs supporting a haplotype is less than 4, the apparent support for that haplotype can be judged to be caused by SNP errors.

According to some specific examples of the present invention, in order to ensure accuracy, the number of supporting SNPs for a correct haplotype is set to be no less than 10, and the number of supporting SNPs for an incorrect haplotype is set to be no more than 3. Because the SNP filtering conditions set previously are stringent, the SNPs used in haplotype construction have a high accuracy, and the number of candidate SNPs is large. The test data show that the number of SNPs supporting a correct haplotype is much higher than 10, and the number of SNPs supporting an incorrect haplotype is generally zero. According to some embodiments of the present invention, it has been verified that, for an autosomal disease, exactly two haplotypes satisfying the requirements are obtained per embryo by the method of the present invention; for an X chromosome disease, one (male) or two (female) haplotypes satisfying the requirements are obtained.

Thereby, the SNP haplotype of the embryo can be accurately and efficiently determined, and the genetic state of the embryo can be effectively determined. That is, based on the constructed parental haplotypes, the method can effectively determine whether the embryo has inherited a pathogenic haplotype from a parent, and thereby judge whether the genetic state of the embryo is normal, carrier or affected.

Equipment and systems

In still another aspect of the present invention, the present invention also provides an apparatus for determining SNP information in a predetermined region of a chromosome. According to an embodiment of the present invention, referring to FIG. 8, the apparatus 1000 includes a library construction device 100, a library screening device 200, a sequencing device 300, and an analysis device 400. According to an embodiment of the invention, the library construction device 100 is adapted to construct a sequencing library for at least a portion of a chromosome; the library screening device 200 is connected to the library construction device 100 and is adapted to screen the sequencing library with a probe, wherein the probe specifically recognizes at least one of the known SNP sites in the predetermined region so as to obtain a target capture fragment, the target capture fragment comprising the SNP site; the sequencing device 300 is connected to the library screening device 200 and is adapted to sequence the screened sequencing library to obtain a sequencing result; and the analysis device 400 is connected to the sequencing device 300 and is adapted to determine SNP information in the predetermined region based on the sequencing result. With the apparatus 1000 of the present invention, the above-described method for determining SNP information in a predetermined region of a chromosome can be effectively implemented, thereby enabling efficient and accurate determination of SNP information in a predetermined region of a chromosome, for example, mutation site information of a pathogenic gene of a sample. This information, in turn, can be effectively used to determine whether the genetic state of the subject is normal, carrier or affected, thereby providing a basis for clinical disease detection or treatment.

According to an embodiment of the invention, the predetermined region comprises a target gene region and a SNP-marker region. According to an embodiment of the invention, the target gene region comprises at least a portion of the exons and exon-adjacent regions of a gene associated with the target disease. According to an embodiment of the present invention, the exon-adjacent region comprises the 50 bp upstream of the 5′ end of the exon and the 50 bp downstream of the exon; the SNP-marker region comprises the range of 1 Mb upstream and downstream of the target gene.
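By way of non-limiting illustration, the region definition above can be sketched in Python as follows. The description does not prescribe any programming language; the function names and the coordinate convention (1-based, inclusive) are hypothetical, and the "1 M" of the text is read here as 1 Mb, which is an assumption.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive (start, end); a convention assumed here

EXON_FLANK_BP = 50            # exon-adjacent region stated in the text
MARKER_FLANK_BP = 1_000_000   # SNP-marker range; "1 M" read as 1 Mb (assumption)


def padded_exons(exons: List[Interval], flank: int = EXON_FLANK_BP) -> List[Interval]:
    """Extend every exon by `flank` bp on both sides, never going below position 1."""
    return [(max(1, start - flank), end + flank) for start, end in exons]


def marker_windows(gene_start: int, gene_end: int,
                   flank: int = MARKER_FLANK_BP) -> List[Interval]:
    """Return the upstream and downstream SNP-marker windows around the gene."""
    windows: List[Interval] = []
    if gene_start > 1:  # there is sequence upstream of the gene
        windows.append((max(1, gene_start - flank), gene_start - 1))
    windows.append((gene_end + 1, gene_end + flank))
    return windows
```

For a hypothetical gene spanning positions 2,000,000 to 2,100,000, `marker_windows` yields one window on each side of the gene, each 1 Mb long.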

According to an embodiment of the invention, the probe has a length of 20 to 200 nt. Preferably, the length of the probe is 60 to 80 nt. According to an embodiment of the invention, the probe is provided in the form of a chip.

According to an embodiment of the present invention, the apparatus further comprises a chromosome preparation device (not shown), which is connected to the library construction device 100 and is adapted to obtain the whole genome of an embryonic cell by whole genome amplification, the whole genome of the embryonic cell constituting the at least a portion of the chromosome. According to an embodiment of the invention, the chromosome preparation device is adapted to perform the whole genome amplification by at least one selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA and MDA. According to an embodiment of the present invention, the apparatus further comprises a DNA extraction device (not shown), which is connected to the library construction device 100 and is adapted to extract DNA from the peripheral blood of an organism so as to obtain the at least a portion of the chromosome.

According to an embodiment of the invention, the sequencing device 300 is at least one selected from the group consisting of Illumina Hiseq 2000, Genome Analyzer, Miseq sequencing series, Life technologies' SOLiD sequencing system, Ion Torrent sequencing system and Roche's 454 sequencing system.

According to an embodiment of the present invention, the analysis device 400 further includes: an alignment unit adapted to align the sequencing result with a reference sequence to obtain uniquely aligned sequences; and a SNP information acquiring unit, which is connected to the alignment unit and is adapted to acquire SNP information in the predetermined region from the uniquely aligned sequences by using SNP analysis software. According to an embodiment of the invention, the alignment unit is adapted to perform the alignment using the BWA software package. According to an embodiment of the invention, the analysis device 400 further comprises a unit adapted to remove PCR duplicate sequences from the uniquely aligned sequences. According to an embodiment of the invention, the SNP analysis software is at least one selected from the group consisting of SAMtools and GATK.

According to an embodiment of the invention, the analysis device 400 further comprises a unit adapted to filter the obtained SNP information. According to an embodiment of the invention, the filtering removes any SNP that satisfies one of the following conditions: the SNP sequencing depth is less than 10X, preferably less than 20X; and the difference in sequencing depth between the two bases of a heterozygous SNP is higher than 20%, preferably higher than 10%, more preferably higher than 5%.

It should be noted that each device of the apparatus can implement the corresponding step of the method of the present invention for determining SNP information in a predetermined region of a chromosome, and that the foregoing description of the advantages and effects of that method also applies to the apparatus and is not repeated here.

In still another aspect of the invention, the invention also proposes a system for determining SNP information in a predetermined region of an embryonic chromosome. According to an embodiment of the present invention, referring to FIG. 9, the system 10000 includes a first whole genome acquisition device 2000 and a SNP information determining device 1000. The first whole genome acquisition device 2000 is adapted to acquire the whole genome of the embryo; the SNP information determining device 1000 is connected to the first whole genome acquisition device 2000 and is used for determining SNP information in a predetermined region of the fetal chromosome, wherein the SNP information determining device 1000 is the apparatus 1000 for determining SNP information in a predetermined region of a chromosome as described above. With the system 10000 of the present invention, the above-described method of determining SNP information in a predetermined region of a chromosome can be efficiently implemented, thereby enabling effective and accurate determination of SNP information in a predetermined region of an embryonic chromosome. Further, the information can be effectively used to determine whether the genetic state of the fetus is normal, carrier or affected, which can provide a basis for preimplantation monogenic disease detection, prenatal diagnosis of pregnant women, or clinical disease treatment.

According to an embodiment of the invention, the first whole genome acquisition device 2000 is adapted to obtain the whole genome of the embryo by whole genome amplification of the embryonic cells. According to an embodiment of the present invention, the first whole genome acquisition device 2000 is adapted to obtain the whole genome of the embryo by at least one selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.

According to an embodiment of the present invention, the system 10000 further includes: a second whole genome acquisition device (not shown), adapted to acquire the whole genomes of embryo genetically related individuals, wherein the embryo genetically related individuals include the father, the mother and a proband of the embryo; a distinguishing SNP determining device (not shown), adapted to determine distinguishing SNPs based on the SNP information of the father and the SNP information of the mother; a first haplotype determining device (not shown), adapted to determine the father SNP haplotype and the mother SNP haplotype based on the distinguishing SNPs and the SNP information of the proband; and a second haplotype determining device (not shown), adapted to determine, based on the SNP information of the embryo, the father SNP haplotype and the mother SNP haplotype, the combination of father and mother SNP haplotypes carried by the embryo, so as to obtain the SNP haplotype of the embryo.

According to an embodiment of the present invention, the second haplotype determining device further comprises: a unit for determining the father haplotype that is significantly supported by the SNP information of the embryo as the paternally derived haplotype of the embryo; and a unit for determining the mother haplotype that is significantly supported by the SNP information of the embryo as the maternally derived haplotype of the embryo. According to an embodiment of the invention, support by no fewer than 10 distinguishing SNPs is an indication of significant support.

It should be noted that each device included in the above system can implement the corresponding step of the method of the present invention for determining SNP information in a predetermined region of a chromosome, and that the foregoing description of the advantages and effects of the method for determining SNP information in a predetermined region of an embryonic chromosome also applies to this system and is not repeated here.

Computer readable medium

In another aspect of the invention, the invention also provides a computer readable medium. According to an embodiment of the invention, the computer readable medium stores instructions adapted to be executed by a processor to determine SNP information in a predetermined region of a chromosome based on a sequencing result. It will be understood that, when the program is executed, all or part of the steps of determining the SNP information in the predetermined region of the chromosome, including the predetermined region of an embryonic chromosome, may be performed by instructing the related hardware, and that the computer readable medium may include a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The sequencing result is obtained by: constructing a sequencing library for at least a part of the chromosome; screening the sequencing library with a probe, wherein the probe specifically recognizes at least one of the known SNP sites in the predetermined region so as to obtain a target capture fragment, the target capture fragment comprising the SNP site; and sequencing the screened sequencing library to obtain the sequencing result.

According to an embodiment of the invention, the predetermined region comprises a target gene region and a SNP-marker region. According to an embodiment of the invention, the target gene region comprises at least a portion of the exons and exon-adjacent regions of the gene associated with the target disease. According to an embodiment of the present invention, the exon-adjacent region comprises the range of 50 bp upstream and downstream of the exon, and the SNP-marker region comprises the range of 1 Mb upstream and downstream of the target gene. According to an embodiment of the invention, the probe has a length of 20 to 200 nt. Preferably, the length of the probe is 60 to 80 nt. According to an embodiment of the invention, the probe is provided in the form of a chip.

According to an embodiment of the invention, at least a portion of the chromosome is a whole genome of embryonic cells obtained by whole genome amplification. According to an embodiment of the invention, whole genome amplification is performed by at least one of PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.

According to an embodiment of the invention, at least a portion of the chromosome is obtained by DNA extraction of peripheral blood of the organism.

According to an embodiment of the invention, the sequencing is performed using at least one selected from the group consisting of Illumina Hiseq 2000, Genome Analyzer, Miseq sequencing series, Life technologies' SOLiD sequencing system, Ion Torrent sequencing system and Roche's 454 sequencing system.

According to an embodiment of the present invention, determining the SNP information in the predetermined region based on the sequencing result further comprises: aligning the sequencing result with a reference sequence to obtain uniquely aligned sequences; and acquiring the SNP information in the predetermined region from the uniquely aligned sequences using SNP analysis software. According to an embodiment of the invention, the alignment is performed using the BWA software package. According to an embodiment of the invention, after the uniquely aligned sequences are obtained, PCR duplicate sequences are further removed from them. According to an embodiment of the present invention, the SNP analysis software is at least one selected from the group consisting of SAMtools and GATK. According to an embodiment of the invention, the obtained SNP information is further filtered. According to an embodiment of the invention, the filtering removes any SNP that satisfies one of the following conditions: the SNP sequencing depth is less than 10X, preferably less than 20X; and the difference in sequencing depth between the two bases of a heterozygous SNP is higher than 20%, preferably higher than 10%, more preferably higher than 5%. It should be noted that, in theory, the higher the sequencing depth, the closer the depth ratio of the two bases of a heterozygous SNP is to 1:1. The specific values of the sequencing depth and of the depth difference used in the SNP filtering conditions depend on the samples, the sequencing depth and the sequencing quality at the time of implementation, and can be adjusted according to actual needs. In one embodiment of the present invention, the embryo genetically related individuals are sequenced to a depth of 50X, the embryo samples to a depth of 100X, and the sequencing quality is good. So that the remaining SNPs are accurate and agree with the actual SNPs, strict filtering is applied: SNPs with a depth below 10X are filtered out, as are heterozygous SNPs whose depth difference is greater than 10%, which removes a large number of heterozygous SNPs. Understandably, with higher-depth sequencing (>100X), stricter filtering can be applied to ensure that the remaining SNPs are true and accurate, for example filtering out SNPs below 20X and heterozygous SNPs with a difference of more than 5%. Conversely, for relatively low-depth sequencing data, the filter can be set to remove heterozygous SNPs with a difference higher than 20%.

According to an embodiment of the invention, at least a portion of the chromosome is a whole genome of an embryo such that SNP information in a predetermined region of the fetal chromosome is determined for the whole genome of the fetus.

Thus, in accordance with an embodiment of the present invention, the instructions are further adapted to be executed by a processor to: acquire the whole genomes of embryo genetically related individuals, wherein the embryo genetically related individuals comprise the father, the mother and a proband of the embryo; determine the SNP information of the father, the SNP information of the mother and the SNP information of the proband based on the whole genomes of the embryo genetically related individuals; determine distinguishing SNPs based on the SNP information of the father and the SNP information of the mother; determine the father SNP haplotype and the mother SNP haplotype based on the distinguishing SNPs and the SNP information of the proband; and determine, based on the SNP information of the embryo, the father SNP haplotype and the mother SNP haplotype, the combination of father and mother SNP haplotypes carried by the embryo, so as to obtain the SNP haplotype of the embryo. According to an embodiment of the present invention, the SNP haplotype of the embryo is obtained by: determining the father haplotype that is significantly supported by the SNP information of the embryo as the paternally derived haplotype of the embryo; and determining the mother haplotype that is significantly supported by the SNP information of the embryo as the maternally derived haplotype of the embryo. According to an embodiment of the present invention, support by no fewer than 10 distinguishing SNPs is an indication of significant support.

In still another aspect of the invention, the invention also proposes an apparatus for determining SNP information in a predetermined region of a chromosome. According to an embodiment of the invention, the apparatus comprises: a sequencing device; and the aforementioned computer readable medium, which stores instructions adapted to be executed by a processor to determine SNP information in a predetermined region of the chromosome based on the sequencing result. The apparatus of the present invention can accurately and efficiently determine SNP information in a predetermined region of a chromosome, for example, information on mutation sites of a pathogenic gene of a sample. Further, the information can be effectively used to determine whether the genetic state of a subject is normal, carrier or affected, which can provide a basis for clinical disease detection or treatment. When the at least a portion of the chromosome is the whole genome of an embryo, the computer readable medium stores instructions adapted to be executed by a processor to determine, for the whole genome of the fetus, SNP information in a predetermined region of the fetal chromosome.

In still another aspect of the invention, the invention also provides a system for determining SNP information in a predetermined region of an embryonic chromosome. According to an embodiment of the invention, the system comprises: a sequencing device; and the aforementioned computer readable medium, which stores instructions adapted to be executed by a processor to determine, for the whole genome of the fetus, SNP information in a predetermined region of the fetal chromosome. The system of the invention can accurately and effectively determine the SNP information in the predetermined region of the embryonic chromosome. Further, the information can be effectively used to determine whether the genetic state of the fetus is normal, carrier or affected, thereby providing a basis for preimplantation monogenic disease detection of embryos, prenatal diagnosis of pregnant women, or clinical disease treatment.

It should be noted that the advantages and effects of the computer readable medium of the present invention described above apply equally to the above-described apparatus for determining SNP information in a predetermined region of a chromosome and to the system for determining SNP information in a predetermined region of an embryonic chromosome, and are not repeated here.

The solution of the present invention will be explained below in conjunction with examples. Those skilled in the art will appreciate that the following examples merely illustrate the invention and are not to be considered as limiting its scope. Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature in the field (for example, J. Sambrook et al., Molecular Cloning: A Laboratory Manual, Third Edition, translated by Huang Peitang et al., Science Press) or the product manual. Reagents or instruments whose manufacturer is not specified are conventional, commercially available products, for example, from Illumina.

General method

Referring to Figure 1, the main steps in the following embodiments are as follows:

1. Design probes based on the target area, custom capture chips

The capture chip designed in the present invention comprises two parts: a target gene region and a SNP-marker region. The target gene region consists mainly of exons and exon-intron junction regions; it covers most pathogenic mutations and can be used for direct detection of disease mutations. The SNP-marker region consists of the regions upstream and downstream of the target gene region and contains thousands of high-frequency SNPs (that is, SNPs with a frequency greater than 0.3 in the 1000 Genomes database). This region is used to detect parental distinguishing SNPs, which are combined with the SNP information of the family's proband to construct the haplotypes of the pathogenic gene. Genetic recombination between homologous chromosomes during meiosis can affect the SNP haplotype of the gene. The smaller the distance between SNP markers, the lower the recombination rate; when the distance is less than 1 Mb, the recombination rate is less than 1% (the human recombination rate is about 1% per 1 Mb). The extent of the SNP-marker region captured by the chip can therefore be determined from the general recombination rate of the human genome. If a small range upstream and downstream of the target gene region is selected, the captured SNPs are accurate but few in number; if a large range is selected, more SNPs are captured, but the probability of recombination within the range is higher, and the design and synthesis costs are also relatively high. In one embodiment of the present invention, in order to reduce the influence of genetic recombination and ensure detection accuracy, the SNP-marker region is limited to 1 Mb upstream and downstream of the target gene, thereby reducing the probability of recombination between the target gene region and the SNP-marker region to one in ten thousand.
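The relationship between marker distance and recombination rate stated above can be illustrated with a short Python sketch. It assumes a purely linear rate of 1% per Mb (reading the "1M" of the text as 1 Mb) capped at 50%; real recombination rates vary along the genome, and the function name is hypothetical.

```python
RECOMBINATION_PER_MB = 0.01  # "1% per 1M" from the text, read as 1% per Mb (assumption)


def approx_recombination_fraction(distance_bp: int) -> float:
    """Linear approximation of the recombination fraction between two loci.

    The value is capped at 0.5, the fraction expected for unlinked loci.
    """
    if distance_bp < 0:
        raise ValueError("distance must be non-negative")
    return min(0.5, distance_bp / 1_000_000 * RECOMBINATION_PER_MB)
```

Under this approximation a marker 1 Mb from the gene has a recombination fraction of about 1%, and any marker closer than that has a smaller one.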

1.1 Target gene capture chip design

First, the target gene is determined; then, using hg19 as the reference sequence, the position of the target gene is determined and the capture region is finalized.

1.2 SNP-marker capture chip design

Based on the target gene position determined in 1.1, SNP loci with a high frequency in the population are selected within 1 Mb upstream and downstream of that position. Placing the selected SNP site in the middle of the target capture fragment increases the probability that the SNP is captured. In one embodiment of the present invention, since the constructed library has an insert size of about 200 bp, the fragments captured by the capture probes are mainly about 200 bp long; therefore, in order to improve the capture efficiency of the target SNPs, the SNP-marker capture region consists of these SNP sites together with about 100 bp on each side (so that each selected SNP lies at the midpoint of the approximately 200 bp fragment).
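As an illustration only, the SNP-centered capture regions described above might be derived as in the following Python sketch. The function name and the merging of overlapping windows are assumptions made for the example and are not taken from the description.

```python
from typing import Iterable, List, Tuple


def snp_capture_windows(snp_positions: Iterable[int],
                        half_width: int = 100) -> List[Tuple[int, int]]:
    """Center a window of +/- half_width bp on each SNP and merge overlapping windows.

    Positions are 1-based; windows are inclusive (start, end) pairs.
    """
    windows: List[Tuple[int, int]] = []
    for pos in sorted(set(snp_positions)):
        start, end = max(1, pos - half_width), pos + half_width
        if windows and start <= windows[-1][1] + 1:
            # Overlaps or touches the previous window: extend it.
            windows[-1] = (windows[-1][0], end)
        else:
            windows.append((start, end))
    return windows
```

With the default of 100 bp on each side, an isolated SNP sits at the midpoint of a window of about 200 bp, matching the fragment size mentioned in the text.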

1.3 Chip Evaluation

After the chip design is completed, the specificity of the probes is evaluated with the Sequence Search and Alignment by Hashing Algorithm (SSAHA), and the chip is synthesized once it passes the evaluation.

2. Family sample preparation

Embryonic cells were collected, and whole genome amplification (WGA) of the embryonic cell genome was performed by PEP-PCR, DOP-PCR, OmniPlex WGA or MDA (multiple displacement amplification); DNA was extracted from the peripheral blood of the parents and the proband (or, depending on the type of disease, from samples collected from other members of the family).

3. Library preparation

According to the sequencing requirements of the selected sequencing platform (Illumina Hiseq2000, Genome Analyzer, Miseq sequencing series, Life technologies' SOLiD sequencing system, Ion Torrent sequencing system or Roche's 454 sequencing system), libraries were constructed separately from the peripheral blood DNA of the above parents and proband and from the WGA products of the embryonic cell genomes. After library construction, 2100 and Q-PCR checks and enrichment were carried out.

4. Probe capture hybridization

The libraries obtained above were pooled, and the pooled library was hybridized with the designed capture probes; the hybridization followed the technical protocol provided by the chip synthesis service company.

5. High-throughput sequencing

Sequencing was performed using Illumina Hiseq2000, Genome Analyzer, Miseq sequencing series, Life technologies' SOLiD sequencing system, Ion Torrent sequencing system or Roche's 454 sequencing system.

6. Data analysis

Referring to Figure 1, the analysis process includes:

6.1. Reference sequence alignment

According to the requirements of the sequencing platform used, low-quality sequencing data are filtered out and sequences containing library adapters are removed. The sequencing data are then aligned to the human reference genome using alignment software such as the BWA (Burrows-Wheeler Aligner) software package with the default optimized parameters (-1 -i 15 -L -k 2 -1 31 -t 4). From the alignment results, the reads aligned to the chip target region are selected, and PCR duplicate sequences are removed with SAMtools; the remaining reads are used for subsequent analysis.
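The read selection described above can be pictured with the following simplified Python sketch. It does not reproduce how BWA or SAMtools work internally: the `Read` record, the use of mapping quality as a proxy for unique alignment, and the rule of keeping the highest-quality read per start position and strand are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    name: str
    chrom: str
    pos: int            # 1-based leftmost aligned position
    strand: str         # "+" or "-"
    mapq: int           # mapping quality reported by the aligner
    quality_sum: int    # sum of base qualities, used to pick one duplicate


Target = Tuple[str, int, int]  # (chrom, start, end), inclusive


def on_target(read: Read, targets: List[Target]) -> bool:
    """True if the read start lies inside any target interval."""
    return any(c == read.chrom and s <= read.pos <= e for c, s, e in targets)


def select_reads(reads: Iterable[Read], targets: List[Target],
                 min_mapq: int = 1) -> List[Read]:
    """Keep on-target reads with mapq >= min_mapq, one per (chrom, pos, strand)."""
    best: Dict[Tuple[str, int, str], Read] = {}
    for read in reads:
        if read.mapq < min_mapq or not on_target(read, targets):
            continue
        key = (read.chrom, read.pos, read.strand)
        if key not in best or read.quality_sum > best[key].quality_sum:
            best[key] = read
    return sorted(best.values(), key=lambda r: (r.chrom, r.pos, r.strand))
```

Reads sharing the same chromosome, start position and strand are treated as PCR duplicates here, which is a common simplification for single-end data.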

6.2. SNP calling

The valid data obtained are analyzed with SNP analysis software such as SAMtools and GATK to obtain all SNP information in the target region.

6.3. SNP filtering

The SNPs obtained above are filtered under certain conditions to improve their accuracy. SNPs meeting either of the following conditions are filtered out: 1. the SNP sequencing depth is less than 10X; 2. the difference in sequencing depth between the two bases of a heterozygous SNP is higher than 10%. This is because a low sequencing depth may cause one of the two bases of some heterozygous SNPs to go undetected, and a heterozygous SNP whose two bases differ greatly in depth cannot be reliably distinguished from a sequencing error. Filtering under the above conditions removes potentially erroneous SNPs.
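A minimal Python sketch of these two filtering conditions is given below. The text does not define how the depth difference is computed; the sketch assumes it is the absolute difference between the two allele depths divided by their sum, and the function name and input layout are hypothetical.

```python
from typing import Dict


def passes_snp_filter(genotype: str, allele_depths: Dict[str, int],
                      min_depth: int = 10, max_het_diff: float = 0.10) -> bool:
    """Return True if a SNP call survives the depth and allele-balance filters.

    genotype      two-letter genotype such as "AG" or "AA"
    allele_depths number of reads supporting each observed base
    """
    total_depth = sum(allele_depths.values())
    if total_depth < min_depth:
        return False  # condition 1: sequencing depth too low
    first, second = genotype
    if first != second:  # heterozygous SNP: check the balance of the two bases
        d1 = allele_depths.get(first, 0)
        d2 = allele_depths.get(second, 0)
        if d1 + d2 == 0:
            return False
        # Assumed definition of the "difference in sequencing depth".
        if abs(d1 - d2) / (d1 + d2) > max_het_diff:
            return False  # condition 2: the two bases are too unbalanced
    return True
```

The stricter or looser settings mentioned elsewhere in the description (for example 20X with 5%, or a 20% difference for low-depth data) correspond to other values of `min_depth` and `max_het_diff`.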

6.4. Screening for bases that can effectively distinguish the parental haplotypes (i.e., distinguishing SNPs)

A distinguishing SNP is a base that, at a given (autosomal) position, differs from the other three of the parents' four bases at that position, so that it identifies exactly one of the four parental haplotypes. For example, if the parental genotypes at a certain position are AA and AG, the G base is a distinguishing SNP, because G identifies a single haplotype at this position, whereas A is present in the other three haplotypes and cannot identify a single haplotype. A specific example is shown in Figure 2. Following the scheme shown in the figure, the parental distinguishing SNPs can be selected according to Mendel's laws of inheritance.

6.5. Building parental haplotypes
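Before the parental haplotypes are built, the distinguishing-SNP rule of step 6.4 can be sketched in Python as follows. The function name and the representation of genotypes as two-letter strings are assumptions made for the example.

```python
from collections import Counter
from typing import Dict


def distinguishing_bases(father_gt: str, mother_gt: str) -> Dict[str, str]:
    """Return {base: parent} for every base seen exactly once among the four parental alleles."""
    counts = Counter(father_gt + mother_gt)
    found: Dict[str, str] = {}
    for parent, genotype in (("father", father_gt), ("mother", mother_gt)):
        for base in genotype:
            if counts[base] == 1:
                found[base] = parent
    return found
```

For the example in the text, `distinguishing_bases("AA", "AG")` returns `{"G": "mother"}` when the second genotype is taken to be the mother's; a position where both parents are AG yields an empty result, since neither base identifies a single haplotype.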

According to Mendel's laws of inheritance and the law of linkage and crossing-over, the parental SNP locus information and the proband's SNP information are combined to construct the parental SNP haplotypes. The construction principle is shown in Figure 4: first, the parental SNP locus information and the proband's SNP information are combined to construct the parental haplotypes according to the basic Mendelian laws and the law of linkage and crossing-over; then the parental haplotype results are combined with the embryo's SNP information to infer the embryo haplotypes. In Figure 4, the base letters marked in red indicate the father's distinguishing SNPs; the base letters marked in yellow indicate the mother's SNPs; the italicized and underlined base letters indicate sites at which ADO occurred during WGA; G* indicates the pathogenic mutant base; and -- indicates a site at which detection failed. Each SNP haplotype consists entirely of bases at distinguishing SNP positions and contains a plurality of distinguishing SNPs, by which it can be told apart from the other haplotypes. For example, if the parental genotypes at a certain position are AA and AG, G is a distinguishing SNP and A is a non-distinguishing SNP, and A and G are the bases of the respective haplotypes at that position. Since the two haplotypes of the proband are inherited one from each parent, the haplotype carrying the pathogenic mutation can be determined from the disease status. For a dominant genetic disease in which the father is affected and the mother is normal, the haplotype that the proband inherited from the father is the haplotype carrying the pathogenic mutation; for a recessive genetic disease in which both parents are carriers, both haplotypes of the (affected) proband carry a pathogenic mutation.
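The construction step can be illustrated with the following Python sketch, which ignores recombination, ADO and genotyping errors. For each parent it records the distinguishing bases on the haplotype passed to the proband and on the other haplotype; the names `to_proband` and `other`, and the input layout, are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (position, father genotype, mother genotype, proband genotype)
Site = Tuple[int, str, str, str]
Haplotypes = Dict[str, Dict[str, Dict[int, str]]]


def phase_parents(sites: List[Site]) -> Haplotypes:
    """Assign each distinguishing base to the parental haplotype that carries it.

    A base that is unique among the four parental alleles and is also seen in
    the proband must lie on the haplotype that parent passed to the proband;
    otherwise it lies on that parent's other haplotype.
    """
    haps: Haplotypes = {
        "father": {"to_proband": {}, "other": {}},
        "mother": {"to_proband": {}, "other": {}},
    }
    for pos, father_gt, mother_gt, proband_gt in sites:
        counts = Counter(father_gt + mother_gt)
        for parent, genotype in (("father", father_gt), ("mother", mother_gt)):
            for base in genotype:
                if counts[base] != 1:
                    continue  # not a distinguishing base
                key = "to_proband" if base in proband_gt else "other"
                haps[parent][key][pos] = base
    return haps
```

In the recessive case described above, where both parents are carriers and the proband is affected, the two `to_proband` haplotypes would be the ones carrying the pathogenic mutations.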

6.6. Analysis of embryo haplotypes

Since the two haplotypes of the embryo are inherited one from each parent, the embryo's SNP information can be analyzed against the parental SNP haplotypes to determine which combination of two haplotypes the embryo carries; the analysis principle is shown in Fig. 4. In the analysis, the number of distinguishing SNPs supporting each haplotype is counted, and the embryo haplotypes are determined from these counts, as shown in Fig. 5. If the number of distinguishing SNPs supporting a haplotype is greater than 10, that haplotype can be determined to be one of the haplotypes of the embryo; if the number is less than 4, the apparent support for that haplotype can be attributed to SNP errors. In an embodiment of the invention, in order to ensure accuracy, a correct haplotype is required to be supported by no fewer than 10 SNPs, and an incorrect haplotype by no more than 3 SNPs. Because the SNP filtering conditions set in step 6.3 are stringent, the SNPs used in haplotype construction have a high accuracy, and the number of candidate SNPs is large. The actual test data indicate that the number of SNPs supporting a correct haplotype is far greater than 10, while the number supporting an incorrect haplotype is usually 0. For an autosomal disease, this analysis yields exactly 2 haplotypes per embryo; for an X chromosome disease, it yields one (male) or two (female) haplotypes satisfying the requirements.
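The counting rule can be sketched in Python as follows. Each parental haplotype is represented by the distinguishing bases it carries, which is an assumed data layout; the label "undetermined" for counts between 4 and 9 is likewise an assumption, since the text does not state how such counts are handled.

```python
from typing import Dict

ACCEPT_MIN = 10  # a correct haplotype is supported by no fewer than 10 SNPs
ERROR_MAX = 3    # support by no more than 3 SNPs is attributed to SNP errors


def support_counts(embryo_gt: Dict[int, str],
                   haplotypes: Dict[str, Dict[int, str]]) -> Dict[str, int]:
    """Count, per parental haplotype, the distinguishing bases found in the embryo.

    Positions missing from embryo_gt (for example failed sites) give no support.
    """
    return {
        name: sum(1 for pos, base in markers.items() if base in embryo_gt.get(pos, ""))
        for name, markers in haplotypes.items()
    }


def classify_support(count: int) -> str:
    """Translate a support count into a call for one parental haplotype."""
    if count >= ACCEPT_MIN:
        return "inherited"
    if count <= ERROR_MAX:
        return "not inherited"
    return "undetermined"  # assumed handling of intermediate counts
```

Applying `classify_support` to the counts of the two paternal and the two maternal haplotypes then gives the combination carried by the embryo.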

6.7. Analysis of results

The genetic state of the embryo is judged to be normal, carrier or affected according to whether the embryo has inherited a pathogenic haplotype from its parents.
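As a hedged illustration of this final judgment, the following Python sketch maps the number of inherited pathogenic haplotypes to a genetic state under simple Mendelian assumptions. The function name, the mode labels and the state labels are hypothetical, and special situations such as recombination within the region are not considered.

```python
def genetic_state(n_pathogenic: int, mode: str, sex: str = "") -> str:
    """Map the number of inherited pathogenic haplotypes to a genetic state."""
    if n_pathogenic not in (0, 1, 2):
        raise ValueError("expected 0, 1 or 2 pathogenic haplotypes")
    if n_pathogenic == 0:
        return "normal"
    if mode == "autosomal_recessive":
        return "carrier" if n_pathogenic == 1 else "affected"
    if mode == "autosomal_dominant":
        return "affected"
    if mode == "x_linked_recessive":
        if sex == "male":
            return "affected"  # a male has a single X chromosome
        if sex == "female":
            return "carrier" if n_pathogenic == 1 else "affected"
        raise ValueError("sex must be 'male' or 'female' for an X-linked disease")
    raise ValueError(f"unknown inheritance mode: {mode}")
```

For an X-linked recessive disease, for instance, a male embryo with the pathogenic maternal haplotype is reported as affected and a female embryo with one such haplotype as a carrier.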

Example 1

In this example, the general method and detection procedure described above were applied to samples from a classic phenylketonuria family (family one, autosomal recessive inheritance) and a Duchenne muscular dystrophy (DMD) family (family two, X chromosome recessive inheritance). The couple in family one obtained 7 embryos by IVF, and the PAH gene was tested by the MF-PCR method; 2 normal embryos were selected for implantation, and a baby girl was finally born, whom umbilical cord blood genetic testing confirmed to be normal. The couple in family two obtained 9 embryos by IVF, and PGD of the DMD gene was carried out by the MF-PCR method; three normal embryos were identified, two of which were implanted, and a male baby was finally born (one of the two embryos did not develop); umbilical cord blood genetic testing confirmed that the baby was normal.

The family is the same as the parent, the sick daughter (proband) peripheral blood and 7 embryo blastomere single cells. According to the PAH gene test, the father is a carrier of PAH gene R243Q (c.728G>A), the mother is a carrier of PAH gene V399V (C.1197A>T) mutation, and the proband is PAH gene R243Q (c.728G>A ) Compound mutation with V399V (C.1197A>T), which is characterized by phenylketonuria. Seven embryo blastomere single cells (labeled Ell, E12, E13, E14, E15, E16, E17, respectively) were tested by multiplex PCR after WGA. The results are shown in Table 1.

Table 1 MF-PCR results of one embryo in the family

The second family sample included parents, daughter (normal phenotype) peripheral blood and 9 embryo blastomere single cells. After the DMD gene test, the father was normal, and the mother and daughter were carriers of the DMD gene R2905X (c. 8713C>T). Nine embryonic blastomeres (labeled E21, E22, E23, E24, E25, E26, E27, E28, E29) were tested by multiplex PCR after WGA. The results are shown in Table 2.

Table 2 Results of MF-PCR of two embryos in family 2

Sample Test result

E21 Female, normal

E22 Female, R2905X (c.8713C>T) carrier

E23 Male, R2905X (c.8713C>T) mutation

E24 Female, R2905X (c.8713C>T) carrier

E25 Male, R2905X (c.8713C>T) mutation

E26 Female, normal

E27 Female, R2905X (c.8713C>T) carrier

E28 Male, normal

E29 Male, R2905X (c.8713C>T) mutation

The above samples were retrospectively tested using the technical scheme and detection procedure of the present invention, and the results obtained were consistent with the MF-PCR results, with a concordance rate of 100%. The results show that the technique of the present invention can accurately detect the SNP information of a predetermined region of the embryo chromosome and, based on the obtained SNP information, determine the embryo genotype to guide embryo implantation, with the advantages of a short detection period (11 days), high throughput and low cost. The specific implementation is as follows:

1. Sample extraction and WGA (1 day)

DNA was extracted from the peripheral blood of the parents and probands using the QIAamp DNA Blood Midi Kit (Qiagen) according to the instructions and measured with Nanodrop; the concentrations were greater than 30 ng/μL. The 7 embryo blastomere single cells were subjected to whole-genome amplification using the REPLI-g® Single Cell WGA kit (Qiagen) according to the instructions. The products were checked by agarose gel electrophoresis and quantified with Qubit. The samples are labeled: F1, M1, P1, E11, E12, E13, E14, E15, E16, E17, F2, M2, P2, E21, E22, E23, E24, E25, E26, E27, E28, E29.

2. Illumina Hiseq library construction (2 days)

The DNA samples and WGA products obtained above were first fragmented to about 200 bp with a Covaris™ instrument, and libraries were then constructed according to the requirements of the Illumina® HiSeq2000™ sequencer. The specific steps are as follows:

2.1 Sample fragmentation

A total of 3 μg of the 22 genomic DNA samples and WGA products was fragmented on a Covaris S2 (Covaris) using Covaris microTubes with AFA fiber and Snap-Cap. The fragmentation conditions are as follows:

After fragmentation, the products were purified with the Qiagen DNA Purification Kit (Qiagen) and dissolved in 327.5 μL of EB.

2.2 End repair

37.5 μL of the purified product was taken for the end-repair reaction, and the system was as follows (reagents were purchased from Enzymatics):

Previous product 37.5 μL

10x Polynucleotide Kinase Buffer (B904) 5 μL

dNTP Solution Set (10 mM each) 2 μL

T4 DNA polymerase 2.5 μL

T4 polynucleotide kinase 2.5 μL

Klenow Fragment 0.5 μL

The reaction conditions were: incubation on a Thermomixer at 20 °C for 30 min.

The reaction product was recovered and purified with the Qiagen DNA Purification Kit and dissolved in 32 μL of EB.

2.3 Addition of A to the 3' end

An A base was added to the 3' end of the DNA, and the system was as follows (reagents were purchased from Enzymatics):

The reaction conditions were: incubation on a Thermomixer at 37 °C for 30 min.

The reaction product was recovered and purified with the Qiagen DNA Purification Kit (Qiagen) and dissolved in 38 μL of EB.

2.4 Ligation of Illumina Hiseq adaptors

A different library tag was added to each of the 22 libraries, and the correspondence between library tags and libraries was recorded. The system was as follows (reagents were purchased from Illumina):

The reaction conditions were: incubation on a Thermomixer at 16 °C for 16 h.

The reaction product was purified with 60 μL of Ampure Beads (Beckman Coulter Genomics) and dissolved in a volume of 20 μL.

2.5 After the library was constructed, the range distribution of the fragments was determined by Agilent® Bioanalyzer 2100. The results are shown in Figure 3. The results of library concentration detected by real-time PCR (QPCR) are shown in Table 3:

Table 3 Relative library concentrations determined by QPCR

Sample Library number QPCR concentration (nM)

F1 Library 1 66.14

M1 Library 2 53.62

P1 Library 3 47.35

E11 Library 4 76.30

E12 Library 5 53.77

E13 Library 6 90.65

E14 Library 7 78.46

E15 Library 8 47.86

E16 Library 9 71.87

E17 Library 10 51.92

F2 Library 11 60.54

M2 Library 12 63.42

P2 Library 13 57.65

E21 Library 14 67.35

E22 Library 15 54.76

E23 Library 16 70.66

E24 Library 17 75.26

E25 Library 18 57.14

E26 Library 19 72.07

E27 Library 20 56.91

E28 Library 21 71.87

E29 Library 22 61.94

3. Chip capture (3 days)

The above 22 libraries were divided into 2 groups of 11, and the libraries in each group were mixed in equal proportions to give 2 mixed libraries with a total amount of 500 ng. The mixed libraries were hybridized using NimbleGen's custom liquid-phase chip SeqCap EZ Choice XL Library (see the NimbleGen SeqCap EZ Exome Capture operating instructions for the specific procedure). After 72 hours of hybridization, elution was performed with the NimbleGen wash kit according to the instructions. The final eluted product was checked for enrichment and analyzed by QPCR and on the Bioanalyzer 2100.
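The pooling arithmetic can be illustrated with a short Python sketch. It assumes that the 500 ng total applies to each mixed library, that "equal proportions" means equal mass per library, and that the groups are formed in library-number order; none of these details is stated explicitly in the text, and the function names are hypothetical.

```python
from typing import List, Sequence


def split_into_groups(libraries: Sequence[str], group_size: int) -> List[List[str]]:
    """Split the libraries into consecutive groups of group_size."""
    return [list(libraries[i:i + group_size]) for i in range(0, len(libraries), group_size)]


def mass_per_library_ng(pool_total_ng: float, n_libraries: int) -> float:
    """Mass contributed by each library when a pool is mixed in equal proportions."""
    return pool_total_ng / n_libraries


libraries = [f"Library {i}" for i in range(1, 23)]        # Library 1 .. Library 22
groups = split_into_groups(libraries, 11)                 # 2 groups of 11 libraries
per_library = mass_per_library_ng(500.0, len(groups[0]))  # 500 / 11, about 45.5 ng
```

Under these assumptions each library contributes roughly 45.5 ng to its pool.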

4. Hiseq2500 sequencing (3 days)

The above hybridization products were sequenced on an Illumina® HiSeq2500™ sequencer in PE101 index mode (i.e., paired-end 101 bp index sequencing). The instrument parameters were set and the instrument was operated in accordance with the Illumina® operating manual (available at http://www.illumina.com/support/documentation.ilmn).

5. Analysis of results (2 days)

After sequencing is complete, the sequencing data are first filtered for quality and contaminated sequences are removed to obtain high-quality sequencing reads, on which the following analysis is performed:

5.1 Overall data evaluation

In the data analysis, the sequencing reads were aligned to the human reference genome (HG19, NCBI release GRCh37) using the alignment software BWA (version 0.5.10) with the parameters (-1 -i 15 -L -k 2 -l 31 -t 4). From the alignment results, reads uniquely aligned to the target region of the chip were selected, and PCR duplicates were removed with SAMtools before subsequent analysis. The amount of data obtained by sequencing is shown in Table 4.
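The read selection described above can be sketched in plain Python operating on SAM-format text lines. This is not the pipeline used in the example: it assumes that a mapping quality above 0 is an acceptable proxy for a unique alignment, that PCR duplicates have been flagged (FLAG bit 0x400) rather than already removed, and that target regions are given as 1-based inclusive intervals. The function names and the example coordinates are hypothetical.

```python
import re
from typing import Dict, List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by an alignment."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)


def keep_read(sam_line: str, targets: Dict[str, List[Tuple[int, int]]], min_mapq: int = 1) -> bool:
    """Return True for a mapped, non-duplicate, confidently placed read overlapping a target."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, chrom, pos = int(fields[1]), fields[2], int(fields[3])
    mapq, cigar = int(fields[4]), fields[5]
    if flag & 0x4 or flag & 0x400:  # unmapped, or marked as PCR/optical duplicate
        return False
    if mapq < min_mapq:             # MAPQ 0 usually indicates an ambiguous placement
        return False
    start, end = pos, pos + reference_span(cigar) - 1
    return any(s <= end and start <= e for s, e in targets.get(chrom, []))
```

A line such as "r1\t0\tchr12\t103237400\t37\t101M\t*\t0\t0\t*\t*" would be kept for a target interval containing position 103237426 on chr12, while the same read with FLAG 1024 or MAPQ 0 would be dropped.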

The peripheral blood samples of the parents and probands were sequenced to a depth of approximately 100x, and the embryonic cell WGA samples were sequenced to a depth of approximately 50x. SNP and indel analysis was then performed on each sample using the Genome Analysis Toolkit (GATK) software package to obtain the genotype of each sample. The genotypes of part of the gene regions are shown in Tables 5 and 6:

Table 5 Genotypes of part of the PAH gene region for each sample

Location Father Mother Proband E1 E2 E3 E4 E5 E6 E7

103075083 AC CC CC CC AC AC CC CC CC AC

103075442 AA AT AT AA AA AA AA AT AA AT

103075731 AA AT AA AT AT AT AT AA AT AA

103077486 CC CG CC CC CG CG CG CC CG CC

103099439 GG AG GG AG AG AG AG GG AG

103104834 TT AA AT AT AT AT AT AT AT AT

103106883 TT TG TT TG TG TG TG TT TG

103107367 GG TG TG GG GG TG GG TG

103110943 TC CC TC TC CC TC TC TC CC

103132740 AG AA AG AG AA AA AG AG AG AA

103140560 TT TC TC TT TT TT TT TC TT TC

103148974 TC TT CC TC TT TT TC TC TC TT

103152029 AC CC AC AC CC CC AC AC AC CC

103154308 AG AA AA AA AA AG AA AA AG

103164355 TC CC CC TC TC CC CC CC TC

103164544 AG AA AA AA AG AG AA AA AA AG

103174710 AC AA AA AA AC AC AA AA AA AC

103175259 CT CC CC CC CT CT CC CC CC CT

103176419 GC CC CC CC GC GC CC CC GC

103214192 CA AA AA AA CA CA AA AA AA CA

103237426 AA AT AT AA AA AA AT AA AT

103246707 GA GG GA GG GG GA GG

103246787 CG CC CG CG CC CC CG CG GG CC

103424228 TG TT TT TT TG TG TT TT TT TG

103425386 TG GG GG GG TG TG GG GG GG TG

103428340 AG AA AG AG AA AA AG AG AG AA

103428555 AA AG AA AG AG AG AG AA AG AA

103429407 GG TG GG TG TG TG GG TG GG

103432532 CC TC TC CC CC CC TC CC TC

103434254 AG AA AA AA AG AG AA AA AA AG

103443364 CT TT TT TT TT CT TT TT TT CT

103445655 CT CC CC CT CC CC CC CC CT

103448748 TC TT TC TC TT TT TC TC CC TT

103456084 AT AA AT AT AA AA AT AT TT AA

103456562 TT CT CT TT TT TT TT CT TT CT

103459335 CT TT TT TT CT CT TT TT TT CT

103460207 GT TT TT TT GT GT TT TT TT GT

103463741 AA AG AG AA AA AA AA AG AA AG

103488660 TT CC TC TC TC TC TT TC TC TC

103488841 CT TT TT TT CT CT TT TT TT CT

103491018 TG GG GG GG TG GG GG GG TG

103495380 AG GG GG GG AG AG GG GG GG

103496446 TT CT CT TT TT TT TT CT TT CT

103501101 AC AA AA AA AC AC AA AA AA AC

103501562 CC TC CC TC TC TC TC CC TC CC

103515016 TT AT TT AT AT AT AT TT AT TT

This SNP information corresponds to the antisense strand of the reference genome. "-" indicates that no SNP is available at that position (no data coverage or depth too low), and italics indicate disease-causing mutations. The coordinates 103237426 and 103246707 in the table correspond to the V399V (c.1197A>T) and R243Q (c.728G>A) sites in the PAH database, respectively. For ease of understanding, the antisense strand information of the two mutation sites has been converted to the representation of the corresponding sense strand.

Table 6 Genotypes of part of the DMD gene region for each sample

Location Father Mother Proband E21 E22 E23 E24 E25 E26 E27 E28 E29

31838359 T GT GT TT TG G TG G TT TG G

31859140 G AG GG AG GG G GG G AG GG A G

31859179 A AG AG AA AG G AG G AA AG A G

31860203 A AG AG AA AG G AG G AA AG A G

31863187 A AG AA AG AA A AA A AG AA G A

31863193 G AT AT GT AG A AG A GT AG T A

31863313 T TC TC TT TC C TC C TT TC TC

32889584 C TC CC TC CC C CC C TC CC TC

32889622 A AG AA AG AA A AA A AG AA G A

32889854 G AG GG AG GG G GG G AG GG A G

32890041 T GT TT TG TT T TT T TT G T

"-" indicates that no SNP is available at that position (no data coverage or depth too low), and italics indicate disease-causing mutations. The coordinate 32456388 in the table corresponds to the R2905X (c.8713C>T) site in the DMD database.

5.2 Parental haplotype construction

Parental haplotypes, including the haplotypes on which the disease-causing mutations are located, can be constructed from the SNP information of the parents and probands according to the method shown in Figure 4 above. Tables 7 and 8 show the haplotype construction for the PAH and DMD genes, respectively.

Table 7 Parental haplotype construction for the PAH gene region

Location Father Mother Proband F-Hap1 F-Hap2 M-Hap1 M-Hap2

103075083 AC CC CC C A C C

103075442 AA AT AT A A T A

103075731 AA AT AA A A A T

103077486 CC CG CC C C C G

103099439 GG AG GG G G G A

103104834 TT AA AT T T A A

103106883 TT TG TT T T T G

103107367 GG TG TG G G T G

103110943 TC CC TC T C C C

103132740 AG AA AG G A A A

103140560 TT TC TC T T C T

103148974 TC TT CC C T T T

103152029 AC CC AC A C C C

103154308 AG AA AA A G A A

103164355 TC CC CC C T C C

103164544 AG AA AA A G A A

103174710 AC AA AA A C A A

103175259 CT CC CC C T C C

103176419 GC CC CC C G C C

103214192 CA AA AA A C A A

103237426 AA AT AT A A T A

103246707 GA GG GA A G G G

103246787 CG CC CG G C C C

103424228 TG TT TT T G T T

103425386 TG GG GG G T G G

103428340 AG AA AG G A A A

103428555 AA AG AA A A A G

103429407 GG TG GG G G G T

103432532 CC TC TC C C T C

103434254 AG AA AA A G A A

103443364 CT TT TT T C T T

103445655 CT CC CC C T C C

103448748 TT TC TT T T T C

103456084 AA TA TA A A T A

103456562 TT CT CT T T C T

103459335 CT TT TT T C T T

103460207 GT TT TT T G T T

103463741 AA AG AG A A G A

103488660 TT CC TC T T C C

103488841 CT TT TT T C T T

103491018 TG GG GG G T G G

103495380 AG GG GG G A G G

103496446 TT CT CT T T C T

103501101 AC AA AA A C A A

103501562 CC TC CC C C C T

103515016 TT AT TT T T T A

In the table, F-Hap1 and F-Hap2 represent the father's two haplotypes, and M-Hap1 and M-Hap2 represent the mother's two haplotypes. This SNP information corresponds to the antisense strand of the reference genome. "-" indicates that no SNP is available (no data coverage or depth too low), and italics indicate disease-causing mutations. The coordinates 103237426 and 103246707 in the table correspond to the V399V (c.1197A>T) and R243Q (c.728G>A) sites in the PAH database, respectively. For ease of understanding, the antisense strand information of the two mutation sites has been converted to the representation of the corresponding sense strand.
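The rows of Table 7 are consistent with a simple per-site rule: the allele that the proband received from a parent is assigned to that parent's Hap1, and the parent's other allele to Hap2. The following Python sketch implements that rule under stated assumptions. It is not the method of Figure 4 itself; the function names are hypothetical, sites where both parents are heterozygous are left unresolved, and sites whose genotypes are not consistent with Mendelian inheritance return None.

```python
from typing import Optional, Tuple


def transmitted_allele(het_parent: str, hom_allele: str, proband: str) -> Optional[str]:
    """Allele the proband received from the heterozygous parent, or None if inconsistent."""
    alleles = list(proband)
    if hom_allele not in alleles:
        return None
    alleles.remove(hom_allele)  # one proband allele comes from the homozygous parent
    candidate = alleles[0]
    return candidate if candidate in het_parent else None


def phase_site(father: str, mother: str, proband: str) -> Optional[Tuple[str, str, str, str]]:
    """Return (F-Hap1, F-Hap2, M-Hap1, M-Hap2) for one SNP, or None if not resolvable."""
    f_het, m_het = father[0] != father[1], mother[0] != mother[1]
    if f_het and m_het:
        return None  # both parents heterozygous: not resolved in this sketch
    if not f_het and not m_het:
        if sorted(proband) != sorted(father[0] + mother[0]):
            return None
        return (father[0], father[0], mother[0], mother[0])
    if f_het:
        t = transmitted_allele(father, mother[0], proband)
        if t is None:
            return None
        other = father[1] if t == father[0] else father[0]
        return (t, other, mother[0], mother[0])
    t = transmitted_allele(mother, father[0], proband)
    if t is None:
        return None
    other = mother[1] if t == mother[0] else mother[0]
    return (father[0], father[0], t, other)
```

For the first row of Table 7 (father AC, mother CC, proband CC), phase_site("AC", "CC", "CC") gives ("C", "A", "C", "C"), which agrees with the tabulated haplotypes.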

Table 8 Parental haplotype construction for the DMD gene region

Location Father Mother Proband F-Hap M-Hap1 M-Hap2

32579849 C TC CC C C T

32580579 C TC TC C T C

32827465 A AG AG A G A

32858090 T TC TC T C T

32862539 G AG GG G G A

32886984 C CG CC C C G

32887091 T TC TT T T C

32887278 A AG AA A A G

32889584 C TC CC C C T

32889622 A AG AA A A G

32889854 G AG GG G G A

32890041 T GT TT T T G

In the table, F-Hap indicates the father's haplotype (a male has only one X chromosome), and M-Hap1 and M-Hap2 indicate the mother's two haplotypes. Italics indicate the pathogenic mutation. The coordinate 32456388 in the table corresponds to the R2905X (c.8713C>T) site in the DMD database.

5.3 Embryo haplotype analysis

According to the embryonic SNP information in Tables 5 and 6 and the parental haplotype information in Tables 7 and 8, the discriminating SNPs of each embryo were counted according to the method shown in Fig. 4. The haplotypes of each embryo were then determined from the number of SNPs supporting each haplotype, and these were used to determine whether the embryo is affected. For autosomes, an embryo has only 2 haplotypes, and generally only two haplotypes have SNP support; occasionally a third or fourth haplotype receives support, which is due to SNP errors, and such SNPs account for less than 5% of the total SNPs. Furthermore, because of ADO and sequencing errors, individual SNPs of the embryo may be lost or wrong. To avoid the impact of such errors on the results, a haplotype is required to be supported by at least 10 discriminating SNPs. The large amount of data in this embodiment shows that wrong haplotypes are supported by no more than 3 discriminating SNPs, while correct haplotypes are supported by more than 20 discriminating SNPs, indicating that individual errors do not affect the embryo haplotype judgment. Therefore, in order to ensure accurate results, the present invention defines the number of SNPs supporting a correct haplotype to be no less than 10 and the number of SNPs supporting a wrong haplotype to be no more than 3. The specific analysis process is shown in Figure 5, which shows the embryonic state analysis process for an autosomal recessive genetic disease in which the parental Hap1 is the haplotype carrying the disease-causing mutation. For individual embryos shown in the figure, some SNPs support a third haplotype, but the number of such SNPs is very small and does not affect the judgment of the results.
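The counting step can be sketched as follows. The sketch assumes that a discriminating SNP is a site at which exactly one parent is heterozygous, so that the allele the embryo received from that parent can be read off after removing the allele of the homozygous parent. This definition, the function names and the decision to skip sites that look like allele drop-out are assumptions of the sketch and are not taken from Fig. 4.

```python
from typing import Dict, Iterable, Optional, Tuple

# (F-Hap1, F-Hap2, M-Hap1, M-Hap2, embryo genotype) for one SNP
Site = Tuple[str, str, str, str, str]


def _other_allele(genotype: str, known: str) -> Optional[str]:
    """Allele left in a two-letter genotype after removing one copy of `known`."""
    alleles = list(genotype)
    if known not in alleles:
        return None  # e.g. allele drop-out of the homozygous parent's allele
    alleles.remove(known)
    return alleles[0]


def count_support(sites: Iterable[Site]) -> Dict[str, int]:
    """Count discriminating SNPs supporting each parental haplotype in one embryo."""
    support = {"F-Hap1": 0, "F-Hap2": 0, "M-Hap1": 0, "M-Hap2": 0}
    for f1, f2, m1, m2, embryo in sites:
        if len(embryo) != 2:
            continue  # missing or hemizygous genotype: not handled in this sketch
        if f1 != f2 and m1 == m2:      # father heterozygous, mother homozygous
            allele = _other_allele(embryo, m1)
            if allele == f1:
                support["F-Hap1"] += 1
            elif allele == f2:
                support["F-Hap2"] += 1
        elif m1 != m2 and f1 == f2:    # mother heterozygous, father homozygous
            allele = _other_allele(embryo, f1)
            if allele == m1:
                support["M-Hap1"] += 1
            elif allele == m2:
                support["M-Hap2"] += 1
    return support
```

Applied to four rows of Tables 5 and 7 for the second embryo column (positions 103075083, 103075442, 103075731 and 103132740), the sketch assigns two SNPs to F-Hap2 and two to M-Hap2. In practice the counts over all discriminating SNPs would then be compared with the thresholds of 10 and 3 stated above.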

The embryo status can be judged from the above analysis results, as shown in Table 9. This result is consistent with the results of the traditional MF-PCR method, with a concordance rate of 100%. The above process is completed automatically by purpose-developed software.

Table 9 Test results of each embryo

Sample Test result

E11 R243Q (c.728G>A) carrier

E12 Normal

E13 Normal

E14 R243Q (c.728G>A) carrier

E15 R243Q (c.728G>A) combined with V399V (c.1197A>T) mutation

E16 R243Q (c.728G>A) carrier

E17 V399V (c.1197A>T) carrier

E21 Female, normal

E22 Female, R2905X (c.8713C>T) carrier

E23 Male, R2905X (c.8713C>T) mutation

E24 Female, R2905X (c.8713C>T) carrier

E25 Male, R2905X (c.8713C>T) mutation

E26 Female, normal

E27 Female, R2905X (c.8713C>T) carrier

E28 Male, normal

E29 Male, R2905X (c.8713C>T) mutation

Industrial applicability

The method, system and computer readable medium of the present invention for determining SNP information in a predetermined region of a chromosome can be effectively used to determine SNP information in a predetermined region of a chromosome, such as a predetermined region of an embryonic chromosome. The information obtained is highly accurate and can be effectively used to determine whether the genetic status of the fetus is normal, carrier or affected, which can provide a basis for pre-implantation monogenic disease detection, prenatal diagnosis of pregnant women or clinical disease treatment. Although specific embodiments of the invention have been described in detail, those skilled in the art will understand that various modifications and alterations of those details are possible in light of the teachings of the invention. The full scope of the invention is given by the appended claims and any equivalents thereof.

In the description of the present specification, reference to the terms "one embodiment", "some embodiments", "illustrative embodiment", "example", "specific example" or "some examples" means that particular features, structures, materials or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the invention. In the present specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.

Claims


1. A method for determining SNP information in a predetermined region of a chromosome, which is characterized by including:

Construct a sequencing library for at least a portion of the chromosome;

Screening the sequencing library using a probe, wherein the probe specifically recognizes at least one known SNP site in the predetermined region to obtain a target capture fragment, and the target capture fragment includes the SNP site;

Sequencing the screened sequencing library to obtain sequencing results; and

Based on the sequencing results, SNP information in the predetermined region is determined.

2. The method according to claim 1, characterized in that the predetermined region includes a target gene region and a SNP-marker region.

3. The method according to claim 2, wherein the target gene region includes at least a part of the exons and exon-adjacent regions of the target gene.

4. The method according to claim 3, wherein the exon adjacent region includes a 50 bp region upstream of the 5' end of the exon and a 50 bp region downstream of the exon.

5. The method according to claim 2, characterized in that the SNP-marker region includes a range of 1M upstream and downstream of the target gene.

6. The method according to claim 1, characterized in that the length of the probe is 20~200nt.

7. The method according to claim 6, characterized in that the length of the probe is 60~80nt.

8. The method of claim 1, wherein the probe is provided in the form of a chip.

9. The method according to claim 1, characterized in that at least a part of the chromosome is obtained by DNA extraction from the peripheral blood of the organism.

10. The method according to claim 1, characterized in that the sequencing is carried out using at least one selected from the group consisting of Illumina Hiseq2000, Genome Analyzer, the Miseq sequencing series, the SOLiD sequencing system of Life Technologies, the Ion Torrent sequencing system and Roche's 454 sequencing system.

11. The method according to claim 1, characterized in that, based on the sequencing results, determining the SNP information in the predetermined region further includes:

Compare the sequencing results with the reference sequence to obtain a unique alignment sequence; and

Utilize SNP analysis software to obtain SNP information in the predetermined region from the unique aligned sequence.

12. The method according to claim 11, characterized in that the comparison is performed using the BWA software package.

13. The method according to claim 11, characterized in that, after obtaining the unique alignment sequence, further comprising removing PCR repeat expansion sequences from the unique alignment sequence.

14. The method according to claim 11, characterized in that the SNP analysis software is at least one selected from SAMtools and GATK.

15. The method according to claim 11, further comprising filtering the obtained SNP information.

16. The method according to claim 15, characterized in that the filtering condition is to remove SNPs that meet one of the following conditions:

SNP sequencing depth is less than 10X, preferably less than 20X; and

The difference in sequencing depth of the two bases in the hybrid SNP is higher than 20%, preferably higher than 10%, and more preferably higher than 5%.

17. A method for determining SNP information in a predetermined region of an embryo's chromosome, characterized by including: obtaining the entire genome of the embryo; and

For the entire genome of the embryo, the SNP information in the predetermined region of the embryo's chromosome is determined according to the method described in any one of claims 1 to 16.

18. The method according to claim 17, characterized in that the whole genome of the embryo is obtained by performing whole genome amplification of embryonic cells.

19. The method of claim 18, wherein the whole genome amplification is performed by at least one selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.

20. The method of claim 17, further comprising:

Obtain the whole genome of embryonic genetically related individuals, where the embryonic genetically related individuals include the father, mother and proband of the embryo; and

Based on the whole genome of the embryo genetically related individual, determine the SNP information of the father, the SNP information of the mother and the SNP information of the proband respectively;

Based on the father's SNP information and the mother's SNP information, determine the distinguishing SNP;

Based on the discriminating SNP and the SNP information of the proband, determine the paternal SNP haplotype and the maternal SNP haplotype; and

Based on the SNP information of the embryo, the paternal SNP haplotype and the maternal SNP haplotype, a combination of the paternal SNP haplotype and the maternal SNP haplotype is determined to obtain the SNP haplotype of the embryo.

21. The method according to claim 20, characterized in that the SNP haplotype of the embryo is obtained through the following steps:

Determine the paternal haplotype significantly supported by the embryo's SNP information as the embryo's paternally derived haplotype; and

Determine the maternal haplotype significantly supported by the embryo's SNP information as the embryo's maternally derived haplotype.

22. The method according to claim 21, wherein the number of discriminating SNPs being no less than 10 is an indication of significant support.

23. A device for determining SNP information in a predetermined region of a chromosome, characterized by including:

A library construction device, the library construction device is suitable for constructing a sequencing library for at least a part of the chromosome;

A library screening device, the library screening device is connected to the library construction device and is suitable for screening the sequencing library using a probe, wherein the probe specifically recognizes at least one of the known SNP sites in the predetermined region so as to obtain a target capture fragment, and the target capture fragment includes the SNP site;

A sequencing device, which is connected to the library screening device and is suitable for sequencing the screened sequencing library to obtain sequencing results; and

An analysis device, the analysis device is connected to the sequencing device, and is adapted to determine the SNP information in the predetermined region based on the sequencing results.

24. The device according to claim 23, wherein the predetermined region includes a target gene region and a SNP-marker region, the target gene region includes at least a part of the exons and exon-adjacent regions of the target gene, the exon-adjacent region includes a 50 bp region upstream of the 5' end of the exon and a 50 bp region downstream of the exon, and the SNP-marker region includes a range of 1M upstream and downstream of the target gene.

25. The device according to claim 23, characterized in that the length of the probe is 20~200nt.

26. The device according to claim 25, characterized in that the length of the probe is 60~80nt.

27. The device according to claim 23, wherein the probe is provided in the form of a chip.

28. The apparatus according to claim 23, further comprising a chromosome preparation device connected to the library construction device and suitable for obtaining the entire genome of embryonic cells through whole genome amplification, the entire genome of the embryonic cells constituting the at least a part of the chromosome.

29. The apparatus according to claim 28, characterized in that the chromosome preparation device is adapted to perform the whole genome amplification using at least one selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.

30. The apparatus according to claim 23, further comprising a DNA extraction device connected to the library construction device and adapted to extract DNA from the peripheral blood of an organism, so as to obtain the at least a part of the chromosome.

31. The equipment according to claim 23, wherein the sequencing device is at least one selected from the group consisting of Illumina Hiseq2000, Genome Analyzer, the Miseq sequencing series, Life Technologies' SOLiD sequencing system, the Ion Torrent sequencing system and Roche's 454 sequencing system.

32. The device according to claim 23, characterized in that the analysis device further comprises: a comparison unit, the comparison unit is adapted to compare the sequencing results with a reference sequence to obtain a unique comparison sequence; and

SNP information acquisition unit, the SNP information acquisition unit is connected to the comparison unit, and is adapted to use SNP analysis software to obtain SNP information in the predetermined region from the unique comparison sequence.

33. The device according to claim 32, wherein the comparison unit is adapted to use a BWA software package to perform the comparison.

34. The apparatus according to claim 32, wherein the analysis device further comprises: a unit adapted to remove PCR repeat extended sequences from the unique alignment sequence.

35. The device according to claim 32, characterized in that the SNP analysis software is at least one selected from SAMtools and GATK.

36. The device according to claim 32, wherein the analysis device further includes: a unit adapted to filter the obtained SNP information.

37. The device according to claim 36, characterized in that the filtering condition is to remove SNPs that meet one of the following conditions:

SNP sequencing depth is less than 10X, preferably less than 20X; and

38. A system for determining SNP information in a predetermined region of an embryo's chromosomes, characterized by comprising: a first whole genome acquisition device, the first whole genome acquisition device being adapted to acquire the whole genome of the embryo; and

A SNP information determination device, the SNP information determination device is connected to the first whole genome acquisition device and is used to determine SNP information in the predetermined region of the embryo's chromosome, wherein the SNP information determination device is the device according to any one of claims 23 to 37.

39. The system according to claim 38, wherein the first whole genome acquisition device is adapted to obtain the whole genome of the embryo by performing whole genome amplification of embryonic cells.

40. The system according to claim 39, wherein the first whole genome acquisition device is adapted to obtain the whole genome of the embryo using at least one selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.

41. The system according to claim 38, further comprising:

A second whole genome acquisition device, the second whole genome acquisition device is suitable for acquiring the whole genome of an embryo genetically related individual, wherein the embryo genetically related individual includes the father, mother and proband of the embryo;

A discriminating SNP determination device, the discriminating SNP determination device is adapted to determine a discriminating SNP based on the father's SNP information and the mother's SNP information;

a first haplotype determination device, the first haplotype determination device is adapted to determine the paternal SNP haplotype and the maternal SNP haplotype based on the discriminating SNP and the SNP information of the proband; and

a second haplotype determination device, the second haplotype determination device is adapted to determine, based on the SNP information of the embryo, the paternal SNP haplotype and the maternal SNP haplotype, a combination of the paternal SNP haplotype and the maternal SNP haplotype, so as to obtain the SNP haplotype of the embryo.

42. The system according to claim 41, wherein the second haplotype determination device further comprises: a unit that determines the paternal haplotype significantly supported by the SNP information of the embryo as the paternally derived haplotype of the embryo; and

a unit that determines the maternal haplotype significantly supported by the SNP information of the embryo as the maternally derived haplotype of the embryo.

43. The system according to claim 42, wherein the number of discriminating SNPs being no less than 10 is an indication of significant support.