US20220375547A1 - Ancestry finder
- Publication number
- US20220375547A1 (application US 17/880,566)
- Authority
- US
- United States
- Prior art keywords
- user
- segment
- ibd
- ancestral
- chromosomal segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/048—Fuzzy inferencing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B10/00—ICT specially adapted for evolutionary bioinformatics, e.g. phylogenetic tree construction or analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/40—Population genetics; Linkage disequilibrium
Definitions
- Genealogy is the study of the history of families and the line of descent from ancestors. It is an interesting subject studied by many professionals as well as hobbyists. Traditional genealogical study techniques typically involve constructing family trees based on surnames and historical records. As gene sequencing technology becomes more accessible, there has been growing interest in genetic ancestry testing in recent years.
- Existing genetic ancestry testing techniques are typically based on deoxyribonucleic acid (DNA) information of the Y chromosome (Y-DNA) or DNA information of the mitochondria (mtDNA).
- the Y-DNA is passed down unchanged from father to son and therefore is useful for testing patrilineal ancestry of a man.
- the mtDNA is passed down mostly unchanged from mother to children and therefore is useful for testing a person's matrilineal ancestry.
- These techniques are found to be effective for identifying individuals that are related many generations ago (e.g., 10 generations or more), but are typically less effective for identifying closer relationships. Further, many relationships that are not strictly patrilineal or matrilineal cannot be easily detected by the existing techniques. In addition, improved techniques for inferring ancestry information for an individual would be desirable.
- FIG. 1 is a block diagram illustrating an embodiment of a relative finding system.
- FIG. 2 is a flowchart illustrating an embodiment of a process for finding relatives in a relative finding system.
- FIG. 3 is a flowchart illustrating an embodiment of a process for connecting a user with potential relatives found in the database.
- FIGS. 4A-4I are screenshots illustrating user interface examples in connection with process 300.
- FIG. 5 is a diagram illustrating an embodiment of a process for determining the expected degree of relationship between two users.
- FIG. 6 is a diagram illustrating example DNA data used for IBD identification by process 500.
- FIG. 7 is a diagram illustrating example simulated relationship distribution patterns for different population groups according to one embodiment.
- FIG. 8 is a diagram illustrating an embodiment of a highly parallel IBD identification process.
- FIG. 9 is a diagram illustrating an example in which phased data are compared to identify IBD.
- FIG. 10 is a block diagram illustrating an embodiment of an ancestry finder system.
- FIG. 11 is a flowchart illustrating an embodiment of a process for inferring a characteristic of an individual.
- FIG. 12 is a flowchart illustrating an embodiment of a process for determining that a first user and a second user share at least one IBD segment.
- FIG. 13 shows an interface example for a table view of an ancestry finder system.
- FIG. 14 shows an interface example for the discovery view of a relative finder system that incorporates an ancestry finder system.
- FIG. 15 shows an interface example for a karyotype view of an ancestry finder system.
- FIG. 16 shows an example of the effect of varying some of the settings in a karyotype view of an ancestry finder system.
- FIG. 17 shows an interface example of a karyotype view of an ancestry finder system in which “Number of grandparents from the same country” is 2.
- FIG. 18 shows an interface example for a geographical map view of an ancestry finder system.
- FIG. 19 is a flowchart illustrating an embodiment of a process for inferring a characteristic of an individual.
- the invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor.
- these implementations, or any other form that the invention may take, may be referred to as techniques.
- the order of the steps of disclosed processes may be altered within the scope of the invention.
- a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task.
- the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
- locating IBD regions includes sequencing the entire genomes of the individuals and comparing the genome sequences. In some embodiments, locating IBD regions includes assaying a large number of markers that tend to vary in different individuals and comparing the markers. Examples of such markers include Single Nucleotide Polymorphisms (SNPs), which are points along the genome with two or more common variations; Short Tandem Repeats (STRs), which are repeated patterns of two or more repeated nucleotide sequences adjacent to each other; and Copy-Number Variants (CNVs), which include longer sequences of DNA that could be present in varying numbers in different individuals. Long stretches of DNA sequences from different individuals' genomes in which markers in the same locations are the same or at least compatible indicate that the rest of the sequences, although not assayed directly, are also likely identical.
- FIG. 1 is a block diagram illustrating an embodiment of a relative finding system.
- relative finder system 102 may be implemented using one or more server computers having one or more processors, one or more special purpose computing appliances, or any other appropriate hardware, software, or combinations thereof. The operations of the relative finder system are described in greater detail below.
- various users of the system (e.g., user 1 (“Alice”) and user 2 (“Bob”)) access the relative finder system via a network 104 using client devices such as 106 and 108.
- database 110 can be implemented on an integral storage component of the relative finder system, an attached storage device, a separate storage device accessible by the relative finder system, or a combination thereof. Many different arrangements of the physical components are possible in various embodiments.
- the entire genome sequences or assayed DNA markers are stored in the database to facilitate the relative finding process. For example, approximately 650,000 SNPs per individual's genome are assayed and stored in the database in some implementations.
- System 100 shown in this example includes genetic and other additional non-genetic information for many users.
- the relative finder system can identify users within the database that are relatives. Since more distant relationships (second cousins or further) are often unknown to the users themselves, the system allows the users to “opt-in” and receive notifications about the existence of relative relationships. Users are also presented with the option of connecting with their newly found relatives.
- FIG. 2 is a flowchart illustrating an embodiment of a process for finding relatives in a relative finding system.
- Process 200 may be implemented on a relative finder system such as 100 . The process may be invoked, for example, at a user's request to look for potential relatives this user may have in the database or by the system to assess the potential relationships among various users.
- recombining DNA information of a first user (e.g., Alice) and of a second user (e.g., Bob) is received.
- the information is retrieved from a database that stores recombining DNA information of a plurality of users as well as any additional user information.
- SNP information is described extensively in this and following examples.
- Other DNA information such as STR information and/or CNV information may be used in other embodiments.
- a predicted degree of relationship between Alice and Bob is determined.
- a range of possible relationships between the users is determined and a prediction of the most likely relationship between the users is made.
- the threshold may be a user configurable value, a system default value, a value configured by the system's operator, or any other appropriate value.
- Bob may select five generations as the maximum threshold, which means he is interested in discovering relatives with whom he shares a common ancestor five generations back or closer.
- the system may set a default minimum of three generations, so that by default users find relatives sharing a common ancestor three or more generations back.
- the system, the user, or both have the option to set a minimum threshold (e.g., two generations) and a maximum threshold (e.g., six generations) so that the user would discover relatives within a maximum number of generations, but would not be surprised by the discovery of a close relative such as a sibling who was previously unknown to the user.
- Alice or Bob is notified about her/his relative relationship with the other user.
- the system actively notifies the users by sending messages or alerts about the relationship information when it becomes available.
- Other notification techniques are possible, for example by displaying a list or table of users that are found to be related to the user.
- the potential relatives may be shown anonymously for privacy protection, or shown with visible identities to facilitate making connections.
- if a threshold is set, the user is only notified if the predicted degree of relationship at least meets the threshold.
- a user is only notified if both of the user and the potential relative have “opted in” to receive the notification.
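- The threshold and opt-in checks described above can be sketched as a single gating function. This is a minimal illustration, not the patent's implementation: the names `NotificationPolicy` and `should_notify` are hypothetical, and thresholds are assumed to be expressed in generations back to the common ancestor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NotificationPolicy:
    # Hypothetical container for the thresholds; None means "no limit".
    # Values are generations back to the shared common ancestor.
    min_generations: Optional[int] = None
    max_generations: Optional[int] = None


def should_notify(predicted_generations: int,
                  policy: NotificationPolicy,
                  user_opted_in: bool,
                  relative_opted_in: bool) -> bool:
    """Return True only if both parties opted in and the prediction is in range."""
    if not (user_opted_in and relative_opted_in):
        return False
    if (policy.min_generations is not None
            and predicted_generations < policy.min_generations):
        return False
    if (policy.max_generations is not None
            and predicted_generations > policy.max_generations):
        return False
    return True


# Example: a minimum of two and a maximum of six generations.
policy = NotificationPolicy(min_generations=2, max_generations=6)
notify_close = should_notify(1, policy, True, True)    # closer than the minimum
notify_mid = should_notify(4, policy, True, True)      # inside the range
```

- With a minimum of two generations, a previously unknown sibling (one generation to the common ancestors) is filtered out, which matches the "no surprises" motivation for setting a minimum threshold.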
- the user is notified about certain personal information of the potential relative, the predicted relationship, the possible range of relationships, the amount of DNA matching, or any other appropriate information.
- the process optionally infers additional relationships or refines estimates of existing relationships between the users based on other relative relationship information, such as the relative relationship information the users have with a third user.
- for example, if Alice and Bob are only estimated to be 6th cousins after step 204, but a third cousin of Alice's in the system, Cathy, is also a sibling of Bob's, then Alice and Bob are deemed to be third cousins because of their relative relationships to Cathy.
- the relative relationships with the third user may be determined based on genetic information and analysis using a process similar to 200 , based on non-genetic information such as family tree supplied by one of the users, or both.
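- The refinement through a third user can be sketched for the one rule given in the example (a sibling of Bob's who is a known relative of Alice's). The function name, the dictionary layout, and the relationship labels are assumptions made for illustration; a real system would need many more rules.

```python
def refine_with_third_user(estimate, alice_relatives, bob_relatives):
    """Refine the Alice-Bob estimate using relatives both users have in common.

    alice_relatives / bob_relatives map a third user's id to a relationship
    label such as "sibling" or "3rd cousin" (hypothetical encoding).
    Only the sibling rule from the example is implemented: if a third user is
    Bob's sibling and a known relative of Alice's, Bob is assumed to stand in
    the same relationship to Alice as that sibling does.
    """
    for third_user, bob_relation in bob_relatives.items():
        if bob_relation == "sibling" and third_user in alice_relatives:
            return alice_relatives[third_user]
    return estimate  # no applicable rule, keep the genetic estimate


refined = refine_with_third_user(
    "6th cousin",
    alice_relatives={"Cathy": "3rd cousin"},
    bob_relatives={"Cathy": "sibling"},
)
```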
- the relatives of the users in the system are optionally checked to infer additional relatives at 210 .
- for example, if Bob is identified as a third cousin of Alice's, then Bob's relatives in the system (such as children, siblings, possibly some of the parents, aunts, uncles, cousins, etc.) are also deemed to be relatives of Alice's.
- a threshold is applied to limit the relationships within a certain range. Additional notifications about these relatives are optionally generated.
- FIG. 3 is a flowchart illustrating an embodiment of a process for connecting a user with potential relatives found in the database.
- the process may be implemented on a relative finder system such as 102, a client system such as 106, or a combination thereof.
- process 300 follows 206 of process 200, where a notification is sent to Alice, indicating that a potential relative has been identified.
- the identity of Bob is disclosed to Alice.
- the identity of Bob is not disclosed initially to protect Bob's privacy.
- an invitation from Alice to Bob inviting Bob to make a connection is generated.
- the invitation includes information about how Alice and Bob may be related and any personal information Alice wishes to share such as her own ancestry information.
- Bob can accept the invitation or decline.
- an acceptance or a declination is received. If a declination is received, no further action is required. In some embodiments, Alice is notified that a declination has been received. If, however, an acceptance is received, at 306 , a connection is made between Alice and Bob.
- once the connection is made, the identities and any other sharable personal information (e.g., genetic information, family history, phenotype/traits, etc.) of Alice and Bob are revealed to each other.
- the connection information is updated in the database.
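- The invitation flow of process 300 (invitation generated, acceptance or declination received, connection recorded) can be sketched as a small state machine. The class and function names are hypothetical, and an in-memory set stands in for the database update.

```python
from dataclasses import dataclass
from enum import Enum


class InvitationStatus(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    DECLINED = "declined"


@dataclass
class Invitation:
    sender: str
    recipient: str
    message: str
    status: InvitationStatus = InvitationStatus.PENDING


def respond(invitation, accept, connections):
    """Record the recipient's answer; on acceptance, store the connection.

    `connections` is a set of frozensets, used here as a stand-in for the
    connection information kept in the database.
    """
    if invitation.status is not InvitationStatus.PENDING:
        raise ValueError("invitation has already been answered")
    if accept:
        invitation.status = InvitationStatus.ACCEPTED
        connections.add(frozenset((invitation.sender, invitation.recipient)))
    else:
        # A declination requires no further action.
        invitation.status = InvitationStatus.DECLINED
    return invitation.status


connections = set()
invitation = Invitation("Alice", "Bob", "We may be third cousins.")
result = respond(invitation, accept=True, connections=connections)
```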
- FIGS. 4A-4I are screenshots illustrating user interface examples in connection with process 300.
- the relative finder application provides two views to the user: the discovery view and the list view.
- FIG. 4A shows an interface example for the discovery view at the beginning of the process. No relative has been discovered at this point.
- a privacy feature is built into the relative finder application so that close relative information will only be displayed if both the user and the close relative have chosen to view close relatives. This is referred to as the “opt in” feature.
- the user is further presented with a selection button “show close relatives” to indicate that he/she is interested in finding out about close relatives.
- FIG. 4B shows a message that is displayed when the user selects “show close relatives”. The message explains to the user how a close relative is defined.
- a close relative is defined as a first cousin or closer. In other words, the system has set a default minimum threshold of three degrees.
- the message further explains that unless there is already an existing connection between the user and the close relative, any newly discovered potential close relatives will not appear in the results unless the potential close relatives have also chosen to view their close relatives.
- the message further warns about the possibility of finding out about close relatives the user did not know he/she had. The user has the option to proceed with viewing close relatives or cancel the selection.
- FIG. 4C shows the results in the discovery view.
- seven potential relatives are found in the database.
- the predicted relationship, the range of possible relationship, certain personal details a potential relative has made public, the amount of DNA a potential relative shares with the user, and the number of DNA segments the potential relative shares with the user are displayed.
- the user is presented with a “make contact” selection button for each potential relative.
- FIG. 4D shows the results in the list view.
- the potential relatives are sorted according to how close the corresponding predicted relationships are to the user in icon form.
- the user may select an icon that corresponds to a potential relative and view his/her personal information, the predicted relationship, relationship range, and other additional information. The user can also make contact with the potential relative.
- FIGS. 4E-4G show the user interface when the user selects to “make contact” with a potential relative.
- FIG. 4E shows the first step in making contact, where the user personalizes the introduction message and determines what information the user is willing to share with the potential relative.
- FIG. 4F shows an optional step in making contact, where the user is told about the cost of using the introduction service. In this case, the introduction is free.
- FIG. 4G shows the final step, where the introduction message is sent.
- FIG. 4H shows the user interface shown to the potential relative upon receiving the introduction message.
- the discovery view indicates that a certain user/potential relative has requested to make a contact.
- the predicted relationship, personal details of the sender, and DNA sharing information are shown to the recipient.
- the recipient has the option to select “view message” to view the introduction message from the sender.
- FIG. 4I shows the message as it is displayed to the recipient.
- the recipient is given the option to accept or decline the invitation to be in contact with the sender. If the recipient accepts the invitation, the recipient and the sender become connected and may view each other's information and/or interact with each other.
- At least some of the potential relatives are displayed in a family tree.
- the determination includes comparing the DNA markers (e.g., SNPs) of two users and identifying IBD regions.
- the standard SNP based genotyping technology results in genotype calls each having two alleles, one from each half of a chromosome pair.
- a genotype call refers to the identification of the pair of alleles at a particular locus on the chromosome.
- Genotype calls can be phased or unphased. In phased data, the individual's diploid genotype at a particular locus is resolved into two haplotypes, one for each chromosome. In unphased data, the two alleles are unresolved; in other words, it is uncertain which allele corresponds to which haplotype or chromosome.
- the genotype call at a particular SNP location may be a heterozygous call with two different alleles or a homozygous call with two identical alleles.
- a heterozygous call is represented using two different letters such as AB that correspond to different alleles.
- Some SNPs are biallelic, with only two possible states. Other SNPs have more states (e.g., triallelic). Other representations are possible.
- A is selected to represent an allele with base A and B represents an allele with base G at the SNP location.
- a homozygous call is represented using a pair of identical letters such as AA or BB. The two alleles in a homozygous call are interchangeable because the same allele came from each parent.
- if two individuals have opposite-homozygous calls at a given SNP location, or, in other words, one person has alleles AA and the other person has alleles BB, it is very likely that the region in which the SNP resides does not have IBD since different alleles came from different ancestors.
- if both individuals have compatible calls, that is, both have the same homozygotes (i.e., both people have AA alleles or both have BB alleles), both have heterozygotes (i.e., both people have AB alleles), or one has a heterozygote and the other a homozygote (i.e., one has AB and the other has AA or BB), there is some chance that at least one allele is passed down from the same ancestor and therefore the region in which the SNP resides is IBD.
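- The distinction between opposite-homozygous and compatible calls can be sketched as follows, assuming unphased calls are given as two-letter strings such as "AA", "AB", or "BB". The function names are illustrative only.

```python
def is_homozygous(call: str) -> bool:
    """A call such as 'AA' or 'BB' carries the same allele twice."""
    return call[0] == call[1]


def opposite_homozygous(call_1: str, call_2: str) -> bool:
    """True when both calls are homozygous for different alleles (AA vs. BB)."""
    return (is_homozygous(call_1) and is_homozygous(call_2)
            and call_1[0] != call_2[0])


def compatible(call_1: str, call_2: str) -> bool:
    """True when the two unphased calls share at least one allele.

    This covers AA/AA, BB/BB, AB/AB, and AB against AA or BB. Allele order
    inside a call is ignored because the data are unphased.
    """
    return bool(set(call_1) & set(call_2))
```

- A compatible call only leaves IBD possible at that SNP; it takes a long run of compatible calls to make IBD likely.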
- FIG. 5 is a diagram illustrating an embodiment of a process for determining the predicted degree of relationship between two users.
- Process 500 may be implemented on a relative finder system such as 102 and is applicable to unphased data.
- consecutive opposite-homozygous calls in the users' SNPs are identified.
- the consecutive opposite-homozygous calls can be identified by serially comparing individual SNPs in the users' SNP sequences or in parallel using bitwise operations as described below.
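- One way to compare many SNPs at once is to pack each user's homozygous calls into bitmasks and combine them with bitwise AND/OR. The encoding below (one mask for AA, one for BB, bit i for SNP i) is an assumption chosen for illustration and is not taken from the source.

```python
def encode_homozygous_masks(calls):
    """Return (aa_mask, bb_mask); bit i is set when calls[i] is 'AA' / 'BB'.

    Heterozygous calls set neither bit. Python integers are arbitrary
    precision, so one integer can hold a whole chromosome's worth of SNPs.
    """
    aa_mask = 0
    bb_mask = 0
    for i, call in enumerate(calls):
        if call == "AA":
            aa_mask |= 1 << i
        elif call == "BB":
            bb_mask |= 1 << i
    return aa_mask, bb_mask


def opposite_homozygous_positions(calls_1, calls_2):
    """Indices where one user is AA and the other is BB."""
    aa_1, bb_1 = encode_homozygous_masks(calls_1)
    aa_2, bb_2 = encode_homozygous_masks(calls_2)
    # A single pair of ANDs tests every SNP position at once.
    opposite = (aa_1 & bb_2) | (bb_1 & aa_2)
    return [i for i in range(len(calls_1)) if (opposite >> i) & 1]


alice = ["AA", "AB", "BB", "AA", "BB"]
bob = ["BB", "AB", "BB", "AB", "AA"]
positions = opposite_homozygous_positions(alice, bob)  # [0, 4]
```

- In practice the masks would be built once per user and reused for every pairwise comparison, so the per-pair cost is dominated by a few word-wide operations.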
- the distance between consecutive opposite-homozygous calls is determined.
- IBD regions are identified based at least in part on the distance between the opposite-homozygous calls.
- the distance may be physical distance measured in the number of base pairs or genetic distance accounting for the rate of recombination. For example, in some embodiments, if the genetic distance between the locations of two consecutive opposite-homozygous calls is greater than a threshold of 10 centimorgans (cM), the region between the calls is determined to be an IBD region. This step may be repeated for all the opposite-homozygous calls. A tolerance for genotyping error can be built by allowing some low rate of opposite homozygotes when calculating an IBD segment. In some embodiments, the total number of matching genotype calls is also taken into account when deciding whether the region is IBD. For example, a region may be examined where the distance between consecutive opposite homozygous calls is just below the 10 cM threshold. If a large enough number of genotype calls within that interval match exactly, the interval is deemed IBD.
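- The distance rule can be sketched as a scan over the genetic positions of opposite-homozygous calls on one chromosome, keeping every gap longer than the threshold. Treating the chromosome ends as boundaries is an assumption of this sketch, and the error tolerance and matching-call count mentioned above are left out.

```python
def find_ibd_regions(opposite_hom_cm, chrom_start_cm, chrom_end_cm,
                     min_length_cm=10.0):
    """Return (start_cm, end_cm) intervals free of opposite-homozygous calls.

    opposite_hom_cm: genetic positions (in cM) of opposite-homozygous calls.
    An interval between two consecutive boundaries is reported as IBD when
    its length exceeds min_length_cm (10 cM in the example from the text).
    """
    boundaries = [chrom_start_cm] + sorted(opposite_hom_cm) + [chrom_end_cm]
    regions = []
    for left, right in zip(boundaries, boundaries[1:]):
        if right - left > min_length_cm:
            regions.append((left, right))
    return regions


# Gaps are 5, 3, 22, 3 and 17 cM; only the 22 cM and 17 cM gaps qualify.
regions = find_ibd_regions([5.0, 8.0, 30.0, 33.0], 0.0, 50.0)
```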
- FIG. 6 is a diagram illustrating example DNA data used for IBD identification by process 500.
- 602 and 604 correspond to the SNP sequences of Alice and Bob, respectively.
- the alleles of Alice and Bob are opposite-homozygotes, suggesting that the SNP at this location resides in a non-IBD region.
- the opposite-homozygotes suggest a non-IBD region.
- both pairs of alleles are heterozygotes, suggesting that there is potential for IBD.
- Similarly, there is potential for IBD at location 612, where both pairs of alleles are identical homozygotes, and at location 614, where Alice's pair of alleles is heterozygous and Bob's is homozygous. If there is no other opposite-homozygote between 606 and 608 and there are a large number of compatible calls between the two locations, it is then likely that the region between 606 and 608 is an IBD region.
- the number of shared IBD segments and the amount of DNA shared by the two users are computed based on the IBD.
- the longest IBD segment is also determined.
- the amount of DNA shared includes the sum of the lengths of IBD regions and/or percentage of DNA shared. The sum is referred to as IBD half or half IBD because the individuals share DNA identical by descent for at least one of the homologous chromosomes.
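- Given a list of IBD regions, the three summary quantities (number of shared segments, IBD half, longest segment) follow directly. The sketch assumes regions are (start, end) pairs in centimorgans; the function name and the optional `total_length_cm` parameter are illustrative.

```python
def summarize_ibd(regions, total_length_cm=None):
    """Summarize IBD regions given as (start_cm, end_cm) pairs.

    Returns the number of segments, their summed length (IBD half), the
    longest segment, and, when total_length_cm is supplied, the fraction of
    that total length covered by IBD regions.
    """
    lengths = [end - start for start, end in regions]
    summary = {
        "num_segments": len(lengths),
        "ibd_half_cm": sum(lengths),
        "longest_segment_cm": max(lengths, default=0.0),
    }
    if total_length_cm is not None:
        summary["ibd_half_fraction"] = summary["ibd_half_cm"] / total_length_cm
    return summary


summary = summarize_ibd([(8.0, 30.0), (33.0, 50.0)], total_length_cm=390.0)
```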
- the predicted relationship between the users, the range of possible relationships, or both, is determined using the IBD half and number of segments, based on the distribution pattern of IBD half and shared segments for different types of relationships.
- For example, in a first degree parent/child relationship, the individuals have IBD half that is 100% of the total length of all the autosomal chromosomes and 22 shared autosomal chromosome segments; in a second degree grandparent/grandchild relationship, the individuals have IBD half that is approximately half the total length of all the autosomal chromosomes and many more shared segments; in each subsequent degree of relationship, the percentage of IBD half of the total length is about 50% of the previous degree. Also, for more distant relationships, in each subsequent degree of relationship, the number of shared segments is approximately half of the previous number.
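- The halving rule for IBD half can be written as a closed form: a relationship of degree d is expected to share about 0.5^(d-1) of the autosomal length. The sketch below is a deliberate simplification that uses only this rule; it ignores the number of shared segments and the empirical distribution patterns that the described system also relies on. Both function names are hypothetical.

```python
import math


def expected_ibd_half_fraction(degree: int) -> float:
    """Expected IBD half as a fraction of autosomal length for a given degree.

    Degree 1 (parent/child) gives 1.0; each further degree halves the value.
    """
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return 0.5 ** (degree - 1)


def nearest_degree(ibd_half_fraction: float) -> int:
    """Invert the halving rule: the degree whose expectation is closest on a log scale."""
    if not 0.0 < ibd_half_fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return 1 + round(-math.log2(ibd_half_fraction))
```

- Because real sharing varies widely around these expectations, a point estimate like `nearest_degree` would normally be reported together with a range of possible relationships.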
- the effects of genotyping error are accounted for and corrected.
- certain genotyped SNPs are removed from consideration if there are a large number of Mendelian errors when comparing data from known parent/offspring trios.
- SNPs that have a high no-call rate or otherwise failed quality control measures during the assay process are removed.
- an occasional opposite-homozygote is allowed if there is sufficient opposite-homozygotes-free distance (e.g., at least 3 cM and 300 SNPs) surrounding the opposite-homozygote.
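- The tolerance for an occasional opposite-homozygote can be sketched as a filter over the opposite-homozygous sites on one chromosome. The sketch assumes that "surrounding" means the required clearance (3 cM and 300 SNPs in the example) must hold on both sides of the site, and that a chromosome end counts as clear; neither point is spelled out in the text.

```python
def tolerated_sites(sites, min_cm=3.0, min_snps=300):
    """Return the opposite-homozygous sites that may be treated as errors.

    sites: list of (cm_position, snp_index) tuples sorted by position.
    A site is tolerated when its nearest opposite-homozygous neighbour on
    each side is at least min_cm centimorgans and min_snps SNPs away.
    """
    tolerated = []
    last = len(sites) - 1
    for i, (cm, idx) in enumerate(sites):
        clear_left = i == 0 or (
            cm - sites[i - 1][0] >= min_cm
            and idx - sites[i - 1][1] >= min_snps
        )
        clear_right = i == last or (
            sites[i + 1][0] - cm >= min_cm
            and sites[i + 1][1] - idx >= min_snps
        )
        if clear_left and clear_right:
            tolerated.append((cm, idx))
    return tolerated


# The first two sites are only 1 cM apart, so neither is tolerated.
isolated = tolerated_sites([(10.0, 1000), (11.0, 1100), (20.0, 2500)])
```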
- the distribution patterns are determined empirically based on survey of real populations. Different population groups may exhibit different distribution patterns. For example, the level of homozygosity within endogamous populations is found to be higher than in populations receiving gene flow from other groups.
- the bounds of particular relationships are estimated using simulations of IBD using generated family trees. Based at least in part on the distribution patterns, the IBD half, and shared number of segments, the degree of relationship between two individuals can be estimated.
- FIG. 7 is a diagram illustrating example simulated relationship distribution patterns for different population groups according to one embodiment. In particular, Ashkenazi Jews and Europeans are two population groups surveyed.
- Simulations are conducted by specifying an extended pedigree and creating simulated genomes for the pedigree by simulating the mating of individuals drawn from a pool of empirical genomes. Pairs of individuals who appear to share IBD half that was not inherited through the specified simulated pedigree are marked as “unknown” in panels A-F. Thus, special distribution patterns can be used to find relatives of users who have indicated that they belong to certain distinctive population groups such as the Ashkenazi.
- the amount of IBD sharing is used in some embodiments to identify different population groups. For example, for a given degree of relationship, since Ashkenazi tend to have much more IBD sharing than non-Ashkenazi Europeans, users may be classified as either Ashkenazi or non-Ashkenazi Europeans based on the number and pattern of IBD matches.
- chromosomes are examined to determine the relationship. For example, X chromosome information is received in some embodiments in addition to the autosomal chromosomes.
- the X chromosomes of the users are also processed to identify IBD. Since one of the X chromosomes in a female user is passed on from her father without recombination, the female inherits one X chromosome from her paternal grandmother (by way of her father) and another one from her mother. Thus, the X chromosome undergoes recombination at a slower rate compared to autosomal chromosomes and more distant relationships can be predicted using IBD found on the X chromosomes.
- analyses of mutations within IBD segments can be used to estimate ages of the IBD segments and refine estimates of relationships between users.
- the relationship determined is verified using non-DNA information.
- the relationship may be checked against the users' family tree information, birth records, or other user information.
- the efficiency of IBD region identification is improved by comparing a user's DNA information with the DNA information of multiple other users in parallel and using bitwise operations.
- FIG. 8 is a diagram illustrating an embodiment of a highly parallel IBD identification process. Alice's SNP calls are compared with those of multiple other users. Alice's SNP calls are pre-processed to identify ones that are homozygous. Alice's heterozygous calls are not further processed since they are always compatible with the possibility of IBD with another user. For each SNP call in Alice's genome that is homozygous, the zygosity states in the corresponding SNP calls in the other users are encoded.
- compatible calls (e.g., heterozygous calls and same homozygous calls) are encoded as 0 and opposite-homozygous calls are encoded as 1.
- for a homozygous SNP call AA, opposite-homozygous calls BB are encoded as 1 and compatible calls (AA and AB) are encoded as 0.
- for a homozygous SNP call EE, compatible calls EE and EF are encoded as 0, etc.
- the encoded representations are stored in arrays such as 818, 820, and 824.
- the length of the array is the same as the word length of the processor to achieve greater processing efficiency.
- the array length is set to 64 and the zygosity states of 64 users' SNP calls are encoded and stored in the array.
- a bitwise operation is performed on the encoded arrays to determine whether a section of DNA such as the section between locations 806 and 810 includes opposite-homozygous calls.
- a bitwise OR operation is performed to generate a result array 824. Any user with no opposite-homozygous calls between beginning location 806 and ending location 816 results in an entry value of 0 in array 824. The corresponding DNA segment, therefore, is deemed as an IBD region for such user and Alice.
- users with opposite-homozygotes result in corresponding entry values of 1 in array 824 and they are deemed not to share IBD with Alice in this region. In the example shown, user 1 shares IBD with Alice while other users do not.
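- The encoding and bitwise OR described above can be sketched as follows. This is a minimal illustration, assuming each site's array is packed into one integer with one bit per compared user; the names `encode_site` and `users_without_opposite_homozygotes` and the two-letter call strings are hypothetical and not taken from the described system.

```python
from typing import List

WORD_BITS = 64  # assumed processor word length, as in the 64-user example


def encode_site(alice_call: str, other_calls: List[str]) -> int:
    """Pack one bit per user: 1 = opposite-homozygous to Alice's homozygous call."""
    if alice_call[0] != alice_call[1]:
        raise ValueError("only Alice's homozygous sites are encoded")
    if len(other_calls) > WORD_BITS:
        raise ValueError("too many users for one word")
    word = 0
    for user, call in enumerate(other_calls):
        homozygous = call[0] == call[1]
        if homozygous and call != alice_call:
            word |= 1 << user  # opposite homozygote
    return word


def users_without_opposite_homozygotes(words: List[int], n_users: int) -> List[int]:
    """OR the per-site words over a section; a 0 bit marks a candidate IBD user."""
    combined = 0
    for w in words:
        combined |= w
    return [u for u in range(n_users) if not (combined >> u) & 1]


# Two homozygous sites in Alice (AA, then EE) compared against three users.
words = [
    encode_site("AA", ["AB", "BB", "AA"]),
    encode_site("EE", ["EE", "EF", "FF"]),
]
candidates = users_without_opposite_homozygotes(words, n_users=3)
```

- Packing one user per bit means a single OR per site tests up to 64 users at once, which is the efficiency gain the passage attributes to matching the array length to the processor word length.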
- phased data is used instead of unphased data. These data can come directly from assays that produce phased data, or from statistical processing of unphased data. IBD regions are determined by matching the SNP sequences between users. In some embodiments, sequences of SNPs are stored in dictionaries using a hash-table data structure for the ease of comparison.
- FIG. 9 is a diagram illustrating an example in which phased data are compared to identify IBD. The sequences are split along pre-defined intervals into non-overlapping words. Other embodiments may use overlapping words. Although a preset length of 3 is used for purposes of illustration in the example shown, many implementations may use words of longer lengths (e.g. 100). Also, the length does not have to be the same for every location.
- chromosome 902 is represented by words AGT, CTG, CAA, . . . and chromosome 904 is represented by CGA, CAG, TCA, . . . .
- the words are stored in a hash table that includes information about a plurality of users to enable constant-time retrieval of which users carry matching haplotypes. Similar hash tables are constructed for other sequences starting at other locations.
- Bob's sequences are processed into words at the same locations as Alice's.
- Bob's chromosome 906 yields CAT, GAC, CCG, . . . .
- chromosome 908 yields AAT, CTG, CAA, . . . . Every word from Bob's chromosomes is then looked up in the corresponding hash table to check whether any other users have the same word at that location in their genomes.
- the second and third words of chromosome 908 match the second and third words of Alice's chromosome 902. This indicates that SNP sequence CTGCAA is present in both chromosomes and suggests the possibility of IBD sharing. If enough matching words are present in close proximity to each other, the region would be deemed IBD.
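- The word lookup in the FIG. 9 example can be sketched as follows. This is a minimal sketch, assuming fixed-length non-overlapping words and a single dictionary keyed by (word position, word) in place of one hash table per location; the function names and haplotype identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def split_words(haplotype: str, word_len: int = 3) -> List[str]:
    """Split a phased SNP sequence into non-overlapping words of word_len."""
    last_start = len(haplotype) - word_len
    return [haplotype[i:i + word_len] for i in range(0, last_start + 1, word_len)]


def build_index(haplotypes: Dict[str, str], word_len: int = 3) -> Dict[Tuple[int, str], Set[str]]:
    """Map (word position, word) to the set of haplotype ids carrying that word."""
    index: Dict[Tuple[int, str], Set[str]] = defaultdict(set)
    for hap_id, seq in haplotypes.items():
        for pos, word in enumerate(split_words(seq, word_len)):
            index[(pos, word)].add(hap_id)
    return index


def matching_positions(index, query: str, word_len: int = 3) -> Dict[str, List[int]]:
    """For each indexed haplotype, list the word positions that match the query."""
    hits: Dict[str, List[int]] = defaultdict(list)
    for pos, word in enumerate(split_words(query, word_len)):
        for hap_id in index.get((pos, word), ()):
            hits[hap_id].append(pos)
    return dict(hits)


# Words taken from the example: Alice's 902 and 904, Bob's 906 and 908.
index = build_index({"alice_902": "AGTCTGCAA", "alice_904": "CGACAGTCA"})
hits_908 = matching_positions(index, "AATCTGCAA")
hits_906 = matching_positions(index, "CATGACCCG")
```

- Here `hits_908` reports matches with `alice_902` at word positions 1 and 2 (zero-based), corresponding to the shared sequence CTGCAA; deciding whether enough nearby words match to call the region IBD is left out of this sketch.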
- relative relationships found using the techniques described above are used to infer characteristics about the users that are related to each other.
- the inferred characteristic is based on non-genetic information pertaining to the related users. For example, if a user is found to have a number of relatives that belong to a particular population group, then an inference is made that the user may also belong to the same population group.
- genetic information is used to infer characteristics, in particular characteristics specific to shared IBD segments of the related users. Assume, for example, that Alice has sequenced her entire genome but her relatives in the system have only genotyped SNP data. If Alice's genome sequence indicates that she may have inherited a disease gene, then, with Alice's permission, Alice's relatives who have shared IBD with Alice in the same region that includes the disease gene may be notified that they are at risk for the same disease.
- FIG. 10 is a block diagram illustrating an embodiment of an ancestry finder system.
- ancestry finder system 102 may be implemented using one or more server computers having one or more processors, one or more special purpose computing appliances, or any other appropriate hardware, software, or combinations thereof. The operations of the ancestry finder system are described in greater detail below.
- various users of the system (e.g., user 1 (“Alice”) and user 2 (“Bob”)) access the ancestry finder system via a network 104 using client devices such as 106 and 108.
- the system uses a database 110, which can be implemented on an integral storage component of the ancestry finder system, an attached storage device, a separate storage device accessible by the ancestry finder system, or a combination thereof. Many different arrangements of the physical components are possible in various embodiments.
- the entire genome sequences or assayed DNA markers are stored in the database to facilitate the relative finding process. For example, approximately 650,000 SNPs per individual's genome are assayed and stored in the database in some implementations.
- System 1000 shown in this example includes genetic and other additional non-genetic information for many users. By comparing the recombining DNA information to identify shared IBD regions between various users, the ancestry finder system can infer ancestry information or other characteristics of a user.
- database 110 includes results of a survey of users.
- the survey may be an ancestry survey that requests information such as the place (e.g., country and city) of birth, race, and ethnicity of the user.
- the survey also requests information regarding relatives of the user, such as the place of birth and race/ethnicity of each of the user's parents and grandparents (if known to the user).
- Alice may provide her place of birth and ethnicity, as well as the places of birth and ethnicities of each of her parents and grandparents. Such information may be useful for determining information about the ancestry of other users who are related to Alice and/or have a common IBD segment with Alice, as further described below.
- Alice may also provide ancestry information for her direct maternal line (mother's mother's mother's . . . mother) as far back as possible.
- Alice's brother (or any male) might provide ancestry information for his direct paternal line (father's father's father's . . . father) as far back as possible.
- the user survey results are stored separately from database 110, such as in a separate database.
- relative finder system 102 and ancestry finder system 1002 are part of the same system and may share data.
- FIG. 11 is a flowchart illustrating an embodiment of a process for inferring a characteristic of an individual.
- Process 1100 may be implemented on an ancestry finder system such as 1100 .
- this process infers that the user has ancestry associated with that population group.
- the process may be invoked, for example, at a user's request to look for ancestry information and/or potential relatives this user may have in the database.
- One or more steps of process 1100 may be part of a batch process used to obtain ancestry information (or information used to infer ancestry information) for a plurality of users.
- an indication that a first user (e.g., Alice) and a second user (e.g., Bob) have at least one shared IBD segment is received.
- SNP information is described extensively in this and following examples.
- Other DNA information such as STR information, CNV information, exomic sequence information, and/or full sequence information may be used in other embodiments.
- the second user is an individual from a reference database, and not necessarily a user of the ancestry finder system.
- information about the second user is obtained.
- the information about the second user comprises one or more characteristics of one or more relatives of the second user.
- at least one characteristic is ancestry information.
- the information about the second user could comprise ancestry information about the four grandparents of the second user, such as the birthplaces of the four grandparents. Such information could, for example, have been provided by the second user when filling out a survey of characteristics of the user and the user's relatives, such as an ancestry survey, an example of which was previously described. In other embodiments, such information could be provided by a reference database.
- the information about the second user is obtained from a database, such as database 110.
- a characteristic of the first user is inferred based at least in part on the information about the second user. If the first user and second user share at least one common IBD segment, information about the first user could be inferred from known information about the second user. For example, if it is known that the second user's four grandparents were all born in Ireland, then it can be inferred that the first user has at least some Irish ancestry, or at least that there is someone of Irish ancestry associated with that segment value. Characteristics can include ancestry information. Other examples of characteristics that can be inferred besides ancestry information include any other inherited characteristic, such as information associated with diseases, traits, or any other form of phenotypic information.
- the inferred characteristic of the first user is presented in a user interface.
- various user interfaces for presenting, displaying, or notifying of the inferred characteristic are possible.
- Some examples of user interfaces include a table view, an (extended) discovery view, a karyotype view, a geographical map view, as more fully described below.
- a discovery view may display a list or table of users that are found to be related to the user based on the relative finder system.
- ancestry information is displayed next to each relative. For example, next to each relative, there may be a column for each grandparent of that relative. An indication of ancestry information for each grandparent may be displayed in each column, as more fully illustrated below.
- user interfaces include a family tree, which displays a family tree filled with inferred ancestry information about each relative.
- the system actively notifies the users by sending messages or alerts about ancestry information when it becomes available or at a given interval.
- the information is processed to infer ancestry (or other characteristic information). For example, the information may be processed using a majority rules technique. For example, if the ancestry associated with the majority of the matching users is German, then it is inferred that that segment is associated with German ancestry. In some embodiments, for each segment and each possible value of each segment, the information is processed (e.g., using majority rules), and the processed result is pre-computed and stored for future reference by any of the matching users.
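- The majority rules technique mentioned above can be sketched as follows. This is a minimal sketch, assuming "majority" means strictly more than half of the matching users who reported an ancestry; the function name `majority_ancestry` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def majority_ancestry(ancestries: Iterable[str]) -> Optional[str]:
    """Return the ancestry reported by a strict majority of matching users, if any."""
    counts = Counter(ancestries)
    if not counts:
        return None
    label, n = counts.most_common(1)[0]
    return label if n > sum(counts.values()) / 2 else None
```

- The result for each segment value could be computed once and stored, in line with the pre-computation the passage describes.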
- 1102 is part of a batch process in which a plurality of users (e.g., from a database) are preprocessed to determine shared IBD segments.
- the results comprise shared IBD segments for a plurality of users. For example, for a particular user, there may be 278 matches to various segments of the user's chromosomes.
- these results are cross-referenced with a database of information (e.g., ancestry information) about those matching users. For example, of the 278 matches, 49 of those users have completed an ancestry information survey that can be used to infer ancestry information about the particular user.
- FIG. 12 is a flowchart illustrating an embodiment of a process for determining that a first user and a second user share at least one IBD segment.
- Process 1200 may be implemented on an ancestry finder system such as 1100 . The process may be performed, for example, at 1102 of process 1100 .
- recombining DNA information of a first user and recombining DNA information of a second user are received.
- a shared IBD segment between the first user and the second user is determined based at least in part on the recombining DNA information of the first user and recombining DNA information of the second user. Any appropriate technique may be used to determine the shared IBD segment.
- one or more steps of process 500 are used to determine the shared IBD segment.
- one or more steps of the process described with respect to FIG. 6, 8, or 9 are used to determine the shared IBD segment.
- FIG. 13 shows an interface example for a table view of an ancestry finder system.
- this interface is used to present inferred ancestry information for a user at 1108 .
- a table is displayed or presented that illustrates the results a typical user of European ancestry might receive.
- Each row in the table corresponds to a chromosomal segment that the user shares by IBD with another individual (i.e., a relative finder hit).
- Three of the segments are to (one or more) individuals of Irish ancestry, and one is to an individual of German ancestry.
- This table shows a relatively low number of ancestry finder hits.
- a table of the top ten countries associated with the user's ancestral origin may be presented or displayed. These top ten countries are determined based on the total number of non-overlapping DNA segments attributable to each country.
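- The ranking of countries by non-overlapping segments can be sketched as follows. This is a hedged illustration, assuming that overlapping hits for the same country on the same chromosome are merged and counted once; the `Hit` tuple layout and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[str, float, float, str]  # (chromosome, start, end, country)


def count_non_overlapping(hits: List[Hit]) -> Dict[str, int]:
    """Count merged (non-overlapping) segments attributable to each country."""
    by_key = defaultdict(list)
    for chrom, start, end, country in hits:
        by_key[(country, chrom)].append((start, end))
    counts: Dict[str, int] = defaultdict(int)
    for (country, _chrom), intervals in by_key.items():
        intervals.sort()
        current_end = None
        for start, end in intervals:
            if current_end is None or start > current_end:
                counts[country] += 1  # a new, non-overlapping segment begins
                current_end = end
            else:
                current_end = max(current_end, end)  # overlap: extend the segment
    return dict(counts)


def top_countries(hits: List[Hit], n: int = 10) -> List[Tuple[str, int]]:
    """Return up to n countries, ordered by segment count and then by name."""
    counts = count_non_overlapping(hits)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```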
- a pie chart may be used to show the breakdown among countries.
- FIG. 14 shows an interface example for the discovery view of a relative finder system that incorporates an ancestry finder system.
- this interface is used to present inferred ancestry information for a user at 1108 .
- the user interface of FIG. 4C is shown in addition to four new columns: GP1-GP4, representing Grandparents 1-4.
- ancestry information is shown in each of the columns for that relative.
- the ancestry information is represented by codes that indicate birth countries of the grandparents. DE is Germany, UK is the United Kingdom, IE is Ireland, US is the United States, and JP is Japan.
- the first row shows a 4 th cousin with four grandparents who are all of German ancestry.
- other ancestry information could be displayed, such as race/ethnicity (e.g., NA for Native American).
- various other ancestry information could be provided, such as information about the relative's parents or siblings.
- various other information about characteristics of the relative could be provided, including disease, trait, or any other form of phenotypic information.
- FIG. 15 shows an interface example for a karyotype view of an ancestry finder system.
- this interface is used to present inferred ancestry information for a user “Paul Pierce” at 1108 .
- the user interface shows a graphic of a user's chromosome on the left hand side.
- various segments are colored according to the ancestry (France, Austria, United States, etc.) with which each segment is associated.
- a segment is associated with an ancestry based on an IBD match with an individual of known ancestry. For example, if a user's segment has an IBD match with another individual, and the birth country of all four grandparents of that individual is provided or known, then it can be inferred that the user's segment is associated with that birth country. In other words, that shared IBD segment was associated with that part of the world (prior to the era of intercontinental travel).
- Minimum segment size is the minimum length of a matching IBD segment, in this example, 5 cM. Minimum segment size governs the minimum length of displayed ancestry finder hits. It's given in centiMorgans (cM) in the example shown, but can also be expressed in terms of base pairs, kilobases, megabases, or any other appropriate unit. In this example, most hits are short, so increasing the minimum quickly reduces the number of segments displayed. These longer hits, although fewer, are increasingly likely to indicate recent shared ancestry, and therefore to represent genuine/applicable ancestry finder results.
- the great majority of IBD hits, and thus ancestry finder segments, are between 5 cM and 10 cM in length.
- the lower end is controlled by a few threshold parameters in the IBD matching technique and can take any appropriate value.
- the lower edge might be different for the IBD hits on live.
- One range might be 5 cM to 20 cM (20 cM is the approximate length of chromosome 22).
- although a pull-down menu is shown, in various embodiments a slider or other element could be used, with a default initial value.
- “Number of grandparents from the same country” indicates the minimum number of grandparents from the same country in order for an association to be made, as will be more fully described below. In this example, ancestry associations for a segment are only made if the birth countries of all four grandparents are the same. “Display type” indicates that this is a karyotype display.
- “Show North American origin?” indicates whether to include the United States and Canada as ancestry finder matches. For example, a user may desire to exclude North American origin matches because they may not be as informative, since most individuals born in North America have ancestry in other parts of the world. In some cases, if this option is unchecked, then relatively fewer hits will result because there are many users whose four grandparents are from the United States or Canada. This is a toggle in the example, but other user interface elements may be used.
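- The combined effect of the three settings (minimum segment size, number of grandparents from the same country, and the North American toggle) can be sketched as a filter. This is a minimal sketch; the `Hit` record, the country codes in `NORTH_AMERICA`, and the function names are assumptions made for illustration.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple

NORTH_AMERICA = {"US", "CA"}  # assumed codes for the United States and Canada


class Hit(NamedTuple):
    length_cm: float
    grandparent_countries: Tuple[str, str, str, str]


def associated_countries(hit: Hit, min_grandparents: int) -> List[str]:
    """Countries shared by at least min_grandparents of the match's grandparents."""
    counts = Counter(hit.grandparent_countries)
    return sorted(c for c, n in counts.items() if n >= min_grandparents)


def filter_hits(hits: List[Hit], min_cm: float = 5.0, min_grandparents: int = 4,
                show_north_america: bool = True) -> List[Tuple[Hit, List[str]]]:
    """Keep hits that pass the display settings, with their associated countries."""
    shown = []
    for hit in hits:
        if hit.length_cm < min_cm:
            continue  # shorter than the minimum segment size
        countries = associated_countries(hit, min_grandparents)
        if not show_north_america:
            countries = [c for c in countries if c not in NORTH_AMERICA]
        if countries:
            shown.append((hit, countries))
    return shown
```

- With `min_grandparents=2`, a single hit can carry two countries, which corresponds to segments drawn in more than one color in the karyotype view.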
- a dialog box such as the one shown may be used for a user to provide feedback in this view or any other view (such as the discovery view).
- the dialog box may open when a cursor hovers over one of the segments.
- the shown dialog box asks the user whether the indicated ancestry information could be right.
- the user can indicate “Yes” or “No” and provide a reason.
- this feedback is used by the ancestry finder system to improve future inferences. For example, a user may see one segment that is colored brown to indicate Japan. If the user knows that he does not have any Japanese ancestry, he may select “No” in the dialog box.
- other user configurable settings may be included, such as an option to Show Public Reference Individuals and/or Show Relative Finder Users. There may be hits to customers via the ancestry survey, and to anonymous reference individuals of known ancestry from public genotype databases. The customer may wish to toggle visibility of the public individuals/Relative Finder. In some embodiments, this could rotate through three states: Both->Relative Finder Users Only->Public Only.
- An option to “Notify Me of Changes to My Ancestry Finder Results” may be provided.
- Ancestry finder results will change/improve with time. This option would allow a user to indicate whether and how the user wishes to be notified of these changes. For example, the user could request to be notified as soon as they happen (up to daily), weekly in a batch, or monthly in a batch. The user could also indicate which results the user wants to be notified of (e.g., four grandparents, three grandparents, and/or Non-North American).
- FIG. 16 shows an example of the effect of varying some of the settings in a karyotype view of an ancestry finder system, such as that shown in FIG. 15 .
- “Number of Grandparents from Same Country” is the minimum number of co-located grandparents for an ancestry finder match to be made. This governs how many grandparents of the individual matching a user must be associated with the same country/ethnicity. For example, if four, then only segments for which all four grandparents come from the same country will be shown.
- the table shows the overall distribution of ancestry survey responses across an example database of users. The possible values are integers from four to one, inclusive.
- FIG. 17 shows an interface example of a karyotype view of an ancestry finder system in which “Number of grandparents from the same country” is 2.
- the interface of FIG. 15 is shown for a user “Adrian Lee” who has selected 2 as the “Number of grandparents from the same country.” This means that for each shared IBD segment shown (colored), the user had shared IBD with someone who has at least two grandparents from the same country. As shown, some of the segments show two or more colors (split horizontally) to indicate that that segment is IBD matched with someone whose grandparents could be from as many as three different countries. In contrast, FIG. 15 shows only one color per segment because all four grandparents must be from the same country. In various embodiments, a variety of graphical elements or visual cues may be used to depict such inferences about characteristics (such as ancestry information) that have been made based on one or more shared IBD segments.
- FIG. 18 shows an interface example for a geographical map view of an ancestry finder system.
- the left hand side shows a map of some region of the world.
- the dots are latitude/longitude coordinates inferred from text typed in by users describing their specific places of birth, and/or those of their ancestors.
- the dots are the birthplaces of US-born grandparents of other users who have an IBD match with user Paul Pierce. The match is based on a minimum segment size of 5 cM, minimum number of grandparents from the same country of 4, and North American origin. For example, the map could be of North America.
- FIG. 19 is a flowchart illustrating an embodiment of a process for inferring a characteristic of an individual from the ancestry or other phenotypic information of a second individual, even if the two individuals are not related by DNA.
- Process 1900 may be implemented on an ancestry finder system such as 1100 .
- the process may be invoked, for example, at a user's request to look for ancestry information. In some embodiments, this process is performed between 1106 and 1108 of process 1100 , for cases in which a third user is requesting ancestry information. In this case, it is known that the third user is related to the first user, but the third user does not share an IBD segment with the second user. In some cases, the third user does not share an IBD segment with the first user either.
- the third user may be a sibling of the first user, but the third user does not have an IBD match with any other user in the database. However, the first user has an IBD match with the second user. Because it is known that the first user and the third user are siblings, and that the first user and second user have shared IBD for at least one segment, then, in some embodiments, ancestry information for the third user can be inferred from the ancestry information about the second user. In some cases, this means that it need not be determined whether the third user has any shared IBD segments with the second user, and this ancestry information can be inferred as soon as it is known that the third user is a sibling of the first user.
- the first user and the third user are full siblings
- the second user is the father of the first user
- the first user has a segment that is a shared IBD segment with his father (the second user), but the third user does not share that IBD segment with his father (the second user).
- if both of the father's parents are from Germany, then it can be inferred that the third user has at least two grandparents, his paternal grandparents, from Germany.
- if the relationship between the first user and second user is not known, perhaps because the second user is a distant cousin of the first user instead of the father, then it can at least be inferred that the third user has German ancestry, because the third user's lineages are identical to those of the first user.
- an indication that a third user has a known degree of relationship to the first user is received.
- the third user does not necessarily share an IBD segment with either the first user or the second user.
- the known degree of relationship between the third user and the first user is “sibling.”
- a characteristic of the third user is inferred based at least in part on the inferred characteristic of the first user.
- it is inferred that the third user has German ancestry or, if it is known that the second user is the father of the first user, that the third user has at least two grandparents from Germany.
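- The propagation of an inferred characteristic to a known relative can be sketched as follows. This is a minimal sketch restricted to the full-sibling case from the example, where the lineages are identical; the function name `propagate_to_full_siblings` and the dictionary layout are hypothetical.

```python
from typing import Dict, Set


def propagate_to_full_siblings(inferred: Dict[str, Set[str]],
                               full_siblings: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Copy each user's inferred ancestries to that user's known full siblings.

    No IBD match is required for the sibling; only the known relationship is used.
    """
    result = {user: set(values) for user, values in inferred.items()}
    for user, siblings in full_siblings.items():
        for sibling in siblings:
            result.setdefault(sibling, set()).update(inferred.get(user, set()))
    return result


# The first user matched a second user of German ancestry; the third user is a full sibling.
result = propagate_to_full_siblings({"first": {"German"}}, {"first": {"third"}})
```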
Abstract
Inferring a characteristic of an individual is disclosed. An indication that a first user and a second user have at least one shared chromosomal segment is received. Information about the second user is obtained. A characteristic of the first user is inferred based at least in part on the information about the second user.
Description
- An Application Data Sheet is filed concurrently with this specification as part of the present application. Each application that the present application claims benefit of or priority to as identified in the concurrently filed Application Data Sheet is incorporated by reference herein in its entirety and for all purposes.
- Genealogy is the study of the history of families and the line of descent from ancestors. It is an interesting subject studied by many professionals as well as hobbyists. Traditional genealogical study techniques typically involve constructing family trees based on surnames and historical records. As gene sequencing technology becomes more accessible, there has been growing interest in genetic ancestry testing in recent years.
- Existing genetic ancestry testing techniques are typically based on deoxyribonucleic acid (DNA) information of the Y chromosome (Y-DNA) or DNA information of the mitochondria (mtDNA). Aside from a small amount of mutation, the Y-DNA is passed down unchanged from father to son and therefore is useful for testing patrilineal ancestry of a man. The mtDNA is passed down mostly unchanged from mother to children and therefore is useful for testing a person's matrilineal ancestry. These techniques are found to be effective for identifying individuals that are related many generations ago (e.g., 10 generations or more), but are typically less effective for identifying closer relationships. Further, many relationships that are not strictly patrilineal or matrilineal cannot be easily detected by the existing techniques. In addition, improved techniques for inferring ancestry information for an individual would be desirable.
- Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
- The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Patent and Trademark Office upon request and payment of the necessary fee.
- FIG. 1 is a block diagram illustrating an embodiment of a relative finding system.
- FIG. 2 is a flowchart illustrating an embodiment of a process for finding relatives in a relative finding system.
- FIG. 3 is a flowchart illustrating an embodiment of a process for connecting a user with potential relatives found in the database.
- FIGS. 4A-4I are screenshots illustrating user interface examples in connection with process 300.
- FIG. 5 is a diagram illustrating an embodiment of a process for determining the expected degree of relationship between two users.
- FIG. 6 is a diagram illustrating example DNA data used for IBD identification by process 500.
- FIG. 7 is a diagram illustrating example simulated relationship distribution patterns for different population groups according to one embodiment.
- FIG. 8 is a diagram illustrating an embodiment of a highly parallel IBD identification process.
- FIG. 9 is a diagram illustrating an example in which phased data are compared to identify IBD.
- FIG. 10 is a block diagram illustrating an embodiment of an ancestry finder system.
- FIG. 11 is a flowchart illustrating an embodiment of a process for inferring a characteristic of an individual.
- FIG. 12 is a flowchart illustrating an embodiment of a process for determining that a first user and a second user share at least one IBD segment.
- FIG. 13 shows an interface example for a table view of an ancestry finder system.
- FIG. 14 shows an interface example for the discovery view of a relative finder system that incorporates an ancestry finder system.
- FIG. 15 shows an interface example for a karyotype view of an ancestry finder system.
- FIG. 16 shows an example of the effect of varying some of the settings in a karyotype view of an ancestry finder system.
- FIG. 17 shows an interface example of a karyotype view of an ancestry finder system in which “Number of grandparents from the same country” is 2.
- FIG. 18 shows an interface example for a geographical map view of an ancestry finder system.
- FIG. 19 is a flowchart illustrating an embodiment of a process for inferring a characteristic of an individual.
- The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
- A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
- Because of recombination and independent assortment of chromosomes, the autosomal DNA and X chromosome DNA (collectively referred to as recombining DNA) from the parents is shuffled at the next generation, with small amounts of mutation. Thus, only relatives will share long stretches of genome regions where their recombining DNA is completely or nearly identical. Such regions are referred to as “Identical by Descent” (IBD) regions because they arose from the same DNA sequences in an earlier generation. The relative finder technique described below is based at least in part on locating IBD regions in the recombining chromosomes of individuals.
- In some embodiments, locating IBD regions includes sequencing the entire genomes of the individuals and comparing the genome sequences. In some embodiments, locating IBD regions includes assaying a large number of markers that tend to vary in different individuals and comparing the markers. Examples of such markers include Single Nucleotide Polymorphisms (SNPs), which are points along the genome with two or more common variations; Short Tandem Repeats (STRs), which are repeated patterns of two or more repeated nucleotide sequences adjacent to each other; and Copy-Number Variants (CNVs), which include longer sequences of DNA that could be present in varying numbers in different individuals. Long stretches of DNA sequences from different individuals' genomes in which markers in the same locations are the same or at least compatible indicate that the rest of the sequences, although not assayed directly, are also likely identical.
- FIG. 1 is a block diagram illustrating an embodiment of a relative finding system. In this example, relative finder system 102 may be implemented using one or more server computers having one or more processors, one or more special purpose computing appliances, or any other appropriate hardware, software, or combinations thereof. The operations of the relative finder system are described in greater detail below. In this example, various users of the system (e.g., user 1 (“Alice”) and user 2 (“Bob”)) access the relative finder system via a network 104 using client devices such as 106 and 108. User information (including genetic information and optionally other personal information such as family information, population group, etc.) pertaining to the users is stored in a database 110, which can be implemented on an integral storage component of the relative finder system, an attached storage device, a separate storage device accessible by the relative finder system, or a combination thereof. Many different arrangements of the physical components are possible in various embodiments. In various embodiments, the entire genome sequences or assayed DNA markers (SNPs, STRs, CNVs, etc.) are stored in the database to facilitate the relative finding process. For example, approximately 650,000 SNPs per individual's genome are assayed and stored in the database in some implementations.
- System 100 shown in this example includes genetic and other additional non-genetic information for many users. By comparing the recombining DNA information to identify IBD regions between various users, the relative finder system can identify users within the database that are relatives. Since more distant relationships (second cousins or further) are often unknown to the users themselves, the system allows the users to “opt-in” and receive notifications about the existence of relative relationships. Users are also presented with the option of connecting with their newly found relatives.
- FIG. 2 is a flowchart illustrating an embodiment of a process for finding relatives in a relative finding system. Process 200 may be implemented on a relative finder system such as 100. The process may be invoked, for example, at a user's request to look for potential relatives this user may have in the database or by the system to assess the potential relationships among various users. At 202, recombining DNA information of a first user (e.g., Alice) and of a second user (e.g., Bob) is received. In some embodiments, the information is retrieved from a database that stores recombining DNA information of a plurality of users as well as any additional user information. For purposes of illustration, SNP information is described extensively in this and following examples. Other DNA information such as STR information and/or CNV information may be used in other embodiments.
- At 204, a predicted degree of relationship between Alice and Bob is determined. In some embodiments, a range of possible relationships between the users is determined and a prediction of the most likely relationship between the users is made. In some embodiments, it is optionally determined whether the predicted degree of relationship at least meets a threshold. The threshold may be a user configurable value, a system default value, a value configured by the system's operator, or any other appropriate value. For example, Bob may select five generations as the maximum threshold, which means he is interested in discovering relatives with whom he shares a common ancestor five generations or closer. Alternatively, the system may set a default minimum value of three generations, so that users by default find relatives sharing a common ancestor three or more generations back. In some embodiments, the system, the user, or both, have the option to set a minimum threshold (e.g., two generations) and a maximum threshold (e.g., six generations) so that the user would discover relatives within a maximum number of generations, but would not be surprised by the discovery of a close relative such as a sibling who was previously unknown to the user.
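The optional minimum and maximum thresholds at 204 can be illustrated with a minimal sketch. The class name `GenerationWindow`, its fields, and the convention that a relationship is measured by the number of generations back to the most recent common ancestor (so that siblings are at one generation) are assumptions made for illustration and are not part of the described system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationWindow:
    """Hypothetical container for the optional thresholds applied at 204."""

    min_generations: Optional[int] = None  # e.g., 2 hides siblings and closer
    max_generations: Optional[int] = None  # e.g., 6 hides more distant relatives

    def allows(self, generations_to_common_ancestor: int) -> bool:
        """Return True if a predicted relationship falls inside the window."""
        g = generations_to_common_ancestor
        if self.min_generations is not None and g < self.min_generations:
            return False
        if self.max_generations is not None and g > self.max_generations:
            return False
        return True


# Example from the text: minimum of two and maximum of six generations.
window = GenerationWindow(min_generations=2, max_generations=6)
```

With this window, a sibling (assumed to be one generation from the common ancestors) would be filtered out, while a relative sharing an ancestor four generations back would be reported.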
- At 206, Alice or Bob (or both) is notified about her/his relative relationship with the other user. In some embodiments, the system actively notifies the users by sending messages or alerts about the relationship information when it becomes available. Other notification techniques are possible, for example by displaying a list or table of users that are found to be related to the user. Depending on system settings, the potential relatives may be shown anonymously for privacy protection, or shown with visible identities to facilitate making connections. In embodiments where a threshold is set, the user is only notified if the predicted degree of relationship at least meets the threshold. In some embodiments, a user is only notified if both of the user and the potential relative have “opted in” to receive the notification. In various embodiments, the user is notified about certain personal information of the potential relative, the predicted relationship, the possible range of relationships, the amount of DNA matching, or any other appropriate information.
- In some embodiments, at 208, the process optionally infers additional relationships or refines estimates of existing relationships between the users based on other relative relationship information, such as the relative relationship information the users have with a third user. For example, although Alice and Bob are only estimated to be 6th cousins after step 204, if among Alice's relatives in the system, a third cousin, Cathy, is also a sibling of Bob's, then Alice and Bob are deemed to be third cousins because of their relative relationships to Cathy. The relative relationships with the third user may be determined based on genetic information and analysis using a process similar to 200, based on non-genetic information such as family tree supplied by one of the users, or both.
- In some embodiments, the relatives of the users in the system are optionally checked to infer additional relatives at 210. For example, if Bob is identified as a third cousin of Alice's, then Bob's relatives in the system (such as children, siblings, possibly some of the parents, aunts, uncles, cousins, etc.) are also deemed to be relatives of Alice's. In some embodiments a threshold is applied to limit the relationships within a certain range. Additional notifications about these relatives are optionally generated.
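The refinement at 208 can be sketched for the Alice, Bob, and Cathy example. The function name, the dictionary layout, and the simplifying assumption that a sibling of one user stands in the same cousin relationship to the other user (which holds for full siblings) are illustrative only.

```python
def refine_cousin_degree(estimated_degree, cousins_of_a, siblings_of_b):
    """Return the closest cousin degree implied for users A and B.

    estimated_degree: cousin degree estimated from DNA alone (e.g., 6).
    cousins_of_a: dict mapping a relative's name to that relative's
        cousin degree with respect to A (hypothetical layout).
    siblings_of_b: set of names of B's siblings in the system.
    """
    # Any sibling of B who is an nth cousin of A implies that B is also
    # an nth cousin of A (assuming full siblings).
    implied = [
        degree for name, degree in cousins_of_a.items() if name in siblings_of_b
    ]
    return min([estimated_degree, *implied])
```

For instance, `refine_cousin_degree(6, {"Cathy": 3}, {"Cathy"})` returns 3, matching the example in which Alice and Bob are deemed to be third cousins.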
- Upon receiving a notification about another user who is a potential relative, the notified user is allowed to make certain choices about how to interact with the potential relative. FIG. 3 is a flowchart illustrating an embodiment of a process for connecting a user with potential relatives found in the database. The process may be implemented on a relative finder system such as 102, a client system such as 106, or a combination thereof. In this example, it is assumed that it has been determined that Alice and Bob are possibly 4th cousins and that Alice has indicated that she would like to be notified about any potential relatives within 6 generations. In this example, process 300 follows 206 of process 200, where a notification is sent to Alice, indicating that a potential relative has been identified. In some embodiments, the identity of Bob is disclosed to Alice. In some embodiments, the identity of Bob is not disclosed initially to protect Bob's privacy.
- Upon receiving the notification, Alice decides that she would like to make a connection with the newly found relative. At 302, an invitation from Alice to Bob inviting Bob to make a connection is generated. In various embodiments, the invitation includes information about how Alice and Bob may be related and any personal information Alice wishes to share such as her own ancestry information. Upon receiving the invitation, Bob can accept the invitation or decline. At 304, an acceptance or a declination is received. If a declination is received, no further action is required. In some embodiments, Alice is notified that a declination has been received. If, however, an acceptance is received, at 306, a connection is made between Alice and Bob. In various embodiments, once a connection is made, the identities and any other sharable personal information (e.g., genetic information, family history, phenotype/traits, etc.) of Alice and Bob are revealed to each other and they may interact with each other. In some embodiments, the connection information is updated in the database.
- In some embodiments, a user can discover many potential relatives in the database at once. Additional potential relatives are added as more users join the system and make their genetic information available for the relative finding process. FIGS. 4A-4I are screenshots illustrating user interface examples in connection with process 300. In this example, the relative finder application provides two views to the user: the discovery view and the list view.
- FIG. 4A shows an interface example for the discovery view at the beginning of the process. No relative has been discovered at this point. In this example, a privacy feature is built into the relative finder application so that close relative information will only be displayed if both the user and the close relative have chosen to view close relatives. This is referred to as the “opt in” feature. The user is further presented with a selection button “show close relatives” to indicate that he/she is interested in finding out about close relatives. FIG. 4B shows a message that is displayed when the user selects “show close relatives”. The message explains to the user how a close relative is defined. In this case, a close relative is defined as a first cousin or closer. In other words, the system has set a default minimum threshold of three degrees. The message further explains that unless there is already an existing connection between the user and the close relative, any newly discovered potential close relatives will not appear in the results unless the potential close relatives have also chosen to view their close relatives. The message further warns about the possibility of finding out about close relatives the user did not know he/she had. The user has the option to proceed with viewing close relatives or cancel the selection.
- FIG. 4C shows the results in the discovery view. In this example, seven potential relatives are found in the database. The predicted relationship, the range of possible relationships, certain personal details a potential relative has made public, the amount of DNA a potential relative shares with the user, and the number of DNA segments the potential relative shares with the user are displayed. The user is presented with a “make contact” selection button for each potential relative.
- FIG. 4D shows the results in the list view. The potential relatives are shown in icon form, sorted according to how close the corresponding predicted relationships are to the user. The user may select an icon that corresponds to a potential relative and view his/her personal information, the predicted relationship, relationship range, and other additional information. The user can also make contact with the potential relative.
- FIGS. 4E-4G show the user interface when the user selects to “make contact” with a potential relative. FIG. 4E shows the first step in making contact, where the user personalizes the introduction message and determines what information the user is willing to share with the potential relative. FIG. 4F shows an optional step in making contact, where the user is told about the cost of using the introduction service. In this case, the introduction is free. FIG. 4G shows the final step, where the introduction message is sent.
- FIG. 4H shows the user interface shown to the potential relative upon receiving the introduction message. In this example, the discovery view indicates that a certain user/potential relative has requested to make a contact. The predicted relationship, personal details of the sender, and DNA sharing information are shown to the recipient. The recipient has the option to select “view message” to view the introduction message from the sender.
- FIG. 4I shows the message as it is displayed to the recipient. In addition to the content of the message, the recipient is given the option to accept or decline the invitation to be in contact with the sender. If the recipient accepts the invitation, the recipient and the sender become connected and may view each other's information and/or interact with each other.
- Many other user interfaces can be used in addition to or as alternatives to the ones shown above. For example, in some embodiments, at least some of the potential relatives are displayed in a family tree.
- Determining the relationship between two users in the database is now described. In some embodiments, the determination includes comparing the DNA markers (e.g., SNPs) of two users and identifying IBD regions. The standard SNP based genotyping technology results in genotype calls each having two alleles, one from each half of a chromosome pair. As used herein, a genotype call refers to the identification of the pair of alleles at a particular locus on the chromosome. Genotype calls can be phased or unphased. In phased data, the individual's diploid genotype at a particular locus is resolved into two haplotypes, one for each chromosome. In unphased data, the two alleles are unresolved; in other words, it is uncertain which allele corresponds to which haplotype or chromosome.
- The genotype call at a particular SNP location may be a heterozygous call with two different alleles or a homozygous call with two identical alleles. A heterozygous call is represented using two different letters such as AB that correspond to different alleles. Some SNPs are biallelic, with only two possible states; others have more states (e.g., triallelic SNPs). Other representations are possible.
- In this example, A is selected to represent an allele with base A and B represents an allele with base G at the SNP location. Other representations are possible. A homozygous call is represented using a pair of identical letters such as AA or BB. The two alleles in a homozygous call are interchangeable because the same allele came from each parent. When two individuals have opposite-homozygous calls at a given SNP location, or, in other words, one person has alleles AA and the other person has alleles BB, it is very likely that the region in which the SNP resides does not have IBD since different alleles came from different ancestors. If, however, the two individuals have compatible calls, that is, both have the same homozygotes (i.e., both people have AA alleles or both have BB alleles), both have heterozygotes (i.e., both people have AB alleles), or one has a heterozygote and the other a homozygote (i.e., one has AB and the other has AA or BB), there is some chance that at least one allele is passed down from the same ancestor and therefore the region in which the SNP resides is IBD. Further, based on statistical computations, if a region has a very low rate of opposite-homozygote occurrence over a substantial distance, it is likely that the individuals inherited the DNA sequence in the region from the same ancestor and the region is therefore deemed to be an IBD region.
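The distinction between opposite-homozygous and compatible calls can be made concrete with a short sketch. It assumes unphased calls encoded as two-letter strings such as "AA", "AB", or "BB"; the function names are hypothetical.

```python
def is_opposite_homozygous(call_a: str, call_b: str) -> bool:
    """True when both calls are homozygous for different alleles (AA vs BB)."""
    a_is_homozygous = call_a[0] == call_a[1]
    b_is_homozygous = call_b[0] == call_b[1]
    return a_is_homozygous and b_is_homozygous and call_a[0] != call_b[0]


def is_compatible(call_a: str, call_b: str) -> bool:
    """True when at least one allele could have been inherited in common.

    Covers identical homozygotes, two heterozygotes, and a heterozygote
    paired with either homozygote.
    """
    return not is_opposite_homozygous(call_a, call_b)
```

A compatible call does not by itself establish IBD; as the text notes, it only leaves open the chance that one allele came from the same ancestor.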
- FIG. 5 is a diagram illustrating an embodiment of a process for determining the predicted degree of relationship between two users. Process 500 may be implemented on a relative finder system such as 102 and is applicable to unphased data. At 502, consecutive opposite-homozygous calls in the users' SNPs are identified. The consecutive opposite-homozygous calls can be identified by serially comparing individual SNPs in the users' SNP sequences or in parallel using bitwise operations as described below. At 504, the distance between consecutive opposite-homozygous calls is determined. At 506, IBD regions are identified based at least in part on the distance between the opposite-homozygous calls. The distance may be physical distance measured in the number of base pairs or genetic distance accounting for the rate of recombination. For example, in some embodiments, if the genetic distance between the locations of two consecutive opposite-homozygous calls is greater than a threshold of 10 centimorgans (cM), the region between the calls is determined to be an IBD region. This step may be repeated for all the opposite-homozygous calls. A tolerance for genotyping error can be built in by allowing some low rate of opposite homozygotes when calculating an IBD segment. In some embodiments, the total number of matching genotype calls is also taken into account when deciding whether the region is IBD. For example, a region may be examined where the distance between consecutive opposite-homozygous calls is just below the 10 cM threshold. If a large enough number of genotype calls within that interval match exactly, the interval is deemed IBD.
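Steps 502 through 506 can be sketched as a serial scan. This is a minimal illustration, assuming each SNP is given as a tuple of genetic position in cM and the two users' unphased calls, sorted by position on one chromosome. The function names are hypothetical, and the sketch omits the genotyping-error tolerance, the matching-call count, and any handling of chromosome ends.

```python
def opposite_homozygous(call_a, call_b):
    """True when both calls are homozygous for different alleles."""
    return (
        call_a[0] == call_a[1]
        and call_b[0] == call_b[1]
        and call_a[0] != call_b[0]
    )


def find_ibd_regions(snps, min_cm=10.0):
    """Return (start_cM, end_cM) regions between consecutive
    opposite-homozygous calls that are more than min_cm apart.

    snps: list of (position_cM, call_a, call_b), sorted by position.
    """
    # 502: locate the opposite-homozygous calls.
    breakpoints = [pos for pos, a, b in snps if opposite_homozygous(a, b)]
    # 504 and 506: measure the distance between consecutive calls and keep
    # the regions that exceed the threshold.
    return [
        (left, right)
        for left, right in zip(breakpoints, breakpoints[1:])
        if right - left > min_cm
    ]
```

The default of 10 cM follows the example threshold in the text; a physical distance in base pairs could be substituted by changing the position units.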
- FIG. 6 is a diagram illustrating example DNA data used for IBD identification by process 500. 602 and 604 correspond to the SNP sequences of Alice and Bob, respectively. At location 606, the alleles of Alice and Bob are opposite-homozygotes, suggesting that the SNP at this location resides in a non-IBD region. Similarly, at location 608, the opposite-homozygotes suggest a non-IBD region. At location 610, however, both pairs of alleles are heterozygotes, suggesting that there is potential for IBD. Similarly, there is potential for IBD at location 612, where both pairs of alleles are identical homozygotes, and at location 614, where Alice's pair of alleles is heterozygous and Bob's is homozygous. If there is no other opposite-homozygote between 606 and 608 and there are a large number of compatible calls between the two locations, it is then likely that the region between 606 and 608 is an IBD region.
- Returning to FIG. 5, at 508, the number of shared IBD segments and the amount of DNA shared by the two users are computed based on the IBD. In some embodiments, the longest IBD segment is also determined. In some embodiments, the amount of DNA shared includes the sum of the lengths of IBD regions and/or percentage of DNA shared. The sum is referred to as IBDhalf or half IBD because the individuals share DNA identical by descent for at least one of the homologous chromosomes. At 510, the predicted relationship between the users, the range of possible relationships, or both, is determined using the IBDhalf and number of segments, based on the distribution pattern of IBDhalf and shared segments for different types of relationships. For example, in a first degree parent/child relationship, the individuals have IBDhalf that is 100% the total length of all the autosomal chromosomes and 22 shared autosomal chromosome segments; in a second degree grandparent/grandchild relationship, the individuals have IBDhalf that is approximately half the total length of all the autosomal chromosomes and many more shared segments; in each subsequent degree of relationship, the percentage of IBDhalf of the total length is about 50% of the previous degree. Also, for more distant relationships, in each subsequent degree of relationship, the number of shared segments is approximately half of the previous number.
- In various embodiments, the effects of genotyping error are accounted for and corrected. In some embodiments, certain genotyped SNPs are removed from consideration if there are a large number of Mendelian errors when comparing data from known parent/offspring trios. In some embodiments, SNPs that have a high no-call rate or otherwise failed quality control measures during the assay process are removed. In some embodiments, in an IBD segment, an occasional opposite-homozygote is allowed if there is sufficient opposite-homozygotes-free distance (e.g., at least 3 cM and 300 SNPs) surrounding the opposite-homozygote.
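The halving of IBDhalf with each degree of relationship described at 510 can be written as a simple rule of thumb: 100% at the first degree and about 50% of the previous value for each subsequent degree, that is, roughly 0.5^(d-1) for degree d. The sketch below encodes only that expectation; the function names are hypothetical, and the point estimate ignores the segment counts and the empirical distribution patterns that the described process relies on.

```python
import math


def expected_ibd_half_fraction(degree: int) -> float:
    """Approximate expected IBDhalf as a fraction of total autosomal length."""
    if degree < 1:
        raise ValueError("degree of relationship must be at least 1")
    return 0.5 ** (degree - 1)


def nearest_degree(ibd_half_fraction: float) -> int:
    """Invert the halving rule to get a rough point estimate of the degree."""
    if not 0.0 < ibd_half_fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return max(1, round(1 - math.log2(ibd_half_fraction)))
```

Because different relationships can produce the same IBDhalf, such a point estimate would in practice be reported together with a range of possible relationships.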
- There is a statistical range of possible relationships for the same IBDhalf and shared segment number. In some embodiments, the distribution patterns are determined empirically based on survey of real populations. Different population groups may exhibit different distribution patterns. For example, the level of homozygosity within endogamous populations is found to be higher than in populations receiving gene flow from other groups. In some embodiments, the bounds of particular relationships are estimated using simulations of IBD using generated family trees. Based at least in part on the distribution patterns, the IBDhalf, and shared number of segments, the degree of relationship between two individuals can be estimated.
FIG. 7 is a diagram illustrating example simulated relationship distribution patterns for different population groups according to one embodiment. In particular, Ashkenazi Jews and Europeans are two population groups surveyed. In panels A-C, for each combination of IBDhalf and the number of IBD segments in an Ashkenazi sample group, the 95%, 50% and 5% of obtained nth degree cousinships from 1 million simulated pedigrees are plotted. In panels D-F, for each combination of IBDhalf and the number of IBD segments in a European sample group, the 95%, 50% and 5% of obtained nth degree cousinships from 1 million simulated pedigrees are plotted. In panels G-I, the differences between Ashkenazi and European distant cousinship for the prior panels are represented. Each nth cousinship category is scaled by the expected number of nth degree cousins given a model of population growth. Simulations are conducted by specifying an extended pedigree and creating simulated genomes for the pedigree by simulating the mating of individuals drawn from a pool of empirical genomes. Pairs of individuals who appear to share IBDhalf that was not inherited through the specified simulated pedigree are marked as “unknown” in panels A-F. Thus, special distribution patterns can be used to find relatives of users who have indicated that they belong to certain distinctive population groups such as the Ashkenazi.
- The amount of IBD sharing is used in some embodiments to identify different population groups. For example, for a given degree of relationship, since Ashkenazi tend to have much more IBD sharing than non-Ashkenazi Europeans, users may be classified as either Ashkenazi or non-Ashkenazi Europeans based on the number and pattern of IBD matches.
- In some embodiments, instead of, or in addition to, determining the relationship based on the overall number of IBD segments and percent DNA shared, individual chromosomes are examined to determine the relationship. For example, X chromosome information is received in some embodiments in addition to the autosomal chromosomes. The X chromosomes of the users are also processed to identify IBD. Since one of the X chromosomes in a female user is passed on from her father without recombination, the female inherits one X chromosome from her paternal grandmother and another one from her mother. Thus, the X chromosome undergoes recombination at a slower rate compared to autosomal chromosomes and more distant relationships can be predicted using IBD found on the X chromosomes.
- In some embodiments, analyses of mutations within IBD segments can be used to estimate ages of the IBD segments and refine estimates of relationships between users.
- In some embodiments, the relationship determined is verified using non-DNA information. For example, the relationship may be checked against the users' family tree information, birth records, or other user information.
- In some embodiments, the efficiency of IBD region identification is improved by comparing a user's DNA information with the DNA information of multiple other users in parallel and using bitwise operations.
FIG. 8 is a diagram illustrating an embodiment of a highly parallel IBD identification process. Alice's SNP calls are compared with those of multiple other users. Alice's SNP calls are pre-processed to identify ones that are homozygous. Alice's heterozygous calls are not further processed since they always indicate that there is possibility of IBD with another user. For each SNP call in Alice's genome that is homozygous, the zygosity states in the corresponding SNP calls in the other users are encoded. In this example, compatible calls (e.g., heterozygous calls and same homozygous calls) are encoded as 0 and opposite-homozygous calls are encoded as 1. For example, for homozygous SNP call AA at location 806, opposite-homozygous calls BB are encoded as 1 and compatible calls (AA and AB) are encoded as 0; for homozygous SNP call EE at location 812, opposite-homozygous calls FF are encoded as 1 and compatible calls (EE and EF) are encoded as 0, etc. The encoded representations are stored in arrays such as 818, 820, and 824. In some embodiments, the length of the array is the same as the word length of the processor to achieve greater processing efficiency. For example, in a 64-bit processing system, the array length is set to 64 and the zygosity of 64 users' SNP calls are encoded and stored in the array.
- A bitwise operation is performed on the encoded arrays to determine whether a section of DNA such as the section between locations 806 and 816 includes opposite-homozygous calls, and the outcome is stored in result array 824. Any user with no opposite-homozygous calls between beginning location 806 and ending location 816 results in an entry value of 0 in array 824. The corresponding DNA segment, therefore, is deemed as an IBD region for such user and Alice. In contrast, users with opposite-homozygotes result in corresponding entry values of 1 in array 824 and they are deemed not to share IBD with Alice in this region. In the example shown, user 1 shares IBD with Alice while other users do not.
- In some embodiments, phased data is used instead of unphased data. These data can come directly from assays that produce phased data, or from statistical processing of unphased data. IBD regions are determined by matching the SNP sequences between users. In some embodiments, sequences of SNPs are stored in dictionaries using a hash-table data structure for the ease of comparison.
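The parallel comparison described for FIG. 8 can be sketched with one integer word per SNP location, where bit i records whether user i is opposite-homozygous to Alice. The choice of a bitwise OR to combine the words is an assumption consistent with the 0 and 1 encoding described for FIG. 8 (a result bit stays 0 only if every encoded location was compatible); the function names and the two-letter call encoding are likewise illustrative.

```python
def encode_site(alice_call, other_calls):
    """Pack one bit per user: 1 = opposite-homozygous to Alice, 0 = compatible.

    Assumes alice_call is homozygous (e.g., "AA").
    """
    word = 0
    for i, call in enumerate(other_calls):
        if call[0] == call[1] and call[0] != alice_call[0]:
            word |= 1 << i
    return word


def users_without_conflict(alice_calls, calls_by_user):
    """Return indices of users with no opposite-homozygous call in the section.

    alice_calls: Alice's calls for the section, one per SNP location.
    calls_by_user: one list of calls per other user, aligned with alice_calls.
    """
    result = 0
    for site, alice_call in enumerate(alice_calls):
        if alice_call[0] != alice_call[1]:
            continue  # Alice is heterozygous here: always compatible, skip.
        # Assumed combining step: OR the per-location words together.
        result |= encode_site(alice_call, [calls[site] for calls in calls_by_user])
    return [i for i in range(len(calls_by_user)) if not (result >> i) & 1]
```

Python integers are unbounded, so the sketch does not enforce a word size; an implementation matching the 64-bit example would process users in groups of 64.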
FIG. 9 is a diagram illustrating an example in which phased data are compared to identify IBD. The sequences are split along pre-defined intervals into non-overlapping words. Other embodiments may use overlapping words. Although a preset length of 3 is used for purposes of illustration in the example shown, many implementations may use words of longer lengths (e.g. 100). Also, the length does not have to be the same for every location. InFIG. 9 , in Alice'schromosome pair 1,chromosome 902 is represented by words AGT, CTG, CAA, . . . andchromosome 904 is represented by CGA, CAG, TCA, . . . . At each location, the words are stored in a hash table that includes information about a plurality of users to enable constant retrieval of which users carry matching haplotypes. Similar hash tables are constructed for other sequences starting at other locations. To determine whether Bob'schromosome pair 1 shares any IBD with Alice's, Bob's sequences are processed into words at the same locations as Alice's. Thus, Bob'schromosome 906 yields CAT, GAC, CCG, . . . and chromosome 908 yields AAT, CTG, CAA, . . . . Every word from Bob's chromosomes is then looked up in the corresponding hash table to check whether any other users have the same word at that location in their genomes. In the example shown, the second and third words of chromosome 908 match second and third words of Alice'schromosome 902. This indicates that SNP sequence CTGCAA is present in both chromosomes and suggests the possibility of IBD sharing. If enough matching words are present in close proximity to each other, the region would be deemed IBD. - In some embodiments, relative relationships found using the techniques described above are used to infer characteristics about the users that are related to each other. In some embodiments, the inferred characteristic is based on non-genetic information pertaining to the related users. 
For example, if a user is found to have a number of relatives that belong to a particular population group, then an inference is made that the user may also belong to the same population group. In some embodiments, genetic information is used to infer characteristics, in particular characteristics specific to shared IBD segments of the related users. Assume, for example, that Alice has sequenced her entire genome but her relatives in the system have only genotyped SNP data. If Alice's genome sequence indicates that she may have inherited a disease gene, then, with Alice's permission, Alice's relatives who have shared IBD with Alice in the same region that includes the disease gene may be notified that they are at risk for the same disease.
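The word-based comparison of phased data described for FIG. 9 can be sketched as below. The sketch uses the word length of 3 and the example words from the figure; the function names (`split_words`, `build_index`, `matching_runs`), the `min_run` threshold, and the choice of a nested dictionary as the per-location hash table are assumptions made for illustration only.

```python
from collections import defaultdict


def split_words(sequence, word_len=3):
    """Cut a phased SNP sequence into non-overlapping words of word_len."""
    return [sequence[i:i + word_len]
            for i in range(0, len(sequence) - word_len + 1, word_len)]


def build_index(haplotypes, word_len=3):
    """One hash table per word location: index[loc][word] -> set of haplotype keys."""
    index = defaultdict(lambda: defaultdict(set))
    for key, sequence in haplotypes.items():
        for loc, word in enumerate(split_words(sequence, word_len)):
            index[loc][word].add(key)
    return index


def matching_runs(index, query, word_len=3, min_run=2):
    """Return (haplotype_key, first_location, run_length) for each run of
    at least min_run consecutive matching words (locations are 0-based)."""
    runs, open_runs = [], {}
    words = split_words(query, word_len) + [None]  # sentinel closes open runs
    for loc, word in enumerate(words):
        hits = index.get(loc, {}).get(word, set())
        for key in [k for k in open_runs if k not in hits]:
            start = open_runs.pop(key)
            if loc - start >= min_run:
                runs.append((key, start, loc - start))
        for key in hits:
            open_runs.setdefault(key, loc)
    return runs


# Words taken from the FIG. 9 example.
index = build_index({
    ("Alice", 902): "AGTCTGCAA",
    ("Alice", 904): "CGACAGTCA",
})
bob_908 = matching_runs(index, "AATCTGCAA")
bob_906 = matching_runs(index, "CATGACCCG")
```

Here `bob_908` reports a run of two words starting at the second word against Alice's chromosome 902, while `bob_906` is empty. A real system would use much longer words and a larger `min_run` before deeming a region IBD.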
-
FIG. 10 is a block diagram illustrating an embodiment of an ancestry finder system. In this example, ancestry finder system 102 may be implemented using one or more server computers having one or more processors, one or more special purpose computing appliances, or any other appropriate hardware, software, or combinations thereof. The operations of the ancestry finder system are described in greater detail below. In this example, various users of the system (e.g., user 1 (“Alice”) and user 2 (“Bob”)) access the ancestry finder system via a network 104 using client devices such as 106 and 108. User information (including genetic information and optionally other personal information such as family information, population group, etc.) pertaining to the users is stored in a database 110, which can be implemented on an integral storage component of the ancestry finder system, an attached storage device, a separate storage device accessible by the ancestry finder system, or a combination thereof. Many different arrangements of the physical components are possible in various embodiments. In some embodiments, the entire genome sequences or assayed DNA markers (SNPs, STRs, CNVs, etc.) are stored in the database to facilitate the relative finding process. For example, approximately 650,000 SNPs per individual's genome are assayed and stored in the database in some implementations.
-
System 1000 shown in this example includes genetic and other additional non-genetic information for many users. By comparing the recombining DNA information to identify shared IBD regions between various users, the ancestry finder system can infer ancestry information or other characteristics of a user. - In some embodiments,
database 110 includes results of a survey of users. For example, the survey may be an ancestry survey that requests information such as the place (e.g., country and city) of birth, race, and ethnicity of the user. In some embodiments, the survey also requests information regarding relatives of the user, such as the place of birth and race/ethnicity of each of the user's parents and grandparents (if known to the user). For example, Alice may provide her place of birth and ethnicity, as well as the places of birth and ethnicities of each of her parents and grandparents. Such information may be useful for determining information about the ancestry of other users who are related to Alice and/or have a common IBD segment with Alice, as further described below. In some embodiments, Alice may also provide ancestry information for her direct maternal line (mother's mother's mother's . . . mother) as far back as possible. Alice's brother (or any male) might provide ancestry information for his direct paternal line (father's father's father's . . . father) as far back as possible. In various embodiments, the user survey results are stored separately from database 110, such as in a separate database. In various embodiments, relative finder system 102 and ancestry finder system 1002 are part of the same system and may share data.
-
FIG. 11 is a flowchart illustrating an embodiment of a process for inferring a characteristic of an individual. Process 1100 may be implemented on an ancestry finder system such as 1100. In some embodiments, if a user has a chromosomal segment that is associated with a particular population group (e.g., based on other users having that chromosomal segment who have identified themselves as being associated with the population group), this process infers that the user has ancestry associated with that population group. The process may be invoked, for example, at a user's request to look for ancestry information and/or potential relatives this user may have in the database. One or more steps of process 1100 may be part of a batch process used to obtain ancestry information (or information used to infer ancestry information) for a plurality of users.
- At 1102, an indication that a first user (e.g., Alice) and a second user (e.g., Bob) have at least one shared IBD segment is received. In some embodiments, it is determined that the first user and the second user have at least one shared IBD segment based on information retrieved from a database that stores recombining DNA information of a plurality of users as well as additional user information. For purposes of illustration, SNP information is described extensively in this and following examples. Other DNA information such as STR information, CNV information, exomic sequence information, and/or full sequence information may be used in other embodiments. In some embodiments, the second user is an individual from a reference database, and not necessarily a user of the ancestry finder system.
- At 1104, information about the second user is obtained. In some embodiments, the information about the second user comprises one or more characteristics of one or more relatives of the second user. In some embodiments, at least one characteristic is ancestry information. For example, the information about the second user could comprise ancestry information about the four grandparents of the second user, such as the birthplaces of the four grandparents. Such information could, for example, have been provided by the second user when filling out a survey of characteristics of the user and the user's relatives, such as an ancestry survey, an example of which was previously described. In other embodiments, such information could be provided by a reference database. In some embodiments, the information about the second user is obtained from a database, such as
database 110.
- At 1106, a characteristic of the first user is inferred based at least in part on the information about the second user. If the first user and second user share at least one common IBD segment, information about the first user could be inferred from known information about the second user. For example, if it is known that the second user's four grandparents were all born in Ireland, then it can be inferred that the first user has at least some Irish ancestry, or at least that there is someone of Irish ancestry associated with that segment value. Characteristics can include ancestry information. Other examples of characteristics that can be inferred besides ancestry information include any other inherited characteristic, such as information associated with diseases, traits, or any other form of phenotypic information.
- At 1108, the inferred characteristic of the first user is presented in a user interface. Various examples of user interfaces for presenting, displaying, or notifying of the inferred characteristic are possible. Some examples of user interfaces include a table view, an (extended) discovery view, a karyotype view, a geographical map view, as more fully described below. For example, a discovery view may display a list or table of users that are found to be related to the user based on the relative finder system. In addition to displaying relative finder results, ancestry information is displayed next to each relative. For example, next to each relative, there may be a column for each grandparent of that relative. An indication of ancestry information for each grandparent may be displayed in each column, as more fully illustrated below. Other examples of user interfaces include a family tree, which displays a family tree filled with inferred ancestry information about each relative. In some embodiments, the system actively notifies the users by sending messages or alerts about ancestry information when it becomes available or at a given interval.
- In some embodiments, it is determined that for a particular segment, the first user shares IBD with other users besides the second user. If the other users that share this chromosomal segment have conflicting inferred ancestry (or other characteristic) information, then this conflict may be handled in various ways in various embodiments. In some embodiments, all the information is presented. In some embodiments, the information is processed to infer ancestry (or other characteristic information). For example, the information may be processed using a majority rules technique. For example, if the ancestry associated with the majority of the matching users is German, then it is inferred that that segment is associated with German ancestry. In some embodiments, for each segment and each possible value of each segment, the information is processed (e.g., using majority rules), and the processed result is pre-computed and stored for future reference by any of the matching users.
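The majority rules technique mentioned above for resolving conflicting ancestry among users matching the same segment can be sketched as follows. The function name `majority_ancestry`, the use of country codes as labels, and the decision to return `None` when no ancestry exceeds half of the reports are assumptions of this sketch; the text does not specify how ties are handled.

```python
from collections import Counter


def majority_ancestry(reported):
    """Return the ancestry reported by more than half of the matching users.

    `reported` holds one ancestry label per matching user for a single
    segment value. Returns None when there is no strict majority.
    """
    if not reported:
        return None
    ancestry, count = Counter(reported).most_common(1)[0]
    return ancestry if count > len(reported) / 2 else None


# Hypothetical reports for one segment value shared by three matching users.
segment_call = majority_ancestry(["DE", "DE", "IE"])
```

Because the result depends only on the segment value and the matching users, it could be computed once per segment value and cached for reuse, as the paragraph above suggests.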
- In some embodiments, 1102 is part of a batch process in which a plurality of users (e.g., from a database) are preprocessed to determine shared IBD segments. The results comprise shared IBD segments for a plurality of users. For example, for a particular user, there may be 278 matches to various segments of the user's chromosomes. At 1104, these results are cross-referenced with a database of information (e.g., ancestry information) about those matching users. For example, of the 278 matches, 49 of those users have completed an ancestry information survey that can be used to infer ancestry information about the particular user.
-
FIG. 12 is a flowchart illustrating an embodiment of a process for determining that a first user and a second user share at least one IBD segment. Process 1200 may be implemented on an ancestry finder system such as 1100. The process may be performed, for example, at 1102 of process 1100. At 1202, recombining DNA information of a first user and recombining DNA information of a second user are received. At 1204, a shared IBD segment between the first user and the second user is determined based at least in part on the recombining DNA information of the first user and recombining DNA information of the second user. Any appropriate technique may be used to determine the shared IBD segment. For example, in some embodiments, one or more steps of process 500 are used to determine the shared IBD segment. In some embodiments, one or more steps of the process described with respect to FIG. 6, 8, or 9 is used to determine the shared IBD segment.
-
FIG. 13 shows an interface example for a table view of an ancestry finder system. In some embodiments, this interface is used to present inferred ancestry information for a user at 1108. In this example, a table is displayed or presented that illustrates the results a typical user of European ancestry might receive. Each row in the table corresponds to a chromosomal segment that the user shares by IBD with another individual (i.e., a relative finder hit). Three of the segments are to (one or more) individuals of Irish ancestry, and one is to an individual of German ancestry. This table shows a relatively low number of ancestry finder hits. In other words, only about 40 cM (roughly 40 Mb), or a bit more than half a percent of this user's genome, is covered. (The calculation is 40 cM/6000 cM=0.7%. The denominator is not 3000 cM, the length of the haploid genome, but twice that, the length of the diploid genome, because there could be an ancestry finder hit to either chromosome. This typically is the case with Ashkenazim.) How much of the genome is covered depends on the number of individuals in the database from the same subpopulations as the user and the consanguinity of that subpopulation. A hit or match means that at least a portion of the user's genome was found in the indicated part of the world prior to the era of intercontinental travel.
- In various embodiments, other types of interfaces, tables, charts, or views may be used. For example, a table of the top ten countries associated with the user's ancestral origin may be presented or displayed. These top ten countries are determined based on the total number of non-overlapping DNA segments attributable to each country. A pie chart may be used to show the breakdown among countries.
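The coverage arithmetic and the top-countries ranking described above can be sketched as follows. The tuple layout `(country, chromosome, start_cM, end_cM)`, the function names, and the rule that overlapping hits to the same country on the same chromosome are merged before counting are assumptions of this sketch; the 6000 cM denominator comes from the calculation in the text.

```python
from collections import defaultdict


def coverage_percent(total_cm, genome_cm=6000.0):
    """Percentage of the genome covered by ancestry finder hits."""
    return round(100.0 * total_cm / genome_cm, 1)


def count_non_overlapping(segments):
    """Count segments per country after merging overlapping hits.

    segments: iterable of (country, chromosome, start_cM, end_cM).
    """
    spans_by_key = defaultdict(list)
    for country, chrom, start, end in segments:
        spans_by_key[(country, chrom)].append((start, end))
    counts = defaultdict(int)
    for (country, _chrom), spans in spans_by_key.items():
        last_end = None
        for start, end in sorted(spans):
            if last_end is None or start > last_end:
                counts[country] += 1      # a new, non-overlapping segment
                last_end = end
            else:
                last_end = max(last_end, end)  # overlaps the previous one
    return dict(counts)


def top_countries(segments, n=10):
    """Rank countries by number of non-overlapping segments (ties by name)."""
    counts = count_non_overlapping(segments)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical hits: two overlapping Irish hits on chromosome 1 count once.
hits = [
    ("IE", 1, 10.0, 20.0),
    ("IE", 1, 15.0, 25.0),
    ("IE", 3, 5.0, 12.0),
    ("DE", 7, 40.0, 48.0),
]
ranking = top_countries(hits)
percent = coverage_percent(40.0)
```

With these inputs the ranking lists Ireland with two segments ahead of Germany with one, and 40 cM out of 6000 cM rounds to 0.7%, matching the figure quoted in the text.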
-
FIG. 14 shows an interface example for the discovery view of a relative finder system that incorporates an ancestry finder system. In some embodiments, this interface is used to present inferred ancestry information for a user at 1108. In this example, the user interface of FIG. 4C is shown in addition to four new columns: GP1-GP4, representing Grandparents 1-4. For each relative, ancestry information is shown in each of the columns for that relative. In this example, the ancestry information is represented by codes that indicate birth countries of the grandparents. DE is Germany, UK is the United Kingdom, IE is Ireland, US is the United States, and JP is Japan. For example, the first row shows a 4th cousin with four grandparents who are all of German ancestry. In other embodiments, other ancestry information could be displayed, such as race/ethnicity (e.g., NA for Native American). In various embodiments, various other ancestry information could be provided, such as information about the relative's parents or siblings. In various embodiments, various other information about characteristics of the relative could be provided, including disease, trait, or any other form of phenotypic information.
-
FIG. 15 shows an interface example for a karyotype view of an ancestry finder system. In some embodiments, this interface is used to present inferred ancestry information for a user “Paul Pierce” at 1108.
- In this example, the user interface shows a graphic of a user's chromosomes on the left hand side. On each chromosome, various segments are colored according to the ancestry (France, Austria, United States, etc.) with which that segment is associated. In some embodiments, a segment is associated with an ancestry based on an IBD match with an individual of known ancestry. For example, if a user's segment has an IBD match with another individual, and the birth country of all four grandparents of that individual is provided or known, then it can be inferred that the user's segment is associated with that birth country. In other words, that shared IBD segment was associated with that part of the world (prior to the era of intercontinental travel).
- On the right hand side, a control panel is shown. In the control panel, the customer (user) name is shown, along with user configurable settings. “Minimum segment size” is the minimum length of a matching IBD segment, in this example, 5 cM. Minimum segment size governs the minimum length of displayed ancestry finder hits. It's given in centiMorgans (cM) in the example shown, but can also be expressed in terms of base pairs, kilobases, megabases, or any other appropriate unit. In this example, most hits are short, so increasing the minimum quickly reduces the number of segments displayed. These longer hits, although fewer, are increasingly likely to indicate recent shared ancestry, and therefore to represent genuine/applicable ancestry finder results. In this example, the great majority of IBD hits, and thus ancestry finder segments, are between 5 cM and 10 cM in length. The lower end is controlled by a few threshold parameters in the IBD matching technique and can take any appropriate value. The lower edge might be different for the IBD hits on live. One range might be 5 cM to 20 cM (20 cM is the approximate length of chromosome 22). Although a pull down menu is shown, in various embodiments, a slider or other element could be used, with a default initial value.
- “Number of grandparents from the same country” indicates the minimum number of grandparents from the same country in order for an association to be made, as will be more fully described below. In this example, ancestry associations for a segment are only made if the birth countries of all four grandparents are the same. “Display type” indicates that this is a karyotype display.
- “Show North American origin?” indicates whether to include the United States and Canada as ancestry finder matches. For example, a user may desire to exclude North American origin matches because they may not be as informative since most individuals born in North America have ancestry in other parts of the world. In some cases, if this option is unchecked, then relatively fewer hits will result because there are many users whose four grandparents are from the United States or Canada. This is a toggle in the example, but other user interface elements may be used.
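The three control-panel settings described above (minimum segment size, number of grandparents from the same country, and the North American toggle) can be read as filters over the list of IBD hits. The sketch below is one possible reading, under stated assumptions: the `Hit` record, the function names, and the choice to report only those countries that individually reach the grandparent threshold are hypothetical and not taken from the text.

```python
from collections import Counter
from dataclasses import dataclass

NORTH_AMERICA = {"US", "CA"}  # country codes assumed for illustration


@dataclass
class Hit:
    chrom: int
    length_cm: float
    grandparent_countries: tuple  # birth countries of the match's four grandparents


def qualifying_countries(hit, min_grandparents, show_north_america):
    """Countries shared by at least min_grandparents of the match's grandparents."""
    counts = Counter(hit.grandparent_countries)
    return sorted(
        country for country, n in counts.items()
        if n >= min_grandparents
        and (show_north_america or country not in NORTH_AMERICA)
    )


def visible_hits(hits, min_cm=5.0, min_grandparents=4, show_north_america=True):
    """Apply the control-panel settings and return (chrom, length, countries)."""
    shown = []
    for hit in hits:
        if hit.length_cm < min_cm:
            continue  # shorter than "Minimum segment size"
        countries = qualifying_countries(hit, min_grandparents, show_north_america)
        if countries:
            shown.append((hit.chrom, hit.length_cm, countries))
    return shown


# Hypothetical hits for one user.
hits = [
    Hit(1, 7.5, ("DE", "DE", "DE", "DE")),
    Hit(2, 4.0, ("IE", "IE", "IE", "IE")),
    Hit(3, 12.0, ("US", "US", "IE", "IE")),
    Hit(4, 9.0, ("US", "US", "US", "US")),
]
strict = visible_hits(hits)
relaxed = visible_hits(hits, min_grandparents=2, show_north_america=False)
```

With the default settings only the German and United States hits with four co-located grandparents are shown; lowering the threshold to 2 and hiding North American origin keeps the German hit and surfaces the Irish half of the mixed hit instead.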
- In some embodiments, a dialog box such as the one shown may be used for a user to provide feedback in this view or any other view (such as the discovery view). For example, the dialog box may open when a cursor hovers over one of the segments. The shown dialog box asks the user whether the indicated ancestry information could be right. The user can indicate “Yes” or “No” and provide a reason. In some embodiments, this feedback is used by the ancestry finder system to improve future inferences. For example, a user may see one segment that is colored brown to indicate Japan. If the user knows that he does not have any Japanese ancestry, he may select “No” in the dialog box. This may happen, for example, if the individual with the matching IBD segment to the user has indicated his grandparents are all from Japan due to misinformation. This could also happen if there is misleading information. For example, a user could be born in Australia, but not be Aboriginal. In some cases, a particular IBD segment may be found in more than one part of the world, leading to this result. For example, there may be tracts of chromosome that do not actually correspond to recent shared ancestry; they could correspond to old ancestry and be shared by many users.
- In various embodiments, other user configurable settings may be included, such as an option to Show Public Reference Individuals and/or Show Relative Finder Users. There may be hits to customers via the ancestry survey, and to anonymous reference individuals of known ancestry from public genotype databases. The customer may wish to toggle visibility of the public individuals/Relative Finder. In some embodiments, this could rotate through three states: Both->Relative Finder Users Only->Public Only.
- An option to “Notify Me of Changes to My Ancestry Finder Results” may be provided. Ancestry finder results will change/improve with time. This would allow a user to indicate whether and how the user wishes to be notified of these changes. For example, the user could request to be notified as soon as they happen (up to daily), weekly in a batch, monthly in a batch. The user could indicate which results the user wants to be notified of (e.g., four grandparents, three grandparents, and/or Non-North American).
-
FIG. 16 shows an example of the effect of varying some of the settings in a karyotype view of an ancestry finder system, such as that shown in FIG. 15.
- “Number of Grandparents from Same Country” is the minimum number of co-located grandparents for an ancestry finder match to be made. This governs how many grandparents of the individual matching a user must be associated with the same country/ethnicity. For example, if four, then only segments for which all four grandparents come from the same country will be shown. The table shows the overall distribution of ancestry survey responses across an example database of users. The possible values are integers from four to one, inclusive.
- Varying “Number of Grandparents from Same Country” and “Show North American Origin?” affects the number and informativeness of ancestry finder hits displayed. For example, in an example database, most customers of European descent have about 100 relative finder hits in the database, and about a third of the users have taken an ancestry survey. This means there are 100*(⅓) or about 30 users who match and have provided ancestry information via the ancestry survey. Generally speaking, the most informative ancestry finder matches are those in which the match has four grandparents from the same country, and that country is not the US or Canada. Since such matches are relatively rare in the example database, perhaps 0 to 5 of the approximately 30 matches from above, users may wish to see their other matches, even though they may be less informative. Decreasing the number of grandparents required to be from the same country from 4 to 3 to 2 to 1 increases the number of hits displayed, as does allowing matches from the US and Canada to be shown.
-
FIG. 17 shows an interface example of a karyotype view of an ancestry finder system in which “Number of grandparents from the same country” is 2. In this example, the interface of FIG. 15 is shown for a user “Adrian Lee” who has selected 2 as the “Number of grandparents from the same country.” This means that for each shared IBD segment shown (colored), the user had shared IBD with someone who has at least two grandparents from the same country. As shown, some of the segments show two or more colors (split horizontally) to indicate that that segment is IBD matched with someone whose grandparents could be from as many as three different countries. In contrast, FIG. 15 shows only one color per segment because all four grandparents must be from the same country. In various embodiments, a variety of graphical elements or visual cues may be used to depict such inferences about characteristics (such as ancestry information) that have been made based on one or more shared IBD segments.
-
FIG. 18 shows an interface example for a geographical map view of an ancestry finder system. The left hand side shows a map of some region of the world. The dots are latitude/longitude coordinates inferred from text typed in by users describing their specific places of birth, and/or those of their ancestors. The dots are the birthplaces of US-born grandparents of other users who have an IBD match with user Paul Pierce. The match is based on a minimum segment size of 5 cM, minimum number of grandparents from the same country of 4, and North American origin. For example, the map could be of North America. -
FIG. 19 is a flowchart illustrating an embodiment of a process for inferring a characteristic of an individual from the ancestry or other phenotypic information of a second individual, even if the two individuals are not related by DNA. Process 1900 may be implemented on an ancestry finder system such as 1100. The process may be invoked, for example, at a user's request to look for ancestry information. In some embodiments, this process is performed between 1106 and 1108 of process 1100, for cases in which a third user is requesting ancestry information. In this case, it is known that the third user is related to the first user, but the third user does not share an IBD segment with the second user. In some cases, the third user does not share an IBD segment with the first user either.
- For example, the third user may be a sibling of the first user, but the third user does not have an IBD match with any other user in the database. However, the first user has an IBD match with the second user. Because it is known that the first user and the third user are siblings, and that the first user and second user have shared IBD for at least one segment, then, in some embodiments, ancestry information for the third user can be inferred from the ancestry information about the second user. In some cases, this means that it need not be determined whether the third user has any shared IBD segments with the second user, and this ancestry information can be inferred as soon as it is known that the third user is a sibling of the first user.
- As a more specific example, the first user and the third user are full siblings, the second user is the father of the first user, the first user has a segment that is a shared IBD segment with his father (the second user), but the third user does not share that IBD segment with his father (the second user). If both of the father's parents are from Germany, then it can be inferred that the third user has at least two grandparents, his paternal grandparents, from Germany. If instead the relationship between the first user and second user is not known, perhaps because the second user is a distant cousin of the first user instead of the father, then it can at least be inferred that the third user has German ancestry, because the third user's lineages are identical to those of the first user.
- At 1902, an indication that a third user has a known degree of relationship to the first user is received. The third user does not necessarily share an IBD segment with either the first user or the second user. In the example above, the known degree of relationship between the third user and the first user is “sibling.”
- At 1904, a characteristic of the third user is inferred based at least in part on the inferred characteristic of the first user. In the example above, it is inferred that the third user has German ancestry, or if it is known that the second user is the father of the first user, then it is inferred that the third user has at least two grandparents from Germany.
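Steps 1902 and 1904 can be sketched for the full-sibling example given above. This is an illustrative reading only: the `Inference` record, the function name `infer_for_third_user`, the relationship label `"full sibling"`, and the convention that a grandparent count of 0 means "some ancestry, no count implied" are assumptions, and other degrees of relationship are deliberately left unhandled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Inference:
    country: str
    min_grandparents: int  # 0 = some ancestry from the country, no count implied


def infer_for_third_user(second_user_country: str,
                         relationship_to_first: str,
                         second_is_parent_of_first: bool = False) -> Optional[Inference]:
    """Infer ancestry for a third user known to be related to the first user.

    second_user_country is the country associated with the second user
    (for a parent, the country both of that parent's parents are from).
    """
    if relationship_to_first != "full sibling":
        return None  # only the sibling case from the text is sketched here
    if second_is_parent_of_first:
        # Full siblings share both parents, so the parent's two parents are
        # also two of the third user's grandparents.
        return Inference(second_user_country, 2)
    # Relationship to the second user unknown (e.g. a distant cousin):
    # only the presence of that ancestry can be inferred.
    return Inference(second_user_country, 0)


father_case = infer_for_third_user("DE", "full sibling", second_is_parent_of_first=True)
cousin_case = infer_for_third_user("DE", "full sibling")
```

The two calls correspond to the two outcomes described at 1904: at least two grandparents from Germany when the second user is known to be the father, and German ancestry without a grandparent count otherwise.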
- Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.
Claims (21)
1-20. (canceled)
21. An ancestry finder system comprising one or more processors and one or more memories, the one or more processors being configured to:
(a) determine an ancestral origin of at least one chromosomal segment of a first user to be a country, a geographical region, or an ethnicity associated with a matching user, wherein:
the first user and the matching user share the at least one chromosomal segment,
the at least one chromosomal segment comprises at least one identical-by-descent (IBD) segment between the first user and the matching user, and
the at least one IBD segment has a length meeting a minimum length;
(b) cause to display in a graphical user interface ancestral information pertaining to the ancestral origin of the at least one chromosomal segment of the first user, wherein the ancestral information comprises a graphical representation of the at least one chromosomal segment corresponding to the at least one IBD segment having the length meeting the minimum length, and wherein the ancestral origin is indicated on the graphical representation of the at least one chromosomal segment;
(c) cause to display in the graphical user interface an input element for adjusting the minimum length;
(d) receive via the input element a user input to adjust the minimum length;
(e) repeat (a) using the adjusted minimum length, wherein the at least one IBD segment has a length meeting the adjusted minimum length; and
(f) cause to update in the graphical user interface the graphical representation of the at least one chromosomal segment corresponding to the at least one IBD segment having the length meeting the adjusted minimum length.
22. The system of claim 21, wherein:
the ancestral information comprises a graphical representation of a karyotype of the first user;
the karyotype of the first user comprises one or more chromosome pairs; and
the one or more chromosome pairs comprise the at least one chromosomal segment corresponding to the at least one IBD segment.
23. The system of claim 21, wherein the matching user has at least two grandparents who were born in the country or the geographical region or had an ethnic background in the ethnicity.
24. The system of claim 21, wherein the at least one IBD segment is determined based on genetic markers in the at least one IBD segment.
25. The system of claim 24, wherein the at least one IBD segment is determined based on whether the genetic markers in the at least one IBD segment are opposite homozygous between the first user and the matching user.
26. The system of claim 21, wherein the ancestral origin of the at least one chromosomal segment is indicated by a graphical characteristic.
27. The system of claim 26, wherein the ancestral origin of the at least one chromosomal segment is indicated by a color.
28. The system of claim 21, wherein the ancestral origin of the at least one chromosomal segment of the first user comprises two or more different birth countries, geographical regions, or ethnicities.
29. The system of claim 21, wherein the first user and the matching user are related but previously unknown to be related.
30. The system of claim 21, wherein the ancestral information pertaining to the ancestral origin of the at least one chromosomal segment of the first user comprises a table of countries associated with the first user's ancestral origin.
31. The system of claim 21, wherein the ancestral information pertaining to the ancestral origin of the at least one chromosomal segment of the first user comprises a pie chart of countries associated with the first user's ancestral origin.
32. The system of claim 21, wherein the ancestral information pertaining to the ancestral origin of the at least one chromosomal segment of the first user comprises a map of countries associated with the first user's ancestral origin.
33. The system of claim 21, wherein the one or more processors is further configured to:
receive an indication that a third user has a known degree of relationship to the first user; and
determine, based at least in part on the ancestral origin of the at least one chromosomal segment of the first user, a characteristic of the third user.
34. A method of operating an ancestry finder database with hundreds of thousands of genetic markers to display an ancestry origin of at least one chromosomal segment of a first user, comprising:
(a) determining, using one or more processors of an ancestry finder system, an ancestral origin of at least one chromosomal segment of the first user to be a country, a geographical region, or an ethnicity associated with a matching user, wherein:
the first user and the matching user share the at least one chromosomal segment,
the at least one chromosomal segment comprises at least one identical-by-descent (IBD) segment between the first user and the matching user, and
the at least one IBD segment has a length meeting a minimum length;
(b) causing to display in a graphical user interface ancestral information pertaining to the ancestral origin of the at least one chromosomal segment of the first user, wherein the ancestral information comprises a graphical representation of the at least one chromosomal segment corresponding to the at least one IBD segment having the length meeting the minimum length, and wherein the ancestral origin is indicated on the graphical representation of the at least one chromosomal segment;
(c) causing to display in the graphical user interface an input element for adjusting the minimum length;
(d) receiving via the input element a user input to adjust the minimum length;
(e) repeating (a) using the adjusted minimum length, wherein the at least one IBD segment has a length meeting the adjusted minimum length; and
(f) causing to update in the graphical user interface the graphical representation of the at least one chromosomal segment corresponding to the at least one IBD segment having the length meeting the adjusted minimum length.
35. A computer program product, the computer program product comprising a non-transitory computer readable storage medium having stored thereon computer instructions for estimating and displaying an ancestral origin of at least one chromosomal segment of a first user, the computer instructions comprising:
(a) determining an ancestral origin of at least one chromosomal segment of the first user to be a country, a geographical region, or an ethnicity associated with a matching user, wherein:
the first user and the matching user share the at least one chromosomal segment,
the at least one chromosomal segment comprises at least one identical-by-descent (IBD) segment between the first user and the matching user, and
the at least one IBD segment has a length meeting a minimum length;
(b) causing to display in a graphical user interface ancestral information pertaining to the ancestral origin of the at least one chromosomal segment of the first user, wherein the ancestral information comprises a graphical representation of the at least one chromosomal segment corresponding to the at least one IBD segment having the length meeting the minimum length, and wherein the ancestral origin is indicated on the graphical representation of the at least one chromosomal segment;
(c) causing to display in the graphical user interface an input element for adjusting the minimum length;
(d) receiving via the input element a user input to adjust the minimum length;
(e) repeating (a) using the adjusted minimum length, wherein the at least one IBD segment has a length meeting the adjusted minimum length; and
(f) causing to update in the graphical user interface the graphical representation of the at least one chromosomal segment corresponding to the at least one IBD segment having the length meeting the adjusted minimum length.
36. The computer program product of claim 35, wherein:
the ancestral information comprises a graphical representation of a karyotype of the first user;
the karyotype of the first user comprises one or more chromosome pairs; and
the one or more chromosome pairs comprise the at least one chromosomal segment corresponding to the at least one IBD segment.
37. The computer program product of claim 35, wherein the matching user has at least two grandparents who were born in the country or the geographical region or had an ethnic background in the ethnicity.
38. The computer program product of claim 35, wherein the at least one IBD segment is determined based on genetic markers in the at least one IBD segment.
39. The computer program product of claim 35, wherein the ancestral origin of the at least one chromosomal segment is indicated by a graphical characteristic.
40. The computer program product of claim 39, wherein the ancestral origin of the at least one chromosomal segment is indicated by a color.
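The adjustable minimum-length steps recited in claims 34 and 35 can be illustrated with a minimal sketch. It is not the claimed implementation: the `IBDSegment` record, the function names, the use of centimorgans as the length unit, and the sample data are all assumptions made for illustration, and a list of text labels stands in for the graphical representation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IBDSegment:
    """Hypothetical record of one IBD segment shared with a matching user."""
    chromosome: str
    start_cm: float      # assumed unit: centimorgans
    end_cm: float
    matching_user: str
    origin: str          # country, region, or ethnicity tied to the matching user

    @property
    def length_cm(self) -> float:
        return self.end_cm - self.start_cm


def segments_meeting_minimum(segments: List[IBDSegment], min_length_cm: float) -> List[IBDSegment]:
    """Steps (a) and (e): keep only IBD segments whose length meets the minimum."""
    return [s for s in segments if s.length_cm >= min_length_cm]


def render(segments: List[IBDSegment]) -> List[str]:
    """Steps (b) and (f): text stand-in for the display, one label per segment."""
    return [f"chr{s.chromosome}: {s.start_cm:.1f}-{s.end_cm:.1f} cM -> {s.origin}" for s in segments]


# Illustrative data only; not taken from the patent.
shared = [
    IBDSegment("1", 10.0, 22.0, "user_b", "Ireland"),
    IBDSegment("7", 40.0, 46.0, "user_c", "Italy"),
]

initial_view = render(segments_meeting_minimum(shared, 5.0))   # initial minimum length
updated_view = render(segments_meeting_minimum(shared, 10.0))  # after the user raises the minimum
```

In this sketch, raising the minimum from 5 to 10 drops the shorter chromosome 7 segment from the view, which mirrors the update of the displayed segments in step (f).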
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/880,566 US20220375547A1 (en) | 2008-12-31 | 2022-08-03 | Ancestry finder |
US18/198,558 US20240242783A1 (en) | 2008-12-31 | 2023-08-07 | Ancestry Finder |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US20419508P | 2008-12-31 | 2008-12-31 | |
US12/644,791 US8463554B2 (en) | 2008-12-31 | 2009-12-22 | Finding relatives in a database |
US77454610A | 2010-05-05 | 2010-05-05 | |
US15/664,619 US10854318B2 (en) | 2008-12-31 | 2017-07-31 | Ancestry finder |
US17/077,930 US11468971B2 (en) | 2008-12-31 | 2020-10-22 | Ancestry finder |
US17/880,566 US20220375547A1 (en) | 2008-12-31 | 2022-08-03 | Ancestry finder |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/077,930 Continuation US11468971B2 (en) | 2008-12-31 | 2020-10-22 | Ancestry finder |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/198,558 Continuation US20240242783A1 (en) | 2008-12-31 | 2023-08-07 | Ancestry Finder |
Publications (1)
Publication Number | Publication Date |
---|---|
US20220375547A1 (en) | 2022-11-24 |
Family
ID=42310062
Family Applications (20)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/644,791 Active 2031-05-12 US8463554B2 (en) | 2008-12-31 | 2009-12-22 | Finding relatives in a database |
US13/871,744 Abandoned US20140006433A1 (en) | 2008-12-31 | 2013-04-26 | Finding relatives in a database |
US15/264,493 Abandoned US20170228498A1 (en) | 2008-12-31 | 2016-09-13 | Finding relatives in a database |
US15/664,619 Active 2031-07-07 US10854318B2 (en) | 2008-12-31 | 2017-07-31 | Ancestry finder |
US16/129,645 Abandoned US20190012431A1 (en) | 2008-12-31 | 2018-09-12 | Finding relatives in a database |
US17/073,095 Active US11031101B2 (en) | 2008-12-31 | 2020-10-16 | Finding relatives in a database |
US17/073,110 Active US11049589B2 (en) | 2008-12-31 | 2020-10-16 | Finding relatives in a database |
US17/073,128 Abandoned US20210043280A1 (en) | 2008-12-31 | 2020-10-16 | Finding relatives in a database |
US17/073,122 Abandoned US20210043279A1 (en) | 2008-12-31 | 2020-10-16 | Finding relatives in a database |
US17/077,930 Active US11468971B2 (en) | 2008-12-31 | 2020-10-22 | Ancestry finder |
US17/301,129 Abandoned US20210225458A1 (en) | 2008-12-31 | 2021-03-25 | Finding relatives in a database |
US17/351,052 Active US11322227B2 (en) | 2008-12-31 | 2021-06-17 | Finding relatives in a database |
US17/576,738 Active US11508461B2 (en) | 2008-12-31 | 2022-01-14 | Finding relatives in a database |
US17/880,566 Abandoned US20220375547A1 (en) | 2008-12-31 | 2022-08-03 | Ancestry finder |
US17/975,949 Active US11657902B2 (en) | 2008-12-31 | 2022-10-28 | Finding relatives in a database |
US17/979,412 Active US11935628B2 (en) | 2008-12-31 | 2022-11-02 | Finding relatives in a database |
US18/191,525 Active US11776662B2 (en) | 2008-12-31 | 2023-03-28 | Finding relatives in a database |
US18/198,558 Pending US20240242783A1 (en) | 2008-12-31 | 2023-08-07 | Ancestry Finder |
US18/434,362 Active US12100487B2 (en) | 2008-12-31 | 2024-02-06 | Finding relatives in a database |
US18/762,304 Pending US20240355428A1 (en) | 2008-12-31 | 2024-07-02 | Finding Relatives in a Database |
Family Applications Before (13)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/644,791 Active 2031-05-12 US8463554B2 (en) | 2008-12-31 | 2009-12-22 | Finding relatives in a database |
US13/871,744 Abandoned US20140006433A1 (en) | 2008-12-31 | 2013-04-26 | Finding relatives in a database |
US15/264,493 Abandoned US20170228498A1 (en) | 2008-12-31 | 2016-09-13 | Finding relatives in a database |
US15/664,619 Active 2031-07-07 US10854318B2 (en) | 2008-12-31 | 2017-07-31 | Ancestry finder |
US16/129,645 Abandoned US20190012431A1 (en) | 2008-12-31 | 2018-09-12 | Finding relatives in a database |
US17/073,095 Active US11031101B2 (en) | 2008-12-31 | 2020-10-16 | Finding relatives in a database |
US17/073,110 Active US11049589B2 (en) | 2008-12-31 | 2020-10-16 | Finding relatives in a database |
US17/073,128 Abandoned US20210043280A1 (en) | 2008-12-31 | 2020-10-16 | Finding relatives in a database |
US17/073,122 Abandoned US20210043279A1 (en) | 2008-12-31 | 2020-10-16 | Finding relatives in a database |
US17/077,930 Active US11468971B2 (en) | 2008-12-31 | 2020-10-22 | Ancestry finder |
US17/301,129 Abandoned US20210225458A1 (en) | 2008-12-31 | 2021-03-25 | Finding relatives in a database |
US17/351,052 Active US11322227B2 (en) | 2008-12-31 | 2021-06-17 | Finding relatives in a database |
US17/576,738 Active US11508461B2 (en) | 2008-12-31 | 2022-01-14 | Finding relatives in a database |
Family Applications After (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/975,949 Active US11657902B2 (en) | 2008-12-31 | 2022-10-28 | Finding relatives in a database |
US17/979,412 Active US11935628B2 (en) | 2008-12-31 | 2022-11-02 | Finding relatives in a database |
US18/191,525 Active US11776662B2 (en) | 2008-12-31 | 2023-03-28 | Finding relatives in a database |
US18/198,558 Pending US20240242783A1 (en) | 2008-12-31 | 2023-08-07 | Ancestry Finder |
US18/434,362 Active US12100487B2 (en) | 2008-12-31 | 2024-02-06 | Finding relatives in a database |
US18/762,304 Pending US20240355428A1 (en) | 2008-12-31 | 2024-07-02 | Finding Relatives in a Database |
Country Status (3)
Country | Link |
---|---|
US (20) | US8463554B2 (en) |
EP (2) | EP2370929A4 (en) |
WO (1) | WO2010077336A1 (en) |
Families Citing this family (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080228699A1 (en) | 2007-03-16 | 2008-09-18 | Expanse Networks, Inc. | Creation of Attribute Combination Databases |
US9336177B2 (en) * | 2007-10-15 | 2016-05-10 | 23Andme, Inc. | Genome sharing |
US10275569B2 (en) | 2007-10-15 | 2019-04-30 | 23andMe, Inc. | Family inheritance |
EP2370929A4 (en) | 2008-12-31 | 2016-11-23 | 23Andme Inc | Finding relatives in a database |
WO2012099890A1 (en) * | 2011-01-18 | 2012-07-26 | University Of Utah Research Foundation | Estimation of recent shared ancestry |
US8990250B1 (en) | 2011-10-11 | 2015-03-24 | 23Andme, Inc. | Cohort selection with privacy protection |
EP2769322A4 (en) | 2011-10-17 | 2015-03-04 | Intertrust Tech Corp | Systems and methods for protecting and governing genomic and other information |
US10437858B2 (en) | 2011-11-23 | 2019-10-08 | 23Andme, Inc. | Database and data processing system for use with a network-based personal genetics services platform |
US10025877B2 (en) | 2012-06-06 | 2018-07-17 | 23Andme, Inc. | Determining family connections of individuals in a database |
US9836576B1 (en) | 2012-11-08 | 2017-12-05 | 23Andme, Inc. | Phasing of unphased genotype data |
US9213947B1 (en) | 2012-11-08 | 2015-12-15 | 23Andme, Inc. | Scalable pipeline for local ancestry inference |
WO2014145280A1 (en) * | 2013-03-15 | 2014-09-18 | Ancestry.Com Dna, Llc | Family networks |
US9747345B2 (en) * | 2014-08-12 | 2017-08-29 | Ancestry.Com Operations Inc. | System and method for identifying relationships in a data graph |
US10720229B2 (en) | 2014-10-14 | 2020-07-21 | Ancestry.Com Dna, Llc | Reducing error in predicted genetic relationships |
US10395759B2 (en) | 2015-05-18 | 2019-08-27 | Regeneron Pharmaceuticals, Inc. | Methods and systems for copy number variant detection |
JP6327234B2 (en) * | 2015-11-06 | 2018-05-23 | 横河電機株式会社 | Event analysis device, event analysis system, event analysis method, and event analysis program |
AU2017218149B2 (en) | 2016-02-12 | 2020-09-03 | Regeneron Pharmaceuticals, Inc. | Methods and systems for detection of abnormal karyotypes |
US11341378B2 (en) * | 2016-02-26 | 2022-05-24 | Nec Corporation | Information processing apparatus, suspect information generation method and program |
CA3066227A1 (en) | 2017-06-05 | 2018-12-13 | Peng Jiang | Customized coordinate ascent for ranking data records |
US11107556B2 (en) * | 2017-08-29 | 2021-08-31 | Helix OpCo, LLC | Authorization system that permits granular identification of, access to, and recruitment of individualized genomic data |
NZ769586A (en) * | 2018-04-05 | 2020-11-27 | Ancestry Com Dna Llc | Community assignments in identity by descent networks and genetic variant origination |
US11515001B2 (en) * | 2018-05-28 | 2022-11-29 | Eve's Kids Inc. | Systems and methods for genealogical graphing |
US20200104463A1 (en) | 2018-09-28 | 2020-04-02 | Chris Glode | Genomic network service user interface |
US10861587B2 (en) * | 2018-10-24 | 2020-12-08 | Helix OpCo, LLC | Cross-network genomic data user interface |
CA3128459A1 (en) | 2019-02-01 | 2020-08-06 | Ancestry.Com Operations Inc. | Search and ranking of records across different databases |
WO2021016114A1 (en) | 2019-07-19 | 2021-01-28 | 23Andme, Inc. | Phase-aware determination of identity-by-descent dna segments |
CA3154157A1 (en) | 2019-09-13 | 2021-03-18 | 23Andme, Inc. | Methods and systems for determining and displaying pedigrees |
AU2020388555A1 (en) * | 2019-11-18 | 2022-06-02 | Embark Veterinary, Inc. | Methods and systems for determining ancestral relatedness |
CN110909259A (en) * | 2019-11-27 | 2020-03-24 | 腾讯科技(深圳)有限公司 | Block chain-based user recommendation method, device, equipment and storage medium |
CA3165254A1 (en) * | 2019-12-20 | 2021-06-24 | Ancestry.Com Dna, Llc | Linking individual datasets to a database |
US11817176B2 (en) | 2020-08-13 | 2023-11-14 | 23Andme, Inc. | Ancestry composition determination |
US11461193B2 (en) * | 2020-09-24 | 2022-10-04 | International Business Machines Corporation | Data storage volume recovery management |
EP4200858A4 (en) | 2020-10-09 | 2024-08-28 | 23Andme Inc | Formatting and storage of genetic markers |
US12079238B2 (en) * | 2021-07-22 | 2024-09-03 | Ancestry.Com Dna, Llc | Storytelling visualization of genealogy data in a large-scale database |
US12086914B2 (en) * | 2021-11-24 | 2024-09-10 | Ancestry.Com Dna, Llc | Graphical user interface for presenting geographic boundary estimation |
US20240018581A1 (en) * | 2022-07-15 | 2024-01-18 | Massachusetts Institute Of Technology | Mixture deconvolution method for identifying dna profiles |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8855935B2 (en) * | 2006-10-02 | 2014-10-07 | Ancestry.Com Dna, Llc | Method and system for displaying genetic and genealogical data |
US10854318B2 (en) * | 2008-12-31 | 2020-12-01 | 23Andme, Inc. | Ancestry finder |
Family Cites Families (402)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5424186A (en) | 1989-06-07 | 1995-06-13 | Affymax Technologies N.V. | Very large scale immobilized polymer synthesis |
US5143854A (en) | 1989-06-07 | 1992-09-01 | Affymax Technologies N.V. | Large scale photolithographic solid phase synthesis of polypeptides and receptor binding screening thereof |
US5288644A (en) | 1990-04-04 | 1994-02-22 | The Rockefeller University | Instrument and method for the sequencing of genome |
ATE199054T1 (en) | 1990-12-06 | 2001-02-15 | Affymetrix Inc A Delaware Corp | COMPOUNDS AND THEIR USE IN A BINARY SYNTHESIS STRATEGY |
US5301105A (en) | 1991-04-08 | 1994-04-05 | Desmond D. Cummings | All care health management system |
US5384261A (en) | 1991-11-22 | 1995-01-24 | Affymax Technologies N.V. | Very large scale immobilized polymer synthesis using mechanically directed flow paths |
JP3526585B2 (en) | 1992-03-12 | 2004-05-17 | 株式会社リコー | Query Processing Optimization Method for Distributed Database |
US5376526A (en) | 1992-05-06 | 1994-12-27 | The Board Of Trustees Of The Leland Stanford Junior University | Genomic mismatch scanning |
US6131092A (en) | 1992-08-07 | 2000-10-10 | Masand; Brij | System and method for identifying matches of query patterns to document text in a document textbase |
US20030212579A1 (en) | 2002-05-08 | 2003-11-13 | Brown Stephen J. | Remote health management system |
US8078407B1 (en) | 1997-03-28 | 2011-12-13 | Health Hero Network, Inc. | System and method for identifying disease-influencing genes |
US5985559A (en) | 1997-04-30 | 1999-11-16 | Health Hero Network | System and method for preventing, diagnosing, and treating genetic and pathogen-caused disease |
US5551880A (en) | 1993-01-22 | 1996-09-03 | Bonnstetter; Bill J. | Employee success prediction system |
US5649181A (en) | 1993-04-16 | 1997-07-15 | Sybase, Inc. | Method and apparatus for indexing database columns with bit vectors |
US5692501A (en) | 1993-09-20 | 1997-12-02 | Minturn; Paul | Scientific wellness personal/clinical/laboratory assessments, profile and health risk managment system with insurability rankings on cross-correlated 10-point optical health/fitness/wellness scales |
US5839120A (en) | 1993-11-30 | 1998-11-17 | Thearling; Kurt | Genetic algorithm control arrangement for massively parallel computer |
US5660176A (en) | 1993-12-29 | 1997-08-26 | First Opinion Corporation | Computerized medical diagnostic and treatment advice system |
US6750011B1 (en) | 1994-06-17 | 2004-06-15 | Mark W. Perlin | Method and system for genotyping |
AU1837495A (en) | 1994-10-13 | 1996-05-06 | Horus Therapeutics, Inc. | Computer assisted methods for diagnosing diseases |
US5941947A (en) | 1995-08-18 | 1999-08-24 | Microsoft Corporation | System and method for controlling access to data entities in a computer network |
US6897022B2 (en) | 1996-03-29 | 2005-05-24 | University Of Miami | Susceptability and resistance genes for bipolar affective disorder |
US5752242A (en) | 1996-04-18 | 1998-05-12 | Electronic Data Systems Corporation | System and method for automated retrieval of information |
US6203993B1 (en) | 1996-08-14 | 2001-03-20 | Exact Science Corp. | Methods for the detection of nucleic acids |
US5940802A (en) | 1997-03-17 | 1999-08-17 | The Board Of Regents Of The University Of Oklahoma | Digital disease management system |
US6063028A (en) | 1997-03-20 | 2000-05-16 | Luciano; Joanne Sylvia | Automated treatment selection method |
JP2001519070A (en) | 1997-03-24 | 2001-10-16 | クイーンズ ユニバーシティー アット キングストン | Method, product and device for match detection |
US20060020614A1 (en) | 1997-08-08 | 2006-01-26 | Kolawa Adam K | Method and apparatus for automated selection, organization, and recommendation of items based on user preference topography |
US6014631A (en) | 1998-04-02 | 2000-01-11 | Merck-Medco Managed Care, Llc | Computer implemented patient medication review system and process for the managed care, health care and/or pharmacy industry |
US7921068B2 (en) | 1998-05-01 | 2011-04-05 | Health Discovery Corporation | Data mining platform for knowledge discovery from heterogeneous data types and/or heterogeneous data sources |
US7444308B2 (en) | 2001-06-15 | 2008-10-28 | Health Discovery Corporation | Data mining platform for bioinformatics and other knowledge discovery |
US6108647A (en) | 1998-05-21 | 2000-08-22 | Lucent Technologies, Inc. | Method, apparatus and programmed medium for approximating the data cube and obtaining approximate answers to queries in relational databases |
EP1084273A1 (en) | 1998-06-06 | 2001-03-21 | Genostic Pharma Limited | Probes used for genetic profiling |
CA2273616A1 (en) | 1998-06-08 | 1999-12-08 | The Board Of Trustees Of The Leland Stanford Junior University | Method for parallel screening of allelic variation |
AUPP398898A0 (en) | 1998-06-09 | 1998-07-02 | University Of Queensland, The | Diagnostic method and apparatus |
US6216134B1 (en) | 1998-06-25 | 2001-04-10 | Microsoft Corporation | Method and system for visualization of clusters and classifications |
US6703228B1 (en) | 1998-09-25 | 2004-03-09 | Massachusetts Institute Of Technology | Methods and products related to genotyping and DNA analysis |
US6269364B1 (en) | 1998-09-25 | 2001-07-31 | Intel Corporation | Method and apparatus to automatically test and modify a searchable knowledge base |
US6393399B1 (en) | 1998-09-30 | 2002-05-21 | Scansoft, Inc. | Compound word recognition |
US6253203B1 (en) | 1998-10-02 | 2001-06-26 | Ncr Corporation | Privacy-enhanced database |
US6506562B1 (en) | 1998-10-26 | 2003-01-14 | Yale University | Allele frequency differences method for phenotype cloning |
ES2229781T3 (en) | 1998-11-10 | 2005-04-16 | Genset | METHODS, PROGRAMS AND APPARATUS TO IDENTIFY GENOMIC REGIONS THAT HOST AN ASSOCIATED GENE WITH A DETECTABLE TRAIT. |
US7076504B1 (en) | 1998-11-19 | 2006-07-11 | Accenture Llp | Sharing a centralized profile |
US6994962B1 (en) | 1998-12-09 | 2006-02-07 | Massachusetts Institute Of Technology | Methods of identifying point mutations in a genome |
US20010000810A1 (en) | 1998-12-14 | 2001-05-03 | Oliver Alabaster | Computerized visual behavior analysis and training method |
US6601059B1 (en) | 1998-12-23 | 2003-07-29 | Microsoft Corporation | Computerized searching tool with spell checking |
US6487541B1 (en) | 1999-01-22 | 2002-11-26 | International Business Machines Corporation | System and method for collaborative filtering with applications to e-commerce |
US6694311B1 (en) | 1999-01-25 | 2004-02-17 | International Business Machines Corporation | Method and apparatus for fast query approximation using adaptive query vector projection |
GB9904585D0 (en) | 1999-02-26 | 1999-04-21 | Gemini Research Limited | Clinical and diagnostic database |
DE19911130A1 (en) | 1999-03-12 | 2000-09-21 | Hager Joerg | Methods for identifying chromosomal regions and genes |
US6629097B1 (en) | 1999-04-28 | 2003-09-30 | Douglas K. Keith | Displaying implicit associations among items in loosely-structured data sets |
US7159011B1 (en) | 1999-05-11 | 2007-01-02 | Maquis Techtrix, Llc | System and method for managing an online message board |
US6912492B1 (en) | 1999-05-25 | 2005-06-28 | University Of Medicine & Dentistry Of New Jersey | Methods for diagnosing, preventing, and treating developmental disorders due to a combination of genetic and environmental factors |
US9486429B2 (en) | 1999-06-01 | 2016-11-08 | Vanderbilt University | Therapeutic methods employing nitric oxide precursors |
EP1208421A4 (en) | 1999-06-25 | 2004-10-20 | Genaissance Pharmaceuticals | Methods for obtaining and using haplotype data |
US6321163B1 (en) | 1999-09-02 | 2001-11-20 | Genetics Institute, Inc. | Method and apparatus for analyzing nucleic acid sequences |
EP1261932B1 (en) | 1999-10-13 | 2009-09-30 | Sequenom, Inc. | Methods for identifying polymorphic genetic markers |
US6730023B1 (en) | 1999-10-15 | 2004-05-04 | Hemopet | Animal genetic and health profile database management |
US6640211B1 (en) | 1999-10-22 | 2003-10-28 | First Genetic Trust Inc. | Genetic profiling and banking system and method |
US7630986B1 (en) | 1999-10-27 | 2009-12-08 | Pinpoint, Incorporated | Secure data interchange |
US20050090718A1 (en) | 1999-11-02 | 2005-04-28 | Dodds W J. | Animal healthcare well-being and nutrition |
GB2363874B (en) | 1999-11-06 | 2004-08-04 | Dennis Sunga Fernandez | Bioinformatic transaction scheme |
WO2001037878A2 (en) | 1999-11-29 | 2001-05-31 | Orchid Biosciences, Inc. | Methods of identifying optimal drug combinations and compositions thereof |
CA2382165A1 (en) | 1999-12-08 | 2001-06-14 | Genset S.A. | Full-length human cdnas encoding potentially secreted proteins |
US6507840B1 (en) | 1999-12-21 | 2003-01-14 | Lucent Technologies Inc. | Histogram-based approximation of set-valued query-answers |
AU2370901A (en) | 1999-12-30 | 2001-07-16 | Starlab Nv/Sa | Methods for collecting genetic material |
US6980958B1 (en) | 2000-01-11 | 2005-12-27 | Zycare, Inc. | Apparatus and methods for monitoring and modifying anticoagulation therapy of remotely located patients |
US7366719B2 (en) | 2000-01-21 | 2008-04-29 | Health Discovery Corporation | Method for the manipulation, storage, modeling, visualization and quantification of datasets |
US20020048763A1 (en) | 2000-02-04 | 2002-04-25 | Penn Sharron Gaynor | Human genome-derived single exon nucleic acid probes useful for gene expression analysis |
US20020019746A1 (en) | 2000-03-16 | 2002-02-14 | Rienhoff Hugh Y. | Aggregating persons with a select profile for further medical characterization |
DE10017675A1 (en) | 2000-04-08 | 2001-12-06 | Qtl Ag Ges Zur Erforschung Kom | Procedure for the identification and isolation of genome fragments with coupling imbalance |
WO2001079561A2 (en) | 2000-04-17 | 2001-10-25 | Liggett Stephen B | Alpha-2 adrenergic receptor polymorphisms |
US20020052761A1 (en) | 2000-05-11 | 2002-05-02 | Fey Christopher T. | Method and system for genetic screening data collection, analysis, report generation and access |
US20030171876A1 (en) | 2002-03-05 | 2003-09-11 | Victor Markowitz | System and method for managing gene expression data |
US20020077775A1 (en) | 2000-05-25 | 2002-06-20 | Schork Nicholas J. | Methods of DNA marker-based genetic analysis using estimated haplotype frequencies and uses thereof |
US20020049772A1 (en) | 2000-05-26 | 2002-04-25 | Hugh Rienhoff | Computer program product for genetically characterizing an individual for evaluation using genetic and phenotypic variation over a wide area network |
US20020010552A1 (en) | 2000-05-26 | 2002-01-24 | Hugh Rienhoff | System for genetically characterizing an individual for evaluation using genetic and phenotypic variation over a wide area network |
US6931326B1 (en) | 2000-06-26 | 2005-08-16 | Genaissance Pharmaceuticals, Inc. | Methods for obtaining and using haplotype data |
AU2001271670A1 (en) | 2000-06-29 | 2002-01-14 | Alpha Blox Corporation | Caching scheme for multi-dimensional data |
US6519604B1 (en) | 2000-07-19 | 2003-02-11 | Lucent Technologies Inc. | Approximate querying method for databases with multiple grouping attributes |
US6687696B2 (en) | 2000-07-26 | 2004-02-03 | Recommind Inc. | System and method for personalized search, information filtering, and for generating recommendations utilizing statistical latent class models |
EP1346063A2 (en) | 2000-07-31 | 2003-09-24 | The Institute for Systems Biology | Multiparameter analysis for predictive medicine |
US7567870B1 (en) | 2000-07-31 | 2009-07-28 | Institute For Systems Biology | Multiparameter analysis for predictive medicine |
WO2002017207A2 (en) | 2000-08-23 | 2002-02-28 | Arexis Ab | System and method of storing genetic information |
US6812339B1 (en) | 2000-09-08 | 2004-11-02 | Applera Corporation | Polymorphisms in known genes associated with human disease, methods of detection and uses thereof |
US20020072492A1 (en) | 2000-09-12 | 2002-06-13 | Myers Timothy G. | Non-genetic based protein disease markers |
AU2001293297A1 (en) | 2000-09-20 | 2002-04-02 | Case Western Reserve University | Phyisological profiling |
US6740038B2 (en) | 2000-09-29 | 2004-05-25 | New Health Sciences, Inc. | Systems and methods for assessing vascular effects of a treatment |
US20020094532A1 (en) | 2000-10-06 | 2002-07-18 | Bader Joel S. | Efficient tests of association for quantitative traits and affected-unaffected studies using pooled DNA |
JP2004522216A (en) | 2000-10-12 | 2004-07-22 | アイコニックス ファーマシューティカルズ インコーポレイテッド | Cross-correlation between compound information and genome information |
WO2002033520A2 (en) | 2000-10-18 | 2002-04-25 | Genomic Health, Inc. | Genomic profile information systems and methods |
SG135048A1 (en) | 2000-10-18 | 2007-09-28 | Johnson & Johnson Consumer | Intelligent performance-based product recommendation system |
US6898595B2 (en) | 2000-10-19 | 2005-05-24 | General Electric Company | Searching and matching a set of query strings used for accessing information in a database directory |
US6904408B1 (en) | 2000-10-19 | 2005-06-07 | Mccarthy John | Bionet method, system and personalized web content manager responsive to browser viewers' psychological preferences, behavioral responses and physiological stress indicators |
WO2002037102A2 (en) | 2000-10-20 | 2002-05-10 | Children's Medical Center Corporation | Methods for analyzing dynamic changes in cellular informatics |
EP1340075B1 (en) | 2000-10-20 | 2009-01-28 | Virco Bvba | Establishment of biological cut-off values for predicting resistance to therapy |
EP1203562A3 (en) | 2000-10-27 | 2002-07-31 | Tanita Corporation | Method and apparatus for deriving body fat area |
US20050021240A1 (en) | 2000-11-02 | 2005-01-27 | Epigenomics Ag | Systems, methods and computer program products for guiding selection of a therapeutic treatment regimen based on the methylation status of the DNA |
US6450956B1 (en) | 2000-11-06 | 2002-09-17 | Siemens Corporate Research, Inc. | System and method for treatment and outcome measurement analysis |
US20030130798A1 (en) | 2000-11-14 | 2003-07-10 | The Institute For Systems Biology | Multiparameter integration methods for the analysis of biological networks |
US20030195706A1 (en) | 2000-11-20 | 2003-10-16 | Michael Korenberg | Method for classifying genetic data |
JP2004520820A (en) | 2000-12-01 | 2004-07-15 | ユニバーシティ オブ ノース カロライナ アット チャペル ヒル | A method for ultra-high resolution gene mapping and identification of genetic networks between genes under phenotypic traits |
AU2001297684A1 (en) | 2000-12-04 | 2002-08-19 | Genaissance Pharmaceuticals, Inc. | System and method for the management of genomic data |
US20030113727A1 (en) | 2000-12-06 | 2003-06-19 | Girn Kanwaljit Singh | Family history based genetic screening method and apparatus |
US7447754B2 (en) | 2000-12-06 | 2008-11-04 | Microsoft Corporation | Methods and systems for processing multi-media editing projects |
EP1342201A2 (en) | 2000-12-07 | 2003-09-10 | phase IT Intelligent Solutions AG | Expert system for classification and prediction of genetic diseases |
US7085834B2 (en) | 2000-12-22 | 2006-08-01 | Oracle International Corporation | Determining a user's groups |
US20020082868A1 (en) | 2000-12-27 | 2002-06-27 | Pories Walter J. | Systems, methods and computer program products for creating and maintaining electronic medical records |
US20020128860A1 (en) | 2001-01-04 | 2002-09-12 | Leveque Joseph A. | Collecting and managing clinical information |
US7054758B2 (en) | 2001-01-30 | 2006-05-30 | Sciona Limited | Computer-assisted means for assessing lifestyle risk factors |
US8898021B2 (en) | 2001-02-02 | 2014-11-25 | Mark W. Perlin | Method and system for DNA mixture analysis |
EP1364069B1 (en) | 2001-03-01 | 2009-04-22 | Epigenomics AG | Method for the development of gene panels for diagnostic and therapeutic purposes based on the expression and methylatoin status of the genes |
WO2002073504A1 (en) | 2001-03-14 | 2002-09-19 | Gene Logic, Inc. | A system and method for retrieving and using gene expression data from multiple sources |
CN1496412B (en) | 2001-03-14 | 2012-08-08 | 香港中文大学 | Method for estimating danger of diabetes typ B developed in the human species of Chinese bloodline and composition |
CA2377213A1 (en) | 2001-03-20 | 2002-09-20 | Ortho-Clinical Diagnostics, Inc. | Method for providing clinical diagnostic services |
US20030130991A1 (en) | 2001-03-28 | 2003-07-10 | Fidel Reijerse | Knowledge discovery from data sets |
US7957907B2 (en) | 2001-03-30 | 2011-06-07 | Sorenson Molecular Genealogy Foundation | Method for molecular genealogical research |
US20030143554A1 (en) * | 2001-03-31 | 2003-07-31 | Berres Mark E. | Method of genotyping by determination of allele copy number |
AU2002254564A1 (en) | 2001-04-10 | 2002-10-28 | Latanya Sweeney | Systems and methods for deidentifying entries in a data source |
AUPR454001A0 (en) | 2001-04-20 | 2001-05-24 | Careers Fast Track Pty Ltd | Interactive learning and career management system |
US20030014420A1 (en) | 2001-04-20 | 2003-01-16 | Jessee Charles B. | Method and system for data analysis |
WO2002091234A1 (en) | 2001-04-24 | 2002-11-14 | Takahiro Nakamura | Retrieval device for database of secondary information-attached text |
WO2002087431A1 (en) | 2001-05-01 | 2002-11-07 | Structural Bioinformatics, Inc. | Diagnosing inapparent diseases from common clinical tests using bayesian analysis |
US20020183965A1 (en) | 2001-05-02 | 2002-12-05 | Gogolak Victor V. | Method for analyzing drug adverse effects employing multivariate statistical analysis |
WO2002090541A1 (en) | 2001-05-03 | 2002-11-14 | Murdoch Childrens Research Institute | Determination of a genetic predisposition for behavioural disorders |
WO2002095650A2 (en) | 2001-05-21 | 2002-11-28 | Molecular Mining Corporation | Method for determination of co-occurences of attributes |
US20050228595A1 (en) | 2001-05-25 | 2005-10-13 | Cooke Laurence H | Processors for multi-dimensional sequence comparisons |
US20030101000A1 (en) | 2001-07-24 | 2003-05-29 | Bader Joel S. | Family based tests of association using pooled DNA and SNP markers |
US7461077B1 (en) | 2001-07-31 | 2008-12-02 | Nicholas Greenwood | Representation of data records |
US8438042B2 (en) | 2002-04-25 | 2013-05-07 | National Biomedical Research Foundation | Instruments and methods for obtaining informed consent to genetic tests |
US7062752B2 (en) | 2001-08-08 | 2006-06-13 | Hewlett-Packard Development Company, L.P. | Method, system and program product for multi-profile operations and expansive profile operation |
US20030040002A1 (en) | 2001-08-08 | 2003-02-27 | Ledley Fred David | Method for providing current assessments of genetic risk |
US7072794B2 (en) | 2001-08-28 | 2006-07-04 | Rockefeller University | Statistical methods for multivariate ordinal data which are used for data base driven decision support |
US7529685B2 (en) | 2001-08-28 | 2009-05-05 | Md Datacor, Inc. | System, method, and apparatus for storing, retrieving, and integrating clinical, diagnostic, genomic, and therapeutic data |
US7461006B2 (en) | 2001-08-29 | 2008-12-02 | Victor Gogolak | Method and system for the analysis and association of patient-specific and population-based genomic data with drug safety adverse event data |
EP1442411A4 (en) | 2001-09-30 | 2006-02-01 | Realcontacts Ltd | Connection service |
US20030129630A1 (en) | 2001-10-17 | 2003-07-10 | Equigene Research Inc. | Genetic markers associated with desirable and undesirable traits in horses, methods of identifying and using such markers |
US20030130873A1 (en) | 2001-11-19 | 2003-07-10 | Nevin William S. | Health care provider information system |
US6873914B2 (en) | 2001-11-21 | 2005-03-29 | Icoria, Inc. | Methods and systems for analyzing complex biological systems |
US6738762B1 (en) | 2001-11-26 | 2004-05-18 | At&T Corp. | Multidimensional substring selectivity estimation using set hashing of cross-counts |
US7107155B2 (en) | 2001-12-03 | 2006-09-12 | Dnaprint Genomics, Inc. | Methods for the identification of genetic features for complex genetics classifiers |
US20040009495A1 (en) | 2001-12-07 | 2004-01-15 | Whitehead Institute For Biomedical Research | Methods and products related to drug screening using gene expression patterns |
US20050256649A1 (en) | 2001-12-21 | 2005-11-17 | Roses Allen D | High throughput correlation of polymorphic forms with multiple phenotypes within clinical populations |
US20040015337A1 (en) | 2002-01-04 | 2004-01-22 | Thomas Austin W. | Systems and methods for predicting disease behavior |
US7117200B2 (en) | 2002-01-11 | 2006-10-03 | International Business Machines Corporation | Synthesizing information-bearing content from multiple channels |
AU2003212806A1 (en) | 2002-01-15 | 2003-07-30 | Vanderbilt University | Method and apparatus for multifactor dimensionality reduction |
JP2005516310A (en) | 2002-02-01 | 2005-06-02 | Rosetta Inpharmatics LLC | Computer system and method for identifying genes and revealing pathways associated with traits |
US7809510B2 (en) | 2002-02-27 | 2010-10-05 | Ip Genesis, Inc. | Positional hashing method for performing DNA sequence similarity search |
JP2005520503A (en) | 2002-03-05 | 2005-07-14 | MCW Research Foundation, Inc. | Methods and compositions for pharmacological and toxicological evaluation of test agents |
US7324928B2 (en) | 2002-03-06 | 2008-01-29 | Kitchen Scott G | Method and system for determining phenotype from genotype |
US7783665B1 (en) | 2002-03-27 | 2010-08-24 | Parallels Holdings, Ltd. | Effective file-sharing among virtual environments |
US20080154566A1 (en) | 2006-10-02 | 2008-06-26 | Sorenson Molecular Genealogy Foundation | Method and system for displaying genetic and genealogical data |
US20040126840A1 (en) | 2002-12-23 | 2004-07-01 | Affymetrix, Inc. | Method, system and computer software for providing genomic ontological data |
US20030203370A1 (en) * | 2002-04-30 | 2003-10-30 | Zohar Yakhini | Method and system for partitioning sets of sequence groups with respect to a set of subsequence groups, useful for designing polymorphism-based typing assays |
WO2003093503A2 (en) | 2002-05-02 | 2003-11-13 | Novartis Ag | Method for bioequivalence determination using expression profiling |
US20040014097A1 (en) | 2002-05-06 | 2004-01-22 | Mcglennen Ronald C. | Genetic test apparatus and method |
US20040175700A1 (en) | 2002-05-15 | 2004-09-09 | Elixir Pharmaceuticals, Inc. | Method for cohort selection |
US7133856B2 (en) | 2002-05-17 | 2006-11-07 | The Board Of Trustees Of The Leland Stanford Junior University | Binary tree for complex supervised learning |
US20040229231A1 (en) | 2002-05-28 | 2004-11-18 | Frudakis Tony N. | Compositions and methods for inferring ancestry |
US20070037182A1 (en) * | 2002-05-28 | 2007-02-15 | Gaskin James Z | Multiplex assays for inferring ancestry |
US20030233377A1 (en) | 2002-06-18 | 2003-12-18 | Ilija Kovac | Methods, systems, software and apparatus for prediction of polygenic conditions |
SE523024C2 (en) | 2002-07-25 | 2004-03-23 | Nobel Biocare Ab | Device for inducing bone by bone inductive or bioactive agent and/or increasing the stability of jaw bone implants and implants therefor |
CA2492879A1 (en) | 2002-07-29 | 2004-02-05 | Opinionlab, Inc. | System and method for providing substantially real-time access to collected information concerning user interaction with a web page of a website |
US7478121B1 (en) | 2002-07-31 | 2009-01-13 | Opinionlab, Inc. | Receiving and reporting page-specific user feedback concerning one or more particular web pages of a website |
US20040024534A1 (en) | 2002-08-02 | 2004-02-05 | Taimont Biotech Inc. | Process of creating an index for diagnosis or prognosis purpose |
WO2004013727A2 (en) | 2002-08-02 | 2004-02-12 | Rosetta Inpharmatics Llc | Computer systems and methods that use clinical and expression quantitative trait loci to associate genes with traits |
EP1534122B1 (en) | 2002-08-15 | 2016-07-20 | Pacific Edge Limited | Medical decision support systems utilizing gene expression and clinical information and method for use |
US20050152905A1 (en) | 2002-08-22 | 2005-07-14 | Omoigui Osemwota S. | Method of biochemical treatment of persistent pain |
US20030065241A1 (en) | 2002-08-27 | 2003-04-03 | Joerg Hohnloser | Medical risk assessment system and method |
EP1547009A1 (en) | 2002-09-20 | 2005-06-29 | Board Of Regents The University Of Texas System | Computer program products, systems and methods for information discovery and relational analyses |
AU2003279023A1 (en) * | 2002-09-26 | 2004-04-19 | Applera Corporation | Mitochondrial DNA autoscoring system |
AU2003282907A1 (en) | 2002-10-01 | 2004-04-23 | Fred Hutchinson Cancer Research Center | Methods for estimating haplotype frequencies and disease associations with haplotypes and environmental variables |
WO2004036461A2 (en) | 2002-10-14 | 2004-04-29 | Battelle Memorial Institute | Information reservoir |
WO2004036182A2 (en) | 2002-10-17 | 2004-04-29 | Control Delivery Systems, Inc. | Methods for monitoring treatment of disease |
WO2004038376A2 (en) | 2002-10-24 | 2004-05-06 | Duke University | Binary prediction tree modeling with many predictors and its uses in clinical and genomic applications |
US20090012928A1 (en) | 2002-11-06 | 2009-01-08 | Lussier Yves A | System And Method For Generating An Amalgamated Database |
WO2004044225A2 (en) | 2002-11-11 | 2004-05-27 | Affymetrix, Inc. | Methods for identifying DNA copy number changes |
US20040093334A1 (en) | 2002-11-13 | 2004-05-13 | Stephen Scherer | Profile management system |
WO2004046892A2 (en) | 2002-11-20 | 2004-06-03 | Aventis Pharmaceuticals Inc. | Method and system for marketing a treatment regimen |
CN101249090A (en) | 2002-11-22 | 2008-08-27 | Johns Hopkins University | Target for therapy of cognitive impairment |
AU2003293132A1 (en) | 2002-11-27 | 2004-06-23 | Sra International, Inc. | Integration of gene expression data and non-gene data |
US7698155B1 (en) | 2002-11-29 | 2010-04-13 | Ingenix, Inc. | System for determining a disease category probability for a healthcare plan member |
US20060063156A1 (en) | 2002-12-06 | 2006-03-23 | Willman Cheryl L | Outcome prediction and risk classification in childhood leukemia |
US20040122705A1 (en) | 2002-12-18 | 2004-06-24 | Sabol John M. | Multilevel integrated medical knowledge base system and method |
US7917468B2 (en) | 2005-08-01 | 2011-03-29 | Seven Networks, Inc. | Linking of personal information management data |
US20040146870A1 (en) | 2003-01-27 | 2004-07-29 | Guochun Liao | Systems and methods for predicting specific genetic loci that affect phenotypic traits |
US7230529B2 (en) | 2003-02-07 | 2007-06-12 | Theradoc, Inc. | System, method, and computer program for interfacing an expert system to a clinical information system |
US20040172313A1 (en) | 2003-02-11 | 2004-09-02 | Stein Robert Gary | System and method for processing health care insurance claims |
AU2004214480A1 (en) | 2003-02-14 | 2004-09-02 | Intergenetics Incorporated | Statistically identifying an increased risk for disease |
US20040172287A1 (en) | 2003-02-19 | 2004-09-02 | O'toole Michael | Method and apparatus for obtaining and distributing healthcare information |
US20060257888A1 (en) * | 2003-02-27 | 2006-11-16 | Methexis Genomics, N.V. | Genetic diagnosis using multiple sequence variant analysis |
US7584058B2 (en) | 2003-02-27 | 2009-09-01 | Methexis Genomics N.V. | Genetic diagnosis using multiple sequence variant analysis |
US20040177071A1 (en) | 2003-03-04 | 2004-09-09 | Massey Bill Wayne | System and method for outcome-based management of medical science liaisons |
US9342657B2 (en) | 2003-03-24 | 2016-05-17 | Nien-Chih Wei | Methods for predicting an individual's clinical treatment outcome from sampling a group of patient's biological profiles |
EP1613734A4 (en) | 2003-04-04 | 2007-04-18 | Agilent Technologies Inc | Visualizing expression data on chromosomal graphic schemes |
EP1615993A4 (en) | 2003-04-09 | 2012-01-04 | Omicia Inc | Methods of selection, reporting and analysis of genetic markers using broad based genetic profiling applications |
US20040243545A1 (en) | 2003-05-29 | 2004-12-02 | Dictaphone Corporation | Systems and methods utilizing natural language medical records |
WO2004097577A2 (en) | 2003-04-24 | 2004-11-11 | New York University | Methods, software arrangements, storage media, and systems for providing a shrinkage-based similarity metric |
US20040229224A1 (en) | 2003-05-13 | 2004-11-18 | Perlegen Sciences, Inc. | Allele-specific expression patterns |
US20040235922A1 (en) | 2003-05-15 | 2004-11-25 | Baile Clifton A. | Compositions and methods for inducing adipose tissue cell death |
US20040243443A1 (en) | 2003-05-29 | 2004-12-02 | Sanyo Electric Co., Ltd. | Healthcare support apparatus, health care support system, health care support method and health care support program |
US20040242454A1 (en) | 2003-06-02 | 2004-12-02 | Gallant Stephen I. | System and method for micro-dose, multiple drug therapy |
US7617202B2 (en) | 2003-06-16 | 2009-11-10 | Microsoft Corporation | Systems and methods that employ a distributional analysis on a query log to improve search results |
US7069308B2 (en) | 2003-06-16 | 2006-06-27 | Friendster, Inc. | System, method and apparatus for connecting users in an online computer system based on their relationships within social networks |
US7972779B2 (en) | 2003-07-11 | 2011-07-05 | Wisconsin Alumni Research Foundation | Method for assessing predisposition to depression |
US20050027560A1 (en) | 2003-07-28 | 2005-02-03 | Deborah Cook | Interactive multi-user medication and medical history management method |
US20050026119A1 (en) | 2003-08-01 | 2005-02-03 | Ellis Janet W. | Career development framework |
US8200775B2 (en) | 2005-02-01 | 2012-06-12 | Newsilike Media Group, Inc | Enhanced syndication |
US20050032066A1 (en) | 2003-08-04 | 2005-02-10 | Heng Chew Kiat | Method for assessing risk of diseases with multiple contributing factors |
US20050055365A1 (en) | 2003-09-09 | 2005-03-10 | I.V. Ramakrishnan | Scalable data extraction techniques for transforming electronic documents into queriable archives |
US20050176057A1 (en) | 2003-09-26 | 2005-08-11 | Troy Bremer | Diagnostic markers of mood disorders and methods of use thereof |
WO2005033895A2 (en) | 2003-10-03 | 2005-04-14 | Cira Discovery Sciences, Inc. | Method and apparatus for discovering patterns in binary or categorical data |
US20050112684A1 (en) | 2003-11-21 | 2005-05-26 | Eric Holzle | Class I and Class II MHC Profiling for Social and Sexual Matching of Human Partners |
US20050120019A1 (en) | 2003-11-29 | 2005-06-02 | International Business Machines Corporation | Method and apparatus for the automatic identification of unsolicited e-mail messages (SPAM) |
EP3269826B1 (en) | 2003-12-01 | 2020-03-11 | Epigenomics AG | Methods and nucleic acids for the analysis of gene expression associated with the development of prostate cell proliferative disorders |
US20050147947A1 (en) | 2003-12-29 | 2005-07-07 | Myfamily.Com, Inc. | Genealogical investigation and documentation systems and methods |
US20050154627A1 (en) | 2003-12-31 | 2005-07-14 | Bojan Zuzek | Transactional data collection, compression, and processing information management system |
US8554876B2 (en) | 2004-01-23 | 2013-10-08 | Hewlett-Packard Development Company, L.P. | User profile service |
US20050170321A1 (en) | 2004-01-30 | 2005-08-04 | Scully Helen M. | Method and system for career assessment |
US20050191678A1 (en) | 2004-02-12 | 2005-09-01 | Geneob Usa Inc. | Genetic predictability for acquiring a disease or condition |
US7127355B2 (en) | 2004-03-05 | 2006-10-24 | Perlegen Sciences, Inc. | Methods for genetic analysis |
EP1730308A4 (en) | 2004-03-05 | 2008-10-08 | Rosetta Inpharmatics Llc | Classification of breast cancer patients using a combination of clinical criteria and informative genesets |
JP2005251115A (en) | 2004-03-08 | 2005-09-15 | Shogakukan Inc | System and method of associative retrieval |
JP4437050B2 (en) | 2004-03-26 | 2010-03-24 | Hitachi, Ltd. | Diagnosis support system, diagnosis support method, and diagnosis support service providing method |
US20080195326A1 (en) | 2004-05-03 | 2008-08-14 | Martin Munzer | Method And System For Comprehensive Knowledge-Based Anonymous Testing And Reporting, And Providing Selective Access To Test Results And Report |
US20080195594A1 (en) | 2004-05-11 | 2008-08-14 | Gerjets Sven W | Computerized Comprehensive Health Assessment and Physician Directed Systems |
US20060218111A1 (en) | 2004-05-13 | 2006-09-28 | Cohen Hunter C | Filtered search results |
US20050260610A1 (en) | 2004-05-20 | 2005-11-24 | Kurtz Richard E | Method for diagnosing and prescribing a regimen of therapy for human health risk |
WO2005123955A2 (en) | 2004-06-09 | 2005-12-29 | Children's Medical Center Corporation | Methods and compositions for modifying gene regulation and DNA damage in ageing |
US7599802B2 (en) | 2004-06-10 | 2009-10-06 | Evan Harwood | V-life matching and mating system |
US8335652B2 (en) | 2004-06-23 | 2012-12-18 | Yougene Corp. | Self-improving identification method |
US7223234B2 (en) | 2004-07-10 | 2007-05-29 | Monitrix, Inc. | Apparatus for determining association variables |
US20060025929A1 (en) | 2004-07-30 | 2006-02-02 | Chris Eglington | Method of determining a genetic relationship to at least one individual in a group of famous individuals using a combination of genetic markers |
US8024128B2 (en) | 2004-09-07 | 2011-09-20 | Gene Security Network, Inc. | System and method for improving clinical decisions by aggregating, validating and analysing genetic and phenotypic data |
US20060059159A1 (en) | 2004-09-15 | 2006-03-16 | Vu Hao Thi Truong | Online dating service providing response status tracking for a service subscriber |
US20090089079A1 (en) | 2004-11-09 | 2009-04-02 | The Brigham And Women's Hospital, Inc. | System and method for determining whether to issue an alert to consider prophylaxis for a risk condition |
US20060129435A1 (en) | 2004-12-15 | 2006-06-15 | Critical Connection Inc. | System and method for providing community health data services |
US20060136143A1 (en) | 2004-12-17 | 2006-06-22 | General Electric Company | Personalized genetic-based analysis of medical conditions |
US20060185027A1 (en) | 2004-12-23 | 2006-08-17 | David Bartel | Systems and methods for identifying miRNA targets and for altering miRNA and target expression |
US20060195335A1 (en) | 2005-01-21 | 2006-08-31 | Christian Lana S | System and method for career development |
US20080040151A1 (en) | 2005-02-01 | 2008-02-14 | Moore James F | Uses of managed health care data |
US20070106754A1 (en) | 2005-09-10 | 2007-05-10 | Moore James F | Security facility for maintaining health care data pools |
WO2006084195A2 (en) | 2005-02-03 | 2006-08-10 | The Government Of The United States Of America As Represented By The Secretary Of The Department Of Health And Human Services, Centers For Disease Control And Prevention | Personal assessment including familial risk analysis for personalized disease prevention plan |
US7951078B2 (en) | 2005-02-03 | 2011-05-31 | Maren Theresa Scheuner | Method and apparatus for determining familial risk of disease |
JP2008532496A (en) | 2005-02-18 | 2008-08-21 | Dnaprint Genomics, Inc. | Multiplex assay for inferring ancestry |
US20070061424A1 (en) | 2005-03-09 | 2007-03-15 | Wholived, Inc. | System and method for providing a database of past life information using a virtual cemetery, virtual tomb and virtual safe organizational paradigm |
CA2603550A1 (en) | 2005-03-31 | 2006-10-05 | The Board Of Trustees Of The Leland Stanford Junior University | Compositions and methods for diagnosing and treating neuropsychiatric disorders |
US7657521B2 (en) | 2005-04-15 | 2010-02-02 | General Electric Company | System and method for parsing medical data |
US7917374B2 (en) | 2005-04-25 | 2011-03-29 | Ingenix, Inc. | System and method for early identification of safety concerns of new drugs |
US20070011173A1 (en) | 2005-05-23 | 2007-01-11 | Ebags.Com | Method and apparatus for providing shoe recommendations |
US20060287876A1 (en) | 2005-06-20 | 2006-12-21 | Davor Jedlicka | Computer system and method for assessing family structures using affinographs |
US20070166728A1 (en) | 2005-07-22 | 2007-07-19 | Alphagenics, Inc. | Genetic profile imaging and data-sharing device and methodology for socially relevant traits |
US20070027636A1 (en) | 2005-07-29 | 2007-02-01 | Matthew Rabinowitz | System and method for using genetic, phenotypic and clinical data to make predictions for clinical or lifestyle decisions |
US20070027850A1 (en) | 2005-08-01 | 2007-02-01 | Reprise Media, Llc | Methods and systems for developing and managing a computer-based marketing campaign |
US20070050354A1 (en) | 2005-08-18 | 2007-03-01 | Outland Research | Method and system for matching socially and epidemiologically compatible mates |
US20070061166A1 (en) | 2005-08-29 | 2007-03-15 | Narayanan Ramasubramanian | Techniques for improving loss ratios |
US8566121B2 (en) | 2005-08-29 | 2013-10-22 | Narayanan Ramasubramanian | Personalized medical adherence management system |
US20070122824A1 (en) | 2005-09-09 | 2007-05-31 | Tucker Mark R | Method and Kit for Assessing a Patient's Genetic Information, Lifestyle and Environment Conditions, and Providing a Tailored Therapeutic Regime |
US8364521B2 (en) | 2005-09-14 | 2013-01-29 | Jumptap, Inc. | Rendering targeted advertisement on mobile communication facilities |
US20070198485A1 (en) | 2005-09-14 | 2007-08-23 | Jorey Ramer | Mobile search service discovery |
US20080009268A1 (en) | 2005-09-14 | 2008-01-10 | Jorey Ramer | Authorized mobile content search results |
US7592910B2 (en) | 2005-09-28 | 2009-09-22 | Social Fabric Corporation | Matching system |
CA2624705A1 (en) | 2005-10-03 | 2007-04-12 | Health Dialog Services Corporation | Systems and methods for analysis of healthcare provider performance |
JP2007102709A (en) | 2005-10-07 | 2007-04-19 | Toshiba Corp | Gene diagnostic marker selection program, device and system executing this program, and gene diagnostic system |
US7752215B2 (en) | 2005-10-07 | 2010-07-06 | International Business Machines Corporation | System and method for protecting sensitive data |
US20080015968A1 (en) | 2005-10-14 | 2008-01-17 | Leviathan Entertainment, Llc | Fee-Based Priority Queuing for Insurance Claim Processing |
US8234129B2 (en) | 2005-10-18 | 2012-07-31 | Wellstat Vaccines, Llc | Systems and methods for obtaining, storing, processing and utilizing immunologic and other information of individuals and populations |
CN102260742A (en) | 2005-10-21 | 2011-11-30 | 基因信息股份有限公司 | Method and apparatus for correlating levels of biomarker products with disease |
WO2007054816A2 (en) | 2005-11-14 | 2007-05-18 | Bioren, Inc. | Antibody ultrahumanization by predicted mature CDR blasting and cohort library generation and screening |
US20070111247A1 (en) | 2005-11-17 | 2007-05-17 | Stephens Joel C | Systems and methods for the biometric analysis of index founder populations |
US20090029371A1 (en) | 2005-12-05 | 2009-01-29 | Ihc Intellectual Asset Management, Llc | Method for determining vasoreactivity |
US20090132284A1 (en) | 2005-12-16 | 2009-05-21 | Fey Christopher T | Customizable Prevention Plan Platform, Expert System and Method |
US20070156691A1 (en) | 2006-01-05 | 2007-07-05 | Microsoft Corporation | Management of user access to objects |
US20070178500A1 (en) | 2006-01-18 | 2007-08-02 | Martin Lucas | Methods of determining relative genetic likelihoods of an individual matching a population |
WO2007087314A2 (en) | 2006-01-23 | 2007-08-02 | Zeavision Llc | Macular pigment diagnostic system |
US20070185658A1 (en) | 2006-02-06 | 2007-08-09 | Paris Steven M | Determining probabilities of inherited and correlated traits |
US7818281B2 (en) | 2006-02-14 | 2010-10-19 | Affymetrix, Inc. | Computer software for visualizing recombination events in a group of individuals from recombination breakpoints and assignments in high density SNP genotyping data by generating a color-coded view for each individual chromosome and a whole genome view for the group |
US7788358B2 (en) | 2006-03-06 | 2010-08-31 | Aggregate Knowledge | Using cross-site relationships to generate recommendations |
US8572067B2 (en) | 2006-03-14 | 2013-10-29 | International Business Machines Corporation | Method to estimate the number of distinct value combinations for a set of attributes in a database system |
US8738467B2 (en) | 2006-03-16 | 2014-05-27 | Microsoft Corporation | Cluster-based scalable collaborative filtering |
WO2007109571A2 (en) | 2006-03-17 | 2007-09-27 | Prometheus Laboratories, Inc. | Methods of predicting and monitoring tyrosine kinase inhibitor therapy |
US20070238936A1 (en) | 2006-04-10 | 2007-10-11 | Shirley Ann Becker | Portable Electronic Medical Assistant |
US8626764B2 (en) | 2006-04-13 | 2014-01-07 | International Business Machines Corporation | Methods, systems and computer program products for organizing and/or manipulating cohort based information |
JP5028847B2 (en) | 2006-04-21 | 2012-09-19 | Fujitsu Limited | Gene interaction network analysis support program, recording medium recording the program, gene interaction network analysis support method, and gene interaction network analysis support device |
WO2007133586A2 (en) | 2006-05-08 | 2007-11-22 | Tethys Bioscience, Inc. | Systems and methods for developing diagnostic tests based on biomarker information from legacy clinical sample sets |
US8364711B2 (en) | 2006-05-09 | 2013-01-29 | John Wilkins | Contact management system and method |
US7664718B2 (en) | 2006-05-16 | 2010-02-16 | Sony Corporation | Method and system for seed based clustering of categorical data using hierarchies |
US20070294113A1 (en) | 2006-06-14 | 2007-12-20 | General Electric Company | Method for evaluating correlations between structured and normalized information on genetic variations between humans and their personal clinical patient data from electronic medical patient records |
US20070299881A1 (en) | 2006-06-21 | 2007-12-27 | Shimon Bouganim | System and method for protecting selected fields in database files |
US8888697B2 (en) | 2006-07-24 | 2014-11-18 | Webmd, Llc | Method and system for enabling lay users to obtain relevant, personalized health related information |
US8271201B2 (en) * | 2006-08-11 | 2012-09-18 | University Of Tennessee Research Foundation | Methods of associating an unknown biological specimen with a family |
US8121915B1 (en) | 2006-08-16 | 2012-02-21 | Resource Consortium Limited | Generating financial plans using a personal information aggregator |
US7984421B2 (en) | 2006-10-03 | 2011-07-19 | Ning, Inc. | Web application cloning |
WO2008052344A1 (en) * | 2006-11-01 | 2008-05-08 | 0752004 B.C. Ltd. | Method and system for genetic research using genetic sampling via an interactive online network |
US8990198B2 (en) | 2006-11-02 | 2015-03-24 | Ilan Cohn | Method and system for computerized management of related data records |
US8606591B2 (en) | 2006-11-10 | 2013-12-10 | The Charlotte-Mecklenburg Hospital Authority | Systems, methods, and computer program products for determining an optimum hernia repair procedure |
US20080114737A1 (en) | 2006-11-14 | 2008-05-15 | Daniel Neely | Method and system for automatically identifying users to participate in an electronic conversation |
KR20090105921A (en) | 2006-11-30 | 2009-10-07 | Navigenics, Inc. | Genetic analysis systems and methods |
US20080131887A1 (en) | 2006-11-30 | 2008-06-05 | Stephan Dietrich A | Genetic Analysis Systems and Methods |
US7739247B2 (en) | 2006-12-28 | 2010-06-15 | Ebay Inc. | Multi-pass data organization and automatic naming |
US7844604B2 (en) | 2006-12-28 | 2010-11-30 | Yahoo! Inc. | Automatically generating user-customized notifications of changes in a social network system |
US20080228699A1 (en) | 2007-03-16 | 2008-09-18 | Expanse Networks, Inc. | Creation of Attribute Combination Databases |
US7908288B2 (en) | 2007-04-12 | 2011-03-15 | Satheesh Nair | Method and system for research using computer based simultaneous comparison and contrasting of a multiplicity of subjects having specific attributes within specific contexts |
US7816083B2 (en) | 2007-05-03 | 2010-10-19 | Celera Corporation | Genetic polymorphisms associated with neurodegenerative diseases, methods of detection and uses thereof |
US20090186347A1 (en) | 2007-05-11 | 2009-07-23 | Cox David R | Markers for metabolic syndrome |
US20080300958A1 (en) | 2007-05-29 | 2008-12-04 | Tasteindex.Com Llc | Taste network content targeting |
AU2008263644A1 (en) | 2007-06-15 | 2008-12-18 | Isis Innovation Limited | Allelic determination |
US7818396B2 (en) | 2007-06-21 | 2010-10-19 | Microsoft Corporation | Aggregating and searching profile data from multiple services |
US20090094271A1 (en) | 2007-06-26 | 2009-04-09 | Allurdata Llc | Variable driven method and system for the management and display of information |
US7720855B2 (en) | 2007-07-02 | 2010-05-18 | Brown Stephen J | Social network for affecting personal behavior |
WO2009010948A1 (en) | 2007-07-18 | 2009-01-22 | Famillion Ltd. | Method and system for use of a database of personal data records |
US20090043752A1 (en) | 2007-08-08 | 2009-02-12 | Expanse Networks, Inc. | Predicting Side Effect Attributes |
US20090068114A1 (en) | 2007-09-07 | 2009-03-12 | Yousef Haik | Noninvasive Thermometry Monitoring System |
US8010896B2 (en) | 2007-09-13 | 2011-08-30 | International Business Machines Corporation | Using profiling when a shared document is changed in a content management system |
WO2009042975A1 (en) | 2007-09-26 | 2009-04-02 | Navigenics, Inc. | Methods and systems for genomic analysis using ancestral data |
US9336177B2 (en) | 2007-10-15 | 2016-05-10 | 23Andme, Inc. | Genome sharing |
US8589437B1 (en) | 2007-10-15 | 2013-11-19 | 23Andme, Inc. | De-identification and sharing of genetic data |
US10275569B2 (en) | 2007-10-15 | 2019-04-30 | 23Andme, Inc. | Family inheritance |
US8510057B1 (en) | 2007-10-15 | 2013-08-13 | 23Andme, Inc. | Summarizing an aggregate contribution to a characteristic for an individual |
US7877398B2 (en) | 2007-11-19 | 2011-01-25 | International Business Machines Corporation | Masking related sensitive data in groups |
US20110004628A1 (en) | 2008-02-22 | 2011-01-06 | Armstrong John M | Automated ontology generation system and method |
US20090222517A1 (en) | 2008-02-29 | 2009-09-03 | Dimitris Kalofonos | Methods, systems, and apparatus for using virtual devices with peer-to-peer groups |
US20110087693A1 (en) | 2008-02-29 | 2011-04-14 | John Boyce | Methods and Systems for Social Networking Based on Nucleic Acid Sequences |
US20090299645A1 (en) | 2008-03-19 | 2009-12-03 | Brandon Colby | Genetic analysis |
US20170330358A1 (en) | 2008-03-19 | 2017-11-16 | 23Andme, Inc. | Ancestry painting |
US20100041958A1 (en) | 2008-04-24 | 2010-02-18 | Searete Llc | Computational system and method for memory modification |
US20090271375A1 (en) | 2008-04-24 | 2009-10-29 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Combination treatment selection methods and systems |
US9311369B2 (en) | 2008-04-28 | 2016-04-12 | Oracle International Corporation | Virtual masked database |
CA2727795A1 (en) | 2008-06-13 | 2009-12-17 | Prognomix, Inc. | Genetic component of complications in type 2 diabetes |
US9477941B2 (en) * | 2008-06-24 | 2016-10-25 | Intelius, Inc. | Genealogy system for interfacing with social networks |
US20090326832A1 (en) | 2008-06-27 | 2009-12-31 | Microsoft Corporation | Graphical models for the analysis of genome-wide associations |
US8191571B2 (en) | 2008-07-30 | 2012-06-05 | Hamilton Sundstrand Corporation | Fluid circuit breaker quick disconnect coupling |
BRPI0917089A2 (en) | 2008-08-08 | 2015-12-15 | Navigenics Inc | custom action plan methods and systems |
WO2010024894A1 (en) | 2008-08-26 | 2010-03-04 | 23Andme, Inc. | Processing data from genotyping chips |
US9218451B2 (en) | 2008-08-26 | 2015-12-22 | 23Andme, Inc. | Processing data from genotyping chips |
US7917438B2 (en) | 2008-09-10 | 2011-03-29 | Expanse Networks, Inc. | System for secure mobile healthcare selection |
US20100063865A1 (en) | 2008-09-10 | 2010-03-11 | Expanse Networks, Inc. | Masked Data Provider Profiling |
US20100063835A1 (en) | 2008-09-10 | 2010-03-11 | Expanse Networks, Inc. | Method for Secure Mobile Healthcare Selection |
US20100076950A1 (en) | 2008-09-10 | 2010-03-25 | Expanse Networks, Inc. | Masked Data Service Selection |
US20100063830A1 (en) | 2008-09-10 | 2010-03-11 | Expanse Networks, Inc. | Masked Data Provider Selection |
US8200509B2 (en) | 2008-09-10 | 2012-06-12 | Expanse Networks, Inc. | Masked data record access |
US20100070292A1 (en) | 2008-09-10 | 2010-03-18 | Expanse Networks, Inc. | Masked Data Transaction Database |
US20100076988A1 (en) | 2008-09-10 | 2010-03-25 | Expanse Networks, Inc. | Masked Data Service Profiling |
EP2335174A1 (en) | 2008-09-12 | 2011-06-22 | Navigenics INC. | Methods and systems for incorporating multiple environmental and genetic risk factors |
WO2010042888A1 (en) | 2008-10-10 | 2010-04-15 | The Regents Of The University Of California | A computational method for comparing, classifying, indexing, and cataloging of electronically stored linear information |
WO2010065139A1 (en) | 2008-12-05 | 2010-06-10 | 23Andme, Inc. | Gamete donor selection based on genetic calculations |
US20100169313A1 (en) | 2008-12-30 | 2010-07-01 | Expanse Networks, Inc. | Pangenetic Web Item Feedback System |
US8108406B2 (en) | 2008-12-30 | 2012-01-31 | Expanse Networks, Inc. | Pangenetic web user behavior prediction system |
US20100169262A1 (en) | 2008-12-30 | 2010-07-01 | Expanse Networks, Inc. | Mobile Device for Pangenetic Web |
US8255403B2 (en) | 2008-12-30 | 2012-08-28 | Expanse Networks, Inc. | Pangenetic web satisfaction prediction system |
US20100169338A1 (en) | 2008-12-30 | 2010-07-01 | Expanse Networks, Inc. | Pangenetic Web Search System |
US8386519B2 (en) | 2008-12-30 | 2013-02-26 | Expanse Networks, Inc. | Pangenetic web item recommendation system |
US8655821B2 (en) | 2009-02-04 | 2014-02-18 | Konstantinos (Constantin) F. Aliferis | Local causal and Markov blanket induction method for causal discovery and feature selection from data |
CN102712949B (en) | 2009-06-01 | 2015-12-16 | Genetic Technologies Limited | Method for breast cancer risk assessment |
CA2776588A1 (en) | 2009-10-08 | 2011-04-14 | The Children's Hospital Of Philadelphia | Compositions and methods for diagnosing genome related diseases and disorders |
WO2011050341A1 (en) | 2009-10-22 | 2011-04-28 | National Center For Genome Resources | Methods and systems for medical sequencing analysis |
CA2782207A1 (en) | 2009-11-30 | 2011-06-03 | 23Andme, Inc. | Polymorphisms associated with parkinson's disease |
US20120053845A1 (en) | 2010-04-27 | 2012-03-01 | Jeremy Bruestle | Method and system for analysis and error correction of biological sequences and inference of relationship for multiple samples |
DK2601609T3 (en) | 2010-08-02 | 2017-06-06 | Population Bio Inc | COMPOSITIONS AND METHODS FOR DISCOVERING MUTATIONS CAUSING GENETIC DISORDERS |
US20120035954A1 (en) | 2010-08-05 | 2012-02-09 | International Business Machines Corporation | On-demand clinical trials utilizing emr/ehr systems |
US8786603B2 (en) | 2011-02-25 | 2014-07-22 | Ancestry.Com Operations Inc. | Ancestor-to-ancestor relationship linking methods and systems |
WO2012155148A2 (en) | 2011-05-12 | 2012-11-15 | University Of Utah Research Foundation | Predicting gene variant pathogenicity |
US9928338B2 (en) | 2011-06-01 | 2018-03-27 | The Board Of Trustees Of The Leland Stanford Junior University | Method and system for phasing individual genomes in the context of clinical interpretation |
US10790041B2 (en) | 2011-08-17 | 2020-09-29 | 23Andme, Inc. | Method for analyzing and displaying genetic information between family members |
US9367663B2 (en) | 2011-10-06 | 2016-06-14 | Sequenom, Inc. | Methods and processes for non-invasive assessment of genetic variations |
US9984198B2 (en) | 2011-10-06 | 2018-05-29 | Sequenom, Inc. | Reducing sequence read count error in assessment of complex genetic variations |
US8990250B1 (en) | 2011-10-11 | 2015-03-24 | 23Andme, Inc. | Cohort selection with privacy protection |
US10437858B2 (en) | 2011-11-23 | 2019-10-08 | 23Andme, Inc. | Database and data processing system for use with a network-based personal genetics services platform |
EP2820129A1 (en) | 2012-03-02 | 2015-01-07 | Sequenom, Inc. | Methods and processes for non-invasive assessment of genetic variations |
US10777302B2 (en) | 2012-06-04 | 2020-09-15 | 23Andme, Inc. | Identifying variants of interest by imputation |
US10025877B2 (en) | 2012-06-06 | 2018-07-17 | 23Andme, Inc. | Determining family connections of individuals in a database |
US9116882B1 (en) * | 2012-08-02 | 2015-08-25 | 23Andme, Inc. | Identification of matrilineal or patrilineal relatives |
AU2013312355A1 (en) | 2012-09-06 | 2014-09-18 | Ancestry.Com Dna, Llc | Using haplotypes to infer ancestral origins for recently admixed individuals |
US10114922B2 (en) | 2012-09-17 | 2018-10-30 | Ancestry.Com Dna, Llc | Identifying ancestral relationships using a continuous stream of input |
US9213947B1 (en) | 2012-11-08 | 2015-12-15 | 23Andme, Inc. | Scalable pipeline for local ancestry inference |
US9836576B1 (en) | 2012-11-08 | 2017-12-05 | 23Andme, Inc. | Phasing of unphased genotype data |
WO2014110350A2 (en) | 2013-01-11 | 2014-07-17 | Oslo Universitetssykehus Hf | Systems and methods for identifying polymorphisms |
WO2014145280A1 (en) | 2013-03-15 | 2014-09-18 | Ancestry.Com Dna, Llc | Family networks |
US20150106115A1 (en) | 2013-10-10 | 2015-04-16 | International Business Machines Corporation | Densification of longitudinal emr for improved phenotyping |
US20150288780A1 (en) | 2014-04-05 | 2015-10-08 | Antoine El Daher | Profile Evaluation System For Online Dating And Social Networking Websites |
WO2015171457A1 (en) | 2014-05-03 | 2015-11-12 | The Regents Of The University Of California | Methods of identifying biomarkers associated with or causative of the progression of disease, in particular for use in prognosticating primary open angle glaucoma |
EP3198023B1 (en) | 2014-09-26 | 2020-04-22 | Somalogic, Inc. | Cardiovascular risk event prediction and uses thereof |
US10720229B2 (en) | 2014-10-14 | 2020-07-21 | Ancestry.Com Dna, Llc | Reducing error in predicted genetic relationships |
EP3207482B1 (en) | 2014-10-17 | 2023-04-05 | Ancestry.com DNA, LLC | Haplotype phasing models |
US20170329902A1 (en) | 2014-10-29 | 2017-11-16 | 23Andme, Inc. | Estimation of admixture generation |
US20170329899A1 (en) | 2014-10-29 | 2017-11-16 | 23Andme, Inc. | Display of estimated parental contribution to ancestry |
US10867705B2 (en) | 2014-11-06 | 2020-12-15 | Ancestryhealth.Com, Llc | Predicting health outcomes |
NZ737553A (en) | 2015-05-30 | 2017-11-24 | ||
US10957422B2 (en) | 2015-07-07 | 2021-03-23 | Ancestry.Com Dna, Llc | Genetic and genealogical analysis for identification of birth location and surname information |
US10558930B2 (en) | 2015-07-13 | 2020-02-11 | Ancestry.Com Dna, Llc | Local genetic ethnicity determination system |
US20170329915A1 (en) | 2015-08-27 | 2017-11-16 | 23Andme, Inc. | Systems and methods for generating a modular web page template to display personal genetic and physiological condition information |
CN110312825A (en) * | 2016-10-24 | 2019-10-08 | 吉内恩福赛克公司 | Hiding information present in nucleic acids |
CA3062858A1 (en) | 2017-05-12 | 2018-11-15 | The Regents Of The University Of Michigan | Individual and cohort pharmacological phenotype prediction platform |
US10296842B2 (en) | 2017-07-21 | 2019-05-21 | Helix OpCo, LLC | Genomic services system with dual-phase genotype imputation |
US10468141B1 (en) | 2018-11-28 | 2019-11-05 | Asia Genomics Pte. Ltd. | Ancestry-specific genetic risk scores |
WO2021016114A1 (en) | 2019-07-19 | 2021-01-28 | 23Andme, Inc. | Phase-aware determination of identity-by-descent dna segments |
CA3154157A1 (en) | 2019-09-13 | 2021-03-18 | 23Andme, Inc. | Methods and systems for determining and displaying pedigrees |
US11171853B2 (en) | 2020-01-30 | 2021-11-09 | Ciena Corporation | Constraint-based event-driven telemetry |
EP4158638A4 (en) | 2020-05-27 | 2023-11-29 | 23Andme, Inc. | Machine learning platform for generating risk models |
US20220044761A1 (en) | 2020-05-27 | 2022-02-10 | 23Andme, Inc. | Machine learning platform for generating risk models |
US11817176B2 (en) | 2020-08-13 | 2023-11-14 | 23Andme, Inc. | Ancestry composition determination |
EP4200858A4 (en) | 2020-10-09 | 2024-08-28 | 23Andme Inc | Formatting and storage of genetic markers |
WO2022087478A1 (en) | 2020-10-23 | 2022-04-28 | 23Andme, Inc. | Machine learning platform for generating risk models |
- 2009
  - 2009-12-22 EP EP09836517.4A, EP2370929A4, Withdrawn
  - 2009-12-22 WO PCT/US2009/006706, WO2010077336A1, Application Filing
  - 2009-12-22 US US12/644,791, US8463554B2, Active
  - 2009-12-22 EP EP17172048.5A, EP3276526A1, Pending
- 2013
  - 2013-04-26 US US13/871,744, US20140006433A1, Abandoned
- 2016
  - 2016-09-13 US US15/264,493, US20170228498A1, Abandoned
- 2017
  - 2017-07-31 US US15/664,619, US10854318B2, Active
- 2018
  - 2018-09-12 US US16/129,645, US20190012431A1, Abandoned
- 2020
  - 2020-10-16 US US17/073,095, US11031101B2, Active
  - 2020-10-16 US US17/073,110, US11049589B2, Active
  - 2020-10-16 US US17/073,128, US20210043280A1, Abandoned
  - 2020-10-16 US US17/073,122, US20210043279A1, Abandoned
  - 2020-10-22 US US17/077,930, US11468971B2, Active
- 2021
  - 2021-03-25 US US17/301,129, US20210225458A1, Abandoned
  - 2021-06-17 US US17/351,052, US11322227B2, Active
- 2022
  - 2022-01-14 US US17/576,738, US11508461B2, Active
  - 2022-08-03 US US17/880,566, US20220375547A1, Abandoned
  - 2022-10-28 US US17/975,949, US11657902B2, Active
  - 2022-11-02 US US17/979,412, US11935628B2, Active
- 2023
  - 2023-03-28 US US18/191,525, US11776662B2, Active
  - 2023-08-07 US US18/198,558, US20240242783A1, Pending
- 2024
  - 2024-02-06 US US18/434,362, US12100487B2, Active
  - 2024-07-02 US US18/762,304, US20240355428A1, Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8855935B2 (en) * | 2006-10-02 | 2014-10-07 | Ancestry.Com Dna, Llc | Method and system for displaying genetic and genealogical data |
US10854318B2 (en) * | 2008-12-31 | 2020-12-01 | 23Andme, Inc. | Ancestry finder |
US20210043281A1 (en) * | 2008-12-31 | 2021-02-11 | 23Andme, Inc. | Ancestry finder |
US11031101B2 (en) * | 2008-12-31 | 2021-06-08 | 23Andme, Inc. | Finding relatives in a database |
Non-Patent Citations (4)
Title |
---|
Cartier, K.C. (August, 2008) Application of the mediator design pattern to Monte Carlo simulation in genetic epidemiology. Thesis, Case Western Reserve University, US. 131 pages. (Year: 2008) * |
Rajeevan et al. ALFRED: an allele frequency database for Microevolutionary studies. Evolutionary Bioinformatics Online (2005) vol 1, p 1-10. (Year: 2006) * |
Roberson et al. (2009) Visualization of shared genomic regions and meiotic recombination in high density SNP data. PLOS One, volume 4, issue 8, e6711, 13 pages. (Year: 2009) * |
Roberson, E. (2009) EXAMINING COPY NUMBER ALTERATIONS, UNEXPECTED RELATIONSHIPS AND POPULATION STRUCTURE USING SNPS. Johns Hopkins University, Baltimore, MD. July, 2009. 188 pages. (Year: 2009) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11468971B2 (en) | | Ancestry finder |
US9116882B1 (en) | | Identification of matrilineal or patrilineal relatives |
US10643740B2 (en) | | Family inheritance |
Leshchiner et al. | | Mutation mapping and identification by whole-genome sequencing |
Henn et al. | | Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples |
Tian et al. | | Estimating the genome-wide mutation rate with three-way identity by descent |
US20170329901A1 (en) | | Identifying variants of interest by imputation |
Hsu et al. | | The accuracy and bias of single-step genomic prediction for populations under selection |
Ball | | Experimental designs for reliable detection of linkage disequilibrium in unstructured random population association studies |
Markus et al. | | Integration of SNP genotyping confidence scores in IBD inference |
Adrianto et al. | | Estimating allele frequencies |
Manichaikul et al. | | Binary trait mapping in experimental crosses with selective genotyping |
Bickeböller | | Genetic Epidemiology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment | Owner name: 23ANDME, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: MACPHERSON, JOHN MICHAEL; NAUGHTON, BRIAN THOMAS; MOUNTAIN, JOANNA LOUISE; REEL/FRAME: 060774/0294. Effective date: 20100820 |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |