WO2019156591A1 - Methods and systems for prediction of frailty background - Google Patents
- Publication number
- WO2019156591A1 (PCT/RU2018/050155)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frailty
- snps
- prediction model
- indicator
- genetic
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
Definitions
- the present disclosure provides improved methods and systems that may be used for frailty or longevity prediction.
- the methods and apparatus disclosed herein may provide for high accuracy and efficiency in the prediction of frailty based on genetic data.
- a frailty model with improved accuracy and performance can be used for predicting longevity of an individual in response to single nucleotide polymorphism (SNP) data.
- Data from a plurality of identified gene variants can be incorporated into the model in order to output frailty or longevity with improved accuracy.
- a method of determining frailty may comprise: receiving an input from a user, the input comprising a request for a frailty assessment; and displaying a frailty assessment parameter in response to a frailty prediction model based on a plurality of SNPs, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, or rs4332427.
- a tangible storage medium comprising instructions.
- the tangible storage medium may be configured to: receive a user input comprising a request for a frailty assessment; and display a frailty assessment parameter in response to a frailty prediction model based on a plurality of SNPs, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, or rs4332427.
- a system for determining frailty comprising the tangible storage medium and a processor configured to execute the instructions is provided.
- the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189.
- the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
- the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189.
- the frailty prediction model is configured to provide a frailty indicator corresponding to an explained variance of at least 1.5.
- the method further comprises generating a frailty indicator for the frailty assessment in response to the frailty prediction model, wherein the frailty prediction model comprises a plurality of coefficients for the plurality of SNPs, and wherein generating the frailty indicator comprises performing a calculation using the plurality of coefficients.
- the plurality of coefficients comprises a respective coefficient corresponding to each of the plurality of SNPs.
- the method further comprises: receiving a plurality of coefficients for the plurality of SNPs; and generating a frailty indicator for the frailty assessment in response to the frailty prediction model using the plurality of coefficients.
- the plurality of coefficients comprises a respective coefficient corresponding to each of the plurality of SNPs.
- the method further comprises receiving a frailty indicator and generating the frailty assessment parameter based on the frailty indicator.
- the frailty indicator is indicative of a relative hazard value.
- the frailty indicator is indicative of an expected lifespan at a predetermined age of the individual.
- the frailty indicator is indicative of an assessment of genetic predisposition for premature death.
- the method further comprises receiving data indicative of genetic information of the user for the frailty assessment.
- a method for generating a frailty prediction model comprises: constructing a training model based on mortality data of a first population of deceased individuals; determining, using the training model, a frailty index for individuals of a second population; identifying a plurality of SNPs associated with human frailty based at least on (1) genetic data of the individuals of the second population and (2) the frailty index for the individuals of the second population; and generating the frailty prediction model based on the plurality of SNPs, wherein the frailty prediction model is configured to determine a frailty indicator of an individual.
- a tangible storage medium comprises instructions configured to: construct a training model based on mortality data of a first population of deceased individuals; determine, using the training model, a frailty index for individuals of a second population; identify a plurality of SNPs associated with human frailty based at least on (1) genetic data of the individuals of the second population and (2) the frailty assessment for the individuals of the second population; and generate a frailty prediction model based on the plurality of SNPs, wherein the frailty prediction model is configured to determine a frailty indicator of an individual.
- a system for generating a frailty prediction model comprising the tangible storage medium and a processor configured to execute the instructions is provided.
- the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, and rs4332427.
- the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
- constructing the training model comprises constructing a COX proportional hazard model.
- the second population of individuals comprises living individuals.
- generating the frailty prediction model comprises generating a linear regression model such as a weighted sum of the plurality of SNPs.
- the frailty prediction model is configured to receive as an input genetic data of the individual and provide as an output the frailty indicator in response to the input.
- the frailty indicator is indicative of a relative hazard value, an expected lifespan at a predetermined age of the individual, a rate of death at a predetermined age of the individual relative to a rate of death at the predetermined age of a predetermined group of individuals, or an assessment of genetic predisposition for premature death.
- the mortality data comprises death register data of the deceased individuals of the first population and non-genetic data for the deceased individuals of the first population.
- the non-genetic data comprises a plurality of non-genetic traits. The plurality of non-genetic traits is selected from the group consisting of white blood cell count, red blood cell count, mean corpuscular volume, mean corpuscular hemoglobin, and platelet count.
- the method further comprises receiving the mortality data from a preexisting database.
- the frailty index is a logarithm of a hazard ratio value.
- the hazard ratio value is a ratio between a hazard rate of an individual and a mean hazard rate of a predetermined group of individuals.
- generating the frailty prediction model comprises generating a coefficient for each of the plurality of SNPs, wherein the coefficient is indicative of an association with human frailty. In some cases, generating the coefficient comprises multiple linear regression analysis. In some embodiments, the frailty indicator determined by the frailty prediction model corresponds to an explained variance of at least 1.5. In some embodiments, identifying the plurality of SNPs comprises a genome-wide association study (GWAS). In some cases, identifying the plurality of SNPs comprises identifying a set of SNPs having a predetermined correlation with human frailty and further selecting a subset of SNPs from the set of SNPs. In some cases, conditional and joint analysis is used to select the subset of SNPs. In some examples, the GWAS is configured to produce an effect allele frequency value of at most about 0.8.
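The coefficient generation by multiple linear regression described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes genotypes are coded as effect-allele dosages (0, 1, or 2), fits ordinary least squares through the normal equations, and uses only the Python standard library. The function names `solve_linear_system` and `fit_snp_coefficients` and the toy data are hypothetical.

```python
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copy rows so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; SNP columns may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_snp_coefficients(dosages: List[List[float]],
                         frailty_index: List[float]) -> List[float]:
    """Return [intercept, coef_1, ..., coef_p] by ordinary least squares.

    dosages[i][j] is the assumed effect-allele dosage of SNP j in individual i.
    """
    design = [[1.0] + list(row) for row in dosages]  # prepend intercept column
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, frailty_index)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data: two SNPs, five individuals.
toy_dosages = [[0, 0], [1, 0], [0, 1], [2, 1], [1, 2]]
toy_index = [0.1, 0.3, -0.2, 0.2, -0.3]
toy_fit = fit_snp_coefficients(toy_dosages, toy_index)
```

In practice a numerical library would normally replace the hand-written solver; the plain-Python version is used here only to keep the sketch self-contained.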
- a method of determining frailty comprises: receiving an input request for a frailty indicator; and generating the frailty indicator in response to a frailty prediction model based on a plurality of SNPs, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, or rs4332427.
- the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
- the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189.
- the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
- a method of determining frailty comprises: receiving a plurality of coefficients for a plurality of SNPs; and generating a frailty indicator in response to a frailty prediction model based on the plurality of SNPs, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, or rs4332427.
- the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189.
- the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
- the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189.
- FIG. 1 shows an exemplary process of building a frailty prediction model, in accordance with some embodiments of the invention.
- FIG. 2 shows an exemplary method of constructing a frailty prediction model and predicting longevity or frailty for an individual, in accordance with some embodiments.
- FIG. 3A shows an example of coefficients with high coefficient magnitudes (e.g., high correlation to frailty) and factors corresponding to the coefficients in the Cox PH model, in accordance with some embodiments;
- FIG. 3B shows that the logarithm of the Cox PH model (CPHM) hazard ratio increases linearly with age.
- FIG. 4 shows genome-wide significant loci that are associated with frailty or longevity, in accordance with some embodiments.
- FIG. 5A shows a plurality of traits having significant genetic correlation with the frailty index, in accordance with some embodiments;
- FIG. 5B shows genetic correlation matrix and clustering for traits having high genetic correlations with predicted frailty.
- FIG. 6 shows an exemplary network layout comprising frailty prediction systems, in accordance with some embodiments.
- FIG. 7 shows an example of a user device by which a user may access frailty prediction information, in accordance with some embodiments.
- FIG. 8 shows an exemplary process of generating longevity related result on a user device, in accordance with some embodiments.
- FIG. 9 shows a computer system that is programmed or otherwise configured to perform frailty prediction, in accordance with some embodiments.
- FIG. 10 shows a Manhattan plot for GWAS on frailty phenotype.
- FIG. 11 shows a list of candidate genes in regions associated with predicted frailty, as suggested by presence of missense mutations and/or SMR HEIDI.
- the term “about” and its grammatical equivalents in relation to a reference numerical value can include a range of values up to plus or minus 10% from that value.
- the amount “about 10” can include amounts from 9 to 11.
- the term “about” in relation to a reference numerical value can include a range of values plus or minus 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% from that value.
- the term “at least” and its grammatical equivalents in relation to a reference numerical value can include the reference numerical value and greater than that value.
- the amount “at least 10” can include the value 10 and any numerical value above 10, such as 11, 100, and 1,000.
- the term “at most” and its grammatical equivalents in relation to a reference numerical value can include the reference numerical value and less than that value.
- the amount “at most 10” can include the value 10 and any numerical value under 10, such as 9, 8, 5, 1, 0.5, and 0.1.
- the term “subject,” as used herein, generally refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species, or other organism, such as a plant. More specifically, the subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian or a human. Animals include, but are not limited to, farm animals, sport animals, and pets.
- a subject can be a healthy individual, an individual that has or is suspected of having a disease or a pre-disposition to the disease, or an individual that is in need of therapy or suspected of needing therapy.
- a subject can be any individual human being.
- the term “polynucleotide” generally refers to a molecule comprising one or more nucleic acid subunits.
- a polynucleotide can include one or more subunits selected from adenosine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), or variants thereof.
- a nucleotide can include A, C, G, T or U, or variants thereof.
- a nucleotide (nt) can include any subunit that can be incorporated into a growing nucleic acid strand.
- Such subunit can be an A, C, G, T, or U, or any other subunit that is specific to one or more complementary A, C, G, T or U, or complementary to a purine (i.e., A or G, or variant thereof) or a pyrimidine (i.e., C, T or U, or variant thereof).
- a subunit can enable individual nucleic acid bases or groups of bases (e.g., AA, TA, AT, GC, CG, CT, TC, GT, TG, AC, CA, or uracil-counterparts thereof) to be resolved.
- a polynucleotide is deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), or variants or derivatives thereof.
- a polynucleotide can be single-stranded or double-stranded.
- the term “genome” generally refers to an entirety of an organism’s hereditary information.
- a genome can be encoded either in DNA or in RNA.
- a genome can comprise coding regions that code for proteins as well as non-coding regions.
- a genome can include the sequence of all chromosomes together in an organism. For example, the human genome has a total of 46 chromosomes. The sequence of all of these together constitutes a human genome.
- the term “genetic variant,” as used herein, generally refers to an alteration, variant or polymorphism in a nucleic acid sample or genome of a subject. Such alteration, variant or polymorphism can be with respect to a reference genome, which may be a reference genome of the subject or other individual.
- Single nucleotide polymorphisms are a form of polymorphisms.
- one or more polymorphisms comprise one or more single nucleotide variations (SNVs), insertions, deletions, repeats, small insertions, small deletions, small repeats, structural variant junctions, variable length tandem repeats, and/or flanking sequences.
- Copy number variants (CNVs), transversions and other rearrangements are also forms of genetic variation.
- a genomic alteration may be a base change, insertion, deletion, repeat, copy number variation, transversion, or a combination thereof.
- the terms “frailty” and “longevity” are used interchangeably throughout this specification, and refer to a measure of one or more of an expected life span, genetic predisposition to premature death, rate of death, and frailty risk relative to a group average.
- the terms “longevity associated genes” and “frailty associated genes” are used interchangeably throughout this specification and refer to genes with a desired correlation with frailty or longevity.
- the methods and systems provide a frailty prediction model to predict longevity or frailty of an individual based on genetic data.
- the frailty prediction model is capable of producing a frailty indicator indicative of a frailty assessment, such as a frailty risk or longevity score, based on an individual’s genetic data.
- Frailty can be associated with disability, comorbidity, and other characteristics. Frailty can be strongly associated with higher mortality.
- the frailty prediction model computes the frailty indicator based on a relationship between a plurality of genetic factors and frailty of an individual.
- the genetic factors may comprise SNPs, indels, CNVs, and/or epigenetic markers.
- the plurality of SNPs may be identified to be associated with frailty using an intermediate phenotype.
- FIG. 1 illustrates an exemplary process 100 of building a frailty prediction model, in accordance with embodiments of the invention.
- the process 100 can provide a frailty prediction model 116 configured to generate a frailty indicator for an individual in response to receipt of genetic data of the individual.
- the frailty prediction model 116 can generate the frailty indicator based on information relating to the presence of a plurality of SNPs in the genetic data of the individual.
- the frailty indicator for an individual can, for example, be used to generate a frailty assessment parameter for the individual.
- the frailty indicator can be indicative of one or more of an expected life span, genetic predisposition to premature death, rate of death, and frailty risk relative to a group average.
- the frailty assessment parameter may be displayed to a user, such as via a graphical user interface of a user electronic device, in various forms.
- the frailty assessment parameter may be displayed to the user as a graph, a numerical value, text, and/or a diagram.
- the frailty prediction model 116 may comprise a multiple linear relationship between the plurality of SNPs and the frailty indicator.
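A multiple linear relationship of this kind can be illustrated with a short scoring sketch. It assumes, for illustration only, that each SNP enters the model as an effect-allele dosage (0, 1, or 2 copies) multiplied by its coefficient; the function name `frailty_indicator` and the coefficient values below are hypothetical and are not taken from the disclosure.

```python
from typing import Dict


def frailty_indicator(dosages: Dict[str, float],
                      coefficients: Dict[str, float],
                      intercept: float = 0.0) -> float:
    """Weighted sum of effect-allele dosages over the SNPs in the model."""
    missing = [snp for snp in coefficients if snp not in dosages]
    if missing:
        raise KeyError(f"no genotype available for: {missing}")
    return intercept + sum(coefficients[snp] * dosages[snp] for snp in coefficients)


# Hypothetical coefficients and genotypes, for illustration only.
example_coefficients = {"rs76207570": 0.05, "rs3811444": -0.02, "rs9272588": 0.03}
example_dosages = {"rs76207570": 2, "rs3811444": 1, "rs9272588": 0}
example_score = frailty_indicator(example_dosages, example_coefficients)
```

Raising an error on a missing genotype is one possible design choice; an implementation could instead impute the missing dosage.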
- the process 100 for building the frailty prediction model 116 can comprise use of a first dataset of information for a first population of individuals, where the first dataset may not include genetic information for the first population of individuals, to train a death prediction model 103 using a training procedure 110.
- the death prediction model 103 can be used to generate a frailty index 112 that can be used in the selection of the plurality of SNPs.
- the death prediction model 103 may be obtained by fitting available mortality data 101 to a Cox proportional hazard model (Cox PH model).
- the process 100 can thereby advantageously enable generation of the frailty prediction model 116 using information for a population of individuals where genetic information for the individuals is not available.
- the provided process 100 can comprise use of a second dataset 105 comprising information for a second population of individuals in the selection of the SNPs, where the second dataset 105 includes genetic information for individuals of the second population.
- the second dataset 105 can be an unrestricted dataset, for example comprising both genetic information and non-genetic information.
- the process 100 may allow for identification of SNPs having a desired association with frailty or longevity from the unrestricted dataset, such as the second dataset 105.
- the death prediction model 103 can generate a computed frailty index for individuals of the second population in response to receipt of the non-genetic data 107 of the second population of individuals.
- the computed frailty index 112 can then be used, along with genetic data 109 of the second population, in the GWAS analysis for selecting the plurality of SNPs.
- the information used in the GWAS analysis can thereby be supplemented with the computed frailty index generated using non-genetic information of the second population of individuals.
- the available information may be enlarged by supplementing available genetic data 109 by a frailty phenotype, such as the frailty index 112.
- the frailty phenotype may be an intermediate phenotype used to identify the plurality of SNPs having the desired association with frailty.
- One or more longevity associated SNPs may be identified using genome-wide association studies (GWAS) applied to the unrestricted dataset, which cannot be achieved using conventional methods.
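The per-SNP association step of a GWAS on the computed frailty index can be sketched as a simple linear regression of the phenotype on genotype dosage. This is an illustrative simplification: it assumes additive dosage coding and omits the covariates, population-structure correction, and multiple-testing thresholds a real GWAS would use. The function name `snp_association` and the toy data are hypothetical.

```python
import math
from typing import List, Tuple


def snp_association(dosage: List[float],
                    phenotype: List[float]) -> Tuple[float, float]:
    """Return (slope, t_statistic) from regressing phenotype on SNP dosage."""
    n = len(dosage)
    if n != len(phenotype) or n < 3:
        raise ValueError("need matching inputs with at least 3 individuals")
    mean_x = sum(dosage) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares gives the standard error of the slope.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, phenotype))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else math.inf
    return slope, t_stat


# Hypothetical toy data: dosages for one SNP and a computed frailty index.
toy_dosage = [0, 1, 2, 0, 1, 2]
toy_frailty_index = [0.1, 0.2, 0.35, 0.0, 0.25, 0.3]
toy_slope, toy_t = snp_association(toy_dosage, toy_frailty_index)
```

Repeating this test for every genotyped SNP and ranking by significance yields the kind of genome-wide scan summarized in a Manhattan plot.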
- a first dataset comprising mortality data 101 may be used to generate a death prediction model 103.
- the mortality data 101 may comprise death register data and non-genetic data of individuals from a first population.
- the first population may comprise deceased individuals.
- the first population may comprise individuals whose genetic data are not available.
- the first population may comprise deceased individuals whose genetic data is not available.
- the mortality data 101 for the first population of individuals may comprise a plurality of factors. Any number of factors may be selected from the first dataset. For example, at least 10, 50, 100, 200, 300, 400, 500 factors may be selected for building the death prediction model.
- the plurality of factors may comprise genetic factors and/or non-genetic factors.
- the plurality of factors consists of non-genetic factors.
- genetic information for the first population of individuals may not be available such that the mortality data 101 for the individuals do not include genetic information for the individuals.
- a factor may be represented by one or more data fields.
- a data field of a factor may comprise any numerical format such as continuous value or binary value.
- the plurality of factors may include various factors positively or negatively associated with frailty.
- Genetic factors may comprise one or more genes or genetic markers having a positive or negative correlation with frailty.
- the non-genetic factors may comprise various traits of the individuals of the first population.
- the plurality of non-genetic factors may or may not change over time. Some of the factors may be time-dependent factors such that the data field of these factors changes over time. One or more of these factors may change only once over time or more than once over time.
- the non-genetic factors may be selected from the group consisting of waist-to-hip ratio, depressive symptoms, lung cancer, asthma, cigarette smoking, education, age at first birth, parental age-at-death, pulse rate, sex, lymphocyte count, and various other data.
- the non-genetic factors may comprise biomarkers selected from the group consisting of cholesterol, direct low density lipoprotein, HDL-cholesterol, triglyceride, apolipoprotein A, apolipoprotein B, C-reactive protein, vitamins, rheumatoid factor, alkaline phosphatase, calcium, testosterone, sex hormone-binding globulin, oestradiol, insulin-like growth factor, hemoglobin A1c, glucose, cystatin C, creatinine, protein, urea, phosphate, urate, sodium, microalbumin, potassium, bilirubin, gamma-glutamyltransferase, alanine aminotransferase, aspartate aminotransferase, or any combination thereof.
- Information relating to the plurality of factors may be obtained from blood assay results, cognitive tests, physical measurements (e.g., spirometry, anthropometry, blood pressure, grip strength), lifestyle and environment historical record, health and medical history, family history, psychosocial factors, early life factors and/or various others.
- the mortality data 101 may comprise death register data.
- the death register data can be indicative of a life span of an individual, for example providing information of the length of time an individual was alive.
- the death register data can be in various forms.
- the death register data may be right-censored mortality data representing follow-up time in years for each subject. This may be defined as the difference between the date a person was last observed or the date of death, and the date the person took a blood test.
- the death register data for an individual may comprise a Boolean marker indicating whether death occurred, for example having a binary value representative of whether the individual is deceased. The life span of the individual can then be derived from the Boolean marker.
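The right-censored follow-up time and the Boolean death marker described above can be derived from dates as in the sketch below. It assumes follow-up starts at the blood test and ends at death or, for individuals not known to have died, at the last observation. The function name `follow_up` and the 365.25-day year are illustrative choices, not details from the disclosure.

```python
from datetime import date
from typing import Optional, Tuple


def follow_up(blood_test: date,
              last_observed: date,
              death: Optional[date] = None) -> Tuple[float, bool]:
    """Return (follow-up time in years, death-occurred marker).

    The record is right-censored (marker False) when no death date is known.
    """
    end = death if death is not None else last_observed
    if end < blood_test:
        raise ValueError("follow-up cannot end before the blood test")
    years = (end - blood_test).days / 365.25
    return years, death is not None


# Hypothetical records: one censored individual and one deceased individual.
censored_years, censored_event = follow_up(date(2010, 1, 1), date(2015, 1, 1))
death_years, death_event = follow_up(date(2010, 1, 1), date(2015, 1, 1),
                                     death=date(2012, 1, 1))
```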
- data for one or more of the factors may not be available for every individual.
- the missing data may be replaced or imputed with substitute values such as mean values, and the data may be normalized to zero mean and/or unit variance.
- a variety of other data imputation methods can be used to replace the missing data.
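Mean imputation followed by normalization to zero mean and unit variance can be sketched as follows for a single factor. The function name `impute_and_standardize` is hypothetical, missing values are assumed to be represented as `None`, and the population variance (dividing by n) is an arbitrary choice made here.

```python
import math
from typing import List, Optional


def impute_and_standardize(values: List[Optional[float]]) -> List[float]:
    """Replace missing values with the observed mean, then z-score the column."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("factor has no observed values")
    mean = sum(observed) / len(observed)
    # Filling with the mean leaves the column mean unchanged.
    filled = [mean if v is None else v for v in values]
    variance = sum((v - mean) ** 2 for v in filled) / len(filled)
    std = math.sqrt(variance)
    if std == 0:
        return [0.0] * len(filled)  # constant factor carries no information
    return [(v - mean) / std for v in filled]


# Hypothetical factor with one missing measurement.
z_scores = impute_and_standardize([1.0, None, 3.0])
```

Imputed entries map to exactly zero after standardization, so under a linear model they contribute nothing to the linear predictor.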
- the mortality data 101 may be used in a training procedure 110 to train the death prediction model 103.
- the mortality data 101 can be used to generate coefficients for a death prediction model 103.
- the death prediction model 103 may represent a relationship between a frailty index of an individual and a plurality of factors as described above.
- the death prediction model 103 may be used to compute frailty indices based on genetic and/or non-genetic data.
- the mortality data 101 may not include genetic data such that the death prediction model 103 is generated using non-genetic factors, such as the plurality of factors described herein.
- the death prediction model 103 may be obtained by fitting a Cox Proportional hazards model (Cox PH model) to the mortality data 101.
- the Cox PH model can relate survival-time (time-to-event) outcomes (e.g., frailty) to one or more predictor variables (e.g., non-genetic factors as described herein).
- the Cox PH model can be used to model a relationship between the predictor variables and a hazard rate for an individual or a hazard ratio for the individual.
- Hazard ratios can be a comparison of the hazard rates of event occurrence for an individual of a group and a hazard rate of the group.
- the Cox PH model can be used to estimate hazard ratios for individuals based on the predictor variables.
- the registered death data for an individual can be provided as a cumulative hazard or hazard-ratio and the plurality of factors can be provided as the one or more predictor variables.
- Cumulative hazard at a time t can be the risk of dying between time 0 and time t, and the survivor function at time t is the probability of surviving to time t (e.g., exponential function).
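For reference, the quantities named above can be written in the standard textbook form of the Cox proportional hazards model. The symbols are notation introduced here and are not taken from the disclosure: $h_0(t)$ is the baseline hazard, $x_k$ are the predictor variables, and $\beta_k$ are the model coefficients.

```latex
h(t \mid x) = h_0(t)\,\exp\!\Big(\sum_{k} \beta_k x_k\Big), \qquad
H(t \mid x) = \int_0^t h(u \mid x)\,du, \qquad
S(t \mid x) = \exp\big(-H(t \mid x)\big)
```

Under this form the hazard ratio between two individuals with predictors $x$ and $x'$ is $\exp\big(\sum_k \beta_k (x_k - x'_k)\big)$, which does not depend on the baseline hazard $h_0(t)$.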
- the training procedure 110 can be used to generate a plurality of coefficients for the death prediction model 103.
- the training procedure 110 can be used to generate coefficients for the Cox PH model.
- the coefficients can be indicative of correlations between the plurality of factors and a hazard rate or hazard ratio.
- the sign of the coefficient, positive or negative, can indicate the direction of correlation with the hazard rate or hazard ratio.
- a positive coefficient can indicate positive correlation with an increased hazard rate (e.g., a worse prognosis for life expectancy) and a negative coefficient can indicate a negative correlation with an increased hazard rate (e.g., a protective effect of the variable with which it is associated).
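How such coefficients are scored against right-censored data can be illustrated with the Cox partial log-likelihood, the objective that a Cox PH fit typically maximizes. The sketch below assumes the standard Breslow-style form without special handling of tied event times, and it evaluates the objective rather than optimizing it. The function name and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def cox_partial_log_likelihood(beta: Sequence[float],
                               covariates: List[Sequence[float]],
                               times: Sequence[float],
                               events: Sequence[bool]) -> float:
    """Partial log-likelihood of coefficients `beta` on right-censored data."""
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    total = 0.0
    for i, died in enumerate(events):
        if not died:
            continue  # censored individuals appear only in risk sets
        # Risk set: everyone still under observation at the event time.
        risk = sum(math.exp(scores[j])
                   for j in range(len(times)) if times[j] >= times[i])
        total += scores[i] - math.log(risk)
    return total


# Hypothetical toy data: higher factor values die earlier.
toy_covariates = [[2.0], [1.0], [0.0], [1.5]]
toy_times = [1.0, 3.0, 5.0, 2.0]
toy_events = [True, True, False, True]
ll_positive = cox_partial_log_likelihood([1.0], toy_covariates, toy_times, toy_events)
ll_zero = cox_partial_log_likelihood([0.0], toy_covariates, toy_times, toy_events)
ll_negative = cox_partial_log_likelihood([-1.0], toy_covariates, toy_times, toy_events)
```

On this toy data the positive coefficient scores highest, consistent with the interpretation that a positive coefficient marks a factor associated with an increased hazard rate.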
- other survival models can also be used for the death prediction model to generate the estimated frailty as an intermediate phenotype.
- an accelerated life model or other kinds of proportional hazard models such as exponential and Weibull models may also be used for computing an estimated frailty index as the intermediate phenotype.
- the death prediction model 103 may comprise a plurality of coefficients corresponding to the plurality of factors respectively. In some cases, the death prediction model 103 can comprise a coefficient for each of the plurality of factors. For example, the training procedure 110 can generate a corresponding coefficient for each of the plurality of factors.
- a hazard rate of an individual can be calculated using the trained death prediction model 103.
- hazard ratio may be computed and used as an intermediate phenotype for an individual.
- the hazard ratio may be calculated as the ratio of an individual’s hazard rate to a group’s mean hazard rate. For example, a hazard ratio of 2 is thought to mean that an individual has twice the chance of dying as a comparison group (e.g., group average).
- the group may be a baseline group or comparison group.
- the group may be of the same cohort as the individual.
- the group may be categorized in various different ways such as gender, age, ethnicity and the like.
- a logarithm of the hazard ratio may be used as the frailty index or intermediate phenotype of an individual.
- the hazard ratio may be constant over time. In some cases, the hazard ratio may vary depending on time.
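The hazard ratio and its logarithm described above can be sketched as follows. This is an illustrative example only: the function name and the hazard rate values are hypothetical, and the group mean is taken as the arithmetic mean of the supplied hazard rates.

```python
import math


def log_hazard_ratios(hazard_rates):
    """Log of each individual's hazard rate divided by the group's mean hazard rate."""
    mean_rate = sum(hazard_rates) / len(hazard_rates)
    return [math.log(rate / mean_rate) for rate in hazard_rates]


# Hypothetical hazard rates for a group of three individuals.
example_rates = [1.0, 2.0, 3.0]
print(log_hazard_ratios(example_rates))
```

Under this convention, an individual at the group mean has a log hazard ratio of zero, a positive value indicates above-average risk, and a negative value indicates below-average risk.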
- the trained death prediction model 103 can then be used for computation of a frailty index 112 for each individual from a second dataset 105.
- the death prediction model 103 can generate a frailty index 112 for an individual of a second population in response to receiving information relating to the predictor variables of the death prediction model 103.
- the computed frailty index can be a hazard rate.
- the computed frailty index 112 can be a hazard ratio.
- the computed frailty index 112 may be the logarithm of a hazard ratio as described herein.
- the computed frailty index 112 may be a measure of the frailty phenotype or intermediate phenotype.
- the second dataset 105 may comprise both genetic data 109 and non-genetic data 107.
- the second dataset 105 may be from individuals for which both genetic data 109 and non-genetic data 107 are available.
- the second dataset may comprise data for a second population of individuals.
- the second population comprises living individuals.
- the living individuals may have genetic data available.
- the second dataset can comprise the genetic data of the second population of individuals.
- the death prediction model 103 can be trained using non-genetic information from a first population of individuals.
- the trained death prediction model 103 can be used to determine frailty indices for individuals of a second population.
- the non-genetic data 107 of the second population may comprise one or more of the factors that may be used to generate the computed frailty index 112 for individuals of the second population using the death prediction model 103.
- the computed frailty index 112 may comprise a hazard ratio or logarithm of hazard ratio of an individual in the second population using the fitted death prediction model 103.
- coefficients for the death prediction model 103 can be generated using a plurality of factors from the mortality data 101 of the first population of individuals, where the coefficients can correspond to the plurality of factors.
- Information relating to the plurality of factors from the non-genetic data 107 can be provided to the death prediction model 103 to generate the computed frailty index 112 for individuals of the second population.
- information relating to the plurality of factors for each individual of the second population of individuals can be provided to the death prediction model 103 such that a computed frailty index 112 can be generated for each individual of the second population.
- the second dataset 105 for the second population can comprise both genetic data 109 and non-genetic data. Using this method, an individual in the second population may have both genetic data 109 available and a computed frailty index 112 associated with the individual as a phenotype.
- GWAS may be performed using the genetic data 109 and the frailty index 112 to identify SNPs having desired correlation with longevity or frailty 114.
- a plurality of genetic markers may be identified by the GWAS meta-analysis as having desired association with frailty at the genome-wide level. In some cases, because not all of these longevity or frailty associated genes are typically present on a genotyping array, imputation of these missing genes may be performed. A number of genome-wide significant loci may be identified to have consistent association with frailty and other traits.
- the plurality of SNPs may be selected via a genome-wide SNP selection procedure. In some cases, methods for selecting the plurality of SNPs can comprise conditional and joint multiple-SNP analysis of the GWAS, which may be used to analyze the correlation of the associated SNPs for frailty.
- a regression analysis, such as a multiple regression analysis, may be performed to build the frailty prediction model 116 based on the plurality of identified SNPs.
- the regression analysis may generate a plurality of coefficients.
- Such regression analysis may produce a coefficient for each SNP.
- the regression analysis may be used to generate a plurality of coefficients, the plurality of coefficients comprising a coefficient which corresponds to a respective SNP of the plurality of SNPs.
- each SNP can have a corresponding coefficient.
- the coefficient for an SNP can be indicative of a weight assigned to the corresponding SNP in the frailty prediction model.
- the coefficient can be indicative of the strength and/or direction of correlation between the SNP and the frailty indicator generated by the frailty prediction model.
- the frailty prediction model 116 may comprise a linear regression (e.g., multivariate regression). Using this frailty prediction model, an individual’s frailty assessment index can be computed as a weighted sum of the list of SNPs. The weights may correspond to the plurality of coefficients obtained from the multiple linear regression analysis.
- the frailty prediction model can be used to generate a frailty assessment of an individual in response to receiving genetic information of the individual, such as genetic information relating to the presence or absence of the plurality of SNPs.
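The weighted sum described above can be sketched as follows. This is a minimal illustration that assumes each SNP is coded as the number of effect alleles carried by the individual (0, 1, or 2). The weights are hypothetical placeholders, not the coefficients of the fitted model, and the two SNP identifiers are used only as example keys.

```python
def frailty_score(weights, allele_counts, intercept=0.0):
    """Weighted sum of per-SNP allele counts using regression coefficients as weights."""
    return intercept + sum(weights[snp] * allele_counts[snp] for snp in weights)


# Hypothetical weights for two SNPs; real values would come from the regression analysis.
example_weights = {"rs76207570": 0.02, "rs3811444": -0.01}
# Hypothetical genotype: number of effect alleles carried at each SNP.
example_genotype = {"rs76207570": 2, "rs3811444": 1}

print(frailty_score(example_weights, example_genotype))
```

In this sketch a missing SNP raises a `KeyError` rather than being silently ignored, so that missing genotypes are handled explicitly, for example by imputation.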
- the frailty prediction model 116 may have a performance metric.
- the performance metric indicates an improved accuracy compared to other models.
- residuals from the multivariate regression analysis may be used as a target for further estimations with the list of SNPs. These residuals should not have correlations with sex, age and genetic principal components or other traits that were analyzed in the multivariate regression analysis.
- another multivariate regression model may be built using the residual values as the target and the list of SNPs as predictors. In this scenario, the explained variance may be used as the performance metric.
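As an illustration of the performance metric, the explained variance may be taken as the coefficient of determination between the residual values and the values predicted from the SNPs. The sketch below assumes this common definition. The function name and the sample values are hypothetical.

```python
def explained_variance(observed, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_observed = sum(observed) / len(observed)
    total_ss = sum((y - mean_observed) ** 2 for y in observed)
    residual_ss = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - residual_ss / total_ss


# Hypothetical target values and model predictions.
example_observed = [1.0, 2.0, 3.0]
example_predicted = [1.1, 2.0, 2.9]
print(explained_variance(example_observed, example_predicted))
```

A value of 0.015 on this scale would correspond to an explained variance of 1.5%.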
- a set of 23 identified SNPs (shown in FIG. 4) accounts for 0.7% of variation in the residual values.
- any number of the 23 identified SNPs can be used for building the frailty prediction model. For example, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, or 23 of the identified SNPs may be used to fit the frailty prediction model.
- these SNPs passed the genome-wide significance threshold of p < 5×10^-8, in which case the explained variance is 1.5%.
- the provided frailty prediction model may have an explained variance in a range of about 1.5% to about 5%, including in range of about 1.5% to about 4.5%, about 1.5% to about 3.5%, about 1.5% to about 2.5%, about 2% to about 4%, about 3% to about 4%, about 3% to about 5%, or about 4% to about 5%.
- the frailty prediction model 116 may be used to predict lifespan. Frailty change can be easily translated into lifespan change measured in years for a given population. For example, when using the top 22 SNPs among the 23 identified SNPs (in some cases rs143761991 was excluded due to low imputation quality), the mean effect of the 22 SNPs is 0.37 years per allele. In some cases, it may not be possible for the effect signs of all 22 SNPs to be the same for any given genotype set; in such scenarios, the cumulative lifespan effect for the discovery cohort may be calculated. It was shown that 2.5% of the population have a lifespan at least 1.81 years shorter, and the other extreme 2.5% of the population (above the 97.5th percentile) have a lifespan 2.21 years longer.
- the traits, including the frailty index, being analyzed by the multiple regression analysis may be adjusted for sex, age, and principal components of the genetic data (e.g., sex, age, and the genetic principal components are taken into account as variables).
- data imputation may be performed to substitute missing SNPs data.
- FIGs. 2-5 show an example of a frailty prediction model obtained using the provided method and system.
- FIG. 2 shows an exemplary method 200 of constructing a frailty prediction model and generating a frailty assessment for an individual using the frailty prediction model.
- the frailty prediction model can predict the longevity or frailty of an individual.
- the method may be used to identify a list of SNPs with significant association with frailty.
- the list of SNPs may be identified by performing GWAS on a dataset comprising computed frailty index as an intermediate phenotype.
- a frailty prediction model is then constructed based on the identified SNPs such that frailty or longevity related information can be generated for an individual using genetic data of the individual, for example to provide a frailty indicator for the individual.
- the computed frailty index may be computed using a trained death prediction model.
- the death prediction model may be the Cox PH model as described in FIG. 1.
- Mortality data may be used for building the Cox PH model.
- the mortality data can be from any available database, such as the United Kingdom Biobank or any other resource.
- the mortality data as described above may comprise death register data and a plurality of factors.
- the mortality data may not comprise genetic data.
- the Cox PH model can be trained using data which does not comprise genetic data, enabling application of the step to datasets for which genetic data is not available.
- the Cox PH model can be trained using data from individuals for which genetic data is not available.
- Mortality data preparation may comprise data normalization, data imputation, and various other methods. For instance, when a data field represents continuous or integer values, a Box-Cox transformation may be applied to transform non-normal variables into an approximately normal shape.
- data imputation may be performed in case of missing data. For instance, missing data may be substituted with mean values, and the data may be normalized to zero mean and unit variance.
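The preparation steps above can be sketched as follows. This is an illustrative example only: the function names are hypothetical, missing values are represented as `None`, the Box-Cox parameter is supplied by the caller rather than estimated from the data, and the population variance is used for normalization.

```python
import math


def box_cox(value, lam):
    """Box-Cox transform of a positive value: log(x) if lam == 0, else (x**lam - 1) / lam."""
    if lam == 0:
        return math.log(value)
    return (value ** lam - 1.0) / lam


def impute_and_standardize(values):
    """Replace missing entries (None) with the observed mean, then scale to zero mean, unit variance."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in values]
    variance = sum((v - mean) ** 2 for v in filled) / len(filled)
    std = math.sqrt(variance)  # assumes the field is not constant (std > 0)
    return [(v - mean) / std for v in filled]


# Hypothetical data field with one missing value.
example_field = [1.0, None, 3.0]
print(impute_and_standardize(example_field))
```

Because the imputed entries equal the observed mean, they map to zero after standardization.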
- FIG. 3A shows an example of coefficients with high coefficient magnitudes (e.g., high correlation to frailty) and corresponding variables.
- coefficients may be produced based on mortality data from a group of individuals whose genetic data are not available.
- the trained Cox PH model may be evaluated for performance or goodness of fit.
- the concordance index may be calculated between the predicted hazard ratio generated by the trained Cox PH model and the hazard ratio based on actual death register data to assess the performance.
- the Cox PH model constructed using the provided method may have a concordance index of at least 0.7.
- the concordance index of the Cox PH model may be about 0.8.
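One common way to compute a concordance index for survival data is Harrell's C, sketched below. This is an assumption for illustration, as the exact procedure used for the reported values is not specified here. A pair of individuals is treated as comparable when the one with the shorter follow-up time has an observed death, and as concordant when that individual also has the higher predicted risk.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ordered correctly by predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: individual i died (event observed) before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable


# Hypothetical follow-up times, death indicators (1 = death observed), and predicted risks.
example_times = [2.0, 4.0, 6.0, 8.0]
example_events = [1, 1, 0, 1]
example_risks = [0.9, 0.4, 0.5, 0.1]
print(concordance_index(example_times, example_events, example_risks))
```

A value of 0.5 corresponds to random ordering and a value of 1.0 to perfect ordering, so an index of about 0.8 indicates that most comparable pairs are ranked correctly.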
- the constructed Cox PH model may then be used for generating an intermediate phenotype for each individual in a second population 205.
- the intermediate phenotype may be a computed frailty index for an individual.
- the intermediate phenotype may be the logarithm of a hazard ratio computed using the Cox PH model.
- the frailty index provides a measure of an individual’s frailty and susceptibility to premature death.
- the hazard ratio is the hazard rate of an individual relative to the mean hazard rate of a group of individuals.
- the hazard ratio for an individual of the second population can be the ratio of a hazard rate of the individual relative to a mean hazard rate of the second population.
- the computed frailty index and genetic data from the second dataset can be used to identify a plurality of SNPs associated with frailty 207.
- the second dataset in the example is from the UK Biobank.
- a GWAS is performed using the genetic data and the frailty index to identify the plurality of SNPs.
- the frailty index may be adjusted for sex, age, SNP array, or genetic principal components before being used as a phenotype in the GWAS.
- data imputation may be applied. Imputation may be performed using one or more reference panels.
- a plurality of imputed autosomal and directly typed X-chromosome variants may be used to identify a plurality of longevity associated SNPs shown in FIG. 4. As shown in FIG. 4, 23 genome-wide significant loci are located on autosomes. Such SNPs are identified according to the p value. For example, a threshold may be determined, and SNPs with a p value below such threshold may be identified as the significant SNPs.
- the GWAS may be performed for multiple traits including the frailty index and blood traits. The blood traits may be from the second dataset such as the UK Biobank.
- the blood traits may comprise: white blood cell (leukocyte) count, red blood cell (erythrocyte) count, mean corpuscular volume, mean corpuscular haemoglobin and platelet count. These traits may be adjusted for sex, age, genotyping array platform and genetic principal components (e.g., the first ten genetic principal components) using a linear regression model.
- a set of associated SNPs may be selected.
- the set of SNPs may be selected using conditional and joint analysis.
- the conditional and joint analysis method may be used for estimating the joint effects of multiple SNPs for a quantitative trait such as the frailty index.
- the conditional and joint analysis method may use meta-analysis summary statistics and one or more reference panels of SNPs to estimate the linkage disequilibrium (LD).
- a subset of SNPs with p value below certain threshold may be selected.
- the following stepwise selection strategy may be used to select the associated SNPs iteratively over all the SNPs across the whole genome, regardless of their p values from the meta-analysis, except for the most significant SNP, which was used for model initiation.
- a subset having a pre-determined number of SNPs may be selected.
- the Steps may include:
- the cutoff p value may be determined by a user. Different cutoff p values may lead to a linear regression model comprising more or fewer SNPs.
- the pre-determined number of SNPs can be any integer. For example, the pre-determined number can be at least 2, 5, 10, 15, 20, 25, 30, 35, 40 and the like. In some cases, the pre-determined number may be determined according to a desired prediction accuracy or computation cost. Different numbers of SNPs included in the subset for calculating a frailty index may be associated with different levels of accuracy. For example, a subset of 10 SNPs may produce a more accurate frailty prediction assessment than a subset of 2 SNPs. In some cases, an optimal number of SNPs (e.g., 20) may be determined such that the frailty index can be computed with low computation cost and sufficient accuracy.
- Further analysis may be performed to identify and verify the plurality of SNPs.
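The effect of the cutoff p value and the pre-determined number of SNPs can be illustrated with a simple filter, sketched below. This is not the conditional and joint analysis itself. It only shows how the two user-chosen parameters change the size of the selected subset. The SNP labels and p values are hypothetical.

```python
def select_snps(p_values, cutoff=5e-8, max_snps=None):
    """Keep SNPs with p value below the cutoff, most significant first, optionally capped in number."""
    passing = [snp for snp, p in p_values.items() if p < cutoff]
    passing.sort(key=lambda snp: p_values[snp])
    return passing if max_snps is None else passing[:max_snps]


# Hypothetical association p values keyed by placeholder SNP labels.
example_p_values = {"snp_a": 1e-12, "snp_b": 3e-9, "snp_c": 2e-6, "snp_d": 0.04}

print(select_snps(example_p_values))                 # default genome-wide cutoff
print(select_snps(example_p_values, cutoff=1e-5))    # a looser cutoff keeps more SNPs
print(select_snps(example_p_values, max_snps=1))     # cap at a pre-determined number
```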
- in silico functional genomic analysis of associated regions may be performed.
- the list of SNPs may be prepared for functional annotation.
- Genetic correlation between the frailty index and a plurality of complex traits may be analyzed using blood parameters including white blood cell (leukocyte) count, red blood cell (erythrocyte) count, mean corpuscular volume, mean corpuscular haemoglobin, and platelet count.
- FIG. 5A shows a plurality of traits having significant genetic correlation with frailty. Next, these genetically correlated traits and the identified SNPs may be analyzed to identify associated regions. Additionally, a HEIDI test may be performed to verify and further identify the longevity associated genes.
- the plurality of SNPs may be identified using the genome-wide association study (GWAS).
- GWAS can be used to measure and analyze nucleic acid sequence variations (genetic variations) across the human genome to determine specific genetic risk factors observed in conditions that are shared among a group of individuals.
- Genetic variations can comprise mutations, such as single nucleotide polymorphisms (SNP).
- SNPs may be a unique unit of genetic variation, which can function as a marker of the genomic region. Common conditions, such as diseases and frailty, may be impacted by genetic variations that are shared by a group of individuals.
- Genetic variations can also comprise Copy Number Variations (CNV), chromosomal inversions, or any type of epigenetic variations.
- Genetic variations may be investigated at the level of haplotypes where sets of genetic variations are co-inherited. Associations between a phenotypical trait and a genetic variant may not necessarily mean that the variant is causative for the trait.
- Several methods for identifying a genetic variant may be known to a person skilled in the art. Any such method may be employed to determine if a genetic variation is associated with the phenotypical observation. Both discrete phenotypical observations such as eye color and continuous observations such as height may be associated with a genetic variation, being a single SNP or a set of different genetic variations.
- the sources of data, such as phenotypes and genotypes, for GWAS may be obtained from biobanks. Phenotypes and genotypes may be obtained directly from biobanks, such as the United Kingdom biobank (UKB).
- select samples may be analyzed genome-wide for association with the estimated frailty in the GWAS. Select groups of samples may be removed as determined by factors such as quality control testing, genetically inferred sex mismatch, UKB recommended genomic analysis exclusions and samples that comprise genetic relatedness pairing.
- the logarithm of the hazard ratio prediction for each sample in the testing cohort may be determined as disclosed herein and can represent a measure of an individual’s frailty and susceptibility to premature death relative to a population’s mean risks and may be further used as a phenotype for GWAS.
- GWAS may be conducted for a select list of traits from the constructed Cox PH model and from other traits such as white blood cell (leukocyte) count, red blood cell (erythrocyte) count, mean corpuscular volume, mean corpuscular haemoglobin, and platelet count.
- Genome-wide microarrays for GWAS can include but are not limited to Affymetrix Genome-Wide Human SNP 6.0 arrays and Affymetrix Genome-Wide Human SNP 5.0 arrays, Illumina HD BeadChip, NimbleGen CGH Microarrays, and Agilent CGH.
- the GWAS may comprise the steps of analyzing the genetic variations of the genetic profiles and then identifying the genetic variants that throughout the sample cohort correlate and associate with the frailty index, thereby identifying a genetic variation that is associated with a phenotypical feature.
- imputation may be used to determine the statistical inference of unobserved genotypes.
- the imputation of SNPs not on the genotype chip may be utilized for the study. Such a process may increase the number of SNPs that can be tested for association, increase the power of the study, and facilitate meta-analysis of GWAS across distinct cohorts.
- known genetic variants in a population may be first obtained from sources such as the Haplotype Map of the Human Genome or the 1000 Genomes Project. Imputation may utilize information from the linkage disequilibrium (LD) structure in a sequence to infer the alleles of SNPs not directly genotyped in the study (hidden SNPs).
- LD is a property of SNPs on a contiguous stretch of genomic sequence and may be the non-random association of alleles at different loci in a given population. Additionally, LD can indicate the degree to which an allele of one SNP is inherited or correlated with an allele of another SNP within a population.
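As an illustration of LD as a non-random association of alleles, the sketch below computes the squared correlation r² between two biallelic SNPs from phased haplotypes, using the standard definitions D = p(AB) - p(A)p(B) and r² = D² / (p(A)(1 - p(A))p(B)(1 - p(B))). The haplotype data are hypothetical, and both SNPs are assumed to be polymorphic in the sample.

```python
def ld_r_squared(haplotypes):
    """r^2 between two SNPs from haplotypes given as (allele_at_snp1, allele_at_snp2), alleles coded 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # deviation from independence
    return (d * d) / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical phased haplotypes: alleles mostly travel together.
example_haplotypes = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 0)]
print(ld_r_squared(example_haplotypes))
```

A value near 1 means that the allele at one SNP largely predicts the allele at the other, which is the property that imputation relies on to infer SNPs that were not directly genotyped.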
- Genotype imputation may be performed by statistical methods that combine the GWAS data together with a reference panel of haplotypes. Such methods can be conducted by sharing haplotypes between individuals over portions of sequence to impute alleles. Examples of software packages available to impute genotypes from a genotyping array to reference panels, such as 1000 Genomes Project haplotypes may include MaCH, Minimac, IMPUTE2, and Beagle. Prior to imputation, phasing tools such as SHAPEIT2 can allow for pre-phasing of input genetic variations, for improved imputation accuracy and computational performance. In other embodiments, the genotyping and imputation data may be obtained from the biobank. After groups of phenotypes have been selected for a study population, and the genotypes have been obtained using methods known to those skilled in the art, the statistical analysis of genetic data may then be obtained.
- the analysis of the genome-wide association data may be a series of single-locus statistic tests, examining each SNP independently for association to the phenotype.
- the quantitative traits may be analyzed using generalized linear model (GLM) approaches, such as the Analysis of Variance (ANOVA).
- the categorical case-control traits may be analyzed using either contingency table methods or logistic regression.
- the statistical tests may be adjusted for factors that are known to influence the trait, such as sex, age, study site, and known clinical covariates. Covariate adjustments of this sort may be important as they reduce the misleading associations due to sampling artifacts or biases in study design.
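For a quantitative trait, a single-locus test of the kind described above can be sketched as a simple linear regression of the phenotype on the allele count at one SNP, repeated independently for each SNP. The sketch assumes the phenotype has already been adjusted for covariates such as sex and age. The function name and data are hypothetical, and dedicated association software would also report a standard error and p value.

```python
def single_snp_regression(allele_counts, phenotype):
    """Ordinary least squares fit of phenotype on allele count; returns (slope, intercept)."""
    n = len(allele_counts)
    mean_x = sum(allele_counts) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in allele_counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(allele_counts, phenotype))
    slope = sxy / sxx  # per-allele effect estimate (regression coefficient)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical effect-allele counts (0, 1, or 2) and covariate-adjusted phenotype values.
example_counts = [0, 1, 2, 1]
example_phenotype = [1.0, 1.5, 2.0, 1.5]
print(single_snp_regression(example_counts, example_phenotype))
```

The slope is the per-allele regression coefficient, and its sign gives the direction of association between the effect allele and the trait.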
- certain statistical parameters serve as indicators of the degree of association between the genetic variant and frailty, such as the effect allele frequency, the p-value, and the regression coefficient estimate (b-value).
- an effective allele and a reference allele may be determined.
- the effective allele may be the coded allele and the reference allele may be the non-coded allele.
- Most chips used in GWAS can distinguish between the two alleles at a given locus. As a result, the frequency of these alleles in the total population may be estimated from their frequency in a sample population.
- the frequency of the effective allele may comprise a value of at most about 0.8, at most about 0.5, at most about 0.3, or at most about 0.1.
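The frequency of the effective allele in a sample can be computed from genotype counts, as sketched below. Each individual carries two alleles at an autosomal locus, so homozygous carriers contribute two copies and heterozygous carriers contribute one. The function name and the counts are hypothetical.

```python
def effect_allele_frequency(n_hom_effect, n_het, n_hom_reference):
    """Fraction of all sampled alleles that are the effect allele."""
    total_alleles = 2 * (n_hom_effect + n_het + n_hom_reference)
    return (2 * n_hom_effect + n_het) / total_alleles


# Hypothetical genotype counts in a sample of 100 individuals.
print(effect_allele_frequency(n_hom_effect=10, n_het=20, n_hom_reference=70))
```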
- the p-value may refer to the statistical significance and probability of the association to a particular SNP.
- the p-value threshold may be set at most about 1×10^-3, at most about 1×10^-4, at most about 1×10^-5, at most about 1×10^-6, at most about 1×10^-7, at most about 1×10^-8, at most about 1×10^-9, at most about 1×10^-10, or at most about 1×10^-15.
- the regression coefficient may be another important parameter to determine during statistical analysis in GWAS.
- the regression coefficient is a parameter that represents the strength of association between the SNP and the frailty index. Examples of software for analyzing quantitative phenotypic data and for association testing may comprise RegScan, SNPTEST, or PLINK.
- a plurality of SNPs is identified to be associated with longevity or frailty using the method herein.
- these SNPs can comprise one or more of those as listed in the table of FIG. 4.
- One or more of these SNPs can be used in the frailty prediction model as described herein.
- the plurality of SNPs may comprise one or more of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, and rs4332427.
- the plurality of SNPs comprises each of the SNPs rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, and rs4332427. In some cases, the plurality of SNPs comprises only some of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, and rs4332427.
- the plurality of SNPs comprises one or more of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
- the plurality of SNPs comprises each of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
- the plurality of SNPs comprises only some of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
- the plurality of SNPs can comprise one or more of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs35801134, rs6891621, rs7808664, rs13282106, rs10793962, rs150080415, rs3743445, rs9892942, rs7502233 and rs463312.
- the plurality of SNPs can comprise each of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs35801134, rs6891621, rs7808664, rs13282106, rs10793962, rs150080415, rs3743445, rs9892942, rs7502233 and rs463312.
- the plurality of SNPs may comprise at least two SNPs such as rs76207570 and rs3811444, rs76207570 and rs2250127, rs76207570 and rs9272588, rs76207570 and rs55964818, rs76207570 and rs4332427, rs3811444 and rs2250127, rs3811444 and rs9272588, rs3811444 and rs55964818, rs3811444 and rs4332427, rs2250127 and rs9272588, rs2250127 and rs55964818, rs2250127 and rs4332427, rs9272588 and rs55964818, rs9272588 and rs4332427, rs55964818 and rs4332427.
- the plurality of SNPs may comprise three SNPs selected from one of the following groups: {rs76207570, rs3811444, rs2250127}, {rs76207570, rs3811444, rs9272588}, {rs76207570, rs3811444, rs55964818}, {rs76207570, rs3811444, rs4332427}, {rs3811444, rs2250127, rs9272588}, {rs3811444, rs2250127, rs55964818}, {rs3811444, rs2250127, rs55964818}, {rs3811444, rs2250127, rs4332427}, or {rs9272588, rs55964818, rs4332427}.
- the plurality of SNPs may comprise four SNPs selected from one of the following groups: {rs2250127, rs9272588, rs55964818, rs4332427}, {rs76207570, rs9272588, rs55964818, rs4332427}, {rs76207570, rs3811444, rs55964818, rs4332427}, {rs76207570, rs3811444, rs2250127, rs4332427}, {rs76207570, rs3811444, rs2250127, rs9272588}, {rs3811444, rs9272588, rs55964818, rs4332427}, {rs3811444, rs2250127, rs55964818, rs4332427}, {rs3811444,
- the plurality of SNPs may comprise five SNPs selected from one of the following groups: {rs3811444, rs2250127, rs9272588, rs55964818, rs4332427}, {rs76207570, rs2250127, rs9272588, rs55964818, rs4332427}, {rs76207570, rs3811444, rs9272588, rs55964818, rs4332427}, {rs76207570, rs3811444, rs2250127, rs55964818, rs4332427}, {rs76207570, rs3811444, rs2250127, rs9272588, rs4332427}, {rs76207570, rs3811444, rs2250127, rs9272588, rs43324
- one or more genetic markers having increased association with the frailty phenotype may be located within 10 kb, 50 kb, 100 kb, 200 kb, 300 kb, 400 kb, or 500 kb of any one of the SNPs described herein, such as SNPs selected from rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799,
- multiple linear regression analysis is applied to the selected set of associated SNPs and the frailty index to construct the frailty prediction model, adjusted for sex, age and the first ten genetic principal components 209.
- a set of coefficients associated with the SNPs is produced, and a frailty index can be calculated based on an individual’s genetic data.
- the frailty index may be the logarithm of the hazard ratio which is calculated as the weighted sum of the set of SNPs.
- a variety of longevity or frailty related metrics for an individual may be generated using the frailty prediction model 211, for example providing a frailty assessment for the individual.
- the frailty assessment may comprise various longevity or frailty related metrics derived from the hazard ratio.
- the various longevity or frailty related metrics may include, for example, expected life span, rate of death, genetic predisposition to premature death, frailty risk relative to average group and the like.
- a plurality of genes for human predicted frailty is identified using the provided method.
- the plurality of genes may comprise MACF1 and TRIM58.
- FIG. 11 shows a list of candidate genes in regions associated with predicted frailty among which MACF1 and TRIM58 are closely related to frailty.
- FIG. 6 shows an example of a network layout 600 comprising frailty prediction systems 610, in accordance with some embodiments.
- network layout 600 may include one or more user devices 602, a server 604, a network 606, one or more databases 608, and a frailty prediction system 610.
- Each of the components 602, 604, 608, and 610 may be operatively connected to one another via the network 606.
- the network 606 may comprise any type of communication links that allows transmission of data from one electronic component to another.
- a user device may be, for example, one or more computing devices configured to perform one or more operations consistent with the disclosed embodiments.
- a user device may be a computing device that is capable of executing software or applications provided by one or more frailty prediction systems.
- the software and/or applications may provide a frailty or longevity related result to a user.
- the user may or may not be asked to provide user information via the software or applications.
- the software and/or applications may be provided on a frailty prediction server or locally on the user device.
- the server or the software may retrieve genetic data associated with the user stored in a database.
- the genetic data may be processed by the software or application to generate a frailty or longevity prediction result.
- Computation of the frailty or longevity prediction result may or may not require user input (e.g., sex, age, ethnicity, etc.).
- the frailty prediction software or application is designed to allow the user to obtain accurate frailty risk or longevity related information with minimum user input.
- the frailty prediction calculation may be hosted by the server on one or more interactive webpages, and accessed by one or more users.
- a user device can include, among other things, desktop computers, laptops or notebook computers, mobile devices (e.g., smart phones, cell phones, personal digital assistants (PDAs), and tablets), or wearable devices (e.g., smartwatches).
- a user device can also include any other media content player, for example, a set-top box, a television set, a video game system, or any electronic device capable of providing or rendering data.
- a user device may include known computing components, such as one or more processors, and one or more memory devices storing software instructions executed by the processor(s) and data.
- the network layout may include a plurality of user devices. Each user device may be associated with a user. Users may include any individual or groups of individuals using software or applications provided by the frailty prediction system. For example, the users may access a user device or a web account using an application programmable interface (API) provided by the frailty prediction system. In some embodiments, more than one user may be associated with a user device. Alternatively, more than one user device may be associated with a user. The users may be located geographically at a same location, for example users working in a same office or a same geographical location. In some instances, some or all of the users and user devices may be at remote geographical locations (e.g., different cities, countries, etc.), although this is not a limitation of the invention.
- the network layout may include a plurality of nodes. Each user device in the network layout may correspond to a node. If a “user device 602” is followed by a number or a letter, it means that the “user device 602” may correspond to a node sharing the same number or letter. For example, as shown in FIG. 6, user device 602-1 may correspond to node 1 which is associated with user 1, user device 602-2 may correspond to node 2 which is associated with user 2, and user device 602-k may correspond to node k which is associated with user k, where k may be any integer greater than 1.
- a node may be a logically independent entity in the network layout. Therefore, the plurality of nodes in the network layout can represent different entities. For example, each node may be associated with a user, a group of users, or groups of users. For example, in one embodiment, a node may correspond to an individual entity (e.g., an individual). In some particular embodiments, a node may correspond to multiple entities (e.g., a group of individuals).
- a user may be registered or associated with an entity that provides services associated with one or more operations performed by the disclosed embodiments.
- the user may be a registered user of an entity (e.g., a company, an organization, an individual, etc.) that provides one or more of servers 604, databases 608, and/or frailty prediction systems 610 for frailty risk prediction consistent with certain disclosed embodiments.
- the disclosed embodiments are not limited to any specific relationships or affiliations between the users and an entity, person(s), or entities providing server 604, databases 608, and frailty prediction systems 610.
- a user device may be configured to receive input from one or more users.
- a user may provide an input to a user device using an input device, for example, a keyboard, a mouse, a touch-screen panel, voice recognition and/or dictation software, or any combination of the above.
- the input may include a user performing various virtual actions during a frailty risk prediction session.
- the input may include, for example, a user selecting a desired frailty or longevity related result to view from a plurality of options that are presented to the user during a frailty risk prediction session.
- the input may include a user providing permission to the server to access genetic data of the user.
- the input may include a user providing user credentials such as a password or biometrics to verify the identity of the user in order to use the software or application.
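The permission and credential inputs described above can be sketched as a simple gate placed in front of any genetic-data lookup. This is a minimal sketch only: `UserInput`, its fields, and `may_access_genetic_data` are hypothetical names introduced for illustration, and because the disclosure states that authentication may or may not be required, that requirement is modeled as a parameter.

```python
from dataclasses import dataclass


@dataclass
class UserInput:
    """Hypothetical container for the inputs listed above."""
    consent_given: bool         # user granted permission to access genetic data
    credentials_verified: bool  # password/biometric check performed elsewhere


def may_access_genetic_data(user_input: UserInput, require_auth: bool = True) -> bool:
    """Return True only if consent (and, optionally, identity) conditions hold."""
    if not user_input.consent_given:
        return False  # never proceed without explicit permission
    if require_auth and not user_input.credentials_verified:
        return False  # authentication is optional per the disclosure
    return True
```

In this sketch, consent is always checked first, so a verified identity alone is never sufficient to retrieve genetic data.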
- two-way data transfer capability may be provided between the server and each user device.
- the user devices can also communicate with one another via the server (i.e., using a client-server architecture).
- the user devices can communicate directly with one another via a peer-to-peer communication channel.
- the peer-to-peer communication channel can help to reduce workload on the server by utilizing resources (e.g., bandwidth, storage space, and/or processing power) of the user devices.
- a server may comprise one or more server computers configured to perform one or more operations consistent with disclosed embodiments.
- a server may be implemented as a single computer, through which a user device is able to communicate with other components of the network layout.
- a user device may communicate with the server through the network.
- the server may communicate on behalf of a user device with the frailty prediction system(s) or the database through the network.
- the server may embody the functionality of one or more frailty prediction system(s).
- the frailty prediction system(s) may be implemented inside and/or outside of the server.
- the frailty prediction system(s) may be software and/or hardware components included with the server or remote from the server.
- a user device may be directly connected to the server through a separate link (not shown in FIG. 6).
- the server may be configured to operate as a front-end device configured to provide access to one or more frailty prediction system(s) consistent with certain disclosed embodiments.
- the server may, in some embodiments, utilize the frailty prediction system(s) to process input data from a user device in order to retrieve genetic data from a database to compute a frailty risk prediction or longevity related result.
- the server may be configured to store the users’ frailty prediction result data in the database.
- the server may also be configured to search, retrieve, and analyze (compare) genetic data and log-in information stored in the database. In some cases, the data and information may include a user’s previous frailty calculation result or user input non-genetic information.
- a server may include a web server, an enterprise server, or any other type of computer server, and can be computer programmed to accept requests (e.g., HTTP, or other protocols that can initiate data transmission) from a computing device (e.g., a user device) and to serve the computing device with requested data.
- a server can be a broadcasting facility, such as free-to-air, cable, satellite, and other broadcasting facility, for distributing data.
- a server may also be a server in a data network (e.g., a cloud computing network).
- a server may include known computing components, such as one or more processors, one or more memory devices storing software instructions executed by the processor(s), and data.
- a server can have one or more processors and at least one memory for storing program instructions.
- the processor(s) can be a single or multiple microprocessors, field programmable gate arrays (FPGAs), or digital signal processors (DSPs) capable of executing particular sets of instructions.
- Computer-readable instructions can be stored on a tangible non-transitory computer-readable medium, such as a flexible disk, a hard disk, a CD-ROM (compact disk-read only memory), an MO (magneto-optical) disk, a DVD-ROM (digital versatile disk-read only memory), a DVD-RAM (digital versatile disk-random access memory), or a semiconductor memory.
- the methods disclosed herein can be implemented in hardware components or combinations of hardware and software such as, for example, ASICs, special purpose computers, or general purpose computers. While FIG. 6 illustrates the server as a single server, in some embodiments, multiple devices may implement the functionality associated with the server.
- the network may be configured to provide communication between various components of the network layout depicted in FIG. 6.
- the network may be implemented, in some embodiments, as one or more networks that connect devices and/or components in the network layout for allowing communication between them.
- the network may be implemented as the Internet, a wireless network, a wired network, a local area network (LAN), a wide area network (WAN), Bluetooth, Near Field Communication (NFC), or any other type of network that provides communications between one or more components of the network layout.
- the network may be implemented using cell and/or pager networks, satellite, licensed radio, or a combination of licensed and unlicensed radio.
- the network may be wireless, wired, or a combination thereof.
- the frailty prediction system(s) may be implemented as one or more computers storing instructions that, when executed by one or more processors, generate a plurality of frailty or longevity related results from which a user can select results to view in a format that is defined by the user.
- the frailty prediction system(s) may compute a frailty index of the user by retrieving genetic data from a database associated with the user, and may further calculate a longevity related result according to a user selection (e.g., lifespan, frailty risk, etc.).
- the frailty prediction system(s) may further display the frailty prediction result to the user in a format predetermined by the frailty prediction system or by the user.
- the frailty prediction system(s) may or may not require user identification information in order to verify or authenticate the user to obtain the associated genetic data of the user or perform the frailty prediction functions.
- the server may be the computer in which the frailty prediction system(s) are implemented.
- the frailty prediction system(s) may be implemented on separate computers.
- a user device may send a user input to the server, and the server may connect to other frailty prediction system(s) over the network.
- the frailty prediction system(s) may comprise software that, when executed by processor(s), perform processes for computing a frailty risk or longevity related result for a user.
- the frailty prediction system(s) may further perform analysis of the frailty prediction results and provide recommendations or insights on the frailty prediction results.
- the server may access and execute the frailty prediction system(s) to perform one or more processes consistent with the disclosed embodiments.
- the frailty prediction system(s) may be software stored in memory accessible by the server (e.g., in a memory local to the server or remote memory accessible over a communication link, such as the network).
- the frailty prediction system(s) may be implemented as one or more computers, as software stored on a memory device accessible by the server, or a combination thereof.
- one frailty prediction system may be computer hardware executing one or more frailty prediction calculations.
- another frailty prediction system may be software that, when executed by the server, performs further analysis of the frailty prediction results such as providing recommendations or insights on the frailty prediction results.
- the frailty prediction system(s) can be used to provide frailty risk or longevity related information to users in a variety of different ways.
- the frailty prediction system(s) may store and/or execute software that performs a computation of frailty index of the user based on user genetic data retrieved from a database and a frailty prediction model.
- the frailty prediction system(s) may also store and/or execute software that performs further analysis of the frailty prediction results of the user or may provide lifestyle or clinical insights based on the results.
- the frailty prediction system(s) may store and/or execute software that performs an algorithm to dynamically select a frailty prediction model according to sex, age, or ethnicity from user input data.
- the frailty prediction system(s) may further store and/or execute software that performs an algorithm for dynamically updating the frailty prediction model when more training data becomes available.
- the frailty prediction system(s) may further store and/or execute software that performs a process to construct a frailty prediction model consistent with the method disclosed herein.
- the disclosed embodiments may be configured to implement the frailty prediction system(s) such that a variety of algorithms may be performed for performing frailty prediction analysis and/or constructing a frailty prediction model. Although a plurality of frailty prediction systems have been described for performing the above algorithms, it should be noted that some or all of the algorithms may be performed using a single frailty prediction system, consistent with disclosed embodiments.
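The dynamic model selection described above can be illustrated with a small lookup that prefers the most specific cohort model and falls back to a general one. This is a sketch under stated assumptions: the registry layout, the `select_model` name, the cohort labels, and the precedence order (sex before ethnicity) are all hypothetical, and age is omitted for brevity although it could be added as a third key.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry key: (sex, ethnicity); None means "any".
ModelKey = Tuple[Optional[str], Optional[str]]


def select_model(models: Dict[ModelKey, str],
                 sex: Optional[str] = None,
                 ethnicity: Optional[str] = None) -> str:
    """Return the most specific registered model, else the general one."""
    # Assumed precedence: exact cohort, sex only, ethnicity only, general.
    candidates = [(sex, ethnicity), (sex, None), (None, ethnicity), (None, None)]
    for key in candidates:
        if key in models:
            return models[key]
    raise KeyError("no general model registered under (None, None)")
```

Registering a general model under `(None, None)` corresponds to the embodiment in which a single general frailty prediction model serves all users.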
- the user devices, the server, and the frailty prediction system(s) may be connected or interconnected to one or more database(s) 608-1, 608-2.
- the database(s) may be one or more memory devices configured to store data (e.g., genetic data, frailty prediction models, historical frailty prediction results, etc.). Additionally, the database(s) may also, in some embodiments, be implemented as a computer system with a storage device. In one aspect, the database(s) may be used by components of the network layout to perform one or more operations consistent with the disclosed embodiments. In certain embodiments, one or more of the database(s) may be co-located with the server, or may be co-located with one another on the network. One of ordinary skill will recognize that the disclosed embodiments are not limited to the configuration and/or arrangement of the database(s).
- any of the user devices, the server, the database(s), and/or the frailty prediction system(s) may, in some embodiments, be implemented as a computer system.
- although the network is shown in FIG. 6 as a "central" point for communications between components of the network layout, the disclosed embodiments are not limited thereto.
- one or more components of the network layout may be interconnected in a variety of ways, and may in some embodiments be directly connected to, co-located with, or remote from one another, as one of ordinary skill will appreciate.
- although the disclosed embodiments may be implemented on the server, the disclosed embodiments are not so limited; other devices, such as one or more user devices, may also be used.
- FIG. 7 shows an example of a user device 700 by which a user may access frailty assessment.
- a user device 700 may be, for example, one or more computing devices configured to perform one or more operations consistent with the disclosed embodiments.
- a user device may be a computing device that is capable of executing software or applications provided by one or more frailty prediction systems.
- the user device may comprise a display screen 701 to display various longevity or frailty related metrics to the user.
- the display screen 701 may display the user's input back to the user to facilitate use of the device in entering the information used to generate and display the desired frailty assessment parameter.
- a user device can include, among other things, desktop computers, laptops or notebook computers, mobile devices (e.g., smart phones, cell phones, personal digital assistants (PDAs), and tablets), or wearable devices (e.g., smartwatches).
- a user device can also include any other media content player, for example, a set-top box, a television set, a video game system, or any electronic device capable of providing or rendering data.
- a user device may include known computing components, such as one or more processors, and one or more memory devices storing software instructions executed by the processor(s) and data.
- the user device may optionally be portable.
- the user device may be handheld.
- the user device may include a display 701. The display may visually illustrate information. The information shown on the display may be changeable.
- the display may include a screen, such as a liquid crystal display (LCD) screen, light-emitting diode (LED) screen, organic light-emitting diode (OLED) screen, plasma screen, electronic ink (e-ink) screen, touchscreen, or any other type of screen or display.
- the display may show a graphical user interface.
- the graphical user interface may be part of a browser, software, or application that may aid in the user performing a frailty prediction function using the device.
- the interface may allow the user to run the application using the device.
- the interface may be configured to receive user input as described elsewhere herein. Using the graphical user interface may or may not require user identification and/or authentication.
- the graphical user interface may allow a user to view frailty or longevity related metrics.
- the user may be allowed to select one or more frailty or longevity related metrics to view.
- the one or more frailty or longevity related metrics may include, for example, lifespan, frailty risk, frailty risk or lifespan relative to a group average value and the like.
- the group or the reference group may be predetermined by the user. For instance, a user may be permitted to determine to view a frailty risk relative to a group in a pre-determined geographic location, a group with a particular cohort, a group with the same ethnicity and the like.
- the graphical user interface may allow the user to set up a format of the frailty or longevity related metrics.
- the user may be allowed to select a preferred format in which to view the result: by age group, by timeline, in the form of bar graphs, pie charts, histograms, line charts, numerical values (e.g., risk score) or percentages (e.g., percentage of risk relative to the group), or various other forms.
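One way to picture a result "relative to a group" in a user-selected numeric format is shown below. This is an illustrative sketch, not the disclosed method: the function names, the choice of "share of the reference group with a strictly lower index" as the relative measure, and the two format labels are assumptions.

```python
from bisect import bisect_left
from typing import Sequence


def percent_below(user_index: float, group_indices: Sequence[float]) -> float:
    """Percentage of the reference group whose index is strictly lower than the user's."""
    if not group_indices:
        raise ValueError("reference group is empty")
    ordered = sorted(group_indices)
    # bisect_left counts the group values strictly less than user_index.
    return 100.0 * bisect_left(ordered, user_index) / len(ordered)


def format_result(value: float, style: str = "percentage") -> str:
    """Render a value in one of two hypothetical user-selectable formats."""
    if style == "percentage":
        return f"{value:.1f}%"
    if style == "score":
        return f"{value:.2f}"
    raise ValueError(f"unknown style: {style}")
```

The reference group passed to `percent_below` could be any of the groups mentioned above, such as users in a given geographic location or of the same ethnicity.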
- the user device may be capable of accepting inputs via a user interactive device 703.
- user interactive devices may include a keyboard, button, mouse, touchscreen, touchpad, joystick, trackball, camera, microphone, motion sensor, heat sensor, inertial sensor, or any other type of user interactive device.
- a user may input user information, such as a command to initiate the frailty prediction calculation or non-genetic information (e.g., sex, age, ethnicity, etc.), through the user interactive device.
- the user device may comprise one or more memory storage units which may comprise non-transitory computer readable medium comprising code, logic, or instructions for performing one or more steps.
- the user device may comprise one or more processors capable of executing one or more steps, for instance in accordance with the non-transitory computer readable media.
- the one or more memory storage units may store one or more software applications or commands relating to the software applications.
- the one or more processors may, individually or collectively, execute steps of the software application.
- a communication unit may be provided on the device.
- the communication unit may allow the user device to communicate with an external device.
- the external device may be a device of a transaction entity, server, or may be a cloud-based infrastructure.
- the communications may include communications over a network or a direct communication.
- the communication unit may permit wireless or wired communications. Examples of wireless communications may include, but are not limited to, WiFi, 3G, 4G, LTE, radiofrequency, Bluetooth, infrared, or any other type of communications.
- the device may have an on-board power source.
- an external power source may power the user device.
- An external power source may provide power to the user device via a wired or wireless connection.
- An on-board power source may power an entirety of the user device, or one or more individual components of the user device.
- multiple on-board power sources may be provided that may power different components of the device. For instance, one or more sensors of the device may be powered using a separate source from one or more memory storage units, processors, communication units, and/or displays of the device.
- FIG. 8 shows an exemplary process 800 of displaying longevity related metrics on a user device, such as a frailty assessment parameter, in response to a user input, in accordance with embodiments of the invention.
- a user may input information into a user device 801.
- the input can include information to initiate a process for generating a frailty assessment for the individual.
- the user input information may or may not be used as part of the calculation of the frailty prediction to generate the frailty assessment parameter for the individual.
- the user input provides access to genetic information of the individual used in the calculation of the frailty assessment.
- a user may be prompted to provide information such as sex, age, ethnicity that may be processed during the frailty prediction.
- a user may be allowed to select one or more longevity related metrics to be displayed on the user device as a result of the computation. For instance, a user may be allowed to select to view lifespan, or frailty risk relative to a group from a plurality of options. In some cases, a user may be allowed to select a format of displaying the longevity related result. The result may be displayed in text, graph, diagram, and/or numerical value form.
- a user may be permitted to view the predicted frailty risk by age group, by timeline, in the form of bar graphs, pie charts, histograms, line charts, numerical values (e.g., risk score) or percentages (e.g., percentage of risk relative to the group); any other visual representation may also be used to show the lifespan or frailty risk.
- a user may not need to provide such information in order to view the result.
- the user input information may comprise confirmation indicating user consent to access genetic data of the associated user. For example, a user may be prompted to confirm whether to grant the frailty prediction system access to the genetic data of the user.
- a user may be required to log into the application running on the user device by providing user credentials such as password, PIN or fingerprint.
- the user identity information may be transmitted to a genetic database for retrieving genetic data associated with the user 803.
- the genetic data may be processed by a frailty prediction model 805.
- the frailty prediction model may be locally stored with the user device.
- the frailty prediction model may be stored remotely from the user device.
- the frailty prediction model may be selected from a plurality of frailty prediction models according to one or more factors provided by the user input information 807.
- a plurality of frailty prediction models may be constructed and stored in a database.
- the plurality of frailty prediction models may be constructed using datasets from different cohorts (e.g., sex, ethnicity, age, etc.). For example, a frailty prediction model may be selected according to the sex, ethnicity or age of the user.
- a general frailty prediction model is used for all users.
- a frailty index may be calculated by the frailty prediction model 809. The calculation may be performed in part or in whole on the user device, and/or in part or in whole on a server in remote communication with the user device.
- the server may perform the computation and transmit the computed frailty index to the user device.
- the server may be configured to retrieve genetic data from a database and transmit the genetic data to the user device for computation.
- the computed frailty index may or may not be transmitted to the server for further analysis or calculation.
- the frailty index may be a logarithm of the hazard ratio of the user.
- the frailty index may or may not be displayed to the user on the user device.
- the associated longevity related result may be calculated 811 based on the frailty index.
- a user may be permitted to provide user options for the longevity related result at any stage. For example, a user may provide input to view lifespan or frailty risk at the start of the process or after calculation of the frailty index.
- the longevity related result is displayed to the user on the user device 813.
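The calculation steps of process 800 (frailty index 809, then a derived result 811) can be sketched as follows. The text states only that the frailty index may be a logarithm of the hazard ratio; the linear form used here (a weighted sum of allele counts per SNP), the SNP identifiers, the weights, and the treatment of missing SNPs as zero are assumptions made for illustration and are not taken from the disclosure.

```python
import math
from typing import Dict


def frailty_index(genotypes: Dict[str, int], weights: Dict[str, float]) -> float:
    """Assumed linear model: sum of weight * allele count (0, 1, or 2) per SNP.

    SNPs absent from `genotypes` contribute 0 (an assumption of this sketch);
    SNPs not in the model are ignored.
    """
    return sum(w * genotypes.get(snp, 0) for snp, w in weights.items())


def hazard_ratio(index: float) -> float:
    """If the index is a log hazard ratio, exponentiating recovers the ratio."""
    return math.exp(index)


# Hypothetical model weights and user genotype, for illustration only.
example_weights = {"snp_a": 0.5, "snp_b": -0.25}
example_genotypes = {"snp_a": 2, "snp_b": 1, "snp_c": 2}
example_index = frailty_index(example_genotypes, example_weights)
```

Under this reading, an index of 0 corresponds to a hazard ratio of 1, a positive index to elevated hazard, and a negative index to reduced hazard relative to the model's baseline.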
- the genetic data may be retrieved from a third party database.
- the genetic data may be provided by a sample of tissue, blood, urine, or other substances in the body of the user and transported to a genome test site for producing the genetic data.
- a sample can be any biological sample isolated from a subject.
- a sample can comprise, without limitation, bodily fluid, whole blood, platelets, serum, plasma, stool, red blood cells, white blood cells or leukocytes, endothelial cells, tissue biopsies, synovial fluid, lymphatic fluid, ascites fluid, interstitial or extracellular fluid, the fluid in spaces between cells, including gingival crevicular fluid, bone marrow, cerebrospinal fluid, saliva, mucous, sputum, semen, sweat, urine, fluid from nasal brushings, fluid from a pap smear, or any other bodily fluids.
- a sample can comprise nucleic acids from different sources.
- a sample can comprise germline DNA or somatic DNA.
- a sample can comprise nucleic acids carrying mutations.
- a sample can comprise DNA carrying germline mutations and/or somatic mutations.
- a sample can also comprise DNA carrying cancer-associated mutations (e.g., cancer-associated somatic mutations).
- a sample comprises one or more of: a single base substitution, a copy number variation, an indel, a gene fusion, a transversion, a translocation, an inversion, a deletion, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, chromosome fusions, a gene truncation, a gene amplification, a gene duplication, a chromosomal lesion, a DNA lesion, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, abnormal changes in distributions of nucleic acid (e.g., cfDNA) fragments across genomic regions, abnormal changes in distributions of nucleic acid (e.g., cfDNA) fragment lengths, and abnormal changes in nucleic acid methylation.
- Methods herein can comprise obtaining certain amounts of nucleic acid molecules.
- the method can comprise obtaining up to about 600 ng, up to about 500 ng, up to about 400 ng, up to about 300 ng, up to about 200 ng, up to about 100 ng, up to about 50 ng, or up to about 20 ng of nucleic acid molecules from a sample.
- the method can comprise obtaining at least 1 femtogram (fg), at least 10 fg, at least 100 fg, at least 1 picogram (pg), at least 10 pg, at least 100 pg, at least 1 ng, at least 10 ng, at least 100 ng, at least 150 ng, or at least 200 ng of nucleic acid molecules.
- the method can comprise obtaining at most 1 femtogram (fg), at most 10 fg, at most 100 fg, at most 1 picogram (pg), at most 10 pg, at most 100 pg, at most 1 ng, at most 10 ng, at most 100 ng, at most 150 ng, or at most 200 ng of nucleic acid molecules.
- the method can comprise obtaining 1 femtogram (fg) to 200 ng, 1 picogram (pg) to 200 ng, 1 ng to 100 ng, 10 ng to 150 ng, 10 ng to 200 ng, 10 ng to 300 ng, 10 ng to 400 ng, 10 ng to 500 ng, 10 ng to 600 ng, 10 ng to 700 ng, 10 ng to 800 ng, 10 ng to 900 ng, or 10 ng to 1000 ng of nucleic acid molecules.
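Because the amounts above mix femtograms, picograms, and nanograms, comparing a measured amount against one of the stated ranges requires a common unit. The helper below is a minimal sketch with hypothetical function names; it converts everything to whole femtograms so the comparison uses exact integer arithmetic.

```python
from typing import Tuple

# Exact integer conversion factors to femtograms.
FG_PER_UNIT = {"fg": 1, "pg": 10**3, "ng": 10**6}


def to_femtograms(amount: int, unit: str) -> int:
    """Convert an integer amount in fg, pg, or ng to femtograms."""
    return amount * FG_PER_UNIT[unit]


def within_range(amount: int, unit: str,
                 low: Tuple[int, str], high: Tuple[int, str]) -> bool:
    """Check low <= amount <= high, where low/high are (amount, unit) pairs."""
    value = to_femtograms(amount, unit)
    return to_femtograms(*low) <= value <= to_femtograms(*high)
```

For example, `within_range(150, "ng", (10, "ng"), (200, "ng"))` checks a 150 ng yield against the "10 ng to 200 ng" range listed above.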
- Isolation and extraction of polynucleotides may be performed through collection of bodily fluids using a variety of techniques.
- collection may comprise aspiration of a bodily fluid from a subject using a syringe.
- collection may comprise pipetting or direct collection of fluid into a collecting vessel.
- polynucleotides may be isolated and extracted using a variety of techniques utilized in the art.
- DNA may be isolated, extracted and prepared using commercially available kits such as the Qiagen QIAamp® Circulating Nucleic Acid Kit protocol.
- the Qiagen Qubit™ dsDNA HS Assay kit protocol, Agilent™ DNA 1000 kit, or TruSeq™ Sequencing Library Preparation; Low-Throughput (LT) protocol may be used.
- Purification of DNA may be accomplished using any methodology, including, but not limited to, the use of commercial kits and protocols provided by companies such as Sigma Aldrich, Life Technologies, Promega, Affymetrix, IBI or the like. Kits and protocols may also be non-commercially available.
- the polynucleotides may be pre-mixed with one or more additional materials, such as one or more reagents (e.g., ligase, protease, polymerase) prior to determining the genetic variant.
- SNP genotyping may be accomplished using methods selected from the group consisting of hybridization methods, enzyme based methods, post amplification methods, and/or sequencing.
- Hybridization-based methods may comprise dynamic allele-specific hybridization, molecular beacons, and SNP microarrays.
- Enzyme based methods may comprise one or more of restriction fragment length polymorphism, PCR-based methods, flap endonuclease, primer extension, 5′-nuclease, and oligonucleotide ligation assay.
- Post amplification methods may comprise one or more of single strand conformation polymorphism, temperature gradient gel electrophoresis, denaturing high performance liquid chromatography, high resolution melting of the entire amplicon, use of DNA mismatch-binding proteins, SNPlex, and surveyor nuclease assay.
- FIG. 9 shows a computer system 901 that is programmed or otherwise configured to perform frailty prediction.
- the computer system 901 can regulate various aspects of sequence analysis of the present disclosure, such as, for example, matching data against known sequences and variants.
- the computer system 901 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device.
- the electronic device can be a mobile electronic device.
- the computer system 901 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 905, which can be a single core or multi core processor, or a plurality of processors for parallel processing.
- the computer system 901 also includes memory or memory location 910 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 915 (e.g., hard disk), communication interface 920 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 925, such as cache, other memory, data storage and/or electronic display adapters.
- the memory 910, storage unit 915, interface 920 and peripheral devices 925 are in communication with the CPU 905 through a communication bus (solid lines), such as a motherboard.
- the storage unit 915 can be a data storage unit (or data repository) for storing data.
- the computer system 901 can be operatively coupled to a computer network (“network”) 930 with the aid of the communication interface 920.
- the network 930 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
- the network 930 in some cases is a telecommunication and/or data network.
- the network 930 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
- the network 930, in some cases with the aid of the computer system 901, can implement a peer-to-peer network, which may enable devices coupled to the computer system 901 to behave as a client or a server.
- the CPU 905 can execute a sequence of machine-readable instructions, which can be embodied in a program or software.
- the instructions may be stored in a memory location, such as the memory 910.
- the instructions can be directed to the CPU 905, which can subsequently program or otherwise configure the CPU 905 to implement methods of the present disclosure. Examples of operations performed by the CPU 905 can include fetch, decode, execute, and writeback.
- the CPU 905 can be part of a circuit, such as an integrated circuit.
- One or more other components of the system 901 can be included in the circuit.
- the circuit is an application specific integrated circuit (ASIC).
- the storage unit 915 can store files, such as drivers, libraries and saved programs.
- the storage unit 915 can store user data, e.g., user preferences and user programs.
- the computer system 901 in some cases can include one or more additional data storage units that are external to the computer system 901, such as located on a remote server that is in communication with the computer system 901 through an intranet or the Internet.
- the computer system 901 can communicate with one or more remote computer systems through the network 930.
- the computer system 901 can communicate with a remote computer system of a user (e.g., a physician).
- remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
- the user can access the computer system 901 via the network 930.
- Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 901, such as, for example, on the memory 910 or electronic storage unit 915.
- the machine executable or machine readable code can be provided in the form of software.
- the code can be executed by the processor 905.
- the code can be retrieved from the storage unit 915 and stored on the memory 910 for ready access by the processor 905.
- the electronic storage unit 915 can be precluded, and machine-executable instructions are stored on memory 910.
- the code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime.
- the code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
- aspects of the systems and methods provided herein can be embodied in programming.
- Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium.
- Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read only memory, random-access memory, flash memory) or a hard disk.
- “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server.
- another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links.
- Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings.
- Volatile storage media include dynamic memory, such as main memory of such a computer platform.
- Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system.
- Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
- Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data.
- Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
- the computer system 901 can include or be in communication with an electronic display 935 that comprises a user interface (UI) 940 for providing, for example, information about cancer diagnosis.
- Examples of UIs include, without limitation, a graphical user interface (GUI) and web-based user interface.
- Methods and systems of the present disclosure can be implemented by way of one or more algorithms.
- An algorithm can be implemented by way of software upon execution by the central processing unit 905. The algorithm can, for example, determine whether a cancer is present and/or progressing.
- UK Biobank is a prospective cohort study of over 500,000 individuals from across the United Kingdom.
- UK Biobank is an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. Participants, aged between 40 and 69, were invited to one of 22 centres across the UK between 2006 and 2010. Blood, urine and saliva samples were collected, physical measurements were taken, and each individual answered an extensive questionnaire focused on questions of health and lifestyle. All participants gave written informed consent and the study was approved by the North West Multicentre Research Ethics Committee.
- UK Biobank has Human Tissue Authority research tissue bank approval, meaning separate ethical approvals are not required to use the existing data. Genotyping is in progress, with a wave 1 public release of genotypes for ~150,000 participants in June/July 2015. Phenotype and genotype data are available directly from UK Biobank.
- genotypes were available for 152,736 subjects at 847,441 sites, of which 815,490 were located on autosomes, and 21,231, 1,041 and 310 on the X, Y and MT chromosomes, respectively.
- UK Biobank provided 15 principal components of genetic relatedness (UK Biobank field id 22009) and a binary assessment of whether subjects were genomically British (UK Biobank field id 22006), based on principal components analysis of their genetic data.
- Imputed data were prepared by UK Biobank.
- autosomal phasing was carried out using a version of SHAPEIT2 modified to allow for large sample sizes.
- Imputation was carried out using IMPUTE2 using the merged UK10K and 1,000 Genomes Phase 3 reference panels to yield higher imputation accuracy of British haplotypes.
- the imputations resulted in 73,355,362 SNPs, short indels and large structural variants, imputed in 152,727 individuals.
- a diverse set of measurements from the UK Biobank study was used to construct a Cox PH model that has a good fit on mortality data.
- the following parameters were used: blood assay results, cognitive tests, physical measures (e.g., spirometry, anthropometry, blood pressure, grip strength), touchscreen questionnaires related to the following life factors: lifestyle and environment, health and medical history, family history, psychosocial factors, early life factors, and other data (124 data-fields in total in Table 1).
- death register data required to build the Cox PH model were used; these represent right-censored mortality data with two parameters: 1) follow-up time in years for each person (defined as the difference between the date the person was last observed or the date of death and the date the person took the blood tests), and 2) a Boolean marker indicating whether death occurred.
- the mean follow-up time was 7 years with a maximum of 10 years.
- 14,419 events were observed.
- Box-Cox transformations were applied for normalization of the data-fields.
- 442,698 samples that have reported British ethnicity were selected.
- These data were split into train datasets and test datasets based on whether genetics data are available or not for a given sample.
- 322,412 individuals who did not have genetic data were chosen as the training cohort and 120,286 samples with genetic data available were chosen as the testing cohort. In each cohort, missing data were imputed with mean values and the data were normalized to zero mean and unit variance.
- GWAS was performed for six traits: LnHR and five blood traits: white blood cell (leukocyte) count (denoted as ukb30000), red blood cell (erythrocyte) count (denoted as ukb30010), mean corpuscular volume (denoted as ukb30040), mean corpuscular haemoglobin (denoted as ukb30050) and platelet count (denoted as ukb30080).
- RegScan was used for genome-wide association testing. RegScan is a command line tool for performing fast association analysis between allele frequencies and continuous traits. It uses linear regression to estimate marker effects on continuous traits. The traits analyzed were adjusted for sex, age, genotyping array platform (Axiom or Affymetrix) and the first 10 genetic principal components (UK Biobank data field 22009) in the linear regression model. Residuals were inverse normal transformed with a customized R script to zero mean and unit variance and used as an input to RegScan. RegScan was run in GWAS mode with default parameters, for each chromosome file separately.
- Imputed variants with minor allele count (MAC) more than 50 and imputation information score more than 0.3 were used for discovery cohort genotypes. Variants that led to more than two alleles in the same genomic position were excluded, leaving only biallelic SNPs. A few SNPs that had the same rsID in different genomic locations were also excluded. Call files for X, Y and MT chromosomes were converted to ped+map file format using a tool provided by UK Biobank. PLINK, a free, open-source whole genome association analysis toolset, was used to convert ped+map to a general file format suitable as input for RegScan.
- GCTA program (genome-wide complex trait analysis)
- the method starts with the "top SNP" (the one with the smallest p-value, conditional on p < p0, where p0 is a specific threshold defined by the user) in the meta-analysis, and then the p-values for all the remaining SNPs are calculated conditional on the selected SNP. It then selects the next top SNP in the conditional analysis (p < p0) and proceeds to fit all the selected SNPs in the model, meanwhile dropping all those SNPs with p-values > p0. The iteration continues until no SNP is added or dropped from the model, thus finding a subset of associated SNPs with a threshold for LD (r2 < 0.9) between SNPs. Finally, a joint analysis of the subset of associated SNPs is performed.
- An LD reference sub-sample of 10,000 randomly chosen people from the total set of 120,286 people was used for the GWAS discovery phase.
- LD hub (a centralised database of summary-level GWAS results and a web interface for LD score regression) tools were used for estimation of captured heritability and genetic correlations for six traits (logarithm of hazard ratio and five blood traits) and 170 human traits and common diseases. The LD score regression tool was used for estimation of the genetic correlations between the logarithm of hazard ratio and the five blood traits. All GWAS summary statistics were filtered by SNP quality r2>0.7 and MAF>0.05 (7,001,988 SNPs in total).
- 1,162,742 SNPs defined by the overlap between the set of SNPs identified using the disclosed method and ’high quality SNPs’ as suggested by the authors of LD hub (these represent common HapMap3 SNPs that usually have high imputation quality; also, this set excludes the HLA region) were used. These 1,162,742 SNPs were used for further analysis of heritability and genetic correlations, and also for estimation of the genomic control inflation factor.
- FIG. 5B shows genetic correlation matrix and clustering for traits having high genetic correlations with predicted frailty.
- the matrix of genetic correlations for 196 traits provided by the LD hub tool was downloaded; all traits that overlapped with the 170 human traits used for calculation of genetic correlations with predicted frailty were selected, and all duplicated traits were removed by using only the most recent study (as indicated by the largest PMID number). This filtering led to a total of 123 traits. Then traits with genetic correlation significance p-value < 0.01/175 and
- Example 8: In silico functional analysis. [171] For prioritizing genes in associated regions, and for gene set enrichment and tissue or cell type enrichment analyses, DEPICT software (an integrative tool that, based on predicted gene functions, systematically prioritizes the most likely causal genes at associated loci, highlights enriched pathways, and identifies tissues/cell types where genes from associated loci are highly expressed) was used. Independent variants (as selected by the conditional and joint analysis procedure) with p < 5 × 10⁻⁸ (23 SNPs) and p < 1 × 10⁻⁵ (185 SNPs) were included in the analysis. A subset of 10,000 individuals from UK Biobank was used for computations of LD (the same subset as used for the conditional and joint analysis).
- PAINTOR software (a probabilistic framework that integrates association strength with genomic functional annotation data to improve accuracy in selecting plausible causal variants for functional validation) was used to prepare the set of SNPs for functional annotation.
- LD matrices and annotation files were provided to PAINTOR.
- clumping analysis was set with p1 and p2 p-value thresholds as p < 5 × 10⁻⁸, r2 as 0.1 and MAF > 0.002.
- a pair-wise correlation matrix was generated for all SNPs in each region in the clumping analysis results using the PLINK --r option. Text files filled with ones were used as annotation files.
- all output results were aggregated into one file, and SNPs marked by PAINTOR as the 99% credible set were chosen for functional annotation by VEP version with GRCh37 genomic reference.
- phenoscanner was not used in the screening phase; rather, it was tested directly whether association results for a specific probe are reported in the region of interest and, if positive, whether the index/proxy SNP had p < 5 × 10⁻⁸ in eQTL analysis; if positive, the HEIDI test was performed.
- phenoscanner was used to identify the tissues of interest, and, with the selected tissues, the same analysis as for blood eQTLs described above was performed.
- Multiple HEIDI testing will necessarily generate relatively low p-values even when hypothesis of pleiotropy is true; additionally, differences between patterns of association may be generated (and/or exaggerated) by local differences in LD between two populations where GWAS were performed. Patterns of association between two GWAS are sufficiently dissimilar if it was observed pHEIDI
- FIG. 10 shows that certain SNPs on certain chromosomes have increased association with the frailty phenotype, such as SNPs on one or more of chromosomes 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 15, 17, 20 and 22.
- SNPs with a -log10(p-value) value greater than a threshold value, such as the threshold value shown by the horizontal line in FIG. 10, can be designated as SNPs with increased association with the frailty phenotype.
- the SNPs with the increased association as shown from FIG. 10 correspond to those 23 SNPs as listed in the table of FIG. 4.
- a threshold of pSMR < 10⁻⁶ was used to decide that the two-sample Mendelian randomisation (MR) relation between two traits is significant; in case of a significant relation, the hypothesis of pleiotropy was considered a likely explanation if pHEIDI > 0.05, and this hypothesis was rejected if pHEIDI < 10⁻⁴. Using these criteria, strong candidate genes were suggested for at least five regions (see FIG. 11).
- FIG. 11 shows a list of candidate genes in regions associated with predicted frailty, as suggested by presence of missense mutations and/or SMR HEIDI.
- chromosome 1 region at 40 Mb (the MACF1 region) is associated with levels of C reactive protein, HDL and TG (24097068).
- the second region on chromosome 1 (at 248 Mb, TRIM58 region), was previously associated with blood parameters, such as platelet count and variability in red blood cell volumes (distribution width, RDW).
- Dynein Light Chain Roadblock-Type 1 protein (DYNLRB1 gene) were reported in a recent study (REF-SUHREproteinsNatComm). To test the hypothesis that the same functional variant is responsible for the association of predicted frailty and these traits with the TRIM58 region, SMR/HEIDI analysis was performed.
- the current methods for computational prediction of mutation functionality may not be perfect. Additionally, for late-onset, largely evolutionary neutral traits, the prediction of SNP functionality based on the level of evolutionary conservation may not be useful.
- the SNP rs3811444 allele C is associated with increased platelet count (22139419), decreased variability in red blood cell (RBC) volume, and increased abundance of RBC oleic acid (25500335). This SNP is also associated with whole blood concentration of stearoylcarnitine, and with expression of the nearby genes SMYD3 and OR2W3 and the trans-located gene JAM3. The association of rs3811444 with expression of OR2W3 is also reported by the Blood eQTL browser.
- JAM-C homologue of human JAM3
- MACF1 is a ubiquitously expressed cytoskeletal linker and is considered an anti-longevity candidate gene. It was discovered that the frailty-increasing allele T of the rs17513135 SNP, which has the strongest association with predicted frailty in the chromosome 1 region at 40 Mb, is likely exhibiting a pleiotropic effect onto (or acts through) increasing the expression of the MACF1 gene. It was known that MACF1 plays a major role in different developmental processes as well as in the pathogenesis of a wide spectrum of diseases, particularly ageing and chronic inflammatory diseases. Experimental validation also showed high expression of MACF1 in several lung cancer subtypes, especially in lung adenocarcinoma and squamous cell carcinomas. MACF1 knockdown dramatically impaired the reproductively of the solid tumors. MACF1 may have an evolutionarily conserved role in the ontogenesis of evolutionarily distant organisms.
- MSRA encodes methionine sulfoxide reductase A, which is involved in damage repair resulting from oxidative stress.
- GeneAge database longevity studies from invertebrates suggest a role for MSRA in ageing. Over-expression of an MSRA homologue in fruit flies extends lifespan, and disruption of MSRA in mice decreases it.
- MSRA has been associated with age-related diseases, such as Alzheimer’s disease.
- MSRA expression level is negatively associated with frailty in the disclosed analysis.
Abstract
A method for generating a frailty prediction model is provided. The frailty prediction model is configured to determine a frailty indicator of an individual. The method comprises: constructing a training model based on mortality data of a first population of deceased individuals; determining, using the training model, a frailty index for individuals of a second population; identifying a plurality of SNPs associated with human frailty based at least on (1) genetic data of the individuals of the second population and the (2) frailty index for the individuals of the second population; and generating the frailty prediction model based on the plurality of SNPs.
Description
METHODS AND SYSTEMS FOR PREDICTION OF FRAILTY
BACKGROUND
[1] Improved methods and apparatus for determining frailty and longevity would be beneficial for improving health management and drug discovery related to anti-aging. However, prior approaches to determining frailty and longevity can be less than ideal in at least some respects. For example, prior approaches based on physiologic related data can be less than ideal and may provide less accuracy than would be ideal. Prior approaches to determining frailty and longevity based on genetic data can be less than ideal. For example, prior identification of longevity-associated gene variants based on available genomes of super-centenarians can result in decreased accuracy in at least some instances. Also, prior methods and apparatus have less than ideally addressed available genetic data to determine frailty and longevity, which can result in less than ideal accuracy.
[2] In light of the above it would be desirable to have improved methods and apparatus for determining longevity and frailty in response to genetic data.
SUMMARY
[3] The present disclosure provides improved methods and systems that may be used for frailty or longevity prediction. The methods and apparatus disclosed herein may provide for high accuracy and efficiency in the prediction of frailty based on genetic data. A frailty model with improved accuracy and performance can be used for predicting longevity of an individual in response to single nucleotide polymorphism (SNP) data. Data from a plurality of identified gene variants can be incorporated into the model in order to output frailty or longevity with improved accuracy.
[4] In one aspect, a method of determining frailty is provided. The method may comprise: receiving an input from a user, the input comprising a request for a frailty assessment; and displaying a frailty assessment parameter in response to a frailty prediction model based on a plurality of SNPs, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, or rs4332427.
[5] In a separate yet related aspect, a tangible storage medium comprising instructions is provided. The tangible storage medium may be configured to: receive a user input comprising a request for a frailty assessment; and display a frailty assessment parameter in response to a frailty prediction model based on a plurality of SNPs, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, or rs4332427. In another related yet separate aspect, a system for determining frailty, comprising the tangible storage medium and a processor configured to execute the instructions is provided.
[6] In some embodiments, the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189. In some embodiments, the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
[7] In some embodiments, the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189. In some embodiments, the frailty prediction model is configured to provide a frailty indicator corresponding to an explained variance of at least 1.5.
[8] In some embodiments, the method further comprises generating a frailty indicator for the frailty assessment in response to the frailty prediction model, wherein the frailty prediction model comprises a plurality of coefficients for the plurality of SNPs, and wherein generating the frailty indicator comprises performing a calculation using the plurality of coefficients. In some cases, the plurality of coefficients comprises a respective coefficient corresponding to each of the plurality of SNPs.
[9] In some embodiments, the method further comprises: receiving a plurality of coefficients for the plurality of SNPs; and generating a frailty indicator for the frailty assessment in response to the frailty prediction model using the plurality of coefficients. In some cases, the plurality of coefficients comprises a respective coefficient corresponding to each of the plurality of SNPs.
[10] In some embodiments, the method further comprises receiving a frailty indicator and generating the frailty assessment parameter based on the frailty indicator. In some cases, the frailty indicator is indicative of a relative hazard value. In some cases, the frailty indicator is indicative of an expected lifespan at a predetermined age of the individual. In some cases, the frailty indicator is indicative of an assessment of genetic predisposition for premature death. In some embodiments, the method further comprises receiving data indicative of genetic information of the user for the frailty assessment.
[11] In another aspect, a method for generating a frailty prediction model is provided. In some embodiments, the method comprises: constructing a training model based on mortality data of a first population of deceased individuals; determining, using the training model, a frailty index for individuals of a second population; identifying a plurality of SNPs associated with human frailty based at least on (1) genetic data of the individuals of the second population and the (2) frailty index for the individuals of the second population; and generating the frailty prediction model based on the plurality of SNPs, wherein the frailty prediction model is configured to determine a frailty indicator of an individual.
[12] In a related yet separate aspect, a tangible storage medium is provided. The tangible storage medium comprises instructions configured to: construct a training model based on mortality data of a first population of deceased individuals; determine, using the training model, a frailty index for individuals of a second population; identify a plurality of SNPs associated with human frailty based at least on (1) genetic data of the individuals of the second population and the (2) frailty assessment for the individuals of the second population; and generate a frailty prediction model based on the plurality of SNPs, wherein the frailty prediction model is configured to determine a frailty indicator of an individual. In another aspect, a system for generating a frailty prediction model, comprising the tangible storage medium and a processor configured to execute the instructions is provided.
[13] In some embodiments, the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, and rs4332427. In some embodiments, the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189. In some embodiments, constructing the training model comprises constructing a Cox proportional hazard model. In some embodiments, the second population of individuals comprises living individuals. In some embodiments, generating the frailty prediction model comprises generating a linear regression model such as a weighted sum of the plurality of SNPs.
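As a minimal sketch of the weighted-sum form mentioned above, assuming each SNP is encoded as the number of effect alleles carried (0, 1 or 2), a frailty indicator could be computed as follows. The coefficient values, the intercept, and the function name frailty_indicator are hypothetical placeholders chosen for illustration; they are not the coefficients of the disclosed model.

```python
# Hypothetical per-SNP coefficients (effect per copy of the effect allele).
# These numbers are illustrative placeholders, not values from the disclosure.
COEFFICIENTS = {
    "rs76207570": 0.5,
    "rs3811444": -0.25,
    "rs9272588": 0.125,
}


def frailty_indicator(dosages, coefficients=COEFFICIENTS, intercept=0.0):
    """Return intercept + sum(coefficient * dosage) over the model's SNPs.

    `dosages` maps rsID -> number of effect alleles carried (0, 1 or 2).
    A KeyError is raised if a SNP required by the model is missing.
    """
    total = intercept
    for rsid, beta in coefficients.items():
        dosage = dosages[rsid]
        if dosage not in (0, 1, 2):
            raise ValueError(f"unexpected dosage {dosage!r} for {rsid}")
        total += beta * dosage
    return total


# Example genotype: two, one and zero copies of the respective effect alleles.
example_genotype = {"rs76207570": 2, "rs3811444": 1, "rs9272588": 0}
example_score = frailty_indicator(example_genotype)
```

Under this encoding a positive coefficient raises the indicator with each additional copy of the effect allele and a negative coefficient lowers it.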
[14] In some embodiments, the frailty prediction model is configured to receive as an input genetic data of the individual and provide as an output the frailty indicator in response to the input. In some embodiments, the frailty indicator is indicative of a relative hazard value, an expected lifespan at a predetermined age of the individual, a rate of death at a predetermined age of the individual relative to a rate of death at the predetermined age of a predetermined group of individuals, or an assessment of genetic predisposition for premature death.
[15] In some embodiments, the mortality data comprises death register data of the deceased individuals of the first population and non-genetic data for the deceased individuals of the first population. In some cases, the non-genetic data comprises a plurality of non-genetic traits. The plurality of non-genetic traits is selected from the group consisting of white blood cell count, red blood cell count, mean corpuscular volume, mean corpuscular hemoglobin, and platelet count.
[16] In some embodiments, the method further comprises receiving the mortality data from a preexisting database. In some embodiments, the frailty index is a logarithm of a hazard ratio value. In some cases, the hazard ratio value is a ratio between a hazard rate of an individual and a mean hazard rate of a predetermined group of individuals.
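The definition above can be written out as a short calculation. The following sketch assumes that hazard rates are already available as positive numbers; the function names and the example rates are hypothetical and serve only to illustrate the logarithm of the ratio between an individual hazard rate and the group mean.

```python
import math


def frailty_index(individual_hazard, group_hazards):
    """Logarithm of (individual hazard rate / mean hazard rate of the group)."""
    if individual_hazard <= 0 or not group_hazards:
        raise ValueError("hazard rates must be positive and the group non-empty")
    mean_hazard = sum(group_hazards) / len(group_hazards)
    return math.log(individual_hazard / mean_hazard)


def relative_hazard(index):
    """Invert the logarithm to recover the hazard ratio from a frailty index."""
    return math.exp(index)


# Hypothetical rates: the group mean is 2.0, so an individual rate of 4.0
# corresponds to a hazard ratio of 2 and a frailty index of log(2).
example_index = frailty_index(4.0, [1.0, 3.0])
```

An index of zero corresponds to an individual whose hazard rate equals the group mean; positive values indicate a higher-than-average hazard.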
[17] In some embodiments, generating the frailty prediction model comprises generating a coefficient for each of the plurality of SNPs, wherein the coefficient is indicative of an association with human frailty. In some cases, generating the coefficient comprises multiple linear regression analysis. In some embodiments, the frailty indicator determined by the frailty prediction model corresponds to an explained variance of at least 1.5. In some embodiments, identifying the plurality of SNPs comprises a genome-wide association study (GWAS). In some cases, identifying the plurality of SNPs comprises identifying a set of SNPs having a predetermined correlation with human frailty and further selecting a subset of SNPs from the set of SNPs. In some cases, conditional and joint analysis is used to select for the subset of SNPs. In some examples, the GWAS is configured to produce an effect allele frequency value of at most about 0.8.
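One way to picture how per-SNP coefficients might be obtained by multiple linear regression is the following sketch, which solves the least-squares normal equations in plain Python. The dosage matrix, the frailty index values, and the function name are assumptions made for illustration; a real analysis would likely rely on a dedicated statistics package and would adjust for covariates.

```python
def fit_multiple_linear_regression(rows, targets):
    """Least-squares fit of targets ~ intercept + sum(beta_j * x_j).

    `rows` holds one list of SNP dosages per individual; `targets` holds the
    corresponding frailty index values. Returns [intercept, beta_1, ...].
    """
    # Prepend a constant column so the first coefficient is the intercept.
    design = [[1.0] + [float(x) for x in row] for row in rows]
    k = len(design[0])
    # Augmented normal equations: (X^T X | X^T y).
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear SNPs?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Hypothetical data: dosages of two SNPs for six individuals, with a frailty
# index constructed as 0.5 + 0.5 * snp1 - 0.25 * snp2 (no noise).
example_dosages = [[0, 0], [1, 0], [2, 1], [0, 2], [1, 1], [2, 2]]
example_targets = [0.5 + 0.5 * a - 0.25 * b for a, b in example_dosages]
example_fit = fit_multiple_linear_regression(example_dosages, example_targets)
```

Because the example targets are noise-free, the fit recovers the generating intercept and coefficients up to floating-point error; with real data the estimates would carry sampling uncertainty.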
[18] In another aspect, a method of determining frailty is provided. The method comprises: receiving an input request for a frailty indicator; and generating the frailty indicator in response to a frailty prediction model based on a plurality of SNPs, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, or rs4332427.
[19] In some embodiments, the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189. In some embodiments, the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189. In some embodiments, the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
[20] In another aspect of the invention, a method of determining frailty is provided. The method comprises: receiving a plurality of coefficients for a plurality of SNPs; and generating a frailty indicator in response to a frailty prediction model based on the plurality of SNPs, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, or rs4332427.
[21] In some embodiments, the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189. In some embodiments, the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
[22] In some embodiments, the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189.
INCORPORATION BY REFERENCE
[23] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
[24] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
[25] FIG. 1 shows an exemplary process of building a frailty prediction model, in accordance with some embodiments of the invention.
[26] FIG. 2 shows an exemplary method of constructing a frailty prediction model and predicting longevity or frailty for an individual, in accordance with some embodiments.
[27] FIG. 3A shows an example of coefficients with high coefficient magnitudes (e.g., high correlation to frailty) and factors corresponding to the coefficients in Cox PH model, in accordance with some embodiments; FIG. 3B shows the logarithm of the CPHM hazard ratio increases linearly with age.
[28] FIG. 4 shows genome-wide significant loci that are associated with frailty or longevity, in accordance with some embodiments.
[29] FIG. 5A shows a plurality of traits having significant genetic correlation with the frailty index, in accordance with some embodiments; FIG. 5B shows genetic correlation matrix and clustering for traits having high genetic correlations with predicted frailty.
[30] FIG. 6 shows an exemplary network layout comprising frailty prediction systems, in accordance with some embodiments.
[31] FIG. 7 shows an example of a user device by which a user may access frailty prediction information, in accordance with some embodiments.
[32] FIG. 8 shows an exemplary process of generating longevity related result on a user device, in accordance with some embodiments.
[33] FIG. 9 shows a computer system that is programmed or otherwise configured to perform frailty prediction, in accordance with some embodiments.
[34] FIG. 10 shows a Manhattan plot for GWAS on frailty phenotype.
[35] FIG. 11 shows a list of candidate genes in regions associated with predicted frailty, as suggested by presence of missense mutations and/or SMR HEIDI.
DETAILED DESCRIPTION
[36] While various embodiments of the disclosure have been shown and described herein, those skilled in the art will understand that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed.
[37] The term “about” and its grammatical equivalents in relation to a reference numerical value can include a range of values up to plus or minus 10% from that value. For example, the amount “about 10” can include amounts from 9 to 11. The term “about” in relation to a reference numerical value can include a range of values plus or minus 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% from that value.
[38] The term “at least” and its grammatical equivalents in relation to a reference numerical value can include the reference numerical value and greater than that value. For example, the amount “at least 10” can include the value 10 and any numerical value above 10, such as 11, 100, and 1,000.
[39] The term “at most” and its grammatical equivalents in relation to a reference numerical value can include the reference numerical value and less than that value. For example, the amount “at most 10” can include the value 10 and any numerical value under 10, such as 9, 8, 5, 1, 0.5, and 0.1.
[40] As used herein, the singular forms “a”, “an”, and “the” can include plural referents unless the context clearly dictates otherwise. All technical and scientific terms used herein can have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs unless clearly indicated otherwise.
[41] The term “subject,” as used herein, generally refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species, or other organism, such as a plant. More specifically, the subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian or a human. Animals include, but are not limited to, farm animals, sport animals, and pets. A subject can be a healthy individual, an individual that has or is suspected of having a disease or a pre-disposition to the disease, or an individual that is in need of therapy or suspected of needing therapy. A subject can be any individual human being.
[42] The term “polynucleotide,” as used herein, generally refers to a molecule comprising one or more nucleic acid subunits. A polynucleotide can include one or more subunits selected from adenosine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), or variants thereof. A nucleotide can include A, C, G, T or U, or variants thereof. A nucleotide (nt) can include any subunit that can be incorporated into a growing nucleic acid strand. Such subunit can be an A, C, G, T, or U, or any other subunit that is specific to one or more complementary A, C, G, T or U, or complementary to a purine (i.e., A or G, or variant thereof) or a pyrimidine (i.e., C, T or U, or variant thereof). A subunit can enable individual nucleic acid bases or groups of bases (e.g., AA, TA, AT, GC, CG, CT, TC, GT, TG, AC, CA, or uracil-counterparts thereof) to be resolved. In some examples, a polynucleotide is deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), or variants or derivatives thereof. A polynucleotide can be single-stranded or double-stranded.
[43] The term “genome” generally refers to an entirety of an organism’s hereditary information. A genome can be encoded either in DNA or in RNA. A genome can comprise coding regions that code for proteins as well as non-coding regions. A genome can include the sequence of all chromosomes together in an organism. For example, the human genome has a total of 46 chromosomes. The sequence of all of these together constitutes a human genome.
[44] The term “genetic variant,” as used herein, generally refers to an alteration, variant or polymorphism in a nucleic acid sample or genome of a subject. Such alteration, variant or polymorphism can be with respect to a reference genome, which may be a reference genome of the subject or other individual. Single nucleotide polymorphisms (SNPs) are a form of polymorphisms. In some examples, one or more polymorphisms comprise one or more single nucleotide variations (SNVs), insertions, deletions, repeats, small insertions, small deletions, small repeats, structural variant junctions, variable length tandem repeats, and/or flanking sequences. Copy number variants (CNVs), transversions and other rearrangements are also forms of genetic variation. A genomic alteration may be a base change, insertion, deletion, repeat, copy number variation, transversion, or a combination thereof.
[45] The terms “frailty” and “longevity” are used interchangeably throughout this specification, and refer to a measure of one or more of an expected life span, genetic predisposition to premature death, rate of death, and frailty risk relative to average group. The terms “longevity associated genes” and “frailty associated genes” are used interchangeably throughout this specification and refer to genes with desired correlation with frailty or longevity.
Method Overview
[46] The methods and systems provide a frailty prediction model to predict longevity or frailty of an individual based on genetic data. The frailty prediction model is capable of producing a frailty indicator indicative of a frailty assessment, such as a frailty risk or longevity score, based on an individual’s genetic data. Frailty can be associated with disability, comorbidity, and other characteristics. Frailty can be strongly associated with higher mortality. The frailty prediction model computes the frailty indicator based on a relationship between a plurality of genetic factors and frailty of an individual. The genetic factors may comprise SNPs, indels, CNVs, and/or epigenetic markers. In many embodiments, the plurality of SNPs may be identified to be associated with frailty using an intermediate phenotype.
[47] FIG. 1 illustrates an exemplary process 100 of building a frailty prediction model, in accordance with embodiments of the invention. The process 100 can provide a frailty prediction model 116 configured to generate a frailty indicator for an individual in response to receipt of genetic data of the individual. The frailty prediction model 116 can generate the frailty indicator based on information relating to the presence of a plurality of SNPs in the genetic data of the individual. The frailty indicator for an individual can, for example, be used to generate a frailty assessment parameter for the individual. The frailty indicator can be indicative of one or more of an expected life span, genetic predisposition to premature death, rate of death, and frailty risk relative to average group. The frailty assessment parameter may be displayed to a user, such as via a graphical user interface of a user electronic device, in various forms. For example, the frailty assessment parameter may be displayed to the user as a graph, a numerical value, text, and/or a diagram. In some embodiments, the frailty prediction model 116 may comprise a multiple linear relationship between the plurality of SNPs and the frailty indicator. The process 100 for building the frailty prediction model 116 can comprise use of a first dataset of information for a first population of individuals, where the first dataset may not include genetic information for the first population of individuals, to train 110 a death prediction model 103. The death prediction model 103 can be used to generate a frailty index 112 that can be used in the selection of the plurality of SNPs. In some embodiments, the death prediction model 103 may be obtained by fitting a Cox proportional hazards model (Cox PH model) to available mortality data 101. The process 100 can thereby advantageously enable generation of the frailty prediction model 116 using information for a population of individuals where genetic information for the individuals is not available.
[48] The provided process 100 can comprise use of a second dataset 105 comprising information for a second population of individuals in the selection of the SNPs, where the second dataset 105 includes genetic information for individuals of the second population. The second dataset 105 can be an unrestricted dataset, for example comprising both genetic information and non-genetic information. The process 100 may allow for identification of SNPs having a desired association with frailty or longevity from the unrestricted dataset, such as the second dataset 105. As will be described in further detail herein, the death prediction model 103 can generate a computed frailty index for individuals of the second population in response to receipt of the non-genetic data 107 of the second population of individuals. The computed frailty index 112 can then be used, along with genetic data 109 of the second population, in the GWAS analysis for selecting the plurality of SNPs. The information used in the GWAS analysis can thereby be supplemented with the computed frailty index generated using non-genetic information of the second population of individuals. The available information may be enlarged by supplementing available genetic data 109 with a frailty phenotype, such as the frailty index 112. The frailty phenotype may be an intermediate phenotype used to identify the plurality of SNPs having the desired association with frailty. One or more longevity associated SNPs may be identified using genome-wide association studies (GWAS) applied to the unrestricted dataset, which cannot be achieved using conventional methods. This unrestricted dataset is advantageous for providing a frailty prediction model with improved accuracy compared to a model constructed using a dataset with limited death register data or extremely successfully aging individuals.
[49] As illustrated in FIG. 1, a first dataset comprising mortality data 101 may be used to generate a death prediction model 103. The mortality data 101 may comprise death register data and non-genetic data of individuals from a first population. In some embodiments, the first population may comprise deceased individuals. In some embodiments, the first population may comprise individuals whose genetic data are not available. For example, the first population may comprise deceased individuals whose genetic data is not available.
[50] The mortality data 101 for the first population of individuals may comprise a plurality of factors. Any number of factors may be selected from the first dataset. For example, at least 10, 50, 100, 200, 300, 400, 500 factors may be selected for building the death prediction model. In some cases, the plurality of factors may comprise genetic factors and/or non-genetic factors. In some cases, the plurality of factors consists of non-genetic factors. As described herein, in some cases, genetic information for the first population of individuals may not be available such that the mortality data 101 for the individuals do not include genetic information for the individuals. A factor may be represented by one or more data fields. A data field of a factor may comprise any numerical format such as continuous value or binary value. The plurality of factors may include various factors positively or negatively associated with frailty. Genetic factors may comprise one or more genes or genetic markers having a positive or negative correlation with frailty. The non-genetic factors may comprise various traits of the individuals of the first population. The plurality of non-genetic factors may or may not change over time. Some of the factors may be time-dependent factors such that the data field of these factors changes over time. One or more of these factors may change only once over time or more than once over time. In some cases, the non-genetic factors may be selected from the group consisting of waist-to-hip ratio, depressive symptoms, lung cancer, asthma, cigarette smoking, education, age at first birth, parental age-at-death, pulse rate, sex, lymphocyte count, and various other data. 
In some cases, the non-genetic factors may comprise biomarkers selected from the group consisting of cholesterol, direct low density lipoprotein, HDL-cholesterol, triglyceride, apolipoprotein A, apolipoprotein B, C-reactive protein, vitamins, rheumatoid factor, alkaline phosphatase, calcium, testosterone, sex hormone-binding globulin, oestradiol, insulin-like growth factor, hemoglobin A1c, glucose, cystatin C, creatinine, protein, urea, phosphate, urate, sodium, microalbumin, potassium, bilirubin, gamma-glutamyltransferase, alanine aminotransferase, aspartate aminotransferase, or any combination thereof. Information relating to the plurality of factors may be obtained from blood assay results, cognitive tests, physical measurements (e.g., spirometry, anthropometry, blood pressure, grip strength), lifestyle and environment historical record, health and medical history, family history, psychosocial factors, early life factors and/or various others.
[51] The mortality data 101 may comprise death register data. The death register data can be indicative of a life span of an individual, for example providing information of the length of time an individual was alive. The death register data can be in various forms. In some cases, the death register data may be right-censored mortality data representing follow-up time in years for each subject. This may be defined as the difference between the date a person was last observed or the date of death, and the date the person took a blood test. In some cases, the death register data for an individual may comprise a Boolean marker indicating whether death occurred, for example having a binary value representative of whether the individual is deceased. The life span of the individual can then be derived from the Boolean marker.
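As an illustration of the right-censored representation described above, the following minimal sketch derives a follow-up time in years and a Boolean death marker from three dates. The function name `follow_up` and the 365.25-day year are assumptions made for the example and are not part of the disclosure.

```python
from datetime import date

def follow_up(assessment, last_observed, death=None):
    """Return (follow-up time in years, death marker) for one individual.

    `assessment` is the date of the blood test; `death` is None when the
    individual is still alive at `last_observed` (right-censored record).
    The 365.25-day year is an illustrative convention, not from the source.
    """
    end = death if death is not None else last_observed
    years = (end - assessment).days / 365.25
    return years, death is not None

# Hypothetical records: one deceased individual, one censored individual.
deceased = follow_up(date(2010, 1, 1), date(2015, 1, 1), death=date(2015, 1, 1))
censored = follow_up(date(2010, 1, 1), date(2016, 1, 1))
```

The pair of values is the usual input to a survival model: the time is used for every individual, while the marker distinguishes observed deaths from censored observations.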
[52] For each individual in the first population, data for one or more of the factors may or may not be available. When data is missing, the missing values may be replaced or imputed with substitute values such as mean values, and the data may be normalized to zero mean and unit variance. A variety of other data imputation methods can be used to replace the missing data.
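A minimal sketch of mean imputation followed by normalization to zero mean and unit variance is given below. The helper name `impute_and_standardize`, the use of `None` to mark a missing value, and the sample pulse-rate values are assumptions for illustration only.

```python
from statistics import mean, pstdev

def impute_and_standardize(values):
    """Fill missing entries (None) with the observed mean, then z-score."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    filled = [fill if v is None else v for v in values]
    mu = mean(filled)
    sigma = pstdev(filled)  # population standard deviation
    if sigma == 0:
        # A constant factor carries no information; map it to zeros.
        return [0.0 for _ in filled]
    return [(v - mu) / sigma for v in filled]

# Hypothetical pulse-rate factor with one missing measurement.
pulse_rate = [60.0, None, 80.0, 70.0]
z = impute_and_standardize(pulse_rate)
```

After this step an imputed entry sits at zero, so it contributes nothing to a linear predictor built from the standardized factor.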
[53] The mortality data 101 may be used to train 110 a death prediction model 103. For example, the mortality data 101 can be used to generate coefficients for the death prediction model 103. The death prediction model 103 may represent a relationship between a frailty index of an individual and a plurality of factors as described above. The death prediction model 103 may be used to compute frailty indices based on genetic and/or non-genetic data. As described herein, in some cases, the mortality data 101 may not include genetic data such that the death prediction model 103 is generated using non-genetic factors, such as the plurality of factors described herein.
[54] In some cases, the death prediction model 103 may be obtained by fitting a Cox proportional hazards model (Cox PH model) to the mortality data 101. The Cox PH model can provide survival-time (time-to-event) outcomes (e.g., frailty) on one or more predictor variables (e.g., non-genetic factors as described herein). The Cox PH model can be used to model a relationship between the predictor variables and a hazard rate for an individual or a hazard ratio for the individual. Hazard ratios can be a comparison of the hazard rates of event occurrence for an individual of a group and a hazard rate of the group. For example, once trained, the Cox PH model can be used to estimate hazard ratios for individuals based on the predictor variables. The registered death data for an individual can be provided as a cumulative hazard or hazard ratio and the plurality of factors can be provided as the one or more predictor variables. Cumulative hazard at a time t can be the risk of dying between time 0 and time t, and the survivor function at time t is the probability of surviving to time t (e.g., exponential function). In some cases, the training procedure 110 can be used to generate a plurality of coefficients for the death prediction model 103. For example, the training procedure 110 can be used to generate coefficients for the Cox PH model. The coefficients can be indicative of correlations between the plurality of factors and a hazard rate or hazard ratio. For example, the sign of the coefficient, positive or negative, can indicate the direction of correlation with the hazard rate or hazard ratio. A positive coefficient can indicate positive correlation with an increased hazard rate (e.g., a worse prognosis for life expectancy) and a negative coefficient can indicate a negative correlation with an increased hazard rate (e.g., a protective effect of the variable with which it is associated). It should be noted that other survival models can also be used for the death prediction model to generate the estimated frailty as an intermediate phenotype. For example, an accelerated life model or other kinds of proportional hazard models such as exponential and Weibull models may also be used for computing an estimated frailty index as the intermediate phenotype.
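For reference, the standard textbook form of the Cox PH model and the associated cumulative hazard and survivor function can be written as follows. The symbols (baseline hazard h_0, coefficients beta_j, factor values x_j) are conventional notation assumed for illustration and do not appear in the disclosure.

```latex
h(t \mid x) = h_0(t)\,\exp\!\Big(\sum_{j} \beta_j x_j\Big), \qquad
H(t \mid x) = \int_0^t h(s \mid x)\,ds, \qquad
S(t \mid x) = \exp\!\big(-H(t \mid x)\big)
```

In this form a positive beta_j raises the hazard as x_j increases and a negative beta_j lowers it, which matches the interpretation of coefficient signs given above.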
[55] The death prediction model 103 may comprise a plurality of coefficients corresponding to the plurality of factors respectively. In some cases, the death prediction model 103 can comprise a coefficient for each of the plurality of factors. For example, the training procedure 110 can generate a corresponding coefficient for each of the plurality of factors.
[56] A hazard rate of an individual can be calculated using the trained death prediction model 103. In some embodiments, a hazard ratio may be computed and used as an intermediate phenotype for an individual. In some cases, the hazard ratio may be calculated as the ratio of an individual’s hazard rate over a group’s mean hazard rate. For example, a hazard ratio of 2 is thought to mean that an individual has twice the chance of dying as a comparison group (e.g., group average). The group may be a baseline group or comparison group. The group may be of the same cohort as the individual. The group may be categorized in various different ways such as gender, age, ethnicity and the like. In some cases, a logarithm of the hazard ratio may be used as the frailty index or intermediate phenotype of an individual. The hazard ratio may be constant over time. In some cases, the hazard ratio may vary depending on time.
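The hazard-ratio calculation described above can be sketched as follows, assuming a proportional hazards model in which each individual's hazard rate is a shared baseline multiplied by the exponential of a linear risk score. The function name `log_hazard_ratios` and the sample scores are hypothetical.

```python
import math

def log_hazard_ratios(risk_scores):
    """Log of each individual's hazard rate over the group's mean hazard rate.

    `risk_scores` are linear predictors (sum of coefficient * factor value).
    Under proportional hazards the baseline hazard is common to everyone,
    so it cancels in the ratio and only exp(score) is needed.
    """
    relative_hazards = [math.exp(s) for s in risk_scores]
    group_mean = sum(relative_hazards) / len(relative_hazards)
    return [math.log(h / group_mean) for h in relative_hazards]

# Two hypothetical individuals with relative hazards of 1 and 3.
scores = [0.0, math.log(3.0)]
frailty_index = log_hazard_ratios(scores)
```

Here the group mean relative hazard is 2, so the two individuals have hazard ratios of 0.5 and 1.5, and the frailty index is the logarithm of each.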
[57] The trained death prediction model 103 can then be used for computation of a frailty index 112 for each individual from a second dataset 105. For example, the death prediction model 103 can generate a frailty index 112 for an individual of a second population in response to receiving information relating to the predictor variables of the death prediction model 103. In some cases, the computed frailty index can be a hazard rate. In some cases, the computed frailty index 112 can be a hazard ratio. In some cases, the computed frailty index 112 may be the logarithm of a hazard ratio as described herein. The computed frailty index 112 may be a measure of the frailty phenotype or intermediate phenotype. In some embodiments, the second dataset 105 may comprise both genetic data 109 and non-genetic data 107. For example, the second dataset 105 may be from individuals for which both genetic data 109 and non-genetic data 107 are available. The second dataset may comprise data for a second population of individuals. The second population comprises living individuals. The living individuals may have genetic data available. The second dataset can comprise the genetic data of the second population of individuals.
[58] As described herein, the death prediction model 103 can be trained using non-genetic information from a first population of individuals. The trained death prediction model 103 can be used to determine frailty indices for individuals of a second population. In some embodiments, the non-genetic data 107 of the second population may comprise one or more of the factors that may be used to generate the computed frailty index 112 for individuals of the second population using the death prediction model 103. In some embodiments, the computed frailty index 112 may comprise a hazard ratio or logarithm of hazard ratio of an individual in the second population using the fitted death prediction model 103.
[59] As described herein, coefficients for the death prediction model 103 can be generated using a plurality of factors from the mortality data 101 of the first population of individuals, where the coefficients can correspond to the plurality of factors. Information relating to the plurality of factors from the non-genetic data 107 can be provided to the death prediction model 103 to generate the computed frailty index 112 for individuals of the second population. For example, information relating to the plurality of factors for each individual of the second population of individuals can be provided to the death prediction model 103 such that a computed frailty index 112 can be generated for each individual of the second population. As described herein, the second dataset 105 for the second population can comprise both genetic data 109 and non-genetic data 107. Using this method, an individual in the second population may have both genetic data 109 available and a computed frailty index 112 associated with the individual as a phenotype.
[60] Next, GWAS may be performed using the genetic data 109 and the frailty index 112 to identify SNPs having a desired correlation with longevity or frailty 114. A plurality of genetic markers may be identified by the GWAS meta-analysis as having a desired association with frailty at the genome-wide level. In some cases, because not all of these longevity or frailty associated genes are typically present on a genotyping array, imputation of these missing genes may be performed. A number of genome-wide significant loci may be identified to have consistent association with frailty and other traits. The plurality of SNPs may be selected via a genome-wide SNP selection procedure. In some cases, methods for selecting the plurality of SNPs can comprise conditional and joint multiple-SNP analysis of the GWAS results, which may be used to analyze the correlation of the associated SNPs with frailty.
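At its core, a GWAS on a quantitative phenotype tests each SNP separately for association with the phenotype. The sketch below shows only the per-SNP effect estimate as an ordinary least-squares slope of the frailty index on allele dosage; it is a simplification that omits covariate adjustment, p-values, and conditional and joint analysis. The function name `snp_effect` and the sample values are assumptions.

```python
def snp_effect(dosages, phenotype):
    """Least-squares slope of phenotype on effect-allele dosage (0, 1 or 2)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    return sxy / sxx

# Hypothetical dosages for one SNP and computed frailty indices
# for four individuals of the second population.
dosages = [0, 1, 2, 1]
frailty_index = [0.1, 0.3, 0.5, 0.3]
beta = snp_effect(dosages, frailty_index)
```

In this toy data each additional copy of the effect allele raises the frailty index by about 0.2. A production analysis would typically rely on dedicated GWAS software rather than hand-written code.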
[61] After a list of longevity associated SNPs is identified, a regression analysis, such as a multiple regression analysis, may be performed to build the frailty prediction model 116 based on the plurality of identified SNPs. The regression analysis may generate a plurality of coefficients. Such regression analysis may produce a coefficient for each SNP. For example, the regression analysis may be used to generate a plurality of coefficients, the plurality of coefficients comprising a coefficient which corresponds to a respective SNP of the plurality of SNPs. In some cases, each SNP can have a corresponding coefficient. The coefficient for an SNP can be indicative of a weight assigned to the corresponding SNP in the frailty prediction model. The coefficient can be indicative of the strength and/or direction of correlation between the SNP and the frailty indicator generated by the frailty prediction model.
[62] In some cases, the frailty prediction model 116 may comprise a linear regression (e.g., multivariate regression). Using this frailty prediction model, an individual’s frailty assessment index can be computed as a weighted sum of the list of SNPs. The weights may correspond to the plurality of coefficients obtained from the multiple linear regression analysis. The frailty prediction model can be used to generate a frailty assessment of an individual in response to receiving genetic information of the individual, such as genetic information relating to the presence or absence of the plurality of SNPs.
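The weighted sum described above can be sketched as follows. The SNP identifiers are taken from this disclosure, but the coefficient values, the dosage encoding (number of effect alleles), and the function name `frailty_indicator` are hypothetical and chosen only for illustration.

```python
def frailty_indicator(dosages, coefficients, intercept=0.0):
    """Weighted sum of effect-allele dosages using per-SNP coefficients.

    SNPs absent from `dosages` are counted as zero here; this is an
    illustrative shortcut, and imputation could be used instead.
    """
    return intercept + sum(
        weight * dosages.get(snp, 0) for snp, weight in coefficients.items()
    )

# Hypothetical coefficients; real values would come from the regression analysis.
coefficients = {"rs76207570": 0.02, "rs9272588": -0.01, "rs4332427": 0.03}
# Hypothetical genotype: number of effect alleles carried at each SNP.
individual = {"rs76207570": 2, "rs9272588": 1, "rs4332427": 0}
score = frailty_indicator(individual, coefficients)
```

With these made-up numbers the indicator is 2 x 0.02 - 1 x 0.01 + 0 x 0.03 = 0.03.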
[63] The frailty prediction model 116 may have a performance metric. The performance metric indicates an improved accuracy compared to other models. In examples, residuals from the multivariate regression analysis may be used as a target for further estimations with the list of SNPs. These residuals should not have correlations with sex, age and genetic principal components or other traits that were analyzed in the multivariate regression analysis. Next, another multivariate regression model may be built using the residual values as target and the list of SNPs as predictors. In this scenario, an explained variance may be used as the performance metric. In an example, a set of 23 identified SNPs (shown in FIG. 4) accounts for 0.7% of variation in the residual values. Any number of the 23 identified SNPs can be used for building the frailty prediction model. For example, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, or 23 of the identified SNPs may be used to fit the frailty prediction model. When the 685 SNPs significantly associated with the phenotype in the GWAS were used, i.e., the SNPs that passed the genome-wide significance threshold of p < 5 × 10^-8, the explained variance is 1.5%. In some cases, the provided frailty prediction model may have an explained variance in a range of about 1.5% to about 5%, including in a range of about 1.5% to about 4.5%, about 1.5% to about 3.5%, about 1.5% to about 2.5%, about 2% to about 4%, about 3% to about 4%, about 3% to about 5%, or about 4% to about 5%.
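Explained variance can be computed as the fraction of variation in the target that the model's predictions account for. The sketch below uses the common coefficient-of-determination definition, which is an assumption since the disclosure does not give a formula; the function name and sample values are hypothetical.

```python
def explained_variance(targets, predictions):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_t = sum(targets) / len(targets)
    ss_total = sum((t - mean_t) ** 2 for t in targets)
    ss_residual = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    return 1.0 - ss_residual / ss_total

# Hypothetical residual phenotype values and SNP-based predictions.
residual_values = [1.0, 2.0, 3.0, 4.0]
snp_predictions = [1.5, 2.0, 3.0, 3.5]
r2 = explained_variance(residual_values, snp_predictions)
```

The toy data give 0.9 only because they were chosen for easy arithmetic; the values reported in this disclosure are on the order of 0.007 to 0.015.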
[64] The frailty prediction model 116 may be used to predict lifespan. Frailty change can be easily translated into lifespan change measured in years for a given population. For example, when using the top 22 SNPs among the 23 identified SNPs (in some cases rs143761991 was excluded due to low imputation quality), the mean effect of the 22 SNPs is 0.37 years per allele. In some cases, it may not be possible for the effect signs of all 22 SNPs to be the same for any given genotype set; in such scenarios, the cumulative lifespan effect for the discovery cohort may be calculated. It was shown that 2.5% of the population have a lifespan at least 1.81 years shorter, and the other extreme 2.5% of the population (above the 97.5th percentile) have a lifespan 2.21 years longer.
[65] In some cases, the traits including the frailty index being analyzed by the multiple regression analysis may be adjusted for sex, age, and principal components of the genetic data (e.g., sex, age, and principal components of the genetic data are taken into account as variables). In some cases, data imputation may be performed to substitute missing SNP data.
[66] Example of frailty prediction model
[67] FIGs. 2-5 show an example of frailty prediction model obtained using the provided method and system. FIG. 2 shows an exemplary method 200 of constructing a frailty prediction model and generating a frailty assessment for an individual using the frailty prediction model. The frailty prediction model can predict the longevity or frailty of an individual. The method may be used to identify a list of SNPs with significant association with frailty. The list of SNPs may be identified by performing GWAS on a dataset comprising computed frailty index as an intermediate phenotype. A frailty prediction model is then constructed based on the identified SNPs such that frailty or longevity related information can be generated for an individual using genetic data of the individual, for example to provide a frailty indicator for the individual.
[68] In the example, the computed frailty index may be computed using a trained death prediction model. The death prediction model may be the Cox PH model as described in FIG. 1. Mortality data may be used for building the Cox PH model. The mortality data can be from any available database such as United Kingdom Biobank and any other resource. The mortality data as described above may comprise death register data and a plurality of factors. The mortality data may not comprise genetic data. For example, the Cox PH model can be trained using data which does not comprise genetic data, enabling application of the step to datasets for which genetic data is not available. The Cox PH model can be trained using data from individuals for which genetic data is not available.
[69] In some cases, data processing may be performed to prepare the mortality data for generating the Cox PH model 201. Mortality data preparation may comprise data normalization, data imputation, and various other methods. For instance, when a data field represents continuous or integer values, a Box-Cox transformation may be applied to transform the non-normal variables into a normal shape. In another example, data imputation may be performed in case of missing data. For instance, missing data may be substituted with mean values and normalized to zero mean and unit variance.
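A minimal sketch of the Box-Cox transformation for a fixed exponent is shown below. In practice the exponent is usually estimated from the data (for example by maximum likelihood), which is omitted here; the function name `box_cox` and the sample values are assumptions.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform: log(x) when lam == 0, else (x**lam - 1) / lam.

    The transform is defined only for strictly positive inputs.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

# Hypothetical right-skewed biomarker values.
skewed = [1.0, 4.0, 9.0, 100.0]
log_scaled = box_cox(skewed, 0.0)   # logarithm
root_scaled = box_cox(skewed, 0.5)  # shifted and scaled square root
```

Smaller exponents compress large values more strongly, which is what pulls a right-skewed distribution toward a normal shape.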
[70] Next, the Cox PH model is fitted to the mortality data 203. FIG. 3A shows an example of coefficients with high coefficient magnitudes (e.g., high correlation to frailty) and corresponding variables. In the example, such coefficients may be produced based on mortality data from a group of individuals whose genetic data are not available. The trained Cox PH model may be evaluated for performance or goodness of fit. For example, the concordance index may be calculated between the predicted hazard ratio generated by the trained Cox PH model and the hazard ratio based on actual death register data to assess the performance. The Cox PH model constructed using the provided method may have a concordance index of at least 0.7. The concordance index of the Cox PH model may be about 0.8.
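As a non-limiting sketch, a concordance index of the kind referred to above might be computed in the manner of Harrell's C statistic. The function name and the argument layout (follow-up times, event flags where 1 indicates an observed death, and predicted risk scores) are assumptions made for this example.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs in which the earlier death has the higher risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when individual i is observed to die
            # before the follow-up time of individual j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and a value of 1.0 to perfect ordering, so the values of 0.7 and 0.8 mentioned above lie between the two.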
[71] As expected from the nature of empirical mortality, the logarithm of the Cox PH model (CPHM) hazard ratio increases linearly with age as shown in FIG. 3B, a manifestation of the progressive reduction of the organism’s vitality. The linear fit slope corresponds to a Gompertz exponent value of 0.083 yr⁻¹, consistent with the accepted Mortality Rate Doubling Time (MRDT) value of about 8 years. This shows that the mortality model produces a fair measure of the age-dependent hazard to predict the chances of, or time to, death for the UKB participants.
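The relationship between the fitted slope and the doubling time can be checked with a short sketch. It assumes a hazard that grows exponentially with age, so that the doubling time equals ln(2) divided by the Gompertz exponent; the function names are illustrative only.

```python
import math

def fit_slope(ages, log_hazard_ratios):
    """Least-squares slope of the log hazard ratio versus age."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_lhr = sum(log_hazard_ratios) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (y - mean_lhr)
              for a, y in zip(ages, log_hazard_ratios))
    return sxy / sxx

def mortality_rate_doubling_time(gompertz_exponent):
    """Years for the hazard to double when it grows as exp(exponent * age)."""
    return math.log(2.0) / gompertz_exponent
```

For an exponent of 0.083 per year the doubling time evaluates to roughly 8.4 years, in line with the value of about 8 years cited above.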
[72] Referring back to FIG. 2, the constructed Cox PH model may then be used for generating an intermediate phenotype for each individual in a second population 205. The intermediate phenotype may be a computed frailty index for an individual. The intermediate phenotype may be the logarithm of a hazard ratio computed using the Cox PH model. The frailty index provides a measure of an individual’s frailty and susceptibility to premature death. As described elsewhere herein, the hazard ratio is the hazard rate of an individual relative to the mean hazard rate of a group of individuals. For example, the hazard ratio for an individual of the second population can be the ratio of a hazard rate of the individual to a mean hazard rate of the second population.
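A minimal sketch of this computation is given below. It assumes that each individual's linear predictor (the sum of Cox PH coefficients multiplied by that individual's covariate values) is already available, and that the baseline hazard cancels in the ratio; the function name is hypothetical.

```python
import math

def log_hazard_ratios(linear_predictors):
    """Log of each individual's hazard relative to the population mean hazard."""
    # Under proportional hazards the baseline hazard is shared, so the
    # relative hazard of an individual is exp(linear predictor).
    relative_hazards = [math.exp(lp) for lp in linear_predictors]
    mean_hazard = sum(relative_hazards) / len(relative_hazards)
    return [math.log(h / mean_hazard) for h in relative_hazards]
```

An individual whose hazard equals the population mean receives a frailty index of zero, with positive values indicating above-average hazard.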
[73] As described herein, as genetic data for the second population of individuals is available, the computed frailty index and genetic data from the second dataset can be used to identify a plurality of SNPs associated with frailty 207. The second dataset in the example is from the UK Biobank. A GWAS is performed using the genetic data and the frailty index to identify the plurality of SNPs. In some cases, the frailty index may be adjusted for sex, age, SNP array, or genetic principal components before being used as the phenotype in the GWAS. In some cases, data imputation may be applied. Imputation may be performed using one or more reference panels. For example, a plurality of imputed autosomal and directly typed X-chromosome variants may be used to identify a plurality of longevity associated SNPs shown in FIG. 4. As shown in FIG. 4, 23 genome-wide significant loci are located on autosomes. Such SNPs are identified according to the p value. For example, a threshold may be determined and SNPs with a p value below such threshold may be identified as the significant SNPs. In some cases, the GWAS may be performed for multiple traits including the frailty index and blood traits. The blood traits may be from the second dataset such as the UK Biobank. The blood traits may comprise: white blood cell (leukocyte) count, red blood cell (erythrocyte) count, mean corpuscular volume, mean corpuscular haemoglobin and platelet count. These traits may be adjusted for sex, age, genotyping array platform and genetic principal components (e.g., the first ten genetic principal components) using a linear regression model.
[74] In another embodiment, a set of associated SNPs may be selected. The set of SNPs may be selected using conditional and joint analysis. The conditional and joint analysis method may be used for estimating the joint effects of multiple SNPs for a quantitative trait such as the frailty index. The conditional and joint analysis method may use meta-analysis summary statistics and one or more reference panels of SNPs to estimate the linkage disequilibrium (LD). A subset of SNPs with a p value below a certain threshold may be selected. In a multiple regression framework, the following stepwise selection strategy may be used to select the associated SNPs iteratively over all the SNPs across the whole genome, regardless of their P values from the meta-analysis, except for the most significant SNP, which is used for model initiation. In some cases, a subset having a pre-determined number of SNPs may be selected. The steps may include:
(1) Start with a model with the most significant SNP in the meta-analysis with a P value below a cutoff P value, such as 5 × 10⁻⁸.
(2) For the t-th step, calculate the P values of all the remaining SNPs conditional on the SNP(s) that have already been selected in the model. To avoid problems due to collinearity, if the squared multiple correlation between a SNP to be tested and the selected SNP(s) is larger than a cutoff value, such as 0.9, the conditional P value for that SNP will be set to 1.
(3) Select the SNP with minimum conditional P value that is lower than the cutoff P value. However, if adding the new SNP causes new collinearity problems between any of the selected SNPs and the others, drop the new SNP and repeat this process.
(4) Fit all the selected SNPs jointly in a model and drop any SNP whose P value is greater than the cutoff P value.
(5) Repeat processes (2), (3) and (4) until no SNPs can be added or removed from the model or when the target set reaches the pre-determined number of SNPs.
It should be noted that the cutoff p value may be determined by a user. Different cutoff p values may lead to a linear regression model comprising more or fewer SNPs. The pre-determined number of SNPs can be any integer. For example, the pre-determined number can be at least 2, 5, 10, 15, 20, 25, 30, 35, 40 and the like. In some cases, the pre-determined number may be determined according to a desired prediction accuracy or computation cost. Different numbers of SNPs included in the subset for calculating a frailty index may be associated with different levels of accuracy. For example, a subset of 10 SNPs may produce a frailty prediction assessment more accurate than a subset of 2 SNPs. In some cases, an optimal number of SNPs (e.g., 20) may be determined such that the frailty index can be computed with low computation cost and sufficient accuracy.
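The control flow of steps (1) to (5) above can be illustrated with the following simplified sketch. It is not the implementation of any particular software package. The statistical computations are delegated to three hypothetical callables supplied by the caller (conditional_p, joint_p and r2_with_selected), the collinearity re-check of step (3) is omitted, and a SNP dropped in step (4) is not reconsidered, which is a simplification chosen here so that the loop is guaranteed to terminate.

```python
def stepwise_select(marginal_p, conditional_p, joint_p, r2_with_selected,
                    p_cutoff=5e-8, r2_cutoff=0.9, max_snps=None):
    """Select SNPs iteratively; marginal_p maps SNP id -> single-SNP P value."""
    # Step (1): initialise with the most significant SNP, if it passes the cutoff.
    top = min(marginal_p, key=marginal_p.get)
    if marginal_p[top] >= p_cutoff:
        return []
    selected, dropped = [top], set()
    # Step (5): repeat until nothing changes or the target size is reached.
    while max_snps is None or len(selected) < max_snps:
        # Step (2): conditional P values of the remaining SNPs.
        cond = {}
        for snp in marginal_p:
            if snp in selected or snp in dropped:
                continue
            if r2_with_selected(snp, selected) > r2_cutoff:
                cond[snp] = 1.0  # too collinear with the current model
            else:
                cond[snp] = conditional_p(snp, selected)
        if not cond:
            break
        # Step (3): add the SNP with the smallest conditional P value.
        best = min(cond, key=cond.get)
        if cond[best] >= p_cutoff:
            break
        selected.append(best)
        # Step (4): joint fit; drop SNPs that no longer pass the cutoff.
        weak = [s for s in selected if joint_p(s, selected) > p_cutoff]
        for snp in weak:
            selected.remove(snp)
            dropped.add(snp)
    return selected
```

The max_snps argument corresponds to the pre-determined number of SNPs, and p_cutoff to the user-determined cutoff p value.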
[75] Further analysis may be performed to identify and verify the plurality of SNPs. For example, in silico functional genomic analysis of associated regions may be performed. The list of SNPs may be prepared for functional annotation. Genetic correlation between the frailty index and a plurality of complex traits may be analyzed using blood parameters including white blood cell (leukocyte) count, red blood cell (erythrocyte) count, mean corpuscular volume, mean corpuscular haemoglobin, and platelet count. FIG. 5A shows a plurality of traits having significant genetic correlation with frailty. Next, these genetically correlated traits and the identified SNPs may be analyzed to identify associated regions. Additionally, a HEIDI test may be performed to verify and further identify the longevity associated genes.
[76] Genome-wide association study
[77] In an embodiment, the plurality of SNPs may be identified using a genome-wide association study (GWAS). GWAS can be used to measure and analyze nucleic acid sequence variations (genetic variations) across the human genome to determine specific genetic risk factors observed in conditions that are shared among a group of individuals. Genetic variations can comprise mutations, such as single nucleotide polymorphisms (SNPs). A SNP may be a unique unit of genetic variation, which can function as a marker of the genomic region. Common conditions, such as diseases and frailty, may be impacted by genetic variations that are shared by a group of individuals. Aside from SNPs, other genetic variants may be a genetic polymorphism observed in the cohort studied, including but not limited to Copy Number Variation (CNV), chromosomal inversions, and any type of epigenetic variations. Genetic variations may be investigated at the level of haplotypes where sets of genetic variations are co-inherited. Associations between a phenotypical trait and a genetic variant may not necessarily mean that the variant is causative for the trait.
[78] Several methods for identifying a genetic variant may be known to a person skilled in the art. Any such method may be employed to determine if a genetic variation is associated with the phenotypical observation. Both discrete phenotypical observations such as eye color and continuous observations such as height may be associated with a genetic variation, whether a single SNP or a set of different genetic variations.
[79] In some embodiments, the sources of data, such as phenotypes and genotypes, for GWAS may be obtained from biobanks. Phenotypes and genotypes may be obtained directly from biobanks, such as the United Kingdom Biobank (UKB). In the discovery phase, select samples may be analyzed genome-wide for association with the estimated frailty in the GWAS. Select groups of samples may be removed as determined by factors such as quality control testing, genetically inferred sex mismatch, UKB recommended genomic analysis exclusions and samples that comprise genetic relatedness pairing. During such discovery phase, the logarithm of the hazard ratio prediction for each sample in the testing cohort may be determined as disclosed herein and can represent a measure of an individual’s frailty and susceptibility to premature death relative to a population’s mean risks and may be further used as a phenotype for GWAS. GWAS may be conducted for a select list of traits from the constructed Cox PH model and from other traits such as white blood cell (leukocyte) count, red blood cell (erythrocyte) count, mean corpuscular volume, mean corpuscular haemoglobin, and platelet count.
[80] When identifying frailty associations in GWAS, certain factors may be analyzed, such as how well the phenotypical descriptor corresponds to the genetic components, how frequently the genetic markers occur in the investigated cohort, the penetrance of the genetic marker(s), and whether the population is well stratified such that phenotypical traits are not confounded with origin of descent.
[81] Any method known to a person skilled in the art can be used for the GWAS. This may, for example, be any method that may be used in the identification of genetic markers on a genome-wide level including but not limited to genome-wide arrays or any form of DNA sequencing. Genome-wide microarrays for GWAS can include but are not limited to Affymetrix Genome-Wide Human SNP 6.0 arrays, Affymetrix Genome-Wide Human SNP 5.0 arrays, Illumina HD BeadChip, NimbleGen CGH Microarrays, and Agilent CGH.
[82] In one embodiment the GWAS may comprise the steps of analyzing the genetic variations of the genetic profiles and then identifying the genetic variants that throughout the sample cohort correlate and associate with the frailty index, thereby identifying a genetic variation that is associated with a phenotypical feature. When identifying the genetic variants that associate with the frailty index, imputation may be used to statistically infer unobserved genotypes.
[83] In some instances, the imputation of SNPs not on the genotype chip may be utilized for the study. Such a process may increase the number of SNPs that can be tested for association, increase the power of the study, and facilitate meta-analysis of GWAS across distinct cohorts. During imputation, known genetic variants in a population may be first obtained from sources such as the Haplotype Map of the Human Genome or the 1000 Genomes Project. Imputation may utilize information from the linkage disequilibrium (LD) structure in a sequence to infer the alleles of SNPs not directly genotyped in the study (hidden SNPs). LD is a property of SNPs on a contiguous stretch of genomic sequence and may be the non-random association of alleles at different loci in a given population. Additionally, LD can indicate the degree to which an allele of one SNP is inherited or correlated with an allele of another SNP within a population.
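As an illustration of the LD concept only, the degree of correlation between two SNPs might be approximated by the squared Pearson correlation of their genotype dosages (0, 1 or 2 copies of a chosen allele per individual). This is a common approximation for unphased data and is an assumption of this sketch; the function name is hypothetical.

```python
from statistics import fmean

def ld_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between the dosages of two SNPs."""
    mean_a, mean_b = fmean(dosages_a), fmean(dosages_b)
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a monomorphic SNP has no defined correlation")
    return (cov * cov) / (var_a * var_b)
```

Values close to 1 indicate that the genotype at one SNP largely determines the genotype at the other, which is the property that imputation relies on.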
[84] Genotype imputation may be performed by statistical methods that combine the GWAS data together with a reference panel of haplotypes. Such methods can be conducted by sharing haplotypes between individuals over portions of sequence to impute alleles. Examples of software packages available to impute genotypes from a genotyping array to reference panels, such as 1000 Genomes Project haplotypes, may include MaCH, Minimac, IMPUTE2, and Beagle. Prior to imputation, phasing tools such as SHAPEIT2 can allow for pre-phasing of input genetic variations, for improved imputation accuracy and computational performance. In other embodiments, the genotyping and imputation data may be obtained from the biobank. After groups of phenotypes have been selected for a study population, and the genotypes have been obtained using methods known to those skilled in the art, the statistical analysis of genetic data may then be performed.
[85] In some embodiments, the analysis of the genome-wide association data may be a series of single-locus statistical tests, examining each SNP independently for association to the phenotype. In other embodiments when there may be two groups of phenotypes, the quantitative traits may be analyzed using generalized linear model (GLM) approaches, such as the Analysis of Variance (ANOVA). Alternatively, the categorical case-control traits may be analyzed using either contingency table methods or logistic regression. The statistical tests may be adjusted for factors that are known to influence the trait, such as sex, age, study site, and known clinical covariates. Covariate adjustments of this sort may be important as they reduce the misleading associations due to sampling artifacts or biases in study design.
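A single-locus test for a quantitative trait can be sketched as a simple linear regression of the phenotype on the SNP dosage. The sketch below makes several assumptions that are not part of the disclosure: covariate adjustment is omitted, the phenotype is taken as already adjusted, and the two-sided p-value uses a normal approximation to the test statistic, which is reasonable only for large samples.

```python
import math
from statistics import fmean

def single_snp_test(dosages, phenotype):
    """Return (beta, standard error, approximate two-sided p-value) for one SNP."""
    n = len(dosages)
    mean_x, mean_y = fmean(dosages), fmean(phenotype)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx                      # regression coefficient
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2
              for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of beta
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation
    return beta, se, p_value
```

Running this function once per SNP and collecting the results yields the per-SNP regression coefficients and p-values that are examined in a genome-wide scan.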
[86] During GWAS, certain statistical parameters serve as indicators of the degree of association between the genetic variant and frailty, such as the effect allele frequency, the p-value, and the regression coefficient estimate (β-value). For each SNP, an effective allele and a reference allele may be determined. The effective allele may be the coded allele and the reference allele may be the non-coded allele. Most chips used in GWAS can distinguish between two genotypes at a given locus, which are the two alleles. As a result, the frequency of these alleles in the total population may be estimated from their frequency in a sample population. The frequency of the effective allele may comprise a value of at most about 0.8, at most about 0.5, at most about 0.3, or at most about 0.1. The p-value may refer to the statistical significance and probability of the association to a particular SNP. The p-value threshold may be set at most about 1 × 10⁻³, at most about 1 × 10⁻⁴, at most about 1 × 10⁻⁵, at most about 1 × 10⁻⁶, at most about 1 × 10⁻⁷, at most about 1 × 10⁻⁸, at most about 1 × 10⁻⁹, at most about 1 × 10⁻¹⁰, or at most about 1 × 10⁻¹⁵. The regression coefficient may be another important parameter to determine during statistical analysis in GWAS. The regression coefficient is a parameter that represents the strength of association between the SNP and the frailty index. Examples of software for analyzing quantitative phenotypic data and for association testing may comprise RegScan, SNPTEST, or PLINK.
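The two simplest of these quantities can be illustrated as follows. The sketch assumes diploid genotypes coded as the number of copies of the effective allele, and a summary table keyed by placeholder SNP identifiers; the identifiers, function names and default threshold are illustrative assumptions.

```python
def effect_allele_frequency(dosages):
    """Sample frequency of the effective allele from 0/1/2 genotype dosages."""
    # Each diploid individual carries two alleles at the locus.
    return sum(dosages) / (2 * len(dosages))

def significant_snps(summary, p_threshold=5e-8):
    """Identifiers of SNPs whose p-value is below the chosen threshold."""
    return sorted(snp for snp, stats in summary.items()
                  if stats["p"] < p_threshold)
```

Lowering p_threshold, for example from 1 × 10⁻³ toward 1 × 10⁻¹⁵, shortens the returned list, which mirrors the range of thresholds enumerated above.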
[87] Result of longevity or frailty associated SNPs
[88] A plurality of SNPs is identified to be associated with longevity or frailty using the method herein. For example, these SNPs can comprise one or more of those as listed in the table of FIG. 4. One or more of these SNPs can be used in the frailty prediction model as described herein. In some cases, the plurality of SNPs may comprise one or more of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, and rs4332427. For example, ancestral allele C (frequency 0.67) of rs3811444 is associated with increased frailty (p = 1.6 × 10⁻²⁰). In some cases, the plurality of SNPs comprises each of the SNPs rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, and rs4332427. In some cases, the plurality of SNPs comprises only some of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, and rs4332427. In some cases, the plurality of SNPs comprises one or more of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189. In some cases, the plurality of SNPs comprises each of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189. In some cases, the plurality of SNPs comprises only some of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
[89] In some cases, the plurality of SNPs can comprise one or more of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs35801134, rs6891621, rs7808664, rs13282106, rs10793962, rs150080415, rs3743445, rs9892942, rs7502233 and rs463312. In some cases, the plurality of SNPs can comprise each of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs35801134, rs6891621, rs7808664, rs13282106, rs10793962, rs150080415, rs3743445, rs9892942, rs7502233 and rs463312.
[90] The plurality of SNPs may comprise at least two SNPs such as rs76207570 and rs3811444, rs76207570 and rs2250127, rs76207570 and rs9272588, rs76207570 and rs55964818, rs76207570 and rs4332427, rs3811444 and rs2250127, rs3811444 and rs9272588, rs3811444 and rs55964818, rs3811444 and rs4332427, rs2250127 and rs9272588, rs2250127 and rs55964818, rs2250127 and rs4332427, rs9272588 and rs55964818, rs9272588 and rs4332427, or rs55964818 and rs4332427. The plurality of SNPs may comprise three SNPs selected from one of the following groups: {rs76207570, rs3811444, rs2250127}, {rs76207570, rs3811444, rs9272588}, {rs76207570, rs3811444, rs55964818}, {rs76207570, rs3811444, rs4332427}, {rs3811444, rs2250127, rs9272588}, {rs3811444, rs2250127, rs55964818}, {rs3811444, rs2250127, rs4332427}, or {rs9272588, rs55964818, rs4332427}. The plurality of SNPs may comprise four SNPs selected from one of the following groups: {rs2250127, rs9272588, rs55964818, rs4332427}, {rs76207570, rs9272588, rs55964818, rs4332427}, {rs76207570, rs3811444, rs55964818, rs4332427}, {rs76207570, rs3811444, rs2250127, rs4332427}, {rs76207570, rs3811444, rs2250127, rs9272588}, {rs3811444, rs9272588, rs55964818, rs4332427}, {rs3811444, rs2250127, rs55964818, rs4332427}, {rs3811444, rs2250127, rs9272588, rs4332427}, {rs3811444, rs2250127, rs9272588, rs55964818}, {rs76207570, rs2250127, rs55964818, rs4332427}, {rs76207570, rs2250127, rs9272588, rs4332427}, {rs76207570, rs2250127, rs9272588, rs55964818}, {rs76207570, rs3811444, rs9272588, rs4332427}, {rs76207570, rs3811444, rs9272588, rs55964818}, or {rs76207570, rs3811444, rs2250127, rs55964818}. The plurality of SNPs may comprise five SNPs selected from one of the following groups: {rs3811444, rs2250127, rs9272588, rs55964818, rs4332427}, {rs76207570, rs2250127, rs9272588, rs55964818, rs4332427}, {rs76207570, rs3811444, rs9272588, rs55964818, rs4332427}, {rs76207570, rs3811444, rs2250127, rs55964818, rs4332427}, {rs76207570, rs3811444, rs2250127, rs9272588, rs4332427}, or {rs76207570, rs3811444, rs2250127, rs9272588, rs55964818}.
[91] In some cases, one or more genetic markers having increased association with the frailty phenotype may be located within 10 kb, 50 kb, 100 kb, 200 kb, 300 kb, 400 kb, or 500 kb of any one of the SNPs described herein, such as SNPs selected from rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189. For example, the one or more genetic markers can be used in the frailty prediction model instead of and/or in addition to one or more SNPs.
[92] Referring back to FIG. 2, multiple linear regression analysis is applied to the selected set of associated SNPs and the frailty index to construct the frailty prediction model, adjusted for sex, age and the first ten genetic principal components 209. A set of coefficients associated with the SNPs is produced, and a frailty index can be calculated based on an individual’s genetic data. In the example, the frailty index may be the logarithm of the hazard ratio, which is calculated as the weighted sum of the set of SNPs.
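The weighted sum can be illustrated with a brief sketch. The SNP identifiers and weights used with it would come from the fitted regression model; the function names, the dictionary layout and the optional intercept are assumptions introduced for this example, and no actual coefficient values from the disclosure are reproduced.

```python
import math

def genetic_frailty_index(dosages, weights, intercept=0.0):
    """Weighted sum of effective-allele dosages over the model's SNPs."""
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"genotypes missing for: {sorted(missing)}")
    return intercept + sum(weights[snp] * dosages[snp] for snp in weights)

def hazard_ratio(frailty_index):
    """Hazard ratio implied by a frailty index defined as its logarithm."""
    return math.exp(frailty_index)
```

For instance, with hypothetical weights of 0.02 and -0.01 and dosages of 2 and 1 respectively, the index is 0.03, corresponding to a hazard ratio slightly above 1.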
[93] Next, a variety of longevity or frailty related metrics for an individual may be generated using the frailty prediction model 211, for example providing a frailty assessment for the individual. In some cases, the frailty assessment may comprise various longevity or frailty related metrics derived from the hazard ratio. The various longevity or frailty related metrics may include, for example, expected life span, rate of death, genetic predisposition to premature death, frailty risk relative to average group and the like.
[94] In some cases, a plurality of genes for human predicted frailty is identified using the provided method. The plurality of genes may comprise MACF1 and TRIM58. FIG. 11 shows a list of candidate genes in regions associated with predicted frailty among which MACF1 and TRIM58 are closely related to frailty.
[95] A system and method for predicting frailty or longevity of a user based on the frailty prediction model may be provided. FIG. 6 shows an example of a network layout 600 comprising frailty prediction systems 610, in accordance with some embodiments. In one aspect, network layout 600 may include one or more user devices 602, a server 604, a network 606, one or more databases 608, and a frailty prediction system 610. Each of the components 602, 604, 608, and 610 may be operatively connected to one another via the network 606. The network 606 may comprise any type of communication links that allow transmission of data from one electronic component to another.
[96] A user device may be, for example, one or more computing devices configured to perform one or more operations consistent with the disclosed embodiments. For example, a user device may be a computing device that is capable of executing software or applications provided by one or more frailty prediction systems. In some embodiments, the software and/or applications may provide a frailty or longevity related result to a user. The user may or may not be asked to provide user information via the software or applications. The software and/or applications may be provided on a frailty prediction server or locally on the user device. The server or the software may retrieve genetic data associated with the user stored in a database. The genetic data may be processed by the software or application to generate a frailty or longevity prediction result. Computation of the frailty or longevity prediction result may or may not require user input (e.g., sex, age, ethnicity, etc.). The frailty prediction software or application is designed to allow the user to obtain accurate frailty risk or longevity related information with minimum user input. The frailty prediction calculation may be hosted by the server on one or more interactive webpages, and accessed by one or more users.
[97] A user device can include, among other things, desktop computers, laptops or notebook computers, mobile devices (e.g., smart phones, cell phones, personal digital assistants (PDAs), and tablets), or wearable devices (e.g., smartwatches). A user device can also include any other media content player, for example, a set-top box, a television set, a video game system, or any electronic device capable of providing or rendering data. A user device may include known computing components, such as one or more processors, and one or more memory devices storing software instructions executed by the processor(s) and data.
[98] In some embodiments, the network layout may include a plurality of user devices. Each user device may be associated with a user. Users may include any individual or groups of individuals using software or applications provided by the frailty prediction system. For example, the users may access a user device or a web account using an application programming interface (API) provided by the frailty prediction system. In some embodiments, more than one user may be associated with a user device. Alternatively, more than one user device may be associated with a user. The users may be located geographically at a same location, for example users working in a same office or a same geographical location. In some instances, some or all of the users and user devices may be at remote geographical locations (e.g., different cities, countries, etc.), although this is not a limitation of the invention.
[99] The network layout may include a plurality of nodes. Each user device in the network layout may correspond to a node. If a “user device 602” is followed by a number or a letter, it means that the “user device 602” may correspond to a node sharing the same number or letter. For example, as shown in FIG. 6, user device 602-1 may correspond to node 1 which is associated with user 1, user device 602-2 may correspond to node 2 which is associated with user 2, and user device 602-k may correspond to node k which is associated with user k, where k may be any integer greater than 1.
[100] A node may be a logically independent entity in the network layout. Therefore, the plurality of nodes in the network layout can represent different entities. For example, each node may be associated with a user, a group of users, or groups of users. For example, in one embodiment, a node may correspond to an individual entity (e.g., an individual). In some particular embodiments, a node may correspond to multiple entities (e.g., a group of individuals).
[101] A user may be registered or associated with an entity that provides services associated with one or more operations performed by the disclosed embodiments. For example, the user may be a registered user of an entity (e.g., a company, an organization, an individual, etc.) that provides one or more of servers 604, databases 608, and/or frailty prediction systems 610 for frailty risk prediction consistent with certain disclosed embodiments. The disclosed embodiments are not limited to any specific relationships or affiliations between the users and an entity, person(s), or entities providing server 604, databases 608, and frailty prediction systems 610.
[102] A user device may be configured to receive input from one or more users. A user may provide an input to a user device using an input device, for example, a keyboard, a mouse, a touch-screen panel, voice recognition and/or dictation software, or any combination of the above. The input may include a user performing various virtual actions during a frailty risk prediction session. The input may include, for example, a user selecting a desired frailty or longevity related result to view from a plurality of options that are presented to the user during a frailty risk prediction session. In another example, the input may include a user providing permission to the server to access genetic data of the user. In a further example, the input may include a user providing user credentials such as a password or biometrics to verify the identity of the user in order to use the software or application.
[103] In the embodiment of FIG. 6, two-way data transfer capability may be provided between the server and each user device. The user devices can also communicate with one another via the server (i.e., using a client-server architecture). In some embodiments, the user devices can communicate directly with one another via a peer-to-peer communication channel. The peer-to-peer communication channel can help to reduce workload on the server by utilizing resources (e.g., bandwidth, storage space, and/or processing power) of the user devices.
[104] A server may comprise one or more server computers configured to perform one or more operations consistent with disclosed embodiments. In one aspect, a server may be implemented as a single computer, through which a user device is able to communicate with other components of the network layout. In some embodiments, a user device may communicate with the server through the network. In other embodiments, the server may communicate on behalf of a user device with the frailty prediction system(s) or the database through the network. In some embodiments, the server may embody the functionality of one or more frailty prediction system(s). In some embodiments, the frailty prediction system(s) may be implemented inside and/or outside of the server. For example, the frailty prediction system(s) may be software and/or hardware components included with the server or remote from the server.
[105] In some embodiments, a user device may be directly connected to the server through a separate link (not shown in FIG. 6). In certain embodiments, the server may be configured to operate as a front-end device configured to provide access to one or more frailty prediction system(s) consistent with certain disclosed embodiments. The server may, in some embodiments, utilize the frailty prediction system(s) to process input data from a user device in order to retrieve genetic data from a database to compute a frailty risk prediction or longevity related result. The server may be configured to store the users’ frailty prediction result data in the database. The server may also be configured to search, retrieve, and analyze (compare) genetic data and log-in information stored in the database. In some cases, the data and information may include a user’s previous frailty calculation result or user input non-genetic information.
[106] A server may include a web server, an enterprise server, or any other type of computer server, and can be computer programmed to accept requests (e.g., HTTP, or other protocols that can initiate data transmission) from a computing device (e.g., a user device) and to serve the computing device with requested data. In addition, a server can be a broadcasting facility, such as free-to-air, cable, satellite, and other broadcasting facility, for distributing data. A server may also be a server in a data network (e.g., a cloud computing network).
[107] A server may include known computing components, such as one or more processors, one or more memory devices storing software instructions executed by the processor(s), and data. A server can have one or more processors and at least one memory for storing program instructions. The processor(s) can be a single or multiple microprocessors, field programmable gate arrays (FPGAs), or digital signal processors (DSPs) capable of executing particular sets of instructions. Computer-readable instructions can be stored on a tangible non-transitory computer-readable medium, such as a flexible disk, a hard disk, a CD-ROM (compact disk-read only memory), an MO (magneto-optical) disk, a DVD-ROM (digital versatile disk-read only memory), a DVD-RAM (digital versatile disk-random access memory), or a semiconductor memory. Alternatively, the methods disclosed herein can be implemented in hardware components or combinations of hardware and software such as, for example, ASICs, special purpose computers, or general purpose computers. While FIG. 6 illustrates the server as a single server, in some embodiments, multiple devices may implement the functionality associated with the server.
[108] The network may be configured to provide communication between various components of the network layout depicted in FIG. 6. The network may be implemented, in some embodiments, as one or more networks that connect devices and/or components in the network layout for allowing communication between them. For example, as one of ordinary skill in the art will recognize, the network may be implemented as the Internet, a wireless network, a wired network, a local area network (LAN), a wide area network (WAN), Bluetooth, Near Field Communication (NFC), or any other type of network that provides communications between one or more components of the network layout. In some embodiments, the network may be implemented using cell and/or pager networks, satellite, licensed radio, or a combination of licensed and unlicensed radio. The network may be wireless, wired, or a combination thereof.
[109] The frailty prediction system(s) may be implemented as one or more computers storing instructions that, when executed by one or more processors, generate a plurality of frailty or longevity related results from which a user can select results to view in a format that is defined by the user. The frailty prediction system(s) may compute a frailty index of the user by retrieving genetic data associated with the user from a database, and may further calculate a longevity related result according to a user selection (e.g., lifespan, frailty risk, etc.). The frailty prediction system(s) may further display the frailty prediction result to the user in a format predetermined by the frailty prediction system or by the user. The frailty prediction system(s) may or may not require user identification information in order to verify or authenticate the user to obtain the associated genetic data of the user or perform the frailty prediction functions. In some embodiments, the server may be the computer in which the frailty prediction system(s) are implemented.
[110] However, in some embodiments, at least some of the frailty prediction system(s) may be implemented on separate computers. For example, a user device may send a user input to the server, and the server may connect to other frailty prediction system(s) over the network. In some embodiments, the frailty prediction system(s) may comprise software that, when executed by processor(s), perform processes for computing a frailty risk or longevity related result for a user. In some cases, the frailty prediction system(s) may further perform analysis of the frailty prediction results and provide recommendations or insights on the frailty prediction results.
[111] The server may access and execute the frailty prediction system(s) to perform one or more processes consistent with the disclosed embodiments. In certain configurations, the frailty prediction system(s) may be software stored in memory accessible by the server (e.g., in a memory local to the server or remote memory accessible over a communication link, such as the network). Thus, in certain aspects, the frailty prediction system(s) may be implemented as one or more computers, as software stored on a memory device accessible by the server, or a combination thereof. For example, one frailty prediction system may be computer hardware executing one or more frailty prediction calculations, and another frailty prediction system may be software that, when executed by the server, performs further analysis of the frailty prediction results such as providing recommendations or insights on the frailty prediction results.
[112] The frailty prediction system(s) can be used to provide frailty risk or longevity related information to users in a variety of different ways. For example, the frailty prediction system(s) may store and/or execute software that performs a computation of a frailty index of the user based on user genetic data retrieved from a database and a frailty prediction model. The frailty prediction system(s) may also store and/or execute software that performs further analysis of the frailty prediction results of the user or may provide lifestyle or clinical insights based on the results. The frailty prediction system(s) may store and/or execute software that performs an algorithm to dynamically select a frailty prediction model according to sex, age, or ethnicity from user input data. The frailty prediction system(s) may further store and/or execute software that performs an algorithm for dynamically updating the frailty prediction model when more training data becomes available. The frailty prediction system(s) may further store and/or execute software that performs a process to construct a frailty prediction model consistent with the method disclosed herein.
[113] The disclosed embodiments may be configured to implement the frailty prediction system(s) such that a variety of algorithms may be performed for performing frailty prediction analysis and/or constructing a frailty prediction model. Although a plurality of frailty prediction systems have been described for performing the above algorithms, it should be noted that some or all of the algorithms may be performed using a single frailty prediction system, consistent with disclosed embodiments.
[114] The user devices, the server, and the frailty prediction system(s) may be connected or interconnected to one or more database(s) 608-1, 608-2. The database(s) may be one or more memory devices configured to store data (e.g., genetic data, frailty prediction models, historical frailty prediction results, etc.). Additionally, the database(s) may also, in some embodiments, be implemented as a computer system with a storage device. In one aspect, the database(s) may be used by components of the network layout to perform one or more operations consistent with the disclosed embodiments. In certain embodiments, one or more of the database(s) may be co-located with the server, or may be co-located with one another on the network. One of ordinary skill will recognize that the disclosed embodiments are not limited to the configuration and/or arrangement of the database(s).
[115] Any of the user devices, the server, the database(s), and/or the frailty prediction system(s) may, in some embodiments, be implemented as a computer system. Additionally, while the network is shown in FIG. 6 as a "central" point for communications between components of the network layout, the disclosed embodiments are not limited thereto. For example, one or more components of the network layout may be interconnected in a variety of ways, and may in some embodiments be directly connected to, co-located with, or remote from one another, as one of ordinary skill will appreciate. Additionally, while some disclosed embodiments may be implemented on the server, the disclosed embodiments are not so limited. For instance, in some embodiments, other devices (such as one or more user devices) may be configured to perform one or more of the processes and functionalities consistent with the disclosed embodiments, including embodiments described with respect to the server and the frailty prediction system.
[116] Although particular computing devices are illustrated and networks described, it is to be appreciated and understood that other computing devices and networks can be utilized without departing from the spirit and scope of the embodiments described herein. In addition, one or more components of the network layout may be interconnected in a variety of ways, and may in some embodiments be directly connected to, co-located with, or remote from one another, as one of ordinary skill will appreciate.
[117] FIG. 7 shows an example of a user device 700 by which a user may access frailty assessment. A user device 700 may be, for example, one or more computing devices configured to perform one or more operations consistent with the disclosed embodiments. For example, a user device may be a computing device that is capable of executing software or applications provided by one or more frailty prediction systems. The user device may comprise a display screen 701 to display various longevity or frailty related metrics to the user. In some cases, the display screen 701 may show the user's own input back to the user, to facilitate entering the information used to generate and display the desired frailty assessment parameter.
[118] A user device can include, among other things, desktop computers, laptops or notebook computers, mobile devices (e.g., smart phones, cell phones, personal digital assistants (PDAs), and tablets), or wearable devices (e.g., smartwatches). A user device can also include any other media content player, for example, a set-top box, a television set, a video game system, or any electronic device capable of providing or rendering data. A user device may include known computing components, such as one or more processors, and one or more memory devices storing software instructions executed by the processor(s) and data. The user device may optionally be portable. The user device may be handheld.
[119] The user device may include a display 701. The display may visually illustrate information. The information shown on the display may be changeable. The display may include a screen, such as a liquid crystal display (LCD) screen, light-emitting diode (LED) screen, organic light-emitting diode (OLED) screen, plasma screen, electronic ink (e-ink) screen, touchscreen, or any other type of screen or display. The display may or may not accept user input.
[120] The display may show a graphical user interface. The graphical user interface may be part of a browser, software, or application that may aid in the user performing a frailty prediction function using the device. The interface may allow the user to run the application using the device. The interface may be configured to receive user input as described elsewhere herein. Using the graphical user interface may or may not require user identification and/or authentication.
[121] The graphical user interface may allow a user to view frailty or longevity related metrics. The user may be allowed to select one or more frailty or longevity related metrics to view. The one or more frailty or longevity related metrics may include, for example, lifespan, frailty risk, frailty risk or lifespan relative to a group average value and the like. The group or the reference group may be predetermined by the user. For instance, a user may be permitted to determine to view a frailty risk relative to a group in a pre-determined geographic location, a group with a particular cohort, a group with the same ethnicity and the like. The graphical user interface may allow the user to set up a format of the frailty or longevity related metrics. For instance, the user may be allowed to select a user preferred format to view the result by age group, by timeline, in the form of bar graphs, pie chart, histograms, line charts, numerical numbers (e.g., risk score) or percentage (e.g., percentage of risk relative to the group), or various other forms.
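The "percentage of risk relative to the group" format mentioned above can be illustrated with a short Python sketch. The function name, the use of the arithmetic mean as the group average, and the sample values are assumptions made for illustration only; the disclosure does not specify how the reference value is computed.

```python
from typing import Sequence


def percent_relative_to_group(user_value: float, group_values: Sequence[float]) -> float:
    """Return the user's value as a percentage difference from the group mean.

    Assumption: the "group average value" is taken to be the arithmetic mean
    of the reference group's values (e.g., risk scores).
    """
    if not group_values:
        raise ValueError("reference group is empty")
    group_mean = sum(group_values) / len(group_values)
    if group_mean == 0:
        raise ValueError("group mean is zero; percentage is undefined")
    return 100.0 * (user_value - group_mean) / group_mean


# Hypothetical risk scores for a user and a reference group.
relative = percent_relative_to_group(1.2, [1.0, 1.0, 1.0])
print(f"{relative:+.1f}% relative to group average")
```

A positive value indicates a score above the group average, and a negative value a score below it. Other reference statistics (for example, a median or a percentile rank) could be substituted without changing the display logic.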
[122] The user device may be capable of accepting inputs via a user interactive device 703. Examples of such user interactive devices may include a keyboard, button, mouse, touchscreen, touchpad, joystick, trackball, camera, microphone, motion sensor, heat sensor, inertial sensor, or any other type of user interactive device. For instance, a user may input user information, such as a command to initiate the frailty prediction calculation or non-genetic information (e.g., sex, age, ethnicity, etc.), through the user interactive device.
[123] The user device may comprise one or more memory storage units which may comprise non-transitory computer readable medium comprising code, logic, or instructions for performing one or more steps. The user device may comprise one or more processors capable of executing one or more steps, for instance in accordance with the non-transitory computer readable media. The one or more memory storage units may store one or more software applications or commands relating to the software applications. The one or more processors may, individually or collectively, execute steps of the software application.
[124] A communication unit may be provided on the device. The communication unit may allow the user device to communicate with an external device. The external device may be a device of a transaction entity, server, or may be a cloud-based infrastructure. The communications may include communications over a network or a direct communication. The communication unit may permit wireless or wired communications. Examples of wireless communications may include, but are not limited to WiFi, 3G, 4G, LTE, radiofrequency, Bluetooth, infrared, or any other type of communications.
[125] The device may have an on-board power source. Alternatively, an external power source may provide power to the user device. An external power source may provide power to the user device via a wired or wireless connection. An on-board power source may power an entirety of the user device, or one or more individual components of the user device. In some embodiments, multiple on-board power sources may be provided that may power different components of the device. For instance, one or more sensors of the device may be powered using a separate source from one or more memory storage units, processors, communication unit, and/or display of the device.
[126] FIG. 8 shows an exemplary process 800 of displaying longevity related metrics on a user device, such as a frailty assessment parameter, in response to a user input, in accordance with embodiments of the invention. In some cases, a user may input information into a user device 801. The input can include information to initiate a process for generating a frailty assessment for the individual. The user input information may or may not be used as part of the calculation of the frailty prediction to generate the frailty assessment parameter for the individual. In some cases, the user input provides access to genetic information of the individual used in the calculation of the frailty assessment. In some cases, a user may be prompted to provide information such as sex, age, or ethnicity that may be processed during the frailty prediction. In some cases, a user may be allowed to select one or more longevity related metrics to be displayed on the user device as a result of the computation. For instance, a user may be allowed to select to view lifespan, or frailty risk relative to a group, from a plurality of options. In some cases, a user may be allowed to select a format of displaying the longevity related result. The result may be displayed in text, graph, diagram, and/or numerical value form. For instance, a user may be permitted to view the predicted frailty risk by age group, by timeline, in the form of bar graphs, pie charts, histograms, line charts, numerical values (e.g., risk score) or percentages (e.g., percentage of risk relative to the group); any other visual representation may also be used to show the lifespan or frailty risk. Alternatively, a user may not need to provide such information in order to view the result.
[127] In some cases, the user input information may comprise confirmation indicating user consent to access genetic data of the associated user. For example, a user may be prompted to confirm whether to grant the frailty prediction system access to genetic data of the user. In some cases, a user may be required to log into the application running on the user device by providing user credentials such as a password, PIN or fingerprint. After the user is authenticated and a user command for initiating the frailty prediction calculation is received by the user device, the user identity information may be transmitted to a genetic database for retrieving genetic data associated with the user 803. The genetic data may be processed by a frailty prediction model 805. The frailty prediction model may be locally stored with the user device. Alternatively, the frailty prediction model may be stored remotely from the user device. In some cases, the frailty prediction model may be selected from a plurality of frailty prediction models according to one or more factors provided by the user input information 807. A plurality of frailty prediction models may be constructed and stored in a database. The plurality of frailty prediction models may be constructed using datasets from different cohorts (e.g., sex, ethnicity, age, etc.). For example, a frailty prediction model may be selected according to the sex, ethnicity or age of the user. Alternatively, a general frailty prediction model may be used for all users.
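The model selection step 807 can be sketched in Python as a lookup with a fallback to a general model. This is a minimal illustration only: the registry layout, the cohort keys, the class and function names, and the placeholder SNP identifiers and weights are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class FrailtyModel:
    """Hypothetical container for a cohort-specific prediction model."""
    name: str
    weights: Dict[str, float]  # placeholder per-SNP weights, not real values


# Hypothetical registry keyed by (sex, ethnicity); real cohorts may differ.
MODELS: Dict[Tuple[Optional[str], Optional[str]], FrailtyModel] = {
    ("female", "european"): FrailtyModel("female_european", {"snp_a": 0.02}),
    ("male", "european"): FrailtyModel("male_european", {"snp_a": 0.03}),
}

# General model used when no cohort-specific model matches the user input.
GENERAL_MODEL = FrailtyModel("general", {"snp_a": 0.025})


def select_model(sex: Optional[str] = None,
                 ethnicity: Optional[str] = None) -> FrailtyModel:
    """Pick a cohort-specific model if one exists, else the general model."""
    key = (sex.lower() if sex else None,
           ethnicity.lower() if ethnicity else None)
    return MODELS.get(key, GENERAL_MODEL)


print(select_model("Female", "European").name)
print(select_model().name)
```

Falling back to the general model when the user provides no cohort information mirrors the alternative described above, in which a single model is used for all users.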
[128] A frailty index may be calculated by the frailty prediction model 809. The calculation may be performed in part or in whole on the user device, and/or in part or in whole on a server in remote communication with the user device. For example, the server may perform the computation and transmit the computed frailty index to the user device. In another example, the server may be configured to retrieve genetic data from a database and transmit the genetic data to the user device for computation. The computed frailty index may or may not be transmitted to the server for further analysis or calculation. The frailty index may be a logarithm of the hazard ratio of the user. The frailty index may or may not be displayed to the user on the user device. In some cases, when a user input indicating a user desired longevity related result is provided, the associated longevity related result may be calculated 811 based on the frailty index. A user may be permitted to provide user options for the longevity related result at any stage. For example, a user may provide input to view lifespan or frailty risk at the start of the process or after calculation of the frailty index. Finally, the longevity related result is displayed to the user on the user device 813.
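Because the frailty index may be a logarithm of the hazard ratio, the hazard ratio can be recovered by exponentiating the index. The Python sketch below illustrates this relationship under a stated assumption: the index is computed as a weighted sum of SNP allele dosages (0, 1 or 2), which is one common form of linear genetic score and is not asserted here to be the model of the disclosure. The SNP names and weights are hypothetical placeholders, and missing genotypes are treated as a dosage of zero purely for simplicity.

```python
import math
from typing import Dict


def frailty_index(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of allele dosages (assumed linear model).

    SNPs absent from `dosages` contribute zero, a simplifying assumption.
    """
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


def hazard_ratio(index: float) -> float:
    """If the index is a log hazard ratio, the hazard ratio is exp(index)."""
    return math.exp(index)


# Hypothetical weights and genotype dosages for two placeholder SNPs.
weights = {"snp_a": 0.10, "snp_b": -0.05}
dosages = {"snp_a": 2, "snp_b": 1}

index = frailty_index(dosages, weights)
print(f"frailty index = {index:.3f}, hazard ratio = {hazard_ratio(index):.3f}")
```

An index of zero corresponds to a hazard ratio of one, that is, the reference level of risk; positive indices correspond to elevated hazard and negative indices to reduced hazard.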
[129] The genetic data may be retrieved from a third party database. The genetic data may be provided by a sample of tissue, blood, urine, or other substances in the body of the user and transported to a genome test site for producing the genetic data.
[130] A sample can be any biological sample isolated from a subject. For example, a sample can comprise, without limitation, bodily fluid, whole blood, platelets, serum, plasma, stool, red blood cells, white blood cells or leukocytes, endothelial cells, tissue biopsies, synovial fluid, lymphatic fluid, ascites fluid, interstitial or extracellular fluid, the fluid in spaces between cells, including gingival crevicular fluid, bone marrow, cerebrospinal fluid, saliva, mucous, sputum, semen, sweat, urine, fluid from nasal brushings, fluid from a pap smear, or any other bodily fluids.
[131] A sample can comprise nucleic acids from different sources. For example, a sample can comprise germline DNA or somatic DNA. A sample can comprise nucleic acids carrying mutations. For example, a sample can comprise DNA carrying germline mutations and/or somatic mutations. A sample can also comprise DNA carrying cancer-associated mutations (e.g., cancer-associated somatic mutations). In some embodiments, a sample comprises one or more of: a single base substitution, a copy number variation, an indel, a gene fusion, a transversion, a translocation, an inversion, a deletion, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, chromosome fusions, a gene truncation, a gene amplification, a gene duplication, a chromosomal lesion, a DNA lesion, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, abnormal changes in distributions of nucleic acid (e.g., cfDNA) fragments across genomic regions, abnormal changes in distributions of nucleic acid (e.g., cfDNA) fragment lengths, and abnormal changes in nucleic acid methylation.
[132] Methods herein can comprise obtaining certain amounts of nucleic acid molecules. For example, the method can comprise obtaining up to about 600 ng, up to about 500 ng, up to about 400 ng, up to about 300 ng, up to about 200 ng, up to about 100 ng, up to about 50 ng, or up to about 20 ng of nucleic acid molecules from a sample. The method can comprise obtaining at least 1 femtogram (fg), at least 10 fg, at least 100 fg, at least 1 picogram (pg), at least 10 pg, at least 100 pg, at least 1 ng, at least 10 ng, at least 100 ng, at least 150 ng, or at least 200 ng of nucleic acid molecules. The method can comprise obtaining at most 1 femtogram (fg), at most 10 fg, at most 100 fg, at most 1 picogram (pg), at most 10 pg, at most 100 pg, at most 1 ng, at most 10 ng, at most 100 ng, at most 150 ng, or at most 200 ng of nucleic acid molecules. The method can comprise obtaining 1 femtogram (fg) to 200 ng, 1 picogram (pg) to 200 ng, 1 ng to 100 ng, 10 ng to 150 ng, 10 ng to 200 ng, 10 ng to 300 ng, 10 ng to 400 ng, 10 ng to 500 ng, 10 ng to 600 ng, 10 ng to 700 ng, 10 ng to 800 ng, 10 ng to 900 ng, or 10 ng to 1000 ng of nucleic acid molecules.
[133] Isolation and extraction of polynucleotides may be performed through collection of bodily fluids using a variety of techniques. In some cases, collection may comprise aspiration of a bodily fluid from a subject using a syringe. In other cases collection may comprise pipetting or direct collection of fluid into a collecting vessel.
[134] After collection of bodily fluid, polynucleotides may be isolated and extracted using a variety of techniques utilized in the art. In some cases, DNA may be isolated, extracted and prepared using commercially available kits such as the Qiagen Qiamp® Circulating Nucleic Acid Kit protocol. In other examples, Qiagen Qubit™ dsDNA HS Assay kit protocol, Agilent™ DNA 1000 kit, or TruSeq™ Sequencing Library Preparation; Low-Throughput (LT) protocol may be used.
[135] Purification of DNA may be accomplished using any methodology, including, but not limited to, the use of commercial kits and protocols provided by companies such as Sigma Aldrich, Life Technologies, Promega, Affymetrix, IBI or the like. Kits and protocols may also be non-commercially available.
[136] After isolation, in some cases, the polynucleotides may be pre-mixed with one or more additional materials, such as one or more reagents (e.g., ligase, protease, polymerase) prior to determining the genetic variant.
[137] SNP genotyping may be accomplished using methods selected from the group consisting of hybridization methods, enzyme based methods, post amplification methods, and/or sequencing. Hybridization-based methods may comprise dynamic allele-specific hybridization, molecular beacons, and SNP microarrays. Enzyme based methods may comprise one or more of restriction fragment length polymorphism, PCR-based methods, FLAP endonuclease, primer extension, 5’-nuclease, and oligonucleotide ligation assay. Post amplification methods may comprise one or more of single strand conformation polymorphism, temperature gradient gel electrophoresis, denaturing high performance liquid chromatography, high resolution melting of the entire amplicon, use of DNA mismatch-binding proteins, SNPlex, and surveyor nuclease assay.
Computer control systems
[138] The present disclosure provides computer control systems that are programmed to implement methods of the disclosure. FIG. 9 shows a computer system 901 that is programmed or otherwise configured to perform frailty prediction. The computer system 901 can regulate various aspects of sequence analysis of the present disclosure, such as, for example, matching data against known sequences and variants. The computer system 901 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device. The electronic device can be a mobile electronic device.
[139] The computer system 901 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 905, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 901 also includes memory or memory location 910 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 915 (e.g., hard disk), communication interface 920 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 925, such as cache, other memory, data storage and/or electronic display adapters. The memory 910, storage unit 915, interface 920 and peripheral devices 925 are in communication with the CPU 905 through a communication bus (solid lines), such as a motherboard. The storage unit 915 can be a data storage unit (or data repository) for storing data. The computer system 901 can be operatively coupled to a computer network (“network”) 930 with the aid of the communication interface 920. The network 930 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 930 in some cases is a telecommunication and/or data network. The network 930 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 930, in some cases with the aid of the computer system 901, can implement a peer-to-peer network, which may enable devices coupled to the computer system 901 to behave as a client or a server.
[140] The CPU 905 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 910. The instructions can be directed to the CPU 905, which can subsequently program or otherwise configure the CPU 905 to implement methods of the present disclosure. Examples of operations performed by the CPU 905 can include fetch, decode, execute, and writeback.
[141] The CPU 905 can be part of a circuit, such as an integrated circuit. One or more other components of the system 901 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).
[142] The storage unit 915 can store files, such as drivers, libraries and saved programs. The storage unit 915 can store user data, e.g., user preferences and user programs. The computer system 901 in some cases can include one or more additional data storage units that are external to the computer system 901, such as located on a remote server that is in communication with the computer system 901 through an intranet or the Internet.
[143] The computer system 901 can communicate with one or more remote computer systems through the network 930. For instance, the computer system 901 can communicate with a remote computer system of a user (e.g., a physician). Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PC’s (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 901 via the network 930.
[144] Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 901, such as, for example, on the memory 910 or electronic storage unit 915. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 905. In some cases, the code can be retrieved from the storage unit 915 and stored on the memory 910 for ready access by the processor 905. In some situations, the electronic storage unit 915 can be precluded, and machine-executable instructions are stored on memory 910.
[145] The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
[146] Aspects of the systems and methods provided herein, such as the computer system 901, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
[147] Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
[148] The computer system 901 can include or be in communication with an electronic display 935 that comprises a user interface (UI) 940 for providing, for example, information about cancer diagnosis. Examples of UI’s include, without limitation, a graphical user interface (GUI) and web-based user interface.
[149] Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the central processing unit 905. The algorithm can, for example, determine whether a cancer is present and/or progressing.
EXAMPLES
[150] Example 1 UK Biobank
[151] The datasets for constructing the death prediction model and frailty prediction model are from UK Biobank. UK Biobank is a prospective cohort study of over 500,000 individuals from across the United Kingdom. UK Biobank is an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. Participants, aged between 40 and 69, were invited to one of 22 centres across the UK between 2006 and 2010. Blood, urine and saliva samples were collected, physical measurements were taken, and each individual answered an extensive questionnaire focused on questions of health and lifestyle. All participants gave written informed consent and the study was approved by the North West Multicentre Research Ethics Committee. UK Biobank has Human Tissue Authority research tissue bank approval, meaning separate ethical approvals are not required to use the existing data. Genotyping is in progress, with a wave 1 public release of genotypes for ~150,000 participants in June/July 2015. Phenotype and genotype data are available directly from UK Biobank.
[152] Example 2 Genotyping and Imputations
[153] Genotyping and imputations were conducted by UK Biobank. UK Biobank participants were genotyped on two slightly different arrays and quality control was performed by UK Biobank. 49,982 samples were genotyped as part of the UK BiLEVE study using a newly designed array, with the remaining 102,754 samples genotyped on an updated version (UK Biobank Axiom array), both manufactured by Affymetrix (96% of SNPs overlap between the arrays). Samples were processed and genotyped in batches (used as covariates to control for confounding due to batch effects). In brief, SNPs or samples with high missingness, multi-allelic SNPs and SNPs with batchwise departures from Hardy-Weinberg equilibrium were removed from the data set. After quality control, genotypes were available for 152,736 subjects at 847,441 sites, of which 815,490 were located on autosomes, and 21,231, 1,041 and 310 on the X, Y and MT chromosomes respectively. UK Biobank provided 15 principal components of genetic relatedness (UK Biobank field id 22009) and a binary assessment of whether subjects were genomically British (UK Biobank field id 22006), based on principal components analysis of their genetic data.
[154] Imputed data were prepared by UK Biobank. In summary, autosomal phasing was carried out using a version of SHAPEIT2 modified to allow for large sample sizes. Imputation was carried out using IMPUTE2 using the merged UK10K and 1,000 Genomes Phase 3 reference panels to yield higher imputation accuracy of British haplotypes. The imputations resulted in 73,355,362 SNPs, short indels and large structural variants, imputed in 152,727 individuals.
[155] Example 3 UK Biobank data used in constructing frailty prediction model
[156] From a total of 502,639 individuals, 322,412 non-genotyped self-identified British participants were analyzed to develop the death prediction model. Of the 152,727 genotyped participants, 120,286 were self-identified British, were assessed as genetically British by UK Biobank, and passed the related quality tests (data-field 22006). These were analyzed genome-wide for association with the estimated frailty (i.e., frailty index) in a discovery phase for identifying the longevity associated genes. In the GWAS replication phase to verify the identified longevity associated genes, genotyped subjects not included in the discovery phase were analyzed; these were subjects determined as possibly not genetically British and not self-identified as ‘British’. Also, samples that had a reported vs. genetically inferred sex mismatch were excluded following UK Biobank recommended genomic analysis exclusions (data-field 22010), and samples that have genetic relatedness pairing (data-field 22011) were also excluded. Remaining self-declared ethnicities that had fewer than 300 subjects or were ambiguous (‘Other ethnic group’, ‘Prefer not to answer’) were not analyzed. This resulted in 27,686 subjects.
[157] Example 4 Data Preparation for Constructing Death Prediction Model and Frailty Prediction Model
[158] A diverse set of measurements from the UK Biobank study was used to construct a Cox PH model that has a good fit on mortality data. The following parameters were used: blood assay results, cognitive tests, physical measures (e.g., spirometry, anthropometry, blood pressure, grip strength), touchscreen questionnaires related to the following life factors: lifestyle and environment, health and medical history, family history, psychosocial factors, early life factors, and other data (124 data-fields in total in Table 1). Also, the death register data required to build the Cox PH model were used, which represent right-censored mortality data of two parameters: 1) follow-up time in years for each person (defined as the difference between the date the person was last observed or the date of death and the date the person took the blood tests), 2) a Boolean marker indicating whether death occurred. The mean follow-up time was 7 years with a maximum of 10 years. In the full cohort of 502,639 individuals, 14,419 events were observed. For each data-field representing continuous or integer values, a Box-Cox transformation was applied to normalize the data-field. Then, 442,698 samples that reported British ethnicity were selected. These data were split into train and test datasets based on whether genetic data were available for a given sample. The 322,412 individuals who did not have genetic data were chosen as the train cohort and the 120,286 samples with genetic data available were chosen as the testing cohort. In each cohort, missing data were imputed with mean values and the data were normalized to zero mean and unit variance.
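The censoring and normalization steps described in paragraph [158] can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the study: the function names, the 365.25-day year, the use of the population standard deviation, and the toy values are all introduced here for illustration. The Box-Cox step is omitted.

```python
from datetime import date
from statistics import mean, pstdev


def follow_up_record(baseline, last_seen, death=None):
    """Return (follow-up time in years, death-occurred flag) for one participant.

    The end of follow-up is the date of death if known, otherwise the date
    the participant was last observed (right censoring).
    """
    end = death if death is not None else last_seen
    years = (end - baseline).days / 365.25  # assumed year length
    return years, death is not None


def impute_and_standardize(values):
    """Fill missing entries (None) with the mean, then scale to zero mean, unit variance."""
    observed = [v for v in values if v is not None]
    mu = mean(observed)  # mean imputation leaves the mean unchanged
    filled = [mu if v is None else v for v in values]
    sd = pstdev(filled)  # population SD, an assumption of this sketch
    return [(v - mu) / sd for v in filled]


if __name__ == "__main__":
    print(follow_up_record(date(2008, 1, 1), date(2015, 1, 1)))
    print(impute_and_standardize([1.0, None, 3.0]))
```

Imputing before standardizing, as in the text, means imputed entries land exactly at zero after scaling.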
[159] Table 1 Cox PH model data fields
[160] The Cox PH model was implemented in an R script. The Cox PH model performance was evaluated by calculating a concordance index between the predicted and actual mortality data. The fitted Cox PH model was then used for prediction on the discovery cohort. The logarithm of the hazard ratio prediction for each sample in the testing cohort was obtained (lnHR, defined as $\ln(H_i / \langle H_i \rangle)$, where $H_i$ represents the individual hazard rate and $\langle H_i \rangle$ is the mean value of the hazard rate for this cohort). These values represent a measure of one’s frailty and susceptibility to premature death relative to the population’s mean risks and were further used as a phenotype for GWAS. GWAS was performed for six traits: lnHR and five blood traits: white blood cell (leukocyte) count (denoted as ukb30000), red blood cell (erythrocyte) count (denoted as ukb30010), mean corpuscular volume (denoted as ukb30040), mean corpuscular haemoglobin (denoted as ukb30050) and platelet count (denoted as ukb30080).
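As an illustration of the two quantities named in paragraph [160], the sketch below computes lnHR from a list of individual hazard values and a pairwise concordance index on right-censored data. It is written in Python for illustration only (the source states the model itself was implemented in R); the function names, the handling of tied risks, and the toy data are assumptions of this sketch.

```python
import math


def ln_hazard_ratio(hazards):
    """lnHR_i = ln(H_i / <H_i>), where <H_i> is the cohort mean hazard."""
    mean_h = sum(hazards) / len(hazards)
    return [math.log(h / mean_h) for h in hazards]


def concordance_index(times, events, risks):
    """Fraction of comparable pairs in which the higher predicted risk died earlier.

    A pair (i, j) is comparable when i died and did so before j's
    last observation; tied risks count as half (an assumption here).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


if __name__ == "__main__":
    print(ln_hazard_ratio([1.0, 2.0, 3.0]))
    print(concordance_index([2, 5, 8], [True, True, False], [3.0, 2.0, 1.0]))
```

A concordance index of 0.5 corresponds to random ordering, and 1.0 to perfect ordering of predicted risks against observed survival.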
[161] Example 5 Association Testing
[162] RegScan was used for genome-wide association testing. RegScan is a command line tool for performing fast association analysis between allele frequencies and continuous traits. It uses linear regression to estimate marker effects on continuous traits. The traits analyzed were adjusted for sex, age, genotyping array platform (Axiom or Affymetrix) and the first 10 genetic principal components (UK Biobank data field 22009) in the linear regression model. Residuals were inverse normal transformed with a customized R script to zero mean and unit variance and used as an input to RegScan. RegScan was run in GWAS mode with default parameters, for each chromosome file separately.
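The inverse normal transformation of residuals mentioned in paragraph [162] can be sketched as a rank-based mapping to standard normal quantiles. The source used a customized R script that is not reproduced here; the function name, the rank offset of 0.5, and the average-rank handling of ties are assumptions of this sketch. The output is symmetric around zero and approaches unit variance as the sample size grows.

```python
from statistics import NormalDist


def inverse_normal_transform(residuals, offset=0.5):
    """Map residuals to standard normal quantiles by rank; ties share the average rank."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend the tie group while the next sorted value is equal
        while end + 1 < n and residuals[order[end + 1]] == residuals[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]


if __name__ == "__main__":
    print(inverse_normal_transform([0.3, -1.2, 5.0]))
```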
[163] Imputed variants with minor allele count (MAC) more than 50 and imputation information score more than 0.3 were used for discovery cohort genotypes. Variants that led to more than two alleles in the same genomic position were excluded, leaving only biallelic SNPs. Also, a few SNPs that had the same rsID in different genomic locations were excluded. Call files for the X, Y and MT chromosomes were converted to ped+map file format using a tool provided by UK Biobank. PLINK, which is a free, open-source whole genome association analysis toolset, was used to convert ped+map to a general file format suitable for RegScan as input.
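The variant filters of paragraph [163] can be illustrated with a short sketch. The record layout (dictionary keys `rsid`, `chrom`, `pos`, `ref`, `alt`, `mac`, `info`), the function name, and the toy identifiers are hypothetical and chosen only for this example; the thresholds are the ones stated in the text.

```python
def filter_variants(variants, min_mac=50, min_info=0.3):
    """Keep biallelic variants above the MAC and INFO thresholds whose rsID maps to one location."""
    positions_per_id = {}
    alleles_per_site = {}
    for v in variants:
        site = (v["chrom"], v["pos"])
        positions_per_id.setdefault(v["rsid"], set()).add(site)
        alleles_per_site.setdefault(site, set()).update((v["ref"], v["alt"]))

    kept = []
    for v in variants:
        site = (v["chrom"], v["pos"])
        if v["mac"] <= min_mac or v["info"] <= min_info:
            continue  # fails MAC > 50 or INFO > 0.3
        if len(alleles_per_site[site]) > 2:
            continue  # more than two alleles at this position
        if len(positions_per_id[v["rsid"]]) > 1:
            continue  # same rsID reported at different locations
        kept.append(v)
    return kept


if __name__ == "__main__":
    demo = [
        dict(rsid="rsA", chrom="1", pos=100, ref="A", alt="G", mac=120, info=0.9),
        dict(rsid="rsB", chrom="1", pos=200, ref="C", alt="T", mac=10, info=0.9),
    ]
    print([v["rsid"] for v in filter_variants(demo)])
```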
[164] Example 6 Conditional and joint multi-SNP analysis
[165] Conditional and joint analysis implemented in the program genome-wide complex trait analysis (GCTA) was used to select a subset of associated SNPs with p-values less than p0 = 5 × 10⁻⁸ and p0 = 5 × 10⁻⁵. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait rather than testing the association of any particular SNP to the trait. As input, this method uses meta-analysis summary statistics and a reference panel of SNPs to estimate the linkage disequilibrium (LD). The method starts with the "top SNP" (the one with the smallest p-value, conditional on p < p0, where p0 is a specific threshold defined by the user) in the meta-analysis, and then the p-values for all the remaining SNPs are calculated conditional on the selected SNP. It then selects the next top SNP in the conditional analysis (p < p0) and proceeds to fit all the selected SNPs in the model, meanwhile dropping all those SNPs with p-values > p0. The iteration continues until no SNP is added to or dropped from the model, thus finding a subset of associated SNPs with a threshold for LD (r² < 0.9) between SNPs. Finally, a joint analysis of the subset of associated SNPs is performed.
As the LD reference, a sub-sample of 10,000 randomly chosen people from the total set of 120,286 people used for the GWAS discovery phase was used. In addition to the SNP filters described above, in selecting LD reference data, SNPs with imputation information score less than 0.7 and minor allele frequency (MAF) less than 0.002 were filtered out.
[166] In the replication phase, the SNPs from conditional and joint analysis were analyzed for association with frailty estimated for 27,686 UKB participants that were not analyzed in the discovery phase. The association with frailty was assessed using the methods used for discovery phase, including adjustment for sex, ethnic group, 10 genetic principal components, genotyping batch and the like.
[167] Example 7 Heritability and genetic correlation analyses
[168] LD hub (a centralised database of summary-level GWAS results and a web interface for LD score regression) tools were used for estimation of captured heritability and genetic correlations for six traits (logarithm of hazard ratio and five blood traits) and 170 human traits and common diseases. The LD score regression tool was used to estimate the genetic correlations between the logarithm of hazard ratio and the five blood traits. All GWAS summary statistics were filtered by SNP quality r² > 0.7 and MAF > 0.05 (7,001,988 SNPs in total). For further calculations, 1,162,742 SNPs were used, defined by the overlap between the set of SNPs identified using the disclosed method and the ‘high quality SNPs’ suggested by the authors of LD hub (these represent common HapMap3 SNPs that usually have high imputation quality; this set also excludes the HLA region). These 1,162,742 SNPs were used for further analysis of heritability and genetic correlations, and also for estimation of the genomic control inflation factor.
[169] FIG. 5B shows the genetic correlation matrix and clustering for traits having high genetic correlations with predicted frailty. To build the matrix of genetic correlations, the matrix of genetic correlations for 196 traits provided by the LD hub tool was downloaded, all traits that overlapped with the 170 human traits used for calculation of genetic correlations with predicted frailty were selected, and all duplicated traits were removed by using only the most recent study (as indicated by the largest PMID number). This filtering led to a total of 123 traits. Then traits with genetic correlation significance p-value < 0.01/175 and |ρg| > 0.3 were selected, leading to 19 traits (including predicted frailty).
[170] Example 8 In silico functional analysis
[171] For prioritizing genes in associated regions, and for gene set enrichment and tissue or cell type enrichment analyses, DEPICT software (an integrative tool that, based on predicted gene functions, systematically prioritizes the most likely causal genes at associated loci, highlights enriched pathways, and identifies tissues/cell types where genes from associated loci are highly expressed) was used. Independent variants (as selected by the conditional and joint analysis procedure) with p < 5 × 10⁻⁸ (23 SNPs) and p < 1 × 10⁻⁵ (185 SNPs) were included in the analysis. A subset of 10,000 individuals from UK Biobank was used for computations of LD (the same subset as used for the conditional and joint analysis).
[172] PAINTOR software (a probabilistic framework that integrates association strength with genomic functional annotation data to improve accuracy in selecting plausible causal variants for functional validation) was used to prepare the set of SNPs for functional annotation. For this analysis, clumping results, LD matrices and annotation files were provided to PAINTOR. Using PLINK and the reference set of 10,000 samples described above (the same subset as used in the conditional and joint analysis and DEPICT analyses), the clumping analysis was set with the p1 and p2 p-value thresholds as p < 5 × 10⁻⁸, r² as 0.1 and MAF > 0.002. Then, using the same reference set, a pair-wise correlation matrix was generated for all SNPs in each region in the clumping analysis results using the PLINK --r option. Text files filled with ones were used as annotation files. In the next step, all output results were aggregated into one file, and SNPs marked by PAINTOR as the 99% credible set were chosen for functional annotation by the VEP version with the GRCh37 genomic reference.
[173] To test for potential pleiotropic effects of identified variants, the following procedure was conducted. Starting with a significant and replicated index SNP, traits which may be affected by variation in the same region were screened, followed by the pleiotropy vs. linkage disequilibrium test (HEIDI test). In the screening stage, phenoscanner was used to look up complex traits that have demonstrated p < 5 × 10⁻⁸ for association with the index SNP (or with a SNP having r² > 0.7 with the index SNP). Regional summary-level GWAS results, including signed regression coefficient estimates and the standard errors of these estimates, were required to perform HEIDI testing. For the complex traits identified, the latest and biggest GWAS results were collected to perform the HEIDI test. When summary-level GWAS data (which should minimally include the effective allele, p-value for association, and regression coefficient estimate) were available, it was verified whether p < 5 × 10⁻⁸ was indeed observed for the index SNP (or one of the strongly associated ‘proxy’ SNPs in case the index SNP was missing in the secondary GWAS) before performing the HEIDI test. The overlap between the results obtained using the provided method and blood and GTEx eQTLs was also examined, using a similar procedure with modifications. For blood eQTLs, phenoscanner was not used in the screening phase; rather, it was tested directly whether association results for a specific probe were reported in the region of interest and, if positive, whether the index/proxy SNP had p < 5 × 10⁻⁸ in the eQTL analysis; if positive, the HEIDI test was performed. For the analysis involving GTEx data, phenoscanner was used to identify the tissues of interest, and, with the selected tissues, the same analysis as for blood eQTLs described above was performed.
[174] The implementation of the HEIDI test follows the method described by Zhu et al. (10.1038/ng.3538). For a specific region, the statistic $T_H = \sum_{i=1}^{m} z_{d(i)}^{2}$ was computed, where m is the number of markers selected for the test and $z_{d(i)}$ is the scaled measure of deviation of the ratio of the regression coefficients from the primary and the secondary GWAS from that at the top associated SNP. In the analysis, the same LD matrices as those used in the PAINTOR analysis were used. For testing, up to 20 SNPs were selected in the following manner:
1. Define the set of eligible SNPs as those that fall within ±250 kb of the top primary GWAS association, have χ² > 10 in the primary GWAS, and have results reported in the secondary GWAS
2. Make empty “target” (St) and “rejected” (Sr) SNP sets
3. Select current SNP with the lowest p from the primary GWAS that is neither in the target nor in the rejected set
4. If the current SNP has r² > 0.9 (computed with PLINK 1.9) with any SNP in the target SNP set, add the current SNP to the rejected set
5. Otherwise add current SNP to the target set
6. Repeat from the third step until either the eligible SNP set is exhausted or the target set has 20 SNPs
7. Use target set for HEIDI test
[175] In some cases, when the number of SNPs was less than 3, no test was performed.
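The seven-step selection above, together with the minimum of three SNPs from paragraph [175], can be sketched as a greedy loop. This is an illustrative sketch only: the function name and input layout are assumptions, the eligibility filter of step 1 is assumed to have been applied upstream, and `r2` stands for any user-supplied lookup of pairwise LD (the source computed LD with PLINK 1.9).

```python
def select_heidi_snps(eligible, r2, max_snps=20, r2_max=0.9, min_snps=3):
    """Greedy selection of SNPs for the HEIDI test.

    eligible: list of (snp_id, primary_gwas_p) already passing step 1.
    r2: function (snp_a, snp_b) -> LD r^2 between the two SNPs.
    Returns the target set, or None when fewer than min_snps were selected.
    """
    target, rejected = [], []
    # Visit SNPs in order of increasing primary GWAS p-value (step 3)
    for snp, _p in sorted(eligible, key=lambda item: item[1]):
        if len(target) == max_snps:
            break  # step 6: target set is full
        if any(r2(snp, chosen) > r2_max for chosen in target):
            rejected.append(snp)  # step 4: too strongly linked to a chosen SNP
        else:
            target.append(snp)  # step 5
    return target if len(target) >= min_snps else None


if __name__ == "__main__":
    ld = {frozenset(("a", "b")): 0.95}  # toy LD table; unlisted pairs are treated as 0

    def toy_r2(x, y):
        return ld.get(frozenset((x, y)), 0.0)

    snps = [("a", 1e-10), ("b", 1e-9), ("c", 1e-8), ("d", 1e-7)]
    print(select_heidi_snps(snps, toy_r2))
```

Sorting once and walking the list is equivalent to repeatedly picking the lowest-p SNP that is in neither set, since each SNP is assigned to exactly one set when visited.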
[176] Before contrasting regional patterns of association of two traits using the HEIDI test, which contrasts the pleiotropy vs. linkage disequilibrium hypotheses, it was first required that strong criteria were met for association of these two traits to the region. Namely, for the index SNP, p < 5 × 10⁻⁸ should have been observed for both traits; moreover, pSMR < 10⁻⁶ was required (which translated into a requirement of p < 10⁻²⁷ for the second GWAS if the first GWAS had p = 5 × 10⁻⁸, or a requirement of p < 10⁻¹⁴ for the second GWAS if the first GWAS had p = 5 × 10⁻¹⁰). Multiple HEIDI testing will necessarily generate relatively low p-values even when the hypothesis of pleiotropy is true; additionally, differences between patterns of association may be generated (and/or exaggerated) by local differences in LD between the two populations where the GWAS were performed. Patterns of association between two GWAS were considered sufficiently dissimilar if pHEIDI < 10⁻⁴ was observed (hypothesis of pleiotropy rejected). However, because the cost of error (i.e., the cost of functional follow-up in the wet lab) is high, patterns of association were considered sufficiently similar (and hence pleiotropy was considered the likely explanation) if pHEIDI > 0.05 was observed.
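The relation between the pSMR threshold and the p-value required of the second GWAS can be checked numerically. The sketch below assumes the approximate SMR statistic of the cited Zhu et al. method, T = z1²·z2² / (z1² + z2²), referred to a 1-df chi-square distribution; that formula is not stated in this document and is an assumption here, as are the function names and the label used for HEIDI p-values falling between the two stated thresholds.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def chi2_from_p(p):
    """1-df chi-square statistic corresponding to a two-sided p-value."""
    return _STD_NORMAL.inv_cdf(p / 2.0) ** 2


def p_from_chi2(x):
    """Upper-tail p-value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(x / 2.0))


def required_second_p(p_first, p_smr=1e-6):
    """Largest second-GWAS p-value that still yields pSMR below p_smr.

    Solves T = z1^2 * z2^2 / (z1^2 + z2^2) for z2^2 (assumed SMR approximation).
    Returns None if the first GWAS alone is too weak to reach the threshold.
    """
    t_needed = chi2_from_p(p_smr)
    z1_sq = chi2_from_p(p_first)
    if z1_sq <= t_needed:
        return None
    z2_sq = t_needed * z1_sq / (z1_sq - t_needed)
    return p_from_chi2(z2_sq)


def heidi_call(p_heidi):
    """Apply the two thresholds from the text; the middle label is an assumption."""
    if p_heidi > 0.05:
        return "pleiotropy likely"
    if p_heidi < 1e-4:
        return "pleiotropy rejected"
    return "indeterminate"


if __name__ == "__main__":
    for p1 in (5e-8, 5e-10):
        print(p1, required_second_p(p1))
```

Under this approximation the computed bounds fall within one order of magnitude of the 10⁻²⁷ and 10⁻¹⁴ figures quoted in the text.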
[177] Example 9 Genome Wide Association Study Results
[178] The frailty for the cohort of 120,286 genetically British UKB participants was computed. The frailty was adjusted for sex, age, SNP array, and the first 10 genetic principal components and used as a phenotype in GWAS with 31,637,279 imputed autosomal and 21,231 directly typed X-chromosome variants. Using a median estimator, the genomic control inflation parameter λ was estimated to be 1.25. Using LD score regression, the trait heritability was estimated to be 0.118 (s.e. = 0.0076), and the LD score intercept was estimated to be 1.0248. The results may not be corrected for λ, so the test statistic may be slightly inflated. The GWAS analysis resulted in 23 genome-wide significant (defined as p < 5 × 10⁻⁸) loci, all located on autosomes (FIG. 4 and FIG. 10, the Manhattan plot for GWAS on the frailty phenotype). FIG. 10 shows that certain SNPs on certain chromosomes have increased association with the frailty phenotype, such as SNPs on one or more of chromosomes 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 15, 17, 20 and 22. For example, SNPs with a -log10(p-value) value greater than a threshold value, such as the threshold value shown by the horizontal line in FIG. 10, can be designated as SNPs with increased association with the frailty phenotype. The SNPs with the increased association as shown in FIG. 10 correspond to the 23 SNPs listed in the table of FIG. 4.
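The median estimator of the genomic control inflation parameter mentioned in paragraph [178] is commonly computed as the median of the observed 1-df chi-square association statistics divided by the median of the 1-df chi-square distribution under the null. The sketch below follows that common definition; the function name and the toy statistics are assumptions of this example, and the source does not spell out its exact computation.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Genomic control lambda: observed median chi-square over the null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    toy_stats = [0.1, 0.4549364, 3.0]
    print(genomic_inflation(toy_stats))
```

A value near 1 indicates no inflation of the test statistics; the text reports 1.25 for the frailty GWAS alongside an LD score intercept close to 1.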
[179] Next, the obtained results were replicated using the data from the 27,686 genotyped UKB participants not included in the discovery phase (FIG. 4). Of the 23 SNPs with values of p < 5 × 10⁻⁸ in the discovery set, all but two demonstrated a consistent sign of association in replication (binomial p < 10⁻⁴). Of the two SNPs that did not have a consistent sign in replication, rs143761991 on chromosome 3 at 95 Mb and rs11852372 on chromosome 15 at 79 Mb, the former was relatively rare in both replication (MAF = 0.006) and discovery (MAF = 0.007) and had a relatively low imputation quality of 0.74. The latter had high imputation quality (0.99) and high MAF (~0.3) in both replication and discovery. Of the 21 SNPs with a consistent sign, six associations were found to be significant (p < 0.05/23) after multiple testing correction, and were labeled as ‘replicated’ in the following examples.
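The two numerical criteria in paragraph [179] can be reproduced with a short calculation: the binomial probability of observing at least 21 of 23 sign-consistent SNPs under a 50% chance of agreement, and the Bonferroni-corrected replication threshold 0.05/23. The function names are assumptions, and treating the binomial test as one-sided is also an assumption, since the source does not say which form was used.

```python
from math import comb


def sign_test_p(n_consistent, n_total):
    """One-sided binomial probability of at least n_consistent agreements under p = 0.5."""
    tail = sum(comb(n_total, k) for k in range(n_consistent, n_total + 1))
    return tail / 2 ** n_total


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    print(sign_test_p(21, 23))
    print(bonferroni_threshold(0.05, 23))
```

With 21 of 23 consistent signs the tail probability is 277/2²³, which is below the 10⁻⁴ bound quoted in the text.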
[180] Example 10 In Silico Functional Genomic Analysis of Associated Regions
[181] Using the LD hub resource and the results of GWAS for five blood parameters from UKB (white blood cell (leukocyte) count, red blood cell (erythrocyte) count, mean corpuscular volume, mean corpuscular haemoglobin, and platelet count), the genetic correlations between estimated frailty and 175 complex traits were estimated. Of these, significant genetic correlations (defined as p < 0.01/170) were observed with 42 traits (see FIG. 5A). Of note, high (|ρg| > 0.3) and strong (all p < 10⁻⁵) positive (consistent with increased frailty) genetic correlations were observed with waist-to-hip ratio (25673412), depressive symptoms (27089181), lung cancer (27488534, 24880342), asthma (17611496), and cigarette smoking (20418890). Negative (protective) genetic correlation was observed with education (27225129, 23722424, 25201988, 23722424), age at first birth (27798627), parental age-at-death (27015805), and former vs. current smoking (20418890).
[182] DEPICT was run with all independent variants with p < 10⁻⁵ (185 SNPs in total). Significant enrichment of expression was not identified in any tissue or cell type (FDR >= 0.2), but a tendency for components of the hemic and immune systems to have significant nominal p-values (p-value <= 0.01) was detected. Similar results were observed when analyzing enrichment of the expression of genes located around the 23 independent genome-wide significant variants. A credible set analysis was performed using PAINTOR to identify the sets of SNPs with high potential to be causative. The SNPs from the 99% credible set were annotated using the Variant Effect Predictor (REF). The locus on chromosome 1 at 248 Mb displayed a very interesting result. According to VEP analysis, it was observed that the credible set may have included a damaging variant in the TRIM58 gene, and the set for chromosome 20 at 58 Mb may have included a damaging variant in the TUBB1 gene.
[183] Next, the overlap between the associations obtained was investigated using the phenoscanner (ref) software. For the top associated index SNPs (FIG. 4), traits that have demonstrated genome-wide significant (p < 5 × 10⁻⁸) association at the same or at a strongly (r² > 0.7) linked SNP were looked up. For 12 out of 23 significantly associated regions, co-associations with a number of complex traits were observed. Some of these traits (or similar ones) were already indicated in the genetic correlation analyses (e.g., smoking, lung cancer, asthma, type 2 diabetes).
[184] Credible set analysis with PAINTOR was used to identify sets of SNPs with high potential to be causative. The SNPs from the 99% credible set were annotated using the Variant Effect Predictor. Among the most interesting results, for the locus on chromosome 1 at 248 Mb, it was observed that the credible set included a probably damaging variant in the TRIM58 gene; the set for chromosome 20 at 58 Mb included a probably damaging variant in TUBB1.
[185] Next, the regulatory potential of the discovered loci was investigated using a collection of ‘omics’ GWAS, including multi-tissue GTEx eQTLs and blood eQTLs estimated in a large meta-analysis. It was first checked whether the index SNP (or its proxy) associated with predicted frailty (FIG. 4) also had p < 5 × 10⁻⁸ for association with the expression level. If this was the case, a test assessing the significance of the Mendelian Randomisation (MR) effect of expression on predicted frailty (SMR) and the HEIDI test of pleiotropy versus linkage disequilibrium (REF-ZHU) were performed. For 15 regions, the analysis was performed with at least one expression probe. A threshold of pSMR < 10⁻⁶ was used to decide that the two-sample Mendelian randomisation (MR) relation between two traits is significant; in case of a significant relation, the hypothesis of pleiotropy was considered the likely explanation if pHEIDI > 0.05, and this hypothesis was rejected if pHEIDI < 10⁻⁴. Using these criteria, strong candidate genes were suggested for at least five regions (see FIG. 11).
[186] FIG. 11 shows a list of candidate genes in regions associated with predicted frailty, as suggested by presence of missense mutations and/or SMR HEIDI.
[187] Among the five regions for which strong candidate genes were found via SMR/HEIDI analysis, for two (containing the MACF1 and the TRIM58 candidate genes) association with the predicted frailty was replicated. Additional analyses were performed to understand the effects of these two regions on other complex phenotypes. According to phenoscanner analysis, the chromosome 1 region at 40 Mb (the MACF1 region) is associated with levels of C reactive protein, HDL and TG (24097068). The second region on chromosome 1 (at 248 Mb, the TRIM58 region) was previously associated with blood parameters, such as platelet count and variability in red blood cell volumes (distribution width, RDW). Also, an association between this region and levels of Dynein Light Chain Roadblock-Type 1 protein (DYNLRB1 gene) was reported in a recent study (REF-SUHREproteinsNatComm). To test the hypothesis that the same functional variant is responsible for the association of predicted frailty and these traits with the TRIM58 region, SMR/HEIDI analysis was performed.
[188] For the MR relation between predicted frailty and plasma levels of Dynein Light Chain Roadblock-Type 1 protein, a strong pSMR = 2 × 10⁻⁷ was obtained, suggesting a significant MR coefficient. Further, it was detected that while the SMR test is rather significant for platelet count (pSMR = 10⁻¹⁰) and RDW (pSMR = 1.6 × 10⁻¹⁵), the HEIDI test is not significant (pHEIDI > 0.1), indicating likely pleiotropy. Note, however, that for the latter two analyses, the platelet count and RDW GWAS used were performed on the UKB data.
[189] Example 11 Possible Candidate Genes (TRIM58 and MACF1)
[190] It was discovered that at least MACF1 and TRIM58 are candidate genes for human predicted frailty. It was discovered that the ancestral allele C (frequency 0.67) of rs3811444 is associated with increased frailty (p = 1.6 × 10⁻²⁰). The derived allele T results in the amino acid substitution T374M in the TRIM58 gene, which encodes a ubiquitin ligase induced during late erythropoiesis. Such an amino acid change was predicted to be "probably damaging" by the Polymorphism Phenotyping v2 tool. In contrast, it is predicted to be neutral by the Protein Variation Effect Analyzer software and was classified as “tolerated” by the sorting intolerant from tolerant tool. The current methods for computational prediction of mutation functionality may not be perfect. Additionally, for late-onset, largely evolutionarily neutral traits, the prediction of SNP functionality based on the level of evolutionary conservation may not be useful. The SNP rs3811444 allele C is associated with increased platelet count (22139419), decreased variability in red blood cell (RBC) volume, and increased abundance of RBC oleic acid (25500335). This SNP is also associated with whole blood concentration of stearoylcarnitine, and with expression of the nearby genes SMYD3 and OR2W3 and the trans-located gene JAM3. The association of rs3811444 with expression of OR2W3 is also reported by the Blood eQTL browser. Additionally, many of these parameters, such as increased and decreased or only decreased platelet count, increased variability in the RBC volume and increased stearoylcarnitine levels, were associated with general or specific mortality in general or specific populations. It was observed that the allele C of rs3811444 is significantly associated with decreased levels of expression of the TRIM58 gene as assessed with the ILMN_1705458 probe, and the hypothesis of a pleiotropic effect could not be rejected.
It is also interesting to note that the CpG site associated with TRIM58 is subject to hypermethylation upon aging. In mice, it was demonstrated that JAM-C (homologue of human JAM3) deficient animals have leukocytic pulmonary infiltrates, disturbed neutrophil homeostasis, and increased postnatal mortality. It has been suggested that JAM-C deficiency affects the adaptive humoral immune response against pathogens, in addition to the innate immune system.
[191] MACF1 is a ubiquitously expressed cytoskeletal linker and is considered an anti-longevity candidate gene. It was discovered that the frailty-increasing allele T of the rs17513135 SNP, which has the strongest association with predicted frailty in the chromosome 1 region at 40 Mb, is likely exhibiting a pleiotropic effect on (or acts through) increasing the expression of the MACF1 gene. It was known that MACF1 plays a major role in different developmental processes as well as in the pathogenesis of a wide spectrum of diseases, particularly ageing and chronic inflammatory diseases. Experimental validation also showed high expression of MACF1 in several lung cancer subtypes, especially in lung adenocarcinoma and squamous cell carcinomas. MACF1 knockdown dramatically impaired the reproductivity of the solid tumors. MACF1 may have an evolutionarily conserved role in the ontogenesis of evolutionarily distant organisms.
[192] Of the six best candidate genes (FIG. 11), MSRA has one of the most significant expression level alterations in the disclosed analysis (proxy SNP rs7832708, p = 5.7 × 10⁻⁶⁰). MSRA encodes methionine sulfoxide reductase A, which is involved in the repair of damage resulting from oxidative stress. According to the GeneAge database, longevity studies in invertebrates suggest a role for MSRA in ageing. Over-expression of an MSRA homologue in fruit flies extends lifespan, and disruption of MSRA in mice decreases lifespan. In humans, MSRA has been associated with age-related diseases, such as Alzheimer’s disease. MSRA expression level is negatively associated with frailty in the disclosed analysis.
[193] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
Claims
1. A method of determining frailty, comprising: receiving an input from a user, the input comprising a request for a frailty assessment; and displaying a frailty assessment parameter in response to a frailty prediction model based on a plurality of SNPs, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, or rs4332427.
2. The method of claim 1, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs22450127, rs9272588, rs55964818, and rs4332427.
3. The method of claim 1, wherein the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189.
4. The method of claim 1, wherein the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
5. The method of claim 1, wherein the frailty prediction model is configured to provide a frailty indicator corresponding to an explained variance of 1.5.
6. The method of claim 1, further comprising generating a frailty indicator for the frailty assessment in response to the frailty prediction model, wherein the frailty prediction model comprises a plurality of coefficients for the plurality of SNPs, and wherein generating the frailty indicator comprises performing a calculation using the plurality of coefficients.
7. The method of claim 0, wherein the plurality of coefficients comprises a respective coefficient corresponding to each of the plurality of SNPs.
8. The method of claim 1, further comprising: receiving a plurality of coefficients for the plurality of SNPs; and generating a frailty indicator for the frailty assessment in response to the frailty prediction model using the plurality of coefficients.
9. The method of claim 0, wherein the plurality of coefficients comprises a respective coefficient corresponding to each of the plurality of SNPs.
10. The method of claim 1, further comprising receiving a frailty indicator and generating the frailty assessment parameter based on the frailty indicator.
11. The method of claim 0, wherein the frailty indicator is indicative of a relative hazard value.
12. The method of claim 0, wherein the frailty indicator is indicative of an expected lifespan at a predetermined age of the individual.
13. The method of claim 0, wherein the frailty indicator is indicative of a rate of death at a predetermined age of the individual relative to a rate of death at the predetermined age of a predetermined group of individuals.
14. The method of claim 0, wherein the frailty indicator is indicative of an assessment of genetic predisposition for premature death.
15. The method of claim 1, further comprising receiving data indicative of genetic information of the user for the frailty assessment.
16. A tangible storage medium comprising instructions configured to: receive a user input comprising a request for a frailty assessment; and display a frailty assessment parameter in response to a frailty prediction model based on a plurality of SNPs, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, or rs4332427.
17. The tangible storage medium of claim 0, further comprising the frailty prediction model, wherein the model is configured to output the frailty assessment parameter in response to the plurality of SNPs.
18. The tangible storage medium of claim 0, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs22450127, rs9272588, rs55964818, and rs4332427.
19. The tangible storage medium of claim 0, wherein the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189.
20. The tangible storage medium of claim 0, wherein the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
21. The tangible storage medium of claim 0, further comprising instructions to receive a plurality of coefficients for the plurality of SNPs and to store the plurality of coefficients in the tangible storage medium.
22. The tangible storage medium of claim 0, wherein the plurality of coefficients comprises a respective coefficient for each of the plurality of coefficients.
23. The tangible storage medium of claim 0, wherein the frailty prediction model is configured to provide a frailty indicator corresponding to an explained variance of at least 1.5.
24. The tangible storage medium of claim 0, further comprising instructions to generate a frailty indicator for the frailty assessment in response to the frailty prediction model.
25. The tangible storage medium of claim 0, further comprising instructions to generate the frailty indicator using a plurality of coefficients for the plurality of SNPs.
26. The tangible storage medium of claim 0, wherein the plurality of coefficients comprises a respective coefficient for each of the plurality of coefficients.
27. The tangible storage medium of claim 0, further comprising instructions to receive a frailty indicator and to generate the frailty assessment based on the frailty indicator.
28. The tangible storage medium of claim 0, wherein the frailty indicator is indicative of a relative hazard value.
29. The tangible storage medium of claim 0, wherein the frailty indicator is indicative of an expected lifespan at a predetermined age of the individual.
30. The tangible storage medium of claim 0, wherein the frailty indicator is indicative of a rate of death at a predetermined age of the individual relative to a rate of death at the predetermined age of a predetermined group of individuals.
31. The tangible storage medium of claim 0, wherein the frailty indicator is indicative of an assessment of genetic predisposition for premature death.
32. The tangible storage medium of claim 0, further comprising instructions to receive data indicative of genetic information of the user for the frailty assessment.
33. A system for determining frailty, comprising the tangible storage medium of any one of claims 0 to 0, and a processor configured to execute the instructions.
34. A method for generating a frailty prediction model, comprising: constructing a training model based on mortality data of a first population of deceased individuals; determining, using the training model, a frailty index for individuals of a second population; identifying a plurality of SNPs associated with human frailty based at least on (1) genetic data of the individuals of the second population and the (2) frailty index for the individuals of the second population; and generating the frailty prediction model based on the plurality of SNPs, wherein the frailty prediction model is configured to determine a frailty indicator of an individual.
35. The method of claim 0, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, and rs4332427.
36. The method of claim 0, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs22450127, rs9272588, rs55964818, or rs4332427.
37. The method of claim 34, wherein the plurality of SNPs further comprises at least one of rs3811444, rs22450127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189.
38. The method of claim 34, wherein the plurality of SNPs further comprises at least one of rs3811444, rs22450127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
39. The method of claim 0, wherein constructing the training model comprises constructing a COX proportional hazard model.
40. The method of claim 0, wherein the second population of individuals comprises living individuals.
41. The method of claim 0, wherein generating the frailty prediction model comprises generating a linear regression model.
42. The method of claim 0, wherein generating the frailty prediction model comprises generating a weighted sum of the plurality of SNPs.
43. The method of claim 0, wherein the frailty prediction model is configured to receive as an input genetic data of the individual and provide as an output the frailty indicator in response to the input.
44. The method of claim 0, wherein the frailty indicator is indicative of a relative hazard value.
45. The method of claim 0, wherein the frailty indicator is indicative of an expected lifespan at a predetermined age of the individual.
46. The method of claim 0, wherein the frailty indicator is indicative of a rate of death at a predetermined age of the individual relative to a rate of death at the predetermined age of a predetermined group of individuals.
47. The method of claim 0, wherein the frailty indicator is indicative of an assessment of genetic predisposition for premature death.
48. The method of claim 0, wherein the mortality data comprises death register data of the deceased individuals of the first population and non-genetic data for the deceased individuals of the first population.
49. The method of claim 0, wherein the non-genetic data comprises a plurality of non-genetic traits.
50. The method of claim 0, wherein the plurality of non-genetic traits is selected from the group consisting of white blood cell count, red blood cell count, mean corpuscular volume, mean corpuscular hemoglobin, and platelet count.
51. The method of claim 0, further comprising receiving the mortality data from a preexisting database.
52. The method of claim 0, wherein the frailty index is a logarithm of a hazard ratio value.
53. The method of claim 0, wherein the hazard ratio value is a ratio between a hazard rate of an individual and a mean hazard rate of a predetermined group of individuals.
54. The method of claim 0, wherein generating the frailty prediction model comprises generating a coefficient for each of the plurality of SNPs, wherein the coefficient is indicative of an association with human frailty.
55. The method of claim 0, wherein generating the coefficient comprises multiple linear regression analysis.
56. The method of claim 0, wherein the frailty indicator determined by the frailty prediction model corresponds to an explained variance of at least 1.5.
57. The method of claim 0, wherein identifying the plurality of SNPs comprises a genome-wide association study (GWAS).
58. The method of claim 0, wherein identifying the plurality of SNPs comprises identifying a set of SNPs having a predetermined correlation with human frailty and further selecting a subset of SNPs from the set of SNPs.
59. The method of claim 0, wherein conditional and joint analysis is used to select for the subset of SNPs.
60. The method of claim 0, wherein the GWAS is configured to produce an effect allele frequency value of at most about 0.8.
61. A tangible storage medium comprising instructions configured to: construct a training model based on mortality data of a first population of deceased individuals; determine, using the training model, a frailty index for individuals of a second population; identify a plurality of SNPs associated with human frailty based at least on (1) genetic data of the individuals of the second population and the (2) frailty assessment for the individuals of the second population; and generate a frailty prediction model based on the plurality of SNPs, wherein the frailty prediction model is configured to determine a frailty indicator of an individual.
62. The tangible medium of claim 0, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, and rs4332427.
63. The tangible medium of claim 0, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs22450127, rs9272588, rs55964818, or rs4332427.
64. The tangible medium of claim 0, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189.
65. The tangible medium of claim 0, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, rs4332427, rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
66. The tangible medium of claim 0, wherein the instructions configured to construct the training model comprise instructions to construct a Cox proportional hazards model.
67. The tangible medium of claim 0, wherein the second population of individuals comprises living individuals.
68. The tangible medium of claim 0, wherein the instructions configured to generate the frailty prediction model comprise instructions to generate a linear regression model.
69. The tangible medium of claim 0, wherein the instructions configured to generate the frailty prediction model comprise instructions to generate a weighted sum of the plurality of SNPs.
70. The tangible medium of claim 0, wherein the frailty prediction model is configured to receive as an input genetic data of the individual and provide as an output the frailty indicator in response to the input.
71. The tangible medium of claim 0, wherein the frailty indicator is indicative of a relative hazard value.
72. The tangible medium of claim 0, wherein the frailty indicator is indicative of an expected lifespan at a predetermined age of the individual.
73. The tangible medium of claim 0, wherein the frailty indicator is indicative of a rate of death at a predetermined age of the individual relative to a rate of death at the predetermined age of a predetermined group of individuals.
74. The tangible medium of claim 0, wherein the frailty indicator is indicative of an assessment of genetic predisposition for premature death.
75. The tangible medium of claim 0, wherein the mortality data comprises death register data of the deceased individuals of the first population and non-genetic data for the deceased individuals of the first population.
76. The tangible medium of claim 0, wherein the non-genetic data comprises a plurality of non-genetic traits.
77. The tangible medium of claim 0, wherein the plurality of non-genetic traits is selected from the group consisting of white blood cell count, red blood cell count, mean corpuscular volume, mean corpuscular hemoglobin, and platelet count.
78. The tangible medium of claim 0, further comprising receiving the mortality data from a preexisting database.
79. The tangible medium of claim 0, wherein the frailty index is a logarithm of a hazard ratio value.
80. The tangible medium of claim 0, wherein the hazard ratio value is a ratio between a hazard rate of an individual and a mean hazard rate of a predetermined group of individuals.
81. The tangible medium of claim 0, wherein instructions configured to generate the frailty prediction model comprise instructions configured to generate a coefficient for each of the plurality of SNPs, wherein the coefficient is indicative of an association with human frailty.
82. The tangible medium of claim 0, wherein instructions configured to generate the coefficient comprise instructions for multiple linear regression analysis.
83. The tangible medium of claim 0, wherein the frailty indicator determined by the frailty prediction model corresponds to an explained variance of at least 1.5.
84. The tangible medium of claim 0, wherein instructions configured to identify the plurality of SNPs comprise instructions for a genome-wide association study (GWAS).
85. The tangible medium of claim 0, wherein instructions configured to identify the plurality of SNPs comprise instructions configured to identify a set of SNPs having a predetermined correlation with human frailty and to select a subset of SNPs from the set of SNPs.
86. The tangible medium of claim 0, wherein conditional and joint analysis is used to select for the subset of SNPs.
87. The tangible medium of claim 0, wherein the GWAS is configured to produce an effect allele frequency value of at most about 0.8.
88. A system for generating a frailty prediction model, comprising the tangible storage medium of any one of claims 0 to 0, and a processor configured to execute the instructions.
89. A method of determining frailty, comprising: receiving an input request for a frailty indicator; and generating the frailty indicator in response to a frailty prediction model based on a plurality of SNPs, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, or rs4332427.
90. The method of claim 0, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs22450127, rs9272588, rs55964818, and rs4332427.
91. The method of claim 0, wherein the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189.
92. The method of claim 0, wherein the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
93. A method of determining frailty, comprising: receiving a plurality of coefficients for a plurality of SNPs; and generating a frailty indicator in response to a frailty prediction model based on the plurality of SNPs, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs2250127, rs9272588, rs55964818, or rs4332427.
94. The method of claim 0, wherein the plurality of SNPs comprises at least one of rs76207570, rs3811444, rs22450127, rs9272588, rs55964818, and rs4332427.
95. The method of claim 0, wherein the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, and rs143728189.
96. The method of claim 0, wherein the plurality of SNPs further comprises at least one of rs6741951, rs143761991, rs35801134, rs34651, rs6891621, rs10947428, rs7808664, rs13282106, rs10793962, rs150080415, rs11852372, rs3743445, rs9892942, rs7502233, rs463312, rs62217799, or rs143728189.
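By way of illustration only, the weighted-sum form of the frailty prediction model recited in the claims, with one coefficient per SNP, may be sketched as follows. The coefficient values, the function names `frailty_indicator` and `relative_hazard`, the 0/1/2 effect-allele dosage encoding, and the treatment of missing SNPs as zero dosage are all assumptions made for this sketch and are not taken from the disclosure. The sketch further assumes that the indicator lies on the same log hazard ratio scale as the frailty index, so that exponentiating it yields a relative hazard value.

```python
import math

# Hypothetical per-SNP coefficients. The rsIDs appear in the claims, but the
# numeric values are invented for illustration and are NOT from the source.
COEFFICIENTS = {
    "rs76207570": 0.04,
    "rs9272588": -0.02,
    "rs4332427": 0.03,
}


def frailty_indicator(dosages, coefficients=COEFFICIENTS, intercept=0.0):
    """Return a weighted sum of effect-allele dosages.

    `dosages` maps an rsID to the number of effect-allele copies (0, 1 or 2).
    SNPs absent from `dosages` are treated as dosage 0, which is a
    simplifying assumption of this sketch.
    """
    return intercept + sum(
        coef * dosages.get(snp, 0) for snp, coef in coefficients.items()
    )


def relative_hazard(log_hazard_ratio):
    """Convert a log hazard ratio into a hazard ratio relative to the mean."""
    return math.exp(log_hazard_ratio)


if __name__ == "__main__":
    # Example genotype for one hypothetical individual.
    genotype = {"rs76207570": 2, "rs9272588": 1, "rs4332427": 0}
    score = frailty_indicator(genotype)
    print(f"indicator={score:.4f} relative_hazard={relative_hazard(score):.4f}")
```

In such a sketch the coefficients would in practice be obtained from a regression of the frailty index on genotype, as the claims describe; an indicator of zero corresponds to a relative hazard of one, that is, the mean hazard of the reference group.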
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
RU2017142380 | 2018-02-12 | ||
RU2017142380 | 2018-02-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019156591A1 (en) | 2019-08-15 |
Family
ID=67549009
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/RU2018/050155 WO2019156591A1 (en) | 2018-02-12 | 2018-12-05 | Methods and systems for prediction of frailty background |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2019156591A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070105109A1 (en) * | 2003-07-02 | 2007-05-10 | Geesaman Bard J | Sirt1 and genetic disorders |
EP2901345A2 (en) * | 2012-09-27 | 2015-08-05 | The Children's Mercy Hospital | System for genome analysis and genetic disease diagnosis |
WO2015168252A1 (en) * | 2014-04-29 | 2015-11-05 | The Johns Hopkins University | Mitochondrial dna copy number as a predictor of frailty, cardiovascular disease, diabetes, and all-cause mortality |
AU2009282172B2 (en) * | 2008-08-10 | 2016-06-02 | Kuakini Medical Center | Method of using FOXO3A polymorphisms and haplotypes to predict and promote healthy aging and longevity |
-
2018
- 2018-12-05 WO PCT/RU2018/050155 patent/WO2019156591A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
RALPH BURKHARDT ET AL.: "INTEGRATION OF GENOME-WIDE SNP DATA AND GENE EXPRESSION PROFILES REVEALS SIX NOVEL LOCI AND REGULATORY MECHANISMS FOR AMINO ACIDS AND ACYLCARNITINES IN WHOLE BLOOD", PLOS GENETICS, vol. 11, no. 9, 24 September 2015 (2015-09-24), pages 2 - 25, XP055629537, Retrieved from the Internet <URL:https://doi.org/10.1371/journal.pgen.1005510> * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18905349 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18905349 Country of ref document: EP Kind code of ref document: A1 |