WO2021083572A1 - Determining the sensitivity of skin to uv radiation - Google Patents
- Publication number: WO2021083572A1 (PCT/EP2020/075047)
- Authority: WO, WIPO (PCT)
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/154—Methylation markers
- C12Q2600/158—Expression markers
Definitions
- The present invention relates to determining the sensitivity of a subject's skin to UV radiation by analysing gene expression or epigenetic markers.
- The RNA expression levels of particular genes and/or the methylation levels of particular CpG sites in a skin sample from the subject are analysed to accurately predict the minimal erythema dose (MED) of the subject's skin, which is an indicator of the UV sensitivity of the subject's skin.
- Methods of the invention can achieve a mean absolute error (MAE) as low as 7.85 mJ/cm² (gene expression) or 4.18 mJ/cm² (methylation).
- UV: ultraviolet
- The sensitivity (or tolerance) of the skin to UV irradiation can vary widely between individuals. Stratifying individuals by the sensitivity of their skin to UV can be useful for assessing their risk of skin damage and cancer, for determining appropriate UV protection strategies, and for determining appropriate doses of therapeutic UV (e.g. PUVA therapy).
- The Fitzpatrick phototyping scale categorises subjects as phototype I-VI based on their skin's complexion and propensity to tan and burn in response to UV radiation (Fitzpatrick (1975) Soleil et peau, J Med Esthet. (2):33-34; Eilers et al (2013) Accuracy of Self-report in Assessing Fitzpatrick Skin Phototypes I through VI, JAMA Dermatol, 149(11):1289-1294).
- This is a semi-quantitative and highly subjective way of predicting the sensitivity of an individual's skin to UV, and it is generally used only when a very fast assessment is required.
- The sensitivity of the skin to UV radiation can vary significantly even between people with the same phenotypic skin type.
- MED: minimal erythema dose
- One MED corresponds to the lowest UV dose (measured in mJ/cm²) that causes erythema (redness) or oedema (swelling) of the skin 24-48 hours after UV exposure. It is typically determined by irradiating several patches of skin with different doses of UV light and assessing, 24 hours later, which was the lowest dose causing erythema.
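The patch-test readout described above amounts to picking the smallest tested dose that produced a visible reaction. A minimal Python sketch, assuming the doses and yes/no erythema readings have already been recorded; the function name and the dose series are hypothetical illustrations, not values from the patent or any standard:

```python
from typing import Optional, Sequence


def minimal_erythema_dose(
    doses_mj_cm2: Sequence[float], erythema_observed: Sequence[bool]
) -> Optional[float]:
    """Return the lowest tested UV dose (mJ/cm²) at which erythema was seen.

    Each index pairs one irradiated skin patch with the visual assessment
    made about 24 hours later. Returns None if no tested dose caused erythema.
    """
    if len(doses_mj_cm2) != len(erythema_observed):
        raise ValueError("each dose needs exactly one assessment")
    reacting = [d for d, seen in zip(doses_mj_cm2, erythema_observed) if seen]
    return min(reacting) if reacting else None


# Hypothetical patch-test series (illustrative values, not study data).
doses = [20.0, 25.0, 31.5, 40.0, 50.0, 63.0]
reactions = [False, False, False, True, True, True]
print(minimal_erythema_dose(doses, reactions))  # 40.0
```

Note that the true threshold in this example lies somewhere between 31.5 and 40 mJ/cm²; the reported MED can only be as fine-grained as the spacing of the tested doses.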
- This method has several disadvantages: the method requires the skin to be irradiated and burned (risking permanent skin and DNA damage); the precision of the method is low and depends on the doses tested; the method is subjective; the method is slow and requires repeated patient visits to a clinic; the method cannot be performed on skin already exposed to UV radiation.
- a method for predicting the minimal erythema dose (MED) of a subject's skin comprising: a) i) determining the RNA expression levels of at least two genes selected from the genes in Table 1 in a skin sample obtained from the subject, and ii) predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least two genes; b) i) determining the methylation levels of at least two CpG sites selected from the genes in Table 3 in a skin sample obtained from the subject, and ii) predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least two CpG sites; and/or c) i) determining the methylation level(s) and RNA expression level(s) of at least 2 features selected
- a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the following steps: a) i) inputting the RNA expression levels of at least two genes selected from the genes in Table 1 in a skin sample obtained from the subject; ii) predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least two genes; and iii) outputting the predicted MED; b) i) inputting the methylation levels of at least two CpG sites selected from the genes in Table 3 in a skin sample obtained from the subject; ii) predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least two CpG sites; and iii) outputting the predicted MED; and/or c)
- a method for preventing damage to a subject's skin by UV radiation comprising:
- a computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the following steps: a) i) inputting the RNA expression levels of at least two genes selected from the genes in Table 1 in a skin sample obtained from the subject; ii) predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least two genes, and iii) outputting the predicted MED; b) i) inputting the methylation levels of at least two CpG sites selected from the genes in Table 3 in a skin sample obtained from the subject; ii) predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least two CpG sites; and iii) outputting the predicted MED; and/or c) i) inputting
- Figure 1 shows the correlation between MED determined using the known, standardised method (x-axis) and predicted MED based on the RNA expression levels of the 20 genes in Table 2, set A (y-axis).
- The MAE of this model was 7.85 mJ/cm².
- Figure 2 shows the correlation between MED determined using the known, standardised method (x-axis) and predicted MED (y-axis) based on the methylation levels of the 18 CpG sites in Table 4.
- The MAE of this model was 4.18 mJ/cm².
- The present invention is based on the finding that the RNA expression levels of particular genes and/or the methylation levels of particular CpG sites can be used to determine the sensitivity of a subject's skin to UV radiation by predicting the MED of the subject's skin.
- The sensitivity of a subject's skin to UV radiation refers to how readily the subject's skin (including its DNA) is damaged by UV radiation.
- The sensitivity of a subject's skin to UV radiation can be represented by its minimal erythema dose (MED).
- The “minimal erythema dose” is the minimal dose of UV (measured in mJ/cm²) that results in perceptible erythema (redness) and/or oedema (swelling) of the skin 24 to 48 hours after exposure to UV radiation.
- The known, standardised method for determining MED is provided in ISO 24444:2010.
- UV radiation refers to electromagnetic radiation with a wavelength of from about 100 nm to about 400 nm, including UVC radiation (from about 100 nm to about 280 nm), UVB radiation (from about 280 nm to about 315 nm), and UVA radiation (from about 315 nm to about 400 nm).
Subjects and sampling
- A subject refers to a human subject. In certain embodiments, the subject may be at least 16, 18, or 30 years old.
- The subject may be phototype I-VI on the Fitzpatrick scale (I meaning always burns, never tans; II meaning burns easily, then develops a light tan; III meaning burns moderately, then develops a light tan; IV meaning burns minimally to rarely, then develops a moderate tan; V meaning never burns, always develops a dark tan; VI meaning never burns, no noticeable change in appearance).
- The subject may be phototype I-IV on the Fitzpatrick scale.
- The subject's skin may have previously been exposed to UV radiation and may be tanned or burnt at the time of sampling.
- The subject had/has not taken antihistamine or anti-inflammatory drugs within the two weeks prior to skin sampling.
- A skin sample refers to a sample comprising skin cells.
- The skin cells may have been/may be obtained by harvesting the entire skin sample required from the individual.
- Harvesting a sample from the individual may be carried out using suction blistering, punch biopsy, shave biopsy or during any surgical procedure such as plastic surgery, lifting, grafting, or the like.
- The sample may have been/may be obtained by suction blistering.
- The skin cells may have been/may be obtained by culturing the skin cells using an in vitro method.
- Skin cells may have been/may be cultured from a small sample of skin cells harvested from an individual.
- The harvested human skin cells may have been/may be grown in vitro in a vessel such as a petri dish in a medium or substrate that supplies essential nutrients.
- The skin samples may have been/may be obtained from the epidermis or dermis.
- The skin sample may comprise, consist of, or consist essentially of epidermal cells and/or dermal cells.
- The skin sample may comprise, consist of, or consist essentially of epidermal cells.
- The skin cells may comprise a mixture of harvested cells and cultured cells.
- The RNA expression levels of particular genes can be determined, for example, by RNA-Seq (e.g. using Illumina's® TruSeq RNA Library Prep Kit and HiSeq system), RT-qPCR, SAGE, EST sequencing, or hybridisation-based methods such as microarrays.
- The RNA expression level of a particular gene can be measured in transcripts per million (TPM). A value of x TPM means that for every 1 million RNA molecules (transcripts) in the sample, x came from the gene of interest.
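TPM is usually computed from RNA-Seq read counts in two steps: divide each gene's count by its transcript length (so that long transcripts, which yield more reads per molecule, are not over-counted), then rescale so the values sum to one million. The length step is what makes the "per million molecules" reading above hold. A minimal sketch, assuming raw counts and transcript lengths are already available from a quantification step; the gene names and numbers are invented placeholders:

```python
from typing import Dict


def transcripts_per_million(
    read_counts: Dict[str, float], lengths_bp: Dict[str, float]
) -> Dict[str, float]:
    """Convert raw read counts to TPM.

    Counts are first divided by transcript length in kilobases, then the
    resulting rates are rescaled so that all genes sum to 1e6.
    """
    rates = {g: read_counts[g] / (lengths_bp[g] / 1_000) for g in read_counts}
    total = sum(rates.values())
    return {g: r / total * 1e6 for g, r in rates.items()}


# Placeholder genes: equal molecule counts for A and B despite B's doubled reads.
counts = {"GENE_A": 500, "GENE_B": 1_000, "GENE_C": 1_500}
lengths = {"GENE_A": 1_000, "GENE_B": 2_000, "GENE_C": 1_500}
tpm = transcripts_per_million(counts, lengths)
print(tpm)  # {'GENE_A': 250000.0, 'GENE_B': 250000.0, 'GENE_C': 500000.0}
```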
- The inventors of the present invention identified 200 genes whose RNA expression levels individually exhibit a strong linear correlation with MED (see Table 1). The inventors also identified sets of 18-20 genes whose RNA expression levels can be used to predict MED with high accuracy (see Table 2).
- The RNA expression levels of as few as 2 of the genes in Table 1 could be used to accurately predict MED.
- MAEs of less than about 25 mJ/cm² were consistently achieved when using from 2 to 200 genes selected from Table 1 to predict MED.
- The methods of the present invention may comprise determining the RNA expression level of at least 2 genes selected from Table 1 or 2.
- The methods of the present invention may comprise a step of determining the RNA expression level of from 2 to 200, 2 to 150, 2 to 100, 2 to 50, or 2 to 20 genes selected from Table 1 or 2.
- The genes may comprise or consist of: ENSG00000197978 and ENSG00000277060; ENSG00000197978 and ENSG00000100376; ENSG00000197978 and ENSG00000172799; ENSG00000197978 and ENSG00000166670; or ENSG00000172799 and ENSG00000159247.
- Such methods using at least 2 genes may achieve an absolute error of about 25 mJ/cm² or less, about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
- The methods of the present invention may comprise determining the RNA expression level of at least 5 genes selected from Table 1 or 2. In certain embodiments, the methods of the present invention may comprise a step of determining the RNA expression level of from 5 to 200, 5 to 150, 5 to 100, 5 to 50, or 5 to 20 genes selected from Table 1 or 2. In certain embodiments, the genes may comprise or consist of: ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, and
- ENSG00000166670 and ENSG00000172799; ENSG00000277060, ENSG00000100376,
- Such methods using at least 5 genes may achieve an absolute error of about 25 mJ/cm² or less, about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 13 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
- The methods of the present invention may comprise determining the RNA expression level of at least 10 genes selected from Table 1 or 2. In certain embodiments, the methods of the present invention may comprise a step of determining the RNA expression level of from 10 to 200, 10 to 150, 10 to 100, 10 to 50, or 10 to 20 genes selected from Table 1 or 2. In certain embodiments, the genes may comprise or consist of:
- ENSG00000159247 and ENSG00000260075; ENSG00000197978, ENSG00000277060, ENSG00000166670, ENSG00000172799, ENSG00000106392, ENSG00000159247,
- Such methods using at least 10 genes may achieve an absolute error of about 25 mJ/cm² or less, about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
- The methods of the present invention may comprise determining the RNA expression level of at least 18 genes selected from Table 1 or 2. In certain embodiments, the methods of the present invention may comprise a step of determining the RNA expression level of from 18 to 200, 18 to 150, 18 to 100, or 18 to 50 genes selected from Table 1 or 2. In certain embodiments, the genes may comprise or consist of 18 genes selected from set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 2. In one example, the genes may comprise or consist of 18 genes selected from set A in Table 2.
- Such methods using at least 18 genes may achieve an absolute error of about 25 mJ/cm² or less, about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, about 8 mJ/cm² or less, or about 5 mJ/cm² or less.
- The methods of the present invention may comprise determining the RNA expression level of at least 20 genes selected from Table 1 or 2. In certain embodiments, the methods of the present invention may comprise a step of determining the RNA expression level of from 20 to 200, 20 to 150, 20 to 100, or 20 to 50 genes selected from Table 1 or 2. In certain embodiments, the genes may comprise or consist of the genes provided in set A, D, G, H, N, O, P, or R in Table 2. In one example, the genes may comprise or consist of the genes provided in set A in Table 2.
- Such methods using at least 20 genes may achieve an absolute error of about 25 mJ/cm² or less, about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, about 8 mJ/cm² or less, or about 5 mJ/cm² or less.
- Table 1 - 200 predictive genes:
- A CpG site (also referred to as a CpG dinucleotide) is a cytosine nucleotide immediately followed by a guanine nucleotide in the 5' to 3' direction within a DNA molecule.
- CpG sites may be in coding or non-coding regions of the genome.
- CpG sites may be in CpG islands, which are regions having a high density of CpG sites.
- The cytosine in a CpG site can be methylated by DNA methyltransferases to become 5-methylcytosine. Methylation of CpG sites within a gene is known to influence the transcriptional regulation, and thus the expression, of the gene (epigenetic regulation).
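The CpG definition above translates directly into a scan for a "C" immediately followed by a "G" when reading 5' to 3'. A minimal sketch, assuming a plain single-strand sequence string; the helper names are hypothetical, and the density function is only a crude signal (published CpG-island definitions also use GC content and an observed/expected CpG ratio over a minimum window length):

```python
from typing import List


def cpg_positions(sequence: str) -> List[int]:
    """Return 0-based positions of each C that forms a 5'-CG-3' dinucleotide.

    A CpG on one strand pairs with a CpG on the complementary strand at the
    same dinucleotide, so scanning one strand is enough to find every site.
    """
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def cpg_density(sequence: str) -> float:
    """Fraction of dinucleotide positions that are CpG (a crude island signal)."""
    n_dinucleotides = max(len(sequence) - 1, 1)
    return len(cpg_positions(sequence)) / n_dinucleotides


print(cpg_positions("ACGTTCGCGA"))  # [1, 5, 7]
```

Note that a "GC" dinucleotide is not a CpG site, because the order of the bases along the 5' to 3' direction matters.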
- The methylation level of particular CpG sites can be determined, for example, by methylation-specific PCR, sequence analysis of bisulfite-treated DNA, CHIP-sequencing (Illumina Methylation BeadChip Technology), molecular inversion probe assay, Methyl-CAP-sequencing, next-generation sequencing, COBRA assay, methylation-specific restriction patterns, or MassARRAY assay.
- The methylation level of a particular CpG site can be represented by its M-value, which is the log2 ratio of the intensities of the methylated probe versus the unmethylated probe.
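The M-value definition above can be written as a one-line function, together with the conversion back to a fraction methylated (often called the beta value). A minimal sketch, assuming raw methylated and unmethylated probe intensities per CpG site; the offset of 1 is an assumed, commonly used choice to keep the ratio finite, and the beta conversion ignores offsets, so it equals methylated / (methylated + unmethylated):

```python
import math


def m_value(methylated: float, unmethylated: float, offset: float = 1.0) -> float:
    """log2 ratio of methylated to unmethylated probe intensity.

    The small offset (an assumption here) avoids division by zero or log(0)
    when one channel is near background.
    """
    return math.log2((methylated + offset) / (unmethylated + offset))


def beta_from_m(m: float) -> float:
    """Fraction methylated implied by an M-value (offsets ignored)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


m = m_value(4095, 1023)  # (4095 + 1) / (1023 + 1) = 4, so M = 2.0
print(m, beta_from_m(m))  # 2.0 0.8
```

An M-value of 0 corresponds to equal methylated and unmethylated signal (beta of 0.5); positive values indicate predominantly methylated sites.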
- The inventors of the present invention identified 200 specific CpG sites whose methylation levels individually exhibit a strong linear correlation with MED (see Table 3).
- The inventors also identified sets of 18-20 CpG sites whose methylation levels can be used to predict MED with high accuracy (see Table 4).
- The methods of the present invention may comprise determining the methylation level of at least 2 CpG sites selected from Table 3 or 4. In certain embodiments, the methods of the present invention may comprise a step of determining the methylation level of from 2 to 200, 2 to 150, 2 to 100, 2 to 50, or 2 to 18 CpG sites selected from Table 3 or 4.
- The CpG sites may comprise or consist of: cg06376130 and cg03953789; cg18269134 and cg00613587; cg06376130 and cg01199135; cg03953789 and cg00492074; or cg06376130 and cg18269134 (described further in Table 3).
- Such methods using at least 2 CpG sites may achieve an absolute error of about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
- The methods of the present invention comprise determining the methylation level of at least 5 CpG sites selected from Table 3 or 4. In certain embodiments, the methods of the present invention comprise a step of determining the methylation level of from 5 to 200, 5 to 150, 5 to 100, 5 to 50, or 5 to 18 CpG sites selected from Table 3 or 4.
- The CpG sites may comprise or consist of: cg06376130, cg03953789, cg18269134, cg00613587, and cg01199135; cg06376130, cg18269134, cg00613587, cg01199135, and cg00492074; cg03953789, cg00613587, cg01199135, cg00492074, and cg20271602; cg03953789, cg18269134, cg01199135, cg20707157, and cg22235661; or cg06376130, cg03953789, cg01199135, cg00492074, and cg10094916 (described further in Table 3).
- Such methods using at least 5 CpG sites may achieve an absolute error of about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 12 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
- The methods of the present invention may comprise determining the methylation level of at least 10 CpG sites selected from Table 3 or 4. In certain embodiments, the methods of the present invention may comprise a step of determining the methylation level of from 10 to 200, 10 to 150, 10 to 100, 10 to 50, or 10 to 18 CpG sites selected from Table 3 or 4.
- The CpG sites may comprise or consist of: cg06376130, cg03953789, cg00613587, cg20707157, cg09218398, cg26096304, cg09174638, cg00492074, cg20271602, and cg22235661; cg03953789, cg18269134, cg00613587, cg01199135, cg20707157, cg00492074, cg20271602, cg22235661, cg24688871, and cg10094916; cg06376130, cg03953789, cg01199135, cg20707157, cg09218398, cg00492074, cg20271602, cg22235661, cg17972013, and cg15224600; or cg18269134
- Such methods using at least 10 CpG sites may achieve an absolute error of about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, about 7 mJ/cm² or less, or about 5 mJ/cm² or less.
- The methods of the present invention may comprise determining the methylation level of at least 18 CpG sites selected from Table 3 or 4. In certain embodiments, the methods of the present invention may comprise a step of determining the methylation level of from 18 to 200, 18 to 150, 18 to 100, or 18 to 50 CpG sites selected from Table 3 or 4. In certain embodiments, the CpG sites may be selected from the CpG sites provided in set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 4. In one example, the CpG sites may comprise or consist of the 18 CpG sites provided in set A in Table 4.
- Such methods using at least 18 CpG sites may achieve an absolute error of about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
- The methods of the present invention may comprise determining the methylation level of at least 20 CpG sites selected from Table 3 or 4. In certain embodiments, the methods of the present invention may comprise a step of determining the methylation level of from 20 to 200, 20 to 150, 20 to 100, or 20 to 50 CpG sites selected from Table 3 or 4. In certain embodiments, the CpG sites may comprise or consist of the CpG sites provided in set B, C, D, E, G, H, I, J, K, N, O, P, or R in Table 4.
- Such methods using at least 20 CpG sites may achieve an absolute error of about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
- Table 3 - 200 predictive CpG sites:
- Table 4 - sets of predictive CpG sites:
- RNA expression levels of genes and methylation levels of CpG sites can be used to predict MED.
- A “feature” refers to a gene or CpG site. Accordingly, “features” refers to a plurality of genes and/or CpG sites.
- The inventors identified 200 features (including genes and CpG sites) whose RNA expression/methylation levels (as appropriate) individually exhibit a strong linear correlation with MED.
- The inventors also identified sets of 18-20 features (including genes and CpG sites) whose RNA expression/methylation levels can be used to predict MED with high accuracy.
- The RNA expression level of 1 gene in Table 1 together with the methylation level of 1 CpG site in Table 3 could be used to predict MED accurately.
- MAEs of less than about 25 mJ/cm² were consistently achieved when MED was predicted using from 2 to 200 features selected from Tables 1 and 3, wherein the features comprise at least one gene and at least one CpG site.
- The methods of the present invention may comprise determining the RNA expression level of at least one gene selected from Table 1 or 2 and the methylation level of at least one CpG site selected from Table 3 or 4.
- The methods of the present invention may comprise determining the RNA expression or methylation levels (as appropriate) of at least 2 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site.
- The methods of the present invention may comprise a step of determining the RNA expression or methylation levels (as appropriate) of from 2 to 200, 2 to 150, 2 to 100, 2 to 50, 2 to 20, or 2 to 18 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site.
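The combined embodiments above share one constraint: a feature set must contain at least one gene and at least one CpG site. A minimal sketch of checking that constraint, assuming features are identified the way this document lists them (human Ensembl gene IDs starting with "ENSG" and Illumina CpG probe IDs starting with "cg"); the helper names are hypothetical:

```python
from typing import Iterable, List, Tuple


def split_features(features: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split feature IDs into genes ('ENSG...') and CpG sites ('cg...')."""
    genes: List[str] = []
    cpgs: List[str] = []
    for f in features:
        if f.startswith("ENSG"):
            genes.append(f)
        elif f.startswith("cg"):
            cpgs.append(f)
        else:
            raise ValueError(f"unrecognised feature ID: {f}")
    return genes, cpgs


def is_mixed_feature_set(features: Iterable[str], min_size: int = 2) -> bool:
    """True if there are >= min_size distinct features, incl. >= 1 gene and >= 1 CpG."""
    feats = list(dict.fromkeys(features))  # drop duplicates, keep order
    genes, cpgs = split_features(feats)
    return len(feats) >= min_size and bool(genes) and bool(cpgs)


print(is_mixed_feature_set(["ENSG00000197978", "cg06376130"]))       # True
print(is_mixed_feature_set(["ENSG00000197978", "ENSG00000277060"]))  # False
```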
- The features may comprise or consist of: ENSG00000197978 and cg06376130; ENSG00000172799 and cg00613587;
- Such methods using 2 features may achieve an absolute error of about 25 mJ/cm² or less, about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
- The methods of the present invention may comprise determining the RNA expression or methylation levels (as appropriate) of at least 5 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site.
- The methods of the present invention may comprise a step of determining the RNA expression or methylation levels (as appropriate) of from 5 to 200, 5 to 150, 5 to 100, 5 to 50, 5 to 20, or 5 to 18 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site.
- The features may comprise or consist of: ENSG00000197978, ENSG00000277060, cg06376130, cg18269134, and cg00613587; ENSG00000197978, cg06376130, cg00613587, cg03953789, and cg00492074; ENSG00000277060, cg03953789, cg18269134, cg01199135, and cg00492074; ENSG00000159247, ENSG00000100376, ENSG00000172799, cg06376130, and cg18269134; or ENSG00000100376, cg06376130, cg18269134, cg00613587, and cg01199135.
- Such methods using 5 features may achieve an absolute error of about 25 mJ/cm² or less, about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
- The methods of the present invention may comprise determining the RNA expression or methylation levels (as appropriate) of at least 10 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site.
- The methods of the present invention may comprise a step of determining the RNA expression or methylation levels (as appropriate) of from 10 to 200, 10 to 150, 10 to 100, 10 to 50, 10 to 20, or 10 to 18 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site.
- The features may comprise or consist of: ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, cg03953789, cg18269134, cg00613587, cg09218398, and cg10094916; ENSG00000197978, ENSG00000166670, cg06376130, cg00613587, cg26096304, cg20707157, cg20271602, cg22235661, cg24688871, and cg09174638; ENSG00000277060, cg06376130, cg03953789, cg18269134, cg01199135, cg20707157, cg09218398, cg00492074, cg20271602, and cg22235661; or ENSG
- Such methods using 10 features may achieve an absolute error of about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, about 5 mJ/cm² or less, or about 3 mJ/cm² or less.
- The methods of the present invention may comprise determining the RNA expression or methylation levels (as appropriate) of at least 18 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site.
- The methods of the present invention may comprise a step of determining the RNA expression or methylation levels (as appropriate) of from 18 to 200, 18 to 150, 18 to 100, or 18 to 50 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site.
- The features may comprise or consist of the features provided in sets A, D, E, F, H, or I of Table 5.
- Such methods using 18 features may achieve an absolute error of about 15 mJ/cm² or less, about 10 mJ/cm² or less, about 5 mJ/cm² or less, or about 3 mJ/cm² or less.
- The methods of the present invention may comprise determining the RNA expression or methylation levels (as appropriate) of at least 20 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site.
- The methods of the present invention may comprise a step of determining the RNA expression or methylation levels (as appropriate) of from 20 to 200, 20 to 150, 20 to 100, or 20 to 50 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site.
- The features may comprise or consist of the features provided in sets B, C, G, or J of Table 5. Such methods using 20 features may achieve an absolute error of about 15 mJ/cm² or less, about 10 mJ/cm² or less, about 5 mJ/cm² or less, or about 3 mJ/cm² or less.
- The method may further comprise a step of predicting the subject's MED based on the determined RNA expression levels using a machine learning model, wherein the machine learning model has been trained on data comprising known MEDs and corresponding known RNA expression levels of the same genes.
- The method may further comprise a step of predicting the subject's MED based on the determined methylation levels using a machine learning model, wherein the machine learning model has been trained on data comprising known MEDs and corresponding known methylation levels of the same CpG sites.
- The methods of the present invention may comprise a step of training a machine learning model on data comprising known MEDs and corresponding known methylation levels of the same CpG sites.
- The method may further comprise a step of predicting the subject's MED based on the determined RNA expression and methylation levels using a machine learning model, wherein the machine learning model has been trained on data comprising known MEDs and corresponding known RNA expression and
- The methods of the present invention may comprise a step of training a machine learning model on data comprising known MEDs and corresponding known RNA expression and methylation levels of the same features.
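The training and prediction steps above can be sketched with scikit-learn's support vector regressor, one of the supervised models this document names. This is a sketch under stated assumptions, not the inventors' pipeline: the cohort is synthetic random data with an invented linear MED relationship, and the feature scaling, linear kernel, `C` and `epsilon` values are illustrative choices. In practice each row would hold one subject's TPM values and/or M-values for the selected features, and each target would be that subject's measured MED:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for a training cohort: 32 subjects x 5 features
# (expression levels or M-values). The weights below are invented.
n_subjects, n_features = 32, 5
X_train = rng.normal(size=(n_subjects, n_features))
true_weights = np.array([8.0, -5.0, 3.0, 0.0, 0.0])
med_train = 40.0 + X_train @ true_weights + rng.normal(scale=2.0, size=n_subjects)

# Standardise each feature, then fit an epsilon-SVR with a linear kernel.
model = make_pipeline(StandardScaler(), SVR(kernel="linear", C=10.0, epsilon=1.0))
model.fit(X_train, med_train)

# Predict the MED (mJ/cm²) of a new subject measured on the same 5 features.
x_new = rng.normal(size=(1, n_features))
predicted_med = float(model.predict(x_new)[0])
print(f"predicted MED: {predicted_med:.1f} mJ/cm²")
```

The new subject must be measured on exactly the same features, in the same order, as the training data, which is why the document ties the training data to "the same genes", "the same CpG sites" or "the same features".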
- “Corresponding” RNA expression and/or methylation levels and MEDs means RNA expression and/or methylation levels and MEDs determined from the same subject.
- “Known” means previously determined.
- A “known MED and corresponding known RNA expression level” means an MED determined on a subject's skin and an RNA expression level determined using a skin sample from the same subject.
- The known RNA expression levels, the known methylation levels, and/or the known MEDs are derived from at least 5, 10, 15, 20, 30, or 32 subjects.
- The known RNA expression levels, the known methylation levels, and/or the known MEDs are derived from at least 20 subjects.
- The known RNA expression levels, the known methylation levels, and/or the known MEDs are derived from at least 32 subjects.
- Known MEDs may have been determined by any known method for determining MED, for example using the standardised method provided in ISO 24444:2010.
- Machine learning models may be used to determine predictive feature sets.
- The machine learning model may be a supervised learning model, for example a support vector machine (SVM).
- The machine learning model may perform sequential backward selection (SBS; also referred to as sequential feature elimination), sequential forward selection (SFS), exhaustive search, random search, or search using genetic algorithms.
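Sequential backward selection, as listed above, starts from all candidate features and repeatedly drops the feature whose removal hurts the model least. A minimal sketch using scikit-learn's `SequentialFeatureSelector` with `direction="backward"`, assuming a plain linear regressor, cross-validated MAE as the score, and synthetic data in which only features 0, 2 and 5 carry (invented) signal; none of these choices come from the patent:

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic data: 40 subjects x 8 candidate features; only features
# 0, 2 and 5 drive the invented MED values.
X = rng.normal(size=(40, 8))
med = 35.0 + 6.0 * X[:, 0] - 4.0 * X[:, 2] + 5.0 * X[:, 5] + rng.normal(scale=1.0, size=40)

# Backward selection: begin with all 8 features and repeatedly remove the one
# whose removal gives the best 5-fold cross-validated (negated) MAE,
# stopping when 3 features remain.
sbs = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=3,
    direction="backward",
    scoring="neg_mean_absolute_error",
    cv=5,
)
sbs.fit(X, med)
selected = np.flatnonzero(sbs.get_support())
print(selected)
```

Setting `direction="forward"` gives sequential forward selection instead; exhaustive, random and genetic-algorithm searches would need other tooling.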
- The machine learning model may use a regression model, for example a Lasso regression model, a general linear model, Lasso/ridge regression and elastic nets, decision trees, random forests, gradient boosting, or deep learning and neural networks.
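Of the regression models listed above, Lasso is convenient for marker panels because its L1 penalty drives uninformative coefficients to exactly zero, so fitting and feature selection happen together. A minimal sketch with scikit-learn's `Lasso`, assuming synthetic data (32 subjects, 20 candidate features, of which only the first two carry invented signal) and an illustrative penalty `alpha=0.5`; in practice `alpha` would be tuned, for example by cross-validation:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)

# Synthetic cohort: 32 subjects x 20 candidate features; only features 0 and 1
# carry (invented) information about MED.
X = rng.normal(size=(32, 20))
med = 30.0 + 7.0 * X[:, 0] - 5.0 * X[:, 1] + rng.normal(scale=1.0, size=32)

# The L1 penalty (alpha) shrinks coefficients and sets many of them exactly to
# zero; the surviving non-zero coefficients form the selected feature set.
lasso = Lasso(alpha=0.5)
lasso.fit(X, med)
kept = np.flatnonzero(lasso.coef_)
print(kept, lasso.coef_[kept].round(2))
```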
- Absolute error refers to the magnitude of the difference between a subject's MED determined by the known, standardised method and the subject's MED predicted using a method of the invention.
- MAE: mean absolute error, i.e. the absolute error averaged over all subjects assessed.
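The two error measures defined above can be computed directly from paired reference and predicted MEDs. A minimal sketch, assuming both are given in mJ/cm² in the same subject order; the MED values are hypothetical:

```python
from typing import List, Sequence


def absolute_errors(reference: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Per-subject |reference MED - predicted MED|, in mJ/cm²."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    return [abs(r - p) for r, p in zip(reference, predicted)]


def mean_absolute_error(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the per-subject absolute errors (the MAE)."""
    errors = absolute_errors(reference, predicted)
    if not errors:
        raise ValueError("need at least one subject")
    return sum(errors) / len(errors)


reference_med = [30.0, 45.0, 60.0]  # hypothetical MEDs from the standardised method
predicted_med = [35.0, 42.0, 61.0]  # hypothetical model predictions
print(absolute_errors(reference_med, predicted_med))      # [5.0, 3.0, 1.0]
print(mean_absolute_error(reference_med, predicted_med))  # 3.0
```

Because the errors are not squared, an MAE of 3.0 mJ/cm² can be read directly as the typical size of a single prediction error, in the same units as the MED itself.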
- The absolute error of a predicted MED may be about 25 mJ/cm² or less, about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
- Once a subject's MED has been predicted, this information may be used to provide recommendations that are personalised to the subject.
- The methods of the invention may comprise a step of determining the maximum dose of UV radiation the subject can be exposed to before experiencing negative effects thereof.
- the negative effects may include skin damage (for example tanning, burning, premature aging) or DNA damage.
- the methods of the invention may comprise a step of determining which UV protection substances or strategies are appropriate for the subject, for example sunscreens with particular SPFs or sunlight avoidance.
- the methods of the invention may comprise a step of determining the minimum or optimal dose of UV radiation the subject should be exposed to in order to experience positive effects thereof.
- the positive effects may include vitamin D synthesis or treatment of a disease or condition.
- the disease or condition may be selected from the list consisting of: vitamin D deficiency; eczema; acne; psoriasis; graft-versus-host disease; vitiligo; mycosis fungoides; large-plaque parapsoriasis; and cutaneous T-cell lymphoma.
- the treatment may be in the presence of a psoralen (i.e. PUVA therapy).
- the methods of the invention may comprise a step of administering an effective amount of a UV protectant to the subject.
- an effective amount means an amount effective to prevent or reduce damage to the subject's skin by UV radiation, including DNA damage.
- a “UV protection substance” or “UV protectant” may be a chemical absorber (i.e. an organic chemical compound that absorbs UV light, for example a salicylate, cinnamate, or benzophenone) or a physical blocker (i.e. inorganic particulates that reflect, scatter, or absorb UV light, for example titanium dioxide or zinc oxide).
- a method for predicting the minimal erythema dose (MED) of a subject's skin comprising: i. determining the RNA expression levels of at least 2 genes selected from the genes in Table 1 in a skin sample obtained from the subject, and ii. predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least two genes.
- a method for preventing damage to a subject's skin by UV radiation comprising: i. receiving or obtaining a skin sample from a subject; ii. determining the RNA expression levels of at least 2 genes selected from the genes in Table 1 in a skin sample obtained from the subject; iii. predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least two genes; and iv. administering an effective amount of a UV protectant to the subject.
- genes comprise: i. ENSG00000197978 and ENSG00000277060; ii. ENSG00000197978 and ENSG00000100376; iii. ENSG00000197978 and ENSG00000172799; iv. ENSG00000197978 and ENSG00000166670; or v. ENSG00000172799 and ENSG00000159247.
- the genes comprise: i. ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, and ENSG00000172799; ii. ENSG00000197978, ENSG00000159247, ENSG00000100376, ENSG00000166670, and ENSG00000172799; iii. ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, and ENSG00000159247; iv. ENSG00000197978, ENSG00000100376, ENSG00000166670, ENSG00000106392, and ENSG00000159247; or v. ENSG00000197978, ENSG00000100376, ENSG00000172799, ENSG00000159247, and ENSG00000260075.
- ENSG00000277060, ENSG00000166670, ENSG00000106392, ENSG00000159247, ENSG00000100376, ENSG00000260075, ENSG00000169282, ENSG00000126890, ENSG00000176933, and ENSG00000134184.
- genes comprise the genes in set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 2.
- genes comprise the genes in set A in Table 2.
- the genes comprise up to 200, 150, 100, 50, or 20 genes.
- the sample comprises, consists, or consists essentially of epidermis cells and/or dermis cells.
- machine learning model is a support vector machine (SVM).
- a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the following steps: i. inputting the RNA expression levels of at least two genes selected from the genes in Table 1 in a skin sample obtained from the subject; ii. predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least two genes; and iii. outputting the predicted MED.
- the at least 2 genes are at least 5 genes.
- ENSG00000166670 and ENSG00000172799; ii. ENSG00000197978, ENSG00000159247, ENSG00000100376,
- ENSG00000166670 and ENSG00000172799; iii. ENSG00000277060, ENSG00000100376, ENSG00000166670,
- ENSG00000106392 and ENSG00000159247; or v. ENSG00000197978, ENSG00000100376, ENSG00000172799,
- ENSG00000112139, ENSG00000130779, ENSG00000159247, and ENSG00000260075; ii. ENSG00000197978, ENSG00000277060, ENSG00000166670,
- ENSG00000115602, ENSG00000112139, ENSG00000224472, and ENSG00000233913; or iv. ENSG00000277060, ENSG00000166670, ENSG00000106392,
- ENSG00000169282, ENSG00000126890, ENSG00000176933, and ENSG00000134184.
- a computer-readable medium comprising the computer program of any of paragraphs 22-41.
- a method for predicting the minimal erythema dose (MED) of a subject's skin comprising: i. determining the methylation levels of at least 2 CpG sites selected from the CpG sites in Table 3 in a skin sample obtained from the subject, and ii. predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least two CpG sites.
- a method for preventing damage to a subject's skin by UV radiation comprising: i. receiving or obtaining a skin sample from a subject; ii. determining the methylation levels of at least 2 CpG sites selected from the CpG sites in Table 3 in a skin sample obtained from the subject; iii. predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least two CpG sites; and iv. administering an effective amount of a UV protectant to the subject.
- cg06376130, cg03953789, cg01199135, cg20707157, cg09218398, cg00492074, cg20271602, cg22235661, cg17972013, and cg15224600; or iv. cg18269134, cg00613587, cg01199135, cg20707157, cg00492074, cg20271602, cg22235661, cg24688871, and cg10094916.
- CpG sites comprise up to 200, 150, 100, 50, 20, or 18 CpG sites.
- the sample comprises, consists, or consists essentially of epidermis cells and/or dermis cells.
- machine learning model is a support vector machine (SVM).
- a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the following steps: i. inputting the methylation levels of at least two CpG sites selected from the CpG sites in Table 3 in a skin sample obtained from the subject; ii. predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least two CpG sites; and iii. outputting the predicted MED.
- cg06376130, cg03953789, cg01199135, cg20707157, cg09218398, cg00492074, cg20271602, cg22235661, cg17972013, and cg15224600; or iv. cg18269134, cg00613587, cg01199135, cg20707157, cg00492074, cg20271602, cg22235661, cg24688871, and cg10094916.
- a computer-readable medium comprising the computer program of any of paragraphs 66-85.
- a method for predicting the minimal erythema dose (MED) of a subject's skin comprising: i. determining the methylation level(s) and RNA expression level(s) of at least 2 features selected from the CpG sites in Table 3 and the genes in Table 1 in a skin sample obtained from the subject, wherein the features comprise at least one CpG site in Table 3 and at least one gene in Table 1, and ii. predicting the subject's MED based on the determined methylation level(s) and RNA expression level(s) using a machine learning model trained on data comprising known MEDs and corresponding known methylation level(s) and RNA expression level(s) of the at least 2 features.
- a method for preventing damage to a subject's skin by UV radiation comprising: i. receiving or obtaining a skin sample from a subject; ii. determining the methylation level(s) and RNA expression level(s) of at least 2 features selected from the CpG sites in Table 3 and the genes in Table 1 in a skin sample obtained from the subject, wherein the features comprise at least one CpG site in Table 3 and at least one gene in Table 1; iii.
- ENSG00000159247, ENSG00000100376, ENSG00000172799, cg06376130, and cg18269134; or v. ENSG00000100376, cg06376130, cg18269134, cg00613587, and cg01199135.
- ENSG00000166670, ENSG00000172799, cg03953789, cg18269134, cg00613587, cg09218398, and cg10094916; ii. ENSG00000197978, ENSG00000166670, cg06376130, cg00613587, cg26096304, cg20707157, cg20271602, cg22235661, cg24688871, and cg09174638; iii.
- ENSG00000277060, cg06376130, cg03953789, cg18269134, cg01199135, cg20707157, cg09218398, cg00492074, cg20271602, and cg22235661; or iv. ENSG00000197978, ENSG00000277060, ENSG00000172799,
- any of paragraphs 87-106 further comprising: i. determining the maximum dose of UV radiation the subject can be exposed to before experiencing negative effects thereof; ii. determining an appropriate UV protection substance or strategy for the subject; iii. determining the minimum dose of UV radiation the subject should be exposed to in order to experience positive effects thereof; and/or iv. determining the appropriate UV dose for treating a disease or condition susceptible to UV therapy in the subject.
- a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the following steps: i. inputting the methylation level(s) and RNA expression level(s) of at least 2 features selected from the CpG sites in Table 3 and the genes in Table 1 in a skin sample obtained from the subject, wherein the features comprise at least one CpG site in Table 3 and at least one gene in Table 1, ii. predicting the subject's MED based on the determined methylation level(s) and RNA expression level(s) using a machine learning model trained on data comprising known MEDs and corresponding known methylation level(s) and RNA expression level(s) of the at least 2 features, and iii. outputting the predicted MED.
- ENSG00000159247, ENSG00000100376, ENSG00000172799, cg06376130, and cg18269134; or v. ENSG00000100376, cg06376130, cg18269134, cg00613587, and cg01199135.
- ENSG00000197978, ENSG00000166670, cg06376130, cg00613587, cg26096304, cg20707157, cg20271602, cg22235661, cg24688871, and cg09174638; iii. ENSG00000277060, cg06376130, cg03953789, cg18269134, cg01199135, cg20707157, cg09218398, cg00492074, cg20271602, and cg22235661; or iv.
- the present invention may comprise any combination of features and/or limitations referred to herein, except for combinations of such features which are mutually exclusive.
- the foregoing description is directed to particular embodiments of the present invention for the purpose of illustrating it. It will be apparent, however, to one skilled in the art, that many modifications and variations to the embodiments described herein are possible. All such modifications and variations are intended to be within the scope of the present invention, as defined in the appended claims.
- MED was determined as previously described (Heckman et al (2013) Minimal Erythema Dose (MED) Testing, J Vis Exp. (75): 50175) and outlined below. This method has been standardised as International Standard ISO 24444:2010.
- the study sites were located on the subjects' lower backs since this area is rarely exposed to sunlight. The sites were split into control and test areas.
- the first irradiation of the test sites was performed using a SOL 500 full spectrum solar simulator (Hönle UV Technology). Intensities were chosen individually to reach 0.9 MED for all subjects (i.e. 90% of the MED of each test subject).
- RNA and DNA samples were used for transcriptome sequencing and methylation profiling, respectively, as described below.
- Transcriptome libraries were prepared using the TruSeq Library Prep Kit (Illumina®), and sequencing was performed at 1x50 bp on Illumina's® HiSeq system to a final sequencing depth of 100 million reads per sample.
- Sequencing data was processed using a custom pipeline including FastQC v0.11.7 (ref. 67) for quality control, Trimmomatic v0.36 (ref. 68) for trimming, and Salmon v0.8.1 (ref. 69) for read mapping and quantification.
- Methylation profiling was performed using Illumina's® Infinium MethylationEPIC arrays (Love et al. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, Genome Biol. 15:550).
- Methylation data was processed using the minfi package (Aryee et al. (2014) Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays, Bioinformatics 30:1363-1369) in R. Normalization was carried out using the functional normalization (funnorm) method.
- Sequential backward selection (SBS; also referred to as sequential feature elimination) was used with a support vector machine (SVM) to reduce the sets of 2000 predictive features to sets of fewer features, for example 20 features.
- Parameters to be set for SBS are the maximum number of features to retain and a threshold parameter determining the minimal value of improvement needed for a feature to be eliminated from the model.
- the inventors' models did not use the first parameter, but the second parameter was set to 0.01.
- Model predictions and accuracy scores were extracted from leave-one-out cross-validation (LOOCV) to avoid overfitting.
- the feature selection and training included data from both test and control (irradiated and non-irradiated) samples, in order to allow accurate MED predictions irrespective of previous UV exposure of the sample.
- This study recruited 32 healthy female subjects belonging to Fitzpatrick phototypes I to IV (12 subjects belonging to phototypes I and II, 10 to phototype III, and 10 to phototype IV). The subjects were aged between 30 and 65 years, with homogeneous age distributions in each phototype group.
- MED values ranged from approximately 50 mJ/cm2 to approximately 210 mJ/cm2. As expected from previously published data, stratification of donors using the Fitzpatrick classification system was an inaccurate predictor of MED. For example, the measured MED values for the subjects of phototype IV varied from 99.7 to 210.4 mJ/cm2.
- a machine learning model was trained on data comprising (i) the MEDs and (ii) the corresponding RNA expression levels of the 200 genes in Table 1, from all 32 subjects. This model was then used to predict MED based on RNA expression levels of the 200 genes in Table 1 in skin samples obtained from subjects.
- the set of 20 genes in set A in Table 2 achieved a very low MAE of 7.85 mJ/cm2 (see Figure 1).
- ENSG00000112139, ENSG00000130779, ENSG00000159247, and ENSG00000260075 achieved an MAE of 9.71 mJ/cm2.
- ENSG00000277060 (NLRP2)
- ENSG00000100376 (FAM118A)
- the set of 2 genes ENSG00000197978 and ENSG00000277060 achieved an MAE of 19.62 mJ/cm2.
- Example 3b: predicting MED using 200 CpG sites
- [128] A machine learning model was trained on data comprising (i) the MEDs and (ii) the corresponding methylation levels of the 200 CpG sites in Table 3, from all 32 subjects. This model was then used to predict MED based on methylation levels of the 200 CpG sites in Table 3 in skin samples obtained from subjects.
- the set of 5 CpG sites cg06376130, cg03953789, cg18269134, cg00613587, and cg01199135 achieved an MAE of 11.01 mJ/cm2.
- the set of 2 CpG sites cg06376130 and cg03953789 achieved an MAE of 15.58 mJ/cm2.
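The feature-reduction procedure described in the examples (sequential backward selection with an improvement threshold of 0.01, scored by leave-one-out cross-validation) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the inventors' pipeline: a 1-nearest-neighbour regressor stands in for the support vector machine so the example needs no third-party libraries, the threshold is assumed to apply to the LOOCV mean absolute error, and all function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


# Hypothetical stand-in for the SVM used in the examples: 1-nearest-neighbour regression.
def knn_predict(train_X: List[Sequence[float]], train_y: List[float],
                x: Sequence[float], features: List[int], k: int = 1) -> float:
    """Average the MEDs of the k training subjects closest to x on the selected features."""
    ranked = sorted(
        (math.dist([row[f] for f in features], [x[f] for f in features]), med)
        for row, med in zip(train_X, train_y)
    )
    nearest = ranked[:k]
    return sum(med for _, med in nearest) / len(nearest)


def loocv_mae(X: List[Sequence[float]], y: List[float], features: List[int]) -> float:
    """Leave-one-out cross-validation: predict each subject from all the others."""
    errors = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        errors.append(abs(knn_predict(train_X, train_y, X[i], features) - y[i]))
    return sum(errors) / len(errors)


def sequential_backward_selection(X: List[Sequence[float]], y: List[float],
                                  tol: float = 0.01,
                                  min_features: int = 1) -> Tuple[List[int], float]:
    """Drop one feature at a time while doing so lowers the LOOCV MAE by at least tol."""
    selected = list(range(len(X[0])))
    best = loocv_mae(X, y, selected)
    while len(selected) > min_features:
        # Score every model that is one feature smaller than the current one.
        trials = [(loocv_mae(X, y, [g for g in selected if g != f]), f) for f in selected]
        score, dropped = min(trials)
        if best - score < tol:
            break  # no single removal improves the MAE enough: stop eliminating
        selected.remove(dropped)
        best = score
    return selected, best


# Made-up toy data: feature 0 tracks MED, feature 1 is noise on a larger scale.
X = [[1, 50], [2, 10], [3, 40], [4, 20], [5, 30]]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
features, mae = sequential_backward_selection(X, y)
print(features, mae)  # [0] 10.0
```

On the toy data, removing the noisy second feature lowers the LOOCV MAE from 16 to 10, an improvement well above the threshold, so SBS drops it and then stops at the minimum of one feature. Swapping in a real SVM would only change `knn_predict`; the selection loop stays the same.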
Abstract
The present invention concerns methods for predicting MED of a subject's skin, comprising either determining the methylation levels of at least 2 CpG sites selected from certain genes in a skin sample obtained from the subject, and predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least 2 CpG sites, and/or determining the RNA expression levels of at least 2 genes selected from certain genes in a skin sample obtained from the subject, and predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least 2 genes, and/or determining the methylation level(s) and RNA expression level(s) of at least 2 features selected from the CpG sites in Table 3 and the genes in Table 1 in a skin sample obtained from the subject, wherein the features comprise at least one CpG site in Table 3 and at least one gene in Table 1, and predicting the subject's MED based on the determined methylation level(s) and RNA expression level(s) using a machine learning model trained on data comprising known MEDs and corresponding known methylation level(s) and RNA expression level(s) of the at least 2 features. The invention further relates to computer programs for carrying out the methods of the invention.
Description
DETERMINING THE SENSITIVITY OF SKIN TO UV RADIATION
FIELD OF THE INVENTION
[01] The present invention relates to determining the sensitivity of a subject's skin to UV radiation by analysing gene expression or epigenetic markers. The level of RNA expression of particular genes and/or the methylation level of particular CpG sites in a skin sample from the subject are analysed to accurately predict the minimal erythema dose (MED) of a subject's skin, which is an indicator of the UV sensitivity of the subject's skin. Methods of the invention can achieve a mean absolute error (MAE) as low as 7.85 mJ/cm2 (gene expression) or 4.18 mJ/cm2 (methylation).
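The MAE figures quoted above summarise prediction accuracy as the mean, over n subjects, of |MED_measured - MED_predicted|, expressed in mJ/cm2. A minimal sketch of that calculation, using hypothetical MED values rather than study data:

```python
from typing import Sequence


def mean_absolute_error(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |measured MED - predicted MED| over subjects, in the inputs' unit (mJ/cm2)."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


# Hypothetical MEDs for three subjects (not study data)
measured_med = [100.0, 150.0, 200.0]   # from the standardised irradiation test
predicted_med = [110.0, 145.0, 200.0]  # from a trained model
print(mean_absolute_error(measured_med, predicted_med))  # 5.0
```

Because the errors are not squared, a single badly predicted subject raises the MAE only in proportion to its error, and the result stays in the same unit as the MED itself.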
BACKGROUND OF THE INVENTION
[02] It is known that exposure of the skin to radiation from the ultraviolet (UV) region of the light spectrum can have harmful effects on the skin, including permanent skin damage, discoloration, and premature aging, as well as DNA damage, which can lead to the development of skin cancer.
[03] The sensitivity/tolerance of the skin to UV irradiation can vary widely between individuals. Stratifying individuals by the sensitivity of skin to UV can be useful for assessing their risk of skin damage and cancer, for determining appropriate UV protection strategies, and for determining appropriate doses of therapeutic UV (e.g. PUVA therapy).
[04] The Fitzpatrick phototyping scale categorises subjects as phototype I-VI based on their skin's complexion and propensity to tanning and burning in response to UV radiation (Fitzpatrick (1975) Soleil et peau, J Med Esthet. (2):33-34; Eilers et al (2013) Accuracy of Self-report in Assessing Fitzpatrick Skin Phototypes I through VI, JAMA Dermatol, 149(11):1289-1294). This is a semi-quantitative and highly subjective way of predicting an individual's skin's sensitivity to UV and is generally only used when a very fast assessment is required. Moreover, the sensitivity of the skin to UV radiation can vary significantly even between people with the same phenotypic skin type.
[05] The current gold-standard way of determining the sensitivity of an individual's skin to UV irradiation is measuring their minimal erythema dose (MED) (Heckman et al (2013) Minimal Erythema Dose (MED) Testing, J Vis Exp. (75): 50175). One MED corresponds to the lowest UV dose (measured in mJ/cm2) that causes erythema (redness) or oedema (swelling) of the skin 24-48 hours after UV exposure. This is typically determined by irradiating several patches of skin with different doses of UV light and assessing 24 hours later which dose was the lowest to cause erythema.
[06] This method has several disadvantages: the method requires the skin to be irradiated and burned (risking permanent skin and DNA damage); the precision of the method is low and depends on the doses tested; the method is subjective; the method is slow and requires repeated patient visits to a clinic; the method cannot be performed on skin already exposed to UV radiation.
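The conventional patch-test read-out described in [05], and its precision limit noted in [06], can be sketched as follows. The function name and the dose series are hypothetical; the sketch simply takes the lowest tested dose that produced erythema as the MED and also reports the highest lower dose that did not, since the true threshold can lie anywhere between the two.

```python
from typing import List, Optional, Tuple


def read_patch_test(readings: List[Tuple[float, bool]]) -> Tuple[Optional[float], Optional[float]]:
    """Return (MED, highest dose without erythema below it) from (dose, erythema?) pairs.

    The MED is taken as the lowest tested dose that produced erythema; the true
    threshold lies somewhere between the two returned doses.
    """
    reactive = sorted(dose for dose, erythema in readings if erythema)
    if not reactive:
        return None, None  # MED is above the highest dose tested
    med = reactive[0]
    lower = [dose for dose, erythema in readings if not erythema and dose < med]
    return med, (max(lower) if lower else None)


# Hypothetical readings at roughly 25% dose increments (mJ/cm2)
readings = [(40.0, False), (50.0, False), (63.0, True), (79.0, True), (99.0, True)]
med, bound = read_patch_test(readings)
print(med, bound)  # 63.0 50.0
```

Here the reported MED of 63 mJ/cm2 could overestimate the real threshold by up to one dose step (about 13 mJ/cm2), which illustrates why the precision of the method depends on the doses tested.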
[07] So far, there is no precise way to determine the tolerance of an individual's skin to UV radiation without irradiating the skin. It is desirable to provide a method for determining an individual's tolerance/sensitivity to UV radiation without prior irradiation.
SUMMARY OF THE INVENTION
[08] The present invention is defined in the appended claims.
[09] In accordance with a first aspect, there is provided a method for predicting the minimal erythema dose (MED) of a subject's skin comprising: a) i) determining the RNA expression levels of at least two genes selected from the genes in Table 1 in a skin sample obtained from the subject, and ii) predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least two genes; b) i) determining the methylation levels of at least two CpG sites selected from the CpG sites in Table 3 in a skin sample obtained from the subject, and ii) predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least two CpG sites; and/or c) i) determining the methylation level(s) and RNA expression level(s) of at least 2 features selected from the CpG sites in Table 3 and the genes in Table 1 in a skin sample obtained from the subject, wherein the features comprise at least one CpG site in Table 3 and at least one gene in Table 1, and ii) predicting the subject's MED based on the determined methylation level(s) and RNA expression level(s) using a machine learning model trained on data comprising known MEDs and corresponding known methylation level(s) and RNA expression level(s) of the at least 2 features.
[10] In accordance with a second aspect, there is provided a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the following steps: a) i) inputting the RNA expression levels of at least two genes selected from the genes in Table 1 in a skin sample obtained from the subject;
ii) predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least two genes; and iii) outputting the predicted MED; b) i) inputting the methylation levels of at least two CpG sites selected from the CpG sites in Table 3 in a skin sample obtained from the subject; ii) predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least two CpG sites; and iii) outputting the predicted MED; and/or c) i) inputting the methylation level(s) and RNA expression level(s) of at least 2 features selected from the CpG sites in Table 3 and the genes in Table 1 in a skin sample obtained from the subject, wherein the features comprise at least one CpG site in Table 3 and at least one gene in Table 1, ii) predicting the subject's MED based on the determined methylation level(s) and RNA expression level(s) using a machine learning model trained on data comprising known MEDs and corresponding known methylation level(s) and RNA expression level(s) of the at least 2 features, and iii) outputting the predicted MED.
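The three program steps of the second aspect (input the measured levels, predict the MED with a model trained on known MEDs, output the prediction) can be sketched as follows. This is a minimal sketch under stated assumptions: ordinary least-squares linear regression, one of the general linear models mentioned among the possible model types, stands in for the SVM used in the examples; all function names are hypothetical; and the training set is synthetic, noise-free data generated from MED = 50 + 10*x1 - 5*x2, not study data.

```python
from typing import List, Sequence


def _solve(M: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear features or too few subjects")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (aug[r][n] - sum(aug[r][c] * w[c] for c in range(r + 1, n))) / aug[r][r]
    return w


def train_linear_model(levels: List[Sequence[float]], meds: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations; w[0] is the intercept."""
    A = [[1.0, *row] for row in levels]
    p = len(A[0])
    ata = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    aty = [sum(a[i] * t for a, t in zip(A, meds)) for i in range(p)]
    return _solve(ata, aty)


def predict_med_program(known_levels: List[Sequence[float]], known_meds: List[float],
                        subject_levels: Sequence[float]) -> float:
    """i) input levels, ii) predict MED, iii) output the prediction."""
    if len(subject_levels) < 2:
        raise ValueError("at least two features (genes and/or CpG sites) are required")
    w = train_linear_model(known_levels, known_meds)
    med = w[0] + sum(wi * x for wi, x in zip(w[1:], subject_levels))
    print(f"Predicted MED: {med:.1f} mJ/cm2")
    return med


# Synthetic training set: two feature levels per subject and the matching MEDs
known_levels = [[1, 1], [2, 1], [1, 3], [3, 2], [2, 4]]
known_meds = [55.0, 65.0, 45.0, 70.0, 50.0]
med = predict_med_program(known_levels, known_meds, [4, 2])  # Predicted MED: 80.0 mJ/cm2
```

Keeping training and prediction behind one function mirrors the input/predict/output structure of the claimed program; in practice the trained model would more likely be fitted once and stored rather than refitted for every subject.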
[11] In a further aspect, there is provided a method for preventing damage to a subject's skin by UV radiation, the method comprising:
A) receiving or obtaining a skin sample from a subject;
B) i) a) determining the RNA expression levels of at least two genes selected from the genes in Table 1 in a skin sample obtained from the subject; and b) predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least two genes; ii) a) determining the methylation levels of at least two CpG sites selected from the CpG sites in Table 3 in a skin sample obtained from the subject; and b) predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least two CpG sites; and/or iii) a) determining the methylation level(s) and RNA expression level(s) of at least 2 features selected from the CpG sites in Table 3 and the genes in Table 1 in a skin sample obtained from the subject, wherein the features comprise at least one CpG site in Table 3 and at least one gene in Table 1, and
b) predicting the subject's MED based on the determined methylation level(s) and RNA expression level(s) using a machine learning model trained on data comprising known MEDs and corresponding known methylation level(s) and RNA expression level(s) of the at least 2 features, and C) administering an effective amount of a UV protectant to the subject.
[12] In a further aspect, there is provided a computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the following steps: a) i) inputting the RNA expression levels of at least two genes selected from the genes in Table 1 in a skin sample obtained from the subject; ii) predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least two genes, and iii) outputting the predicted MED; b) i) inputting the methylation levels of at least two CpG sites selected from the CpG sites in Table 3 in a skin sample obtained from the subject; ii) predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least two CpG sites; and iii) outputting the predicted MED; and/or c) i) inputting the methylation level(s) and RNA expression level(s) of at least 2 features selected from the CpG sites in Table 3 and the genes in Table 1 in a skin sample obtained from the subject, wherein the features comprise at least one CpG site in Table 3 and at least one gene in Table 1, ii) predicting the subject's MED based on the determined methylation level(s) and RNA expression level(s) using a machine learning model trained on data comprising known MEDs and corresponding known methylation level(s) and RNA expression level(s) of the at least 2 features, and iii) outputting the predicted MED.
[13] Certain aspects and embodiments of the present invention may provide one or more of the following advantages:
• desired objectivity of UV sensitivity determination via MED prediction;
• desired precision of UV sensitivity determination via MED prediction;
• desired reproducibility of UV sensitivity determination via MED prediction;
• desired ability to determine UV sensitivity without harmful exposure of subjects to UV radiation;
• desired ability to determine UV sensitivity without subject being present throughout process.
[14] The details, examples, and preferences provided in relation to any particular one or more of the stated aspects of the present invention apply equally to all aspects of the present invention. Any combination of embodiments, examples, and preferences described herein in all possible variations thereof is encompassed by the present invention unless otherwise indicated herein, or otherwise clearly contradicted by context.
BRIEF DESCRIPTION OF THE DRAWINGS
[15] The invention will further be illustrated by reference to the following figures:
[16] Figure 1 shows the correlation between MED determined using the known, standardised method (x-axis) and predicted MED based on the RNA expression levels of the 20 genes in Table 2, set A (y-axis). The MAE of this model was 7.85 mJ/cm2.
[17] Figure 2 shows the correlation between MED determined using the known, standardised method (x-axis) and predicted MED (y-axis) based on the methylation levels of the 18 CpG sites in Table 4. The MAE of this model was 4.18 mJ/cm2.
[18] It is understood that the following description and references to the figures concern exemplary embodiments of the present invention and shall not be construed as limiting the scope of the claims.
DETAILED DESCRIPTION
[19] The present invention is based on the finding that the RNA expression level of particular genes and/or the methylation level of particular CpG sites can be used to determine the sensitivity of a subject's skin to UV radiation by predicting the MED of a subject's skin.
Minimal Erythema Dose (MED)
[20] As used herein, “the sensitivity of a subject's skin to UV radiation” refers to how readily a subject's skin (including DNA) is damaged by UV radiation.
[21] The sensitivity of a subject's skin to UV radiation can be represented by a minimal erythema dose (MED). The “minimal erythema dose” is the minimal dose of UV (measured in mJ/cm2) which results in perceptible erythema (redness) and/or oedema (swelling) of the skin 24 to 48 hours after exposure to UV radiation. The known, standardised method for determining MED is provided in ISO 24444:2010.
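Because the MED is a radiant exposure (dose) in mJ/cm2, the exposure time needed to deliver a given fraction of it follows from dose = irradiance × time, where 1 mW/cm2 applied for 1 s delivers 1 mJ/cm2. The sketch below computes the time for a 0.9 MED exposure such as the one used in the examples. It assumes a constant, already calibrated source irradiance (a hypothetical 0.6 mW/cm2) and a hypothetical MED of 120 mJ/cm2, and it ignores the spectral weighting that real solar-simulator dosimetry involves.

```python
def exposure_time_s(dose_mj_per_cm2: float, irradiance_mw_per_cm2: float) -> float:
    """Time needed to deliver a dose: 1 mW/cm2 for 1 s delivers 1 mJ/cm2."""
    if irradiance_mw_per_cm2 <= 0:
        raise ValueError("irradiance must be positive")
    return dose_mj_per_cm2 / irradiance_mw_per_cm2


med = 120.0          # hypothetical MED of one subject, mJ/cm2
dose = 0.9 * med     # a sub-erythemal 0.9 MED dose, about 108 mJ/cm2
seconds = exposure_time_s(dose, 0.6)  # hypothetical source irradiance of 0.6 mW/cm2
print(f"{dose:.0f} mJ/cm2 takes {seconds:.0f} s")  # 108 mJ/cm2 takes 180 s
```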
[22] As used herein, “ultraviolet radiation” (“UV radiation”) refers to electromagnetic radiation with a wavelength of from about 100 nm to about 400 nm, including UVC radiation (from about 100 to about 280nm), UVB radiation (from about 280 to about 315nm), and UVA radiation (from about 315nm to about 400nm).
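The band limits above can be expressed as a small lookup, for example to label the wavelengths emitted by a UV source. This is a minimal sketch: the function name and example wavelengths are hypothetical, and because the definition gives only approximate limits, treating each lower limit as belonging to the band that starts there is an assumption.

```python
def uv_band(wavelength_nm: float) -> str:
    """Classify a wavelength using the approximate band limits defined above."""
    # Assumption: each band includes its lower limit; the text only gives 'about' values.
    if 100 <= wavelength_nm < 280:
        return "UVC"
    if 280 <= wavelength_nm < 315:
        return "UVB"
    if 315 <= wavelength_nm <= 400:
        return "UVA"
    return "outside the UV range defined here"


for nm in (254, 311, 365, 450):
    print(nm, uv_band(nm))
```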
Subjects and sampling
[23] As used herein, “a subject” refers to a human subject. In certain embodiments, the subject may be at least 16, 18, or 30 years old.
[24] In certain embodiments, the subject may be phototype I-VI on the Fitzpatrick scale (I meaning always burns, never tans; II meaning burns easily, then develops a light tan; III meaning burns moderately, then develops a light tan; IV meaning burns minimally to rarely, then develops a moderate tan; V meaning never burns, always develops a dark tan; VI meaning never burns, no noticeable change in appearance). In certain embodiments, the subject may be phototype I-IV on the Fitzpatrick scale.
[25] In the present invention, it is not necessary for the subject's skin to have been exposed to UV radiation before sampling. Nevertheless, the subject's skin may have previously been exposed to UV radiation and may be tanned or burnt at the time of sampling.
[26] Ideally, the subject had/has not taken anti-histamine or anti-inflammatory drugs within two weeks prior to skin sampling.
[27] As used herein, “a skin sample” refers to a sample comprising skin cells.
[28] In certain embodiments, the skin cells may have been/may be obtained by harvesting the entire skin sample required from the individual. Harvesting a sample from the individual may be carried out using suction blistering, punch biopsy, shave biopsy or during any surgical procedure such as plastic surgery, lifting, grafting, or the like. In certain embodiments, the sample may have been/may be obtained by suction blistering.
[29] In certain embodiments, the skin cells may have been/may be obtained by culturing the skin cells using an in vitro method. Skin cells may have been/may be cultured from a small sample of skin cells harvested from an individual. The harvested human skin cells may have been/may be grown in vitro in a vessel such as a petri dish in a medium or substrate that supplies essential nutrients.
[30] The skin samples may have been/may be obtained from the epidermis or dermis. Hence, the skin sample may comprise, consist, or consist essentially of epidermal cells and/or dermal cells. In certain embodiments, the skin sample may comprise, consist, or consist essentially of epidermal cells. The skin cells may comprise a mixture of harvested cells and cultured cells.
RNA expression level of genes
[31] The RNA expression level of particular genes can be determined, for example, by RNA-Seq (e.g. using Illumina's® TruSeq RNA Library Prep Kit and HiSeq system), RT-qPCR, SAGE, EST sequencing, or hybridisation-based methods such as microarrays.
[32] The RNA expression level of a particular gene can be measured in transcripts per million (TPM). A value of x TPM means that for every 1 million RNA molecules in the sample, x came from the gene of interest.
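TPM values are normally computed by first dividing each gene's read count by its transcript length and then rescaling so that the values for the sample sum to one million. The sketch below assumes raw read counts and effective transcript lengths are already available from an RNA-Seq pipeline; the function name, gene labels, and numbers are illustrative only.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM).

    counts: gene -> number of sequencing reads assigned to the gene.
    lengths_kb: gene -> effective transcript length in kilobases.
    """
    # Reads per kilobase: longer transcripts yield more reads per molecule,
    # so divide by length before comparing genes.
    rpk = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    # Rescale so that the values for the whole sample sum to one million.
    per_million = sum(rpk.values()) / 1_000_000
    return {gene: value / per_million for gene, value in rpk.items()}


# Toy numbers for three hypothetical genes.
levels = tpm({"GENE_A": 100, "GENE_B": 300, "GENE_C": 600},
             {"GENE_A": 1.0, "GENE_B": 3.0, "GENE_C": 2.0})
print({g: round(v) for g, v in levels.items()})
# {'GENE_A': 200000, 'GENE_B': 200000, 'GENE_C': 600000}
```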
[33] The inventors of the present invention identified 200 genes whose RNA expression levels individually exhibit a strong linear correlation with MED (see Table 1). The inventors of the present invention also identified sets of 18-20 genes whose RNA expression levels can be used to predict MED with high accuracy (see Table 2).
[34] The inventors surprisingly and advantageously found that the RNA expression level of as few as 2 of the genes in Table 1 could be used to accurately predict MED. Mean absolute errors (MAEs) of less than about 25 mJ/cm2 were consistently achieved when using from 2 to 200 genes selected from Table 1 to predict MED.
[35] Accordingly, in certain embodiments, the methods of the present invention may comprise determining the RNA expression level of at least 2 genes selected from Table 1 or 2. In certain embodiments, the methods of the present invention may comprise a step of determining the RNA expression level of from 2 to 200, 2 to 150, 2 to 100, 2 to 50, or 2 to 20 genes selected from Table 1 or 2. In certain embodiments, the genes may comprise or consist of: ENSG00000197978 and ENSG00000277060; ENSG00000197978 and ENSG00000100376; ENSG00000197978 and ENSG00000172799; ENSG00000197978 and ENSG00000166670; or ENSG00000172799 and ENSG00000159247. Such methods using at least 2 genes may achieve an absolute error of about 25 mJ/cm2 or less, about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, or about 5 mJ/cm2 or less.
[36] In certain embodiments, the methods of the present invention may comprise determining the RNA expression level of at least 5 genes selected from Table 1 or 2. In certain embodiments, the methods of the present invention may comprise a step of determining the RNA expression level of from 5 to 200, 5 to 150, 5 to 100, 5 to 50, or 5 to 20 genes selected from Table 1 or 2. In certain embodiments, the genes may comprise or consist of: ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, and ENSG00000172799; ENSG00000197978, ENSG00000159247, ENSG00000100376, ENSG00000166670, and ENSG00000172799; ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, and ENSG00000159247; ENSG00000197978, ENSG00000100376, ENSG00000166670, ENSG00000106392, and ENSG00000159247; or ENSG00000197978, ENSG00000100376, ENSG00000172799, ENSG00000159247, and ENSG00000260075. Such methods using at least 5 genes may achieve an absolute error of about 25 mJ/cm2 or less, about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 13 mJ/cm2 or less, about 10 mJ/cm2 or less, or about 5 mJ/cm2 or less.
[37] In certain embodiments, the methods of the present invention may comprise determining the RNA expression level of at least 10 genes selected from Table 1 or 2. In certain embodiments, the methods of the present invention may comprise a step of determining the RNA expression level of from 10 to 200, 10 to 150, 10 to 100, 10 to 50, or 10 to 20 genes selected from Table 1 or 2. In certain embodiments, the genes may comprise or consist of: ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, ENSG00000106392, ENSG00000112139, ENSG00000130779, ENSG00000159247, and ENSG00000260075; ENSG00000197978, ENSG00000277060, ENSG00000166670, ENSG00000172799, ENSG00000106392, ENSG00000159247, ENSG00000100376, ENSG00000260075, ENSG00000188818, and ENSG00000169282; ENSG00000166670, ENSG00000172799, ENSG00000106392, ENSG00000100376, ENSG00000260075, ENSG00000188818, ENSG00000115602, ENSG00000112139, ENSG00000224472, and ENSG00000233913; or ENSG00000277060, ENSG00000166670, ENSG00000106392, ENSG00000159247, ENSG00000100376, ENSG00000260075, ENSG00000169282, ENSG00000126890, ENSG00000176933, and ENSG00000134184. Such methods using at least 10 genes may achieve an absolute error of about 25 mJ/cm2 or less, about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, or about 5 mJ/cm2 or less.
[38] In certain embodiments, the methods of the present invention may comprise determining the RNA expression level of at least 18 genes selected from Table 1 or 2. In certain embodiments, the methods of the present invention may comprise a step of determining the RNA expression level of from 18 to 200, 18 to 150, 18 to 100, or 18 to 50 genes selected from Table 1 or 2. In certain embodiments, the genes may comprise or consist of 18 genes selected from set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 2. In one example, the genes may comprise or consist of 18 genes selected from set A in Table 2. Such methods using at least 18 genes may achieve an absolute error of about 25 mJ/cm2 or less, about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, about 8 mJ/cm2 or less, or about 5 mJ/cm2 or less.
[39] In certain embodiments, the methods of the present invention may comprise determining the RNA expression level of at least 20 genes selected from Table 1 or 2. In certain embodiments, the methods of the present invention may comprise a step of determining the RNA expression level of from 20 to 200, 20 to 150, 20 to 100, or 20 to 50 genes selected from Table 1 or 2. In certain embodiments, the genes may comprise or consist of the genes provided in set A, D, G, H, N, O, P, or R in Table 2. In one example, the genes may comprise or consist of the genes provided in set A in Table 2. Such methods using at least 20 genes may achieve an absolute error of about 25 mJ/cm2 or less, about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, about 8 mJ/cm2 or less, or about 5 mJ/cm2 or less.
Table 1 - 200 predictive genes:
Methylation level of CpG sites
[40] As used herein, a “CpG site” (also referred to as a CpG dinucleotide) is a cytosine nucleotide immediately followed by a guanine nucleotide in the 5' to 3' direction within a DNA molecule. CpG sites may be in coding or non-coding regions of the genome. CpG sites may be in CpG islands, which are regions having a high density of CpG sites.
[41] The cytosine in a CpG site can be methylated by DNA methyltransferases to become 5-methylcytosine. It is known that methylation of CpG sites within a gene can influence the transcriptional regulation and thus expression of the gene (epigenetic regulation).
[42] The methylation level of particular CpG sites can be determined, for example, by methylation specific PCR, sequence analysis of bisulfite treated DNA, CHIP-sequencing (Illumina Methylation BeadChip Technology), molecular inversion probe assay, Methyl-CAP-sequencing, Next-Generation-sequencing, COBRA-Assay, methylation specific restriction patterns, or MassARRAY assay.
[43] The methylation level of a particular CpG site can be represented by its M-value, which is the log2 ratio of the intensity of the methylated probe to the intensity of the unmethylated probe. Hence, positive M-values mean that more molecules are methylated than unmethylated, while negative M-values mean the opposite.
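The M-value can be sketched in Python as follows, together with its conversion to the Beta-value (the fraction of molecules methylated). The small additive offset is an assumption commonly used to avoid taking the logarithm of zero; the function names and intensity values are hypothetical.

```python
import math


def m_value(methylated, unmethylated, offset=1.0):
    """M-value of one CpG site from probe intensities.

    offset guards against log(0) or division by zero for a blank probe; its
    value here is an assumption, not taken from the source.
    """
    return math.log2((methylated + offset) / (unmethylated + offset))


def beta_value(m):
    """Fraction of molecules methylated implied by an M-value (offset ignored)."""
    return 2 ** m / (2 ** m + 1)


print(m_value(3999, 999))   # 2.0  -> four methylated per unmethylated
print(m_value(999, 999))    # 0.0  -> equal amounts
print(m_value(499, 1999))   # -2.0 -> mostly unmethylated
print(beta_value(2.0))      # 0.8
```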
[44] The inventors of the present invention identified 200 specific CpG sites whose methylation levels individually exhibit a strong linear correlation with MED (see Table 3). The inventors of the present invention also identified sets of 18-20 CpG sites whose methylation levels can be used to predict MED with high accuracy (see Table 4).
[45] The inventors surprisingly and advantageously found that the methylation level of as few as 2 of the CpG sites in Table 3 could be used to predict MED accurately. MAEs of less than 21 mJ/cm2 were consistently achieved when MED was predicted using from 2 to 200 CpG sites selected from Table 3.
[46] Accordingly, in certain embodiments, the methods of the present invention may comprise determining the methylation level of at least 2 CpG sites selected from Table 3 or 4. In certain embodiments, the methods of the present invention may comprise a step of determining the methylation level of from 2 to 200, 2 to 150, 2 to 100, 2 to 50, or 2 to 18 CpG sites selected from Table 3 or 4. In certain embodiments, the CpG sites may comprise or consist of: cg06376130 and cg03953789; cg18269134 and cg00613587; cg06376130 and cg01199135; cg03953789 and cg00492074; or cg06376130 and cg18269134 (described further in Table 3). Such methods using at least 2 CpG sites may achieve an absolute error of about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, or about 5 mJ/cm2 or less.
[47] In certain embodiments, the methods of the present invention comprise determining the methylation level of at least 5 CpG sites selected from Table 3 or 4. In certain embodiments, the methods of the present invention comprise a step of determining the methylation level of from 5 to 200, 5 to 150, 5 to 100, 5 to 50, or 5 to 18 CpG sites selected from Table 3 or 4. In certain embodiments, the CpG sites may comprise or consist of: cg06376130, cg03953789, cg18269134, cg00613587, and cg01199135; cg06376130, cg18269134, cg00613587, cg01199135, and cg00492074; cg03953789, cg00613587, cg01199135, cg00492074, and cg20271602; cg03953789, cg18269134, cg01199135, cg20707157, and cg22235661; or cg06376130, cg03953789, cg01199135, cg00492074, and cg10094916 (described further in Table 3). Such methods using at least 5 CpG sites may achieve an absolute error of about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 12 mJ/cm2 or less, about 10 mJ/cm2 or less, or about 5 mJ/cm2 or less.
[48] In certain embodiments, the methods of the present invention may comprise determining the methylation level of at least 10 CpG sites selected from Table 3 or 4. In certain embodiments, the methods of the present invention may comprise a step of determining the methylation level of from 10 to 200, 10 to 150, 10 to 100, 10 to 50, or 10 to 18 CpG sites selected from Table 3 or 4. In certain embodiments, the CpG sites may comprise or consist of: cg06376130, cg03953789, cg00613587, cg20707157, cg09218398, cg26096304, cg09174638, cg00492074, cg20271602, and cg22235661; cg03953789, cg18269134, cg00613587, cg01199135, cg20707157, cg00492074, cg20271602, cg22235661, cg24688871, and cg10094916; cg06376130, cg03953789, cg01199135, cg20707157, cg09218398, cg00492074, cg20271602, cg22235661, cg17972013, and cg15224600; or cg18269134, cg00613587, cg01199135, cg20707157, cg00492074, cg20271602, cg22235661, cg24688871, and cg10094916 (described further in Table 3). Such methods using at least 10 CpG sites may achieve an absolute error of about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, about 7 mJ/cm2 or less, or about 5 mJ/cm2 or less.
[49] In certain embodiments, the methods of the present invention may comprise determining the methylation level of at least 18 CpG sites selected from Table 3 or 4. In certain embodiments, the methods of the present invention may comprise a step of determining the methylation level of from 18 to 200, 18 to 150, 18 to 100, or 18 to 50 CpG sites selected from Table 3 or 4. In certain embodiments, the CpG sites may be selected from the CpG sites provided in set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 4. In one example, the CpG sites may comprise or consist of the 18 CpG sites provided in set A in Table 4. Such methods using at least 18 CpG sites may achieve an absolute error of about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, or about 5 mJ/cm2 or less.
[50] In certain embodiments, the methods of the present invention may comprise determining the methylation level of at least 20 CpG sites selected from Table 3 or 4. In certain embodiments, the methods of the present invention may comprise a step of determining the methylation level of from 20 to 200, 20 to 150, 20 to 100, or 20 to 50 CpG sites selected from Table 3 or 4. In certain embodiments, the CpG sites may comprise or consist of the CpG sites provided in set B, C, D, E, G, H, I, J, K, N, O, P, or R in Table 4. Such methods using at least 20 CpG sites may achieve an absolute error of about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, or about 5 mJ/cm2 or less.
Table 3 - 200 predictive CpG sites:
Table 4 -sets of predictive CpG sites:
RNA expression level of genes and methylation level of CpG sites
[51] The inventors of the present invention also found that combinations of RNA expression levels of genes and methylation levels of CpG sites can be used to predict MED.
[52] As used herein, “feature” refers to a gene or CpG site. Accordingly, “features” refers to a plurality of genes and/or CpG sites.
[53] The inventors identified 200 features (including genes and CpG sites) whose RNA expression/methylation levels (as appropriate) individually exhibit a strong linear correlation with MED. The inventors also identified sets of 18-20 features (including genes and CpG sites) whose RNA expression/methylation levels can be used to predict MED with high accuracy.
[54] The inventors surprisingly and advantageously found that the RNA expression level of 1 gene in Table 1 and the methylation level of 1 CpG site in Table 3 could be used to predict MED accurately. MAEs of less than about 25 mJ/cm2 were consistently achieved when MED was predicted using from 2 to 200 features selected from Tables 1 and 3, wherein the features comprise at least one gene and at least one CpG site.
[55] Accordingly, in certain embodiments, the methods of the present invention may comprise determining the RNA expression level of at least one gene selected from Table 1 or 2 and the methylation level of at least one CpG site selected from Table 3 or 4. In other words, the methods of the present invention may comprise determining the RNA expression or methylation levels (as appropriate) of at least 2 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site. In certain embodiments, the methods of the present invention may comprise a step of determining the RNA expression or methylation levels (as appropriate) of from 2 to 200, 2 to 150, 2 to 100, 2 to 50, 2 to 20, or 2 to 18 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site. In certain embodiments, the features may comprise or consist of: ENSG00000197978 and cg06376130; ENSG00000172799 and cg00613587; ENSG00000197978 and cg03953789; ENSG00000277060 and cg06376130; or ENSG00000166670 and cg03953789. Such methods using 2 features may achieve an absolute error of about 25 mJ/cm2 or less, about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, or about 5 mJ/cm2 or less.
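When genes and CpG sites are combined, each subject's measurements have to be assembled into a single ordered feature vector in which genes contribute RNA expression levels and CpG sites contribute methylation levels. A minimal sketch, assuming TPM values for genes and M-values for CpG sites, that enforces the requirement of at least one gene and at least one CpG site; the function name, the prefix-based naming rule, and the measured values are hypothetical.

```python
def combined_feature_vector(feature_ids, rna_tpm, methylation_m):
    """Ordered feature vector mixing genes (TPM) and CpG sites (M-values).

    Gene features are recognised by an Ensembl "ENSG" prefix and CpG sites by
    a "cg" prefix; this naming rule is an assumption for the sketch.
    """
    is_gene = [f.startswith("ENSG") for f in feature_ids]
    is_cpg = [f.startswith("cg") for f in feature_ids]
    if not all(g or c for g, c in zip(is_gene, is_cpg)):
        raise ValueError("unrecognised feature identifier")
    if not any(is_gene) or not any(is_cpg):
        raise ValueError("need at least one gene and at least one CpG site")
    # Genes contribute RNA expression levels; CpG sites contribute M-values.
    return [rna_tpm[f] if gene else methylation_m[f]
            for f, gene in zip(feature_ids, is_gene)]


# Hypothetical measurements for one subject, using a pair named in [55].
print(combined_feature_vector(["ENSG00000197978", "cg06376130"],
                              {"ENSG00000197978": 12.5},
                              {"cg06376130": -1.3}))
# [12.5, -1.3]
```

Because TPM values and M-values lie on very different scales, standardising each feature (for example to z-scores across the training subjects) before fitting a model is a common choice, although the source does not specify one.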
[56] In certain embodiments, methods of the present invention may comprise determining the RNA expression or methylation levels (as appropriate) of at least 5 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site. In certain embodiments, the methods of the present invention may comprise a step of determining the RNA expression or methylation levels (as appropriate) of from 5 to 200, 5 to 150, 5 to 100, 5 to 50, 5 to 20, or 5 to 18 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site. In certain embodiments, the features may comprise or consist of: ENSG00000197978, ENSG00000277060, cg06376130, cg18269134, and cg00613587; ENSG00000197978, cg06376130, cg00613587, cg03953789, and cg00492074; ENSG00000277060, cg03953789, cg18269134, cg01199135, and cg00492074; ENSG00000159247, ENSG00000100376, ENSG00000172799, cg06376130, and cg18269134; or ENSG00000100376, cg06376130, cg18269134, cg00613587, and cg01199135. Such methods using 5 features may achieve an absolute error of about 25 mJ/cm2 or less, about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, or about 5 mJ/cm2 or less.
[57] In certain embodiments, methods of the present invention may comprise determining the RNA expression or methylation levels (as appropriate) of at least 10 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site. In certain embodiments, the methods of the present invention may comprise a step of determining the RNA expression or methylation levels (as appropriate) of from 10 to 200, 10 to 150, 10 to 100, 10 to 50, 10 to 20, or 10 to 18 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site. In certain embodiments, the features may comprise or consist of: ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, cg03953789, cg18269134, cg00613587, cg09218398, and cg10094916; ENSG00000197978, ENSG00000166670, cg06376130, cg00613587, cg26096304, cg20707157, cg20271602, cg22235661, cg24688871, and cg09174638; ENSG00000277060, cg06376130, cg03953789, cg18269134, cg01199135, cg20707157, cg09218398, cg00492074, cg20271602, and cg22235661; or ENSG00000197978, ENSG00000277060, ENSG00000172799, ENSG00000159247, cg03953789, cg00613587, cg01199135, cg26096304, cg00492074, and cg10094916. Such methods using 10 features may achieve an absolute error of about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, about 5 mJ/cm2 or less, or about 3 mJ/cm2 or less.
[58] In certain embodiments, methods of the present invention may comprise determining the RNA expression or methylation levels (as appropriate) of at least 18 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site. In certain embodiments, the methods of the present invention may comprise a step of determining the RNA expression or methylation levels (as appropriate) of from 18 to 200, 18 to 150, 18 to 100, or 18 to 50 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site. In certain embodiments, the features may comprise or consist of the features provided in sets A, D, E, F, H, or I of Table 5. Such methods using 18 features may achieve an absolute error of about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, about 5 mJ/cm2 or less, or about 3 mJ/cm2 or less.
[59] In certain embodiments, methods of the present invention may comprise determining the RNA expression or methylation levels (as appropriate) of at least 20 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site. In certain embodiments, the methods of the present invention may comprise a step of determining the RNA expression or methylation levels (as appropriate) of from 20 to 200, 20 to 150, 20 to 100, or 20 to 50 features selected from Tables 1-4, wherein the features comprise at least one gene and at least one CpG site. In certain embodiments, the features may comprise or consist of the features provided in sets B, C, G, or J of Table 5. Such methods using 20 features may achieve an absolute error of about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, about 5 mJ/cm2 or less, or about 3 mJ/cm2 or less.
Determining UV sensitivity by predicting MED
[60] Once the RNA expression level of the genes (i.e. the at least 2, 5, 10, 18, or 20 genes selected from Table 1, Table 2, or set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 2; or the 2 to 200, 2 to 150, 2 to 100, 2 to 50, 2 to 20, 5 to 200, 5 to 150, 5 to 100, 5 to 50, 5 to 20, 10 to 200, 10 to 150, 10 to 100, 10 to 50, 10 to 20, 18 to 200, 18 to 150, 18 to 100, 18 to 50, 18 to 20, 20 to 200, 20 to 150, 20 to 100, or 20 to 50 genes selected from Table 1) has been determined, the method may further comprise a step of predicting the subject's MED based on the determined RNA expression levels using a machine learning model, wherein the machine learning model has been trained on data comprising known MEDs and corresponding known RNA expression levels of the same genes.
[61] In certain embodiments, the methods of the present invention may comprise a step of training a machine learning model on data comprising known MEDs and corresponding known RNA expression levels of the same genes.
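Training and prediction as described in [60] and [61] can be illustrated with the simplest model listed in [70], an ordinary least-squares linear model, used here in place of the SVM of [68] so that the sketch needs only the Python standard library. The training data below are synthetic, generated from a known linear rule so that the result can be checked; they are not data from the source, and the function names are hypothetical.

```python
def fit_linear(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    X: one row of feature levels per training subject; y: their known MEDs.
    Returns [intercept, coef_1, ..., coef_p].
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]   # add intercept column
    p = len(rows[0])
    # Augmented normal equations [A^T A | A^T y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("singular system: add subjects or drop features")
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict_med(coef, x):
    """Predicted MED (mJ/cm2) for one new subject's feature levels."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


# Toy training data: two hypothetical gene levels per subject and known MEDs
# generated as MED = 10 + 2*g1 - 3*g2, so the fit should recover 10, 2, -3.
X_train = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y_train = [6, 11, 4, 9, 5]
coef = fit_linear(X_train, y_train)
print([round(c, 6) for c in coef])            # [10.0, 2.0, -3.0]
print(round(predict_med(coef, (6, 2)), 6))    # 16.0
```

In practice a library implementation (for example an SVM regressor) would be used. With 200 features and only around 20 to 32 subjects, an unregularised least-squares fit would be underdetermined, which is one reason feature selection and regularised models are discussed in [69] and [70].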
[62] Once the methylation level of the CpG sites (i.e. the at least 2, 5, 10, 18, or 20 CpG sites selected from Table 3, Table 4, or set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 4; or the 2 to 200, 2 to 150, 2 to 100, 2 to 50, 2 to 20, 5 to 200, 5 to 150, 5 to 100, 5 to 50, 5 to 20, 10 to 200, 10 to 150, 10 to 100, 10 to 50, 10 to 20, 18 to 200, 18 to 150, 18 to 100, 18 to 50, 18 to 20, 20 to 200, 20 to 150, 20 to 100, or 20 to 50 CpG sites selected from Table 3) has been determined, the method may further comprise a step of predicting the subject's MED based on the determined methylation levels using a machine learning model, wherein the machine learning model has been trained on data comprising known MEDs and corresponding known methylation levels of the same CpG sites.
[63] In certain embodiments, the methods of the present invention may comprise a step of training a machine learning model on data comprising known MEDs and corresponding known methylation levels of the same CpG sites.
[64] Once the RNA expression and methylation levels (as appropriate) of the features (i.e. the at least 2, 5, 10, 18, or 20 features selected from Tables 1-4, or the features in set A, B, C, D, E, F, G, H, I, or J of Table 5, or the 2 to 200, 2 to 150, 2 to 100, 2 to 50, 2 to 20, 5 to 200, 5 to 150, 5 to 100, 5 to 50, 5 to 20, 10 to 200, 10 to 150, 10 to 100, 10 to 50, 10 to 20, 18 to 200, 18 to 150, 18 to 100, 18 to 50, 18 to 20, 20 to 200, 20 to 150, 20 to 100, or 20 to 50 features selected from Tables 1 and 3, wherein the features comprise at least one gene and at least one CpG site) have been determined, the method may further comprise a step of predicting the subject's MED based on the determined RNA expression and methylation levels using a machine learning model, wherein the machine learning model has been trained on data comprising known MEDs and corresponding known RNA expression and methylation levels of the same features.
[65] In certain embodiments, the methods of the present invention may comprise a step of training a machine learning model on data comprising known MEDs and corresponding known RNA expression and methylation levels of the same features.
[66] As used herein, “corresponding” RNA expression and/or methylation levels and MEDs means RNA expression and/or methylation levels and MEDs determined from the same subject. As used herein, “known” means previously determined. Hence, a “known MED and corresponding known RNA expression level” means an MED determined on a subject's skin and an RNA expression level determined using a skin sample from the same subject. In some embodiments, the known RNA expression levels, the known methylation levels, and/or the known MEDs are derived from at least 5, 10, 15, 20, 30, or 32 subjects. Optionally, the known RNA expression levels, the known methylation levels, and/or the known MEDs are derived from at least 20 subjects. Further optionally, the known RNA expression levels, the known methylation levels, and/or the known MEDs are derived from at least 32 subjects. Known MEDs may have been determined by any known method for determining MED, for example using the standardised method provided in ISO 24444:2010.
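The pairing requirement in [66] can be sketched as a join on a subject identifier: only subjects with both a known MED and known molecular levels contribute a training example. The dictionary layout, subject IDs, function name, and the default threshold of 20 subjects (taken from the optional embodiment above) are assumptions for illustration.

```python
def paired_training_data(known_meds, known_levels, feature_ids, min_subjects=20):
    """Pair known MEDs with known levels measured on the same subjects.

    known_meds: subject ID -> MED (mJ/cm2).
    known_levels: subject ID -> {feature ID -> RNA expression or methylation level}.
    Subjects lacking either measurement are skipped, because their values
    would not be "corresponding" in the sense of [66].
    """
    subjects = sorted(set(known_meds) & set(known_levels))
    if len(subjects) < min_subjects:
        raise ValueError(f"only {len(subjects)} subjects have both an MED and levels")
    X = [[known_levels[s][f] for f in feature_ids] for s in subjects]
    y = [known_meds[s] for s in subjects]
    return subjects, X, y


# Hypothetical cohort: S00 has only an MED, S21 has only levels.
meds = {f"S{i:02d}": 20.0 + i for i in range(21)}
levels = {f"S{i:02d}": {"ENSG00000197978": float(i)} for i in range(1, 22)}
subjects, X, y = paired_training_data(meds, levels, ["ENSG00000197978"])
print(len(subjects), subjects[0], X[0], y[0])   # 20 S01 [1.0] 21.0
```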
[67] Machine learning models may be used to determine predictive feature sets.
[68] In certain embodiments, the machine learning model may be a supervised learning model, for example a support vector machine (SVM).
[69] In certain embodiments, the machine learning model may perform sequential backward selection (SBS; also referred to as sequential feature elimination), sequential forward selection (SFS), exhaustive search, random search, or search using genetic algorithms.
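Sequential backward selection starts from all candidate features and repeatedly removes the one whose removal harms predictive performance least, until the desired number remains. The sketch below takes the scoring function as a parameter (in practice, for example, a cross-validated MAE of the MED model trained on each candidate subset); the toy score and the feature names are hypothetical.

```python
def sequential_backward_selection(features, score, k):
    """Greedy sequential backward selection (SBS).

    features: candidate feature IDs (genes and/or CpG sites).
    score: callable(subset) -> float, lower is better, e.g. a cross-validated
           MAE of the MED model trained on that subset.
    k: number of features to keep.
    """
    selected = list(features)
    while len(selected) > k:
        # Evaluate every one-feature-smaller subset and drop the feature
        # whose removal yields the best (lowest) score.
        drop = min(selected,
                   key=lambda f: score([g for g in selected if g != f]))
        selected.remove(drop)
    return selected


# Toy score: missing an informative feature costs a lot, and every kept
# feature costs a little (standing in for over-fitting).
informative = {"ENSG_A": 5.0, "cg_B": 3.0}


def toy_score(subset):
    missing = sum(w for f, w in informative.items() if f not in subset)
    return missing + 0.1 * len(subset)


print(sequential_backward_selection(
    ["ENSG_A", "cg_B", "noise_1", "noise_2"], toy_score, k=2))
# ['ENSG_A', 'cg_B']
```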
[70] In certain embodiments, the machine learning model may use a regression model, for example a general linear model, Lasso or ridge regression, an elastic net, a decision tree, a random forest, gradient boosting, or a deep learning model such as a neural network.
[71] As used herein, “absolute error” refers to the absolute value of the difference between a subject's MED determined by the known, standardised method and the subject's MED predicted using a method of the invention. As used herein, “MAE” (“mean absolute error”) refers to the mean of a plurality of absolute errors.
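The two definitions in [71] translate directly into code. A minimal sketch with hypothetical function names and MED values:

```python
def absolute_errors(measured_meds, predicted_meds):
    """Per-subject |MED by the standardised method - predicted MED| (mJ/cm2)."""
    if len(measured_meds) != len(predicted_meds):
        raise ValueError("need one prediction per measured MED")
    return [abs(m - p) for m, p in zip(measured_meds, predicted_meds)]


def mean_absolute_error(measured_meds, predicted_meds):
    """MAE: the mean of the per-subject absolute errors."""
    errors = absolute_errors(measured_meds, predicted_meds)
    return sum(errors) / len(errors)


measured = [30.0, 45.0, 60.0, 80.0]    # hypothetical MEDs, mJ/cm2
predicted = [35.0, 40.0, 62.0, 81.0]
print(absolute_errors(measured, predicted))      # [5.0, 5.0, 2.0, 1.0]
print(mean_absolute_error(measured, predicted))  # 3.25
```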
[72] In certain embodiments, the absolute error of a predicted MED may be about 25 mJ/cm2 or less, about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, or about 5 mJ/cm2 or less.
Personalised UV protection and therapy
[73] Once the MED of a subject's skin has been predicted, this information may be used to provide recommendations that are personalised to the subject.
[74] In certain embodiments, the methods of the invention may comprise a step of determining the maximum dose of UV radiation the subject can be exposed to before experiencing negative effects thereof. The negative effects may include skin damage (for example tanning, burning, premature aging) or DNA damage.
[75] In certain embodiments, the methods of the invention may comprise a step of determining which UV protection substances or strategies are appropriate for the subject, for example sunscreens with particular SPFs or sunlight avoidance.
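One way to turn a predicted MED into the maximum UV exposure of [74] and a sunscreen choice of [75] is to divide it by an erythemally weighted irradiance. The sketch below rests on several assumptions not stated in the source: the MED is expressed as an erythemally weighted dose, the UV index stays constant over the exposure, one UV-index unit equals 25 mW/m2 of erythemally weighted irradiance, and an SPF multiplies the tolerated dose linearly. Because tanning and DNA damage can occur below the MED, treating the MED itself as the limit is a simplification; the function name is hypothetical.

```python
import math

# One UV-index unit corresponds to 25 mW/m2 of erythemally weighted irradiance,
# i.e. 0.0025 mJ/cm2 delivered per second.
MJ_PER_CM2_PER_SECOND_PER_UVI = 0.0025


def minutes_until_med(med_mj_cm2, uv_index, spf=1.0):
    """Hypothetical helper: minutes of exposure before the predicted MED is reached.

    Assumes the MED is an erythemally weighted dose, the UV index stays constant,
    and an SPF scales the tolerated dose linearly (an idealisation).
    """
    if uv_index <= 0:
        return math.inf
    seconds = med_mj_cm2 * spf / (uv_index * MJ_PER_CM2_PER_SECOND_PER_UVI)
    return seconds / 60


print(round(minutes_until_med(25.0, uv_index=10), 1))          # 16.7
print(round(minutes_until_med(25.0, uv_index=10, spf=30), 1))  # 500.0
```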
[76] In certain embodiments, the methods of the invention may comprise a step of determining the minimum or optimal dose of UV radiation the subject should be exposed to in order to experience positive effects thereof. The positive effects may include vitamin D synthesis or treatment of a disease or condition. The disease or condition may be selected from the list consisting of: vitamin D deficiency; eczema; acne; psoriasis; graft-versus-host disease; vitiligo; mycosis fungoides; large-plaque parapsoriasis; and cutaneous T-cell lymphoma. The treatment may be in the presence of a psoralen (i.e. PUVA therapy).
[77] In certain embodiments, the methods of the invention may comprise a step of administering an effective amount of a UV protectant to the subject. As used herein, “an effective amount” means an amount effective to prevent or reduce damage to the subject's skin by UV radiation, including DNA damage.
[78] In certain embodiments, a “UV protection substance” or “UV protectant” may be a chemical absorber (i.e. an organic chemical compound that absorbs UV light, for example a salicylate, cinnamate, or benzophenone) or a physical blocker (i.e. inorganic particulates that reflect, scatter, or absorb UV light, for example titanium dioxide or zinc oxide).
[79] Certain embodiments of the present invention may have one or more of the following effects:
• reduced risk of harm to the subject (no need for UV exposure) when determining UV sensitivity via MED;
• location-independent determination of UV sensitivity indicated by MED;
• increased objectivity in determining UV sensitivity indicated by MED;
• increased precision in determining UV sensitivity indicated by MED; and
• increased reproducibility in determining UV sensitivity indicated by MED (reduced bias from e.g. previous UV exposure).
[80] For the avoidance of doubt, the present application is directed to subject-matter described in the following numbered paragraphs:
1. A method for predicting the minimal erythema dose (MED) of a subject's skin, the method comprising: i. determining the RNA expression levels of at least 2 genes selected from the genes in Table 1 in a skin sample obtained from the subject, and ii. predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least two genes.
2. A method for preventing damage to a subject's skin by UV radiation, the method comprising: i. receiving or obtaining a skin sample from a subject; ii. determining the RNA expression levels of at least 2 genes selected from the genes in Table 1 in a skin sample obtained from the subject; iii. predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least two genes; and iv. administering an effective amount of a UV protectant to the subject.
3. The method of paragraph 1 or 2, wherein the at least 2 genes comprise at least 5 genes.
4. The method of any preceding paragraph, wherein the at least 2 genes comprise at least 10 genes.
5. The method of any preceding paragraph, wherein the at least 2 genes comprise at least 18 genes.
6. The method of any preceding paragraph, wherein the at least 2 genes comprise at least 20 genes.
7. The method of any preceding paragraph, wherein the genes are selected from the genes in Table 2.
8. The method of any of paragraphs 1-7, wherein the genes are selected from the genes in set A, D, G, H, N, O, P, or R of Table 2.
9. The method of any of paragraphs 1-8, wherein the genes are selected from the genes in set A of Table 2.
10. The method of any of paragraphs 1-5, wherein the genes are selected from the genes in set B, C, E, F, I, J, K, L, M, or Q in Table 2.
11. The method of any of paragraphs 1-7, wherein the genes comprise: i. ENSG00000197978 and ENSG00000277060; ii. ENSG00000197978 and ENSG00000100376; iii. ENSG00000197978 and ENSG00000172799; iv. ENSG00000197978 and ENSG00000166670; or v. ENSG00000172799 and ENSG00000159247.
12. The method of any of paragraphs 1-7, wherein the genes comprise: i. ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, and ENSG00000172799; ii. ENSG00000197978, ENSG00000159247, ENSG00000100376, ENSG00000166670, and ENSG00000172799; iii. ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, and ENSG00000159247; iv. ENSG00000197978, ENSG00000100376, ENSG00000166670, ENSG00000106392, and ENSG00000159247; or v. ENSG00000197978, ENSG00000100376, ENSG00000172799, ENSG00000159247, and ENSG00000260075.
13. The method of any of paragraphs 1-7, wherein the genes comprise: i. ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, ENSG00000106392, ENSG00000112139, ENSG00000130779, ENSG00000159247, and ENSG00000260075; ii. ENSG00000197978, ENSG00000277060, ENSG00000166670, ENSG00000172799, ENSG00000106392, ENSG00000159247, ENSG00000100376, ENSG00000260075, ENSG00000188818, and ENSG00000169282; iii. ENSG00000166670, ENSG00000172799, ENSG00000106392, ENSG00000100376, ENSG00000260075, ENSG00000188818, ENSG00000115602, ENSG00000112139, ENSG00000224472, and ENSG00000233913; or iv. ENSG00000277060, ENSG00000166670, ENSG00000106392, ENSG00000159247, ENSG00000100376, ENSG00000260075, ENSG00000169282, ENSG00000126890, ENSG00000176933, and ENSG00000134184.
14. The method of any of paragraphs 1-7, wherein the genes comprise the genes in set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 2.
15. The method of any of paragraphs 1-7, wherein the genes comprise the genes in set A in Table 2.
16. The method of any preceding paragraph, wherein the genes comprise up to 200, 150, 100, 50, or 20 genes.
17. The method of any preceding paragraph, wherein the sample comprises, consists of, or consists essentially of epidermis cells and/or dermis cells.
18. The method of any preceding paragraph, wherein the subject is phototype I-IV on the Fitzpatrick scale.
19. The method of any preceding paragraph, wherein the machine learning model is a support vector machine (SVM).
20. The method of any preceding paragraph, wherein the machine learning model performs sequential backward selection (SBS).
21. The method of any preceding paragraph, wherein the absolute error is about 25 mJ/cm² or less, about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
22. The method of any preceding paragraph, further comprising: i. determining the maximum dose of UV radiation the subject can be exposed to before experiencing negative effects thereof; ii. determining an appropriate UV protection substance or strategy for the subject; iii. determining the minimum dose of UV radiation the subject should be exposed to in order to experience positive effects thereof; and/or iv. determining the appropriate UV dose for treating a disease or condition susceptible to UV therapy in the subject.
23. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the following steps: i. inputting the RNA expression levels of at least two genes selected from the genes in Table 1 in a skin sample obtained from the subject; ii. predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least two genes; and iii. outputting the predicted MED.
24. The computer program of paragraph 23, wherein the at least 2 genes are at least 5 genes.
25. The computer program of any of paragraphs 23-24, wherein the at least 2 genes are at least 10 genes.
26. The computer program of any of paragraphs 23-25, wherein the at least 2 genes are at least 18 genes.
27. The computer program of any of paragraphs 23-26, wherein the at least 2 genes are at least 20 genes.
28. The computer program of any of paragraphs 23-27, wherein the genes are selected from the genes in Table 2.
29. The computer program of any of paragraphs 23-28, wherein the genes are selected from the genes in set A, D, G, H, N, O, P, or R of Table 2.
30. The computer program of any of paragraphs 23-29, wherein the genes are selected from the genes in set A of Table 2.
31. The computer program of any of paragraphs 23-26, wherein the genes are selected from the genes in set B, C, E, F, I, J, K, L, M, or Q in Table 2.
32. The computer program of any of paragraphs 23-28, wherein the genes comprise: i. ENSG00000197978 and ENSG00000277060; ii. ENSG00000197978 and ENSG00000100376; iii. ENSG00000197978 and ENSG00000172799; iv. ENSG00000197978 and ENSG00000166670; or v. ENSG00000172799 and ENSG00000159247.
33. The computer program of any of paragraphs 23-28, wherein the genes comprise: i. ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, and ENSG00000172799; ii. ENSG00000197978, ENSG00000159247, ENSG00000100376, ENSG00000166670, and ENSG00000172799; iii. ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, and ENSG00000159247; iv. ENSG00000197978, ENSG00000100376, ENSG00000166670, ENSG00000106392, and ENSG00000159247; or v. ENSG00000197978, ENSG00000100376, ENSG00000172799, ENSG00000159247, and ENSG00000260075.
34. The computer program of any of paragraphs 23-28, wherein the genes comprise: i. ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, ENSG00000106392, ENSG00000112139, ENSG00000130779, ENSG00000159247, and ENSG00000260075; ii. ENSG00000197978, ENSG00000277060, ENSG00000166670, ENSG00000172799, ENSG00000106392, ENSG00000159247, ENSG00000100376, ENSG00000260075, ENSG00000188818, and ENSG00000169282; iii. ENSG00000166670, ENSG00000172799, ENSG00000106392, ENSG00000100376, ENSG00000260075, ENSG00000188818, ENSG00000115602, ENSG00000112139, ENSG00000224472, and ENSG00000233913; or iv. ENSG00000277060, ENSG00000166670, ENSG00000106392, ENSG00000159247, ENSG00000100376, ENSG00000260075, ENSG00000169282, ENSG00000126890, ENSG00000176933, and ENSG00000134184.
35. The computer program of any of paragraphs 23-28, wherein the genes comprise the genes in set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 2.
36. The computer program of any of paragraphs 23-28, wherein the genes comprise the genes in set A in Table 2.
37. The computer program of any of paragraphs 23-36, wherein the genes comprise up to 200, 150, 100, 50, or 20 genes.
38. The computer program of any of paragraphs 23-37, wherein the sample comprised, consisted of, or consisted essentially of epidermis cells and/or dermis cells.
39. The computer program of any of paragraphs 23-38, wherein the subject is phototype I-IV on the Fitzpatrick scale.
40. The computer program of any of paragraphs 23-39, wherein the machine learning model is a support vector machine (SVM).
41. The computer program of any of paragraphs 23-40, wherein the machine learning model performs sequential backward selection (SBS).
42. The computer program of any of paragraphs 23-41, wherein the absolute error is about 25 mJ/cm² or less, about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
43. A computer-readable medium comprising the computer program of any of paragraphs 23-42.
44. A method for predicting the minimal erythema dose (MED) of a subject's skin, the method comprising: i. determining the methylation levels of at least 2 CpG sites selected from the CpG sites in Table 3 in a skin sample obtained from the subject, and ii. predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least two CpG sites.
45. A method for preventing damage to a subject's skin by UV radiation, the method comprising: i. receiving or obtaining a skin sample from a subject; ii. determining the methylation levels of at least 2 CpG sites selected from the CpG sites in Table 3 in a skin sample obtained from the subject; iii. predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least two CpG sites; and iv. administering an effective amount of a UV protectant to the subject.
46. The method of paragraph 44 or 45, wherein the at least 2 CpG sites comprise at least 5 CpG sites.
47. The method of any preceding paragraph, wherein the at least 2 CpG sites comprise at least 10 CpG sites.
48. The method of any preceding paragraph, wherein the at least 2 CpG sites comprise at least 18 CpG sites.
49. The method of any preceding paragraph, wherein the at least 2 CpG sites comprise at least 20 CpG sites.
50. The method of any preceding paragraph, wherein the CpG sites are selected from the CpG sites in Table 4.
51. The method of any of paragraphs 44-50, wherein the CpG sites are selected from the CpG sites in set B, C, D, E, G, H, I, J, K, N, O, P, or R in Table 4.
52. The method of any of paragraphs 44-48, wherein the CpG sites are selected from the CpG sites in set A, F, L, M, or Q in Table 4.
53. The method of any of paragraphs 44-48, wherein the CpG sites are selected from the CpG sites in set A of Table 4.
54. The method of any of paragraphs 44-50, wherein the CpG sites comprise: i. cg06376130 and cg03953789; ii. cg18269134 and cg00613587; iii. cg06376130 and cg01199135; iv. cg03953789 and cg00492074; or v. cg06376130 and cg18269134.
55. The method of any of paragraphs 44-50, wherein the CpG sites comprise: i. cg06376130, cg03953789, cg18269134, cg00613587, and cg01199135; ii. cg06376130, cg18269134, cg00613587, cg01199135, and cg00492074; iii. cg03953789, cg00613587, cg01199135, cg00492074, and cg20271602; iv. cg03953789, cg18269134, cg01199135, cg20707157, and cg22235661; or v. cg06376130, cg03953789, cg01199135, cg00492074, and cg10094916.
56. The method of any of paragraphs 44-50, wherein the CpG sites comprise: i. cg06376130, cg03953789, cg00613587, cg20707157, cg09218398, cg26096304, cg09174638, cg00492074, cg20271602, and cg22235661; ii. cg03953789, cg18269134, cg00613587, cg01199135, cg20707157, cg00492074, cg20271602, cg22235661, cg24688871, and cg10094916; iii. cg06376130, cg03953789, cg01199135, cg20707157, cg09218398, cg00492074, cg20271602, cg22235661, cg17972013, and cg15224600; or iv. cg18269134, cg00613587, cg01199135, cg20707157, cg00492074, cg20271602, cg22235661, cg24688871, and cg10094916.
57. The method of any of paragraphs 44-50, wherein the CpG sites comprise the CpG sites in set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 4.
58. The method of any of paragraphs 44-50, wherein the CpG sites comprise the CpG sites in set A in Table 4.
59. The method of any preceding paragraph, wherein the CpG sites comprise up to 200, 150, 100, 50, 20, or 18 CpG sites.
60. The method of any preceding paragraph, wherein the sample comprises, consists of, or consists essentially of epidermis cells and/or dermis cells.
61. The method of any preceding paragraph, wherein the subject is phototype I-IV on the Fitzpatrick scale.
62. The method of any preceding paragraph, wherein the machine learning model is a support vector machine (SVM).
63. The method of any preceding paragraph, wherein the machine learning model performs sequential backward selection (SBS).
64. The method of any preceding paragraph, wherein the absolute error is about 25 mJ/cm² or less, about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
65. The method of any preceding paragraph, further comprising: i. determining the maximum dose of UV radiation the subject can be exposed to before experiencing negative effects thereof; ii. determining an appropriate UV protection substance or strategy for the subject; iii. determining the minimum dose of UV radiation the subject should be exposed to in order to experience positive effects thereof; and/or iv. determining the appropriate UV dose for treating a disease or condition susceptible to UV therapy in the subject.
66. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the following steps: i. inputting the methylation levels of at least two CpG sites selected from the CpG sites in Table 3 in a skin sample obtained from the subject; ii. predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least two CpG sites; and iii. outputting the predicted MED.
67. The computer program of paragraph 66, wherein the at least 2 CpG sites are at least 5 CpG sites.
68. The computer program of any of paragraphs 66-67, wherein the at least 2 CpG sites are at least 10 CpG sites.
69. The computer program of any of paragraphs 66-68, wherein the at least 2 CpG sites are at least 18 CpG sites.
70. The computer program of any of paragraphs 66-69, wherein the at least 2 CpG sites are at least 20 CpG sites.
71. The computer program of any of paragraphs 66-70, wherein the CpG sites are selected from the CpG sites in Table 4.
72. The computer program of any of paragraphs 66-71, wherein the CpG sites are selected from the CpG sites in set B, C, D, E, G, H, I, J, K, N, O, P, or R of Table 4.
73. The computer program of any of paragraphs 66-69, wherein the CpG sites are selected from the CpG sites in set A, F, L, M, or Q in Table 4.
74. The computer program of any of paragraphs 66-69, wherein the CpG sites are selected from the CpG sites in set A of Table 4.
75. The computer program of any of paragraphs 66-71, wherein the CpG sites comprise: i. cg06376130 and cg03953789; ii. cg18269134 and cg00613587; iii. cg06376130 and cg01199135; iv. cg03953789 and cg00492074; or v. cg06376130 and cg18269134.
76. The computer program of any of paragraphs 66-71, wherein the CpG sites comprise: i. cg06376130, cg03953789, cg18269134, cg00613587, and cg01199135; ii. cg06376130, cg18269134, cg00613587, cg01199135, and cg00492074; iii. cg03953789, cg00613587, cg01199135, cg00492074, and cg20271602; iv. cg03953789, cg18269134, cg01199135, cg20707157, and cg22235661; or v. cg06376130, cg03953789, cg01199135, cg00492074, and cg10094916.
77. The computer program of any of paragraphs 66-71, wherein the CpG sites comprise: i. cg06376130, cg03953789, cg00613587, cg20707157, cg09218398, cg26096304, cg09174638, cg00492074, cg20271602, and cg22235661; ii. cg03953789, cg18269134, cg00613587, cg01199135, cg20707157, cg00492074, cg20271602, cg22235661, cg24688871, and cg10094916; iii. cg06376130, cg03953789, cg01199135, cg20707157, cg09218398, cg00492074, cg20271602, cg22235661, cg17972013, and cg15224600; or iv. cg18269134, cg00613587, cg01199135, cg20707157, cg00492074, cg20271602, cg22235661, cg24688871, and cg10094916.
78. The computer program of any of paragraphs 66-71, wherein the CpG sites comprise the CpG sites in set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 4.
79. The computer program of any of paragraphs 66-71, wherein the CpG sites comprise the CpG sites in set A in Table 4.
80. The computer program of any of paragraphs 66-79, wherein the CpG sites comprise up to 200, 150, 100, 50, 20, or 18 CpG sites.
81. The computer program of any of paragraphs 66-80, wherein the sample comprised, consisted of, or consisted essentially of epidermis cells and/or dermis cells.
82. The computer program of any of paragraphs 66-81, wherein the subject is phototype I-IV on the Fitzpatrick scale.
83. The computer program of any of paragraphs 66-82, wherein the machine learning model is a support vector machine (SVM).
84. The computer program of any of paragraphs 66-83, wherein the machine learning model performs sequential backward selection (SBS).
85. The computer program of any of paragraphs 66-84, wherein the absolute error is about 25 mJ/cm² or less, about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
86. A computer-readable medium comprising the computer program of any of paragraphs 66-85.
87. A method for predicting the minimal erythema dose (MED) of a subject's skin, the method comprising: i. determining the methylation level(s) and RNA expression level(s) of at least 2 features selected from the CpG sites in Table 3 and the genes in Table 1 in a skin sample obtained from the subject, wherein the features comprise at least one CpG site in Table 3 and at least one gene in Table 1, and ii. predicting the subject's MED based on the determined methylation level(s) and RNA expression level(s) using a machine learning model trained on data comprising known MEDs and corresponding known methylation level(s) and RNA expression level(s) of the at least 2 features.
88. A method for preventing damage to a subject's skin by UV radiation, the method comprising: i. receiving or obtaining a skin sample from a subject; ii. determining the methylation level(s) and RNA expression level(s) of at least 2 features selected from the CpG sites in Table 3 and the genes in Table 1 in a skin sample obtained from the subject, wherein the features comprise at least one CpG site in Table 3 and at least one gene in Table 1; iii. predicting the subject's MED based on the determined methylation level(s) and RNA expression level(s) using a machine learning model trained on data comprising known MEDs and corresponding known methylation level(s) and RNA expression level(s) of the at least 2 features; and iv. administering an effective amount of a UV protectant to the subject.
89. The method of paragraph 87 or 88, wherein the at least 2 features comprise at least 5 features.
90. The method of any of paragraphs 87-89, wherein the at least 2 features comprise at least 10 features.
91. The method of any of paragraphs 87-90, wherein the at least 2 features comprise at least 18 features.
92. The method of any of paragraphs 87-91, wherein the at least 2 features comprise at least 20 features.
93. The method of any of paragraphs 87-92, wherein the features are selected from the features in Table 5.
94. The method of any of paragraphs 87-93, wherein the features are selected from the features in set B, C, G, or J of Table 5.
95. The method of any of paragraphs 87-91, wherein the features are selected from the features in set A, D, E, F, H, I, or J in Table 5.
96. The method of any of paragraphs 87-93, wherein the features comprise: i. ENSG00000197978 and cg06376130; ii. ENSG00000172799 and cg00613587; iii. ENSG00000197978 and cg03953789; iv. ENSG00000277060 and cg06376130; or v. ENSG00000166670 and cg03953789.
97. The method of any of paragraphs 87-93, wherein the features comprise: i. ENSG00000197978, ENSG00000277060, cg06376130, cg18269134, and cg00613587; ii. ENSG00000197978, cg06376130, cg00613587, cg03953789, and cg00492074; iii. ENSG00000277060, cg03953789, cg18269134, cg01199135, and cg00492074; iv. ENSG00000159247, ENSG00000100376, ENSG00000172799, cg06376130, and cg18269134; or v. ENSG00000100376, cg06376130, cg18269134, cg00613587, and cg01199135.
98. The method of any of paragraphs 87-93, wherein the features comprise: i. ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, cg03953789, cg18269134, cg00613587, cg09218398, and cg10094916; ii. ENSG00000197978, ENSG00000166670, cg06376130, cg00613587, cg26096304, cg20707157, cg20271602, cg22235661, cg24688871, and cg09174638; iii. ENSG00000277060, cg06376130, cg03953789, cg18269134, cg01199135, cg20707157, cg09218398, cg00492074, cg20271602, and cg22235661; or iv. ENSG00000197978, ENSG00000277060, ENSG00000172799, ENSG00000159247, cg03953789, cg00613587, cg01199135, cg26096304, cg00492074, and cg10094916.
99. The method of any of paragraphs 87-93, wherein the features comprise the features in set A, B, C, D, E, F, G, H, I, or J in Table 5.
100. The method of any of paragraphs 87-93, wherein the features comprise the features in set A in Table 5.
101. The method of any of paragraphs 87-100, wherein the features comprise up to 200, 150, 100, 50, or 20 features.
102. The method of any of paragraphs 87-101, wherein the sample comprises, consists of, or consists essentially of epidermis cells and/or dermis cells.
103. The method of any of paragraphs 87-102, wherein the subject is phototype I-IV on the Fitzpatrick scale.
104. The method of any of paragraphs 87-103, wherein the machine learning model is a support vector machine (SVM).
105. The method of any of paragraphs 87-104, wherein the machine learning model performs sequential backward selection (SBS).
106. The method of any of paragraphs 87-105, wherein the absolute error is about 25 mJ/cm² or less, about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
107. The method of any of paragraphs 87-106, further comprising: i. determining the maximum dose of UV radiation the subject can be exposed to before experiencing negative effects thereof; ii. determining an appropriate UV protection substance or strategy for the subject; iii. determining the minimum dose of UV radiation the subject should be exposed to in order to experience positive effects thereof; and/or iv. determining the appropriate UV dose for treating a disease or condition susceptible to UV therapy in the subject.
108. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the following steps: i. inputting the methylation level(s) and RNA expression level(s) of at least 2 features selected from the CpG sites in Table 3 and the genes in Table 1 in a skin sample obtained from the subject, wherein the features comprise at least one CpG site in Table 3 and at least one gene in Table 1, ii. predicting the subject's MED based on the determined methylation level(s) and RNA expression level(s) using a machine learning model trained on data comprising known MEDs and corresponding known methylation level(s) and RNA expression level(s) of the at least 2 features, and iii. outputting the predicted MED.
109. The computer program of paragraph 108, wherein the at least 2 features comprise at least 5 features.
110. The computer program of paragraph 108 or 109, wherein the at least 2 features comprise at least 10 features.
111. The computer program of any of paragraphs 108-110, wherein the at least 2 features comprise at least 18 features.
112. The computer program of any of paragraphs 108-111, wherein the at least 2 features comprise at least 20 features.
113. The computer program of any of paragraphs 108-112, wherein the features are selected from the features in Table 5.
114. The computer program of any of paragraphs 108-113, wherein the features are selected from the features in set B, C, G, or J of Table 5.
115. The computer program of any of paragraphs 108-111, wherein the features are selected from the features in set A, D, E, F, H, I, or J in Table 5.
116. The computer program of any of paragraphs 108-113, wherein the features comprise: i. ENSG00000197978 and cg06376130; ii. ENSG00000172799 and cg00613587; iii. ENSG00000197978 and cg03953789; iv. ENSG00000277060 and cg06376130; or v. ENSG00000166670 and cg03953789.
117. The computer program of any of paragraphs 108-113, wherein the features comprise: i. ENSG00000197978, ENSG00000277060, cg06376130, cg18269134, and cg00613587; ii. ENSG00000197978, cg06376130, cg00613587, cg03953789, and cg00492074; iii. ENSG00000277060, cg03953789, cg18269134, cg01199135, and cg00492074; iv. ENSG00000159247, ENSG00000100376, ENSG00000172799, cg06376130, and cg18269134; or v. ENSG00000100376, cg06376130, cg18269134, cg00613587, and cg01199135.
118. The computer program of any of paragraphs 108-113, wherein the features comprise: i. ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, cg03953789, cg18269134, cg00613587, cg09218398, and cg10094916; ii. ENSG00000197978, ENSG00000166670, cg06376130, cg00613587, cg26096304, cg20707157, cg20271602, cg22235661, cg24688871, and cg09174638; iii. ENSG00000277060, cg06376130, cg03953789, cg18269134, cg01199135, cg20707157, cg09218398, cg00492074, cg20271602, and cg22235661; or iv. ENSG00000197978, ENSG00000277060, ENSG00000172799, ENSG00000159247, cg03953789, cg00613587, cg01199135, cg26096304, cg00492074, and cg10094916.
119. The computer program of any of paragraphs 108-113, wherein the features comprise the features in set A, B, C, D, E, F, G, H, I, or J in Table 5.
120. The computer program of any of paragraphs 108-113, wherein the features comprise the features in set A in Table 5.
121. The computer program of any of paragraphs 108-120, wherein the features comprise up to 200, 150, 100, 50, or 20 features.
122. The computer program of any of paragraphs 108-121, wherein the sample comprises, consists of, or consists essentially of epidermis cells and/or dermis cells.
123. The computer program of any of paragraphs 108-122, wherein the subject is phototype I-IV on the Fitzpatrick scale.
124. The computer program of any of paragraphs 108-123, wherein the machine learning model is a support vector machine (SVM).
125. The computer program of any of paragraphs 108-124, wherein the machine learning model performs sequential backward selection (SBS).
126. The computer program of any of paragraphs 108-125, wherein the absolute error is about 25 mJ/cm² or less, about 20 mJ/cm² or less, about 15 mJ/cm² or less, about 10 mJ/cm² or less, or about 5 mJ/cm² or less.
127. A computer-readable medium comprising the computer program of any of paragraphs 108-126.
[81] It should be noted that the present invention may comprise any combination of features and/or limitations referred to herein, except for combinations of such features which are mutually exclusive. The foregoing description is directed to particular embodiments of the present invention for the purpose of illustrating it. It will be apparent, however, to one skilled in the art, that many modifications and variations to the embodiments described herein are possible. All such modifications and variations are intended to be within the scope of the present invention, as defined in the appended claims.
EXAMPLES
Methods
MED determination using the known, standardised method
[82] MED was determined as previously described (Heckman et al. (2013) Minimal Erythema Dose (MED) Testing, J Vis Exp. (75): 50175) and as outlined below. This method has been standardised as International Standard ISO 24444:2010.
[83] The study sites were located on the subjects' lower backs since this area is rarely exposed to sunlight. The sites were split into control and test areas.
[84] The first irradiation of the test sites was performed using a SOL 500 full-spectrum solar simulator (Hönle UV Technology). Intensities were chosen individually so that each subject received 0.9 MED (i.e. 90% of that subject's MED).
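Because UV dose (radiant exposure, in mJ/cm²) equals irradiance (in mW/cm²) multiplied by exposure time (in s), the individual 0.9 MED setting can be illustrated with the minimal sketch below. It is not the inventors' procedure: the function name and the irradiance value are hypothetical, and the source does not state whether irradiance, exposure time, or both were adjusted for each subject.

```python
def exposure_time_s(med_mj_cm2: float, irradiance_mw_cm2: float,
                    fraction_of_med: float = 0.9) -> float:
    """Seconds of exposure needed to deliver `fraction_of_med` x MED.

    Assumes a constant irradiance; 1 mW/cm^2 delivers 1 mJ/cm^2 per second,
    so time = target dose / irradiance.
    """
    if med_mj_cm2 <= 0 or irradiance_mw_cm2 <= 0:
        raise ValueError("MED and irradiance must be positive")
    target_dose_mj_cm2 = fraction_of_med * med_mj_cm2
    return target_dose_mj_cm2 / irradiance_mw_cm2


# Hypothetical subject: MED of 100 mJ/cm^2, hypothetical irradiance of 2 mW/cm^2.
print(exposure_time_s(100.0, 2.0))  # 45.0
```

Under these assumed numbers, each 0.9 MED session would last 45 s; a subject with a higher MED would need a proportionally longer exposure or a higher irradiance.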
[85] Irradiation to 0.9 MED was repeated twice more in the same manner, each time 24 hours apart, so that all test sites had been irradiated three times.
Skin sampling
[86] 24 hours after the final irradiation session of each subject, four suction blisters of 7 mm diameter were taken from both test sites (irradiated) and control sites (not irradiated), as previously described (Sudel et al. (2003) Tight control of matrix metalloproteinase-1 activity in human skin. Photochem Photobiol. 78: 355-60).
[87] By taking samples from both test and control sites, the inventors enabled the identification of markers that allow reliable determination of UV sensitivity via MED prediction, regardless of recent UV exposure.
[88] A single sample of the epidermis from each test site was sufficient to perform transcriptome sequencing and methylation profiling, as described below.
Nucleic acid extraction
[89] Tissue samples were suspended in the respective lysis buffers for RNA or DNA extraction and homogenized using an MM 301 bead mill (Retsch®). DNA was then extracted using the QIAamp® DNA Investigator Kit (Qiagen®) according to the manufacturer's instructions. RNA was extracted using the RNeasy® Fibrous Tissue Mini Kit (Qiagen®) according to the manufacturer's instructions. The RNA and DNA samples were used for transcriptome sequencing and methylation profiling, respectively, as described below.
Transcriptome sequencing
[90] Transcriptome libraries were prepared using the TruSeq Library Prep Kit (Illumina®), and sequencing was performed with single-end 50 bp reads (1x50 bp) on Illumina's® HiSeq system to a final sequencing depth of 100 million reads per sample.
[91] Sequencing data was processed using a custom pipeline comprising FastQC v0.11.7 (ref. 67) for quality control, Trimmomatic v0.36 (ref. 68) for trimming, and Salmon v0.8.1 (ref. 69) for read mapping and quantification.
[92] This method determined the level of 38,892 transcripts.
Methylation profiling
[93] Methylation profiling was performed using Illumina's® Infinium MethylationEPIC arrays (Love et al. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, Genome Biol. 15:550).
[94] Methylation data was processed in R using the minfi package (Aryee et al. (2014) Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays, Bioinformatics 30:1363-1369). Normalization was carried out using functional normalization (funnorm).
[95] This method determined the methylation levels of 866,091 human CpG sites throughout the genome at single-nucleotide resolution.
Identifying features correlating with MED
[96] Data from test (irradiated) and control (not irradiated) samples were pooled before analysis.
[97] For each dataset (from a total of 38,892 genes and 866,091 CpG sites), the 2000 features (genes or CpG sites) most strongly correlated with MED were selected by ranking the features by the absolute value of their Pearson correlation coefficient with MED.
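The filtering step can be sketched as follows: compute each feature's Pearson correlation coefficient with MED across subjects, rank features by its absolute value, and keep the top k (2000 in the source). This is a minimal plain-Python illustration; the function names, the tie-breaking by name, and the choice to score constant features as 0 are assumptions rather than details from the source.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: constant features are scored as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def top_k_by_abs_correlation(features: Dict[str, Sequence[float]],
                             med: Sequence[float], k: int = 2000) -> List[str]:
    """Names of the k features whose |r| with MED is largest (ties by name)."""
    score = {name: abs(pearson(values, med)) for name, values in features.items()}
    return sorted(score, key=lambda name: (-score[name], name))[:k]


# Hypothetical toy data: one value per subject for each feature, plus measured MEDs.
toy = {
    "up": [1.0, 2.0, 3.0, 4.0],
    "down": [4.0, 3.0, 2.0, 1.0],
    "noise": [1.0, 3.0, 2.0, 1.0],
    "flat": [5.0, 5.0, 5.0, 5.0],
}
print(top_k_by_abs_correlation(toy, [50.0, 100.0, 150.0, 200.0], k=2))  # ['down', 'up']
```

Ranking by the absolute value keeps features that fall as MED rises ("down") as well as those that rise with it ("up"), since both carry predictive information.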
Machine learning models for predicting MED
[98] Sequential backward selection (SBS, also referred to as sequential feature elimination) was used with a support vector machine (SVM) to reduce the sets of 2000 predictive features to smaller sets, for example of 20 features.
[99] Lasso regression models (Tibshirani (1996) Regression Shrinkage and Selection via the Lasso, J R Stat Soc Ser B. 58:267-288) for the prediction of MED were built in R using the machine learning framework mlr (Bischl et al. (2016) mlr: Machine Learning in R, J Mach Learn Res. 17:1-5). Models were first built using the 2000 most strongly MED-correlated features and were then reduced to models with fewer features, for example 20 features.
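For orientation, the lasso minimizes a squared-error loss plus an L1 penalty, λ Σ|β_j|, which drives some coefficients exactly to zero and so performs feature selection. The sketch below fits that objective by cyclic coordinate descent with soft-thresholding, using the common scaling (1/2n)‖y − β₀ − Xβ‖² + λ‖β‖₁. It is an illustrative from-scratch stand-in, not the mlr learner the inventors used; the loss scaling, the absence of feature standardization, and the convergence settings are assumptions.

```python
from typing import List, Sequence, Tuple


def soft_threshold(z: float, t: float) -> float:
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X: Sequence[Sequence[float]], y: Sequence[float],
                             lam: float, max_iter: int = 1000,
                             tol: float = 1e-10) -> Tuple[float, List[float]]:
    """Minimise (1/2n)||y - b0 - X b||^2 + lam * ||b||_1; returns (b0, b)."""
    n, p = len(X), len(X[0])
    # Centre features and response so the intercept drops out of the updates.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residuals while all b_j are 0
    z = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant feature: coefficient stays at 0
            # Correlation of feature j with the partial residual (own term added back).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + z[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / z[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return intercept, beta
```

With λ = 0 this reduces to ordinary least squares; as λ grows, weakly predictive coefficients are set exactly to zero, which is how a lasso model can be cut down from many candidate features to fewer.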
[100] The models were then used to predict MED based on subjects' RNA expression levels.
[101] In the examples, the parameters of the SVM models were as follows:
• Kernel: Radial-basis function
• Gamma: 0.05
• Cost: 1
• Epsilon: 0.1
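These settings correspond to a standard ε-support vector regression. The radial-basis-function kernel is k(x, z) = exp(−γ‖x − z‖²) with γ = 0.05. The ε-insensitive loss ignores residuals smaller than ε = 0.1. The cost C = 1 weights residuals outside that tube against model complexity and, in the dual problem, bounds each coefficient. As an assumption about tooling, the same values would be passed as `kernel = "radial", gamma = 0.05, cost = 1, epsilon = 0.1` to the `svm()` function of R's e1071 package, or as `SVR(kernel="rbf", gamma=0.05, C=1.0, epsilon=0.1)` in scikit-learn. The plain-Python sketch below shows only the kernel, the loss, and dual-form prediction. Training (solving for the dual coefficients) is omitted, and any coefficients used with it are hypothetical.

```python
import math
from typing import Sequence

GAMMA = 0.05    # RBF width, paragraph [101]
EPSILON = 0.1   # half-width of the epsilon-insensitive tube, paragraph [101]


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = GAMMA) -> float:
    """Radial-basis-function kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def eps_insensitive_loss(y_true: float, y_pred: float, epsilon: float = EPSILON) -> float:
    """Zero when |y_true - y_pred| <= epsilon, growing linearly beyond that."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_predict(x: Sequence[float], support_vectors: Sequence[Sequence[float]],
                dual_coefs: Sequence[float], intercept: float,
                gamma: float = GAMMA) -> float:
    """Dual-form prediction f(x) = sum_i c_i * k(sv_i, x) + b.

    c_i = alpha_i - alpha_i* comes from a fitted model (training not shown);
    with cost C, each |c_i| is at most C (C = 1 in paragraph [101]).
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + intercept
```

A small γ such as 0.05 gives a wide kernel, so each support vector influences predictions for subjects whose feature profiles are fairly distant from it.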
[102] The parameters to be set for SBS are the maximum number of features to retain and a threshold parameter that sets the minimal improvement required for a feature to be eliminated from the model. The inventors' models did not use the first parameter; the second parameter was set to 0.01.
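A greedy SBS loop of the kind described can be sketched as follows. It starts from all candidate features. Each round tries removing every remaining feature, keeps the removal that gives the lowest error, and stops when that best removal improves the error by less than the threshold (0.01 by default here). The error function, the reading of the threshold as a minimal error reduction, and the `min_features` floor are assumptions for illustration. In practice the error would be something like the cross-validated MAE of the SVM, and mlr's own implementation may define the threshold differently. As in the source, no maximum number of retained features is imposed.

```python
from typing import Callable, FrozenSet, Iterable


def sequential_backward_selection(
        features: Iterable[str],
        error: Callable[[FrozenSet[str]], float],
        min_improvement: float = 0.01,
        min_features: int = 1) -> FrozenSet[str]:
    """Greedy SBS: repeatedly drop the feature whose removal lowers `error` most.

    Stops when the best single removal reduces the error by less than
    `min_improvement`, or when only `min_features` features remain.
    """
    current = frozenset(features)
    current_err = error(current)
    while len(current) > min_features:
        # Score every subset that has exactly one feature removed.
        candidates = [(error(current - {f}), f) for f in sorted(current)]
        best_err, best_f = min(candidates)
        if current_err - best_err < min_improvement:
            break  # no removal helps enough; keep the current set
        current = current - {best_f}
        current_err = best_err
    return current


# Hypothetical error function: "a" and "b" are informative, the rest add noise.
def toy_error(subset: FrozenSet[str]) -> float:
    informative = {"a", "b"}
    return 1.0 + 0.5 * len(subset - informative) + 2.0 * len(informative - subset)


print(sorted(sequential_backward_selection(["a", "b", "c", "d"], toy_error)))  # ['a', 'b']
```

Each round costs one error evaluation per remaining feature, so reducing 2000 features one at a time requires many model fits; this is one reason a cheap correlation filter is applied first.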
[103] Model predictions and accuracy scores were extracted from leave-one-out cross-validation (LOOCV) to avoid overfitting.
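Leave-one-out cross-validation with MAE as the accuracy score can be sketched as follows. With n subjects (32 here), the model is refitted n times, each time predicting the one subject left out, so every prediction comes from a model that never saw that subject. The `fit` callback and the mean-predicting baseline are hypothetical placeholders for the SVM or lasso models. For the held-out subject to stay truly unseen, any feature selection would also have to run inside `fit`.

```python
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]
Fitter = Callable[[List[Sequence[float]], List[float]], Predictor]


def loocv_mae(X: Sequence[Sequence[float]], y: Sequence[float],
              fit: Fitter) -> Tuple[List[float], float]:
    """Leave-one-out CV: refit without subject i, predict subject i.

    Returns the held-out predictions and their mean absolute error (MAE).
    """
    n = len(y)
    predictions: List[float] = []
    for i in range(n):
        X_train = [X[j] for j in range(n) if j != i]
        y_train = [y[j] for j in range(n) if j != i]
        model = fit(X_train, y_train)   # never sees subject i
        predictions.append(model(X[i]))
    mae = sum(abs(p - t) for p, t in zip(predictions, y)) / n
    return predictions, mae


def mean_baseline(X_train: List[Sequence[float]], y_train: List[float]) -> Predictor:
    """Hypothetical baseline that ignores features and predicts the mean MED."""
    mean_med = sum(y_train) / len(y_train)
    return lambda x: mean_med


preds, mae = loocv_mae([[0.0], [0.0], [0.0]], [1.0, 2.0, 3.0], mean_baseline)
print(preds, mae)  # [2.5, 2.0, 1.5] 1.0
```

A baseline like this one gives a useful reference point: a trained model's LOOCV MAE is only informative relative to what predicting the average MED would already achieve.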
[104] The feature selection and training included data from both test and control (irradiated and non-irradiated) samples, in order to allow accurate MED predictions irrespective of previous UV exposure of the sample.
EXAMPLE 1 - MED determination using known methods
[105] This study recruited 32 healthy female subjects of Fitzpatrick phototypes I to IV (12 subjects of phototypes I and II, 10 of phototype III, and 10 of phototype IV). The subjects were aged between 30 and 65 years, with homogeneous age distributions in each phototype group.
[106] Potential recruits were excluded if they had tattoos or scars in the area of skin to be exposed to UV radiation, if they had pigmentation disorders, if they were pregnant, or if they had taken antihistamine or anti-inflammatory medication within two weeks before the start of the study.
[107] The MED of each of the 32 subjects was determined using the known, standardised method described above.
[108] MED values ranged from approximately 50 mJ/cm² to approximately 210 mJ/cm². As expected from previously published data, stratification of donors using the Fitzpatrick classification system was an inaccurate predictor of MED. For example, the measured MED values for the subjects of phototype IV varied from 99.7 to 210.4 mJ/cm².
EXAMPLE 2 - MED prediction using RNA expression levels
Example 2a - 200 predictive genes
[109] Skin samples were taken from the 32 subjects and the RNA extracted. The transcriptome of each subject was then sequenced and the data processed, as described above.
[110] From these data, the inventors identified the 200 genes whose RNA expression levels each exhibited a strong linear correlation with the MED determined by the known, standardised method (used in Example 1). These 200 genes are provided in Table 1.
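Ranking features by the strength of their linear correlation with MED can be sketched as below. This assumes Pearson's r, ranked by absolute value so that strong negative correlations count as much as positive ones; the specification does not state the exact statistic. With `n_top=2000`, the same step would correspond to the pre-filter described in [99]. All names are illustrative.

```python
import numpy as np


def top_correlated_features(X, y, names, n_top):
    """Return the n_top (name, r) pairs with the largest |Pearson r| against y.

    X: samples x features matrix (e.g. expression or methylation levels),
    y: measured MED per sample, names: one identifier per column of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # constant features (denominator 0) get r = 0 instead of a division error
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    order = np.argsort(-np.abs(r), kind="stable")[:n_top]
    return [(names[j], float(r[j])) for j in order]
```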
Example 2b - predicting MED using 200 genes
[111] A machine learning model was trained on data comprising (i) the MEDs and (ii) the corresponding RNA expression levels of the 200 genes in Table 1, from all 32 subjects. This model was then used to predict MED based on RNA expression levels of the 200 genes in Table 1 in skin samples obtained from subjects.
[112] Using the set of 200 genes in Table 1, MAEs of 16-18 mJ/cm2 were consistently achieved.
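For reference, the mean absolute error (MAE) reported in these examples can be written as follows, assuming (per [103]) that each prediction comes from the LOOCV model trained without subject i:

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\mathrm{MED}_i - \widehat{\mathrm{MED}}_{-i}\right|, \qquad n = 32
```

Here MED_i is the MED measured by the standardised method of Example 1, and the hatted term is the prediction for subject i. An MAE of 16 mJ/cm2 thus means that predictions differed from the measured MED by 16 mJ/cm2 on average, against measured values spanning roughly 50 to 210 mJ/cm2.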
Example 2c - predicting MED using 150, 100, or 50 genes
[113] The inventors then tested how accurately MED could be predicted using models trained on smaller sets of genes selected from the 200 genes in Table 1.
[114] When sets of 150 genes selected from Table 1 were used, MAEs of approximately 13-15 mJ/cm2 were achieved.
[115] When sets of 100 genes selected from Table 1 were used, MAEs of approximately 11-13 mJ/cm2 were achieved.
[116] When sets of 50 genes selected from Table 1 were used, MAEs of approximately 9-11 mJ/cm2 were achieved.
Example 2d - predicting MED using 18-20 genes
[117] High accuracies (low MAEs) were achieved by using sets of 18-20 genes selected from Table 1.
[118] When sets of 18-20 genes selected from Table 1 were used, MAEs of approximately 8-13 mJ/cm2 were achieved.
[119] For example, the set of 20 genes in set A in Table 2 achieved a very low MAE of 7.85 mJ/cm2 (see Figure 1).
Example 2e - predicting MED using 10 genes
[120] When sets of 10 genes selected from Table 1 were used, MAEs of approximately 10-15 mJ/cm2 were achieved.
[121] For example, the set of 10 genes ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, ENSG00000106392, ENSG00000112139, ENSG00000130779, ENSG00000159247, and ENSG00000260075 achieved an MAE of 9.71 mJ/cm2.
Example 2f - predicting MED using 5 genes
[122] When sets of 5 genes selected from Table 1 were used, MAEs of approximately 13-20 mJ/cm2 were achieved.
[123] For example, the set of 5 genes ENSG00000197978 (GOLGA6L9), ENSG00000277060 (NLRP2), ENSG00000100376 (FAM118A), ENSG00000166670 (MMP10), and ENSG00000172799 achieved an MAE of 12.76 mJ/cm2.
Example 2g - predicting MED using 2 genes
[124] Surprisingly and advantageously, as few as 2 genes selected from Table 1 could be used to accurately predict MED. When sets of 2 genes were used, MAEs of approximately 20-25 mJ/cm2 were achieved.
[125] For example, the set of 2 genes ENSG00000197978 and ENSG00000277060 achieved an MAE of 19.62 mJ/cm2.
EXAMPLE 3 - MED prediction using methylation levels
Example 3a - 200 predictive CpG sites
[126] Skin samples were taken from the 32 subjects and the DNA extracted. DNA methylation profiling was performed and the data processed, as described above.
[127] From these data, the inventors identified the 200 CpG sites whose methylation levels each exhibited a strong linear correlation with the MED determined by the known, standardised method (used in Example 1). These 200 CpG sites are provided in Table 3.
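The processing referred to as "described above" is not part of this section. Assuming the methylation levels are beta values from an Illumina-type array (the cited minfi package targets such arrays), the beta value of a CpG site is the methylated signal as a fraction of the total signal, with Illumina's convention adding an offset of 100 to stabilise low-intensity probes. A minimal sketch; the offset and function name are assumptions:

```python
import numpy as np


def beta_value(methylated, unmethylated, offset=100.0):
    """Beta = M / (M + U + offset), a methylation level between 0 and 1.

    methylated / unmethylated: probe intensities for one or more CpG sites.
    offset=100 follows the Illumina convention (an assumption here).
    """
    m = np.maximum(np.asarray(methylated, dtype=float), 0.0)  # negative intensities set to 0
    u = np.maximum(np.asarray(unmethylated, dtype=float), 0.0)
    return m / (m + u + offset)
```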
Example 3b - predicting MED using 200 CpG sites
[128] A machine learning model was trained on data comprising (i) the MEDs and (ii) the corresponding methylation levels of the 200 CpG sites in Table 3, from all 32 subjects. This model was then used to predict MED based on methylation levels of the 200 CpG sites in Table 3 in skin samples obtained from subjects.
[129] Using the 200 CpG sites in Table 3, MAEs of approximately 9-11 mJ/cm2 were consistently achieved.
Example 3c - predicting MED using 150, 100, or 50 CpG sites
[130] The inventors then tested how accurately MED could be predicted using models trained on smaller sets of CpG sites selected from the 200 CpG sites in Table 3.
[131] When sets of 150 CpG sites selected from Table 3 were used, MAEs of approximately 7-8 mJ/cm2 were achieved.
[132] When sets of 100 CpG sites selected from Table 3 were used, MAEs of approximately 6-7 mJ/cm2 were achieved.
[133] When sets of 50 CpG sites selected from Table 3 were used, MAEs of approximately 4-6 mJ/cm2 were achieved.
Example 3d - predicting MED using 18-20 CpG sites
[134] High accuracies (low MAEs) were achieved by using sets of 18-20 CpG sites selected from Table 3.
[135] When sets of 18-20 CpG sites were used, MAEs of approximately 4-7 mJ/cm2 were achieved.
[136] For example, the set of 18 CpG sites in set A in Table 4 achieved a very low MAE of only 4.18 mJ/cm2 (see Figure 2).
Example 3e - predicting MED using 10 CpG sites
[137] When sets of 10 CpG sites selected from Table 3 were used, MAEs of approximately 7-12 mJ/cm2 were achieved.
[138] For example, the set of 10 CpG sites cg06376130, cg03953789, cg00613587, cg20707157, cg09218398, cg26096304, cg09174638, cg00492074, cg20271602, and cg22235661 (described further in Table 3) achieved an MAE of 6.81 mJ/cm2.
Example 3f - predicting MED using 5 CpG sites
[139] When sets of 5 CpG sites selected from Table 3 were used, MAEs of approximately 11-15 mJ/cm2 were achieved.
[140] For example, the set of 5 CpG sites cg06376130, cg03953789, cg18269134, cg00613587, and cg01199135 (described further in Table 3) achieved an MAE of 11.01 mJ/cm2.
Example 3g - predicting MED using 2 CpG sites
[141] Surprisingly and advantageously, as few as 2 CpG sites selected from Table 3 could be used to accurately predict MED. When sets of 2 CpG sites were used, MAEs of approximately 15-21 mJ/cm2 were achieved.
[142] For example, the set of 2 CpG sites cg06376130 and cg03953789 (described further in Table 3) achieved an MAE of 15.58 mJ/cm2.
Claims
1) A method for predicting the minimal erythema dose (MED) of a subject's skin, the method comprising: a) i) determining the methylation levels of at least 2 CpG sites selected from the genes in Table 3 in a skin sample obtained from the subject, and ii) predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least 2 CpG sites; b) i) determining the RNA expression levels of at least 2 genes selected from the genes in Table 1 in a skin sample obtained from the subject, and ii) predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least 2 genes; and/or c) i) determining the methylation level(s) and RNA expression level(s) of at least 2 features selected from the CpG sites in Table 3 and the genes in Table 1 in a skin sample obtained from the subject, wherein the features comprise at least one CpG site in Table 3 and at least one gene in Table 1, and ii) predicting the subject's MED based on the determined methylation level(s) and RNA expression level(s) using a machine learning model trained on data comprising known MEDs and corresponding known methylation level(s) and RNA expression level(s) of the at least 2 features.
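Claim 1 distinguishes three kinds of feature panel, always with at least 2 features: CpG sites only (a), genes only (b), or a mix containing at least one of each (c). The hypothetical helper below makes that distinction explicit, assuming features are identified by Illumina-style probe IDs (`cg...`) and Ensembl gene IDs (`ENSG...`) as in the examples. It does not check membership in Tables 1 and 3, which are not reproduced here.

```python
from typing import Iterable


def classify_panel(panel: Iterable[str]) -> str:
    """Return 'a' (CpG sites only), 'b' (genes only) or 'c' (mixed).

    Raises ValueError if the panel has fewer than two distinct features
    or contains an identifier that is neither 'cg...' nor 'ENSG...'.
    """
    features = set(panel)
    if len(features) < 2:
        raise ValueError("at least 2 features are required")
    n_cpg = sum(f.startswith("cg") for f in features)
    n_gene = sum(f.startswith("ENSG") for f in features)
    if n_cpg + n_gene != len(features):
        raise ValueError("unrecognised feature identifier")
    if n_gene == 0:
        return "a"
    if n_cpg == 0:
        return "b"
    return "c"  # at least one CpG site and at least one gene
```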
2) The method of claim 1, wherein: a) the at least 2 CpG sites comprise at least 5, 10, or 18 CpG sites; b) the at least 2 genes comprise at least 5, 10, or 18 genes; and/or c) the at least 2 features comprise at least 5, 10, or 18 features.
3) The method of claim 1 or 2, wherein: a) the CpG sites are selected from the CpG sites in Table 4; b) the genes are selected from the genes in Table 2; and/or c) the features are selected from the CpG sites in Table 4 and the genes in Table 2.
4) The method of claim 3, wherein: a) the CpG sites are selected from the CpG sites in set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 4; b) the genes are selected from the genes in set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 2; and/or c) the features are selected from the features in set A, B, C, D, E, F, G, H, I, or J in Table 5.
5) The method of claim 4, wherein: a) the CpG sites are selected from the CpG sites in set A in Table 4; b) the genes are selected from the genes in set A in Table 2; and/or c) the features are selected from the features in set A in Table 5.
6) The method of any of claims 1-5, wherein: a) the CpG sites comprise: i) cg06376130 and cg03953789; ii) cg18269134 and cg00613587; iii) cg06376130 and cg01199135; iv) cg03953789 and cg00492074; or v) cg06376130 and cg18269134; b) the genes comprise: i) ENSG00000197978 and ENSG00000277060; ii) ENSG00000197978 and ENSG00000100376; iii) ENSG00000197978 and ENSG00000172799; iv) ENSG00000197978 and ENSG00000166670; or v) ENSG00000172799 and ENSG00000159247; and/or c) the features comprise: i) ENSG00000197978 and cg06376130; ii) ENSG00000172799 and cg00613587; iii) ENSG00000197978 and cg03953789; iv) ENSG00000277060 and cg06376130; or v) ENSG00000166670 and cg03953789.
7) The method of any of claims 1-5, wherein: a) the CpG sites comprise: i) cg06376130, cg03953789, cg18269134, cg00613587, and cg01199135; ii) cg06376130, cg18269134, cg00613587, cg01199135, and cg00492074; iii) cg03953789, cg00613587, cg01199135, cg00492074, and cg20271602; iv) cg03953789, cg18269134, cg01199135, cg20707157, and cg22235661; or v) cg06376130, cg03953789, cg01199135, cg00492074, and cg10094916; b) the genes comprise: i) ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, and ENSG00000172799; ii) ENSG00000197978, ENSG00000159247, ENSG00000100376, ENSG00000166670, and ENSG00000172799; iii) ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, and ENSG00000159247; iv) ENSG00000197978, ENSG00000100376, ENSG00000166670, ENSG00000106392, and ENSG00000159247; or v) ENSG00000197978, ENSG00000100376, ENSG00000172799, ENSG00000159247, and ENSG00000260075; and/or c) the features comprise: i) ENSG00000197978, ENSG00000277060, cg06376130, cg18269134, and cg00613587; ii) ENSG00000197978, cg06376130, cg00613587, cg03953789, and cg00492074; iii) ENSG00000277060, cg03953789, cg18269134, cg01199135, and cg00492074; iv) ENSG00000159247, ENSG00000100376, ENSG00000172799, cg06376130, and cg18269134; or v) ENSG00000100376, cg06376130, cg18269134, cg00613587, and cg01199135.
8) The method of any of claims 1-5, wherein: a) the CpG sites comprise: i) cg06376130, cg03953789, cg00613587, cg20707157, cg09218398, cg26096304, cg09174638, cg00492074, cg20271602, and cg22235661; ii) cg03953789, cg18269134, cg00613587, cg01199135, cg20707157, cg00492074, cg20271602, cg22235661, cg24688871, and cg10094916; iii) cg06376130, cg03953789, cg01199135, cg20707157, cg09218398, cg00492074, cg20271602, cg22235661, cg17972013, and cg15224600; or iv) cg18269134, cg00613587, cg01199135, cg20707157, cg00492074, cg20271602, cg22235661, cg24688871, and cg10094916; b) the genes comprise: i) ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, ENSG00000106392, ENSG00000112139, ENSG00000130779, ENSG00000159247, and ENSG00000260075; ii) ENSG00000197978, ENSG00000277060, ENSG00000166670, ENSG00000172799, ENSG00000106392, ENSG00000159247, ENSG00000100376, ENSG00000260075, ENSG00000188818, and ENSG00000169282; iii) ENSG00000166670, ENSG00000172799, ENSG00000106392, ENSG00000100376, ENSG00000260075, ENSG00000188818, ENSG00000115602, ENSG00000112139, ENSG00000224472, and ENSG00000233913; or iv) ENSG00000277060, ENSG00000166670, ENSG00000106392, ENSG00000159247, ENSG00000100376, ENSG00000260075, ENSG00000169282, ENSG00000126890, ENSG00000176933, and ENSG00000134184; and/or c) the features comprise: i) ENSG00000197978, ENSG00000277060, ENSG00000100376, ENSG00000166670, ENSG00000172799, cg03953789, cg18269134, cg00613587, cg09218398, and cg10094916; ii) ENSG00000197978, ENSG00000166670, cg06376130, cg00613587, cg26096304, cg20707157, cg20271602, cg22235661, cg24688871, and cg09174638; iii) ENSG00000277060, cg06376130, cg03953789, cg18269134, cg01199135, cg20707157, cg09218398, cg00492074, cg20271602, and cg22235661; or iv) ENSG00000197978, ENSG00000277060, ENSG00000172799, ENSG00000159247, cg03953789, cg00613587, cg01199135, cg26096304, cg00492074, and cg10094916.
9) The method of any of claims 1-5, wherein: a) the CpG sites comprise the CpG sites provided in set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 4; b) the genes comprise the genes provided in set A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, or R in Table 2; and/or c) the features comprise the features provided in set A, B, C, D, E, F, G, H, I, or J in Table 5.
10) The method of any of claims 1-5, wherein: a) the CpG sites comprise the CpG sites provided in set A in Table 4; b) the genes comprise the genes provided in set A in Table 2; and/or c) the features comprise the features provided in set A in Table 5.
11) The method of any preceding claim, wherein: a) the CpG sites comprise up to 200, 100, 50, 20, or 18 CpG sites; b) the genes comprise up to 200, 100, 50, or 20 genes; and/or c) the features comprise up to 200, 100, 50, or 20 features.
12) The method of any preceding claim, wherein the sample comprises, consists, or consists essentially of epidermis cells and/or dermis cells.
13) The method of any preceding claim, wherein the subject is phototype I-IV on the Fitzpatrick scale.
14) The method of any preceding claim, wherein the machine learning model is a support vector machine (SVM).
15) The method of any preceding claim, wherein the machine learning model performs sequential backward selection (SBS).
16) The method of any preceding claim, wherein the absolute error is about 25 mJ/cm2 or less, about 20 mJ/cm2 or less, about 15 mJ/cm2 or less, about 10 mJ/cm2 or less, or about 5 mJ/cm2 or less.
17) The method of any preceding claim, further comprising: a) determining the maximum dose of UV radiation the subject can be exposed to before experiencing negative effects thereof; b) determining an appropriate UV protection substance or strategy for the subject; c) determining the minimum dose of UV radiation the subject should be exposed to in order to experience positive effects thereof; and/or d) determining the appropriate UV dose for treating a disease or condition susceptible to UV therapy in the subject.
18) A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the following steps: a) i) inputting the methylation levels of at least two CpG sites selected from the genes in Table 3 in a skin sample obtained from the subject, ii) predicting the subject's MED based on the determined methylation levels using a machine learning model trained on data comprising known MEDs and corresponding known methylation levels of the at least two CpG sites, and iii) outputting the predicted MED; b) i) inputting the RNA expression levels of at least two genes selected from the genes in Table 1 in a skin sample obtained from the subject, ii) predicting the subject's MED based on the determined RNA expression levels using a machine learning model trained on data comprising known MEDs and corresponding known RNA expression levels of the at least two genes, and iii) outputting the predicted MED; and/or c) i) inputting the methylation level(s) and RNA expression level(s) of at least 2 features selected from the CpG sites in Table 3 and the genes in Table 1 in a skin sample obtained from the subject, wherein the features comprise at least one CpG site in Table 3 and at least one gene in Table 1, ii) predicting the subject's MED based on the determined methylation level(s) and RNA expression level(s) using a machine learning model trained on data comprising known MEDs and corresponding known methylation level(s) and RNA expression level(s) of the at least 2 features, and iii) outputting the predicted MED.
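The input, predict and output steps of claim 18 can be sketched as below. `LinearMEDModel` is a hypothetical stand-in for whatever trained model is used (the examples describe SVM and Lasso models), and the weights in the usage example are made-up numbers for illustration, not values from this document. The practical point the sketch shows is that measured levels must be arranged in the same feature order the model was trained on.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

import numpy as np


@dataclass
class LinearMEDModel:
    """Hypothetical trained model: MED = intercept + weights . levels."""
    features: Sequence[str]  # feature order used during training
    weights: np.ndarray
    intercept: float

    def predict(self, x: np.ndarray) -> float:
        return float(self.intercept + x @ self.weights)


def predict_med(levels: Dict[str, float], model: LinearMEDModel) -> float:
    # i) input: collect the measured levels in the model's feature order
    missing = [f for f in model.features if f not in levels]
    if missing:
        raise KeyError(f"no measurement for: {missing}")
    x = np.array([levels[f] for f in model.features], dtype=float)
    # ii) predict the MED (mJ/cm2) with the trained model
    return model.predict(x)


if __name__ == "__main__":
    toy = LinearMEDModel(["ENSG00000197978", "cg06376130"], np.array([10.0, -20.0]), 100.0)
    # iii) output the predicted MED (toy numbers, not results from this document)
    print(predict_med({"ENSG00000197978": 2.0, "cg06376130": 0.5}, toy))
```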
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19206611.6 | 2019-10-31 | ||
EP19206611.6A EP3816301A1 (en) | 2019-10-31 | 2019-10-31 | Determining the sensitivity of skin to uv radiation |
EP20153231.4 | 2020-01-22 | ||
EP20153231 | 2020-01-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021083572A1 (en) | 2021-05-06 |
Family
ID=72322474
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2020/075047 WO2021083572A1 (en) | 2019-10-31 | 2020-09-08 | Determining the sensitivity of skin to uv radiation |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021083572A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020022868A1 (en) * | 2000-07-03 | 2002-02-21 | Egbert Lenderink | Method of optimizing the use of a tanning-related device, device for performing such a method, and tanning-related device |
US20090096977A1 (en) * | 2003-10-15 | 2009-04-16 | Song Jang-Kun | Liquid crystal display |
US20130065781A1 (en) * | 2010-03-01 | 2013-03-14 | Atsushi Terunuma | Gene sets for detection of ultraviolet a exposure and methods of use thereof |
Non-Patent Citations (10)
Title |
---|
ARYEE: "Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays", BIOINFORMATICS, vol. 30, 2014, pages 1363 - 1369 |
BISCHL ET AL.: "mlr: Machine Learning in R", J MACH LEARN RES., vol. 17, 2016, pages 1 - 5 |
CAROLYN J. HECKMAN ET AL: "Minimal Erythema Dose (MED) Testing", JOURNAL OF VISUALIZED EXPERIMENTS, no. 75, 28 May 2013 (2013-05-28), XP055681001, DOI: 10.3791/50175 * |
EILERS ET AL.: "Accuracy of Self-report in Assessing Fitzpatrick Skin Phototypes I through VI", JAMA DERMATOL, vol. 149, no. 11, 2013, pages 1289 - 1294 |
FITZPATRICK: "Soleil et peau", J MED ESTHET., vol. 2, 1975, pages 33 - 34 |
HECKMAN ET AL.: "Minimal Erythema Dose (MED) Testing", J VIS EXP., vol. 75, 2013, pages 50175 |
LOVE ET AL.: "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2", GENOME BIOL., vol. 15, 2014, pages 550, XP021210395, DOI: 10.1186/s13059-014-0550-8 |
SANTOSH K. KATIYAR ET AL: "Epigenetic Alterations in Ultraviolet Radiation-Induced Skin Carcinogenesis: Interaction of Bioactive Dietary Components on Epigenetic Targets", PHOTOCHEMISTRY AND PHOTOBIOLOGY, vol. 88, no. 5, 17 November 2011 (2011-11-17), US, pages 1066 - 1074, XP055681478, ISSN: 0031-8655, DOI: 10.1111/j.1751-1097.2011.01020.x * |
SUDEL ET AL.: "Tight control of matrix metalloproteinase-1 activity in human skin", PHOTOCHEM PHOTOBIOL., vol. 78, 2003, pages 355 - 60, XP009027198, DOI: 10.1562/0031-8655(2003)078<0355:TCOMMA>2.0.CO;2 |
TIBSHIRANI: "Regression Shrinkage and Selection Via the Lasso", J R STAT SOC SER B., vol. 58, 1994, pages 267 - 288 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20765048; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 20765048; Country of ref document: EP; Kind code of ref document: A1 |