8000 GitHub - sterling-engle/snpcrypt: Securing Identification SNPs, Masked Short Reads, Phenotypes, and Clinical Data using Cryptography
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

sterling-engle/snpcrypt

Repository files navigation

snpcrypt

                 UNIVERSITY OF CALIFORNIA LOS ANGELES
                    DEPARTMENT OF COMPUTER SCIENCE

Securing Masked Short Reads, Identification SNPs, Phenotypes, and Clinical Data using Cryptography

                         Student: Sterling Engle

  Capstone Project for: MSCS - Faculty Advisor: Prof. Sriram Sankararaman
                         Date: February 19, 2022

Abstract---Many people are understandably concerned about protecting their genomic privacy. For this reason, they are unwilling to donate their genomes for genomic research. Huang et al discovered 116 identification (ID) SNPs (Single Nucleotide Polymorphisms) on the human genome that can be used to uniquely identify every individual. My snpcrypt Python program extracts and encrypts these ID SNPs and can return a subset of the sample genotypes. It encrypts masked extracted short reads of raw genomic data. It can encrypt any file containing phenotypes or clinical data. All files are encrypted and decrypted using a symmetric key that is protected by asymmetric RSA public key cryptography. Multiple RSA public keys are supported for all file types for situations where more than one individual needs to access the same file. Encryption enables this extremely sensitive data to be securely stored in a genomic databank and securely transmitted to researchers and clinicians. I validated ID SNP extraction, optional sample selection, and encryption using a 2,504-sample chromosome 21 variant call format (VCF) file from phase 3 of the Human Genome Project. I tested masked genomic sequence short read extraction and encryption with an indexed binary alignment map (BAM) file of one of the phase 1 human DNA samples. I verified error-free parsing of SAM and VCF versions 4.1, 4.2, and 4.3 files using scripts to run snpcrypt against all the "passed" test files in the Samtools hts-specs repository. Synpcrypt.py source code is available in my public code repository: https://github.com/sterling-engle/snpcrypt.

About

Securing Identification SNPs, Masked Short Reads, Phenotypes, and Clinical Data using Cryptography

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published
0