CN110504005A - Data processing method - Google Patents
Data processing method
- Publication number
- CN110504005A (application number CN201910795698.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- cell
- gene
- screening
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Landscapes
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Genetics & Genomics (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Epidemiology (AREA)
- Software Systems (AREA)
- Public Health (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Molecular Biology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention discloses a data processing method, and in particular a method for processing the off-machine data (the raw output of the sequencer) of the ICB-scSeq technology. With this data processing method, cell sequencing can be carried out at low cost, simply, and rapidly, which brings considerable economic and safety benefits.
Description
Technical field
The present invention relates to a data processing method. In particular, it relates to a method for processing the off-machine data of the ICB-scSeq (Intelligent Combinatorial Barcoding-single cell Sequencing, intelligent combinatorial barcoding single-cell sequencing) technology.
Background art
Over the past ten years, the rapid development of next-generation sequencing (NGS) and third-generation sequencing (TGS) technologies has brought great changes to life science. Earlier studies needed to obtain enough nucleic acid from a large number of cells for sequencing, so the sequencing results usually characterize the cell population, while the characteristics unique to individual cells are often overlooked. Single-cell sequencing technology emerged to address this limitation.
Single-cell sequencing has produced abundant results in fields such as oncology, developmental biology, and neuroscience. Single-cell research could expand scientific output even faster, but single-cell sequencing technologies still have many problems, such as the need for fresh cells, low sample utilization, and expensive equipment and reagents. These problems hinder the research and adoption of single-cell sequencing technology and the broad development of single-cell life science. Optimizing and developing new single-cell sequencing technologies is therefore urgent.
ICB-scSeq (Intelligent Combinatorial Barcoding-single cell Sequencing, intelligent combinatorial barcoding single-cell sequencing) is a single-cell sequencing technology developed by the present inventors. It is a single-cell sequencing method based on the SPLIT (split-pool ligation-based transcriptome sequencing) technique, in which the cell of origin of each RNA is labeled by combinatorial barcoding.
Given the technical shortcomings described above, the ICB-scSeq sequencing approach also needs a better data processing method, covering the path from the raw off-machine data to downstream analysis, so that cell sequencing can be carried out at low cost, simply, and rapidly.
Summary of the invention
An object of the present invention is to overcome the deficiencies of the prior art and to provide a data processing method with which cell sequencing can be carried out at low cost, simply, and rapidly.
To achieve the above object, the present invention proposes the following technical solution: a data processing method, characterized by comprising:
A raw data acquisition step: performing paired-end sequencing and acquiring the raw data for intelligent combinatorial barcoding single-cell sequencing, wherein the first end is the cDNA portion and the second end is the unique molecular identifier (UMI) and cell barcode portion;
A quality control and filtering step: filtering the acquired raw data to obtain filtered data;
An alignment step: aligning the filtered data to a reference genome sequence to obtain aligned data;
A UMI deduplication step: removing the reads with duplicate UMIs from the aligned data to obtain deduplicated data;
A gene quantification step: performing gene quantification on the deduplicated data to obtain quantified data;
An expression matrix construction step: constructing an expression matrix from the quantified data, the expression matrix containing the raw count of each gene in each cell;
A cell screening step: screening the expression matrix by mitochondrial content and by number of expressed genes to obtain a screened matrix;
A normalization step: normalizing the raw counts of the screened matrix to obtain a normalized matrix;
An analysis step: analyzing the normalized matrix.
With the data processing method provided by the present invention, cell sequencing can be carried out at low cost, simply, and rapidly, which brings considerable economic and safety benefits.
Brief description of the drawings
Fig. 1 is a schematic diagram of the data processing method of the first embodiment of the present invention.
Fig. 2 is a schematic diagram of the gene sequence used in the data processing method of Fig. 1.
Fig. 3 is a schematic diagram of the deduplication process in the data processing method of Fig. 1.
Fig. 4 shows the results of the cluster analysis in the data processing method of Fig. 1.
Fig. 5 shows the results of the pathway enrichment analysis in the data processing method of Fig. 1.
Fig. 6 shows further results of the pathway enrichment analysis in the data processing method of Fig. 1.
Detailed description of the embodiments
The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
The first embodiment of the present invention is a data processing method.
Fig. 1 is a schematic diagram of the data processing method of the first embodiment of the present invention. As shown in Fig. 1, the data processing method comprises: a raw data acquisition step, a quality control and filtering step, an alignment step, a UMI deduplication step, a gene quantification step, an expression matrix construction step, a cell screening step, a normalization step, and an analysis step.
In the raw data acquisition step, paired-end sequencing is performed and the raw data for intelligent combinatorial barcoding single-cell sequencing (ICB-scSeq) are acquired. The first end, read1, is the cDNA portion; the second end, read2, is the UMI and cell barcode portion (UMI + cell barcode). cDNA is DNA whose base sequence is complementary to that of a given RNA strand. UMI (unique molecular identifier) is a molecule-specific tag.
In the quality control and filtering step, the acquired raw data are filtered to obtain filtered data. In this embodiment, by way of example, the quality control and filtering step comprises the following sub-steps: correcting the cell barcode portion of the second end of the acquired raw data; constructing a whitelist of cell barcodes; extracting the first-end sequences according to the whitelist; and screening the extracted first-end sequences to obtain the filtered data. The invention is not limited to this, however, and the quality control and filtering step may include other sub-steps.
Specifically, each read2 contains three cell barcode segments, BC1, BC2, and BC3, each 8 bp long (as shown in Fig. 2). The sequences of these barcodes are fixed in every run. For example, if barcode1 uses 96 combinations, then barcode1 has only 96 possible sequences in total, each 8 bp long. Every read is therefore corrected according to the rule that a Hamming distance equal to 1 is correctable.
In each read, the three 8 bp sequences at the BC1, BC2, and BC3 positions are extracted as candidate barcode sequences (labeled barcode1-new, barcode2-new, and barcode3-new). barcode1-new is then compared in turn with every sequence in the predetermined barcode1 list, and the Hamming distance, denoted hd, is calculated. If hd equals 0, no change is made; if hd equals 1, the sequence of barcode1-new is changed to the sequence of the corresponding barcode1. This completes the correction of the barcode sequence.
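The correction rule above can be sketched in a few lines of Python. This is a minimal illustration, not the inventors' implementation: the barcode list is hypothetical, and rejecting a candidate that is one mismatch away from more than one known barcode is an added assumption that the text does not address.

```python
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(candidate: str, known: Sequence[str]) -> Optional[str]:
    """Return the corrected barcode, or None if the candidate cannot be corrected.

    hd == 0: exact match, kept unchanged.
    hd == 1: replaced by the matching known barcode, but only when that
             neighbour is unique (assumption, not stated in the text).
    """
    if candidate in known:
        return candidate
    neighbours = [bc for bc in known if hamming(candidate, bc) == 1]
    return neighbours[0] if len(neighbours) == 1 else None


# Hypothetical 8 bp barcode list; a real BC1 list would hold e.g. 96 entries.
BC1_LIST = ["AACGTGAT", "AAACATCG", "ATGCCTAA"]

print(correct_barcode("AACGTGAT", BC1_LIST))  # exact match, unchanged
print(correct_barcode("AACGTGAA", BC1_LIST))  # one mismatch, corrected
print(correct_barcode("GGGGGGGG", BC1_LIST))  # no close barcode, None
```

The same function would be applied to the BC2 and BC3 candidates with their own lists.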
After the cell barcodes have been corrected, the cell barcode sequences are combined, according to the estimated number of cells, into the unique identifier of a cell (cell UID), and a cell barcode whitelist is constructed. This whitelist contains the UIDs of all cells that can be identified.
The cDNA sequences in read1 are extracted according to the cell barcode whitelist built in the previous step. For any read1 sequence, if the cell UID in its corresponding read2 is in the cell barcode whitelist, that read1 is extracted. Whitelist construction and sequence extraction can be handled with the open-source tool umi-tools.
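As a rough sketch of the whitelist and extraction logic, the Python below keeps the most frequent cell UIDs and then retains only the read1 sequences whose UID is whitelisted. It does not reproduce how umi-tools works internally; the rule "keep the `expected_cells` most frequent UIDs", the function names, and the example reads are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def build_whitelist(cell_uids: Iterable[str], expected_cells: int) -> Set[str]:
    """Keep the `expected_cells` most frequently observed cell UIDs."""
    counts = Counter(cell_uids)
    return {uid for uid, _ in counts.most_common(expected_cells)}


def extract_read1(
    pairs: Iterable[Tuple[str, str]], whitelist: Set[str]
) -> List[Tuple[str, str]]:
    """Keep (cell UID, read1 cDNA) pairs whose UID is in the whitelist."""
    return [(uid, cdna) for uid, cdna in pairs if uid in whitelist]


# Hypothetical pairs: cell UID taken from read2, cDNA sequence from read1.
pairs = [
    ("cellA", "ACGTTGCA"), ("cellA", "GGCATTAC"), ("cellA", "TTGACCAG"),
    ("cellB", "CCATGGTA"), ("cellB", "AGGTCCAT"),
    ("cellC", "TTTTGGGG"),
]

whitelist = build_whitelist((uid for uid, _ in pairs), expected_cells=2)
kept = extract_read1(pairs, whitelist)
print(sorted(whitelist))  # ['cellA', 'cellB']
print(len(kept))          # 5
```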
After the read1 sequences have been extracted, they need further screening, mainly to remove the polyA structure at the end of the sequence and the low-quality bases at both ends. In the accompanying example, the upper row is the original sequence and the lower row is the sequence after removal of the terminal polyA structure and the low-quality bases at both ends.
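The trimming described here might look like the following sketch. The text gives no thresholds, so the minimum polyA run length (10), the quality cutoff (Q20), and the Phred+33 encoding are assumptions, as is the order of the two operations (quality first, then polyA).

```python
from typing import Tuple


def trim_low_quality(
    seq: str, qual: str, min_q: int = 20, offset: int = 33
) -> Tuple[str, str]:
    """Trim bases with quality below `min_q` from both ends (Phred+33 assumed)."""
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(seq)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def trim_polya_tail(seq: str, qual: str, min_run: int = 10) -> Tuple[str, str]:
    """Remove a trailing run of 'A' if it is at least `min_run` bases long."""
    stripped = seq.rstrip("A")
    if len(seq) - len(stripped) >= min_run:
        return stripped, qual[: len(stripped)]
    return seq, qual


# Hypothetical read: two low-quality bases, 10 bp of cDNA, a 12 bp polyA tail.
raw_seq = "NN" + "ACGTACGTCC" + "A" * 12
raw_qual = "##" + "I" * 22  # '#' is Q2, 'I' is Q40 in Phred+33

seq, qual = trim_low_quality(raw_seq, raw_qual)
seq, qual = trim_polya_tail(seq, qual)
print(seq)  # ACGTACGTCC
```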
In the alignment step, the read1 sequences obtained from the above screening are aligned to the reference genome sequence. The alignment can be performed with the alignment software STAR.
The alignment yields a sorted BAM file. Each aligned sequence is annotated according to the GTF file of the reference genome, that is, assigned to a gene. The purpose is to determine which gene each aligned sequence belongs to after annotation with the GTF file. This assignment can be done with the open-source tool featureCounts.
In the UMI deduplication step, the result of the previous step shows which gene each aligned read belongs to. To eliminate the PCR bias introduced during subsequent library construction, ICB-scSeq adds a 10 bp UMI sequence to every sequence. Thus, if two identical sequences fall within the same gene and their 10 bp UMIs are also identical, the two reads are considered to originate from the same cDNA molecule and must be deduplicated. As shown in Fig. 3, five reads are shown on the left; the upper three are duplicates of one another, and so are the lower two, so only two reads remain on the right after deduplication.
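A minimal sketch of this deduplication rule follows. It treats reads with the same cell UID, gene, and UMI as copies of one cDNA molecule; including the cell UID in the key is an assumption (the text mentions only gene and UMI), and the example reads are invented to mirror the five-to-two reduction described for Fig. 3.

```python
from typing import Iterable, List, NamedTuple


class AlignedRead(NamedTuple):
    cell_uid: str  # combined cell barcode
    gene: str      # gene assigned during annotation
    umi: str       # 10 bp unique molecular identifier


def deduplicate(reads: Iterable[AlignedRead]) -> List[AlignedRead]:
    """Keep the first read for every (cell, gene, UMI) combination."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.cell_uid, read.gene, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Hypothetical input: three copies of one molecule and two copies of another.
reads = [
    AlignedRead("cell1", "GeneA", "AACCGGTTAC"),
    AlignedRead("cell1", "GeneA", "AACCGGTTAC"),
    AlignedRead("cell1", "GeneA", "AACCGGTTAC"),
    AlignedRead("cell1", "GeneA", "TTGGCCAATG"),
    AlignedRead("cell1", "GeneA", "TTGGCCAATG"),
]

unique_reads = deduplicate(reads)
print(len(reads), "->", len(unique_reads))  # 5 -> 2
```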
In the gene quantification step, gene quantification is performed on the deduplicated data to obtain quantified data.
In the expression matrix construction step, an expression matrix is constructed from the quantified data. The expression matrix contains the raw count (raw counts) of each gene in each cell. In this matrix, each column represents a cell UID and each row represents a gene ID, as shown in the accompanying table.
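To make the layout concrete, the sketch below builds such a matrix from a list of deduplicated molecules, with genes as rows and cells as columns. The input format (one `(gene, cell)` tuple per molecule) and all names are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_expression_matrix(
    molecules: Iterable[Tuple[str, str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Count molecules per (gene, cell); rows are genes, columns are cells."""
    counts = Counter(molecules)
    genes = sorted({gene for gene, _ in counts})
    cells = sorted({cell for _, cell in counts})
    # A Counter returns 0 for (gene, cell) pairs that were never observed.
    matrix = [[counts[(gene, cell)] for cell in cells] for gene in genes]
    return genes, cells, matrix


# Hypothetical deduplicated molecules as (gene ID, cell UID) pairs.
molecules = [
    ("GeneA", "cell1"), ("GeneA", "cell1"),
    ("GeneB", "cell1"),
    ("GeneA", "cell2"),
]

genes, cells, matrix = build_expression_matrix(molecules)
print(cells)   # ['cell1', 'cell2']
for gene, row in zip(genes, matrix):
    print(gene, row)
```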
In the cell screening step, the expression matrix is screened by mitochondrial content and by number of expressed genes to obtain a screened matrix. Specifically, for each cell in the expression matrix, the proportion of the total expression value contributed by mitochondrial genes is calculated; if this proportion exceeds the set threshold, the cell is screened out. The threshold is, for example, 5%, but is not limited to this and may be set to another value. The number of expressed genes in each cell of the expression matrix must also be screened. A typical criterion is, for example, a minimum of 200 and a maximum of 2500 expressed genes, but this is not limiting and other ranges may be set. The screening step can be carried out with Seurat. After these two rounds of screening, a screened expression matrix is obtained and the next step can proceed.
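The two screening criteria can be sketched as a single predicate per cell. This is not Seurat code; it assumes each cell is stored as a gene-to-count dictionary and that mitochondrial genes are recognised by the prefix "MT-" (the human gene symbol convention), neither of which is stated in the text. The example uses tiny gene-number limits so that it fits on a few lines; the defaults follow the 5% and 200 to 2500 values given above.

```python
from typing import Dict


def passes_screening(
    counts: Dict[str, int],
    max_mito_fraction: float = 0.05,
    min_genes: int = 200,
    max_genes: int = 2500,
    mito_prefix: str = "MT-",  # assumption: human-style mitochondrial symbols
) -> bool:
    """Return True if the cell passes both screening criteria."""
    total = sum(counts.values())
    if total == 0:
        return False
    mito = sum(v for gene, v in counts.items() if gene.startswith(mito_prefix))
    n_genes = sum(1 for v in counts.values() if v > 0)
    return mito / total <= max_mito_fraction and min_genes <= n_genes <= max_genes


# Hypothetical cells with very few genes, so small limits are used below.
cells = {
    "cell1": {"MT-CO1": 1, "GeneA": 30, "GeneB": 20},   # passes
    "cell2": {"MT-CO1": 10, "GeneA": 5, "GeneB": 5},    # 50% mitochondrial
    "cell3": {"GeneA": 40},                             # only one gene
}

kept_cells = [
    uid for uid, counts in cells.items()
    if passes_screening(counts, min_genes=2, max_genes=5)
]
print(kept_cells)  # ['cell1']
```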
In the normalization step, the raw counts of the screened expression matrix are normalized to obtain a normalized matrix. In single-cell sequencing, the number of reads measured per cell is not uniform, so the raw counts must be normalized to eliminate the quantification error caused by differences in sequencing depth. The normalization step can be carried out with Seurat, using a calculation formula in which CountOfGene denotes the raw count of each gene in each cell and AllCount denotes the sum of the raw counts of all genes in that cell.
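The formula itself is not reproduced in this text. As a hedged illustration, the sketch below assumes the default log-normalization of Seurat, ln(1 + CountOfGene / AllCount × 10,000); the scale factor of 10,000 and the natural logarithm are assumptions taken from that default, not from the patent.

```python
import math
from typing import Dict


def log_normalize(
    cell_counts: Dict[str, int], scale_factor: float = 10_000
) -> Dict[str, float]:
    """Normalize one cell: ln(1 + CountOfGene / AllCount * scale_factor)."""
    all_count = sum(cell_counts.values())
    if all_count == 0:
        return {gene: 0.0 for gene in cell_counts}
    return {
        gene: math.log1p(count / all_count * scale_factor)
        for gene, count in cell_counts.items()
    }


# Hypothetical raw counts for one cell (AllCount = 4).
normalized = log_normalize({"GeneA": 1, "GeneB": 3, "GeneC": 0})
for gene, value in normalized.items():
    print(gene, round(value, 4))
```

Dividing by AllCount is what removes the dependence on sequencing depth: two cells with the same relative expression but different total reads receive the same normalized values.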
In the analysis step, the normalized matrix is analyzed. In this embodiment, the analysis step is a cluster analysis step at the cell level, and a differential analysis step and a pathway enrichment analysis step at the gene level. The invention is not limited to these, however, and other suitable analysis steps may be used.
The cluster analysis is performed as follows.
First, feature extraction is performed so that all of the sequenced single cells can be clustered. Highly variable features are computed from the normalized expression matrix, and these features are extracted for the subsequent analysis.
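As a simplified illustration of selecting highly variable features, the sketch below ranks genes by the plain variance of their normalized values across cells. Seurat uses a more elaborate, variance-stabilized procedure; plain variance is swapped in here only to show the idea, and the data are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List


def top_variable_genes(matrix: Dict[str, List[float]], n_top: int) -> List[str]:
    """Return the `n_top` genes with the largest variance across cells."""
    variances = {gene: pvariance(values) for gene, values in matrix.items()}
    return sorted(variances, key=lambda gene: variances[gene], reverse=True)[:n_top]


# Hypothetical normalized values: one list per gene, one entry per cell.
normalized_matrix = {
    "GeneA": [0.0, 0.0, 0.0, 0.0],  # constant, uninformative
    "GeneB": [0.0, 5.0, 0.0, 5.0],  # strongly variable
    "GeneC": [1.0, 2.0, 1.0, 2.0],  # mildly variable
}

selected = top_variable_genes(normalized_matrix, n_top=2)
print(selected)  # ['GeneB', 'GeneC']
```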
Next, the matrix data are scaled. To eliminate as far as possible the errors from various sources (including technical error, batch error, and some errors of biological origin), regression is applied to the matrix data to remove these errors and thereby improve the subsequent dimensionality reduction and clustering.
Next, linear dimensionality reduction is performed: the scaled data are reduced using PCA (principal component analysis).
Next, cluster grouping is performed. Based on the significant PCs (principal components) identified in the previous step, a graph-based clustering method is used. The method clusters iteratively using a constructed KNN (K-nearest neighbor) graph and the Louvain algorithm, and finally assigns all cells to different clusters. The above analysis can be carried out with Seurat.
Finally, a two-dimensional UMAP visualization is produced. As shown in Fig. 4, the clustering results of the previous step are displayed in two dimensions using UMAP (uniform manifold approximation and projection). This visualization can be produced with Seurat.
The differential analysis is performed as follows.
Based on the clustering results, differential gene screening is performed for all clusters using the Wilcoxon rank sum test, yielding a list of differentially expressed genes for all clusters. The results are shown in the accompanying table.
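For readers unfamiliar with the test, the following sketch computes a Wilcoxon rank sum statistic for one gene between two groups of cells, using average ranks for ties and the normal approximation without tie or continuity correction. It is a teaching sketch rather than the routine used by Seurat, and the expression values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (z, two-sided p) under the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical normalized expression of one gene in two clusters.
in_cluster = [5.1, 4.8, 6.0, 5.5, 5.9]
other_cells = [1.2, 0.0, 2.3, 1.9, 0.7]

z, p = rank_sum_test(in_cluster, other_cells)
print(f"z = {z:.3f}, p = {p:.4f}")
```

In practice the test is repeated for every gene and every cluster, and the p-values are usually adjusted for multiple testing before a gene is called differentially expressed.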
The pathway enrichment analysis is performed as follows.
The first pathway enrichment analysis is GO (Gene Ontology) enrichment analysis. As shown in Fig. 5, GO enrichment analysis is performed on the differential genes of each cluster, based on the differential gene results of the previous step.
In addition to GO enrichment analysis, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis can also be performed. As shown in Fig. 6, it identifies the pathways in which the differential genes of each cluster are significantly enriched, and the results are displayed as a bubble plot.
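The text does not say which statistical test underlies the GO and KEGG enrichment. A one-sided hypergeometric test is a common choice for this kind of analysis and is assumed in the sketch below; the gene numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(
    n_background: int, n_pathway: int, n_diff: int, n_overlap: int
) -> float:
    """P(overlap >= n_overlap) under the hypergeometric distribution.

    n_background: number of genes in the background set
    n_pathway:    background genes annotated to the pathway or GO term
    n_diff:       differential genes of the cluster
    n_overlap:    differential genes that belong to the pathway
    """
    upper = min(n_pathway, n_diff)
    total = comb(n_background, n_diff)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_diff - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 background genes, 5 in the pathway,
# 4 differential genes of which 3 fall in the pathway.
p_value = enrichment_p_value(n_background=20, n_pathway=5, n_diff=4, n_overlap=3)
print(f"p = {p_value:.4f}")
```

A small p-value means that the overlap between the differential genes and the pathway is larger than expected by chance, which is what a bubble plot of enriched pathways typically encodes through colour or size.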
As described above, with the data processing method of the first embodiment, cell sequencing can be carried out at low cost, simply, and rapidly, which brings considerable economic and safety benefits.
It should be noted that each unit mentioned in the device embodiments of the present invention is a logical unit. Physically, a logical unit may be one physical unit, part of a physical unit, or a combination of several physical units. The physical implementation of these logical units is not what matters most; the combination of the functions they implement is the key to solving the technical problem addressed by the invention. In addition, to highlight the innovative part of the invention, the above device embodiments do not introduce units that are less closely related to solving the technical problem addressed by the invention; this does not mean that no other units exist in those embodiments.
It should be noted that, in the claims and the description of this patent, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "including a" does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Although the present invention has been shown and described with reference to certain preferred embodiments, those skilled in the art will understand that various changes in form and detail may be made to it without departing from the spirit and scope of the invention.
Claims (7)
1. A data processing method, characterized by comprising:
A raw data acquisition step: performing paired-end sequencing and acquiring the raw data for intelligent combinatorial barcoding single-cell sequencing, wherein the first end is the cDNA portion and the second end is the unique molecular identifier (UMI) and cell barcode portion;
A quality control and filtering step: filtering the acquired raw data to obtain filtered data;
An alignment step: aligning the filtered data to a reference genome sequence to obtain aligned data;
A UMI deduplication step: removing the reads with duplicate UMIs from the aligned data to obtain deduplicated data;
A gene quantification step: performing gene quantification on the deduplicated data to obtain quantified data;
An expression matrix construction step: constructing an expression matrix from the quantified data, the expression matrix containing the raw count of each gene in each cell;
A cell screening step: screening the expression matrix by mitochondrial content and by number of expressed genes to obtain a screened matrix;
A normalization step: normalizing the raw counts of the screened matrix to obtain a normalized matrix;
An analysis step: analyzing the normalized matrix.
2. The data processing method according to claim 1, characterized in that
the quality control and filtering step comprises the following sub-steps:
correcting the cell barcode portion of the second end of the acquired raw data;
constructing a whitelist of cell barcodes;
extracting the first-end sequences according to the whitelist;
screening the extracted first-end sequences to obtain the filtered data.
3. The data processing method according to claim 1, characterized in that
in the cell screening step, the threshold for screening by mitochondrial content is 5%.
4. The data processing method according to claim 1, characterized in that
in the cell screening step, the range for screening by number of expressed genes is 200-2500.
5. The data processing method according to claim 1, characterized in that
in the normalization step, normalization is performed using a calculation formula in which CountOfGene denotes the raw count of each gene in each cell and AllCount denotes the sum of the raw counts of all genes in each cell.
6. The data processing method according to claim 1, characterized in that
the analysis step is a cluster analysis step at the cell level.
7. The data processing method according to claim 1, characterized in that
the analysis step is a differential analysis step and a pathway enrichment analysis step at the gene level.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910795698.3A CN110504005A (en) | 2019-08-27 | 2019-08-27 | Data processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110504005A true CN110504005A (en) | 2019-11-26 |
Family
ID=68589756
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910795698.3A Pending CN110504005A (en) | 2019-08-27 | 2019-08-27 | Data processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110504005A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112270953A (en) * | 2020-10-29 | 2021-01-26 | 哈尔滨因极科技有限公司 | Analysis method, device and equipment based on BD single cell transcriptome sequencing data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20191126 |