CN118888001A

CN118888001A - Pathogen detection system and method based on metagenome high-throughput sequencing

Info

Publication number: CN118888001A
Application number: CN202411370770.5A
Authority: CN
Inventors: 费宏; 柳佳琦; 未庆; 李翰鹏
Original assignee: Shanghai Xinnuo Baishi Medical Laboratory Co ltd
Current assignee: Shanghai Xinnuo Baishi Medical Laboratory Co ltd
Priority date: 2024-09-29
Filing date: 2024-09-29
Publication date: 2024-11-01

Abstract

The invention discloses a pathogen detection system and a pathogen detection method based on metagenome high-throughput sequencing. The detection method comprises at least one of the following steps: cleaning the public database; calculating the S_confidence and L_score of the kmer classification information and filtering on them; and identifying contamination with a basic background model and a DIC background model. The method controls the detection of false-positive species, which simplifies downstream report interpretation and helps clinicians judge the true pathogens.

Description

Pathogen detection system and method based on metagenome high-throughput sequencing

Technical Field

The invention belongs to the technical field of pathogen detection, and particularly relates to a pathogen detection system and method based on metagenome high-throughput sequencing.

Background

Infectious diseases is a collective term for diseases caused by pathogenic microorganisms (bacteria, viruses, fungi, parasites, etc.), which seriously threaten human health. The diagnosis and treatment monitoring of infectious diseases have long relied on morphology, immunology, molecular biology and pathogen isolation and culture. Each of these methods has advantages and disadvantages, and each plays an important role in the auxiliary diagnosis of infectious diseases.

The recently emerging pathogen metagenome high-throughput sequencing technology (metagenomic next-generation sequencing, mNGS) is a detection method that uses high-throughput sequencing to sequence all nucleic acids in a specific clinical sample and determines by bioinformatic analysis whether a pathogen is present in the sample. Compared with traditional pathogen detection based on isolation and culture, mNGS can in theory detect all kinds of microorganisms (viruses, bacteria, fungi, parasites and the like) without bias, including pathogens that are difficult to culture and novel pathogens. mNGS is an open analysis and diagnosis system in which the set of detectable pathogens is not fixed in advance. According to incomplete statistics, institutions that offer such detection services cover nearly ten thousand pathogens, including bacteria, viruses, fungi and parasites. mNGS therefore provides an effective technical means for diagnosing severe and rare pathogen infections and has clinical significance in specific clinical application scenarios.

mNGS-based pathogen species identification methods fall into several broad categories:

1. Alignment-based methods. mNGS reads are aligned to the assemblies of different species, and the presence of a particular species is determined from the alignment positions and the alignment reliability. mNGS makes no assumption about the scope of species identification, and the species assembly sequences in public databases grow substantially every year, so ordinary computing equipment struggles with the resulting space and time complexity. Restricting the reference to the sequences of common known pathogen species relieves the hardware and time bottlenecks, but it limits the range of detectable species and reduces the sensitivity of clinical pathogen detection, particularly for rare or novel pathogens;

2. Marker-gene-based species identification. Species-specific single-copy genes (marker genes) are computed first, and the mNGS data are then aligned to the marker genes. The advantage of this method is its small computational cost; the disadvantage is that it requires a large amount of sequencing data (so that the marker genes are sufficiently covered), which limits its sensitivity. In addition, the nucleic acid sequences of some species are extremely similar, which makes computing marker genes difficult. The common workaround is for several closely related species to share marker genes, which reduces the resolution between closely related species;

3. Kmer-based methods (a kmer is a consecutive subsequence of a nucleic acid sequence). The species classification problem is converted into an mNGS sequence classification problem, which is in turn converted into kmer matching. This reduces the amount of computation, shortens the overall analysis time and achieves very high sensitivity, so kmer-based species detection has become popular in the field of pathogen detection. However, under the combined effect of poor database quality, biological variation, sequencing noise and potential contamination at different experimental steps, this very high sensitivity produces a certain number of false positives. The problem has long been difficult to solve, and it complicates report interpretation and the auxiliary diagnosis made by doctors.

Therefore, there is a need for a pathogen detection system and method based on metagenome high-throughput sequencing that suppresses the false positives occurring in existing metagenome high-throughput sequencing pathogen detection and improves the accuracy of clinical diagnosis.

Disclosure of Invention

The invention aims to provide a pathogen detection system and a pathogen detection method based on metagenome high-throughput sequencing, which are used for effectively reducing or inhibiting pathogen detection false positive and improving the accuracy of clinical diagnosis.

In view of this, the scheme of the invention is as follows:

in a first aspect of the invention, a metagenomic high throughput sequencing-based pathogen detection system is presented, comprising:

The database module is used for constructing a pathogen species database;

The detection module is used for comparing the metagenome sequencing result with the database to obtain a pathogen species identification result;

The filtering module is used for filtering the identification result of the pathogenic species; the filtering method comprises the following steps: constructing a species kmer database based on the pathogenic species database, judging the reliability of the species based on the distribution of the metagenome sequence kmers on the corresponding species classification tree in the species kmer database, and filtering the identified pathogenic species; the reliability judging process comprises the following steps: for each sequence assigned to a species, calculating S_confidence, the ratio of the number of kmers of the sequence assigned to the current species to the total number of kmers of the sequence, and L_score, the ratio of the number of kmers of the sequence assigned to the current species node and its direct-lineage nodes to the total number of kmers of the sequence; taking the sequences whose S_confidence and L_score are close to 1 as reliable sequences; and taking the species with more than 2 reliable sequences as the pathogenic species retained after filtering.

Further, the reliability judging process is as follows: taking as filtered pathogenic species the identified species for which the upper quartile of S_confidence over all sequences is more than 80%;

And/or, taking as filtered pathogenic species a species of the identified species whose maximum sequence L_score satisfies the formula: count_k(all) is the number of kmers at the maximum sequence length assigned to the current species.

Further, the process of constructing the pathogenic species database by the pathogenic species database construction module includes: collecting and screening high-quality pathogen species assembly sequences; the high quality assembly sequence in the species is used as a core, and the assembly sequence with high similarity with the core assembly sequence is reserved.

Preferably, the intra-species similarity is determined from average nucleic acid identity and alignment coverage.

Preferably, the pathogenic species assembly sequences are screened according to assembly completeness and contamination level.

Preferably, the construction process of the pathogen species database further comprises filtering abnormal assembly indexes, wherein the assembly indexes comprise assembly total length, contig number and GC content;

and/or, merging highly similar species in the assembly sequences;

and/or, removing mobile elements from the assembly sequences;

and/or masking low complexity sequences.

Further, the filtration module further comprises a step of removing internal and/or external contamination of the pathogenic species identification result.

Preferably, the internal contamination removal process is to calculate the significance of the rPM of an identified species relative to the rPM of the negative control samples, and to treat an identified species with low significance as a contaminant species and remove it;

preferably, the external contamination removal process is to normalize the number of reads of an identified species by the number of artificially added reference sequences, calculate the significance of the normalized species read count relative to the negative control samples, and treat a species with low significance as a contaminant species and remove it.

In a second aspect of the invention, a pathogen detection method based on metagenome high-throughput sequencing is provided, comprising the steps of constructing a pathogen species database, and comparing a metagenome sequencing result with the database to obtain a pathogen species identification result;

The detection method further comprises the steps of constructing a species kmer database based on the pathogenic species database, judging species reliability based on the distribution of the metagenome sequence kmers on the corresponding species classification tree in the species kmer database, and filtering the identified pathogenic species; the reliability judging process comprises the following steps: for each sequence assigned to a species, calculating S_confidence, the ratio of the number of kmers of the sequence assigned to the current species to the total number of kmers of the sequence, and L_score, the ratio of the number of kmers of the sequence assigned to the current species node and its direct-lineage nodes to the total number of kmers of the sequence; taking the sequences whose S_confidence and L_score are close to 1 as reliable sequences; and taking the species with more than 2 reliable sequences as the pathogenic species retained after filtering.

A ratio "close to 1" means a ratio whose percentage value is near 100%; thresholds such as more than 50%, more than 60%, more than 70% or more than 80% can be set to select reliable sequences, and the species are then filtered to obtain reliable species.

In a third aspect of the present invention, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the pathogen detection method according to the second aspect.

Compared with the prior art, the invention has the beneficial effects that:

The pathogen detection system judges species reliability based on the distribution of the metagenome sequence kmers on the corresponding species classification tree in the species kmer database and filters the identified pathogen species. This controls the detection of false-positive species, simplifies downstream report interpretation and helps clinicians judge the true pathogen;

The pathogen detection system constructs the pathogen species database through intra-species similarity screening, which eliminates assembly sequences of low credibility; merging highly similar species and removing mobile elements further controls the detection of false-positive species;

The pathogen detection system sets up a basic background model and a reference sequence background model, each of which uses the statistical index z-score to identify and remove potential contaminant species, so false-positive detections can be further reduced.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method for detecting pathogens based on metagenome high throughput sequencing according to the present invention;

FIG. 2 is a flow chart of the species database processing according to the present invention;

FIG. 3 is a schematic diagram of the intra-species similarity screening process according to the present invention;

FIG. 4 is a schematic diagram of a mobile element rejection process according to the present invention;

FIG. 5 is a schematic representation of species filtration according to the distribution of sequence kmers over classification trees according to the present invention.

Detailed Description

The following provides definitions of some of the terms used in this specification. Unless otherwise defined, all terms used herein are intended to have the meanings commonly understood by those skilled in the art to which the present scheme pertains.

Term interpretation:

mNGS: metagenomic sequencing (metagenomic Next-Generation Sequencing), which uses a high-throughput sequencing method to sequence all biological sequences (DNA and/or RNA) in a sample in order to analyze the biological species, abundance, function, etc. in the sample. It should be noted that the samples to which the metagenomic sequencing data relate may be clinical samples, or samples from animals and living environments.

Kmer: a sequence of k consecutive bases in a DNA or RNA sequence; the kmers of a sequence serve as a feature representation of that sequence.
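As a minimal illustration of this definition, the sketch below enumerates the kmers of a sequence in Python. The function name `iter_kmers`, the toy read and the value of k are assumptions chosen for illustration; the text does not fix k.

```python
def iter_kmers(sequence: str, k: int):
    """Yield every run of k consecutive bases in `sequence`, left to right."""
    if k <= 0:
        raise ValueError("k must be positive")
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


# A sequence of length L contains L - k + 1 kmers (none if L < k).
read = "ACGTACGTAC"                      # hypothetical 10-base read
kmers = list(iter_kmers(read, k=4))
print(kmers)   # ['ACGT', 'CGTA', 'GTAC', 'TACG', 'ACGT', 'CGTA', 'GTAC']
```

By the same count, a 50-base read contains 16 kmers when k = 35 (k = 35 is an assumed value here).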

Z-score: in statistics, Z-score, also known as standard score, represents the distance of a data point from the mean of the data set in standard deviation units. In other words, it measures the degree of deviation of a data point from the average of the dataset.

ANI: average Nucleotide Identity, which is the average identity of nucleotides between two genomic sequences. It is obtained by comparing the whole genome sequences of two genomes and calculating the ratio of the same base pairs between them.

AF: Alignment Fraction. In bioinformatics, the alignment fraction is the ratio of the number of successfully aligned bases to the total number of bases after two sequences are aligned.

NTC: No Template Control, i.e. a negative control sample. In an experiment, the negative control sample is an important control group that helps exclude the influence of other factors on the experimental result, thereby ensuring the reliability of the result.

In one embodiment, a method for detecting a pathogen based on metagenome high throughput sequencing is provided, comprising the steps of comparing metagenome sequencing results with a database of pathogenic species to obtain identification results of pathogenic species, and filtering the identification results of pathogenic species. In order to reduce or inhibit false positives in the species classification process, improvements are made in terms of assembly of pathogen databases, filtration of identification results and the like, and a specific flow is shown in figure 1. The specific improvements include the following aspects:

1. Public database species assembly clean-up

The inventors found that the reliability of pathogen identification by kmer methods depends first of all on database quality. On this basis, assembly sequences for viruses, bacteria, archaea, parasites and the like were collected from public databases (including but not limited to RefSeq and GenBank). Bacteria and archaea account for the vast majority of the database, so database quality processing is applied to bacteria and archaea. The workflow is shown in figure 2 and is as follows:

1. Assembly quality screening

Assemblies that meet any of the following conditions are removed, and the remaining assemblies are retained as high-quality candidate assemblies for further processing.

(1) Assembly completeness < 60%;

(2) Assembly contamination > 5%;

(3) Assembly completeness - 5 × assembly contamination < 50;

The assembly evaluation indexes are calculated by CheckM.
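The screening step can be sketched in Python as follows. This is only an illustration under two assumptions: that the three listed conditions describe assemblies to be discarded, and that the third condition is the quality score "completeness - 5 × contamination". The class `AssemblyQC`, the function `is_high_quality` and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AssemblyQC:
    accession: str        # hypothetical identifier
    completeness: float   # percent, e.g. as reported by CheckM
    contamination: float  # percent, e.g. as reported by CheckM


def is_high_quality(qc: AssemblyQC) -> bool:
    """Return True when none of the three discard conditions is met."""
    too_incomplete = qc.completeness < 60
    too_contaminated = qc.contamination > 5
    low_quality_score = qc.completeness - 5 * qc.contamination < 50
    return not (too_incomplete or too_contaminated or low_quality_score)


candidates = [
    AssemblyQC("asm_A", completeness=98.5, contamination=0.8),
    AssemblyQC("asm_B", completeness=55.0, contamination=1.0),  # fails (1)
    AssemblyQC("asm_C", completeness=72.0, contamination=4.8),  # fails (3): 72 - 24 = 48
]
kept = [qc.accession for qc in candidates if is_high_quality(qc)]
print(kept)   # ['asm_A']
```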

2. Intra-species similarity screening

Public databases store some erroneous species information, which leads to deviations in species classification and can even affect the sensitivity of species detection.

For this purpose, within each species, the assembly with the highest assembly quality, or the recognized type strain or representative assembly of the species, is taken as the core, and the average nucleic acid identity and alignment coverage of the other assemblies of the same species relative to the core assembly are calculated, as shown in fig. 3. In fig. 3, the central red dot represents the core assembly of a particular species, and the other dots (black and orange) represent the other assemblies of that species. Average nucleic acid identity (ANI) measures the similarity of two assemblies at the nucleic acid level (between 0 and 1; the closer to 1, the more similar). Because ANI is not symmetric, the ANI of two assemblies is calculated twice: the ANI of another assembly versus the core assembly is defined as negative, and the ANI of the core assembly versus the other assembly is defined as positive. The alignment fraction (AF) represents the degree of alignment coverage between the two assemblies. Similarly, for consistency, we define:

Where sgn() is the usual sign function. An assembly whose v_ANI (formula 1) falls within the interval [-5, 5] is considered similar to the core assembly of the species in terms of nucleic acid similarity. An assembly whose v_AF (formula 2) falls within the interval [-10, 10] is considered close to the core assembly and complete in terms of coverage. As shown in fig. 3, assemblies shown as orange dots are trusted intra-species assemblies and are retained; assemblies shown as black dots are suspect intra-species assemblies and are removed.

3. Assembly index anomaly filtering

Assemblies within a species should be similar in total assembly length and GC content.

For assembly length, GC content, N50, L50 and contig number, assemblies that are significantly abnormal compared with the other assemblies are removed. This step further reduces the risk that the assembled biological sample was contaminated or inaccurate.

4. Merging of highly similar species

Because traditional taxonomy and genome-based species taxonomy differ, and mNGS essentially infers taxonomy at the nucleic acid level, species whose core assemblies have v_ANI < 1 between them are merged to increase the sensitivity of species detection. If the nucleic acids of two species are highly similar, a single mNGS sequence is assigned to the lowest common ancestor node of the two species, which reduces detection sensitivity at the species level; the classic example is Escherichia coli and Shigella.

5. Removal of mobile elements

During species assembly, some submitters keep in the assembly sequences that can spread horizontally between closely related species, such as plasmids and bacterial viruses (phages). These confuse the downstream species classification of sequences. The relevant horizontally mobile elements are therefore removed according to plasmid databases, phage databases and sequence name information, which keeps the assemblies clean;

As shown in fig. 4, this mainly comprises two steps: the first step removes, by sequence ID, the complete mobile element sequences contained in the assemblies of species such as bacteria; the second step removes, by sequence similarity, mobile element sequence fragments that entered a species assembly through incorrect assembly or sequence integration.

6. Low complexity sequence masking

In biological sequence alignment, low-complexity sequences carry little information and can cause erroneous alignment results. Software such as dust (but not limited to dust) is used to mask low-complexity sequences, which avoids false-positive results during biological sequence alignment.

2. Filtering of sequence kmer classification information

After the clean database is obtained, a species kmer database is constructed with kmer-based classification software (such as, but not limited to, Kraken2), and the sequences of the mNGS sample are classified. Because of biological mutation, sequencing noise and the limitations of the different kmer classification methods, the classification of a single kmer has a certain probability of error. To reduce the false positives caused by such errors, the distribution of the kmers of a sequence over the species classification tree is further analyzed, and two scores are used to judge the classification reliability of an individual sequence: S_confidence (species-level confidence) and L_score (lineage score). The calculation method is shown in fig. 5, and the specific process is as follows:

S_confidence definition: for all kmers of a particular sequence (a read or an assembled contig), the ratio of the number of kmers classified to the current species (and its child nodes) to the total number of kmers in the sequence. The value ranges from 0 to 1; the more kmers of the sequence belong uniquely to the current species, the more likely the sequence is to come from the current species (as shown in equation 3).

Wherein count_k represents the number of kmers at the maximum sequence length assigned to the current species; the same applies below.

L_score definition: the ratio of the number of kmers of a particular sequence that are assigned to the current species node and its direct-lineage nodes (its direct ancestor nodes and direct descendant nodes) to the total number of kmers of the sequence. In fig. 5, the red circle represents "species 1", the blue shading marks its direct-lineage nodes, and the remaining nodes are non-lineage nodes. If the kmers of a particular sequence are concentrated on the direct-lineage nodes, this indicates that the classification process and the database construction are reliable. The value lies between 0 and 1, and the closer it is to 1, the more reliable the classification. If too many kmers are distributed to other, non-lineage branches of the classification tree (e.g., the non-lineage nodes in fig. 5), the classification of the current sequence is unreliable (as shown in equation 4).
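Written out from the two definitions above, the scores take the following form. This is a reconstruction from the prose, not the original typesetting of equations 3 and 4, and the argument labels inside count_k are notational assumptions.

```latex
S\_confidence = \frac{count\_k(\text{current species and its child nodes})}{count\_k(\text{all})}
\qquad
L\_score = \frac{count\_k(\text{current species, its direct ancestors and its direct descendants})}{count\_k(\text{all})}
```

Since every kmer counted in the numerator of S_confidence is also counted in the numerator of L_score, S_confidence ≤ L_score holds for any sequence under this reading.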

According to these definitions, sequences whose S_confidence and L_score are both close to 1 are taken as reliable sequences, and species with more than 2 reliable sequences are taken as the filtered pathogen species.

Specifically, a reliable and efficient filter is as follows: the sequences classified by the kmer method are grouped by species, and an identified species is regarded as reliable when the upper quartile of S_confidence over all its sequences exceeds a threshold of 80% and the maximum L_score of the species satisfies equation 5.
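The two scores can be computed for a single read with a few lines of Python. This is a minimal sketch under stated assumptions: the classification tree is given as a child-to-parent dictionary, each kmer of the read has already been assigned a taxon label by the classifier, and the names `ancestors_or_self`, `read_scores` and the toy taxa are hypothetical.

```python
def ancestors_or_self(taxon, parent):
    """Return the path from `taxon` up to the root, inclusive."""
    path = [taxon]
    while parent.get(taxon) is not None:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def read_scores(kmer_taxa, species, parent):
    """Return (S_confidence, L_score) for one read.

    kmer_taxa: one taxon label per kmer of the read.
    species:   the species the read was classified to.
    parent:    child -> parent map describing the classification tree.
    """
    total = len(kmer_taxa)
    if total == 0:
        return 0.0, 0.0
    above = set(ancestors_or_self(species, parent))  # species + direct ancestors
    in_clade = 0     # kmers on the species node or below it
    in_lineage = 0   # kmers on the species, its ancestors or its descendants
    for taxon in kmer_taxa:
        below = species in ancestors_or_self(taxon, parent)
        in_clade += below
        in_lineage += below or taxon in above
    return in_clade / total, in_lineage / total


# Toy tree (hypothetical taxa).
parent = {
    "root": None,
    "genus_A": "root", "species_1": "genus_A", "strain_1a": "species_1",
    "species_2": "genus_A",
    "genus_B": "root", "species_3": "genus_B",
}
# 16 kmers of one read: 12 on species_1, 1 on its strain, 2 on its genus, 1 off-lineage.
kmer_taxa = ["species_1"] * 12 + ["strain_1a"] + ["genus_A"] * 2 + ["species_3"]
s_conf, l_score = read_scores(kmer_taxa, "species_1", parent)
print(s_conf, l_score)   # 0.8125 0.9375
```

Here 13 of 16 kmers fall on the species or its strain, and 15 of 16 fall on the direct lineage; only the kmer assigned to `species_3` counts against the L_score.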
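A species-level filter along these lines might look as follows in Python. Several points are assumptions: the quartile is computed by linear interpolation between order statistics, `l_score_max_min` is a placeholder standing in for the bound of equation 5 (which is not available here), and the function names and read scores are hypothetical.

```python
def upper_quartile(values):
    """75th percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    position = 0.75 * (len(ordered) - 1)
    low = int(position)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (position - low)


def filter_species(read_scores_by_species, s_conf_uq_min=0.8, l_score_max_min=0.9):
    """Return {species: passed}.

    read_scores_by_species maps a species name to a list of
    (S_confidence, L_score) pairs, one pair per read assigned to it.
    """
    verdict = {}
    for species, scores in read_scores_by_species.items():
        s_uq = upper_quartile([s for s, _ in scores])
        l_max = max(l for _, l in scores)
        verdict[species] = s_uq > s_conf_uq_min and l_max >= l_score_max_min
    return verdict


# Hypothetical per-read scores for two species.
scores = {
    "species_X": [(1.0, 1.0), (0.9375, 1.0), (0.875, 0.9375), (0.5, 0.75)],
    "species_Y": [(0.6875, 0.8125), (0.5, 0.625), (0.5625, 0.75)],
}
verdict = filter_species(scores)
print(verdict)   # {'species_X': True, 'species_Y': False}
```

Using the upper quartile rather than the mean lets a species pass when most of its reads are well supported, even if a few reads score poorly.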

3. Building background models to exclude internal and external contaminant species

The species obtained by high-throughput sequencing may come from the clinical sample itself, from environmental contamination, from reagents and consumables, from background bacteria introduced during library construction and sequencing, and so on. Pathogen detection aims to detect the microorganisms in the clinical specimen, and microorganisms from other sources are regarded as false-positive detections. To identify microorganisms originating from the environment, consumables, instruments and the like, several negative control samples (NTCs) are set up and a specified quantity of reference sequences (DICs) is added to all samples. Background models are built from these samples and data at the bioinformatics level, and the statistical index z-score is used to identify potential contaminant species, which reduces false-positive detections.

There are two background models: a basic background model and a reference sequence background model. They are described as follows:

1. Basic background model:

The basic background model uses no reference fragments and is intended for general experimental workflows. First, the rPM (reads per million) of a specific species (sp) in the biological sample (biosample) is calculated as shown in formula (6); this offsets differences in read counts caused by differences in the amount of sequencing data. Next, the rPM of the corresponding species is calculated for three negative control samples (NTCs) of the current experiment, or for the negative control samples (NTCs) of the most recent three days. Assuming that the DNA amount of an experimental contaminant species is normally distributed across the NTC samples, the standard deviation (SD) is used to determine whether rPM(sp) of the current biological sample is significant under the NTC model. A Z-score threshold (formula 7) is set according to the characteristics of the species to define significance (for example, Z-score ≥ 1.96 for viruses and bacteria).
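Written out from the description above, formulas (6) and (7) plausibly take the following form. This is a reconstruction from the prose rather than the original typesetting; in particular, using the total read count of the sample as the rPM denominator is an assumption.

```latex
rPM(sp) = \frac{reads(sp)}{reads(\text{total})} \times 10^{6}
\qquad
Z(sp) = \frac{rPM_{\text{biosample}}(sp) - \mathrm{mean}\left(rPM_{\text{NTC}}(sp)\right)}{\mathrm{SD}\left(rPM_{\text{NTC}}(sp)\right)}
```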

Where mean () represents the average function.
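The basic background test can be sketched in Python as follows. The function names `rpm` and `background_z`, the use of the sample standard deviation, and the handling of identical control values are assumptions; the fallback value of 100 for a species absent from the controls follows the note under the background model table in the Examples.

```python
from statistics import mean, stdev

NO_BACKGROUND_Z = 100.0  # z-score used when the controls contain no reads of the species


def rpm(species_reads: int, total_reads: int) -> float:
    """Reads per million: species reads scaled to one million sequenced reads."""
    return species_reads * 1_000_000 / total_reads


def background_z(sample_rpm: float, ntc_rpms: list) -> float:
    """Z-score of the sample rPM against the rPM values of the negative controls."""
    if len(ntc_rpms) < 2:
        raise ValueError("at least two negative controls are needed for an SD")
    if all(value == 0 for value in ntc_rpms):
        return NO_BACKGROUND_Z
    spread = stdev(ntc_rpms)  # sample standard deviation (assumption)
    if spread == 0:
        # Degenerate case of identical non-zero controls, handled by assumption.
        return float("inf") if sample_rpm > ntc_rpms[0] else 0.0
    return (sample_rpm - mean(ntc_rpms)) / spread


# Hypothetical numbers: 24 species reads in a 20,000,000-read sample.
sample_rpm = rpm(24, 20_000_000)
z_absent = background_z(sample_rpm, [0.0, 0.0, 0.0])  # species absent from the controls
z_present = background_z(5.0, [1.0, 2.0, 3.0])        # control mean 2, SD 1
significant = z_present >= 1.96                       # threshold quoted for viruses and bacteria
print(sample_rpm, z_absent, z_present, significant)
```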

2. Reference sequence (DIC) background model:

The reference sequence is added for quantity normalization, so that exogenous contaminant species can be identified more accurately.

First, the number of reads of a specific species sp in a sample is normalized by the number of DIC reads, giving the normalized read count of sp, denoted reads_norm(sp), as shown in formula (8). The z-score is then calculated from the normalized reads and the background sample (NC) model, as in formula (9), with the same thresholds as above.

A species whose reads in the biological sample differ significantly from either of the two background models is considered a reliable species; otherwise it is treated as a suspected contaminant species.
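Read from the description above, formulas (8) and (9) can be written as follows. This is a reconstruction from the prose; taking the ratio of species reads to DIC reads as the normalization, and reusing the z-score form of the basic model, are assumptions.

```latex
reads\_norm(sp) = \frac{reads(sp)}{reads(\text{DIC})}
\qquad
Z_{\text{DIC}}(sp) = \frac{reads\_norm_{\text{biosample}}(sp) - \mathrm{mean}\left(reads\_norm_{\text{NC}}(sp)\right)}{\mathrm{SD}\left(reads\_norm_{\text{NC}}(sp)\right)}
```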
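The combined decision can be illustrated with a short Python sketch. The function name `is_reliable` is hypothetical, the threshold of 1.96 is the value the text quotes for viruses and bacteria, and the z-scores are copied from the background model table in the Examples.

```python
Z_THRESHOLD = 1.96  # threshold the text quotes for viruses and bacteria


def is_reliable(z_basic: float, z_dic: float, threshold: float = Z_THRESHOLD) -> bool:
    """A species is kept when it is significant under either background model."""
    return z_basic >= threshold or z_dic >= threshold


# (basic model Z, DIC model Z) per species, from the Examples.
z_scores = {
    "Mycobacterium tuberculosis": (100, 100),
    "Escherichia coli": (1.03, 1.25),
    "Klebsiella pneumoniae": (7.32, 6.91),
}
calls = {name: is_reliable(*z) for name, z in z_scores.items()}
print(calls)
```

With these values the calls match the Pass column of that table. Requiring significance under both models instead of either one would give the same calls for this sample.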

Examples

(1) Pretreatment of the mNGS data (high-throughput data quality control, removal of human-derived data, removal of low-complexity sequences) yields quality-controlled reads, human-derived reads, low-complexity reads and microbial reads. The statistics are as follows:

Sample	Sample1	NTC1	NTC2	NTC3
Sample_type	Bio	NTC	NTC	NTC
Raw_reads	20000000	1578463	1941510	1499540
Raw_bases	1000000000	78923150	97075500	74977000
Clean_reads	19999782	1565835	1925977	1487543
Clean_bases	979989318	77508832	95335861	73633378
Q20(%)	98.53	98.55	98.28	98.56
Q30(%)	91.48	91.28	90.66	91.14
Reads_median_length	50	50	50	50
GC(%)	42.7	43.08	42.88	42.9
Host_reads	19803696	1061636	1161364	1014504
Low_complexity_reads	2459	192	236	182
Micro_reads	193627	504007	764377	472857
Micro_rate (fraction of clean reads)	0.01	0.322	0.397	0.318

(2) Kmer-based species classification of sequences (Kraken2 as an example)

The number of species-unique reads was obtained using kmer-based sequence classification software (Kraken 2).

Sample	name	UniqReads
Sample1	Mycobacterium tuberculosis	24
Sample1	Leptotrichia wadei	3
Sample1	Escherichia coli	11
Sample1	Oribacterium sinus	1
Sample1	Klebsiella pneumoniae	21

(3) S_confidence and L_score computation and filtering

Reads with unreliable kmer distributions are filtered according to the S_confidence upper-quartile threshold and the L_score maximum threshold.

Sample	name	S_confidence(UQ)	Max(L_score)	Pass
Sample1	Mycobacterium tuberculosis	0.9375	1	True
Sample1	Leptotrichia wadei	0.6875	0.8125	False
Sample1	Escherichia coli	0.9375	1	True
Sample1	Oribacterium sinus	0.5625	0.5625	False
Sample1	Klebsiella pneumoniae	0.9375	1	True

(4) Background model filtering (basic background model, DIC background model)

Sample	name	Basic background model Z	DIC background model Z	Pass
Sample1	Mycobacterium tuberculosis	100	100	True
Sample1	Escherichia coli	1.03	1.25	False
Sample1	Klebsiella pneumoniae	7.32	6.91	True

* Note: when the species is not detected in the NC samples, the z-score is defined as 100.

(5) Final species information output

The reliably detected species information obtained after the false-positive filtering of steps (3) and (4) is as follows.

Sample	name	UniqReads
Sample1	Mycobacterium tuberculosis	24
Sample1	Klebsiella pneumoniae	21

Although the present disclosure is disclosed above, the scope of the present disclosure is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the disclosure, and these changes and modifications will fall within the scope of the disclosure.

Claims

1. A metagenome high throughput sequencing-based pathogen detection system, comprising:

The database module is used for constructing a pathogen species database;

The filtering module is used for filtering the identification result of the pathogenic species; the filtering method comprises the following steps: constructing a species kmer database based on the pathogenic species database, judging the reliability of the species based on the distribution of the metagenome sequence kmers on the corresponding species classification tree in the species kmer database, and filtering the identified pathogenic species; the reliability judging process comprises the following steps: for each sequence assigned to a species, calculating S_confidence, the ratio of the number of kmers of the sequence assigned to the current species to the total number of kmers of the sequence, and L_score, the ratio of the number of kmers of the sequence assigned to the current species node and its direct-lineage nodes to the total number of kmers of the sequence; taking the sequences whose S_confidence and L_score are close to 1 as reliable sequences; and taking the species with more than 2 reliable sequences as the pathogenic species retained after filtering.

2. The pathogen detection system of claim 1, wherein the reliability determination process is: taking as filtered pathogenic species the identified species for which the upper quartile of S_confidence over all sequences is more than 80%;

And/or, taking as filtered pathogenic species a species of the identified species whose maximum sequence L_score satisfies the formula:

count_k(all) is the number of kmers at the maximum sequence length assigned to the current species.

3. The pathogen detection system of claim 1, wherein the pathogen species database construction process includes: collecting and screening high-quality pathogen species assembly sequences; the high quality assembly sequence in the species is used as a core, and the assembly sequence with high similarity with the core assembly sequence is reserved.

4. The pathogen detection system of claim 3, wherein the intra-species similarity is determined from average nucleic acid identity and alignment coverage.

5. The pathogen detection system of claim 3, wherein the pathogen species assembly sequences are screened according to assembly completeness and contamination level.

6. The pathogen detection system of claim 3, wherein the process of constructing the pathogen species database further includes filtering for anomalies in assembly indicators including total assembly length, contig number, and GC content;

and/or, merging highly similar species in the assembly sequences;

and/or, removing mobile elements from the assembly sequences;

and/or masking low complexity sequences.

7. The pathogen detection system of claim 1, wherein the filtration module further includes a step of removing internal and/or external contamination of the pathogen species identification.

8. The pathogen detection system of claim 7, wherein the internal contamination removal process is to calculate the significance of the rPM of an identified species relative to the rPM of the negative control samples, and to treat an identified species with low significance as a contaminant species and remove it;

And/or, the external contamination removal process is to normalize the number of reads of an identified species by the number of artificially added reference sequences, calculate the significance of the normalized species read count relative to the negative control samples, and treat a species with low significance as a contaminant species and remove it.

9. The pathogen detection method based on metagenome high-throughput sequencing is characterized by comprising the steps of constructing a pathogen species database, and comparing a metagenome sequencing result with the database to obtain a pathogen species identification result;

10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the pathogen detection method of claim 9.