
CN115954048B - Screening method and device for CRISPR-Cas system - Google Patents

Screening method and device for CRISPR-Cas system

Info

Publication number
CN115954048B
Authority
CN
China
Prior art keywords
crispr
cas system
sequence
determining
conserved
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310004646.6A
Other languages
Chinese (zh)
Other versions
CN115954048A (en
Inventor
李文慧
王无可
蔡润泽
侯丽亚
黄嘉健
唐进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202310004646.6A
Publication of CN115954048A
Application granted
Publication of CN115954048B
Active legal status
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A50/00: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
    • Y02A50/30: Against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The specification discloses a screening method and a screening device for CRISPR-Cas systems. Relevant information about a CRISPR-Cas system and its corresponding labeling information are acquired, where the labeling information indicates whether the CRISPR-Cas system has self-processing capability. A conserved repeat sequence in the CRISPR-Cas system is then determined from the relevant information, and the gene features corresponding to the CRISPR-Cas system are determined from the conserved repeat sequence and used to train a prediction model. The trained prediction model can be used to screen out target CRISPR-Cas systems for developing gene editing tools. By automatically screening out a group of CRISPR-Cas systems that are more likely to have self-processing capability, the method can improve the efficiency of gene editing tool development.

Description

Screening method and device for CRISPR-Cas system
Technical Field
The specification relates to the biotechnology field, in particular to a screening method and device for a CRISPR-Cas system.
Background
The CRISPR-Cas system is an acquired immune system in microorganisms. With the development of metagenomic technology and the accumulation of massive microbial metagenome data, it has increasingly been used in research on gene editing technology.
Currently, some CRISPR-Cas systems, such as Cas12a and Cas13, have self-processing capability (the ability to process their own crRNAs). These systems have RNase activity themselves, require no additional RNase, require no tracrRNA sequence for sgRNA design, and can edit multiple targets simultaneously in gene editing applications, which gives them great advantages over the traditional CRISPR-Cas9.
Developing a gene editing tool from a CRISPR-Cas system with self-processing capability requires a series of experiments to verify the activity of the system, which entails a long experimental period and considerable manpower; determining whether the CRISPR-Cas system has self-processing capability is itself part of that experimental process. How to improve the efficiency of developing gene editing tools from CRISPR-Cas systems with self-processing capability and reduce labor cost is therefore an urgent problem.
Disclosure of Invention
The present disclosure provides a screening method and apparatus for CRISPR-Cas systems to partially solve the above-mentioned problems of the prior art.
The technical scheme adopted in the specification is as follows:
the present specification provides a screening method for a CRISPR-Cas system, comprising:
acquiring relevant information of a CRISPR-Cas system and labeling information corresponding to the CRISPR-Cas system, wherein the labeling information corresponding to the CRISPR-Cas system is used for indicating whether the CRISPR-Cas system has self-processing capability or not;
according to the related information, a conserved repeated sequence in the CRISPR-Cas system is determined, and according to the conserved repeated sequence, the gene characteristic corresponding to the CRISPR-Cas system is determined;
inputting gene features corresponding to a CRISPR-Cas system into a prediction model to obtain a prediction result, training the prediction model with the aim of minimizing deviation between the prediction result and labeling information corresponding to the CRISPR-Cas system, wherein the trained prediction model is used for screening out a target CRISPR-Cas system from the CRISPR-Cas system with unknown self-processing capacity, and the target CRISPR-Cas system is used for developing a gene editing tool.
Optionally, determining the gene characteristic corresponding to the CRISPR-Cas system according to the conserved repeated sequence specifically includes:
and determining a reverse complementary sequence corresponding to the conserved repeated sequence, and determining the gene characteristics corresponding to the CRISPR-Cas system according to the conserved repeated sequence and the reverse complementary sequence.
Optionally, determining the gene characteristic corresponding to the CRISPR-Cas system according to the conserved repeated sequence specifically includes:
determining the RNA secondary structure corresponding to the conserved repeat sequence;
and determining the gene characteristics corresponding to the CRISPR-Cas system according to the RNA secondary structure.
Optionally, determining a conserved repeat sequence in the CRISPR-Cas system according to the related information, specifically includes:
determining the palindromic sequence (CRISPR array) in the CRISPR-Cas system according to the related information;
performing initial recognition according to the CRISPR array to obtain an initial repeat sequence and initial spacer sequences;
performing sequence alignment on the initial spacer sequences to obtain the consistency of each base position in the initial spacer sequences;
determining a conserved repeat sequence in the CRISPR array according to the consistency and the initial repeat sequence.
Optionally, determining the gene characteristic corresponding to the CRISPR-Cas system according to the RNA secondary structure specifically includes:
determining structure-related information of the RNA secondary structure, wherein the structure-related information comprises at least one of: loop information, stem information, bubble information, and link information of a stem-loop structure of the RNA secondary structure;
and determining the gene characteristics corresponding to the CRISPR-Cas system according to the structure related information.
Optionally, determining the gene characteristic corresponding to the CRISPR-Cas system according to the conserved repeated sequence specifically includes:
determining sequence characteristics of the conserved repeat sequence, the sequence characteristics comprising: at least one of the GC content of the conserved repeat sequence and the kmer characteristic of the conserved repeat sequence;
and determining the gene characteristics corresponding to the CRISPR-Cas system according to the sequence characteristics.
Optionally, the method further comprises:
and training a plurality of prediction models according to the gene characteristics corresponding to the CRISPR-Cas system and the labeling information corresponding to the CRISPR-Cas system, wherein machine learning algorithms used by different prediction models are different.
Optionally, screening the target CRISPR-Cas system from CRISPR-Cas systems of unknown self-processing capacity by a trained predictive model, specifically comprising:
determining a training effect characterization value corresponding to each trained prediction model, wherein the training effect characterization value comprises: at least one of accuracy, precision, and recall;
and determining a target prediction model from the plurality of prediction models according to the training effect characterization value, and screening a target CRISPR-Cas system from the CRISPR-Cas systems with unknown self-processing capacity.
Provided herein is a screening device for a CRISPR-Cas system, comprising:
an acquisition module, used for acquiring relevant information of the CRISPR-Cas system and labeling information corresponding to the CRISPR-Cas system, wherein the labeling information corresponding to the CRISPR-Cas system is used for indicating whether the CRISPR-Cas system has self-processing capability or not;
the extraction module is used for determining a conserved repeated sequence in the CRISPR-Cas system according to the related information and determining a gene characteristic corresponding to the CRISPR-Cas system according to the conserved repeated sequence;
and the training module is used for inputting the gene characteristics corresponding to the CRISPR-Cas system into the prediction model to obtain a prediction result, training the prediction model with the aim of minimizing deviation between the prediction result and the labeling information corresponding to the CRISPR-Cas system, and screening a target CRISPR-Cas system from the CRISPR-Cas system with unknown self-processing capacity by the trained prediction model, wherein the target CRISPR-Cas system is used for developing a gene editing tool.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the above screening method for a CRISPR-Cas system.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, which when executed implements the above-described screening method for a CRISPR-Cas system.
At least one of the above technical schemes adopted in this specification can achieve the following beneficial effects:
according to the screening method for the CRISPR-Cas system, the related information of the CRISPR-Cas system and the label information corresponding to the CRISPR-Cas system can be obtained, and the label information corresponding to the CRISPR-Cas system is used for indicating whether the CRISPR-Cas system has self-processing capability or not; and then, a conserved repeated sequence in the CRISPR-Cas system can be determined according to the related information, and the gene characteristic corresponding to the CRISPR-Cas system is determined according to the conserved repeated sequence, so that the gene characteristic corresponding to the CRISPR-Cas system is input into a prediction model to obtain a prediction result, the prediction model is trained with the aim of minimizing deviation between the prediction result and labeling information corresponding to the CRISPR-Cas system, the trained prediction model can be used for screening out a target CRISPR-Cas system from the CRISPR-Cas system with unknown self-processing capacity, and the target CRISPR-Cas system is used for developing a gene editing tool.
From the above, it can be seen that, in the screening method for the CRISPR-Cas system provided in the present specification, the prediction model is trained on pre-labeled CRISPR-Cas systems, so that CRISPR-Cas systems that may have self-processing capability can be automatically screened out by the prediction model from CRISPR-Cas systems with unknown self-processing capability.
Drawings
The accompanying drawings described here are included to provide a further understanding of the specification and form a part of it. The exemplary embodiments of the specification and their descriptions serve to explain the specification and do not unduly limit it. In the drawings:
FIG. 1 is a schematic flow chart of a screening method for a CRISPR-Cas system provided in the present specification;
FIG. 2 is a schematic diagram of conserved repeat sequence correction provided in the present specification;
FIG. 3 is a schematic diagram of the numerical features of the stem-loop structure of an RNA secondary structure provided in the present specification;
FIG. 4 is a schematic structural diagram of a screening device for a CRISPR-Cas system provided in the present specification;
FIG. 5 is a schematic diagram of the electronic device corresponding to FIG. 1 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a screening method for a CRISPR-Cas system provided in the present specification, specifically including the following steps:
S100: Acquire relevant information of the CRISPR-Cas system and labeling information corresponding to the CRISPR-Cas system, wherein the labeling information corresponding to the CRISPR-Cas system is used for indicating whether the CRISPR-Cas system has self-processing capability or not.
S102: Determine a conserved repeat sequence in the CRISPR-Cas system according to the related information, and determine the gene features corresponding to the CRISPR-Cas system according to the conserved repeat sequence.
In practical applications, mining CRISPR-Cas systems from microorganisms and developing the corresponding gene editing technology often requires researchers to carry out lengthy experiments. To save time and labor cost, CRISPR-Cas systems that may have self-processing capability can be screened out in advance, and subsequent experiments can then be performed on them.
Thus, the server can obtain relevant information of CRISPR (clustered regularly interspaced short palindromic repeats)-Cas systems and the labeling information corresponding to each CRISPR-Cas system.
For each CRISPR-Cas system, the corresponding labeling information indicates whether that system has self-processing capability. A number of CRISPR-Cas systems can be mined in advance from a large amount of microbial metagenomic data and labeled in advance as having or lacking self-processing capability. The relevant information of a CRISPR-Cas system may describe the internal structure of the system, for example its specific gene sequence.
The server can then determine a conserved repeat sequence in the CRISPR-Cas system from the relevant information and determine the gene features corresponding to the CRISPR-Cas system from the conserved repeat sequence. The conserved repeat sequence referred to here is the repeat sequence in the CRISPR-Cas system: typically, the palindromic sequence (CRISPR array) of a CRISPR-Cas system consists of conserved repeat sequences and spacer sequences, so the repeat sequence can be determined from the gene sequence.
In the mechanism of action of the CRISPR-Cas system, the palindromic sequence (CRISPR array) must be transcribed into a non-coding RNA, known as pre-crRNA, which is cleaved by a nuclease into crRNAs (each comprising a repeat sequence) that can be used for gene editing.
Because the RNA corresponding to the repeat sequence is a main component of this non-coding RNA, is conserved, and has certain secondary structure characteristics, the prediction model in this specification can use the RNA secondary structure of the repeat sequence to predict whether the CRISPR-Cas system has self-processing capability. In addition, conserved repeat sequences can be identified with published basic algorithms, such as MinCED.
It should be noted that a CRISPR-Cas system may be transcribed in either the sense or the antisense direction. The reverse complement of the conserved repeat sequence can therefore be determined, and the gene features corresponding to the CRISPR-Cas system can be determined from both the conserved repeat sequence and its reverse complement. That is, when learning the gene features of a CRISPR-Cas system, the prediction model learns the features of both the conserved repeat sequence and the reverse complement sequence.
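As a minimal sketch of this step, the reverse complement of a repeat can be computed with a translation table. The function names and the example repeat below are illustrative assumptions, not part of the specification.

```python
# Complement table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def both_strands(repeat: str) -> tuple[str, str]:
    """Return the repeat and its reverse complement, so that features
    can be derived for both possible transcription directions."""
    return repeat, reverse_complement(repeat)


if __name__ == "__main__":
    # Hypothetical repeat fragment, used only for illustration.
    print(both_strands("GTTTCAGA"))
```

Downstream feature extraction can then be applied to each of the two returned sequences.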
It should also be noted that, although a CRISPR-Cas system contains multiple conserved repeat sequences and most of them are identical, the initial identification of the conserved repeat sequence may contain errors. The initially identified repeat sequence can therefore be corrected on the basis of consistency.
First, the palindromic sequence (CRISPR array) in the CRISPR-Cas system can be determined from the related information, and initial recognition can be performed on the CRISPR array to obtain an initial repeat sequence and initial spacer sequences. Sequence alignment is then performed on the initial spacer sequences to obtain the consistency of each base position in them. Finally, the conserved repeat sequence in the CRISPR array is determined from that consistency and the initial repeat sequence, as shown in FIG. 2.
FIG. 2 is a schematic representation of one of the conservative repeat corrections provided herein.
The upper half of FIG. 2 shows the initially identified repeat sequence and spacer sequences. The initially identified repeat sequence is missing a portion, which was instead assigned to the spacer sequences. Sequence alignment can therefore be performed on the initial spacer sequences to obtain the consistency of each base position, that is, whether the base at a given position is the same in every identified spacer sequence. The portion of the spacer sequences whose base positions are consistent can then be added to the initial repeat sequence to obtain the conserved repeat sequence.
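The correction can be sketched as follows, under simplifying assumptions that are not stated in the specification: spacers are compared without gaps, anchored at their left and right ends, and a position counts as consistent only if every spacer carries the same base there. Bases shared at the start of every spacer are treated as the missing 3' end of the repeat, and bases shared at the end of every spacer as its missing 5' end. All names and sequences are hypothetical.

```python
def shared_prefix(seqs: list[str]) -> str:
    """Longest run of leading positions where all sequences agree."""
    n = 0
    for column in zip(*seqs):  # zip stops at the shortest sequence
        if len(set(column)) != 1:
            break
        n += 1
    return seqs[0][:n]


def shared_suffix(seqs: list[str]) -> str:
    """Longest run of trailing positions where all sequences agree."""
    return shared_prefix([s[::-1] for s in seqs])[::-1]


def correct_repeat(initial_repeat: str, spacers: list[str]) -> tuple[str, list[str]]:
    """Move bases that are identical across all spacers into the repeat.

    Returns the corrected repeat and the trimmed spacers.
    """
    if len(spacers) < 2:
        # Consistency cannot be assessed with fewer than two spacers.
        return initial_repeat, list(spacers)
    prefix = shared_prefix(spacers)
    trimmed = [s[len(prefix):] for s in spacers]
    suffix = shared_suffix(trimmed)
    cores = [s[:len(s) - len(suffix)] for s in trimmed]
    return suffix + initial_repeat + prefix, cores


if __name__ == "__main__":
    # Hypothetical array in which the repeat lost "CC" to every spacer.
    print(correct_repeat("GTTTCAGA", ["CCAAGT", "CCTTGCA", "CCGGAT"]))
```

A production implementation would more likely tolerate occasional mismatches, for example by thresholding per-position agreement in a proper multiple alignment.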
It should also be noted that the sequence characteristics of the conserved repeated sequence may be determined, and the gene characteristics corresponding to the CRISPR-Cas system may be determined according to the sequence characteristics, where the sequence characteristics include, but are not limited to, at least one of GC content of the conserved repeated sequence and kmer characteristics of the conserved repeated sequence.
The GC content mentioned above refers to the content of G and C bases in the conserved repeat sequence. GC content is used as a sequence feature because it may be related to the species and to the stability of the secondary structure formed by the sequence. The kmer features represent the frequency of single bases or multi-base combinations in the conserved repeat sequence, similar to word segmentation features in natural language processing.
For example, every n bases in the conserved repeat sequence may be grouped into one base combination, where n is a preset positive integer that can be chosen freely, and the base combinations obtained from the conserved repeat sequence are used as the kmer features. As another example, the base combinations may be determined with a sliding window: if one base combination is set to contain 3 bases, the 1st to 3rd bases form the first base combination, the 2nd to 4th bases form the second, and so on until all base combinations are obtained, which yields the kmer features.
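The two sequence features can be sketched in a few lines. The function names, the default k of 3, and the fixed ACGT vocabulary for the feature vector are assumptions made for illustration; `step=1` gives the sliding-window variant and `step=k` the non-overlapping "every n bases" variant.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq: str, k: int = 3, step: int = 1) -> Counter:
    """Count k-mers; step=1 is a sliding window, step=k is non-overlapping."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, step))


def kmer_vector(seq: str, k: int = 3) -> list[float]:
    """Relative k-mer frequencies in a fixed order over the ACGT alphabet."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values()) or 1
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


if __name__ == "__main__":
    repeat = "GTTTCAGACC"  # hypothetical conserved repeat
    print(gc_content(repeat))
    print(kmer_counts(repeat, k=3))
```

Using a fixed k-mer order keeps the feature vector the same length for every repeat, which most machine learning algorithms require.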
Structure-related information of the RNA secondary structure corresponding to the conserved repeat sequence may also be determined, including but not limited to information on the loops, stems, bubbles, and links of the stem-loop structure of the RNA secondary structure. The gene features corresponding to the CRISPR-Cas system can then be determined from this structure-related information; that is, the gene features also need to include the overall structural features of the stem-loop structure in the RNA secondary structure. Numerical features such as the number, length, proportion, and ratio of loops, stems, bubbles, and links can be determined and added to the gene features. An example of the structure-related information is shown in FIG. 3.
FIG. 3 is a schematic diagram of the numerical features of the stem-loop structure of an RNA secondary structure provided in the present specification.
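One way to turn a stem-loop structure into numbers is sketched below. It assumes the secondary structure is already available in dot-bracket notation from an external RNA folding tool, which the specification does not name. The mapping of the terms loop, bubble, and link onto dot-bracket patterns is likewise an assumption: a loop is an unpaired run enclosed by "(" and ")", a bubble is an unpaired run between two brackets of the same direction, and a link is an unpaired run between ")" and "(".

```python
import re


def stem_loop_features(dot_bracket: str) -> dict:
    """Count simple stem-loop features in a dot-bracket structure string."""
    depth = 0
    for ch in dot_bracket:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure: ')' without '('")
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced structure: '(' without ')'")

    # Hairpin loops: unpaired runs directly enclosed by "(" and ")".
    loops = re.findall(r"(?<=\()\.+(?=\))", dot_bracket)
    # Bubbles: unpaired runs inside a stem, between same-direction brackets.
    bubbles = re.findall(r"(?<=\()\.+(?=\()|(?<=\))\.+(?=\))", dot_bracket)
    # Links: unpaired runs joining two neighbouring stems.
    links = re.findall(r"(?<=\))\.+(?=\()", dot_bracket)

    n_pairs = dot_bracket.count("(")
    length = len(dot_bracket)
    return {
        "n_pairs": n_pairs,
        "paired_fraction": 2 * n_pairs / length if length else 0.0,
        "loop_lengths": [len(m) for m in loops],
        "bubble_lengths": [len(m) for m in bubbles],
        "link_lengths": [len(m) for m in links],
    }


if __name__ == "__main__":
    # Hypothetical structure: a 6-pair stem with one bubble and a 4-base loop.
    print(stem_loop_features("((((.((....))))))"))
```

Counts, lengths, and ratios derived from this dictionary can be appended to the sequence features to form the final feature vector.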
S104: inputting gene features corresponding to a CRISPR-Cas system into a prediction model to obtain a prediction result, training the prediction model with the aim of minimizing deviation between the prediction result and labeling information corresponding to the CRISPR-Cas system, wherein the trained prediction model is used for screening out a target CRISPR-Cas system from the CRISPR-Cas system with unknown self-processing capacity, and the target CRISPR-Cas system is used for developing a gene editing tool.
After the gene features of a CRISPR-Cas system are determined, they can be input into the prediction model to obtain a prediction result, and the prediction model is trained with the aim of minimizing the deviation between the prediction result and the labeling information corresponding to the CRISPR-Cas system. The trained prediction model is used to screen out target CRISPR-Cas systems from CRISPR-Cas systems with unknown self-processing capability, and the target CRISPR-Cas systems can be used to develop gene editing tools. A target CRISPR-Cas system here refers to a CRISPR-Cas system with self-processing capability.
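The specification does not fix a particular learning algorithm. As one hedged illustration of "training with the aim of minimizing the deviation", the sketch below fits a logistic regression by batch gradient descent on the log loss, using only the standard library. The two-dimensional toy features, the learning rate, and the epoch count are invented for the example.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            # Deviation between predicted probability and the label.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5) -> int:
    """Return 1 if the system is predicted to have self-processing capability."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold)


if __name__ == "__main__":
    # Toy features (e.g. GC content, paired fraction); values are made up.
    X = [[0.2, 0.1], [0.3, 0.2], [0.7, 0.8], [0.8, 0.9]]
    y = [0, 0, 1, 1]
    w, b = train_logistic(X, y)
    print([predict(w, b, x) for x in X])
```

In practice an established machine learning library would normally replace this hand-written loop; the point here is only the shape of the training objective.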
The prediction result indicates whether the CRISPR-Cas system is predicted to have self-processing capability. In actual training, multiple prediction models may be trained, each using a different machine learning algorithm. Because the trained prediction models may perform differently, multiple prediction models can be combined to screen for CRISPR-Cas systems with self-processing capability.
Specifically, a training effect characterization value can be determined for each trained prediction model, the training effect characterization value including at least one of accuracy, precision, and recall. A target prediction model is then selected from the plurality of prediction models according to these values, and the target CRISPR-Cas system is screened out from CRISPR-Cas systems of unknown self-processing capability by the target prediction model.
Screening the target CRISPR-Cas system according to the training effect characterization values may mean selecting the prediction model with the best training effect characterization value and screening the target CRISPR-Cas system with that model. The selection may also be set according to actual requirements. For example, if the requirements are loose, so that CRISPR-Cas systems with a lower probability of having self-processing capability may also be screened out, the prediction models whose training effect characterization values exceed a certain threshold can be selected, and the target CRISPR-Cas system can be screened out by these prediction models jointly.
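The selection logic can be sketched as follows. The model names, the representation of each model as a list of 0/1 predictions on a labeled evaluation set, and the `min_votes` rule for combining several models are assumptions made for illustration; the specification only says that the selected models screen jointly.

```python
def evaluate(y_true, y_pred) -> dict:
    """Accuracy, precision and recall for binary labels (1 = self-processing)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def select_models(predictions, y_true, metric="accuracy", threshold=None):
    """Pick the single best model, or every model scoring at least threshold."""
    scores = {name: evaluate(y_true, pred)[metric]
              for name, pred in predictions.items()}
    if threshold is None:
        return [max(scores, key=scores.get)]
    return [name for name, score in scores.items() if score >= threshold]


def combined_screen(candidate_predictions, chosen, min_votes=1):
    """Indices of unlabeled systems flagged by at least min_votes chosen models."""
    n = len(candidate_predictions[chosen[0]])
    return [i for i in range(n)
            if sum(candidate_predictions[m][i] for m in chosen) >= min_votes]


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0]
    preds = {  # hypothetical model outputs on the labeled evaluation set
        "model_a": [1, 0, 1, 0, 0],
        "model_b": [1, 1, 1, 1, 0],
        "model_c": [0, 1, 0, 1, 0],
    }
    print(select_models(preds, y_true, metric="recall"))
    print(select_models(preds, y_true, metric="accuracy", threshold=0.8))
```

A low `min_votes` corresponds to the loose requirement described above, since a system is kept as soon as any selected model flags it; raising `min_votes` makes the screen stricter.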
From the above, it can be seen that the screening method for a CRISPR-Cas system provided in the present specification can automatically screen out CRISPR-Cas systems that may have self-processing capability from CRISPR-Cas systems with unknown self-processing capability, and can correct an erroneously identified initial repeat sequence to obtain the correct conserved repeat sequence. In addition, various structural features, sequence features, and the like related to the RNA transcribed from the repeat sequence can be determined, so that CRISPR-Cas systems with self-processing capability can be identified more accurately.
Fig. 4 is a schematic structural diagram of a screening device for CRISPR-Cas system provided in the present specification, comprising:
an obtaining module 401, configured to obtain relevant information of an acquired immune CRISPR-Cas system and label information corresponding to the CRISPR-Cas system, where the label information corresponding to the CRISPR-Cas system is used to indicate whether the CRISPR-Cas system has self-processing capability;
an extraction module 402, configured to determine a conserved repeated sequence in the CRISPR-Cas system according to the related information, and determine a gene feature corresponding to the CRISPR-Cas system according to the conserved repeated sequence;
the training module 403 is configured to input the gene features corresponding to the CRISPR-Cas system into a prediction model to obtain a prediction result, and train the prediction model with a goal of minimizing a deviation between the prediction result and the labeling information corresponding to the CRISPR-Cas system, where the trained prediction model is used to screen out a target CRISPR-Cas system from CRISPR-Cas systems with unknown self-processing capability, and the target CRISPR-Cas system is used for developing a gene editing tool.
Optionally, the extracting module 402 is specifically configured to determine a reverse complement sequence corresponding to the conserved repeated sequence, and determine a gene feature corresponding to the CRISPR-Cas system according to the conserved repeated sequence and the reverse complement sequence.
Optionally, the extracting module 402 is specifically configured to determine an RNA secondary structure corresponding to the conserved repeated sequence; and determining the gene characteristics corresponding to the CRISPR-Cas system according to the RNA secondary structure.
Optionally, the extracting module 402 is specifically configured to determine a palindromic sequence CRISPR array in the CRISPR-Cas system according to the related information; performing initial recognition according to the CRISPR array to obtain an initial repeat sequence and a spacer sequence; sequence alignment is carried out on the initial spacer sequence so as to obtain the consistency of each base position in the initial spacer sequence; from the identity and the initial repeat sequence, a conserved repeat sequence in the CRISPR array is determined.
Optionally, the extracting module 402 is specifically configured to determine structure-related information of the RNA secondary structure, where the structure-related information includes at least one of loop information, stem information, bubble information, and link information of the stem-loop structure of the RNA secondary structure; and determine the gene features corresponding to the CRISPR-Cas system according to the structure-related information.
Optionally, the extracting module 402 is specifically configured to determine sequence features of the conserved repeated sequence, where the sequence features include: at least one of the GC content of the conserved repeat sequence and the kmer characteristic of the conserved repeat sequence; and determining the gene characteristics corresponding to the CRISPR-Cas system according to the sequence characteristics.
Optionally, the training module 403 is specifically configured to train a plurality of prediction models according to the gene features corresponding to each CRISPR-Cas system and the labeling information corresponding to each CRISPR-Cas system, where machine learning algorithms used by different prediction models are different.
Optionally, the training module 403 is specifically configured to determine a training effect representation value corresponding to each trained prediction model, where the training effect representation value includes: at least one of accuracy, precision, and recall; and determining a target prediction model from the plurality of prediction models according to the training effect characterization value so as to screen the target CRISPR-Cas system from the CRISPR-Cas system with unknown self-processing capacity through the target prediction model.
The present specification also provides a computer readable storage medium storing a computer program operable to perform the above screening method for a CRISPR-Cas system.
The present specification also provides a schematic structural diagram of the electronic device, shown in FIG. 5. At the hardware level, as illustrated in FIG. 5, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it to implement the above screening method for the CRISPR-Cas system.
Of course, this specification does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the processing flow is not limited to logic units, but may also be hardware or logic devices.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually fabricating integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely performing a little logic programming on the method flow and programming it into an integrated circuit using one of the hardware description languages described above.
The controller may be implemented in any suitable manner, for example, the controller may take the form of, for example, a microprocessor or processor and a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), programmable logic controllers, and embedded microcontrollers, examples of which include, but are not limited to, the following microcontrollers: ARC 625D, atmel AT91SAM, microchip PIC18F26K20, and Silicone Labs C8051F320, the memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller in a pure computer readable program code, it is well possible to implement the same functionality by logically programming the method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Such a controller may thus be regarded as a kind of hardware component, and means for performing various functions included therein may also be regarded as structures within the hardware component. Or even means for achieving the various functions may be regarded as either software modules implementing the methods or structures within hardware components.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units. Of course, when this specification is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or nonvolatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (11)

1. A screening method for a CRISPR-Cas system, comprising:
acquiring relevant information of a CRISPR-Cas system and labeling information corresponding to the CRISPR-Cas system, wherein the labeling information corresponding to the CRISPR-Cas system indicates whether the CRISPR-Cas system has self-processing capability;
determining, according to the relevant information, a conserved repeat sequence in the CRISPR-Cas system, and determining, according to the conserved repeat sequence, gene features corresponding to the CRISPR-Cas system;
inputting the gene features corresponding to the CRISPR-Cas system into a prediction model to obtain a prediction result, and training the prediction model with the aim of minimizing the deviation between the prediction result and the labeling information corresponding to the CRISPR-Cas system, wherein the trained prediction model is used for screening a target CRISPR-Cas system from CRISPR-Cas systems with unknown self-processing capability, and the target CRISPR-Cas system is used for developing a gene editing tool.
2. The method of claim 1, wherein determining the gene features corresponding to the CRISPR-Cas system according to the conserved repeat sequence specifically comprises:
determining a reverse complementary sequence corresponding to the conserved repeat sequence, and determining the gene features corresponding to the CRISPR-Cas system according to the conserved repeat sequence and the reverse complementary sequence.
3. The method of claim 1, wherein determining the gene features corresponding to the CRISPR-Cas system according to the conserved repeat sequence specifically comprises:
determining the RNA secondary structure corresponding to the conserved repeat sequence;
and determining the gene features corresponding to the CRISPR-Cas system according to the RNA secondary structure.
4. The method of claim 1, wherein determining the conserved repeat sequence in the CRISPR-Cas system according to the relevant information specifically comprises:
determining a palindromic sequence CRISPR array in the CRISPR-Cas system according to the relevant information;
performing initial recognition according to the CRISPR array to obtain an initial repeat sequence and an initial spacer sequence;
performing sequence alignment on the initial spacer sequence to obtain the consistency of each base position in the initial spacer sequence;
and determining a conserved repeat sequence in the CRISPR array according to the consistency and the initial repeat sequence.
5. The method of claim 3, wherein determining the gene features corresponding to the CRISPR-Cas system according to the RNA secondary structure specifically comprises:
determining structure-related information of the RNA secondary structure, wherein the structure-related information comprises at least one of loop information, stem information, bulge information, and linked information in the stem-loop structure of the RNA secondary structure;
and determining the gene features corresponding to the CRISPR-Cas system according to the structure-related information.
6. The method of claim 3, wherein determining the gene features corresponding to the CRISPR-Cas system according to the conserved repeat sequence specifically comprises:
determining sequence characteristics of the conserved repeat sequence, the sequence characteristics comprising at least one of the GC content of the conserved repeat sequence and the k-mer characteristics of the conserved repeat sequence;
and determining the gene features corresponding to the CRISPR-Cas system according to the sequence characteristics.
7. The method of claim 1, wherein the method further comprises:
and training a plurality of prediction models according to the gene characteristics corresponding to the CRISPR-Cas system and the labeling information corresponding to the CRISPR-Cas system, wherein machine learning algorithms used by different prediction models are different.
8. The method of claim 7, wherein screening the target CRISPR-Cas system from CRISPR-Cas systems with unknown self-processing capability by the trained prediction model specifically comprises:
determining a training effect characterization value corresponding to each trained prediction model, wherein the training effect characterization value comprises: at least one of accuracy, precision, and recall;
and determining a target prediction model from the plurality of prediction models according to the training effect characterization value so as to screen the target CRISPR-Cas system from the CRISPR-Cas system with unknown self-processing capacity through the target prediction model.
9. A screening apparatus for a CRISPR-Cas system, comprising:
an acquisition module, used for acquiring relevant information of a CRISPR-Cas system and labeling information corresponding to the CRISPR-Cas system, wherein the labeling information corresponding to the CRISPR-Cas system indicates whether the CRISPR-Cas system has self-processing capability;
an extraction module, used for determining a conserved repeat sequence in the CRISPR-Cas system according to the relevant information and determining gene features corresponding to the CRISPR-Cas system according to the conserved repeat sequence;
and a training module, used for inputting the gene features corresponding to the CRISPR-Cas system into a prediction model to obtain a prediction result, and training the prediction model with the aim of minimizing the deviation between the prediction result and the labeling information corresponding to the CRISPR-Cas system, wherein the trained prediction model is used for screening a target CRISPR-Cas system from CRISPR-Cas systems with unknown self-processing capability, and the target CRISPR-Cas system is used for developing a gene editing tool.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-8.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of the preceding claims 1-8 when executing the program.
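The sequence-level features named in claims 2 and 6 (the reverse complementary sequence, GC content, and k-mer characteristics of a conserved repeat sequence) can be illustrated with a minimal sketch. The function names, the choice of k = 2, the use of normalized k-mer frequencies, and the layout of the feature vector are assumptions made for illustration only; the claims do not prescribe any of them, and the repeat is assumed to be a DNA string over A, C, G, T.

```python
from collections import Counter
from itertools import product

# DNA complement table (assumption: input contains only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complementary sequence of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq, k=2):
    """Normalized frequency of every possible k-mer, in a fixed order."""
    seq = seq.upper()
    total = max(len(seq) - k + 1, 0)
    counts = Counter(seq[i:i + k] for i in range(total))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


def sequence_features(repeat, k=2):
    """Hypothetical feature vector: GC content, then k-mer frequencies of
    the repeat, then k-mer frequencies of its reverse complement."""
    rc = reverse_complement(repeat)
    features = [gc_content(repeat)]
    features += list(kmer_frequencies(repeat, k).values())
    features += list(kmer_frequencies(rc, k).values())
    return features
```

A vector built this way has a fixed length of 1 + 2 * 4**k entries for any repeat, so it could be passed to any of the prediction models of claim 7 without further padding.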
CN202310004646.6A 2023-01-03 2023-01-03 Screening method and device for CRISPR-Cas system Active CN115954048B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310004646.6A CN115954048B (en) 2023-01-03 2023-01-03 Screening method and device for CRISPR-Cas system

Publications (2)

Publication Number Publication Date
CN115954048A CN115954048A (en) 2023-04-11
CN115954048B CN115954048B (en) 2023-06-16

Family

ID=85907632

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310004646.6A Active CN115954048B (en) 2023-01-03 2023-01-03 Screening method and device for CRISPR-Cas system

Country Status (1)

Country Link
CN (1) CN115954048B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117252306B (en) * 2023-10-11 2024-02-27 Minzu University of China Gene editing capability index calculation method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018035250A1 (en) * 2016-08-17 2018-02-22 The Broad Institute, Inc. Methods for identifying class 2 crispr-cas systems
US11810649B2 (en) * 2016-08-17 2023-11-07 The Broad Institute, Inc. Methods for identifying novel gene editing elements

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784200A (en) * 2016-08-26 2018-03-09 BGI Shenzhen (深圳华大基因研究院) A method and apparatus for screening novel CRISPR-Cas systems
CA3142230A1 (en) * 2019-05-31 2020-12-03 The Governing Council Of The University Of Toronto Methods and compositions for multiplex gene editing
CN111261223A (en) * 2020-01-12 2020-06-09 Hunan University CRISPR off-target effect prediction method based on deep learning
WO2022013186A1 (en) * 2020-07-13 2022-01-20 Helmholtz-Zentrum für Infektionsforschung GmbH Method for prediction of the guide efficiency when targeting a gene of interest
WO2022199511A1 (en) * 2021-03-26 2022-09-29 Zhuhai Shutong Medical Technology Co., Ltd. (珠海舒桐医疗科技有限公司) Lt1cas13d protein and gene editing system
WO2022261115A1 (en) * 2021-06-07 2022-12-15 Yale University Peptide nucleic acids for spatiotemporal control of crispr-cas binding

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Classification and research status of CRISPR/Cas systems; Gao Weisong et al.; Current Biotechnology (生物技术进展); Vol. 12 (No. 04); pp. 532-538 *
CRISPRDetect: A flexible algorithm to define CRISPR arrays; Ambarish Biswas et al.; BMC Genomics; Vol. 17 (No. 356); pp. 1-14 *
Engineered Cas12i2 is a versatile high-efficiency platform for therapeutic genome editing; Colin McGaw et al.; Nature Communications; pp. 1-11 *
Enhancement of trans-cleavage activity of Cas12a with engineered crRNA enables amplified nucleic acid detection; Long T. Nguyen et al.; Nature Communications; pp. 1-13 *
Prediction of activity and specificity of CRISPR-Cpf1 using convolutional deep learning neural networks; Jiesi Luo et al.; BMC Bioinformatics; Vol. 20 (No. 332); pp. 1-10 *

Also Published As

Publication number Publication date
CN115954048A (en) 2023-04-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant