
WO2024214217A1 - Genetic analysis device and genetic analysis method - Google Patents

Genetic analysis device and genetic analysis method

Info

Publication number
WO2024214217A1
Authority
WO
WIPO (PCT)
Prior art keywords
fluorescence intensity
signal
section
intensity data
feature
Prior art date
Application number
PCT/JP2023/014893
Other languages
French (fr)
Japanese (ja)
Inventor
徹 横山
功 原浦
尚哉 室岡
基博 山崎
周志 隅田
Original Assignee
株式会社日立ハイテク
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立ハイテク
Priority to PCT/JP2023/014893
Publication of WO2024214217A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/26 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating electrochemical variables; by using electrolysis or electrophoresis
    • G01N27/416 Systems
    • G01N27/447 Systems using electrophoresis

Definitions

  • the present invention relates to a genetic analysis device and a genetic analysis method.
  • the base sequence of a nucleic acid is determined by including the following steps (A) to (C) in that order: (A) a base peak extraction step of extracting base peaks from electrophoretic data including peaks of four types of base types obtained by electrophoretic separation of a sample nucleic acid; (B) a condition setting step of setting a search start base peak and a peak interval reference value for starting a search in time series data composed of the extracted base peaks; (C) starting from the search start base peak in the time series data, sequentially scanning between adjacent base peaks in the forward and backward directions of the time series, comparing the interval between base peaks with the peak interval reference value and adding an interpolated peak to a peak missing section, thereby determining the base sequence.
  • the present invention aims to detect signal sections (signal regions) with high accuracy from time-series data showing the results of electrophoresis.
  • a representative genetic analysis device of the present invention comprises an acquisition unit that acquires time series data indicating the results of electrophoresis of a sample, and an analysis unit that analyzes the base sequence of the sample from the time series data, wherein the time series data includes a plurality of fluorescence intensity data corresponding to a plurality of bases, and the analysis unit divides the time series data into a plurality of intervals, generates for each of the plurality of fluorescence intensity data a feature amount indicating the frequency of occurrence of at least one of a maximum portion, a minimum portion, and a flat portion of the fluorescence intensity data in each interval, determines an interval feature amount from the plurality of feature amounts generated for the plurality of fluorescence intensity data based on a magnitude relationship between the feature amounts, and uses the interval feature amount to detect a signal region in the time series data that is a region to be analyzed for the base sequence.
  • one representative genetic analysis method of the present invention is characterized by comprising the steps of: acquiring time series data indicating the result of electrophoresis of the sample, the time series data including a plurality of fluorescence intensity data corresponding to a plurality of bases; dividing the time series data into a plurality of intervals; generating, for each of the plurality of fluorescence intensity data, a feature of the fluorescence intensity data in each interval based on the frequency of appearance of at least one of a maximum portion, a minimum portion, and a flat portion of the fluorescence intensity data in each interval; determining an interval feature from the plurality of features generated for the plurality of fluorescence intensity data based on a magnitude relationship between the features; and detecting a signal region, which is a region to be analyzed of the base sequence in the time series data, using the interval feature.
  • Example of the configuration of the gene analysis device according to the first embodiment
  • Configuration example of the electrophoresis apparatus according to the first embodiment
  • Flowchart outlining a process executed by the gene analysis device according to the first embodiment
  • Flow of electrophoresis processing of real samples
  • Base calling flow
  • Signal section detection flow
  • Diagram explaining the characteristics of non-signal sections
  • Diagram of determining non-signal features of a section
  • Diagram of signal boundary determination (part 1)
  • Diagram of signal boundary determination (part 2)
  • FIG. 1 is a diagram showing an example of the configuration of a gene analysis device 101 according to a first embodiment.
  • the genetic analysis device 101 includes an electrophoresis device 105 and a data analysis device 112.
  • the electrophoresis device 105 and the data analysis device 112 are communicatively connected using a communication cable.
  • the data analysis device 112 includes a central control unit 102 , a storage unit 104 , and a user interface unit 103 .
  • the central control unit 102 executes control and data processing of the electrophoresis device 105.
  • the central control unit 102 is, for example, a central processing unit (CPU) and a graphics processing unit (GPU).
  • the storage unit 104 stores programs executed by the central control unit 102, setting information for the electrophoresis device 105, information used for various processes, etc.
  • the storage unit 104 is, for example, a memory.
  • the user interface unit 103 is an interface for connecting to an input device and an output device, or an interface for connecting to an external device via a network.
  • the data analysis device 112 presents information to a user via the user interface unit 103, and also accepts information input by the user.
  • the central control unit 102 operates as a sample information setting unit 106, an electrophoresis device control unit 108, a fluorescence intensity calculation unit 110, and a base calling unit 107 by executing the programs stored in the storage unit 104.
  • the sample information setting unit 106 is a setting unit for setting information related to a sample.
  • the electrophoresis device control unit 108 is a control unit that controls the electrophoresis of the sample performed by the electrophoresis device 105.
  • the fluorescence intensity calculation unit 110 is an acquisition unit that acquires time series data indicating the results of electrophoresis from the electrophoresis device 105.
  • the time series data includes a plurality of fluorescence intensity data corresponding to a plurality of bases.
  • the base calling unit 107 is an analysis unit that analyzes the base sequence of a sample from time-series data.
  • the base calling unit 107 includes an analysis interval detection unit 109.
  • the analysis interval detection unit 109 divides the time series data into a plurality of intervals, and generates a non-signal feature indicating a non-signal for each fluorescence intensity data based on the frequency of occurrence of maximum, minimum, and flat points in the fluorescence intensity data in each interval.
  • the non-signal feature is set so that its value is larger the more likely the interval is to be a non-signal interval.
  • a value that is larger as the occurrence frequency is smaller, such as the inverse of the occurrence frequency or a value obtained by subtracting the occurrence frequency from a fixed value, may be generated as a signal feature.
  • the signal feature is set to a value that is larger as the signal is more likely.
  • the analysis interval detection unit 109 determines the minimum value of the multiple non-signal features generated for multiple fluorescence intensity data as the non-signal feature for that interval. If a signal feature is used, the maximum signal feature is determined as the signal feature for that interval. Then, the signal interval of the time series data is detected using the signal feature for the interval.
  • a signal interval (signal region) is an interval in the time series data that includes a change in fluorescence intensity due to the presence of bases.
  • a non-signal section is a section of the time series data that does not contain any changes in fluorescence intensity due to the presence of bases.
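The two feature conventions described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names, the `fixed_value` parameter, and the handling of a zero frequency are assumptions. The non-signal feature is taken to be the occurrence frequency itself, and the signal feature is derived from it either as its inverse or as a fixed value minus the frequency.

```python
def signal_feature(freq, fixed_value=None):
    """Convert an occurrence frequency into a signal feature (larger = more signal-like).

    Hypothetical helper: uses fixed_value - freq when fixed_value is given,
    otherwise the inverse of the frequency.
    """
    if fixed_value is not None:
        return fixed_value - freq
    # Assumption: a frequency of zero maps to +infinity.
    return 1.0 / freq if freq > 0 else float("inf")


def interval_non_signal_feature(per_dye_freqs):
    """Non-signal feature of the interval: the minimum over the dyes."""
    return min(per_dye_freqs)


def interval_signal_feature(per_dye_freqs, fixed_value=None):
    """Signal feature of the interval: the maximum over the dyes."""
    return max(signal_feature(f, fixed_value) for f in per_dye_freqs)


# Occurrence frequencies for four dyes in one interval (made-up numbers).
freqs = [12, 3, 10, 11]
print(interval_non_signal_feature(freqs))               # smallest frequency
print(interval_signal_feature(freqs, fixed_value=20))   # largest signal feature
```

Because both conversions decrease as the frequency increases, the dye that yields the minimum non-signal feature is also the dye that yields the maximum signal feature, so the two conventions select the same dye for the interval.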
  • the electrophoresis device 105 electrophoreses the sample (DNA fragments) and obtains electrophoresis data.
  • the electrophoresis data is time-series data of the brightness values of DNA fragments labeled with fluorescent dyes.
  • Figure 2 is a diagram showing an example of the configuration of the electrophoresis device 105 of Example 1.
  • the electrophoresis device 105 has a detection unit 216, a thermostatic chamber 218, a transport machine 225, a high-voltage power supply 204, a first ammeter 205, an anode electrode 211, a second ammeter 212, a capillary array 217, and a pump mechanism 203.
  • the capillary array 217 is a replacement component that includes multiple (e.g., eight) capillaries 202, and includes a load header 229, a detection unit 216, and a capillary head 233. In addition, if a capillary 202 is damaged or its quality deteriorates, it can be replaced with a new capillary array 217.
  • the capillary 202 is made of a glass tube with an inner diameter of several tens to several hundred microns and an outer diameter of several hundred microns, and its surface is coated with polyimide to improve its strength.
  • the light irradiation section where the laser light is irradiated has a structure where the polyimide coating has been removed so that the internal light emission can easily leak to the outside.
  • the inside of the capillary 202 is filled with a separation medium that creates a difference in migration speed during electrophoresis. Separation media come in both fluid and non-fluid types, but in Example 1, a fluid polymer is used.
  • the high-voltage power supply 204 applies a high voltage to the capillary 202.
  • the first ammeter 205 detects the current emitted from the high-voltage power supply 204.
  • the second ammeter 212 detects the current flowing through the anode electrode 211.
  • the optical detection unit that detects the information light obtained from the sample is composed of a light source 214 that irradiates the detection unit 216 with excitation light, an optical detector 215 for detecting the light emitted within the detection unit 216, and a diffraction grating 232.
  • the detection unit 216 is a component that acquires information that depends on the sample.
  • When detecting a sample in the capillary 202 that has been separated by electrophoresis, the detection unit 216 is irradiated with excitation light from the light source 214, generating fluorescence having a wavelength that depends on the sample as information light. Furthermore, the diffraction grating 232 separates the information light in the wavelength direction, and the optical detector 215 detects the separated information light to analyze the sample.
  • the capillary cathode ends 227 are each fixed through a metallic hollow electrode 226, with the tip of the capillary 202 protruding from the hollow electrode 226 by approximately 0.5 mm.
  • the hollow electrodes 226 provided on each capillary 202 are all attached together to the load header 229. Furthermore, all hollow electrodes 226 are electrically connected to the high-voltage power supply 204 mounted on the main body of the device, and function as cathode electrodes when voltage application is required for electrophoresis, sample introduction, etc.
  • the capillary end opposite the capillary cathode end 227 (the other end) is bound together by the capillary head 233.
  • the capillary head 233 can be connected to the block 207 in a pressure-tight manner.
  • a high voltage is applied between the load header 229 and the capillary head 233 from the high-voltage power supply 204.
  • new polymer is filled into the capillary 202 from the other end by the syringe 206.
  • the polymer in the capillary 202 is refilled for each measurement to improve the measurement performance.
  • the pump mechanism 203 is composed of a syringe 206 and a mechanism for pressurizing the syringe 206, and injects the polymer into the capillary 202.
  • Block 207 is a connection part for connecting the syringe 206, the capillary array 217, the anode buffer container 210, and the polymer container 209.
  • the thermostatic chamber 218 is covered with a heat insulating material to keep the capillaries 202 in the thermostatic chamber 218 at a constant temperature, and the temperature is controlled by a heating and cooling mechanism 220.
  • a fan 219 circulates and stirs the air in the thermostatic chamber 218, keeping the temperature of the capillary array 217 spatially uniform and constant.
  • the transporter 225 transports various containers to the capillary cathode end 227.
  • the transporter 225 is equipped with three electric motors and linear actuators, and can move in three axial directions: up and down, left and right, and depth. At least one container can be placed on the moving stage 230 of the transporter 225. Furthermore, the moving stage 230 is equipped with an electric grip 231, which can grasp and release each container. Therefore, the buffer container 221, the washing container 222, the waste liquid container 223, and the sample plate 224 can be transported to the capillary cathode end 227 as necessary. Unnecessary containers are stored in a designated storage location within the electrophoresis device 105.
  • the user can use the data analysis device 112 to control various functions of the electrophoresis device 105 and obtain the electrophoresis data detected by the optical detection unit.
  • the electrophoresis device 105 may have sensors for acquiring information about the observation environment that affects electrophoresis (observation environment information).
  • the electrophoresis device 105 in FIG. 2 has an in-device sensor 240, a polymer sensor 241, and a buffer solution sensor 242.
  • the in-device sensor 240 is a sensor for acquiring information about the internal environment of the electrophoresis device 105, and is, for example, a temperature sensor, a humidity sensor, or an air pressure sensor within the electrophoresis device 105.
  • the polymer sensor 241 is a sensor for acquiring information about the quality of the polymer, such as a pH sensor and an electrical conductivity sensor.
  • the polymer sensor 241 is installed inside the polymer container 209, but the installation location is not limited to this.
  • the buffer solution sensor 242 is a sensor for obtaining information regarding the quality of the buffer solution, and may be, for example, a temperature sensor.
  • the buffer solution sensor 242 is installed in the anode buffer container 210, but the installation location is not limited to this.
  • the buffer solution sensor 242 may be installed in the buffer container 221.
  • FIG. 3 is a flowchart outlining the processing executed by the genetic analysis device 101 of the first embodiment.
  • the electrophoresis device 105 of the genetic analysis device 101 performs electrophoresis processing on the sample to be analyzed (step S301). Details of the electrophoresis processing will be explained using FIG. 4.
  • the data analysis device 112 of the genetic analysis device 101 performs spectrum correction to correct the wavelength characteristics of the device (step S302), and executes a fluorescence intensity calculation process using the electrophoresis data (step S303).
  • the fluorescence intensity calculation unit 110 calculates time series data of the fluorescence intensity of the fluorescent dye from the electrophoresis data, and detects the center position, height, width, etc. of the peak from the time series data of the fluorescence intensity.
  • the data analysis device 112 of the genetic analysis device 101 executes a mobility correction process on the time series data of the fluorescence intensity (step S304).
  • the data analysis device 112 of the genetic analysis device 101 executes base calling using the time series data of the fluorescence intensity corrected based on the result of the mobility correction process (step S305).
  • the base calling unit 107 identifies the base sequence of the sample using the time series data of the corrected fluorescence intensity.
  • FIG. 4 shows the flow of electrophoresis processing of an actual sample in S301.
  • the basic steps of electrophoresis can be broadly divided into sample preparation (S401), analysis start event (S402), loading of migration medium (S403), preliminary migration (S404), sample introduction (S405), migration analysis (S406), and end of migration analysis (S407).
  • the operator of this device sets the samples and reagents in this device as sample preparation (S401) before starting the analysis. More specifically, first, the buffer container 221 and the anode buffer container 210 are filled with a buffer solution that forms part of the current path.
  • the buffer solution is, for example, an electrolyte solution commercially available from various companies for electrophoresis.
  • the sample to be analyzed is dispensed into the wells of the sample plate 224.
  • the sample is, for example, a PCR product of DNA.
  • a cleaning solution for cleaning the capillary cathode end 227 is dispensed into the cleaning container 222.
  • the cleaning solution is, for example, pure water.
  • a migration medium for electrophoresis of the sample is injected into the syringe 206.
  • the migration medium is, for example, a polyacrylamide separation gel or polymer commercially available from various companies for electrophoresis.
  • the capillary array 217 is replaced if degradation of the capillary 202 is expected or if the length of the capillary 202 is to be changed.
  • the samples set on the sample plate 224 at this time include the actual DNA sample to be analyzed, as well as a positive control, a negative control, and an allelic ladder, each of which is electrophoresed in a different capillary.
  • the positive control is, for example, a PCR product containing known DNA, and is a sample used in a control experiment to confirm that DNA has been correctly amplified by PCR.
  • the negative control is a PCR product that does not contain DNA, and is a sample used in a control experiment to confirm that the PCR amplified product has not been contaminated by the operator's DNA, dust, etc.
  • allelic ladder is an artificial sample that contains many alleles that may commonly be contained in a DNA marker, and is usually provided by reagent manufacturers as part of a reagent kit for DNA identification. Allelic ladders are used to fine-tune the correspondence between the DNA fragment length of each DNA marker and the allele.
  • the operator specifies the type of allelic ladder, the type of size standard, the type of fluorescent reagent, and the type of sample set in the wells on the sample plate 224 corresponding to each capillary.
  • the type of sample specified is any one of real sample, positive control, negative control, and allelic ladder. This information is set in the sample information setting section 106 on the data analysis device 112 via the user interface section 103.
  • After completing the above sample preparation (S401), the operator operates the user interface unit 103 on the data analysis device 112 to instruct the start of analysis. This instruction to start analysis is passed to the electrophoresis device control unit 108.
  • the electrophoresis device control unit 108 sends an analysis start signal to the electrophoresis device 105, thereby starting the analysis (S402).
  • the electrophoresis device 105 starts filling the migration medium (S403). This step may be performed automatically after the start of the analysis, or may be performed sequentially by sending a control signal from the electrophoresis device control unit 108. Filling the migration medium is a procedure in which new migration medium is filled into the capillary 202 to form a migration path.
  • the waste liquid container 223 is transported directly below the load header 229 by the transport machine 225, and the solenoid valve 213 is closed so that the used migration medium discharged from the capillary cathode end 227 can be received. Then, the syringe 206 is driven to fill the capillary 202 with new migration medium, and the used migration medium is discarded. Finally, the capillary cathode end 227 is immersed in a cleaning solution in the cleaning container 222, and the capillary cathode end 227 contaminated by the migration medium is cleaned.
  • preliminary electrophoresis is performed. This step may be performed automatically or sequentially by sending a control signal from the electrophoresis device control unit 108.
  • Preliminary electrophoresis is a procedure in which a predetermined voltage is applied to the electrophoretic medium to make the electrophoretic medium suitable for electrophoresis.
  • the capillary cathode end 227 is immersed in the buffer solution in the buffer container 221 by the transporter 225 to form a current path.
  • the capillary cathode end 227 is immersed in the cleaning solution in the cleaning container 222 to clean the capillary cathode end 227 contaminated by the buffer solution.
  • sample introduction is performed. This step may be performed automatically or sequentially by sending a control signal from the electrophoresis device control unit 108.
  • in sample introduction, sample components are introduced into the migration path.
  • the capillary cathode end 227 is immersed in the sample held in the well of the sample plate 224 by the transporter 225, and then the solenoid valve 213 is opened. This forms a current path, and the sample components are ready to be introduced into the migration path. Then, a pulse voltage is applied to the current path by the high-voltage power supply 204, and the sample components are introduced into the migration path. Finally, the capillary cathode end 227 is immersed in a cleaning solution in the cleaning container 222, and the capillary cathode end 227 contaminated by the sample is washed.
  • electrophoretic analysis (S406) is performed. This step may be performed automatically or sequentially by sending a control signal from the electrophoresis device control unit 108.
  • in electrophoretic analysis (S406), each sample component contained in the sample is separated and analyzed by electrophoresis.
  • the capillary cathode end 227 is immersed in the buffer solution in the buffer container 221 by the transporter 225 to form a current path.
  • a high voltage of about 15 kV is applied to the current path by the high-voltage power supply 204 to generate an electric field in the electrophoretic path.
  • each sample component in the electrophoretic path moves to the detection unit 216 at a speed that depends on the properties of each sample component.
  • the sample components are separated due to the difference in their moving speed.
  • the sample components that reach the detection unit 216 are detected in order.
  • the migration speed differs depending on the base length, and the DNAs reach the detection unit 216 in order starting from the shortest base length.
  • a fluorescent dye that depends on the terminal base sequence is attached to each DNA.
  • This information light is detected by the optical detector 215.
  • the optical detector 215 detects this information light at regular time intervals and transmits image data to the data analysis device 112.
  • the luminance of only a part of the image data may be transmitted instead of the image data.
  • luminance values sampled only at wavelength positions at regular intervals may be transmitted for each capillary.
  • This luminance value data represents the spectral waveform of each capillary. This spectral waveform is stored in the storage unit 104.
  • FIG. 5 shows the flow of base calling in S305.
  • the analysis interval detection unit 109 of the base calling unit 107 detects a signal interval from the time-series data of the corrected fluorescence intensity (step S501).
  • the base calling unit 107 analyzes the detected signal section and identifies the base sequence of the sample (step S502).
  • Step S601: The analysis interval detection unit 109 divides the entire time series data into a plurality of small intervals. Then, the process proceeds to step S602.
  • Step S602: The analysis interval detection unit 109 selects one of the small intervals and generates non-signal features for each signal included in that interval. The signals are the four pieces of fluorescence intensity data corresponding to the four bases, so the analysis interval detection unit 109 generates a non-signal feature for each of the four pieces of fluorescence intensity data in the selected small interval. Then, the process proceeds to step S603.
  • Step S603: The analysis interval detection unit 109 determines the non-signal feature of the selected small interval. Specifically, the analysis interval detection unit 109 sets the smallest of the four non-signal features calculated from the four fluorescence intensity data as the non-signal feature of the small interval. After step S603, if small intervals remain for which non-signal features have not been determined, the process returns to step S602. Once non-signal features have been determined for all small intervals, the process proceeds to step S604. Note that, as described above, when signal features are used instead of non-signal features, the largest signal feature is set as the signal feature of the small interval, and processing is performed in the same manner as above.
  • Step S604: The analysis interval detection unit 109 uses the non-signal features determined for each small interval to determine the boundary between the non-signal section and the signal section, and ends the process.
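The loop formed by steps S601 to S603 can be sketched as follows. This is a minimal sketch under stated assumptions: the intervals are taken to be fixed-length and non-overlapping (the text does not specify how the division is done), and `section_features`, `traces`, `interval_len`, and `feature_fn` are hypothetical names. The per-dye feature computation is passed in as a function so that the loop structure can be shown on its own.

```python
def section_features(traces, interval_len, feature_fn):
    """Return one non-signal feature per small interval.

    traces:       dict mapping a dye name to its list of fluorescence intensities
    interval_len: number of samples per small interval (assumed fixed)
    feature_fn:   function mapping one interval of one trace to a non-signal feature
    """
    n = min(len(y) for y in traces.values())
    features = []
    # S601: divide the time series into small intervals.
    for start in range(0, n, interval_len):
        end = min(start + interval_len, n)
        # S602: one non-signal feature per dye for this interval.
        per_dye = [feature_fn(y[start:end]) for y in traces.values()]
        # S603: the smallest per-dye feature becomes the interval's feature.
        features.append(min(per_dye))
    return features


def count_equal_steps(segment):
    """Toy feature for the example: number of adjacent samples with equal value."""
    return sum(1 for a, b in zip(segment, segment[1:]) if a == b)


traces = {
    "Dye1": [0, 0, 0, 0, 5, 9, 5, 0],
    "Dye2": [0, 0, 0, 0, 0, 0, 0, 0],
}
print(section_features(traces, 4, count_equal_steps))
```

In the second interval Dye1 carries a peak, so its feature drops to zero and the minimum marks the interval as signal-like even though Dye2 is flat there.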
  • Figure 7 is a diagram explaining the features of the non-signal section.
  • Figure 8 is a diagram explaining the features of the signal section.
  • Dye1 to Dye4 indicate four fluorescent dyes corresponding to four bases.
  • the horizontal axis is time and the vertical axis is fluorescence intensity.
  • the analysis section detection unit 109 generates non-signal features based on the number of occurrences of the three shape patterns (maximum, minimum, flat) in the fluorescence intensity data.
  • the generation of non-signal features from the shape patterns corresponds to step S602.
  • FIG. 9 is an explanatory diagram of generation of non-signal features from a shape pattern.
  • the shape pattern "flat" is defined as a case where the magnitude of the intensity difference between adjacent points is at most h1. In other words, the following formula is satisfied.
  • $-h_1 \le y[k+1] - y[k] \le h_1$
  • the points referred to here are individual sample values of the electrophoretic signal, and are determined by the time interval or sampling rate at which the optical detector 215 acquires data. This time interval is determined in advance by the user or as a default value for the device.
  • the shape pattern "maximum" is a pattern that satisfies the following formula.
  • h1, h2, and h3 may be values that are determined in advance according to the sampling rate and the electrophoretic voltage.
  • the analysis interval detection unit 109 regards the number of times the three patterns appear as non-signal features of the fluorescence intensity data in that interval. Note that it is also possible to normalize by the interval length and regard the frequency of appearance of the three patterns as non-signal features.
  • the "flat" shape pattern is unlikely to appear in signal sections and is therefore highly important as a feature of non-signal sections. Therefore, the "flat" shape pattern may be weighted more heavily than the other shape patterns to generate non-signal features.
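The counting of shape patterns can be illustrated with a short Python sketch. The "flat" condition follows the formula given above. The conditions for "maximum" and "minimum" are not reproduced in this text, so the ones used here (a point that exceeds both neighbours by more than `h2`, or lies below both neighbours by more than `h3`) are assumptions made purely for illustration, as are the function names and the `flat_weight` and `normalize` parameters.

```python
def count_shape_patterns(y, h1, h2, h3):
    """Count flat, maximum, and minimum patterns in one interval of one trace."""
    # Flat: -h1 <= y[k+1] - y[k] <= h1
    flat = sum(1 for k in range(len(y) - 1) if -h1 <= y[k + 1] - y[k] <= h1)
    # Maximum (assumed definition): higher than both neighbours by more than h2.
    maxima = sum(
        1 for k in range(1, len(y) - 1)
        if y[k] - y[k - 1] > h2 and y[k] - y[k + 1] > h2
    )
    # Minimum (assumed definition): lower than both neighbours by more than h3.
    minima = sum(
        1 for k in range(1, len(y) - 1)
        if y[k - 1] - y[k] > h3 and y[k + 1] - y[k] > h3
    )
    return {"flat": flat, "maximum": maxima, "minimum": minima}


def non_signal_feature(y, h1, h2, h3, flat_weight=1.0, normalize=False):
    """Weighted pattern count, optionally normalized by the interval length."""
    counts = count_shape_patterns(y, h1, h2, h3)
    score = flat_weight * counts["flat"] + counts["maximum"] + counts["minimum"]
    return score / len(y) if normalize and y else score


segment = [0, 0, 3, 0, 0]
print(count_shape_patterns(segment, h1=0.5, h2=1, h3=1))
print(non_signal_feature(segment, 0.5, 1, 1, flat_weight=2.0))
```

Setting `flat_weight` above 1 corresponds to weighting the "flat" pattern more heavily than the others, and `normalize=True` corresponds to using the frequency of appearance instead of the raw count.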
  • FIG. 10 is an explanatory diagram of the determination of the non-signal feature of a section in S603.
  • the minimum value of the non-signal feature of the fluorescence intensity data within the section is set as the non-signal feature of that section.
  • the graph shown in FIG. 10 is the fluorescence intensity data of Dye1 to Dye4.
  • F(Dye1) to F(Dye4) are the non-signal feature generated from the fluorescence intensity data of Dye1 to Dye4.
  • the analysis section detection unit 109 finds the non-signal feature Fq for section q using Min(F(Dye1), F(Dye2), F(Dye3), F(Dye4)).
  • FIGS. 11 and 12 are explanatory diagrams of the determination of the signal boundary in S604.
  • the analysis interval detection unit 109 plots the non-signal features of each interval and performs smoothing and interpolation. This makes it possible to suppress the effects of small fluctuations in the non-signal features.
  • the analysis interval detection unit 109 determines the time at which the smoothed and interpolated non-signal features exceed a threshold value as the signal interval boundary.
  • the analysis interval detection unit 109 may determine the boundary to be the time at which the threshold value has been exceeded continuously for at least a certain margin. This makes the determination robust against fluctuations near the boundary.
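The boundary determination described above can be sketched as smoothing followed by a sustained threshold crossing. The text does not name a smoothing method, so the centered moving average below is an assumption, as are the function names and the `window` and `margin` parameters. The text also leaves open whether the boundary is placed at the start or the end of the run that exceeds the threshold; this sketch returns the start of the run.

```python
def moving_average(values, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def first_sustained_crossing(values, threshold, margin):
    """Index where values first stay above threshold for `margin` consecutive
    intervals, or None if that never happens."""
    run = 0
    for i, v in enumerate(values):
        run = run + 1 if v > threshold else 0
        if run >= margin:
            return i - margin + 1  # start of the run (assumed convention)
    return None


features = [1, 1, 9, 1, 9, 9, 9, 1]
print(first_sustained_crossing(features, threshold=5, margin=1))
print(first_sustained_crossing(features, threshold=5, margin=3))
```

With `margin=1` the isolated spike at index 2 is taken as the boundary, while `margin=3` skips it and reports the run starting at index 4, which is the robustness effect described above.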
  • FIG. 13 is an explanatory diagram of a case where a threshold is determined from the distribution of non-signal features in a section.
  • the distribution of non-signal features is assumed to be bimodal.
  • in this distribution, signal sections form the peak at low feature values and non-signal sections form the peak at high feature values.
  • the analysis section detection unit 109 can determine the threshold based on this distribution. For example, X% of the peak Fp of the higher (non-signal) mountain can be set as the threshold. Alternatively, a value at which the slope of the mountain becomes relatively flat can be set as the threshold.
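One way to derive the threshold as X% of the peak position Fp could look like the sketch below. It is only a rough illustration: the histogram binning, the number of bins, and the rule that the non-signal peak is the most populated bin in the upper half of the feature range are all assumptions, and `threshold_from_distribution` is a hypothetical name.

```python
def threshold_from_distribution(features, x_percent, n_bins=10):
    """Return x_percent of Fp, the feature value at the peak of the higher mode."""
    lo, hi = min(features), max(features)
    if hi == lo:
        # Degenerate case: no spread, so Fp is the single value present.
        return lo * x_percent / 100.0
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for f in features:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((f - lo) / width), n_bins - 1)
        counts[idx] += 1
    # Assumption: the non-signal mode lies in the upper half of the range.
    upper_bins = range(n_bins // 2, n_bins)
    peak_bin = max(upper_bins, key=lambda b: counts[b])
    fp = lo + (peak_bin + 0.5) * width  # center of the peak bin
    return fp * x_percent / 100.0


features = [0, 0, 1, 1, 1, 2, 8, 9, 9, 9, 10]
print(threshold_from_distribution(features, x_percent=50))
```

The alternative mentioned above, choosing the value at which the slope of the peak flattens, would need a smoothed density estimate and is not shown here.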
  • a predetermined fixed value may always be used.
  • the threshold value is changed according to the signal level. That is, the threshold value is determined according to the following (1) and (2). (1) If the signal strength is above a certain level, it is always determined to be a signal section. (2) If the signal strength is below a certain level, the threshold is lowered according to the signal strength. Here, in the range (2), the lower the signal strength, the easier it is to determine that it is a non-signal section. Alternatively, a differential signal may be used. By lowering the threshold as the difference increases, the rising and falling edges of the signal section can be detected.
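Rules (1) and (2) can be sketched as a small decision function. The text only says that the threshold is lowered according to the signal strength, so the linear scaling used here is an assumption, as are the name `is_signal_section` and its parameters. A section is treated as non-signal when its non-signal feature exceeds the threshold.

```python
def is_signal_section(non_signal_feature, strength, base_threshold, strength_level):
    """Decide signal vs. non-signal with a strength-dependent threshold."""
    # Rule (1): strong enough signals are always treated as signal sections.
    if strength >= strength_level:
        return True
    # Rule (2): below the level, lower the threshold in proportion to the
    # signal strength (linear scaling is an assumption of this sketch).
    threshold = base_threshold * max(strength, 0) / strength_level
    return non_signal_feature <= threshold


print(is_signal_section(100, strength=50, base_threshold=10, strength_level=40))
print(is_signal_section(4, strength=20, base_threshold=10, strength_level=40))
print(is_signal_section(4, strength=4, base_threshold=10, strength_level=40))
```

The second and third calls use the same non-signal feature; only the weaker one is classified as non-signal, which reflects the statement that a lower signal strength makes a non-signal decision easier.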
  • a feature vector including non-signal features and other features is given as input to a signal section identifier 121, and a signal section is obtained as output. Any other feature such as signal intensity or a differential signal can be used.
  • as the signal section identifier 121, any model such as a Deep Neural Network (DNN), a Support-Vector Machine (SVM), or Random Forest can be used.
  • the output may be a discrimination result of whether it is a signal section or a non-signal section, or may be the probability (likelihood) of it being a signal section, or the like.
  • FIG. 16 shows ensemble learning that combines multiple classifiers.
  • a first feature vector is provided as input to the first signal section classifier 122
  • a second feature vector is provided as input to the second signal section classifier 123.
  • the outputs of the signal section classifiers 122-123 are then input to the discriminator 124, which finally obtains an output.
  • the first feature vector includes, for example, a non-signal feature and a signal intensity
  • the second feature vector includes, for example, a differential signal.
  • the discriminator 124 determines the output by majority vote or the like. Other methods such as bagging, boosting, and stacking may also be used.
  • FIG. 17 is an explanatory diagram of the configuration for learning on the device.
  • the signal section information storage unit 125 is a memory unit provided in the genetic analysis device. Each time the user performs a measurement, the signal section information storage unit 125 stores the signal section information (a label indicating whether it is a signal section or not) in association with the feature vector.
  • the analysis section detection unit 109 can read the feature vector and label from the signal section information storage unit 125 and provide them to a signal section classifier to perform supervised learning.
  • FIG. 18 is an explanatory diagram of learning that reflects the results of user adjustments.
  • the operation and the results are stored in a specified information storage unit and reflected in the next learning. This improves the accuracy of signal section detection.
  • although FIG. 18 shows an example in which the user adjusts the boundary between the signal section and the non-signal section, the results of adjustments to other parameters can also be used. For example, if the user adjusts non-signal feature parameters (conditions for flat sections, conditions for maximum and minimum points), threshold settings, or parameters related to signal boundary determination, the results of that operation can be stored and learned.
  • the base call unit 107 in the genetic analysis device 101 may reanalyze fluorescence intensity data other than the fluorescence intensity data generated from the electrophoresis results in the fluorescence intensity calculation unit 110.
  • the fluorescence intensity data may be stored in the storage unit 104 or may be transmitted through a communication cable.
  • the user can adjust the analysis interval by editing the fluorescence intensity data.
  • FIG. 19 is an explanatory diagram of a case where the analysis interval is corrected by editing the fluorescence intensity data.
  • the data is corrected so that the fluorescence intensity data near the beginning includes many flat parts, so that the non-signal feature amount increases and the signal start position can be moved.
  • if the fluorescence intensity near the beginning is set to a zero value, the fluorescence intensity will change significantly, and the base call result near the beginning may change compared to before the correction.
  • the signal start position is changed by increasing the flat parts so that the signal intensity is within a certain range (gray range in the lower part of the figure) from the signal intensity before the correction.
  • Such editing of the fluorescence intensity data may be performed using an external tool.
  • the genetic analysis device 101 may have a function for editing such fluorescence intensity data.
  • the disclosed genetic analysis device 101 includes a fluorescence intensity calculation unit 110 as an acquisition unit that acquires time-series data showing the results of electrophoresis of a sample, and a base calling unit 107 as an analysis unit that analyzes the base sequence of the sample from the time-series data.
  • the time-series data includes a plurality of fluorescence intensity data corresponding to a plurality of bases
  • the analysis unit divides the time-series data into a plurality of intervals, generates for each of the plurality of fluorescence intensity data a feature amount indicating the frequency of occurrence of at least one of a maximum portion, a minimum portion, and a flat portion of the fluorescence intensity data in each interval, determines an interval feature amount from the plurality of feature amounts generated for the plurality of fluorescence intensity data based on the magnitude relationship of the feature amounts, and detects a signal region, which is an analysis target region of the base sequence in the time-series data, using the interval feature amount.
  • a signal section (signal region) can be detected with high accuracy from time-series data indicating the results of electrophoresis.
  • a signal section including a low-intensity signal can be detected with high accuracy.
  • a signal section can be detected with high accuracy even when the signal intensity varies widely.
  • the detection accuracy of a signal section is improved for data including an unintended, suddenly high-intensity signal (dye blob) caused by sample pretreatment, data including a high-intensity signal that appears at the end of a PCR reaction, and data in which the signal intensity is attenuated due to a special sample or pretreatment.
  • the analysis section determines, as the section feature, the feature that is the smallest among the feature values of the plurality of pieces of fluorescence intensity data.
  • the signal section can be detected by identifying the non-signal section using a shape pattern that appears characteristically not in the signal section but in the non-signal section.
  • due to the characteristics of electrophoresis, only one base exists at a given position across the multiple fluorescence intensity data. Selecting the feature that is most likely to be a signal from the multiple fluorescence intensity data as the representative feature of the section therefore exploits this property and allows the signal section to be detected with high accuracy.
  • the analysis unit may compare the feature amount of the section with a threshold value to determine whether the section is a non-signal section.
  • the threshold value may be a predetermined fixed value or a value calculated from the distribution of the feature amount of the section. In this configuration, the signal section can be detected with high accuracy by taking into account the distribution of non-signal features.
  • the analysis unit determines that the boundary between the consecutive sections and the signal area adjacent to the consecutive sections is the boundary between the signal area and non-signal area. This configuration can suppress the effects of fluctuations near the boundaries of the signal sections.
  • the analysis unit detects the signal region using a discrimination model in which a feature amount of the section is one of inputs.
  • signal sections can be flexibly detected from non-signal features and other features.
  • the analysis unit generates feature quantities of the fluorescence intensity data by using a weight that is greater for flat portions of the fluorescence intensity data than for maximum and minimum portions of the fluorescence intensity data. In this way, by placing emphasis on the characteristic shape pattern of the non-signal section, the signal section can be detected with high accuracy.
  • the analysis unit can detect a second signal section, different from the signal section detected from the fluorescence intensity data, by using second fluorescence intensity data edited from the fluorescence intensity data, where the second fluorescence intensity data deviates from the intensity of the fluorescence intensity data by an amount within a certain range.
  • the present invention is not limited to the above-mentioned embodiment, and various modifications are included.
  • the above-mentioned embodiment is described in detail to explain the present invention in an easily understandable manner, and the invention is not necessarily limited to embodiments having all of the described configurations.
  • configurations may not only be deleted but may also be replaced or added.
  • a configuration was exemplified in which a device that learns from data and a device that updates the signal section discriminator are integrated, but the learning and updating of the signal section discriminator may be performed by separate devices.
  • Reference Signs List
  • 101 Genetic analysis device
  • 102 Central control unit
  • 103 User interface unit
  • 104 Memory unit
  • 105 Electrophoresis device
  • 106 Sample information setting unit
  • 107 Base calling unit
  • 108 Electrophoresis device control unit
  • 109 Analysis section detection unit
  • 110 Fluorescence intensity calculation unit
  • 112 Data analysis device
  • 121 to 123 Signal section discriminator
  • 124 Discriminator
  • 125 Signal section information storage unit
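
The threshold and margin rules in the definitions above (a threshold set as X% of the peak Fp of the higher, non-signal mode, and a boundary accepted only after the threshold has been exceeded for a margin of consecutive sections) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosed device: the function names, the histogram-based search for the higher mode, the bin count, and the sample values are all hypothetical.

```python
from typing import List, Optional, Sequence


def threshold_from_upper_mode(features: Sequence[float],
                              ratio: float = 0.8,
                              n_bins: int = 10) -> float:
    """Return ratio * Fp, where Fp is the peak of the higher (non-signal) mode.

    Assumption: the higher mode is located as the most populated
    histogram bin in the upper half of the feature range.
    """
    lo, hi = min(features), max(features)
    if hi == lo:  # degenerate distribution: a single value
        return ratio * hi
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for f in features:
        idx = min(int((f - lo) / width), n_bins - 1)
        counts[idx] += 1
    peak_bin = max(range(n_bins // 2, n_bins), key=lambda b: counts[b])
    f_peak = lo + (peak_bin + 0.5) * width  # bin centre used as Fp
    return ratio * f_peak


def first_sustained_crossing(features: Sequence[float],
                             threshold: float,
                             margin: int) -> Optional[int]:
    """Index where a run of at least `margin` values above threshold begins.

    Shorter excursions above the threshold are ignored, which is what
    makes the boundary robust against fluctuations.
    """
    run_start, run_len = None, 0
    for i, f in enumerate(features):
        if f > threshold:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len >= margin:
                return run_start
        else:
            run_len = 0
    return None


# Hypothetical per-section non-signal features, scanned from inside the
# signal section toward the trailing non-signal section.
section_features: List[float] = [0.1, 0.2, 1.8, 0.1, 0.2, 1.9, 2.0, 2.0, 1.9]
threshold = threshold_from_upper_mode(section_features, ratio=0.8)
boundary = first_sustained_crossing(section_features, threshold, margin=3)
```

With `margin=3`, the isolated spike at index 2 is skipped and the boundary falls at index 5, where the sustained run begins. Applying the same scan to the reversed sequence would give the boundary at the start of the signal section.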

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Electrochemistry (AREA)
  • Physics & Mathematics (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

This genetic analysis device comprises an acquisition unit that acquires time series data indicating the result of electrophoresis of a sample; and an analysis unit that analyzes a base sequence of the sample from the time series data. The time series data includes multiple sets of fluorescence intensity data corresponding to multiple bases. The analysis unit divides the time series data into multiple sections, generates, for each of the multiple sets of fluorescence intensity data, a feature amount indicating an emergence frequency of at least one of a local maximum portion, a local minimum portion, and a flat portion of the fluorescence intensity data in each of the sections, determines, from among the multiple feature amounts generated for the multiple sets of fluorescence intensity data, a section feature amount on the basis of the magnitude relationship of the feature amounts, and detects, by using the section feature amount, a signal region which is an analysis target region of the base sequence in the time series data.

Description

Genetic analysis device and genetic analysis method
The present invention relates to a genetic analysis device and a genetic analysis method.
A conventional technique for determining base sequences is described in International Publication No. WO 2008/050426 (Patent Document 1). This publication states, "To enable accurate analysis of base sequences even in electrophoretic data that includes degraded portions," and "The base sequence of a nucleic acid is determined by including the following steps (A) to (C) in that order: (A) a base peak extraction step of extracting base peaks from electrophoretic data including peaks of four types of base types obtained by electrophoretic separation of a sample nucleic acid; (B) a condition setting step of setting a search start base peak and a peak interval reference value for starting a search in time series data composed of the extracted base peaks; (C) starting from the search start base peak in the time series data, sequentially scanning between adjacent base peaks in the forward and backward directions of the time series, comparing the interval between base peaks with the peak interval reference value and adding an interpolated peak to a peak missing section, thereby determining the base sequence."
International Publication No. 2008/050426
Conventional techniques had difficulty accurately distinguishing signal sections (signal regions) from non-signal sections (non-signal regions). In particular, the accuracy of detecting signal regions containing low-intensity signals was insufficient. For example, in a method that derives a threshold for separating signal from non-signal from the overall distribution of signal intensity, high-intensity signals raised the threshold, so that sections containing low-intensity signals were sometimes erroneously determined to be non-signal sections. In a method that separates signal from non-signal by finding local changes in signal intensity (rising and falling edges), low-intensity signals were difficult to distinguish because their changes in intensity are relatively small. In a method that extracts periodic components of a signal to determine a signal section, erroneous determinations occurred when periodicity close to that of a signal was observed in a non-signal region.
Therefore, the present invention aims to detect signal sections (signal regions) with high accuracy from time-series data showing the results of electrophoresis.
In order to achieve the above-mentioned object, a representative genetic analysis device of the present invention comprises an acquisition unit that acquires time series data indicating the results of electrophoresis of a sample, and an analysis unit that analyzes the base sequence of the sample from the time series data, wherein the time series data includes a plurality of fluorescence intensity data corresponding to a plurality of bases, and the analysis unit divides the time series data into a plurality of intervals, generates for each of the plurality of fluorescence intensity data a feature amount indicating the frequency of occurrence of at least one of a maximum portion, a minimum portion, and a flat portion of the fluorescence intensity data in each interval, determines an interval feature amount from the plurality of feature amounts generated for the plurality of fluorescence intensity data based on a magnitude relationship between the feature amounts, and uses the interval feature amount to detect a signal region in the time series data that is a region to be analyzed for the base sequence.
Moreover, a representative genetic analysis method of the present invention is characterized by comprising the steps of: acquiring time series data indicating the result of electrophoresis of a sample, the time series data including a plurality of fluorescence intensity data corresponding to a plurality of bases; dividing the time series data into a plurality of intervals; generating, for each of the plurality of fluorescence intensity data, a non-signal feature amount of the fluorescence intensity data in each interval based on the frequency of occurrence of at least one of a maximum portion, a minimum portion, and a flat portion of the fluorescence intensity data in each interval; determining an interval feature amount from the plurality of feature amounts generated for the plurality of fluorescence intensity data based on a magnitude relationship between the feature amounts; and detecting a signal region, which is a region to be analyzed for the base sequence in the time series data, using the interval feature amount.
According to the present invention, it is possible to detect signal sections (signal regions) with high accuracy from time-series data showing the results of electrophoresis. Problems, configurations, and advantages other than those described above will become clear from the description of the embodiments below.
FIG. 1: Example of the configuration of the genetic analysis device according to the first embodiment
FIG. 2: Example of the configuration of the electrophoresis device according to the first embodiment
FIG. 3: Flowchart outlining the processing executed by the genetic analysis device according to the first embodiment
FIG. 4: Flow of electrophoresis processing of real samples
FIG. 5: Base calling flow
FIG. 6: Signal section detection flow
FIG. 7: Diagram explaining the characteristics of non-signal sections
FIG. 8: Diagram explaining the characteristics of signal sections
FIG. 9: Explanatory diagram of the generation of non-signal features from shape patterns
FIG. 10: Explanatory diagram of determining the non-signal feature of a section
FIG. 11: Explanatory diagram of signal boundary determination (part 1)
FIG. 12: Explanatory diagram of signal boundary determination (part 2)
FIG. 13: Explanatory diagram of a case where a threshold is determined from the distribution of non-signal features of sections
FIG. 14: Explanatory diagram of a case where non-signal features are used in combination with other features (part 1)
FIG. 15: Explanatory diagram of a case where non-signal features are used in combination with other features (part 2)
FIG. 16: Explanatory diagram of a case where non-signal features are used in combination with other features (part 3)
FIG. 17: Explanatory diagram of a case where non-signal features are used in combination with other features (part 4)
FIG. 18: Explanatory diagram of a case where non-signal features are used in combination with other features (part 5)
FIG. 19: Explanatory diagram of a case where the analysis interval is corrected by editing the fluorescence intensity data
Embodiments are described below with reference to the drawings.
FIG. 1 is a diagram showing an example of the configuration of a genetic analysis device 101 according to a first embodiment.
The genetic analysis device 101 includes an electrophoresis device 105 and a data analysis device 112. The electrophoresis device 105 and the data analysis device 112 are communicatively connected using a communication cable.
The data analysis device 112 includes a central control unit 102, a storage unit 104, and a user interface unit 103.
The central control unit 102 executes control of the electrophoresis device 105 and data processing. The central control unit 102 is, for example, a central processing unit (CPU) and a graphics processing unit (GPU).
The storage unit 104 stores programs executed by the central control unit 102, setting information for the electrophoresis device 105, information used for various processes, and the like. The storage unit 104 is, for example, a memory.
The user interface unit 103 is an interface for connecting to an input device and an output device, or an interface for connecting to an external device via a network. The data analysis device 112 presents information to a user via the user interface unit 103, and also accepts information input by the user.
The central control unit 102 operates as a sample information setting unit 106, an electrophoresis device control unit 108, a fluorescence intensity calculation unit 110, and a base calling unit 107 by executing the programs stored in the storage unit 104. In the following description, when processing is described with a functional unit as the subject, it means that the central control unit 102 is executing the corresponding program.
The sample information setting unit 106 is a setting unit that sets information related to a sample.
The electrophoresis device control unit 108 is a control unit that controls the electrophoresis of the sample performed by the electrophoresis device 105.
The fluorescence intensity calculation unit 110 is an acquisition unit that acquires time series data indicating the results of electrophoresis from the electrophoresis device 105. The time series data includes a plurality of fluorescence intensity data corresponding to a plurality of bases.
The base calling unit 107 is an analysis unit that analyzes the base sequence of a sample from the time series data. The base calling unit 107 includes an analysis interval detection unit 109.
The analysis interval detection unit 109 divides the time series data into a plurality of intervals and, for each fluorescence intensity data, generates a non-signal feature indicating how non-signal-like the data is, based on the frequency of occurrence of maxima, minima, and flat portions of the fluorescence intensity data in each interval. This feature takes a larger value the more non-signal-like the data is. Alternatively, a value that becomes larger as the occurrence frequency becomes smaller, such as the reciprocal of the occurrence frequency or a value obtained by subtracting the occurrence frequency from a fixed value, may be generated as a signal feature. In this case, the signal feature takes a larger value the more signal-like the data is. An embodiment using the non-signal feature is described below, but the embodiment is equally applicable when a signal feature is used. The analysis interval detection unit 109 determines the minimum of the non-signal features generated for the plurality of fluorescence intensity data as the non-signal feature of that interval. If signal features are used, the maximum signal feature is determined as the signal feature of that interval. The signal section of the time series data is then detected using the signal feature of the interval. A signal section (signal region) is a section of the time series data that includes changes in fluorescence intensity due to the presence of bases. A non-signal section (non-signal region) is a section of the time series data that does not include changes in fluorescence intensity due to the presence of bases.
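
As an illustration of the processing attributed to the analysis interval detection unit 109, the following is a minimal Python sketch, not the implementation of the disclosed device. It assumes each fluorescence channel is a plain list of intensities. The function names, the flatness tolerance `flat_tol`, and the weights `w_extremum` and `w_flat` are hypothetical choices made only for this example. The sketch counts local maxima, local minima, and flat steps per interval and per channel, normalizes the count to a frequency, and takes the minimum over the channels as the non-signal feature of the interval.

```python
from typing import List, Sequence


def non_signal_feature(trace: Sequence[float], flat_tol: float = 1.0,
                       w_extremum: float = 1.0, w_flat: float = 2.0) -> float:
    """Weighted frequency of maxima, minima, and flat steps in one interval.

    flat_tol, w_extremum, and w_flat are illustrative assumptions.
    """
    score = 0.0
    for i in range(1, len(trace) - 1):
        left = trace[i] - trace[i - 1]
        right = trace[i + 1] - trace[i]
        if abs(left) <= flat_tol and abs(right) <= flat_tol:
            score += w_flat        # flat portion
        elif left > 0 and right < 0:
            score += w_extremum    # local maximum
        elif left < 0 and right > 0:
            score += w_extremum    # local minimum
    # Normalize by the number of interior samples to obtain a frequency.
    return score / max(len(trace) - 2, 1)


def interval_features(channels: Sequence[Sequence[float]],
                      interval_len: int) -> List[float]:
    """Non-signal feature per interval: the minimum over all channels."""
    n = min(len(c) for c in channels)
    features = []
    for start in range(0, n - interval_len + 1, interval_len):
        per_channel = [non_signal_feature(c[start:start + interval_len])
                       for c in channels]
        # The channel that looks most like a signal represents the interval.
        features.append(min(per_channel))
    return features


# Synthetic example: four channels, two intervals of 20 samples each.
flat = [0.0] * 20
peak = [10.0 * i for i in range(11)] + [10.0 * i for i in range(9, 0, -1)]
channels = [flat + peak, flat + flat, flat + flat, flat + flat]
features = interval_features(channels, interval_len=20)
```

In this synthetic data the first interval is flat in every channel, so its feature is high. In the second interval one channel carries a single smooth peak, so the minimum over the channels is low and the interval looks like a signal. If a signal feature were used instead (for example, a fixed value minus the frequency), the maximum over the channels would be taken.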
The electrophoresis device 105 performs electrophoresis on a sample (DNA fragments) and acquires electrophoresis data. The electrophoresis data is time-series data of the luminance values of DNA fragments labeled with fluorescent dyes.
The configuration of the electrophoresis device 105 will now be described. FIG. 2 is a diagram showing an example of the configuration of the electrophoresis device 105 of the first embodiment.
The electrophoresis device 105 has a detection unit 216, a thermostatic chamber 218, a transporter 225, a high-voltage power supply 204, a first ammeter 205, an anode electrode 211, a second ammeter 212, a capillary array 217, and a pump mechanism 203.
The capillary array 217 is a replaceable component that includes a plurality of (for example, eight) capillaries 202, and includes a load header 229, the detection unit 216, and a capillary head 233. When a capillary 202 is damaged or its quality deteriorates, the capillary array 217 can be replaced with a new one.
The capillary 202 is a glass tube with an inner diameter of several tens to several hundreds of microns and an outer diameter of several hundred microns, and its surface is coated with polyimide to improve its strength. However, in the light irradiation section where the laser light is applied, the polyimide coating has been removed so that light emitted inside can easily escape to the outside. The inside of the capillary 202 is filled with a separation medium that creates a difference in migration speed during electrophoresis. Separation media come in both fluid and non-fluid types; the first embodiment uses a fluid polymer.
The high-voltage power supply 204 applies a high voltage to the capillary 202. The first ammeter 205 detects the current output from the high-voltage power supply 204. The second ammeter 212 detects the current flowing through the anode electrode 211.
The optical detection unit, which detects information light obtained from the sample, is composed of a light source 214 that irradiates the detection unit 216 with excitation light, an optical detector 215 for detecting light emitted within the detection unit 216, and a diffraction grating 232. The detection unit 216 is a component that acquires information that depends on the sample.
When detecting a sample in the capillary 202 that has been separated by electrophoresis, the detection unit 216 is irradiated with excitation light from the light source 214, generating fluorescence having a wavelength that depends on the sample as information light. The diffraction grating 232 then disperses the information light in the wavelength direction, and the optical detector 215 detects the dispersed information light, whereby the sample is analyzed.
The capillary cathode ends 227 are each fixed through a metallic hollow electrode 226, with the tip of the capillary 202 protruding from the hollow electrode 226 by approximately 0.5 mm. The hollow electrodes 226 provided on the capillaries 202 are all attached together to the load header 229. Furthermore, all hollow electrodes 226 are electrically connected to the high-voltage power supply 204 mounted on the main body of the device, and function as cathode electrodes when a voltage must be applied, such as during electrophoresis and sample introduction.
The capillary ends opposite the capillary cathode ends 227 (the other ends) are bundled together by the capillary head 233. The capillary head 233 can be connected to the block 207 in a pressure-tight and airtight manner. The high voltage from the high-voltage power supply 204 is applied between the load header 229 and the capillary head 233. New polymer is then filled into the capillaries 202 from the other ends by the syringe 206. The polymer in the capillaries 202 is refilled for each measurement to improve measurement performance.
The pump mechanism 203 is composed of a syringe 206 and a mechanism for pressurizing the syringe 206, and injects the polymer into the capillaries 202.
The block 207 is a connection part that places the syringe 206, the capillary array 217, the anode buffer container 210, and the polymer container 209 in communication with one another.
The thermostatic chamber 218 is covered with a heat insulating material to keep the capillaries 202 inside it at a constant temperature, and its temperature is controlled by a heating and cooling mechanism 220. In addition, a fan 219 circulates and stirs the air in the thermostatic chamber 218, keeping the temperature of the capillary array 217 spatially uniform and constant.
The transporter 225 transports various containers to the capillary cathode ends 227. The transporter 225 is equipped with three electric motors and linear actuators, and can move along three axes: up and down, left and right, and depth. At least one container can be placed on the moving stage 230 of the transporter 225. Furthermore, the moving stage 230 is equipped with an electric grip 231, which can grasp and release each container. Therefore, the buffer container 221, the washing container 222, the waste liquid container 223, and the sample plate 224 can be transported to the capillary cathode ends 227 as necessary. Containers that are not needed are stored in a designated storage location within the electrophoresis device 105.
 ユーザは、データ解析装置112を用いて、電気泳動装置105の各種機能を制御し、光学検出部によって検出された泳動データを取得することができる。 The user can use the data analysis device 112 to control various functions of the electrophoresis device 105 and obtain the electrophoresis data detected by the optical detection unit.
 電気泳動装置105には、電気泳動に影響を与える観測環境に関する情報(観測環境情報)を取得するためのセンサが存在してもよい。図2の電気泳動装置105は、装置内センサ240、ポリマセンサ241、緩衝液センサ242を有する。 The electrophoresis device 105 may have sensors for acquiring information about the observation environment that affects electrophoresis (observation environment information). The electrophoresis device 105 in FIG. 2 has an in-device sensor 240, a polymer sensor 241, and a buffer solution sensor 242.
The in-device sensor 240 is a sensor for acquiring information about the internal environment of the electrophoresis device 105 and is, for example, a temperature sensor, a humidity sensor, or an air pressure sensor inside the electrophoresis device 105.
 ポリマセンサ241は、ポリマの品質に関する情報を取得するためのセンサであり、例えば、PHセンサ及び電気伝導率センサ等である。ポリマセンサ241は、図2ではポリマ容器209内に設置されているが、設置場所はこれに限定されない。 The polymer sensor 241 is a sensor for acquiring information about the quality of the polymer, such as a pH sensor and an electrical conductivity sensor. In FIG. 2, the polymer sensor 241 is installed inside the polymer container 209, but the installation location is not limited to this.
 緩衝液センサ242は、緩衝液の品質に関する情報を取得するためのセンサであり、例えば、温度センサがある。緩衝液センサ242は、図2では陽極バッファ容器210内に設置されているが、設置場所はこれに限定されない。例えば、バッファ容器221内に設定されていてもよい。 The buffer solution sensor 242 is a sensor for obtaining information regarding the quality of the buffer solution, and may be, for example, a temperature sensor. In FIG. 2, the buffer solution sensor 242 is installed in the anode buffer container 210, but the installation location is not limited to this. For example, the buffer solution sensor 242 may be installed in the buffer container 221.
 図3は、実施例1の遺伝子解析装置101が実行する処理の概要を説明するフローチャートである。 FIG. 3 is a flowchart outlining the processing executed by the genetic analysis device 101 of the first embodiment.
 遺伝子解析装置101の電気泳動装置105は、解析対象のサンプルに対して電気泳動処理を実行する(ステップS301)。電気泳動処理の詳細は図4を用いて説明する。 The electrophoresis device 105 of the genetic analysis device 101 performs electrophoresis processing on the sample to be analyzed (step S301). Details of the electrophoresis processing will be explained using FIG. 4.
 次に、遺伝子解析装置101のデータ解析装置112は、機器の波長特性を補正するスペクトル補正を行い(ステップS302)、泳動データを用いた蛍光強度計算処理を実行する(ステップS303)。具体的には、蛍光強度計算部110が、泳動データから蛍光色素の蛍光強度の時系列データを算出し、蛍光強度の時系列データからピークの中心位置、高さ、及び幅等を検出する。 Then, the data analysis device 112 of the genetic analysis device 101 performs spectrum correction to correct the wavelength characteristics of the device (step S302), and executes a fluorescence intensity calculation process using the electrophoresis data (step S303). Specifically, the fluorescence intensity calculation unit 110 calculates time series data of the fluorescence intensity of the fluorescent dye from the electrophoresis data, and detects the center position, height, width, etc. of the peak from the time series data of the fluorescence intensity.
 次に、遺伝子解析装置101のデータ解析装置112は、蛍光強度の時系列データに対して移動度補正処理を実行する(ステップS304)。
 次に、遺伝子解析装置101のデータ解析装置112は、移動度補正処理の結果に基づいて補正された蛍光強度の時系列データを用いてベースコールを実行する(ステップS305)。具体的には、ベースコール部107が、補正された蛍光強度の時系列データを用いてサンプルの塩基配列を特定する。
Next, the data analyzer 112 of the genetic analyzer 101 executes a mobility correction process on the time series data of the fluorescence intensity (step S304).
Next, the data analyzer 112 of the genetic analyzer 101 executes base calling using the time series data of the fluorescence intensity corrected based on the result of the mobility correction process (step S305). Specifically, the base calling unit 107 identifies the base sequence of the sample using the time series data of the corrected fluorescence intensity.
 図4は、S301における実サンプルの電気泳動処理のフローを示している。電気泳動の基本的手順は、サンプル準備(S401)、分析開始イベント(S402)、泳動媒体充填(S403)、予備泳動(S404)、サンプル導入(S405)、泳動分析(S406)、泳動分析終了(S407)に大別できる。 Figure 4 shows the flow of electrophoresis processing of an actual sample in S301. The basic steps of electrophoresis can be broadly divided into sample preparation (S401), analysis start event (S402), loading of migration medium (S403), preliminary migration (S404), sample introduction (S405), migration analysis (S406), and end of migration analysis (S407).
 本装置のオペレータは、分析開始前のサンプル準備(S401)として、サンプルや試薬を本装置にセットする。より具体的には、まず、バッファ容器221と陽極バッファ容器210に、通電路の一部を形成する緩衝液を満たす。緩衝液は、例えば、各社から電気泳動用として市販されている電解質液である。また、サンプルプレート224のウェル内に、分析対象であるサンプルを分注する。サンプルは、例えば、DNAのPCR産物である。また、洗浄容器222に、キャピラリ陰極端227を洗浄する為の洗浄溶液を分注する。洗浄溶液は、例えば、純水である。また、シリンジ206内に、サンプルを電気泳動する為の泳動媒体を注入する。泳動媒体は、例えば各社から電気泳動用として市販されているポリアクリルアミド系分離ゲルやポリマなどである。さらに、キャピラリ202の劣化が予想される場合や、キャピラリ202の長さを変更する場合、キャピラリアレイ217を交換する。 The operator of this device sets the samples and reagents in this device as sample preparation (S401) before starting the analysis. More specifically, first, the buffer container 221 and the anode buffer container 210 are filled with a buffer solution that forms part of the current path. The buffer solution is, for example, an electrolyte solution commercially available from various companies for electrophoresis. The sample to be analyzed is dispensed into the wells of the sample plate 224. The sample is, for example, a PCR product of DNA. A cleaning solution for cleaning the capillary cathode end 227 is dispensed into the cleaning container 222. The cleaning solution is, for example, pure water. A migration medium for electrophoresis of the sample is injected into the syringe 206. The migration medium is, for example, a polyacrylamide separation gel or polymer commercially available from various companies for electrophoresis. The capillary array 217 is replaced if degradation of the capillary 202 is expected or if the length of the capillary 202 is to be changed.
 このときに、サンプルプレート224にセットされるサンプルとしては、解析の対象であるDNAの実サンプルの他、ポジティブコントロール、ネガティブコントロール、アレリックラダーとがあり、それぞれ異なるキャピラリにおいて電気泳動される。ポジティブコントロールは、例えば既知のDNAを含むPCR産物であり、PCRによってDNAが正しく増幅されていることを確認するための対照実験用のサンプルである。ネガティブコントロールとは、DNAを含まないPCR産物であり、PCRの増幅物にオペレータのDNAや塵などのコンタミネーションが生じていないことを確認するための対照実験用のサンプルである。 The samples set on the sample plate 224 at this time include the actual DNA sample to be analyzed, as well as a positive control, a negative control, and an allelic ladder, each of which is electrophoresed in a different capillary. The positive control is, for example, a PCR product containing known DNA, and is a sample used in a control experiment to confirm that DNA has been correctly amplified by PCR. The negative control is a PCR product that does not contain DNA, and is a sample used in a control experiment to confirm that the PCR amplified product has not been contaminated by the operator's DNA, dust, etc.
 アレリックラダーとは、DNAマーカに一般的に含まれる可能性のあるアレルを多く含む人工的なサンプルであり、通常、DNA鑑定用の試薬キットとして試薬メーカから提供される。アレリックラダーは、個々のDNAマーカのDNA断片長とアレルとの対応関係を微調整する目的で使用される。 An allelic ladder is an artificial sample that contains many alleles that may commonly be contained in a DNA marker, and is usually provided by reagent manufacturers as part of a reagent kit for DNA identification. Allelic ladders are used to fine-tune the correspondence between the DNA fragment length of each DNA marker and the allele.
 また上記の実サンプル、ポジティブコントロール、ネガティブコントロール、及びアレリックラダー、のサンプル全てに対して、サイズスタンダードと呼ばれる、特定の蛍光色素で標識された既知のDNA断片が混ぜられる。使用する試薬キットによってサイズスタンダードに割り当てられる蛍光色素の種類は異なる。  In addition, all of the above samples, including the actual sample, positive control, negative control, and allelic ladder, are mixed with known DNA fragments labeled with specific fluorescent dyes, called size standards. The type of fluorescent dye assigned to the size standard varies depending on the reagent kit used.
 オペレータは、アレリックラダーの種類やサイズスタンダードの種類、蛍光試薬の種類、それぞれのキャピラリに対応するサンプルプレート224上のウェルにセットされたサンプルの種類などを指定する。本実施例ではサンプルの種類として、実サンプル、ポジティブコントロール、ネガティブコントロール、及びアレリックラダーのいずれかの種類が指定される。これらの情報の設定は、データ解析装置112上にて、ユーザインタフェース部103を介し、サンプル情報設定部106に設定される。 The operator specifies the type of allelic ladder, the type of size standard, the type of fluorescent reagent, and the type of sample set in the wells on the sample plate 224 corresponding to each capillary. In this embodiment, the type of sample specified is any one of real sample, positive control, negative control, and allelic ladder. This information is set in the sample information setting section 106 on the data analysis device 112 via the user interface section 103.
 そして、上記のようなサンプル準備(S401)が完了した後、オペレータはデータ解析装置112上にて、ユーザインタフェース部103を操作して、分析開始を指示する。この分析開始の指示は電気泳動装置制御部108に渡される。電気泳動装置制御部108が、分析開始の信号を電気泳動装置105に送信することで、分析が開始される(S402)。 After completing the above sample preparation (S401), the operator operates the user interface unit 103 on the data analysis device 112 to instruct the start of analysis. This instruction to start analysis is passed to the electrophoresis device control unit 108. The electrophoresis device control unit 108 sends an analysis start signal to the electrophoresis device 105, thereby starting the analysis (S402).
 次に電気泳動装置105では、泳動媒体充填(S403)が開始される。このステップは、分析開始後に自動的に行われてもよいし、逐次、電気泳動装置制御部108から制御信号が送信されることによって行われてもよい。泳動媒体充填とは、キャピラリ202内に新しい泳動媒体を充填し、泳動路を形成する手順である。 Next, the electrophoresis device 105 starts filling the migration medium (S403). This step may be performed automatically after the start of the analysis, or may be performed sequentially by sending a control signal from the electrophoresis device control unit 108. Filling the migration medium is a procedure in which new migration medium is filled into the capillary 202 to form a migration path.
 本実施例における泳動媒体充填(S403)では、まず、搬送機225により廃液容器223をロードヘッダ229の直下に運び、電磁弁213を閉じ、キャピラリ陰極端227から排出される使用済の泳動媒体を受け止められるようにする。そして、シリンジ206を駆動して、キャピラリ202に新しい泳動媒体を充填し、使用済の泳動媒体を廃棄する。最後に、洗浄容器222内の洗浄溶液にキャピラリ陰極端227を浸し、泳動媒体により汚れたキャピラリ陰極端227を洗浄する。 In filling the migration medium in this embodiment (S403), first, the waste liquid container 223 is transported directly below the load header 229 by the transport machine 225, and the solenoid valve 213 is closed so that the used migration medium discharged from the capillary cathode end 227 can be received. Then, the syringe 206 is driven to fill the capillary 202 with new migration medium, and the used migration medium is discarded. Finally, the capillary cathode end 227 is immersed in a cleaning solution in the cleaning container 222, and the capillary cathode end 227 contaminated by the migration medium is cleaned.
 次に予備泳動(S404)が行われる。このステップは、自動的に行われてもよいし、逐次、電気泳動装置制御部108から制御信号が送信されることによって行われてもよい。予備泳動とは、泳動媒体に所定の電圧を印加し、泳動媒体を電気泳動に適した状態にする手順である。本実施例における予備泳動(S404)では、まず、搬送機225により、バッファ容器221内の緩衝液にキャピラリ陰極端227を浸し、通電路を形成する。そして、高圧電源204により、泳動媒体に数~数十キロボルト程度の電圧を数~数十分間加え、泳動媒体を電気泳動に適した状態とする。最後に、洗浄容器222内の洗浄溶液にキャピラリ陰極端227を浸し、緩衝液により汚れたキャピラリ陰極端227を洗浄する。 Next, preliminary electrophoresis (S404) is performed. This step may be performed automatically or sequentially by sending a control signal from the electrophoresis device control unit 108. Preliminary electrophoresis is a procedure in which a predetermined voltage is applied to the electrophoretic medium to make the electrophoretic medium suitable for electrophoresis. In the preliminary electrophoresis (S404) in this embodiment, first, the capillary cathode end 227 is immersed in the buffer solution in the buffer container 221 by the conveyor 225 to form a current path. Then, a voltage of several to several tens of kilovolts is applied to the electrophoretic medium by the high-voltage power supply 204 for several to several tens of minutes to make the electrophoretic medium suitable for electrophoresis. Finally, the capillary cathode end 227 is immersed in the cleaning solution in the cleaning container 222 to clean the capillary cathode end 227 contaminated by the buffer solution.
 次にサンプル導入(S405)が行われる。このステップは、自動的に行われてもよいし、逐次、電気泳動装置制御部108から制御信号が送信されることによって行われてもよい。サンプル導入(S405)では、サンプル成分が泳動路に導入される。本実施例におけるサンプル導入(S405)では、まず、搬送機225により、サンプルプレート224のウェル内に保持されたサンプルにキャピラリ陰極端227を浸し、その後電磁弁213を開く。これにより、通電路が形成され、泳動路にサンプル成分を導入することが状態となる。そして、高圧電源204によりパルス電圧を通電路に印加し、泳動路にサンプル成分を導入する。最後に、洗浄容器222内の洗浄溶液にキャピラリ陰極端227を浸し、サンプルにより汚れたキャピラリ陰極端227を洗浄する。 Next, sample introduction (S405) is performed. This step may be performed automatically or sequentially by sending a control signal from the electrophoresis device control unit 108. In sample introduction (S405), sample components are introduced into the migration path. In sample introduction (S405) in this embodiment, first, the capillary cathode end 227 is immersed in the sample held in the well of the sample plate 224 by the conveyor 225, and then the solenoid valve 213 is opened. This forms a current path, and the sample components are ready to be introduced into the migration path. Then, a pulse voltage is applied to the current path by the high-voltage power supply 204, and the sample components are introduced into the migration path. Finally, the capillary cathode end 227 is immersed in a cleaning solution in the cleaning container 222, and the capillary cathode end 227 contaminated by the sample is washed.
 次に泳動分析(S406)が行われる。このステップは、自動的に行われてもよいし、逐次、電気泳動装置制御部108から制御信号が送信されることによって行われてもよい。泳動分析(S406)では、電気泳動により、サンプル中に含まれる各サンプル成分が分離分析される。本実施例における泳動分析(S406)では、まず、搬送機225により、バッファ容器221内の緩衝液にキャピラリ陰極端227を浸し、通電路を形成する。次に、高圧電源204により、通電路に15kV前後の高電圧を印加し、泳動路に電界を発生させる。発生した電界により、泳動路内の各サンプル成分は、各サンプル成分の性質に依存した速度で検出部216へ移動する。つまり、サンプル成分は、その移動速度の差により分離される。そして、検出部216に到達したサンプル成分から順番に検出される。例えば、サンプルが、塩基長の異なるDNAを多数含む場合は、その塩基長により移動速度に差が生じ、塩基長の短いDNAから順に検出部216に到達する。各DNAには、その末端塩基配列に依存した蛍光色素が取り付けられている。検出部216に光源214から励起光が照射されると、サンプルから情報光、すなわちサンプルに依存した波長を有する蛍光が生じ、外部に放出される。この情報光を光学検出器215により検出する。泳動分析中は、光学検出器215では、一定の時間間隔でこの情報光を検出し、画像データをデータ解析装置112へ送信する。もしくは送信する情報量を減らすため、画像データではなく、画像データ中の一部の領域のみの輝度を送信してもよい。例えば、キャピラリ毎に、一定間隔の波長位置のみサンプリングされた輝度値を送信してもよい。この輝度値データは各キャピラリのスペクトル波形を表している。このスペクトル波形が記憶部104へ格納される。 Next, electrophoretic analysis (S406) is performed. This step may be performed automatically or sequentially by sending a control signal from the electrophoretic device control unit 108. In electrophoretic analysis (S406), each sample component contained in the sample is separated and analyzed by electrophoresis. In the electrophoretic analysis (S406) in this embodiment, first, the capillary cathode end 227 is immersed in the buffer solution in the buffer container 221 by the conveyor 225 to form a current path. Next, a high voltage of about 15 kV is applied to the current path by the high-voltage power supply 204 to generate an electric field in the electrophoretic path. Due to the generated electric field, each sample component in the electrophoretic path moves to the detection unit 216 at a speed that depends on the properties of each sample component. In other words, the sample components are separated due to the difference in their moving speed. Then, the sample components that reach the detection unit 216 are detected in order. 
For example, when a sample contains many DNAs with different base lengths, the migration speed differs depending on the base length, and the DNAs reach the detection unit 216 in order starting from the shortest base length. A fluorescent dye that depends on the terminal base sequence is attached to each DNA. When the detection unit 216 is irradiated with excitation light from the light source 214, information light, that is, fluorescence having a wavelength that depends on the sample, is generated from the sample and released to the outside. This information light is detected by the optical detector 215. During the electrophoretic analysis, the optical detector 215 detects this information light at regular time intervals and transmits image data to the data analysis device 112. Alternatively, in order to reduce the amount of information to be transmitted, the luminance of only a part of the image data may be transmitted instead of the image data. For example, luminance values sampled only at wavelength positions at regular intervals may be transmitted for each capillary. This luminance value data represents the spectral waveform of each capillary. This spectral waveform is stored in the memory unit 104.
 最後に、予定していた画像データを取得し終えたら電圧印加を停止し、泳動分析を終了する(S407)。以上が、図4における電気泳動処理(S301)の処理の一例である。 Finally, when the planned image data has been acquired, the voltage application is stopped and the electrophoretic analysis is terminated (S407). The above is an example of the electrophoretic process (S301) in FIG. 4.
 図5は、S305におけるベースコールのフローを示している。
 まず、ベースコール部107の解析区間検出部109が、補正された蛍光強度の時系列データから信号区間を検出する(ステップS501)。
 ベースコール部107は、検出した信号区間について解析を行い、サンプルの塩基配列を特定する(ステップS502)。
FIG. 5 shows the flow of base calling in S305.
First, the analysis interval detection unit 109 of the base calling unit 107 detects a signal interval from the time-series data of the corrected fluorescence intensity (step S501).
The base calling unit 107 analyzes the detected signal section and identifies the base sequence of the sample (step S502).
 図6は、S501における信号区間検出のフローを示している。S501における信号区間検出は、ステップS601からステップS604を含む。
 ステップS601 解析区間検出部109は、時系列データ全体を複数の小区間に分割する。その後、ステップS602に進む。
 ステップS602 解析区間検出部109は、小区間の1つを選択し、その区間に含まれる各信号の非信号特徴量を生成する。各信号とは、4つの塩基に対応する4つの蛍光強度データである。解析区間検出部109は、選択中の小区間の4つの蛍光強度データについて、それぞれ非信号特徴量を生成する。その後、ステップS603に進む。
FIG. 6 shows the flow of the signal section detection in S501. The signal section detection in S501 includes steps S601 to S604.
Step S601: The analysis interval detection unit 109 divides the entire time series data into a plurality of small intervals. Then, the process proceeds to step S602.
Step S602: The analysis interval detection unit 109 selects one of the small intervals and generates a non-signal feature for each signal included in that interval. The signals are the four fluorescence intensity traces corresponding to the four bases, so the analysis interval detection unit 109 generates one non-signal feature for each of the four traces in the selected small interval. Then, the process proceeds to step S603.
 ステップS603 解析区間検出部109は、選択中の小区間の非信号特徴量を決定する。具体的には、解析区間検出部109は、4つの蛍光強度データから求めた4つの非信号特徴量のうち、最小の特徴量を当該小区間の非信号特徴量とする。ステップS603の後、非信号特徴量を決定していない小区間が残っていれば、ステップS602に戻る。全ての小区間について非信号特徴量を決定したならば、ステップS604に進む。なお、前述のように、非信号特徴量の代わりに信号特徴量を用いる場合は、最大の信号特徴量を当該小区間の信号特徴量とし、上記と同様の流れで処理を行う。 Step S603: The analysis section detection unit 109 determines the non-signal feature of the selected subsection. Specifically, the analysis section detection unit 109 sets the smallest feature of the four non-signal features calculated from the four fluorescence intensity data as the non-signal feature of the subsection. After step S603, if there are still subsections remaining for which non-signal features have not been determined, the process returns to step S602. Once non-signal features have been determined for all subsections, the process proceeds to step S604. Note that, as described above, when signal features are used instead of non-signal features, the largest signal feature is set as the signal feature of the subsection, and processing is performed in the same manner as above.
 ステップS604 解析区間検出部109は、小区間のそれぞれについて決定した非信号特徴量を用いて、非信号区間と信号区間との境界を決定し、処理を終了する。 Step S604: The analysis section detection unit 109 uses the non-signal features determined for each small section to determine the boundary between the non-signal section and the signal section, and ends the process.
 ここで、非信号特徴量について説明する。図7は非信号区間の特徴を説明する図である。図8は、信号区間の特徴を説明する図である。図7及び図8において、Dye1~Dye4は4つの塩基に対応する4つの蛍光色素を示す。図7及び図8において、横軸は時間であり、縦軸は蛍光強度である。 Here, we will explain the non-signal features. Figure 7 is a diagram explaining the features of the non-signal section. Figure 8 is a diagram explaining the features of the signal section. In Figures 7 and 8, Dye1 to Dye4 indicate four fluorescent dyes corresponding to four bases. In Figures 7 and 8, the horizontal axis is time and the vertical axis is fluorescence intensity.
Comparing the fluorescence intensity data in FIGS. 7 and 8, the non-signal section contains many small bumps and flat stretches, whereas the signal section contains few. In particular, because the fluorescence intensity changes smoothly in the signal section, flat stretches are markedly rarer there.
 そこで、解析区間検出部109は、蛍光強度データの3つの形状パターン(極大、極小、平坦)の出現回数に基づいて非信号特徴量を生成する。形状パターンからの非信号特徴量の生成は、ステップS602に対応する。 Then, the analysis section detection unit 109 generates non-signal features based on the number of occurrences of the three shape patterns (maximum, minimum, flat) in the fluorescence intensity data. The generation of non-signal features from the shape patterns corresponds to step S602.
 図9は、形状パターンからの非信号特徴量の生成の説明図である。
 形状パターン「平坦」は、隣り合う点の強度差が±h1とする。すなわち、次の式を満たす場合である。なお、ここでいう点とは電気泳動信号の個々のサンプル値であり、前述の光学検出器215がデータを取得する時間間隔、もしくはサンプリングレートによって定まる。この時間間隔は予めユーザ、もしくは装置のデフォルト値として定められる。
  -h1≦(y[k+1]-y[k])<=h1
 形状パターン「極大」は、次の式を満たす場合である。
  y[k]-y[k-1]>h2 && y[k]-y[k+1]>h2
 形状パターン「極小」は、次の式を満たす場合である。
  y[k-1]-y[k]>h3 && y[k+1]-y[k]>h3
 上記のh1、h2、h3はサンプリングレートや電気泳動電圧に応じて予め定めた値としてよい。
FIG. 9 is an explanatory diagram of the generation of non-signal features from shape patterns.
The shape pattern "flat" is the case where the intensity difference between adjacent points is within ±h1, that is, where the following expression is satisfied. The points referred to here are the individual sample values of the electrophoretic signal, and their spacing is determined by the time interval (sampling rate) at which the optical detector 215 acquires data. This time interval is set in advance by the user or as a default value of the device.
 -h1 ≤ y[k+1] - y[k] ≤ h1
The shape pattern "maximum" is the case where the following expression is satisfied.
 y[k] - y[k-1] > h2 && y[k] - y[k+1] > h2
The shape pattern "minimum" is the case where the following expression is satisfied.
 y[k-1] - y[k] > h3 && y[k+1] - y[k] > h3
The values h1, h2, and h3 above may be determined in advance according to the sampling rate and the electrophoresis voltage.
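The three shape-pattern conditions can be sketched in a few lines of Python. This is only an illustrative reading of the expressions above: the function name `count_shape_patterns` and the dictionary return value are assumptions, not part of the disclosure.

```python
def count_shape_patterns(y, h1, h2, h3):
    """Count "flat", "maximum", and "minimum" patterns in one intensity trace.

    y is a sequence of sample values; h1, h2, h3 are the thresholds from the
    text. The name and the return format are hypothetical.
    """
    flat = maxima = minima = 0
    # "flat": the difference between adjacent samples lies within +/- h1.
    for k in range(len(y) - 1):
        if -h1 <= y[k + 1] - y[k] <= h1:
            flat += 1
    # "maximum" / "minimum": sample k stands out from both neighbours.
    for k in range(1, len(y) - 1):
        if y[k] - y[k - 1] > h2 and y[k] - y[k + 1] > h2:
            maxima += 1
        if y[k - 1] - y[k] > h3 and y[k + 1] - y[k] > h3:
            minima += 1
    return {"flat": flat, "max": maxima, "min": minima}
```

For a trace such as `[0, 0, 5, 0, 0, -5, 0]` with h1 = 1 and h2 = h3 = 2, the sketch finds two flat steps, one maximum, and one minimum.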
 解析区間検出部109は、3つのパターンの出現回数を、当該区間における当該蛍光強度データの非信号特徴量とする。なお、区間長で正規化し、3つのパターンの出現頻度を非信号特徴量としてもよい。 The analysis interval detection unit 109 regards the number of times the three patterns appear as non-signal features of the fluorescence intensity data in that interval. Note that it is also possible to normalize by the interval length and regard the frequency of appearance of the three patterns as non-signal features.
 また、3つのパターンのうち、形状パターン「平坦」は、信号区間に現れにくく、非信号区間の特徴として重要度が高い。そこで、形状パターン「平坦」は他の形状パターンよりも大きい重みをつけて非信号特徴量を生成してもよい。 Furthermore, of the three patterns, the "flat" shape pattern is unlikely to appear in signal sections and is therefore highly important as a feature of non-signal sections. Therefore, the "flat" shape pattern may be weighted more heavily than the other shape patterns to generate non-signal features.
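As a minimal sketch of the normalization and weighting options just described, the pattern counts can be combined into a single non-signal feature. The weight values and the function name `non_signal_feature` are assumptions chosen for illustration; the text states only that "flat" may be weighted more heavily than the other patterns.

```python
def non_signal_feature(counts, interval_length, w_flat=2.0, w_max=1.0, w_min=1.0):
    """Combine pattern counts into one non-signal feature for a trace.

    counts: dict with keys "flat", "max", "min" (occurrence counts).
    interval_length: number of samples in the interval, used to turn
    counts into a frequency. The default weights are hypothetical.
    """
    if interval_length <= 0:
        raise ValueError("interval_length must be positive")
    weighted = (w_flat * counts["flat"]
                + w_max * counts["max"]
                + w_min * counts["min"])
    return weighted / interval_length
```

Setting all weights to 1.0 and `interval_length` to 1 recovers the plain occurrence count.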
 図10は、S603における区間の非信号特徴量決定の説明図である。図10では、区間内の蛍光強度データの非信号特徴量の最小値を、その区間の非信号特徴量としている。図10に示したグラフは、Dye1~Dye4の蛍光強度データである。F(Dye1)~F(Dye4)は、Dye1~Dye4の蛍光強度データから生成した非信号特徴量である。 FIG. 10 is an explanatory diagram of the determination of the non-signal feature of a section in S603. In FIG. 10, the minimum value of the non-signal feature of the fluorescence intensity data within the section is set as the non-signal feature of that section. The graph shown in FIG. 10 is the fluorescence intensity data of Dye1 to Dye4. F(Dye1) to F(Dye4) are the non-signal feature generated from the fluorescence intensity data of Dye1 to Dye4.
The analysis section detection unit 109 obtains the non-signal feature Fq of section q as Min(F(Dye1), F(Dye2), F(Dye3), F(Dye4)). In FIG. 10, the fluorescence intensity data for Dye1 is the smoothest, so F(Dye1) is smaller than F(Dye2) to F(Dye4). Therefore, Fq = F(Dye1).
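Steps S601 to S603 can be sketched as follows, assuming fixed-length small intervals and a caller-supplied per-trace feature function. The name `interval_features`, the dictionary layout of the traces, and the fixed interval length are illustrative assumptions.

```python
def interval_features(traces, interval_length, feature_fn):
    """Return one non-signal feature Fq per small interval.

    traces: dict mapping a dye name to its list of samples.
    feature_fn: function that maps one trace segment to a feature value.
    For each interval, the minimum over the dyes is kept, mirroring
    Fq = Min(F(Dye1), ..., F(Dye4)).
    """
    n = min(len(t) for t in traces.values())
    features = []
    for start in range(0, n, interval_length):   # S601: split into intervals
        end = min(start + interval_length, n)
        per_dye = [feature_fn(t[start:end]) for t in traces.values()]  # S602
        features.append(min(per_dye))            # S603: smallest feature wins
    return features
```

Taking the minimum means that a single smooth dye trace is enough to mark an interval as signal-like, even if the other three traces are noisy.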
 図11及び図12は、S604における信号境界の決定の説明図である。図11に示すように、解析区間検出部109は、各区間の非信号特徴量をプロットし、平滑化と補間を行う。これにより、非信号特徴量の細かな変動による影響を抑えることができる。解析区間検出部109は、平滑化及び補間を行った非信号特徴量が、閾値を超えた時刻を信号区間境界とする。 FIGS. 11 and 12 are explanatory diagrams of the determination of the signal boundary in S604. As shown in FIG. 11, the analysis interval detection unit 109 plots the non-signal features of each interval and performs smoothing and interpolation. This makes it possible to suppress the effects of small fluctuations in the non-signal features. The analysis interval detection unit 109 determines the time at which the smoothed and interpolated non-signal features exceed a threshold value as the signal interval boundary.
 図12に示すように、解析区間検出部109は、連続して閾値を超えた区間が一定マージン以上ある時刻を境界としてもよい。これにより、境界付近の変動の影響に対してロバストにできる。 As shown in FIG. 12, the analysis interval detection unit 109 may determine the boundary to be the time when the interval in which the threshold value is exceeded continuously is equal to or exceeds a certain margin. This makes it possible to be robust against the effects of fluctuations near the boundary.
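The boundary decision of FIGS. 11 and 12 can be sketched as a smoothing pass followed by a search for a sustained threshold crossing. The moving-average smoother, the function names, and the choice to report the first interval of the run are assumptions; interpolation between intervals is omitted for brevity.

```python
def moving_average(values, window=3):
    """Smooth a list with a centred moving average (shorter at the edges)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def first_sustained_crossing(values, threshold, margin):
    """Index where values start to stay above threshold for `margin` entries.

    Returns None if no run of that length exists. With margin=1 this is
    the plain "first time the threshold is exceeded" rule of FIG. 11.
    """
    run = 0
    for i, v in enumerate(values):
        run = run + 1 if v > threshold else 0
        if run >= margin:
            return i - margin + 1
    return None
```

Requiring a run of `margin` intervals makes the sketch ignore an isolated spike in the feature near the boundary.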
FIG. 13 is an explanatory diagram of determining the threshold from the distribution of the non-signal features of the sections. The distribution of the non-signal features is assumed to be bimodal: signal sections give low values and non-signal sections give high values. The analysis section detection unit 109 can determine the threshold from this distribution. For example, X% of Fp, the feature value at the peak of the higher (non-signal) mode, can be used as the threshold. Alternatively, the value at which the slope of that mode levels off to some degree may be used as the threshold.
 ただしユーザの設定によっては、取得した電気泳動データに、非信号区間が含まれない場合もあり得るので、常に予め定めた固定値を用いてもよい。
 もしくは特徴量の分布に二峰性があるかないかを判定し、動的に決めるか、固定値を求めるかを決めてもよい。
However, depending on the user's settings, the acquired electrophoretic data may contain no non-signal section, so a predetermined fixed value may always be used as the threshold.
Alternatively, the unit may first determine whether the distribution of the feature is bimodal and then decide whether to set the threshold dynamically or to use a fixed value.
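One possible reading of this threshold selection is sketched below: build a coarse histogram of the interval features, treat the distribution as bimodal when the histogram has at least two local peaks, and otherwise fall back to a fixed value. The bimodality heuristic, the bin count, and the names `threshold_from_distribution`, `fraction`, and `fallback` are all assumptions; the text does not specify how bimodality is judged.

```python
def threshold_from_distribution(features, bins=10, fraction=0.5, fallback=1.0):
    """Pick a threshold as a fraction (X%) of Fp, or return a fixed fallback.

    Fp is taken as the centre of the highest-valued histogram peak.
    A crude two-peak test stands in for a real bimodality check.
    """
    lo, hi = min(features), max(features)
    if hi == lo:
        return fallback                       # degenerate: a single value
    width = (hi - lo) / bins
    counts = [0] * bins
    for f in features:
        counts[min(int((f - lo) / width), bins - 1)] += 1
    # Local maxima of the histogram (a plateau counts once, at its left edge).
    peaks = [i for i in range(bins)
             if counts[i] > 0
             and (i == 0 or counts[i] > counts[i - 1])
             and (i == bins - 1 or counts[i] >= counts[i + 1])]
    if len(peaks) < 2:
        return fallback                       # not bimodal: use fixed value
    fp = lo + (peaks[-1] + 0.5) * width       # centre of the non-signal mode
    return fraction * fp
```

A production implementation would likely smooth the histogram first, since noisy data can produce spurious local peaks.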
 図14~図18は、非信号特徴量と他の特徴量とを組み合わせて用いる場合の説明図である。
 図14では、閾値を信号レベルに合わせて変化させている。すなわち、次の(1)(2)に従って閾値を決定している。
  (1)ある程度の信号強度以上であれば常に信号区間と判定する。
  (2)ある程度の信号強度以下であれば信号強度に合わせて閾値を下げる。
 ここで、(2)の範囲では、信号強度が低いほど非信号区間と判定しやすくしている。
 この他、差分信号を用いてもよい。差分が大きいほど閾値を下げることで、信号区間の立ち上がりや立下りを検出できる。
FIGS. 14 to 18 are explanatory diagrams for cases in which the non-signal feature is used in combination with other features.
In FIG. 14, the threshold is varied according to the signal level. That is, the threshold is determined according to the following rules (1) and (2).
 (1) If the signal strength is at or above a certain level, the section is always determined to be a signal section.
 (2) If the signal strength is at or below a certain level, the threshold is lowered according to the signal strength.
Here, in range (2), the lower the signal strength, the more readily the section is determined to be a non-signal section.
Alternatively, a differential signal may be used. By lowering the threshold as the difference increases, the rising and falling edges of the signal section can be detected.
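Rules (1) and (2) can be sketched as a threshold that depends on the signal level. The linear scaling below `strong_level`, the convention that a feature above the threshold means "non-signal", and the names `adaptive_threshold` and `is_signal_section` are assumptions; FIG. 14 may use a different curve.

```python
def adaptive_threshold(signal_level, base_threshold, strong_level):
    """Threshold on the non-signal feature as a function of signal level."""
    if signal_level >= strong_level:
        # Rule (1): strong signal is always a signal section, so the
        # threshold is made unreachable.
        return float("inf")
    # Rule (2): weaker signal gives a lower threshold (linear scaling is
    # a hypothetical choice), so non-signal is declared more readily.
    return base_threshold * max(signal_level, 0.0) / strong_level


def is_signal_section(feature, signal_level, base_threshold, strong_level):
    """True if the interval is judged to be a signal section."""
    return feature <= adaptive_threshold(signal_level, base_threshold, strong_level)
```

With `base_threshold=10` and `strong_level=50`, an interval with feature 6 is treated as signal at level 40 (threshold 8) but as non-signal at level 25 (threshold 5).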
 図15では、非信号特徴量と他の特徴を含めた特徴量ベクトルを信号区間識別器121に入力として与え、出力として信号区間を得ている。他の特徴としては、信号強度や差分信号など任意の特徴を用いることができる。
 信号区間識別器121としては、DNN(Deep Neural Network)、SVM(Support-Vector Machine)やRandomForest等、任意のモデルを用いることができる。
 出力は、信号区間か非信号区間かの判別結果でもよいし、信号区間である確率(尤度)等であってもよい。
In FIG. 15, a feature vector containing the non-signal feature and other features is given as input to a signal section identifier 121, and the signal section is obtained as output. Any feature, such as signal intensity or a differential signal, can be used as the other features.
Any model, such as a deep neural network (DNN), a support-vector machine (SVM), or a random forest, can be used as the signal section identifier 121.
The output may be a determination of whether the section is a signal section or a non-signal section, or may be the probability (likelihood) that it is a signal section.
 図16では、複数の識別器を組み合わせたアンサンブル学習を示している。図16に示した例では、第1の信号区間識別器122に第1の特徴量ベクトルを入力として与え、第2の信号区間識別器123に第2の特徴量ベクトルを入力として与えている。そして、信号区間識別器122~123の出力を判別器124の入力として、最終的に出力を得ている。 FIG. 16 shows ensemble learning that combines multiple classifiers. In the example shown in FIG. 16, a first feature vector is provided as input to the first signal section classifier 122, and a second feature vector is provided as input to the second signal section classifier 123. The outputs of the signal section classifiers 122-123 are then input to the discriminator 124, which finally obtains an output.
 第1の特徴量ベクトルは、例えば非信号特徴量と信号強度を含み、第2の特徴量ベクトルは、例えば差分信号を含む。
 判別器124は、多数決等により出力を決定する。この他のバギング、ブースティング、スタッキングを用いてもよい。
The first feature vector includes, for example, a non-signal feature and a signal intensity, and the second feature vector includes, for example, a differential signal.
The discriminator 124 determines the output by majority vote or the like. Other methods, such as bagging, boosting, and stacking, may also be used.
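The majority-vote option for the discriminator 124 can be sketched as follows, assuming each signal section identifier outputs one boolean per interval (True for a signal section). The function name and the tie rule (a tie counts as non-signal) are assumptions.

```python
def majority_vote(predictions):
    """Combine per-interval boolean outputs of several identifiers.

    predictions: list with one list of booleans per identifier, all of the
    same length. An interval is labelled signal (True) only when strictly
    more than half of the identifiers say so.
    """
    n = len(predictions)
    return [sum(column) * 2 > n for column in zip(*predictions)]
```

With an even number of identifiers, as in the two-identifier example of FIG. 16, ties are possible, so a real system would need an explicit tie rule or would use the likelihood outputs instead.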
 図17は、装置上で学習を行う構成の説明図である。信号区間情報格納部125は、遺伝子解析装置に設けられた記憶部である。信号区間情報格納部125は、ユーザが測定を行う都度、信号区間情報(信号区間か否かを示すラベル)と特徴量ベクトルを対応付けて記憶する。解析区間検出部109は、特徴量ベクトルとラベルを信号区間情報格納部125から読み出し、信号区間識別器に与えて教師有り学習を行うことができる。 FIG. 17 is an explanatory diagram of the configuration for learning on the device. The signal section information storage unit 125 is a memory unit provided in the genetic analysis device. Each time the user performs a measurement, the signal section information storage unit 125 stores the signal section information (a label indicating whether it is a signal section or not) in association with the feature vector. The analysis section detection unit 109 can read the feature vector and label from the signal section information storage unit 125 and provide them to a signal section classifier to perform supervised learning.
 図18は、ユーザの調整結果を反映した学習の説明図である。ユーザが解析区間を調整して再解析を行った場合、その操作と結果を所定の情報格納部に格納し、次回学習に反映する。これにより、信号区間の検出精度を向上できる。図18では、信号区間と非信号区間の境界をユーザが操作する例を示したが、他のパラメータの調整結果を用いることもできる。例えば、非信号特徴量パラメータ(平坦部の条件、極大点・極小点の条件)、閾値設定、信号境界判定に関するパラメータをユーザが調整したならば、その操作を結果を記憶して学習すればよい。 FIG. 18 is an explanatory diagram of learning that reflects the results of user adjustments. When the user adjusts the analysis section and performs reanalysis, the operation and the results are stored in a specified information storage unit and reflected in the next learning. This improves the accuracy of signal section detection. Although FIG. 18 shows an example in which the user operates the boundary between the signal section and the non-signal section, the results of adjustments to other parameters can also be used. For example, if the user adjusts non-signal feature parameters (conditions for flat sections, conditions for maximum and minimum points), threshold settings, and parameters related to signal boundary determination, the results of that operation can be stored and learned.
The base calling unit 107 of the genetic analysis device 101 may also reanalyze fluorescence intensity data other than the data generated from the electrophoresis results by the fluorescence intensity calculation unit 110. For example, such fluorescence intensity data may be stored in the storage unit 104 or received over a communication cable. In this case, the user can adjust the analysis section by editing the fluorescence intensity data. FIG. 19 is an explanatory diagram of correcting the analysis section by editing the fluorescence intensity data. The fluorescence intensity data shown in the upper part of the figure is edited, as shown in the lower part, so that the data near the beginning contains more flat portions; this increases the non-signal feature quantity described above and moves the signal start position. Note that setting the fluorescence intensity near the beginning to zero would change the intensity so much that the base calling result near the beginning could differ from the result before the edit. For this reason, in the lower part of the figure, flat portions are added while keeping the signal intensity within a certain range (the gray band in the lower part of the figure) of the original intensity, so that only the signal start position changes. Maximum and minimum points may be added as well as flat portions. Such editing may be performed with an external tool, or the genetic analysis device 101 may itself provide a function for editing fluorescence intensity data.
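The edit described for FIG. 19 can be illustrated with a minimal Python sketch. The function name `flatten_head`, its parameters, and the choice of pulling each sample toward the mean of the head are assumptions made for illustration; the text specifies only that flat portions are added while the intensity stays within a certain range of the original.

```python
from typing import List, Sequence


def flatten_head(trace: Sequence[float], head_len: int, band: float) -> List[float]:
    """Return an edited copy of `trace` whose first `head_len` samples are
    pulled toward a common level (hypothetical choice: the mean of the head),
    while each edited sample stays within +/- `band` of its original value."""
    head = list(trace[:max(head_len, 0)])
    if not head:
        return list(trace)
    target = sum(head) / len(head)
    # Clamp the flat target into the allowed band around each original sample.
    edited = [min(max(target, v - band), v + band) for v in head]
    return edited + list(trace[len(head):])
```

With a wide band the head becomes completely flat; with a narrow band the samples only move part of the way, which keeps the edited intensity close to the original at the cost of a less flat result.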
As described above, the disclosed genetic analysis device 101 includes the fluorescence intensity calculation unit 110, which serves as an acquisition unit that acquires time-series data indicating the results of electrophoresis of a sample, and the base calling unit 107, which serves as an analysis unit that analyzes the base sequence of the sample from the time-series data. The time-series data includes a plurality of fluorescence intensity data corresponding to a plurality of bases. The analysis unit divides the time-series data into a plurality of sections; generates, for each of the plurality of fluorescence intensity data, a feature quantity indicating the occurrence frequency of at least one of a maximum portion, a minimum portion, and a flat portion of the fluorescence intensity data in each section; determines a section feature quantity from among the plurality of feature quantities generated for the plurality of fluorescence intensity data, based on the magnitude relationship between the feature quantities; and uses the section feature quantity to detect a signal region, which is the region of the time-series data to be analyzed for the base sequence.
According to this configuration, a signal section (signal region) can be detected with high accuracy from time-series data indicating the results of electrophoresis.
For example, a signal section that includes low-intensity signals can be detected accurately. A signal section can also be detected accurately when the signal intensity varies widely. Specifically, the detection accuracy of the signal section improves for data containing an unintended, sudden high-intensity signal caused by sample pretreatment (a dye blob), data containing the high-intensity signal that appears at the end of a PCR product, and data in which the signal intensity is attenuated because of a special sample or pretreatment.
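The section-wise feature generation summarized above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the tolerance `flat_tol`, and the specific tests for "flat", "maximum", and "minimum" samples are all assumptions.

```python
from typing import List, Sequence


def non_signal_feature(trace: Sequence[float], flat_tol: float = 1.0) -> float:
    """Fraction of interior samples that are a local maximum, a local
    minimum, or flat (both neighbouring differences within `flat_tol`)."""
    interior = len(trace) - 2
    if interior <= 0:
        return 0.0
    hits = 0
    for i in range(1, len(trace) - 1):
        left = trace[i] - trace[i - 1]
        right = trace[i + 1] - trace[i]
        is_flat = abs(left) <= flat_tol and abs(right) <= flat_tol
        is_max = left > flat_tol and right < -flat_tol
        is_min = left < -flat_tol and right > flat_tol
        if is_flat or is_max or is_min:
            hits += 1
    return hits / interior


def split_sections(trace: Sequence[float], width: int) -> List[Sequence[float]]:
    """Cut one trace into consecutive sections of `width` samples."""
    return [trace[i:i + width] for i in range(0, len(trace), width)]


def features_per_section(channels: Sequence[Sequence[float]], width: int,
                         flat_tol: float = 1.0) -> List[List[float]]:
    """Return features[section][channel] for equally long channel traces."""
    per_channel = [[non_signal_feature(seg, flat_tol)
                    for seg in split_sections(ch, width)] for ch in channels]
    return [list(col) for col in zip(*per_channel)]
```

In this sketch a rapidly oscillating or flat section scores close to 1, while a section dominated by a smooth rise or fall scores close to 0.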
The analysis unit also determines, as the section feature quantity, the smallest of the feature quantities of the plurality of fluorescence intensity data.
In this way, the signal section can be detected by identifying the non-signal sections, using shape patterns that appear characteristically in non-signal sections rather than in signal sections.
Furthermore, because of the nature of electrophoresis, only one base is present at any given position across the plurality of fluorescence intensity data. Selecting the most signal-like feature quantity among the plurality of fluorescence intensity data (that is, the smallest non-signal feature quantity) as the representative feature quantity of the section therefore exploits this property and allows the signal section to be detected with high accuracy.
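The selection of the section feature quantity can be written in a few lines. The sketch below assumes the per-channel non-signal feature quantities are already available as `features[section][channel]`; the function name and data layout are illustrative assumptions.

```python
from typing import List, Sequence


def section_feature(features: Sequence[Sequence[float]]) -> List[float]:
    """For each section, keep the smallest per-channel non-signal feature.

    A section is treated as noise only if every channel looks like noise,
    because a real base peak appears in just one channel at a given position.
    """
    return [min(row) for row in features]
```

For example, a section whose four channels score 0.9, 0.8, 0.1, and 0.7 is represented by 0.1, since one channel carries a signal-like shape.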
The analysis unit also compares the feature quantity of a section with a threshold to determine whether the section is a non-signal section. The threshold may be a predetermined fixed value or a value obtained from the distribution of the section feature quantities.
With this configuration, the signal section can be detected accurately by taking the distribution of the non-signal feature quantities into account.
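The text does not state how the threshold is derived from the distribution. As one possible illustration, the sketch below uses a simple iterative two-class mean split; this method, the function names, and the `iterations` parameter are assumptions, and a fixed threshold can be passed instead.

```python
from typing import List, Optional, Sequence


def distribution_threshold(values: Sequence[float], iterations: int = 20) -> float:
    """Split the values into a low and a high group and return the midpoint
    of the two group means (assumed method, not taken from the source)."""
    t = (min(values) + max(values)) / 2
    for _ in range(iterations):
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        if not low or not high:
            break
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if new_t == t:
            break
        t = new_t
    return t


def is_non_signal(values: Sequence[float],
                  threshold: Optional[float] = None) -> List[bool]:
    """Flag each section as non-signal when its feature exceeds the threshold."""
    t = distribution_threshold(values) if threshold is None else threshold
    return [v > t for v in values]
```

Passing `threshold=0.5` reproduces the fixed-value case, while omitting it lets the threshold follow the data of the current run.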
In addition, when a certain number of consecutive sections have a feature quantity greater than the threshold, the analysis unit takes the boundary between that run of consecutive sections and the adjacent signal region as the boundary between the signal region and the non-signal region.
This configuration suppresses the influence of fluctuations near the boundaries of the signal section.
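The run-length rule can be sketched as follows, assuming each section has already been flagged as non-signal (`True`) or signal (`False`). The function name, the `min_run` parameter, and the convention of reporting a boundary as the index of the first section after it are illustrative assumptions.

```python
from typing import List, Sequence


def find_boundaries(non_signal: Sequence[bool], min_run: int = 3) -> List[int]:
    """Return boundary positions between signal and non-signal regions.

    Only runs of at least `min_run` consecutive non-signal sections count;
    shorter runs are ignored so isolated fluctuations do not split a signal
    region. A boundary is reported as the index of the section after it.
    """
    boundaries: List[int] = []
    n = len(non_signal)
    i = 0
    while i < n:
        if not non_signal[i]:
            i += 1
            continue
        j = i
        while j < n and non_signal[j]:
            j += 1
        # The run of non-signal sections covers indices i .. j-1.
        if j - i >= min_run:
            if i > 0:
                boundaries.append(i)   # signal region ends before section i
            if j < n:
                boundaries.append(j)   # signal region starts at section j
        i = j
    return boundaries
```

In the flag sequence `[T, T, T, F, F, T, F, F, T, T, T, T]` with `min_run=3`, the single non-signal section at index 5 is ignored and the signal region spans sections 3 to 7.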
The analysis unit may also detect the signal region using a discrimination model that takes the feature quantity of the section as one of its inputs.
With this configuration, the signal section can be detected flexibly from the non-signal feature quantity together with other feature quantities.
Furthermore, the analysis unit generates the feature quantity of the fluorescence intensity data by applying a larger weight to flat portions of the fluorescence intensity data than to its maximum and minimum portions.
By emphasizing the shape pattern that is characteristic of non-signal sections in this way, the signal section can be detected accurately.
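A weighted variant of the feature quantity can be sketched as below. The weights `w_flat` and `w_ext`, the tolerance `flat_tol`, and the normalization to the range 0 to 1 are assumptions for illustration; the text requires only that flat portions receive a larger weight than maximum and minimum portions.

```python
from typing import Sequence


def weighted_non_signal_feature(trace: Sequence[float], flat_tol: float = 1.0,
                                w_flat: float = 2.0, w_ext: float = 1.0) -> float:
    """Weighted frequency of flat samples and local extrema in one section.

    Flat samples contribute `w_flat`, local maxima and minima contribute
    `w_ext` (with w_flat > w_ext); the score is normalized to [0, 1].
    """
    interior = len(trace) - 2
    if interior <= 0:
        return 0.0
    score = 0.0
    for i in range(1, len(trace) - 1):
        left = trace[i] - trace[i - 1]
        right = trace[i + 1] - trace[i]
        if abs(left) <= flat_tol and abs(right) <= flat_tol:
            score += w_flat
        elif (left > flat_tol and right < -flat_tol) or \
                (left < -flat_tol and right > flat_tol):
            score += w_ext
    return score / (max(w_flat, w_ext) * interior)
```

With these default weights a completely flat section scores 1.0, a zigzag section scores 0.5, and a monotonic ramp scores 0.0.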
The analysis unit can also use second fluorescence intensity data, obtained by editing the fluorescence intensity data, to detect a second signal section that differs from the signal section detected from the original fluorescence intensity data. Here, the second fluorescence intensity data deviates from the intensity of the original fluorescence intensity data by no more than a certain range.
Detecting the second signal section from such second fluorescence intensity data makes it possible to change the analysis section while limiting the effect on the analysis results.
The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above are described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations that include every element described. In addition to removing elements, elements may also be replaced or added.
For example, the embodiments above illustrate a configuration in which the device that learns from data and the device that updates the signal section discriminator are integrated, but the learning and the updating of the signal section discriminator may be performed by separate devices.
Reference Signs List 101: Genetic analysis device, 102: Central control unit, 103: User interface unit, 104: Storage unit, 105: Electrophoresis device, 106: Sample information setting unit, 107: Base calling unit, 108: Electrophoresis device control unit, 109: Analysis section detection unit, 110: Fluorescence intensity calculation unit, 112: Data analysis device, 121 to 123: Signal section discriminator, 124: Discriminator, 125: Signal section information storage unit

Claims (8)

  1.  A genetic analysis device comprising:
    an acquisition unit that acquires time-series data indicating the results of electrophoresis of a sample; and
    an analysis unit that analyzes the base sequence of the sample from the time-series data,
    wherein the time-series data includes a plurality of fluorescence intensity data corresponding to a plurality of bases, and
    the analysis unit:
    divides the time-series data into a plurality of sections;
    generates, for each of the plurality of fluorescence intensity data, a feature quantity indicating an occurrence frequency of at least one of a maximum portion, a minimum portion, and a flat portion of the fluorescence intensity data in each section;
    determines a section feature quantity from among the plurality of feature quantities generated for the plurality of fluorescence intensity data, based on a magnitude relationship between the feature quantities; and
    detects, using the section feature quantity, a signal region that is the region of the time-series data to be analyzed for the base sequence.
  2.  The genetic analysis device according to claim 1,
    wherein the analysis unit determines, as the section feature quantity, the smallest feature quantity among the plurality of feature quantities.
  3.  The genetic analysis device according to claim 1,
    wherein the analysis unit compares the section feature quantity with a threshold to determine whether the section is a signal region, and
    the threshold is a predetermined fixed value or a value obtained from a distribution of the section feature quantity.
  4.  The genetic analysis device according to claim 3,
    wherein, when a certain number of consecutive sections have a section feature quantity greater than the threshold, the analysis unit takes the boundary between the consecutive sections and a signal region adjacent to the consecutive sections as the boundary between the signal region and a non-signal region.
  5.  The genetic analysis device according to claim 1,
    wherein the analysis unit detects the signal region using a discrimination model that takes the feature quantity of the section as one of its inputs.
  6.  The genetic analysis device according to claim 1,
    wherein the analysis unit generates the feature quantity by applying a larger weight to flat portions of the fluorescence intensity data than to maximum portions and minimum portions of the fluorescence intensity data.
  7.  The genetic analysis device according to claim 1,
    wherein the analysis unit uses second fluorescence intensity data, obtained by editing the fluorescence intensity data, to detect a second signal section different from that of the fluorescence intensity data, and
    the second fluorescence intensity data deviates from the intensity of the fluorescence intensity data by an amount within a certain range.
  8.  A genetic analysis method comprising the steps of:
    acquiring time-series data that indicates the results of electrophoresis of a sample and includes a plurality of fluorescence intensity data corresponding to a plurality of bases;
    dividing the time-series data into a plurality of sections;
    generating, for each of the plurality of fluorescence intensity data, a feature quantity indicating the degree of non-signal of the fluorescence intensity data in each section, based on an occurrence frequency of at least one of a maximum portion, a minimum portion, and a flat portion of the fluorescence intensity data in each section;
    determining a section feature quantity from among the plurality of feature quantities generated for the plurality of fluorescence intensity data, based on a magnitude relationship between the feature quantities; and
    detecting, using the section feature quantity, a signal region that is the region of the time-series data to be analyzed for the base sequence.
PCT/JP2023/014893 2023-04-12 2023-04-12 Genetic analysis device and genetic analysis method WO2024214217A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/014893 WO2024214217A1 (en) 2023-04-12 2023-04-12 Genetic analysis device and genetic analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/014893 WO2024214217A1 (en) 2023-04-12 2023-04-12 Genetic analysis device and genetic analysis method

Publications (1)

Publication Number Publication Date
WO2024214217A1 true WO2024214217A1 (en) 2024-10-17

Family

ID=93059143

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/014893 WO2024214217A1 (en) 2023-04-12 2023-04-12 Genetic analysis device and genetic analysis method

Country Status (1)

Country Link
WO (1) WO2024214217A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62225956A (en) * 1986-03-26 1987-10-03 Fuji Photo Film Co Ltd Signal processing for determining base sequence nucleic acid
JPS63210769A (en) * 1987-02-27 1988-09-01 Shimadzu Corp Data processor
JPH11118760A (en) * 1997-10-14 1999-04-30 Hitachi Ltd Method for analyzing cataphoresis pattern of nucleic acid piece
JP2003079366A (en) * 2001-09-11 2003-03-18 Hitachi Ltd Information processing system for assisting primer walking
JP2012177568A (en) * 2011-02-25 2012-09-13 Arkray Inc Data processing device, data processing method, and data processing program
JP2018042560A (en) * 2010-07-05 2018-03-22 ソニー株式会社 Living organism information processing device and method as well as program
