CN102880621B - The method and apparatus extracting similar Time Sub-series - Google Patents
The method and apparatus extracting similar Time Sub-series Download PDFInfo
- Publication number
- CN102880621B CN102880621B CN201110203979.9A CN201110203979A CN102880621B CN 102880621 B CN102880621 B CN 102880621B CN 201110203979 A CN201110203979 A CN 201110203979A CN 102880621 B CN102880621 B CN 102880621B
- Authority
- CN
- China
- Prior art keywords
- time
- series
- sequence
- consensus sequence
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Links
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the method and apparatus for extracting the Time Sub-series similar to consensus sequence from time serieses.Wherein, the method for extracting the Time Sub-series similar to consensus sequence from time serieses includes:Variation tendency according to time serieses and consensus sequence enters line translation to time serieses and consensus sequence respectively;Time serieses after conversion are divided into multiple Time Sub-series;For each Time Sub-series in multiple Time Sub-series, calculate the editing distance between the consensus sequence after each Time Sub-series and conversion;And the Time Sub-series similar to consensus sequence are extracted from multiple Time Sub-series according to the editing distance calculating.
Description
Technical field
The present invention relates to time Series Processing field is and in particular to be used for similar to consensus sequence from time serieses extraction
The method and apparatus of Time Sub-series.
Background technology
Similar sub-sequence extracts to be meaned to extract the subsequence similar with given benchmark.It is should that similar Time Sub-series extract
Basic technology for tasks such as time series forecasting, cluster, abnormality detection.For example, need in time series forecasting to extract phase
Like subsequence, for training.
In general, similar sub-sequence extracts is related to two steps.First step is time serieses segmentation, i.e. by when
Between sequences segmentation be some.Second step is to extract similar sub-sequence based on Similarity measures.For existing phase
It is primarily present three below problem like subsequence extractive technique:(1) rigid segmentation makes candidate's setting little;(2) expend storage empty
Between and process time;And (3) seldom consider physical significance.
Thus, it is desirable to present a kind of technology that can solve the problem that the problems referred to above.
Content of the invention
Brief overview with regard to the present invention is given below, to provide the basic reason with regard to certain aspects of the invention
Solution.It should be appreciated that this general introduction is not the exhaustive general introduction with regard to the present invention.It is not intended to determine the key of the present invention
Or pith, nor is it intended to limit the scope of the present invention.Its purpose only provides some designs in simplified form, with
This is as the preamble in greater detail discussed after a while.
One main purpose of the present invention is, provide a kind of for extracting the son similar to consensus sequence from time serieses
Seasonal effect in time series method and apparatus.
According to an aspect of the invention, it is provided a kind of for extracting the period of the day from 11 p.m. to 1 a.m similar to consensus sequence from time serieses
Between sequence method, including:Variation tendency according to time serieses and consensus sequence is entered to time serieses and consensus sequence respectively
Line translation;Time serieses after conversion are divided into multiple Time Sub-series;For each period of the day from 11 p.m. to 1 a.m in multiple Time Sub-series
Between sequence, calculate each Time Sub-series and conversion after consensus sequence between editing distance;And according to the volume calculating
Collect distance and extract the Time Sub-series similar to consensus sequence from multiple Time Sub-series.
According to another aspect of the present invention, there is provided a kind of for extracting the son similar to consensus sequence from time serieses
Seasonal effect in time series device, including:Sequence transformation unit, is configured to the variation tendency according to time serieses and consensus sequence
Line translation is entered to time serieses and consensus sequence;Time Sub-series cutting unit, is configured to divide the time serieses after conversion
It is slit into multiple Time Sub-series;Editing distance computing unit, is configured to the time sub- for each in multiple Time Sub-series
Sequence, calculates the editing distance between the consensus sequence after each Time Sub-series and conversion;And similar Time Sub-series carry
Take unit, be configured to the editing distance according to calculating and extract the period of the day from 11 p.m. to 1 a.m similar to consensus sequence from multiple Time Sub-series
Between sequence.
In addition, embodiments of the invention additionally provide the computer program for realizing said method.
Additionally, embodiments of the invention additionally provide the computer program of at least computer-readable medium form, its
Upper record has the computer program code for realizing said method.
By the detailed description to highly preferred embodiment of the present invention below in conjunction with accompanying drawing, the these and other of the present invention is excellent
Point will be apparent from.
Brief description
Below with reference to the accompanying drawings illustrate embodiments of the invention, can be more readily understood that the above of the present invention and its
Its objects, features and advantages.Part in accompanying drawing is intended merely to illustrate the principle of the present invention.In the accompanying drawings, identical or similar
Technical characteristic or part will be represented using same or similar reference.
Fig. 1 is to illustrate to be used for according to an embodiment of the invention to extract the sub- time similar to consensus sequence from time serieses
The flow chart of the method for sequence;
Fig. 2 is the curve chart of the load time sequence illustrating a week;
Fig. 3 is the curve chart illustrating two load Time Sub-series and a consensus sequence;
Fig. 4 is to illustrate to be used for according to an embodiment of the invention to extract the sub- time similar to consensus sequence from time serieses
The block diagram of the device of sequence;And
Fig. 5 be illustrate to can be used for implement the present invention for extracting the sub- time similar to consensus sequence from time serieses
The structure chart of the citing of the computing device of the method and apparatus of sequence.
Specific embodiment
Embodiments of the invention to be described with reference to the accompanying drawings.An accompanying drawing or a kind of embodiment of the present invention are retouched
The element stated and feature can be combined with the element shown in one or more other accompanying drawings or embodiment and feature.Should
Work as attention, for purposes of clarity, eliminate in accompanying drawing and explanation known to unrelated to the invention, those of ordinary skill in the art
Part and process expression and description.
To describe referring to Fig. 1 be used for according to an embodiment of the invention from time serieses extract similar to consensus sequence
Time Sub-series method 100.
As shown in figure 1, in step s 102, can respectively the variation tendency according to time serieses and consensus sequence to the time
Sequence and consensus sequence enter line translation.
Specifically, can according to the currentElement in time serieses with respect to previous element or front multiple element change Lai
Line translation is entered to time serieses.Furthermore, it is possible to according to the currentElement in consensus sequence with respect to previous element or front multiple unit
The change of element enters line translation to consensus sequence.
Alternatively, it is possible to according to the currentElement in time serieses with respect to a rear element or multiple elements afterwards change
To enter line translation to time serieses.Furthermore, it is possible to according to the currentElement in consensus sequence with respect to a rear element or multiple afterwards
The change of element enters line translation to consensus sequence.
Herein, the conversion to time serieses and consensus sequence adopts identical transformation rule.Additionally, to time serieses and base
The conversion of quasi- sequence is not limited to upper type, and can with using it may occur to persons skilled in the art that any other mode come
Conversion time sequence and consensus sequence, as long as this conversion can reflect the variation tendency of sequence.
Next, in step S104, the time serieses after conversion can be divided into multiple Time Sub-series.
It is alternatively possible to according to allowing predetermined segmentation step-lengths different from the length of consensus sequence by the time sequence after conversion
Column split becomes multiple Time Sub-series.Furthermore, it is possible to will according to the permission predetermined segmentation length different from the length of consensus sequence
Time serieses after conversion are divided into multiple Time Sub-series.By such flexible partition, Ke Yigeng are executed to time serieses
Plus neatly sliced time sequence, to obtain corresponding segmentation result as needed, and then the similar sub-sequence required for obtaining.
It is for instance possible to obtain and length similar to consensus sequence is different from the similar sub-sequence of consensus sequence.
Next, in step s 106, can be calculated each for each Time Sub-series in multiple Time Sub-series
The editing distance between consensus sequence after Time Sub-series and conversion.
It is alternatively possible in above-mentioned calculating, calculate adding between the consensus sequence after each Time Sub-series and conversion
Power editing distance, wherein can meet one or more of claimed below:For update, the insertion to different elements can
To allow to give different weights;For deletion action, the deletion to different elements can allow to give different weights;And
For replacement operation, the replacement to different elements pair can allow to give different weights.Compared to do not use weight some
Traditional method, by calculate each Time Sub-series and conversion after consensus sequence between weighing edit distance, with benchmark sequence
Arrange even more like Time Sub-series and will have the shorter editing distance to consensus sequence.
In step S108, can be extracted and consensus sequence from multiple Time Sub-series according to the editing distance calculating
Similar Time Sub-series.
Specifically, can extract from multiple Time Sub-series and there is one or many of editing distance less than predetermined threshold
Individual Time Sub-series are as the Time Sub-series similar to consensus sequence.
Alternatively, Time Sub-series that extract predetermined quantity from multiple Time Sub-series, that there is smallest edit distance
As the Time Sub-series similar to consensus sequence.
Certainly, extract the Time Sub-series similar to consensus sequence to be not necessarily intended to by executing with upper type, and can lead to
Cross any other modes that those skilled in the art are contemplated that to execute.
To describe referring to Fig. 2 with Fig. 3 from the load time sequential extraction procedures Time Sub-series similar to consensus sequence
Method.Wherein, Fig. 2 is the curve chart of the load time sequence illustrating a week, and Fig. 3 be illustrate two load Time Sub-series with
The curve chart of one consensus sequence.In figs. 2 and 3, the time serieses being associated with load are given.However, it can easily reason
Solve, the time serieses being associated with load are only examples.In fact, the time serieses handled by the present invention can be appointed
Meaning time serieses and be not limited to Fig. 2 and 3 form.
It is possible, firstly, to the variation tendency according to time serieses and consensus sequence is carried out to time serieses and consensus sequence respectively
Conversion, carries out multiple conversion by comparing currentElement with the change of previous element or front multiple element.For example, time serieses
For c1, c2, c3... ..., cn, wherein n is the integer more than 1.Consensus sequence is b1, b2... ..., bm, wherein m is whole more than 1
Number.Usually, m is less than n, is certainly also not excluded for the situation that m is more than n.
For example, it is possible to conversion time sequence is come according to following formula (1), and consensus sequence is converted according to following formula (2)
Wherein it is possible to preset ε as required, or ε is obtained by parameter training.
Or conversion time sequence can be carried out according to following formula (3), and consensus sequence can be converted according to following formula (4).
Wherein it is possible to preset γ as required, or γ is obtained by parameter training.
Formula (1) and (2) or formula (3) and (4) are only the exemplary mapping modes being given, in fact, as needed may be used
With using it may occur to persons skilled in the art that other mapping modes any, as long as this mapping mode can be with reflecting time sequence
Just permissible with the variation tendency of consensus sequence.
Next, taking formula (1) as a example to describe the conversion of the load time sequence shown in Fig. 2.According to formula (1), can be by
Curve in Fig. 2 is transformed to following form:
200222222120000122221000012222222000000022210000002222220000000122221
00000222222000000002222000000022222200000002220000000002222200000002220000000
222222000000000220000.
It is then possible to line translation is entered to consensus sequence (not shown) according to formula (2).
Next, above-mentioned time serieses can be split.
Preferably, can by flexible partition mode i.e. according to can be different from the length of consensus sequence pre- fixed step size with
And the time serieses after converting are divided into multiple Time Sub-series by predetermined length.
Assume conversion after the length of consensus sequence for 8.Making a reservation for step according to the length identical with consensus sequence
In the case that long and predetermined length carrys out sliced time sequence, i.e. the pre- fixed step size being used in dividing processing and predetermined length are equal
In the case of 8, above-mentioned time serieses for example can be divided into:
20022222,21200001,22221000,01222222......
Also assume that the consensus sequence after conversion length for 8.According to different from the length of consensus sequence pre-
Fixed step size and in the case of carrying out sliced time sequence according to the length identical predetermined length with consensus sequence, for example, in segmentation
In the case of processing the pre- fixed step size that used predetermined length being 8 for 6, above-mentioned time serieses can be divided into:
20022222,22212000,00012222,22100001......
It will again be assumed that conversion after the length of consensus sequence for 8.Making a reservation for according to the length identical with consensus sequence
Step-length and in the case of carrying out sliced time sequence according to the predetermined length different from the length of consensus sequence, for example, in segmentation portion
In the case of managing the pre- fixed step size that used predetermined length being 6 for 8, above-mentioned time serieses can be divided into:
200222,212000,222210,012222......
Assume again that the consensus sequence after conversion length for 8.According to different from the length of consensus sequence pre-
Fixed step size and in the case of carrying out sliced time sequence according to the predetermined length different from the length of consensus sequence, for example, in segmentation
In the case of processing the pre- fixed step size that used predetermined length being 6 for 6, above-mentioned time serieses can be divided into:
200222,222120,000122,221000......
It should be understood that here 6 and 8 is all exemplary, and do not have any restricted.
Calculate the weighting editor between each Time Sub-series in multiple Time Sub-series and the consensus sequence after conversion
Distance, for example:Time Sub-series X { c after conversion1, c2..., ciAnd consensus sequence Y { b1, b2..., bjEditing distance D
(i, j) can be calculated by following formula:
Editing distance refer to from by some operations by a Sequence Transformed minimum cost needed for another sequence, wherein
ωiFor the cost of update, ωdFor the cost of deletion action, ωrCost for replacement operation.In fact, as retouched before
State, for update, the insertion to different elements can allow to give different weights;For deletion action, to difference
The deletion of element can allow to give different weights;And for replacement operation, the replacement to different elements pair can allow
Give different weights.
Hereinafter, for update, the insertion to different elements gives identical weight;For deletion action, to difference
The deletion of element gives identical weight;And for replacement operation, the replacement to different elements pair gives different weights.Can
To determine ω according to formula (6)-(8)i, ωdAnd ωr:
ωi=1 (6)
ωd=1 (7)
Wherein, xiAnd yiIt is intended to respectively calculate the element in two sequences of similarity distance.
After calculating editing distance according to formula (5)-(8), extract the period of the day from 11 p.m. to 1 a.m that weighing edit distance is less than predetermined threshold
Between sequence, or the minimum a number of subsequence of distance is as similar Time Sub-series.
Fig. 3 is the curve chart illustrating two load Time Sub-series and a consensus sequence.As shown in figure 3, A and B is respectively
Represent Time Sub-series, C represents consensus sequence.Note, the time serieses of the Fig. 2 that do not continue here providing Time Sub-series,
And merely to the purpose of formula (5)-(8), arbitrarily give Time Sub-series A and Time Sub-series B.It should be understood that
Time Sub-series to obtained by it is also possible to being split to the time serieses in Fig. 2 and being obtained Time Sub-series, then calculate
Editing distance and consensus sequence C between.
According to formula (1), Time Sub-series A can be transformed to:
00112222111000122221000.
According to formula (1), Time Sub-series B can be transformed to:
00112222222000122221000.
According to formula (2), consensus sequence C can be transformed to:
00112222000200122221000.
According to the computational methods of traditional editing distance, that is, in the case of not introducing weight, Time Sub-series A and benchmark
Editing distance D (A, C)=4 between sequence C, the editing distance D (B, C)=4 between Time Sub-series B and consensus sequence C.
Volume as follows according to the weighing edit distance that formula (5)-(8) calculate, between Time Sub-series A and consensus sequence C
Collect apart from D ' (A, C)=2.5, the editing distance D ' (B, C)=4 between Time Sub-series B and consensus sequence C.
As can be seen from Figure 3 most of coincidence of Time Sub-series A, Time Sub-series B and consensus sequence C, only it
Mid portion differ.The mid portion of Time Sub-series B is above, during the mid portion of Time Sub-series A is located at
Between, the mid portion of consensus sequence C is located at lower section, therefore Time Sub-series A closer to consensus sequence C.According to weighting editor
The result that the computational methods of distance are calculated more conforms to the truth in Fig. 3, Time Sub-series A and consensus sequence C it
Between editing distance D ' (A, C)=2.5, the editing distance D ' (B, C)=4 between Time Sub-series B and consensus sequence C, i.e. D '
(A, C) < D ' (B, C).By contrast, in the traditional editing distance computational methods not being introduced into weight, Time Sub-series A and
Editing distance D (A, C)=4 between consensus sequence C, the editing distance D (B, C) between Time Sub-series B and consensus sequence C=
4, i.e. D (A, C)=D (B, C), and such result cannot reflect the truth in Fig. 3.
To describe hereinafter with reference to Fig. 4 and to be used for according to an embodiment of the invention extracting and consensus sequence phase from time serieses
As Time Sub-series device 400.Wherein, with the similar portion of said method not repeating, and it is only description device 400
Basic framework.
As shown in figure 4, device 400 can include sequence transformation unit 402, Time Sub-series cutting unit 404, editor away from
From computing unit 406 and similar Time Sub-series extraction unit 408.
Wherein, sequence transformation unit 402 can be configured to the variation tendency according to time serieses and consensus sequence
Line translation is entered to time serieses and consensus sequence.
Sequence transformation unit 402 can include time serieses conversion subelement and consensus sequence conversion subelement (does not all show
Go out).Time serieses conversion subelement can be according to the currentElement in time serieses with respect to previous element or front multiple element
Change to enter line translation to time serieses.Consensus sequence conversion subelement can according to the currentElement in consensus sequence relatively
Change in previous element or front multiple element enters line translation to consensus sequence.
Alternatively, time serieses conversion subelement can be according to the currentElement in time serieses with respect to a rear element
Or after the change of multiple elements to enter line translation to time serieses.Consensus sequence conversion subelement can be according in consensus sequence
CurrentElement enters line translation to consensus sequence with respect to a rear element or the afterwards change of multiple elements.
Time Sub-series cutting unit 404 can be configured to for the time serieses after conversion to be divided into multiple sub- time sequences
Row.
Time Sub-series cutting unit 404 is additionally configured to the predetermined segmentations different from the length of consensus sequence according to permission
Step-length and/or allow predetermined segmentation length different from the length of consensus sequence that the time serieses after conversion are divided into many height
Time serieses.
Editing distance computing unit 406 can be configured to for each Time Sub-series in multiple Time Sub-series,
Calculate the editing distance between the consensus sequence after each Time Sub-series and conversion.
Wherein, editing distance computing unit 406 can calculate between the consensus sequence after each Time Sub-series and conversion
Weighing edit distance, wherein meet one or more of claimed below:For update, the insertion to different elements is permitted
Permitted to give different weights;For deletion action, the deletion to different elements allows to give different weights;Grasp for replacing
Make, the replacement to different elements pair allows to give different weights.
Similar Time Sub-series extraction unit 408 can be configured to according to the editing distance calculating from multiple sub- times
The Time Sub-series similar to consensus sequence are extracted in sequence.
Similar Time Sub-series extraction unit 408 can be configured to extract from multiple Time Sub-series to be had less than pre-
The one or more Time Sub-series of editing distance determining threshold value are as the Time Sub-series similar to consensus sequence.
Alternatively, similar Time Sub-series extraction unit 408 is additionally configured to extract in advance from multiple Time Sub-series
Fixed number amount, the Time Sub-series with smallest edit distance are as the Time Sub-series similar to consensus sequence.
Technical scheme according to an embodiment of the invention, can extract similar Time Sub-series, it is right to realize more quickly
Seasonal effect in time series flexible partition, and as needed the physical significance according to sequence extracting similar sub-sequence.
Describe the ultimate principle of the present invention above in association with specific embodiment, however, it is desirable to it is noted that to this area
It is to be understood that whole or any steps of methods and apparatus of the present invention or part, Ke Yi for those of ordinary skill
In any computing device (including processor, storage medium etc.) or the network of computing device, with hardware, firmware, software or
Combinations thereof is realized, and this is that those of ordinary skill in the art use them in the case of the explanation having read the present invention
Basic programming skill can be achieved with.
Therefore, the purpose of the present invention can also by run on any computing device a program or batch processing Lai
Realize.Described computing device can be known fexible unit.Therefore, the purpose of the present invention can also comprise only by offer
The program product of program code realizing methods described or device is realizing.That is, such program product is also constituted
The present invention, and the storage medium of such program product that is stored with also constitutes the present invention.Obviously, described storage medium can be
Any known storage medium or any storage medium being developed in the future.
In the case that embodiments of the invention are realized by software and/or firmware, from storage medium or network to having
The computer of specialized hardware structure, such as the general purpose computer 500 shown in Fig. 5 installs the program constituting this software, this computer
When being provided with various program, it is able to carry out various functions etc..
In Figure 5, CPU (CPU) 501 is according to the program of storage in read only memory (ROM) 502 or from depositing
Storage part 508 is loaded into the various process of program performing of random access memory (RAM) 503.In RAM 503, also according to need
Store the data required when CPU 501 executes various process etc..CPU 501, ROM 502 and RAM 503 are via bus
504 links each other.Input/output interface 505 also link to bus 504.
Components described below link is to input/output interface 505:Importation 506 (including keyboard, mouse etc.), output section
Divide 507 (including display, such as cathode ray tube (CRT), liquid crystal display (LCD) etc., and speakers etc.), storage part
508 (including hard disks etc.), communications portion 509 (including NIC such as LAN card, modem etc.).Communications portion 509
Via network such as the Internet execution communication process.As needed, driver 510 also can link to input/output interface 505.
Detachable media 511 such as disk, CD, magneto-optic disk, semiconductor memory etc. are installed in driver 510 as needed
Above so that the computer program reading out is installed in storage part 508 as needed.
In the case that above-mentioned series of processes is realized by software, such as removable from network such as the Internet or storage medium
Unload medium 511 and the program constituting software is installed.
It will be understood by those of skill in the art that this storage medium be not limited to wherein having program stored therein shown in Fig. 5,
Separately distribute with equipment to provide a user with the detachable media 511 of program.The example of detachable media 511 comprises disk
(comprising floppy disk (registered trade mark)), CD (comprising compact disc read-only memory (CD-ROM) and digital universal disc (DVD)), magneto-optic disk
(comprising mini-disk (MD) (registered trade mark)) and semiconductor memory.Or, storage medium can be ROM 502, storage part
Hard disk comprising in 508 etc., wherein computer program stored, and it is distributed to user together with the equipment comprising them.
The present invention also proposes a kind of program product of the instruction code of the machine-readable that is stored with.Instruction code is read by machine
When taking and executing, can perform above-mentioned method according to embodiments of the present invention.
Correspondingly, the storage medium for carrying the program product of the instruction code of the above-mentioned machine-readable that is stored with also wraps
Include in disclosure of the invention.Storage medium includes but is not limited to floppy disk, CD, magneto-optic disk, storage card, memory stick etc..
It should be appreciated by those skilled in the art that enumerated at this is exemplary, the invention is not limited in this.
In this manual, " first ", " second " and " n-th " etc. statement be in order to by described feature in word
On distinguish, so that the present invention is explicitly described.Therefore, should not serve to that there is any determinate implication.
As an example, each step of said method and all modules of the said equipment and/or unit can
To be embodied as software, firmware, hardware or a combination thereof, and as the part in relevant device.In said apparatus, each forms mould
Block, unit when being configured by way of software, firmware, hardware or a combination thereof spendable specific means or mode be ability
Known to field technique personnel, will not be described here.
As an example, in the case of being realized by software or firmware, can be from storage medium or network to having
The computer (general purpose computer 500 for example shown in Fig. 5) of specialized hardware structure installs the program constituting this software, this computer
When being provided with various program, it is able to carry out various functions etc..
In the description to the specific embodiment of the invention above, for a kind of description of embodiment and/or the feature that illustrates
Can be used in one or more other embodiments in same or similar mode, with the feature in other embodiment
Combined, or substitute the feature in other embodiment.
It should be emphasized that term "comprises/comprising" refers to the presence of feature, key element, step or assembly herein when using, but simultaneously
It is not excluded for other features one or more, the presence of key element, step or assembly or additional.
Additionally, the method for the present invention be not limited to specifications described in time sequencing executing it is also possible to according to it
He time sequencing ground, concurrently or independently execute.Therefore, the execution sequence of the method described in this specification is not to this
Bright technical scope is construed as limiting.
Although being had been disclosed to the present invention by the description of the specific embodiment to the present invention above, should
This understanding, those skilled in the art can design in the spirit and scope of the appended claims the various modifications to the present invention,
Improve or equivalent.These modifications, improvement or equivalent should also be as being to be considered as included in protection scope of the present invention.
With regard to including the embodiment of above example, following remarks is also disclosed.
Remarks
A kind of method for extracting the Time Sub-series similar to consensus sequence from time serieses of remarks 1., including:
Variation tendency according to described time serieses and described consensus sequence is to described time serieses and described benchmark respectively
Sequence enters line translation;
Time serieses after conversion are divided into multiple Time Sub-series;
For each Time Sub-series in the plurality of Time Sub-series, calculate each Time Sub-series described and conversion
The editing distance between consensus sequence afterwards;And
The son similar to described consensus sequence is extracted from the plurality of Time Sub-series according to the editing distance calculating
Time serieses.
Method according to remarks 1 for the remarks 2., wherein, the described change according to time serieses and described consensus sequence respectively
The step that change trend enters line translation to described time serieses and described consensus sequence includes:
According to the currentElement in described time serieses with respect to previous element or front multiple element change come to described
Time serieses enter line translation;And
According to the currentElement in described consensus sequence with respect to previous element or front multiple element change come to described
Consensus sequence enters line translation.
Method according to remarks 1 for the remarks 3., wherein, the described change according to time serieses and described consensus sequence respectively
The step that change trend enters line translation to described time serieses and described consensus sequence includes:
According to the currentElement in described time serieses with respect to a rear element or multiple elements afterwards change come to described
Time serieses enter line translation;And
According to the currentElement in described consensus sequence with respect to a rear element or multiple elements afterwards change come to described
Consensus sequence enters line translation.
Method according to remarks 1 for the remarks 4., wherein, the described base calculating after each Time Sub-series described and conversion
The step of the editing distance between quasi- sequence is included between the consensus sequence after calculating each Time Sub-series described and conversion
Weighing edit distance, wherein meets one or more of claimed below:
For update, the insertion to different elements allows to give different weights;
For deletion action, the deletion to different elements allows to give different weights;And
For replacement operation, the replacement to different elements pair allows to give different weights.
Method according to remarks 1 for the remarks 5., wherein, the editing distance that described basis calculates is from the plurality of period of the day from 11 p.m. to 1 a.m
Between extract the step of the Time Sub-series similar to described consensus sequence in sequence and include:
One or more period of the day from 11 p.m. to 1 a.m with the editing distance less than predetermined threshold are extracted from the plurality of Time Sub-series
Between sequence as the Time Sub-series similar to described consensus sequence.
Method according to any one of remarks 1 to 5 for the remarks 6., wherein, described will conversion after time serieses segmentation
The step becoming multiple Time Sub-series includes:
According to the permission predetermined segmentation step-lengths different from the length of described consensus sequence and/or permission and described consensus sequence
The different predetermined segmentation length of length the time serieses after described conversion are divided into the plurality of Time Sub-series.
A kind of device for extracting the Time Sub-series similar to consensus sequence from time serieses of remarks 7., including:
Sequence transformation unit, is configured to variation tendency according to described time serieses and described consensus sequence to institute
State time serieses and described consensus sequence enters line translation;
Time Sub-series cutting unit, is configured to for the time serieses after conversion to be divided into multiple Time Sub-series;
Editing distance computing unit, is configured to, for each Time Sub-series in the plurality of Time Sub-series, count
Calculate the editing distance between the consensus sequence after each Time Sub-series described and conversion;And
Similar Time Sub-series extraction unit, is configured to according to the editing distance calculating from the plurality of sub- time sequence
The Time Sub-series similar to described consensus sequence are extracted in row.
Device according to remarks 7 for the remarks 8., wherein, described sequence transformation unit includes:
Time serieses convert subelement, are configured to according to the currentElement in described time serieses with respect to previous element
Or the change of front multiple element enters line translation to described time serieses;And
Consensus sequence converts subelement, is configured to according to the currentElement in described consensus sequence with respect to previous element
Or the change of front multiple element enters line translation to described consensus sequence.
Device according to remarks 7 for the remarks 9., wherein, described sequence transformation unit includes:
Time serieses convert subelement, are configured to according to the currentElement in described time serieses with respect to a rear element
Or after the change of multiple elements to enter line translation to described time serieses;And
Consensus sequence converts subelement, is configured to according to the currentElement in described consensus sequence with respect to a rear element
Or after the change of multiple elements to enter line translation to described consensus sequence.
Device according to remarks 7 for the remarks 10., wherein, it is described that described editing distance computing unit is configured to calculating
Each Time Sub-series and conversion after consensus sequence between weighing edit distance, wherein meet one of claimed below or
Multiple:
For update, the insertion to different elements allows to give different weights;
For deletion action, the deletion to different elements allows to give different weights;And
For replacement operation, the replacement to different elements pair allows to give different weights.
Device according to remarks 7 for the remarks 11., wherein, described similar Time Sub-series extraction unit is additionally configured to:
One or more period of the day from 11 p.m. to 1 a.m with the editing distance less than predetermined threshold are extracted from the plurality of Time Sub-series
Between sequence as the Time Sub-series similar to described consensus sequence.
Device according to any one of remarks 7 to 11 for the remarks 12., wherein, described Time Sub-series cutting unit is also
It is configured to:
According to the permission predetermined segmentation step-lengths different from the length of described consensus sequence and/or permission and described consensus sequence
The different predetermined segmentation length of length the time serieses after described conversion are divided into the plurality of Time Sub-series.
A kind of program product of the instruction code of the machine-readable that is stored with of remarks 13., described instruction code is read by machine
When taking and executing, can perform similar to consensus sequence for extracting from time serieses according to any one of remarks 1-6
The method of Time Sub-series.
A kind of storage medium carrying the program product according to remarks 13 of remarks 14..
Claims (10)
1. a kind of time series forecasting, cluster, method for detecting abnormality, including from time serieses ciExtract and consensus sequence biSimilar
Time Sub-series, described from time serieses ciExtract and consensus sequence biSimilar Time Sub-series include:
Respectively according to following formula 5 or by the obtained formula after i+1 replacement of the i-1 in following formula 5, by the described time
Sequence ciAnd described consensus sequence biIt is transformed to represent described time serieses ciAnd described consensus sequence biVariation tendency
Sequence;
Time serieses after conversion are divided into multiple Time Sub-series;
For each Time Sub-series in the plurality of Time Sub-series, after calculating each Time Sub-series described and converting
Editing distance between consensus sequence;And
Extracted the sub- time similar to described consensus sequence from the plurality of Time Sub-series according to the editing distance calculating
Sequence,
Formula 5:
, wherein, γ is coefficient.
2. method according to claim 1, wherein,
The sequence representing described seasonal effect in time series variation tendency is to represent currentElement in described time serieses with respect to previous
The sequence of the change of element or front multiple element,
The sequence representing the variation tendency of described consensus sequence is to represent currentElement in described consensus sequence with respect to previous
The sequence of the change of element or front multiple element.
3. method according to claim 1, wherein,
The sequence representing described seasonal effect in time series variation tendency is to represent currentElement in described time serieses with respect to rear one
Element or the sequence of the change of multiple elements afterwards,
The sequence representing the variation tendency of described consensus sequence is to represent currentElement in described consensus sequence with respect to rear one
Element or the sequence of the change of multiple elements afterwards.
4. method according to claim 1, wherein, the described benchmark sequence calculating after each Time Sub-series described and conversion
The step of the editing distance between row includes the weighting between the consensus sequence after calculating each Time Sub-series described and conversion
Editing distance, wherein meets one or more of claimed below:
For update, the insertion to different elements allows to give different weights;
For deletion action, the deletion to different elements allows to give different weights;And
For replacement operation, the replacement to different elements pair allows to give different weights.
5. method according to claim 1, wherein, the editing distance that described basis calculates is from the plurality of sub- time sequence
The step extracting the Time Sub-series similar to described consensus sequence in row includes:
The one or more sub- time sequence with the editing distance less than predetermined threshold is extracted from the plurality of Time Sub-series
Row are as the Time Sub-series similar to described consensus sequence.
6. method according to any one of claim 1 to 5, wherein, described by conversion after time serieses be divided into many
Individual sub- seasonal effect in time series step includes:
According to the length allowing the predetermined segmentation step-lengths different from the length of described consensus sequence and permission and described consensus sequence
Time serieses after described conversion are divided into the plurality of Time Sub-series by different predetermined segmentation length.
7. a kind of time series forecasting, cluster, abnormality detecting apparatus, including for from time serieses ciExtract and consensus sequence bi
The device of similar Time Sub-series, described for from time serieses ciExtract and consensus sequence biSimilar Time Sub-series
Device includes:
Sequence transformation unit, be configured to according to following formula 6 or by the i-1 in formula 6 with i+1 replacement after obtained
Formula, by described time serieses ciAnd described consensus sequence biIt is transformed to represent described time serieses ciAnd described benchmark
Sequence biVariation tendency sequence;
Time Sub-series cutting unit, is configured to for the time serieses after conversion to be divided into multiple Time Sub-series;
Editing distance computing unit, is configured to, for each Time Sub-series in the plurality of Time Sub-series, calculate institute
State the editing distance between the consensus sequence after each Time Sub-series and conversion;And
Similar Time Sub-series extraction unit, is configured to according to the editing distance calculating from the plurality of Time Sub-series
Extract the Time Sub-series similar to described consensus sequence,
Formula 6:
, wherein, γ is coefficient.
8. equipment according to claim 7, wherein,
The sequence representing described seasonal effect in time series variation tendency is to represent currentElement in described time serieses with respect to previous
The sequence of the change of element or front multiple element,
The sequence representing the variation tendency of described consensus sequence is to represent currentElement in described consensus sequence with respect to previous
The sequence of the change of element or front multiple element.
9. equipment according to claim 7, wherein,
The sequence representing described seasonal effect in time series variation tendency is to represent currentElement in described time serieses with respect to rear one
Element or the sequence of the change of multiple elements afterwards,
The sequence representing the variation tendency of described consensus sequence is to represent currentElement in described consensus sequence with respect to rear one
Element or the sequence of the change of multiple elements afterwards.
10. equipment according to claim 7, wherein, described editing distance computing unit is configured to each described in calculating
The weighing edit distance between consensus sequence after Time Sub-series and conversion, wherein meets one of claimed below or many
Individual:
For update, the insertion to different elements allows to give different weights;
For deletion action, the deletion to different elements allows to give different weights;And
For replacement operation, the replacement to different elements pair allows to give different weights.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110203979.9A CN102880621B (en) | 2011-07-14 | 2011-07-14 | The method and apparatus extracting similar Time Sub-series |
JP2012156266A JP6111543B2 (en) | 2011-07-14 | 2012-07-12 | Method and apparatus for extracting similar sub time series |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110203979.9A CN102880621B (en) | 2011-07-14 | 2011-07-14 | The method and apparatus extracting similar Time Sub-series |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102880621A CN102880621A (en) | 2013-01-16 |
CN102880621B true CN102880621B (en) | 2017-03-01 |
Family
ID=47481949
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110203979.9A Expired - Fee Related CN102880621B (en) | 2011-07-14 | 2011-07-14 | The method and apparatus extracting similar Time Sub-series |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP6111543B2 (en) |
CN (1) | CN102880621B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103577562B (en) * | 2013-10-24 | 2016-08-31 | 河海大学 | A kind of many measuring periods sequence similarity analyzes method |
CN104714953A (en) * | 2013-12-12 | 2015-06-17 | 日本电气株式会社 | Time series data motif identification method and device |
CN104809134B (en) * | 2014-01-27 | 2018-03-09 | 国际商业机器公司 | The method and apparatus for detecting the abnormal subsequence in data sequence |
CN105224543A (en) * | 2014-05-30 | 2016-01-06 | 国际商业机器公司 | For the treatment of seasonal effect in time series method and apparatus |
CN108573059B (en) * | 2018-04-26 | 2021-02-19 | 哈尔滨工业大学 | Time sequence classification method and device based on feature sampling |
CN109902703B (en) * | 2018-09-03 | 2021-09-21 | 华为技术有限公司 | Time series abnormity detection method and device |
CN109543083B (en) * | 2018-11-19 | 2020-12-22 | 国网陕西省电力公司电力科学研究院 | Method for detecting abnormal data in real-time data of multi-element power grid |
CN111027606B (en) * | 2019-11-29 | 2022-05-31 | 中国科学院空间应用工程与技术中心 | Multi-mode time series anomaly detection method, storage medium and equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101996174A (en) * | 2009-08-11 | 2011-03-30 | 上海汉光知识产权数据科技有限公司 | Time series analysis method of patent documentations |
CN102004795A (en) * | 2010-12-08 | 2011-04-06 | 中国科学院自动化研究所 | Hand language searching method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000194713A (en) * | 1998-12-25 | 2000-07-14 | Nippon Telegr & Teleph Corp <Ntt> | Method and device for retrieving character string, and storage medium stored with character string retrieval program |
-
2011
- 2011-07-14 CN CN201110203979.9A patent/CN102880621B/en not_active Expired - Fee Related
-
2012
- 2012-07-12 JP JP2012156266A patent/JP6111543B2/en not_active Expired - Fee Related
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101996174A (en) * | 2009-08-11 | 2011-03-30 | 上海汉光知识产权数据科技有限公司 | Time series analysis method of patent documentations |
CN102004795A (en) * | 2010-12-08 | 2011-04-06 | 中国科学院自动化研究所 | Hand language searching method |
Also Published As
Publication number | Publication date |
---|---|
JP2013025805A (en) | 2013-02-04 |
JP6111543B2 (en) | 2017-04-12 |
CN102880621A (en) | 2013-01-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102880621B (en) | The method and apparatus extracting similar Time Sub-series | |
Hickey et al. | Genotyping structural variants in pangenome graphs using the vg toolkit | |
Collister et al. | Calculating polygenic risk scores (PRS) in UK Biobank: a practical guide for epidemiologists | |
Liu et al. | BinPacker: packing-based de novo transcriptome assembly from RNA-seq data | |
Sanderson et al. | Phylogenomics with incomplete taxon coverage: the limits to inference | |
Baele et al. | Bayesian evolutionary model testing in the phylogenomics era: matching model complexity with computational efficiency | |
Stram et al. | Choosing haplotype-tagging SNPS based on unphased genotype data using a preliminary sample of unrelated subjects with an example from the Multiethnic Cohort Study | |
Izquierdo-Carrasco et al. | Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees | |
US8725734B2 (en) | Sorting multiple records of data using ranges of key values | |
EP2759952B1 (en) | Efficient genomic read alignment in an in-memory database | |
CN105989080A (en) | Apparatus and method for determining entity attribute values | |
Jewett et al. | A coalescent model for genotype imputation | |
KR20190024701A (en) | Concurrent multi-bit adder | |
CN112735596A (en) | Similar patient determination method and device, electronic equipment and storage medium | |
JP2000293494A (en) | Device and method for parallel computation | |
Excoffier et al. | An integrated software package for population genetics data analysis | |
US20130060728A1 (en) | Generating a mixed integer linear programming matrix from an annotated entity-relationship data model and a symbolic matrix | |
Wu | An algorithm for computing the gene tree probability under the multispecies coalescent and its application in the inference of population tree | |
CN116130002A (en) | DNA sequence polymorphism analysis method and system | |
Zhou et al. | Analysis of secondary phenotypes in multigroup association studies | |
CN106445975B (en) | Item set mining method and device | |
Li et al. | A novel scaffolding algorithm based on contig error correction and path extension | |
Nato Jr et al. | PBAP: a pipeline for file processing and quality control of pedigree data with dense genetic markers | |
Juan et al. | PGsim: a comprehensive and highly customizable personal genome simulator | |
US8341564B1 (en) | Method and system for optimizing migrated implementation of a system design |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20170301 Termination date: 20180714 |