[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

CN102880621B - The method and apparatus extracting similar Time Sub-series - Google Patents

The method and apparatus extracting similar Time Sub-series Download PDF

Info

Publication number
CN102880621B
CN102880621B CN201110203979.9A CN201110203979A CN102880621B CN 102880621 B CN102880621 B CN 102880621B CN 201110203979 A CN201110203979 A CN 201110203979A CN 102880621 B CN102880621 B CN 102880621B
Authority
CN
China
Prior art keywords
time
series
sequence
consensus sequence
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201110203979.9A
Other languages
Chinese (zh)
Other versions
CN102880621A (en
Inventor
杨宇航
孟遥
夏迎炬
陆应亮
于浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN201110203979.9A priority Critical patent/CN102880621B/en
Priority to JP2012156266A priority patent/JP6111543B2/en
Publication of CN102880621A publication Critical patent/CN102880621A/en
Application granted granted Critical
Publication of CN102880621B publication Critical patent/CN102880621B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the method and apparatus for extracting the Time Sub-series similar to consensus sequence from time serieses.Wherein, the method for extracting the Time Sub-series similar to consensus sequence from time serieses includes:Variation tendency according to time serieses and consensus sequence enters line translation to time serieses and consensus sequence respectively;Time serieses after conversion are divided into multiple Time Sub-series;For each Time Sub-series in multiple Time Sub-series, calculate the editing distance between the consensus sequence after each Time Sub-series and conversion;And the Time Sub-series similar to consensus sequence are extracted from multiple Time Sub-series according to the editing distance calculating.

Description

The method and apparatus extracting similar Time Sub-series
Technical field
The present invention relates to time Series Processing field is and in particular to be used for similar to consensus sequence from time serieses extraction The method and apparatus of Time Sub-series.
Background technology
Similar sub-sequence extracts to be meaned to extract the subsequence similar with given benchmark.It is should that similar Time Sub-series extract Basic technology for tasks such as time series forecasting, cluster, abnormality detection.For example, need in time series forecasting to extract phase Like subsequence, for training.
In general, similar sub-sequence extracts is related to two steps.First step is time serieses segmentation, i.e. by when Between sequences segmentation be some.Second step is to extract similar sub-sequence based on Similarity measures.For existing phase It is primarily present three below problem like subsequence extractive technique:(1) rigid segmentation makes candidate's setting little;(2) expend storage empty Between and process time;And (3) seldom consider physical significance.
Thus, it is desirable to present a kind of technology that can solve the problem that the problems referred to above.
Content of the invention
Brief overview with regard to the present invention is given below, to provide the basic reason with regard to certain aspects of the invention Solution.It should be appreciated that this general introduction is not the exhaustive general introduction with regard to the present invention.It is not intended to determine the key of the present invention Or pith, nor is it intended to limit the scope of the present invention.Its purpose only provides some designs in simplified form, with This is as the preamble in greater detail discussed after a while.
One main purpose of the present invention is, provide a kind of for extracting the son similar to consensus sequence from time serieses Seasonal effect in time series method and apparatus.
According to an aspect of the invention, it is provided a kind of for extracting the period of the day from 11 p.m. to 1 a.m similar to consensus sequence from time serieses Between sequence method, including:Variation tendency according to time serieses and consensus sequence is entered to time serieses and consensus sequence respectively Line translation;Time serieses after conversion are divided into multiple Time Sub-series;For each period of the day from 11 p.m. to 1 a.m in multiple Time Sub-series Between sequence, calculate each Time Sub-series and conversion after consensus sequence between editing distance;And according to the volume calculating Collect distance and extract the Time Sub-series similar to consensus sequence from multiple Time Sub-series.
According to another aspect of the present invention, there is provided a kind of for extracting the son similar to consensus sequence from time serieses Seasonal effect in time series device, including:Sequence transformation unit, is configured to the variation tendency according to time serieses and consensus sequence Line translation is entered to time serieses and consensus sequence;Time Sub-series cutting unit, is configured to divide the time serieses after conversion It is slit into multiple Time Sub-series;Editing distance computing unit, is configured to the time sub- for each in multiple Time Sub-series Sequence, calculates the editing distance between the consensus sequence after each Time Sub-series and conversion;And similar Time Sub-series carry Take unit, be configured to the editing distance according to calculating and extract the period of the day from 11 p.m. to 1 a.m similar to consensus sequence from multiple Time Sub-series Between sequence.
In addition, embodiments of the invention additionally provide the computer program for realizing said method.
Additionally, embodiments of the invention additionally provide the computer program of at least computer-readable medium form, its Upper record has the computer program code for realizing said method.
By the detailed description to highly preferred embodiment of the present invention below in conjunction with accompanying drawing, the these and other of the present invention is excellent Point will be apparent from.
Brief description
Below with reference to the accompanying drawings illustrate embodiments of the invention, can be more readily understood that the above of the present invention and its Its objects, features and advantages.Part in accompanying drawing is intended merely to illustrate the principle of the present invention.In the accompanying drawings, identical or similar Technical characteristic or part will be represented using same or similar reference.
Fig. 1 is to illustrate to be used for according to an embodiment of the invention to extract the sub- time similar to consensus sequence from time serieses The flow chart of the method for sequence;
Fig. 2 is the curve chart of the load time sequence illustrating a week;
Fig. 3 is the curve chart illustrating two load Time Sub-series and a consensus sequence;
Fig. 4 is to illustrate to be used for according to an embodiment of the invention to extract the sub- time similar to consensus sequence from time serieses The block diagram of the device of sequence;And
Fig. 5 be illustrate to can be used for implement the present invention for extracting the sub- time similar to consensus sequence from time serieses The structure chart of the citing of the computing device of the method and apparatus of sequence.
Specific embodiment
Embodiments of the invention to be described with reference to the accompanying drawings.An accompanying drawing or a kind of embodiment of the present invention are retouched The element stated and feature can be combined with the element shown in one or more other accompanying drawings or embodiment and feature.Should Work as attention, for purposes of clarity, eliminate in accompanying drawing and explanation known to unrelated to the invention, those of ordinary skill in the art Part and process expression and description.
To describe referring to Fig. 1 be used for according to an embodiment of the invention from time serieses extract similar to consensus sequence Time Sub-series method 100.
As shown in figure 1, in step s 102, can respectively the variation tendency according to time serieses and consensus sequence to the time Sequence and consensus sequence enter line translation.
Specifically, can according to the currentElement in time serieses with respect to previous element or front multiple element change Lai Line translation is entered to time serieses.Furthermore, it is possible to according to the currentElement in consensus sequence with respect to previous element or front multiple unit The change of element enters line translation to consensus sequence.
Alternatively, it is possible to according to the currentElement in time serieses with respect to a rear element or multiple elements afterwards change To enter line translation to time serieses.Furthermore, it is possible to according to the currentElement in consensus sequence with respect to a rear element or multiple afterwards The change of element enters line translation to consensus sequence.
Herein, the conversion to time serieses and consensus sequence adopts identical transformation rule.Additionally, to time serieses and base The conversion of quasi- sequence is not limited to upper type, and can with using it may occur to persons skilled in the art that any other mode come Conversion time sequence and consensus sequence, as long as this conversion can reflect the variation tendency of sequence.
Next, in step S104, the time serieses after conversion can be divided into multiple Time Sub-series.
It is alternatively possible to according to allowing predetermined segmentation step-lengths different from the length of consensus sequence by the time sequence after conversion Column split becomes multiple Time Sub-series.Furthermore, it is possible to will according to the permission predetermined segmentation length different from the length of consensus sequence Time serieses after conversion are divided into multiple Time Sub-series.By such flexible partition, Ke Yigeng are executed to time serieses Plus neatly sliced time sequence, to obtain corresponding segmentation result as needed, and then the similar sub-sequence required for obtaining. It is for instance possible to obtain and length similar to consensus sequence is different from the similar sub-sequence of consensus sequence.
Next, in step s 106, can be calculated each for each Time Sub-series in multiple Time Sub-series The editing distance between consensus sequence after Time Sub-series and conversion.
It is alternatively possible in above-mentioned calculating, calculate adding between the consensus sequence after each Time Sub-series and conversion Power editing distance, wherein can meet one or more of claimed below:For update, the insertion to different elements can To allow to give different weights;For deletion action, the deletion to different elements can allow to give different weights;And For replacement operation, the replacement to different elements pair can allow to give different weights.Compared to do not use weight some Traditional method, by calculate each Time Sub-series and conversion after consensus sequence between weighing edit distance, with benchmark sequence Arrange even more like Time Sub-series and will have the shorter editing distance to consensus sequence.
In step S108, can be extracted and consensus sequence from multiple Time Sub-series according to the editing distance calculating Similar Time Sub-series.
Specifically, can extract from multiple Time Sub-series and there is one or many of editing distance less than predetermined threshold Individual Time Sub-series are as the Time Sub-series similar to consensus sequence.
Alternatively, Time Sub-series that extract predetermined quantity from multiple Time Sub-series, that there is smallest edit distance As the Time Sub-series similar to consensus sequence.
Certainly, extract the Time Sub-series similar to consensus sequence to be not necessarily intended to by executing with upper type, and can lead to Cross any other modes that those skilled in the art are contemplated that to execute.
To describe referring to Fig. 2 with Fig. 3 from the load time sequential extraction procedures Time Sub-series similar to consensus sequence Method.Wherein, Fig. 2 is the curve chart of the load time sequence illustrating a week, and Fig. 3 be illustrate two load Time Sub-series with The curve chart of one consensus sequence.In figs. 2 and 3, the time serieses being associated with load are given.However, it can easily reason Solve, the time serieses being associated with load are only examples.In fact, the time serieses handled by the present invention can be appointed Meaning time serieses and be not limited to Fig. 2 and 3 form.
It is possible, firstly, to the variation tendency according to time serieses and consensus sequence is carried out to time serieses and consensus sequence respectively Conversion, carries out multiple conversion by comparing currentElement with the change of previous element or front multiple element.For example, time serieses For c1, c2, c3... ..., cn, wherein n is the integer more than 1.Consensus sequence is b1, b2... ..., bm, wherein m is whole more than 1 Number.Usually, m is less than n, is certainly also not excluded for the situation that m is more than n.
For example, it is possible to conversion time sequence is come according to following formula (1), and consensus sequence is converted according to following formula (2)
Wherein it is possible to preset ε as required, or ε is obtained by parameter training.
Or conversion time sequence can be carried out according to following formula (3), and consensus sequence can be converted according to following formula (4).
Wherein it is possible to preset γ as required, or γ is obtained by parameter training.
Formula (1) and (2) or formula (3) and (4) are only the exemplary mapping modes being given, in fact, as needed may be used With using it may occur to persons skilled in the art that other mapping modes any, as long as this mapping mode can be with reflecting time sequence Just permissible with the variation tendency of consensus sequence.
Next, taking formula (1) as a example to describe the conversion of the load time sequence shown in Fig. 2.According to formula (1), can be by Curve in Fig. 2 is transformed to following form:
200222222120000122221000012222222000000022210000002222220000000122221 00000222222000000002222000000022222200000002220000000002222200000002220000000 222222000000000220000.
It is then possible to line translation is entered to consensus sequence (not shown) according to formula (2).
Next, above-mentioned time serieses can be split.
Preferably, can by flexible partition mode i.e. according to can be different from the length of consensus sequence pre- fixed step size with And the time serieses after converting are divided into multiple Time Sub-series by predetermined length.
Assume conversion after the length of consensus sequence for 8.Making a reservation for step according to the length identical with consensus sequence In the case that long and predetermined length carrys out sliced time sequence, i.e. the pre- fixed step size being used in dividing processing and predetermined length are equal In the case of 8, above-mentioned time serieses for example can be divided into:
20022222,21200001,22221000,01222222......
Also assume that the consensus sequence after conversion length for 8.According to different from the length of consensus sequence pre- Fixed step size and in the case of carrying out sliced time sequence according to the length identical predetermined length with consensus sequence, for example, in segmentation In the case of processing the pre- fixed step size that used predetermined length being 8 for 6, above-mentioned time serieses can be divided into:
20022222,22212000,00012222,22100001......
It will again be assumed that conversion after the length of consensus sequence for 8.Making a reservation for according to the length identical with consensus sequence Step-length and in the case of carrying out sliced time sequence according to the predetermined length different from the length of consensus sequence, for example, in segmentation portion In the case of managing the pre- fixed step size that used predetermined length being 6 for 8, above-mentioned time serieses can be divided into:
200222,212000,222210,012222......
Assume again that the consensus sequence after conversion length for 8.According to different from the length of consensus sequence pre- Fixed step size and in the case of carrying out sliced time sequence according to the predetermined length different from the length of consensus sequence, for example, in segmentation In the case of processing the pre- fixed step size that used predetermined length being 6 for 6, above-mentioned time serieses can be divided into:
200222,222120,000122,221000......
It should be understood that here 6 and 8 is all exemplary, and do not have any restricted.
Calculate the weighting editor between each Time Sub-series in multiple Time Sub-series and the consensus sequence after conversion Distance, for example:Time Sub-series X { c after conversion1, c2..., ciAnd consensus sequence Y { b1, b2..., bjEditing distance D (i, j) can be calculated by following formula:
Editing distance refer to from by some operations by a Sequence Transformed minimum cost needed for another sequence, wherein ωiFor the cost of update, ωdFor the cost of deletion action, ωrCost for replacement operation.In fact, as retouched before State, for update, the insertion to different elements can allow to give different weights;For deletion action, to difference The deletion of element can allow to give different weights;And for replacement operation, the replacement to different elements pair can allow Give different weights.
Hereinafter, for update, the insertion to different elements gives identical weight;For deletion action, to difference The deletion of element gives identical weight;And for replacement operation, the replacement to different elements pair gives different weights.Can To determine ω according to formula (6)-(8)i, ωdAnd ωr
ωi=1 (6)
ωd=1 (7)
Wherein, xiAnd yiIt is intended to respectively calculate the element in two sequences of similarity distance.
After calculating editing distance according to formula (5)-(8), extract the period of the day from 11 p.m. to 1 a.m that weighing edit distance is less than predetermined threshold Between sequence, or the minimum a number of subsequence of distance is as similar Time Sub-series.
Fig. 3 is the curve chart illustrating two load Time Sub-series and a consensus sequence.As shown in figure 3, A and B is respectively Represent Time Sub-series, C represents consensus sequence.Note, the time serieses of the Fig. 2 that do not continue here providing Time Sub-series, And merely to the purpose of formula (5)-(8), arbitrarily give Time Sub-series A and Time Sub-series B.It should be understood that Time Sub-series to obtained by it is also possible to being split to the time serieses in Fig. 2 and being obtained Time Sub-series, then calculate Editing distance and consensus sequence C between.
According to formula (1), Time Sub-series A can be transformed to:
00112222111000122221000.
According to formula (1), Time Sub-series B can be transformed to:
00112222222000122221000.
According to formula (2), consensus sequence C can be transformed to:
00112222000200122221000.
According to the computational methods of traditional editing distance, that is, in the case of not introducing weight, Time Sub-series A and benchmark Editing distance D (A, C)=4 between sequence C, the editing distance D (B, C)=4 between Time Sub-series B and consensus sequence C.
Volume as follows according to the weighing edit distance that formula (5)-(8) calculate, between Time Sub-series A and consensus sequence C Collect apart from D ' (A, C)=2.5, the editing distance D ' (B, C)=4 between Time Sub-series B and consensus sequence C.
As can be seen from Figure 3 most of coincidence of Time Sub-series A, Time Sub-series B and consensus sequence C, only it Mid portion differ.The mid portion of Time Sub-series B is above, during the mid portion of Time Sub-series A is located at Between, the mid portion of consensus sequence C is located at lower section, therefore Time Sub-series A closer to consensus sequence C.According to weighting editor The result that the computational methods of distance are calculated more conforms to the truth in Fig. 3, Time Sub-series A and consensus sequence C it Between editing distance D ' (A, C)=2.5, the editing distance D ' (B, C)=4 between Time Sub-series B and consensus sequence C, i.e. D ' (A, C) < D ' (B, C).By contrast, in the traditional editing distance computational methods not being introduced into weight, Time Sub-series A and Editing distance D (A, C)=4 between consensus sequence C, the editing distance D (B, C) between Time Sub-series B and consensus sequence C= 4, i.e. D (A, C)=D (B, C), and such result cannot reflect the truth in Fig. 3.
To describe hereinafter with reference to Fig. 4 and to be used for according to an embodiment of the invention extracting and consensus sequence phase from time serieses As Time Sub-series device 400.Wherein, with the similar portion of said method not repeating, and it is only description device 400 Basic framework.
As shown in figure 4, device 400 can include sequence transformation unit 402, Time Sub-series cutting unit 404, editor away from From computing unit 406 and similar Time Sub-series extraction unit 408.
Wherein, sequence transformation unit 402 can be configured to the variation tendency according to time serieses and consensus sequence Line translation is entered to time serieses and consensus sequence.
Sequence transformation unit 402 can include time serieses conversion subelement and consensus sequence conversion subelement (does not all show Go out).Time serieses conversion subelement can be according to the currentElement in time serieses with respect to previous element or front multiple element Change to enter line translation to time serieses.Consensus sequence conversion subelement can according to the currentElement in consensus sequence relatively Change in previous element or front multiple element enters line translation to consensus sequence.
Alternatively, time serieses conversion subelement can be according to the currentElement in time serieses with respect to a rear element Or after the change of multiple elements to enter line translation to time serieses.Consensus sequence conversion subelement can be according in consensus sequence CurrentElement enters line translation to consensus sequence with respect to a rear element or the afterwards change of multiple elements.
Time Sub-series cutting unit 404 can be configured to for the time serieses after conversion to be divided into multiple sub- time sequences Row.
Time Sub-series cutting unit 404 is additionally configured to the predetermined segmentations different from the length of consensus sequence according to permission Step-length and/or allow predetermined segmentation length different from the length of consensus sequence that the time serieses after conversion are divided into many height Time serieses.
Editing distance computing unit 406 can be configured to for each Time Sub-series in multiple Time Sub-series, Calculate the editing distance between the consensus sequence after each Time Sub-series and conversion.
Wherein, editing distance computing unit 406 can calculate between the consensus sequence after each Time Sub-series and conversion Weighing edit distance, wherein meet one or more of claimed below:For update, the insertion to different elements is permitted Permitted to give different weights;For deletion action, the deletion to different elements allows to give different weights;Grasp for replacing Make, the replacement to different elements pair allows to give different weights.
Similar Time Sub-series extraction unit 408 can be configured to according to the editing distance calculating from multiple sub- times The Time Sub-series similar to consensus sequence are extracted in sequence.
Similar Time Sub-series extraction unit 408 can be configured to extract from multiple Time Sub-series to be had less than pre- The one or more Time Sub-series of editing distance determining threshold value are as the Time Sub-series similar to consensus sequence.
Alternatively, similar Time Sub-series extraction unit 408 is additionally configured to extract in advance from multiple Time Sub-series Fixed number amount, the Time Sub-series with smallest edit distance are as the Time Sub-series similar to consensus sequence.
Technical scheme according to an embodiment of the invention, can extract similar Time Sub-series, it is right to realize more quickly Seasonal effect in time series flexible partition, and as needed the physical significance according to sequence extracting similar sub-sequence.
Describe the ultimate principle of the present invention above in association with specific embodiment, however, it is desirable to it is noted that to this area It is to be understood that whole or any steps of methods and apparatus of the present invention or part, Ke Yi for those of ordinary skill In any computing device (including processor, storage medium etc.) or the network of computing device, with hardware, firmware, software or Combinations thereof is realized, and this is that those of ordinary skill in the art use them in the case of the explanation having read the present invention Basic programming skill can be achieved with.
Therefore, the purpose of the present invention can also by run on any computing device a program or batch processing Lai Realize.Described computing device can be known fexible unit.Therefore, the purpose of the present invention can also comprise only by offer The program product of program code realizing methods described or device is realizing.That is, such program product is also constituted The present invention, and the storage medium of such program product that is stored with also constitutes the present invention.Obviously, described storage medium can be Any known storage medium or any storage medium being developed in the future.
In the case that embodiments of the invention are realized by software and/or firmware, from storage medium or network to having The computer of specialized hardware structure, such as the general purpose computer 500 shown in Fig. 5 installs the program constituting this software, this computer When being provided with various program, it is able to carry out various functions etc..
In Figure 5, CPU (CPU) 501 is according to the program of storage in read only memory (ROM) 502 or from depositing Storage part 508 is loaded into the various process of program performing of random access memory (RAM) 503.In RAM 503, also according to need Store the data required when CPU 501 executes various process etc..CPU 501, ROM 502 and RAM 503 are via bus 504 links each other.Input/output interface 505 also link to bus 504.
Components described below link is to input/output interface 505:Importation 506 (including keyboard, mouse etc.), output section Divide 507 (including display, such as cathode ray tube (CRT), liquid crystal display (LCD) etc., and speakers etc.), storage part 508 (including hard disks etc.), communications portion 509 (including NIC such as LAN card, modem etc.).Communications portion 509 Via network such as the Internet execution communication process.As needed, driver 510 also can link to input/output interface 505. Detachable media 511 such as disk, CD, magneto-optic disk, semiconductor memory etc. are installed in driver 510 as needed Above so that the computer program reading out is installed in storage part 508 as needed.
In the case that above-mentioned series of processes is realized by software, such as removable from network such as the Internet or storage medium Unload medium 511 and the program constituting software is installed.
It will be understood by those of skill in the art that this storage medium be not limited to wherein having program stored therein shown in Fig. 5, Separately distribute with equipment to provide a user with the detachable media 511 of program.The example of detachable media 511 comprises disk (comprising floppy disk (registered trade mark)), CD (comprising compact disc read-only memory (CD-ROM) and digital universal disc (DVD)), magneto-optic disk (comprising mini-disk (MD) (registered trade mark)) and semiconductor memory.Or, storage medium can be ROM 502, storage part Hard disk comprising in 508 etc., wherein computer program stored, and it is distributed to user together with the equipment comprising them.
The present invention also proposes a kind of program product of the instruction code of the machine-readable that is stored with.Instruction code is read by machine When taking and executing, can perform above-mentioned method according to embodiments of the present invention.
Correspondingly, the storage medium for carrying the program product of the instruction code of the above-mentioned machine-readable that is stored with also wraps Include in disclosure of the invention.Storage medium includes but is not limited to floppy disk, CD, magneto-optic disk, storage card, memory stick etc..
It should be appreciated by those skilled in the art that enumerated at this is exemplary, the invention is not limited in this.
In this manual, " first ", " second " and " n-th " etc. statement be in order to by described feature in word On distinguish, so that the present invention is explicitly described.Therefore, should not serve to that there is any determinate implication.
As an example, each step of said method and all modules of the said equipment and/or unit can To be embodied as software, firmware, hardware or a combination thereof, and as the part in relevant device.In said apparatus, each forms mould Block, unit when being configured by way of software, firmware, hardware or a combination thereof spendable specific means or mode be ability Known to field technique personnel, will not be described here.
As an example, in the case of being realized by software or firmware, can be from storage medium or network to having The computer (general purpose computer 500 for example shown in Fig. 5) of specialized hardware structure installs the program constituting this software, this computer When being provided with various program, it is able to carry out various functions etc..
In the description to the specific embodiment of the invention above, for a kind of description of embodiment and/or the feature that illustrates Can be used in one or more other embodiments in same or similar mode, with the feature in other embodiment Combined, or substitute the feature in other embodiment.
It should be emphasized that term "comprises/comprising" refers to the presence of feature, key element, step or assembly herein when using, but simultaneously It is not excluded for other features one or more, the presence of key element, step or assembly or additional.
Additionally, the method for the present invention be not limited to specifications described in time sequencing executing it is also possible to according to it He time sequencing ground, concurrently or independently execute.Therefore, the execution sequence of the method described in this specification is not to this Bright technical scope is construed as limiting.
Although being had been disclosed to the present invention by the description of the specific embodiment to the present invention above, should This understanding, those skilled in the art can design in the spirit and scope of the appended claims the various modifications to the present invention, Improve or equivalent.These modifications, improvement or equivalent should also be as being to be considered as included in protection scope of the present invention.
With regard to including the embodiment of above example, following remarks is also disclosed.
Remarks
A kind of method for extracting the Time Sub-series similar to consensus sequence from time serieses of remarks 1., including:
Variation tendency according to described time serieses and described consensus sequence is to described time serieses and described benchmark respectively Sequence enters line translation;
Time serieses after conversion are divided into multiple Time Sub-series;
For each Time Sub-series in the plurality of Time Sub-series, calculate each Time Sub-series described and conversion The editing distance between consensus sequence afterwards;And
The son similar to described consensus sequence is extracted from the plurality of Time Sub-series according to the editing distance calculating Time serieses.
Method according to remarks 1 for the remarks 2., wherein, the described change according to time serieses and described consensus sequence respectively The step that change trend enters line translation to described time serieses and described consensus sequence includes:
According to the currentElement in described time serieses with respect to previous element or front multiple element change come to described Time serieses enter line translation;And
According to the currentElement in described consensus sequence with respect to previous element or front multiple element change come to described Consensus sequence enters line translation.
Method according to remarks 1 for the remarks 3., wherein, the described change according to time serieses and described consensus sequence respectively The step that change trend enters line translation to described time serieses and described consensus sequence includes:
According to the currentElement in described time serieses with respect to a rear element or multiple elements afterwards change come to described Time serieses enter line translation;And
According to the currentElement in described consensus sequence with respect to a rear element or multiple elements afterwards change come to described Consensus sequence enters line translation.
Method according to remarks 1 for the remarks 4., wherein, the described base calculating after each Time Sub-series described and conversion The step of the editing distance between quasi- sequence is included between the consensus sequence after calculating each Time Sub-series described and conversion Weighing edit distance, wherein meets one or more of claimed below:
For update, the insertion to different elements allows to give different weights;
For deletion action, the deletion to different elements allows to give different weights;And
For replacement operation, the replacement to different elements pair allows to give different weights.
Method according to remarks 1 for the remarks 5., wherein, the editing distance that described basis calculates is from the plurality of period of the day from 11 p.m. to 1 a.m Between extract the step of the Time Sub-series similar to described consensus sequence in sequence and include:
One or more period of the day from 11 p.m. to 1 a.m with the editing distance less than predetermined threshold are extracted from the plurality of Time Sub-series Between sequence as the Time Sub-series similar to described consensus sequence.
Method according to any one of remarks 1 to 5 for the remarks 6., wherein, described will conversion after time serieses segmentation The step becoming multiple Time Sub-series includes:
According to the permission predetermined segmentation step-lengths different from the length of described consensus sequence and/or permission and described consensus sequence The different predetermined segmentation length of length the time serieses after described conversion are divided into the plurality of Time Sub-series.
A kind of device for extracting the Time Sub-series similar to consensus sequence from time serieses of remarks 7., including:
Sequence transformation unit, is configured to variation tendency according to described time serieses and described consensus sequence to institute State time serieses and described consensus sequence enters line translation;
Time Sub-series cutting unit, is configured to for the time serieses after conversion to be divided into multiple Time Sub-series;
Editing distance computing unit, is configured to, for each Time Sub-series in the plurality of Time Sub-series, count Calculate the editing distance between the consensus sequence after each Time Sub-series described and conversion;And
Similar Time Sub-series extraction unit, is configured to according to the editing distance calculating from the plurality of sub- time sequence The Time Sub-series similar to described consensus sequence are extracted in row.
Device according to remarks 7 for the remarks 8., wherein, described sequence transformation unit includes:
Time serieses convert subelement, are configured to according to the currentElement in described time serieses with respect to previous element Or the change of front multiple element enters line translation to described time serieses;And
Consensus sequence converts subelement, is configured to according to the currentElement in described consensus sequence with respect to previous element Or the change of front multiple element enters line translation to described consensus sequence.
Device according to remarks 7 for the remarks 9., wherein, described sequence transformation unit includes:
Time serieses convert subelement, are configured to according to the currentElement in described time serieses with respect to a rear element Or after the change of multiple elements to enter line translation to described time serieses;And
Consensus sequence converts subelement, is configured to according to the currentElement in described consensus sequence with respect to a rear element Or after the change of multiple elements to enter line translation to described consensus sequence.
Device according to remarks 7 for the remarks 10., wherein, it is described that described editing distance computing unit is configured to calculating Each Time Sub-series and conversion after consensus sequence between weighing edit distance, wherein meet one of claimed below or Multiple:
For update, the insertion to different elements allows to give different weights;
For deletion action, the deletion to different elements allows to give different weights;And
For replacement operation, the replacement to different elements pair allows to give different weights.
Device according to remarks 7 for the remarks 11., wherein, described similar Time Sub-series extraction unit is additionally configured to:
One or more period of the day from 11 p.m. to 1 a.m with the editing distance less than predetermined threshold are extracted from the plurality of Time Sub-series Between sequence as the Time Sub-series similar to described consensus sequence.
Device according to any one of remarks 7 to 11 for the remarks 12., wherein, described Time Sub-series cutting unit is also It is configured to:
According to the permission predetermined segmentation step-lengths different from the length of described consensus sequence and/or permission and described consensus sequence The different predetermined segmentation length of length the time serieses after described conversion are divided into the plurality of Time Sub-series.
A kind of program product of the instruction code of the machine-readable that is stored with of remarks 13., described instruction code is read by machine When taking and executing, can perform similar to consensus sequence for extracting from time serieses according to any one of remarks 1-6 The method of Time Sub-series.
A kind of storage medium carrying the program product according to remarks 13 of remarks 14..

Claims (10)

1. a kind of time series forecasting, cluster, method for detecting abnormality, including from time serieses ciExtract and consensus sequence biSimilar Time Sub-series, described from time serieses ciExtract and consensus sequence biSimilar Time Sub-series include:
Respectively according to following formula 5 or by the obtained formula after i+1 replacement of the i-1 in following formula 5, by the described time Sequence ciAnd described consensus sequence biIt is transformed to represent described time serieses ciAnd described consensus sequence biVariation tendency Sequence;
Time serieses after conversion are divided into multiple Time Sub-series;
For each Time Sub-series in the plurality of Time Sub-series, after calculating each Time Sub-series described and converting Editing distance between consensus sequence;And
Extracted the sub- time similar to described consensus sequence from the plurality of Time Sub-series according to the editing distance calculating Sequence,
Formula 5:
c i = 0 c i ≤ γc i - 1 1 c i > γc i - 1
, wherein, γ is coefficient.
2. method according to claim 1, wherein,
The sequence representing described seasonal effect in time series variation tendency is to represent currentElement in described time serieses with respect to previous The sequence of the change of element or front multiple element,
The sequence representing the variation tendency of described consensus sequence is to represent currentElement in described consensus sequence with respect to previous The sequence of the change of element or front multiple element.
3. method according to claim 1, wherein,
The sequence representing described seasonal effect in time series variation tendency is to represent currentElement in described time serieses with respect to rear one Element or the sequence of the change of multiple elements afterwards,
The sequence representing the variation tendency of described consensus sequence is to represent currentElement in described consensus sequence with respect to rear one Element or the sequence of the change of multiple elements afterwards.
4. method according to claim 1, wherein, the described benchmark sequence calculating after each Time Sub-series described and conversion The step of the editing distance between row includes the weighting between the consensus sequence after calculating each Time Sub-series described and conversion Editing distance, wherein meets one or more of claimed below:
For update, the insertion to different elements allows to give different weights;
For deletion action, the deletion to different elements allows to give different weights;And
For replacement operation, the replacement to different elements pair allows to give different weights.
5. method according to claim 1, wherein, the editing distance that described basis calculates is from the plurality of sub- time sequence The step extracting the Time Sub-series similar to described consensus sequence in row includes:
The one or more sub- time sequence with the editing distance less than predetermined threshold is extracted from the plurality of Time Sub-series Row are as the Time Sub-series similar to described consensus sequence.
6. method according to any one of claim 1 to 5, wherein, described by conversion after time serieses be divided into many Individual sub- seasonal effect in time series step includes:
According to the length allowing the predetermined segmentation step-lengths different from the length of described consensus sequence and permission and described consensus sequence Time serieses after described conversion are divided into the plurality of Time Sub-series by different predetermined segmentation length.
7. a kind of time series forecasting, cluster, abnormality detecting apparatus, including for from time serieses ciExtract and consensus sequence bi The device of similar Time Sub-series, described for from time serieses ciExtract and consensus sequence biSimilar Time Sub-series Device includes:
Sequence transformation unit, be configured to according to following formula 6 or by the i-1 in formula 6 with i+1 replacement after obtained Formula, by described time serieses ciAnd described consensus sequence biIt is transformed to represent described time serieses ciAnd described benchmark Sequence biVariation tendency sequence;
Time Sub-series cutting unit, is configured to for the time serieses after conversion to be divided into multiple Time Sub-series;
Editing distance computing unit, is configured to, for each Time Sub-series in the plurality of Time Sub-series, calculate institute State the editing distance between the consensus sequence after each Time Sub-series and conversion;And
Similar Time Sub-series extraction unit, is configured to according to the editing distance calculating from the plurality of Time Sub-series Extract the Time Sub-series similar to described consensus sequence,
Formula 6:
c i = 0 c i ≤ γc i - 1 1 c i > γc i - 1
, wherein, γ is coefficient.
8. equipment according to claim 7, wherein,
The sequence representing described seasonal effect in time series variation tendency is to represent currentElement in described time serieses with respect to previous The sequence of the change of element or front multiple element,
The sequence representing the variation tendency of described consensus sequence is to represent currentElement in described consensus sequence with respect to previous The sequence of the change of element or front multiple element.
9. equipment according to claim 7, wherein,
The sequence representing described seasonal effect in time series variation tendency is to represent currentElement in described time serieses with respect to rear one Element or the sequence of the change of multiple elements afterwards,
The sequence representing the variation tendency of described consensus sequence is to represent currentElement in described consensus sequence with respect to rear one Element or the sequence of the change of multiple elements afterwards.
10. equipment according to claim 7, wherein, described editing distance computing unit is configured to each described in calculating The weighing edit distance between consensus sequence after Time Sub-series and conversion, wherein meets one of claimed below or many Individual:
For update, the insertion to different elements allows to give different weights;
For deletion action, the deletion to different elements allows to give different weights;And
For replacement operation, the replacement to different elements pair allows to give different weights.
CN201110203979.9A 2011-07-14 2011-07-14 The method and apparatus extracting similar Time Sub-series Expired - Fee Related CN102880621B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201110203979.9A CN102880621B (en) 2011-07-14 2011-07-14 The method and apparatus extracting similar Time Sub-series
JP2012156266A JP6111543B2 (en) 2011-07-14 2012-07-12 Method and apparatus for extracting similar sub time series

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110203979.9A CN102880621B (en) 2011-07-14 2011-07-14 The method and apparatus extracting similar Time Sub-series

Publications (2)

Publication Number Publication Date
CN102880621A CN102880621A (en) 2013-01-16
CN102880621B true CN102880621B (en) 2017-03-01

Family

ID=47481949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110203979.9A Expired - Fee Related CN102880621B (en) 2011-07-14 2011-07-14 The method and apparatus extracting similar Time Sub-series

Country Status (2)

Country Link
JP (1) JP6111543B2 (en)
CN (1) CN102880621B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577562B (en) * 2013-10-24 2016-08-31 河海大学 A kind of many measuring periods sequence similarity analyzes method
CN104714953A (en) * 2013-12-12 2015-06-17 日本电气株式会社 Time series data motif identification method and device
CN104809134B (en) * 2014-01-27 2018-03-09 国际商业机器公司 The method and apparatus for detecting the abnormal subsequence in data sequence
CN105224543A (en) * 2014-05-30 2016-01-06 国际商业机器公司 For the treatment of seasonal effect in time series method and apparatus
CN108573059B (en) * 2018-04-26 2021-02-19 哈尔滨工业大学 Time sequence classification method and device based on feature sampling
CN109902703B (en) * 2018-09-03 2021-09-21 华为技术有限公司 Time series abnormity detection method and device
CN109543083B (en) * 2018-11-19 2020-12-22 国网陕西省电力公司电力科学研究院 Method for detecting abnormal data in real-time data of multi-element power grid
CN111027606B (en) * 2019-11-29 2022-05-31 中国科学院空间应用工程与技术中心 Multi-mode time series anomaly detection method, storage medium and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996174A (en) * 2009-08-11 2011-03-30 上海汉光知识产权数据科技有限公司 Time series analysis method of patent documentations
CN102004795A (en) * 2010-12-08 2011-04-06 中国科学院自动化研究所 Hand language searching method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000194713A (en) * 1998-12-25 2000-07-14 Nippon Telegr & Teleph Corp <Ntt> Method and device for retrieving character string, and storage medium stored with character string retrieval program

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996174A (en) * 2009-08-11 2011-03-30 上海汉光知识产权数据科技有限公司 Time series analysis method of patent documentations
CN102004795A (en) * 2010-12-08 2011-04-06 中国科学院自动化研究所 Hand language searching method

Also Published As

Publication number Publication date
JP2013025805A (en) 2013-02-04
JP6111543B2 (en) 2017-04-12
CN102880621A (en) 2013-01-16

Similar Documents

Publication Publication Date Title
CN102880621B (en) The method and apparatus extracting similar Time Sub-series
Hickey et al. Genotyping structural variants in pangenome graphs using the vg toolkit
Collister et al. Calculating polygenic risk scores (PRS) in UK Biobank: a practical guide for epidemiologists
Liu et al. BinPacker: packing-based de novo transcriptome assembly from RNA-seq data
Sanderson et al. Phylogenomics with incomplete taxon coverage: the limits to inference
Baele et al. Bayesian evolutionary model testing in the phylogenomics era: matching model complexity with computational efficiency
Stram et al. Choosing haplotype-tagging SNPS based on unphased genotype data using a preliminary sample of unrelated subjects with an example from the Multiethnic Cohort Study
Izquierdo-Carrasco et al. Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees
US8725734B2 (en) Sorting multiple records of data using ranges of key values
EP2759952B1 (en) Efficient genomic read alignment in an in-memory database
CN105989080A (en) Apparatus and method for determining entity attribute values
Jewett et al. A coalescent model for genotype imputation
KR20190024701A (en) Concurrent multi-bit adder
CN112735596A (en) Similar patient determination method and device, electronic equipment and storage medium
JP2000293494A (en) Device and method for parallel computation
Excoffier et al. An integrated software package for population genetics data analysis
US20130060728A1 (en) Generating a mixed integer linear programming matrix from an annotated entity-relationship data model and a symbolic matrix
Wu An algorithm for computing the gene tree probability under the multispecies coalescent and its application in the inference of population tree
CN116130002A (en) DNA sequence polymorphism analysis method and system
Zhou et al. Analysis of secondary phenotypes in multigroup association studies
CN106445975B (en) Item set mining method and device
Li et al. A novel scaffolding algorithm based on contig error correction and path extension
Nato Jr et al. PBAP: a pipeline for file processing and quality control of pedigree data with dense genetic markers
Juan et al. PGsim: a comprehensive and highly customizable personal genome simulator
US8341564B1 (en) Method and system for optimizing migrated implementation of a system design

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170301

Termination date: 20180714