CN102073586B

CN102073586B - Gray generalized regression neural network-based small sample software reliability prediction method

Info

Publication number: CN102073586B
Application number: CN2010106024533A
Authority: CN
Inventors: 吴玉美; 杨日盛; 陆民燕
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2010-12-23
Filing date: 2010-12-23
Publication date: 2012-05-16
Anticipated expiration: 2030-12-23
Also published as: CN102073586A

Abstract

The invention discloses a small-sample software reliability prediction method based on a grey generalized regression neural network. The method comprises the following steps: first, the failure time data and the test coverage data in the collected small-sample software reliability test data are each simulated and expanded with an improved Bootstrap method, forming expanded reliability data that follow the same failure statistical law as the small-sample reliability data; then, a three-dimensional curve of failure time, test coverage, and unreliability is obtained for the expanded reliability data; next, a grey generalized regression neural network is built; the network is then trained with the expanded reliability data to establish a small-sample software reliability prediction model; finally, the model is used for prediction to obtain software reliability prediction information. The method avoids solving complex multivariate likelihood equations, and it addresses the problem in artificial-neural-network modeling for software reliability prediction that a usable prediction model can be obtained only after training on a large amount of data.

Description

Small-sample software reliability prediction method based on a grey generalized regression neural network

Technical field

The invention relates to software reliability prediction methods used in software reliability testing, and specifically to a small-sample software reliability prediction method based on a grey generalized regression neural network. It belongs to the technical field of software reliability prediction.

Background technology

With the rapid development of computer technology, people rely more and more on computer software, and the reliability requirements for software keep rising. Software reliability prediction therefore faces the problem of modeling software that has high reliability requirements but few test data. In addition, predictions made in the early stage of software reliability testing also suffer from having few data, which makes it hard to meet the sample-size requirements of traditional prediction methods. If, as in the conventional approach, prediction modeling waits until enough data have been collected in the later stage of testing, the waiting time may be unacceptable, and there is still a risk that too few sample data are available for prediction.

Research on software reliability prediction methods mainly concerns software reliability growth models, that is, models that use the failure data obtained in software reliability growth testing to estimate and predict software reliability. Traditional software reliability growth models are built as follows: first, certain assumptions are made about the failure mode or behavior of the software; then a mathematical model is built on these assumptions using statistical theory. These models contain explicitly defined parameters, which often have a clear physical meaning. Finally, the model is fitted to the failure data curve to determine its parameters, from which estimates and predictions of software reliability are obtained.

At present, a common and hard-to-solve problem in software reliability prediction modeling is that many software reliability growth models give good predictions for some failure data sets but unsatisfactory predictions for others. This is known as the inconsistency problem in reliability model research, and it has seriously hindered the adoption of software reliability techniques. The main cause is that the assumptions made when building a model may not match actual conditions.

Moreover, such software reliability modeling methods rely only on software failure data and do not mine the available software information comprehensively, which leads to insufficient prediction accuracy. A new line of thinking has therefore emerged: jointly considering the failure data obtained in software reliability testing and historical information available before software delivery and acceptance, such as programmer skill, testing effort, test coverage, and the frequency of changes to the program specification, and building a unified model that accounts as fully as possible for the various information affecting software reliability, for use in reliability modeling and assessment. However, these information items or factors are usually highly correlated with each other and nonlinearly related to reliability. This severely limits existing statistical modeling methods, which assume mutual independence and linearity, when they are applied to such data.

Therefore, neural network modeling approaches, which can handle multiple factors and can in theory approximate any nonlinear mapping to arbitrary accuracy, have attracted attention in recent years and have produced some research results.

An artificial neural network is an algorithmic mathematical model that imitates the behavioral characteristics of the neural networks of the human brain to perform distributed parallel information processing. By adjusting the interconnections among its many internal nodes, it can learn from known data and extract the inherent laws of those data, and it has a strong nonlinear mapping capability. Artificial neural networks have the following outstanding advantages: 1. high parallelism; 2. highly nonlinear global behavior; 3. good fault tolerance and associative memory; 4. very strong self-adaptation and self-learning capability.

The generalized regression neural network (General Regression Neural Network, GRNN) was proposed in 1991 by Donald Specht of the Lockheed Palo Alto Research Laboratory as an extension of the probabilistic neural network. The GRNN is grounded in mathematical statistics and can approximate the mapping implicit in sample data; even when sample data are scarce, the network output converges to the optimal regression surface. Its learning algorithm is simple, and its structure is highly parallel.

As an improved form of the radial basis function neural network (Radial Basis Function Neural Network, RBFNN), the GRNN converges faster and is less prone to local minima than the widely used back-propagation neural network (Back Propagation Neural Network, BPNN). In addition, compared with an ordinary RBFNN, the GRNN has stronger nonlinear mapping and approximation capability, faster learning, a flexible network structure, and high fault tolerance and robustness; it also predicts relatively well when sample data are few and can handle unstable data.

An artificial neural network is essentially a black-box method. Its advantage over classical methods is that it requires little prior knowledge of the object being modeled: knowledge of the object's structure, parameters, and dynamics is generally not needed in advance. Only the input/output data of the object must be provided, and the network's own learning capability then yields the mapping between inputs and outputs.

However, like software reliability modeling based on statistical theory, neural-network-based software reliability modeling needs a certain amount of sample data. When the sample size is very small, it is difficult to build a usable model, and the prediction accuracy may be very poor.

Grey system theory was founded in the 1980s by the renowned Chinese scholar Professor Deng Julong. It studies small-sample, poor-information uncertain systems in which part of the information is known and part is unknown. Mainly by generating, developing, and analyzing the partially known information, it extracts valuable or regular information and thereby achieves a correct description and effective control of the system's behavior. Its main research topics include grey system analysis, modeling, prediction, decision-making, and control. It extends the viewpoints and methods of general system theory, information theory, and cybernetics to abstract systems such as society, economy, and ecology and, combined with mathematical methods, has developed a set of theories and methods for systems with incomplete information, that is, grey systems, forming a complete theoretical system.

Let the original data sequence be X⁽⁰⁾ = [x⁽⁰⁾(1), x⁽⁰⁾(2), x⁽⁰⁾(3), …, x⁽⁰⁾(n)], where x⁽ⁱ⁾(k) denotes the k-th element of sequence X⁽ⁱ⁾. The level ratio is defined as:

\sigma(k) = \frac{x^{(0)}(k-1)}{x^{(0)}(k)}, \quad k = 2, 3, \ldots, n

For an n-element data sequence, the admissible coverage interval of the level ratio is

\Theta = \left(e^{-\frac{2}{n+1}}, e^{\frac{2}{n+1}}\right)

If the level ratios fall outside this interval, the original data must be transformed so that the level ratios of the transformed sequence fall within the admissible coverage. Commonly used transforms are the translation transform, the logarithmic transform, and the root transform.
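A minimal Python sketch of the level-ratio test, assuming the admissible interval Θ stated above and implementing only the translation transform (the logarithmic and root transforms are omitted). The function names and the fixed shift step are illustrative choices, not part of the method.

```python
import math


def level_ratios(x0):
    """Level ratios sigma(k) = x0(k-1) / x0(k) for k = 2..n (0-based list indexing)."""
    return [x0[k - 1] / x0[k] for k in range(1, len(x0))]


def passes_level_ratio_test(x0):
    """True if every level ratio lies strictly inside (e^(-2/(n+1)), e^(2/(n+1)))."""
    n = len(x0)
    lo, hi = math.exp(-2.0 / (n + 1)), math.exp(2.0 / (n + 1))
    return all(lo < s < hi for s in level_ratios(x0))


def translate_until_admissible(x0, step=1.0, max_iter=10_000):
    """Translation transform: add the smallest multiple of `step` that makes the
    shifted sequence pass the test. Returns (shifted sequence, shift c)."""
    c = 0.0
    for _ in range(max_iter):
        shifted = [v + c for v in x0]
        if passes_level_ratio_test(shifted):
            return shifted, c
        c += step
    raise ValueError("no admissible translation found")
```

For [1, 2, 3, 4, 5] the first ratio 0.5 is below e^(-1/3) ≈ 0.717, so the test fails; a shift of c = 2 gives [3, 4, 5, 6, 7], whose ratios all fall inside the interval. The constant c has to be subtracted again from any model output.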

Grey theory holds that although a set of objective data satisfying the level-ratio coverage condition may appear random and scattered, mining its inherent law in a suitable way can weaken the randomness and strengthen the regularity. The grey accumulated generating operation (AGO) is one such processing method. If

x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \quad k = 1, 2, \ldots, n

then X⁽¹⁾ = [x⁽¹⁾(1), x⁽¹⁾(2), x⁽¹⁾(3), …, x⁽¹⁾(n)] is called the 1-AGO sequence of the original sequence X⁽⁰⁾ (the r-AGO sequence of X⁽⁰⁾ is denoted X⁽ʳ⁾). Conversely, the process of recovering X⁽⁰⁾ from X⁽¹⁾ is called the inverse AGO (IAGO). The steps of grey prediction modeling are as follows:

For the generated sequence X⁽¹⁾, establish the first-order whitening differential equation of the grey model GM(1,1):

\frac{dx^{(1)}}{dk} + a\,x^{(1)} = u

where −a is the development coefficient and u is the grey action quantity. The model parameters are identified by least squares:

\hat{a} = \begin{bmatrix} a \\ u \end{bmatrix} = (B^{T}B)^{-1}B^{T}y_N

where B and y_N are computed as:

B = \begin{bmatrix} -\frac{1}{2}\left(x^{(1)}(1) + x^{(1)}(2)\right) & 1 \\ -\frac{1}{2}\left(x^{(1)}(2) + x^{(1)}(3)\right) & 1 \\ \vdots & \vdots \\ -\frac{1}{2}\left(x^{(1)}(n-1) + x^{(1)}(n)\right) & 1 \end{bmatrix}

y_N = \left[x^{(0)}(2), x^{(0)}(3), \ldots, x^{(0)}(n)\right]^{T}

The solution of the first-order whitening differential equation is:

\hat{x}^{(1)}(k+1) = \left(x^{(0)}(1) - \frac{u}{a}\right)e^{-ak} + \frac{u}{a}

This solution is also called the time response function, and the values x̂⁽¹⁾(k+1) computed from it are called the model calculated values or fitted values. Finally, the inverse accumulation (IAGO)

\hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k)

gives the model predicted values.
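The GM(1,1) steps above (1-AGO, least-squares estimate of a and u, time response function, inverse AGO) can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the patent's implementation; `gm11_fit` and `gm11_predict` are hypothetical names, and the normal equations are solved directly to mirror the least-squares formula.

```python
import numpy as np


def gm11_fit(x0):
    """Estimate the GM(1,1) parameters (a, u) by least squares."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                            # 1-AGO sequence X^(1)
    z = -0.5 * (x1[:-1] + x1[1:])                 # first column of B
    B = np.column_stack([z, np.ones_like(z)])     # second column of B is all ones
    y_N = x0[1:]                                  # [x0(2), ..., x0(n)]^T
    a, u = np.linalg.solve(B.T @ B, B.T @ y_N)    # (B^T B)^{-1} B^T y_N
    return a, u


def gm11_predict(x0, a, u, horizon=0):
    """Fitted values x0_hat(1..n) followed by `horizon` out-of-sample forecasts."""
    x0 = np.asarray(x0, dtype=float)
    k = np.arange(len(x0) + horizon)              # index k yields x1_hat(k+1)
    x1_hat = (x0[0] - u / a) * np.exp(-a * k) + u / a   # time response function
    # Inverse AGO: x0_hat(1) = x1_hat(1), x0_hat(k+1) = x1_hat(k+1) - x1_hat(k)
    return np.concatenate([x1_hat[:1], np.diff(x1_hat)])
```

For a sequence growing by 10% per step, such as 2·1.1^k, the estimate is a ≈ −0.095 (so −a ≈ ln 1.1), and the fitted values stay within 0.2% of the data.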

To overcome the problem of insufficient sample size, the information contained in the small-sample data must be fully exploited. The methods commonly used for small-sample problems are the Bayes method and resampling methods. The Bayes method has been widely applied in recent years because it can incorporate prior information such as historical data and expert knowledge; however, the prior information comes from diverse sources and the choice of distribution form is highly subjective, so the method is often controversial. Resampling methods repeatedly draw samples from the original sample to enlarge the sample size, thereby fully mining the information in the original data, which to some extent solves the problem that large samples cannot be obtained in practice. Among resampling methods, the bootstrap currently receives the most attention. The bootstrap is a nonparametric statistical method first systematically proposed by the American statistician Efron in reference [1] (Efron B. Bootstrap Methods: Another Look at the Jackknife [J]. The Annals of Statistics, 1979, 7(1): 1-26). Since the 1980s it has developed steadily in both theory and application, and it is widely used in medicine, military affairs, finance, economics, and other fields for interval estimation, hypothesis testing, parameter estimation, and statistical testing. Starting from the original sample, the bootstrap generates bootstrap samples by resampling in order to estimate the statistical characteristics of a statistic of an unknown distribution. It makes no assumption about the population distribution and uses the information in the original sample directly, which makes it quite convincing.

However, studies have also shown that the bootstrap has limitations for small samples. For example, reference [2] (Duan Xiaojun, Wang Zhengming. Bootstrap method under small-sample conditions [J]. Journal of Ballistics, 2003, 15(3)) points out that the advantages of the bootstrap are obvious for large samples, but under small-sample conditions the deviation of the bootstrap approximation from the true distribution cannot be ignored. The reason is that the bootstrap essentially resamples from the empirical distribution function of the original sample. For large samples, the difference between the sample empirical distribution function and the population distribution function is small, but for small samples this difference is not negligible in the statistical sense; the bootstrap samples produced by resampling from the empirical distribution of a small sample are then relatively concentrated, and their randomness does not meet statistical requirements.

To address this unreasonable aspect of bootstrap resampling under small samples, reference [3] (Fan Lei, Lei Fan. Zero-failure small-sample reliability assessment based on bootstrap [J]. Microcomputer Information, 2008, 24(33)) proposed using parameter estimation to construct, from the original data sample, the distribution it obeys (for example a Weibull or normal distribution), replacing the empirical distribution function, and then drawing bootstrap samples from the constructed distribution. Reference [4] (Huang Wei, Feng Yunwen, Lv Zhenzhou, et al. Research on small-sample test assessment method based on the Bootstrap method [J]. Mechanical Science and Technology, 2006, 25(1)) studied the feasibility of correcting the sample empirical distribution function by fitting it with an exponential distribution function, a Boltzmann function, or a cubic polynomial, and proposed a small-sample test assessment method in which the corrected empirical distribution function replaces the empirical distribution function of the traditional bootstrap. However, in addressing the unreasonable bootstrap samples produced under small samples, both references [3] and [4] make artificial assumptions about the population distribution. This discards part of the prior information, introduces systematic error into the resulting model, and forfeits the bootstrap's strength of being convincing precisely because it makes no assumption about the population.
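For contrast with the improved method, the classical bootstrap of reference [1] can be sketched as resampling with replacement from the empirical distribution. The sketch uses only the standard library, and `bootstrap_se` is a hypothetical helper. It also makes the small-sample weakness described above visible: every resampled value is one of the original observations, so the bootstrap replicates of the mean can never leave the range of the original sample.

```python
import random
import statistics


def bootstrap_se(sample, stat=statistics.mean, B=2000, seed=0):
    """Classical bootstrap: draw B resamples of size n with replacement from the
    empirical distribution; return (std. error of `stat`, list of replicates)."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = [stat(rng.choices(sample, k=n)) for _ in range(B)]
    return statistics.stdev(replicates), replicates
```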

Summary of the invention

The objective of the invention is to solve the problem that, under small-sample conditions, prediction models built with traditional software reliability prediction methods have low prediction accuracy or cannot be established as usable models at all. Drawing on the respective strengths of grey system theory and generalized regression neural networks in information processing and prediction, the invention adds test coverage information to the model and uses an improved Bootstrap method to expand the small-sample data, so that software reliability can be modeled and predicted with higher accuracy under small-sample conditions. The result is a software reliability prediction method based on a grey generalized regression neural network that takes test coverage into account for small-sample data.

The invention proposes an improved Bootstrap method. First, the probability plot method is used to obtain information about the population distribution. The probability plot method only requires judging how far the quantile trace deviates from a straight line, so it does not lose information about the observed data. The probability distribution function of the sample is then determined by least-squares fitting and used to correct the sample empirical distribution function. In this way, without artificially assuming the population distribution, the problem of unreasonable bootstrap samples produced by Bootstrap resampling under small-sample conditions is solved. Finally, the failure time–unreliability curve and the test coverage–unreliability curve are constructed, which incorporates test coverage into the prediction model.

The small-sample software reliability prediction method based on a grey generalized regression neural network according to the invention comprises the following steps:

Step 1: Collect test data;

Through software reliability growth testing, collect the test data {(t_i, C_i, N_i)}, i = 1, …, N,

where t_i is the failure time, C_i is the test coverage, N_i is the cumulative number of failures, and N is the number of collected test data points;

Step 2: Determine the distributions of the failure time data and the test coverage data;

Treat the failure times and the test coverage values in the test data each as a one-dimensional random sequence. First determine the probability distribution type of each sequence with the probability plot method; once the type is determined, use least squares to determine the parameters of the distribution that each sequence obeys;
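A sketch of the probability-plot comparison used to choose a distribution type. It assumes four common candidate distributions (the six types compared in Figs. 2 to 6 are not named in this section), Bernard's median-rank approximation F_i = (i − 0.3)/(n + 0.4) for the plotting positions, and the squared correlation coefficient of each linearized plot as the measure of deviation from a straight line. All names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def median_ranks(n):
    """Bernard's approximation of the median rank of the i-th order statistic."""
    i = np.arange(1, n + 1)
    return (i - 0.3) / (n + 0.4)


_PHI_INV = np.vectorize(NormalDist().inv_cdf)   # standard normal quantile function

# (x-transform of the data, y-transform of F) that makes each CDF a straight line
LINEARIZATIONS = {
    "exponential": (lambda t: t, lambda F: -np.log(1.0 - F)),
    "weibull": (np.log, lambda F: np.log(-np.log(1.0 - F))),
    "normal": (lambda t: t, _PHI_INV),
    "lognormal": (np.log, _PHI_INV),
}


def rank_distributions(sample):
    """Return [(name, R^2), ...] sorted from most to least linear probability plot."""
    t = np.sort(np.asarray(sample, dtype=float))
    F = median_ranks(len(t))
    scores = {}
    for name, (fx, fy) in LINEARIZATIONS.items():
        r = np.corrcoef(fx(t), fy(F))[0, 1]
        scores[name] = r * r
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The log-based plots require strictly positive data, which holds for failure times and for nonzero coverage values.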

Step 3: Expand the data with the improved Bootstrap method;

According to the specific probability distributions of the failure time and the test coverage obtained in Step 2, random sampling produces simulated failure time data and simulated test coverage data that follow the same statistical law as the original data. Merging the simulated data with the original failure time and test coverage data, respectively, yields the expanded failure time data {t_j}, j = 1, …, M,

and the expanded test coverage data {C_j}, j = 1, …, M,

where M is the sample size after expansion, M ≥ N;

Next, the expanded failure time–unreliability curve and the expanded test coverage–unreliability curve are constructed by the empirical distribution function method. Finally, from these two curves, unreliability is multiplied by the total cumulative number of failures to convert it into the corresponding cumulative number of failures, giving the expanded cumulative failure data, from which the three-dimensional curve of expanded failure time, test coverage, and cumulative number of failures is constructed. The set of data points on this curve, {(t_j, C_j, N_j)}, j = 1, …, M,

is the expanded test data;
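The expansion in Step 3 can be sketched as drawing M − N values from the distribution fitted in Step 2 and merging them with the N original observations. The `sampler` callback is a hypothetical interface; the test below assumes a Weibull fit and uses NumPy's `Generator.weibull`, which draws with unit scale, so the draw is multiplied by the scale η. Sorting the merged sample is a convenience for building the empirical curves afterwards.

```python
import numpy as np


def expand_sample(original, sampler, M, seed=0):
    """Improved-Bootstrap expansion (sketch): draw M - N simulated values from the
    fitted distribution via sampler(rng, k) and merge them with the N originals."""
    original = np.asarray(original, dtype=float)
    N = len(original)
    if M < N:
        raise ValueError("expanded size M must satisfy M >= N")
    rng = np.random.default_rng(seed)
    simulated = np.asarray(sampler(rng, M - N), dtype=float)
    return np.sort(np.concatenate([original, simulated]))
```

For example, `expand_sample(t, lambda rng, k: eta * rng.weibull(beta, size=k), M=100)` would expand placeholder failure times `t` to 100 values under a fitted Weibull(β, η); the same call with a coverage fit expands the coverage data.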

Step 4: Process the data with the grey model GM(1,1);

Apply the grey model GM(1,1) to the expanded cumulative failure data obtained in Step 3 to obtain the grey-model prediction sequence of cumulative failures {N̂_j}, j = 1, …, M,

and hence the regularized expanded test data {(t_j, C_j, N̂_j)}, j = 1, …, M;

Step 5: Build the generalized regression neural network;

First determine the network inputs and outputs: the failure time t_j and test coverage C_j form the input vector of the network, and the corresponding cumulative number of failures is the target vector. Build a generalized regression neural network with 2 input units and 1 output unit, and set the initial network parameters empirically;
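Steps 5 to 7 rely on Specht's GRNN, whose output for a query x is the kernel-weighted average ŷ(x) = Σ N_j exp(−‖x − x_j‖²/2σ²) / Σ exp(−‖x − x_j‖²/2σ²) over the stored training samples. A minimal NumPy sketch with 2 inputs (t, C) and 1 output follows. The class name and the smoothing parameter `sigma` are illustrative; `sigma` plays a role similar to, but is not necessarily on the same scale as, the SPREAD parameter mentioned in the embodiment.

```python
import numpy as np


class GRNN:
    """Generalized regression neural network (sketch): the pattern layer stores
    every training sample; the summation and output layers form a Gaussian-kernel
    weighted average of the training targets."""

    def __init__(self, sigma=0.1):
        self.sigma = sigma

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)   # shape (M, 2): rows (t_j, C_j)
        self.y = np.asarray(y, dtype=float)   # shape (M,): cumulative failures
        return self

    def predict(self, Xq):
        Xq = np.atleast_2d(np.asarray(Xq, dtype=float))
        d2 = ((Xq[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2)  # (q, M)
        # Subtracting each row's minimum leaves the weight ratios unchanged but
        # avoids all weights underflowing to zero when sigma is small.
        d2 = d2 - d2.min(axis=1, keepdims=True)
        w = np.exp(-d2 / (2.0 * self.sigma ** 2))
        return (w @ self.y) / w.sum(axis=1)
```

A small `sigma` makes the prediction follow the nearest training sample; a large `sigma` pulls it toward the mean of all targets.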

Step 6: Train the generalized regression neural network;

Train the network with the regularized expanded test data {(t_j, C_j, N̂_j)}, j = 1, …, M.

The input vector set is {(t_j, C_j)}, j = 1, …, M,

and the corresponding target vector set is {N̂_j}, j = 1, …, M.

Normalize the input and target vectors, then feed them into the network for training. The network is trained with leave-one-out cross-validation, which requires M rounds of training in total;
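A sketch of Step 6, assuming min-max normalization to [0, 1] and using leave-one-out cross-validation to choose the GRNN smoothing parameter from a candidate list. Because a GRNN "trains" by storing samples, the M leave-one-out rounds can be computed at once by excluding each sample's own kernel weight. The function names, the mean-squared-error criterion, and the candidate-grid approach are assumptions, not details given in the source.

```python
import numpy as np


def minmax(A):
    """Column-wise min-max normalization; returns (normalized, lo, span)."""
    A = np.asarray(A, dtype=float)
    lo, hi = A.min(axis=0), A.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (A - lo) / span, lo, span


def loo_mse(X, y, sigma):
    """Leave-one-out MSE of a GRNN: round i predicts sample i from the other M - 1."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)               # drop sample i from its own round
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    w = np.exp(logits)
    pred = (w @ y) / w.sum(axis=1)
    return float(np.mean((pred - y) ** 2))


def select_sigma(X_raw, y_raw, candidates):
    """Normalize inputs and targets, then pick the sigma with the lowest LOO error."""
    X, _, _ = minmax(X_raw)
    y, _, _ = minmax(np.asarray(y_raw, dtype=float).reshape(-1, 1))
    y = y.ravel()
    scores = {s: loo_mse(X, y, s) for s in candidates}
    return min(scores, key=scores.get), scores
```

Predictions made on normalized targets must be mapped back with y·span + lo before they are read as cumulative failure counts.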

Step 7: Predict with the trained neural network;

Feed the failure time t₀ and test coverage C₀ of the point to be predicted into the network as the input vector; the network output N₀ is the predicted cumulative number of failures at that point.

The advantages of the invention are:

(1) Adding test coverage information to the software reliability prediction model makes the prediction model more reasonable;

(2) There is no need to build a physical model of the complex relationship among failure time, test coverage, and cumulative number of failures, which avoids the difficulty of building such a model;

(3) No simplifying assumptions about the failure behavior of the system need be made in advance, which avoids introducing systematic error;

(4) There is no need to solve complex systems of multivariate likelihood equations;

(5) It solves the difficulty of building an effective and accurate software reliability prediction model under small-sample conditions;

(6) Compared with common software reliability prediction methods, it significantly improves prediction accuracy.

Description of drawings

Fig. 1 is a flowchart of the invention;

Fig. 2 compares the probability plots of the failure time data under six distribution types;

Fig. 3 compares the probability plots of the block coverage data under six distribution types;

Fig. 4 compares the probability plots of the branch coverage data under six distribution types;

Fig. 5 compares the probability plots of the c-use coverage data under six distribution types;

Fig. 6 compares the probability plots of the p-use coverage data under six distribution types;

Fig. 7 is the expanded failure time–unreliability curve;

Fig. 8 is the expanded test coverage–unreliability curve;

Fig. 9 is the three-dimensional curve of the expanded failure time, test coverage, and cumulative number of failures;

Fig. 10 shows the prediction results of the grey prediction model;

Fig. 11 shows the prediction results of the embodiment of the invention.

Embodiment

The invention is described in further detail below with reference to the accompanying drawings and an embodiment.

Whether a prediction model is based on statistical theory or on neural networks, its main idea is to analyze the given data, discover the inherent interdependencies in them, and thereby gain the ability to predict unknown data. Traditional statistics, however, presupposes a sufficient number of samples: the performance of the proposed prediction methods is theoretically guaranteed only as the sample size tends to infinity, so their accuracy depends heavily on the size of the sample used for modeling. The same holds for neural network models. When the relationship between a model's inputs and outputs is complex and training samples are scarce, the network cannot be trained adequately and is likely to fail to capture the regularities in the data, so it cannot make effective and accurate predictions. In most practical situations, however, the number of samples is limited and often very small, so many methods struggle to achieve satisfactory results.

The software reliability prediction method based on a grey generalized regression neural network proposed by the invention is a nonparametric method. Taking test coverage information into account makes the prediction model more reasonable. The model is built with a neural network: by learning from existing data, which is a generalization process, an artificial neural network captures the inherent laws and relationships of the data sequence and can reconstruct an arbitrary continuous nonlinear mapping between the influencing factors and the data sequence. Consequently, no simplifying assumptions about the failure behavior of the system need be made in advance, which avoids introducing systematic error; no physical model of the complex relationship among failure time, test coverage, and cumulative number of failures is needed; and no complex system of multivariate likelihood equations has to be solved. In addition, to solve the small-sample problem, the method starts from the small-sample data themselves and uses the improved Bootstrap method to simulate expanded reliability data that follow the same failure statistical law as the small-sample reliability data. Exploiting the facts that the grey model GM(1,1) needs few samples and does not need to consider the distribution law or trend of the data, the method processes the resulting expanded reliability data with GM(1,1) to strengthen their regularity. The regularity-enhanced expanded reliability data are then used to train a generalized regression neural network, which performs well even with unstable small samples; through self-learning, the network acquires the inherent law of the sample data, yielding a prediction model that allows high-accuracy software reliability prediction even under small-sample conditions.

Suppose software reliability growth testing is performed on the software and the collected test data are

{(t_i, C_i, N_i)}, i = 1, …, N

where each element is a triple (t_i, C_i, N_i): t_i is the failure time, C_i is the test coverage, N_i is the cumulative number of failures, and N is the number of collected test data points. The problem solved by the method is this: when the sample size N is small, build a usable software reliability prediction model from failure data and test coverage information on the basis of a generalized regression neural network, by combining the improved Bootstrap method with grey system theory, and achieve high prediction accuracy.

The small-sample software reliability prediction method based on a grey generalized regression neural network according to the invention follows the flow shown in Fig. 1 and comprises the following steps:

Step 1: Collect test data.

Through software reliability growth testing, collect the test data {(t_i, C_i, N_i)}, i = 1, …, N,

where t_i is the failure time, C_i is the test coverage, N_i is the cumulative number of failures, and N is the number of collected test data points.

Step 2: Determine the distributions of the failure time data and the test coverage data.

Treat the failure times and the test coverage values in the test data each as a one-dimensional random sequence; this step determines the distribution that each sequence obeys. Determining the distribution of a random sequence involves two parts: determining its distribution type and, on that basis, determining the parameters of the distribution. First, the probability distribution type of each random sequence is determined with the probability plot method. The probability plot method only requires judging how far the quantile trace deviates from a straight line, so it does not lose information about the test data. After the distribution type is determined, least squares is used to determine the parameters of the distribution that each sequence obeys.
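Once a type is chosen, least squares on the linearized probability plot gives its parameters. A minimal sketch for the Weibull case, assuming Bernard median ranks and regression of the plot's y-coordinate on ln t: since ln(−ln(1 − F)) = β ln t − β ln η, the fitted slope is the shape β and the scale is η = exp(−intercept/β). The function name is hypothetical, and other distribution types would need their own linearization.

```python
import numpy as np


def fit_weibull_lsq(sample):
    """Least-squares Weibull fit on the probability plot; returns (beta, eta)."""
    t = np.sort(np.asarray(sample, dtype=float))
    n = len(t)
    F = (np.arange(1, n + 1) - 0.3) / (n + 0.4)    # Bernard median ranks
    x = np.log(t)
    y = np.log(-np.log(1.0 - F))
    beta, intercept = np.polyfit(x, y, 1)          # y = beta * x - beta * ln(eta)
    eta = np.exp(-intercept / beta)
    return beta, eta
```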

Step 3: Expand the data with the improved Bootstrap method.

Expanding the test data means adding a number of simulated test data points to the original test data. For a given piece of software, however, each failure usually corresponds to one test data point, and the cumulative number of failures can only take discrete positive integer values; so when the software's total cumulative number of failures is small, few test data points can be collected. Therefore, to expand the test data, the cumulative number of failures, which takes only discrete values, is first converted into unreliability, which takes continuous values. After the test data have been expanded, unreliability is converted back into the cumulative number of failures. The cumulative number of failures may then take non-integer values; by the definition of unreliability, however, the cumulative number of failures here means the mean cumulative number of failures over multiple copies of the software tested simultaneously, so non-integer values are reasonable.
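The conversion between discrete cumulative failure counts and continuous unreliability can be sketched as follows. It assumes that the empirical unreliability of an expanded sample of size M at a point is the fraction of the M values not exceeding that point, and that the total cumulative number of failures is a single scalar. The function names are hypothetical.

```python
import numpy as np


def empirical_unreliability(expanded, points):
    """Empirical CDF: fraction of the M expanded values that are <= each point."""
    s = np.sort(np.asarray(expanded, dtype=float))
    return np.searchsorted(s, np.asarray(points, dtype=float), side="right") / len(s)


def to_cumulative_failures(unreliability, total_failures):
    """Unreliability times total failures; non-integer results are mean counts."""
    return np.asarray(unreliability, dtype=float) * total_failures


def to_unreliability(cumulative_failures, total_failures):
    """Inverse conversion from cumulative failure counts to unreliability."""
    return np.asarray(cumulative_failures, dtype=float) / total_failures
```

With 7 total failures, an unreliability of 0.5 maps to 3.5 cumulative failures, the kind of non-integer mean value the paragraph above describes.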

In the invention, for the failure time data, the task in this step is to obtain, from the failure time data, the curve of unreliability versus failure time. To this end, the construction scheme lets M identical copies of the software undergo software reliability testing simultaneously, each copy having the same failure behavior as the actual software under test. Then, following the empirical distribution method (that is, the definition of the unreliability function), the ratio of the number of copies that have failed by time t to M is the unreliability at that time; this yields the curve of unreliability versus failure time, which is defined as the sample statistic. The same principle applies to the test coverage data. The specific steps are as follows:

Random sampling is carried out according to the specific probability distributions of the failure time and the test coverage determined in Step 2. This generates simulated failure time data and simulated test coverage data that follow the same statistical law as the original data. Merging them with the original failure time and test coverage data gives the expanded failure time data and the expanded test coverage data. The expanded failure time data are

and the expanded test coverage data are

where M is the sample size after expansion and M ≥ N.

Then the empirical distribution function method is used to construct the expanded failure time-unreliability curve and the expanded test coverage-unreliability curve. Finally, from these two curves, the unreliability is multiplied by the total cumulative failure count to convert it into the corresponding cumulative failure count, giving the expanded cumulative failure count data

From these, the three-dimensional curve of expanded failure time, test coverage and cumulative failure count can be constructed; the set of data points on this curve is the expanded test data.
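The text does not spell out how the two curves are joined into one three-dimensional curve. One reading is that they share the unreliability axis, and the sketch below adopts that reading as an assumption: the k-th smallest expanded failure time and the k-th smallest expanded coverage value both sit at F = k/M, and F is then multiplied by the total cumulative failure count. The function name `build_expanded_points` is hypothetical. Coverage is treated as a single scalar here, for example the mean of several coverage measures.

```python
def build_expanded_points(times, coverages, total_failures):
    """Form (failure time, coverage, cumulative failures) points from expanded data.

    ASSUMPTION (not stated in the source): the two curves are joined through
    their shared unreliability axis, so the k-th smallest time and the k-th
    smallest coverage value are paired at unreliability F = k / M.
    """
    if len(times) != len(coverages):
        raise ValueError("both expanded sequences must contain M values")
    m = len(times)
    points = []
    for k, (t, c) in enumerate(zip(sorted(times), sorted(coverages)), start=1):
        unreliability = k / m                         # empirical F at the k-th order statistic
        cumulative = unreliability * total_failures   # convert back to a (possibly non-integer) count
        points.append((t, c, cumulative))
    return points
```

For example, with four expanded values and a total of 8 failures, the cumulative counts come out as 2, 4, 6 and 8. With other totals they may be non-integer, as the text allows.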

Step 4: Process the data with the grey model GM(1,1).

Grey system theory handles small-sample data well. The grey model GM(1,1) is therefore used to process the expanded cumulative failure count data obtained in Step 3, which strengthens the regularity of the data and so improves the convergence speed and prediction accuracy of the neural network. This yields the predicted sequence of the cumulative failure count data

and hence the regularized expanded test data.
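The sketch below follows the textbook GM(1,1) formulation. It applies the accumulated generating operation, takes background values as the mean of adjacent accumulated values, estimates the development coefficient a and grey input b by least squares, evaluates the time-response function and applies inverse accumulation. The source does not state which GM(1,1) variant it uses, and the function name is hypothetical.

```python
import math


def gm11_fit_predict(x0):
    """Return the GM(1,1) fitted sequence for a positive sequence x0 (textbook form)."""
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 points")
    # 1-AGO: accumulated generating operation x1(k) = x0(1) + ... + x0(k)
    x1, running = [], 0.0
    for v in x0:
        running += v
        x1.append(running)
    # background values z1(k) = (x1(k) + x1(k-1)) / 2, k = 2..n
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # least squares for x0(k) = a * (-z1(k)) + b via the 2x2 normal equations
    u = [-z for z in z1]
    m = len(u)
    s_uu = sum(ui * ui for ui in u)
    s_u = sum(u)
    s_uy = sum(ui * yi for ui, yi in zip(u, y))
    s_y = sum(y)
    det = s_uu * m - s_u * s_u
    a = (m * s_uy - s_u * s_y) / det
    b = (s_uu * s_y - s_u * s_uy) / det
    if a == 0:
        raise ValueError("degenerate development coefficient a = 0")
    # time response: x1_hat(k+1) = (x0(1) - b/a) * exp(-a*k) + b/a, k = 0..n-1
    x1_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a for k in range(n)]
    # inverse AGO recovers the fitted (smoothed) original sequence
    return [x0[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, n)]
```

For a nearly geometric input such as `[10.0, 11.0, 12.1, 13.31, 14.641]`, the fitted values stay within about 1% of the input, because that is the growth pattern GM(1,1) models.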

Step 5: Build the generalized regression neural network.

First, the network inputs and outputs are determined. The failure time t_i and test coverage C_i form the network input vector, and the corresponding cumulative failure count N_i is the target vector. A generalized regression neural network (GRNN) with 2 input units and 1 output unit is built, and the initial parameter values of the network are set empirically. The parameter values are adjusted as needed during training, and the default initial values are generally sufficient; for example, the initial value of the GRNN parameter SPREAD can be taken as 0.01.
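A GRNN predicts a kernel-weighted average of the training targets, with SPREAD as the only smoothing parameter. The Python sketch below uses Specht's Gaussian-kernel form with kernel width `spread`. Software packages differ in how exactly SPREAD scales the kernel, so a value such as 0.01 is not directly portable between implementations. The function name is hypothetical.

```python
import math


def grnn_predict(train_x, train_y, query, spread):
    """GRNN estimate at `query`: Gaussian-weighted mean of the training targets.

    train_x: sequence of input tuples, e.g. (t_i, c_i) or (t_i, c_i1, ..., c_i4)
    train_y: matching target values (cumulative failure counts)
    """
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    weights = [math.exp(-d2 / (2.0 * spread ** 2)) for d2 in sq_dists]
    total = sum(weights)
    if total == 0.0:
        # every kernel underflowed: fall back to the nearest training sample
        nearest = min(range(len(train_x)), key=lambda i: sq_dists[i])
        return train_y[nearest]
    return sum(w * y for w, y in zip(weights, train_y)) / total
```

The same function serves the 2-input network described here and the 5-input network of the embodiment, provided the query tuple has the same length as the training inputs.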

Step 6: Train the generalized regression neural network.

The regularized expanded test data

are used to train the network that has been built. The input vector set is

and the corresponding target vector set is

To remove singular values from the training samples and speed up network convergence, the input and target vectors are normalized before being fed to the network for training. The generalized regression neural network has only one network parameter, SPREAD (the spread rate). To help the network learn the mapping underlying the data, it is trained by leave-one-out cross validation, which requires M rounds in total.

First, in each round of cross training, SPREAD is adjusted iteratively within the empirical solution space according to the convergence speed and error accuracy requirements of training. This continues until the best-performing network is obtained, i.e. the one whose mean squared error between predicted and expected values on the validation set is minimal. Then, among the M networks produced by the M rounds of cross training, the one with the smallest mean squared error between predicted and expected values on the validation set is selected as the optimal network of this step.

Finally, the optimal network is checked against the expected prediction accuracy, for example a relative error of less than 5% between predicted and expected values on the validation set. If it meets this requirement, it is taken as the trained neural network. Otherwise, the network structure or the range of the empirical parameter solution space is adjusted, and the network is rebuilt and retrained.
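The following is a simplified sketch of this training procedure. It applies min-max normalization to [0, 1], then runs a grid search that picks the SPREAD value with the smallest mean leave-one-out squared error. The normalization range is an assumption, since the text does not specify one. Unlike the procedure described above, the sketch pools the M rounds into one score instead of tuning SPREAD separately in each round. The 5% acceptance check is left to the caller. All names are hypothetical.

```python
import math


def minmax_scale(rows):
    """Scale each column of `rows` (list of tuples) to [0, 1].

    Returns the scaled rows and the (min, max) of each column, so that
    predictions can later be mapped back to original units.
    """
    bounds = [(min(col), max(col)) for col in zip(*rows)]
    scaled = [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds))
        for row in rows
    ]
    return scaled, bounds


def grnn_predict(train_x, train_y, query, spread):
    """Gaussian-kernel GRNN; returns the mean target if every weight underflows."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, query)) / (2.0 * spread ** 2))
        for x in train_x
    ]
    total = sum(weights)
    if total == 0.0:
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total


def loo_select_spread(xs, ys, candidates):
    """Return (spread, mse) with the smallest mean leave-one-out squared error."""
    best_spread, best_mse = None, float("inf")
    for spread in candidates:
        sq_err = 0.0
        for i in range(len(xs)):
            # hold out sample i and predict it from the remaining M - 1 samples
            pred = grnn_predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i], spread)
            sq_err += (pred - ys[i]) ** 2
        mse = sq_err / len(xs)
        if mse < best_mse:
            best_spread, best_mse = spread, mse
    return best_spread, best_mse
```

Both `xs` and `ys` must be lists, so that slicing and concatenation work. If the targets are normalized too, predictions must be mapped back with the stored bounds before computing errors in failure-count units.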

Step 7: Use the trained neural network for prediction.

Embodiment:

The embodiment uses two published data sets collected from the sensor management component of an inertial guidance system in a NASA-funded project. Each data item in each data set consists of the number of executed test cases, the cumulative failure count and four coverage measures: block coverage, branch coverage, c-use coverage and p-use coverage. Data set 1 is shown in Table 1:

Table 1 Test data statistics

To study the accuracy of the model, the first 13 data items of this data set are used as known sample data to build the model. The last data item serves as the prediction sample for examining the model's extrapolation ability.

1. Collect test data.

The first 13 data items in Table 1 are taken as the test data collected during testing.

2. Determine the distributions of the failure time and test coverage data.

The failure time data and the test coverage data are each regarded as a one-dimensional random sequence, and the distribution type of each sequence is first determined with the probability plot method. Figs. 2-6 show the probability plots of each data sequence for six candidate distributions: the exponential, extreme value, lognormal, normal, Rayleigh and Weibull distributions. In Fig. 2, subfigures (a) to (f) are the exponential, extreme value, lognormal, normal, Rayleigh and Weibull probability plots of the failure time sequence.

When exact failure times are difficult to determine, the number of executed test cases is commonly used in this field as a substitute. In the embodiment, the sequence of executed test case counts therefore replaces the failure time sequence.

Fig. 2 shows that the quantile trace of the failure time sequence deviates least from a straight line in the lognormal probability plot. By the probability plot method, the failure time sequence can therefore be considered lognormally distributed. The distribution types of the four test coverage sequences are determined in the same way from Figs. 3-6 respectively, and all four are found to follow the extreme value distribution. With the two distribution types fixed, the least squares method is used to estimate the parameters of the distribution each sequence follows. This gives the estimates of the failure-time distribution parameters μ (location parameter) and σ (scale parameter):

\hat{μ} = 3.2233, \quad \hat{σ} = 1.8628
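Least-squares estimation on a probability plot can be sketched as a straight-line regression in the linearized coordinates of each distribution. The sketch below assumes Benard's median-rank plotting positions (i - 0.3)/(n + 0.4), which the text does not specify. The lognormal fit regresses ln x on standard normal quantiles. The extreme value fit uses the smallest-extreme-value form whose density appears below, with CDF F(x) = 1 - exp(-exp((x - μ)/σ)), and regresses x on ln(-ln(1 - F)). All function names are hypothetical.

```python
import math
from statistics import NormalDist


def median_ranks(n):
    # Benard's approximation (an assumed choice of plotting position)
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def least_squares_line(xs, ys):
    """Ordinary least squares y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_lognormal_probplot(data):
    """Lognormal plot is linear in ln x = mu + sigma * z; returns (mu, sigma)."""
    xs = sorted(data)
    z = [NormalDist().inv_cdf(p) for p in median_ranks(len(xs))]
    return least_squares_line(z, [math.log(x) for x in xs])


def fit_min_extreme_value_probplot(data):
    """Smallest extreme value plot is linear in x = mu + sigma * ln(-ln(1 - F))."""
    xs = sorted(data)
    w = [math.log(-math.log(1.0 - p)) for p in median_ranks(len(xs))]
    return least_squares_line(w, xs)
```

In both fits the regression intercept estimates the location parameter and the slope estimates the scale parameter. A plot whose points lie far from the fitted line argues against that distribution type.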

where μ and σ are the location and scale parameters of the lognormal distribution, so that its probability density function is:

f(x) = \frac{1}{1.8628 \sqrt{2π}\, x} \exp\left( -\frac{(\ln x - 3.2233)^{2}}{2 \times 1.8628^{2}} \right)

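The estimates of the distribution parameters μ_{i1} and σ_{i1} for block coverage, branch coverage, c-use coverage and p-use coverage are, respectively:

block: \hat{μ} = 0.8097, \hat{σ} = 0.0926; branch: \hat{μ} = 0.7532, \hat{σ} = 0.1043; c-use: \hat{μ} = 0.7098, \hat{σ} = 0.0563; p-use: \hat{μ} = 0.5402, \hat{σ} = 0.0884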

where μ_{i1} and σ_{i1} are the location and scale parameters of each extreme value distribution. The probability density functions of block coverage, branch coverage, c-use coverage and p-use coverage are therefore, respectively:

f_{Block} (x) = \frac{1}{0.0926} \exp (\frac{x - 0.8097}{0.0926}) \exp (- \exp (\frac{x - 0.8097}{0.0926}))

f_{Branch} (x) = \frac{1}{0.1043} \exp (\frac{x - 0.7532}{0.1043}) \exp (- \exp (\frac{x - 0.7532}{0.1043}))

f_{c - use} (x) = \frac{1}{0.0563} \exp (\frac{x - 0.7098}{0.0563}) \exp (- \exp (\frac{x - 0.7098}{0.0563}))

f_{p - use} (x) = \frac{1}{0.0884} \exp (\frac{x - 0.5402}{0.0884}) \exp (- \exp (\frac{x - 0.5402}{0.0884}))

3. Expand the data using the improved Bootstrap method.

Random samples are drawn according to the fitted probability density functions of the failure time and the test coverage. This generates simulated failure time and test coverage data that follow the same statistical law as the original data. Merging these with the original data gives the expanded failure time data and expanded test coverage data, where M is the sample size after expansion. M is set according to the situation; it is usually greater than 20 and must satisfy M ≥ N.

Each

contains the four test coverage measures, i.e.

Here M is taken as 50.
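The parametric resampling step can be sketched with Python's `random` module and inverse-transform sampling from the fitted distributions. `sample_lognormal`, `sample_min_extreme_value` and `expand` are hypothetical names, and the fixed seed exists only for reproducibility.

```python
import math
import random


def sample_lognormal(mu, sigma, rng):
    """Draw X with ln X ~ Normal(mu, sigma)."""
    return math.exp(rng.gauss(mu, sigma))


def sample_min_extreme_value(mu, sigma, rng):
    """Inverse-transform draw from F(x) = 1 - exp(-exp((x - mu) / sigma))."""
    v = rng.random()
    while v == 0.0:  # log(0) is undefined, so redraw
        v = rng.random()
    # v plays the role of 1 - F, which is also uniform on (0, 1)
    return mu + sigma * math.log(-math.log(v))


def expand(original, sampler, m, seed=0):
    """Keep the N original values and append M - N simulated draws."""
    if m < len(original):
        raise ValueError("M must satisfy M >= N")
    rng = random.Random(seed)
    return list(original) + [sampler(rng) for _ in range(m - len(original))]
```

For instance, `expand(observed_times, lambda r: sample_lognormal(3.2233, 1.8628, r), 50)` would extend a hypothetical list `observed_times` of 13 executed-test-case counts to M = 50. This list is only a placeholder because Table 1 is not reproduced here. Draws from the fitted extreme value distributions are not bounded to [0, 1], and the text does not say how out-of-range coverage values are handled.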

Then, adopt the method for empirical distribution function construct respectively expansion the out-of-service time-unreliable line and the test coverage-unreliable line of writing music of writing music.Empirical distribution function is:

F (t) = \frac{n (t)}{M}

where F(t) is the empirical distribution function, which in this embodiment can also be regarded as the unreliability function, and n(t) is the number of software copies that have failed by time t. The failure time-unreliability curve is shown in Fig. 7 and the test coverage-unreliability curve in Fig. 8. To draw the two-dimensional relationship between test coverage and unreliability, Fig. 8 uses the mean of the four test coverage measures on the test coverage axis and leaves the corresponding unreliability values unchanged.
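The empirical distribution function F(t) = n(t)/M can be evaluated by a binary search over the sorted expanded sample, counting the values less than or equal to t. The sketch also shows the averaging of the four coverage measures used for Fig. 8. Both function names are hypothetical.

```python
from bisect import bisect_right


def empirical_unreliability(sample):
    """Return F with F(t) = n(t) / M, where n(t) counts sample values <= t."""
    ordered = sorted(sample)
    m = len(ordered)

    def F(t):
        # bisect_right gives the number of elements <= t in the sorted list
        return bisect_right(ordered, t) / m

    return F


def mean_coverage(measures):
    """Average several coverage measures (e.g. block, branch, c-use, p-use)."""
    return sum(measures) / len(measures)
```

Because `bisect_right` counts tied values as already failed, F jumps by 1/M at each distinct sample value, weighted by its multiplicity.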

Finally, the three-dimensional curve of expanded failure time, test coverage and cumulative failure count is constructed from the expanded failure time-unreliability curve and the expanded test coverage-unreliability curve. It is shown in Fig. 9, which again uses the mean of the four test coverage measures on the test coverage axis.

4. Process the data with the grey model GM(1,1).

Use Grey models GM (1; 1) data of the accumulative total failure number of the expansion of gained are handled, obtain the predicted value sequence of the gray model of accumulative total fail data:

is shown in figure 10.Can see gray model

The original formed curve of expansion accumulative total failure number sequence of the formed curve ratio of predicted value sequence will smoothly demonstrate

Clear regularity property, this helps the study and the convergence of neural network model, thereby improves precision of prediction.5, set up the generalized regression artificial neural network.

The failure time t_i and test coverage C_i = (c_{i1}, c_{i2}, c_{i3}, c_{i4}) form the network input vector, and the corresponding cumulative failure count N_i is the target. In this embodiment the test coverage comprises four coverage measures, so the network built is a generalized regression neural network with 5 input units and 1 output unit. The initial network parameter value is determined empirically and is taken as 0.01 here.

6. Train the generalized regression neural network.

The input vectors and target values N_i are normalized and fed into the network for training. The network is trained by cross validation. SPREAD is adjusted iteratively within the empirical solution space, which in this embodiment is the interval (0, 10), according to the convergence speed and error accuracy requirements of training. This continues until the best-performing network is obtained, i.e. the one whose mean squared error between predicted and expected values on the validation set is minimal. The resulting spread rate SPREAD of the network is 0.1210.

7. Use the trained neural network for prediction.

The failure time t_0 and test coverage C_0 at the point to be predicted, i.e. those of the last data item in the data set, are fed into the network as the input vector. The network output N_0 is the predicted value of the corresponding cumulative failure count. The overall prediction results are shown in Fig. 11.

The ability of the model to fit the known data is measured by the mean squared error MSE. This is computed between the true cumulative failure counts of the first 13 data items used to build the model and the corresponding model predictions:

MSE = \frac{1}{n} Σ_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2};

where n is the number of known sample data items, y_i is the actual value of the data and \hat{y}_i is the fitted value of the model. The ability of the model to predict by extrapolation is measured by the mean relative error E between the true cumulative failure count of the last data item and the corresponding model prediction:

E = \frac{1}{m} Σ_{i = 1}^{m} | \frac{y_{i} - {\hat{y}}_{i}}{y_{i}} |

where m is the number of prediction samples, y_i is the actual value of the data and \hat{y}_i is the predicted value of the model.
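The two evaluation indices translate directly into code. A minimal sketch with hypothetical function names follows; the relative error requires nonzero actual values.

```python
def mse(actual, predicted):
    """Mean squared error (1/n) * sum((y_i - yhat_i)^2) over the known samples."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def mean_relative_error(actual, predicted):
    """Mean relative error (1/m) * sum(|(y_i - yhat_i) / y_i|) over the prediction samples."""
    return sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)
```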

Table 2 compares the prediction performance of several methods on this data set: the method of the invention, GRNN-based methods that omit data expansion, test coverage information or grey theory, and the corresponding methods based on a common BP neural network (BPNN) with the same omissions:

Table 2 Comparison of prediction errors

The entries of Table 2 show the following about the proposed small-sample software reliability prediction method based on the grey generalized regression neural network. It exploits the advantages of GRNN, makes full use of test coverage information and avoids the problems caused by small samples. As a result, it achieves good results both in fitting the sample data and in prediction:

(1) The fitting MSE on the known sample data is 0.0627, and the mean relative error E of the extrapolation prediction is 0. This is because the cumulative failure counts of the last two data items in this data set are both 9, so the pattern is simple, the neural network learns it relatively easily, and high prediction accuracy follows. While keeping this high extrapolation accuracy, the method achieves a fitting MSE one to two orders of magnitude lower than the other methods.

(2) With the data expanded by the improved Bootstrap method proposed in the invention, the fitting MSE is 0.0627. Without this expansion the fitting MSE is 0.1169, so the fitting performance improves by 46.36%.
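If the improvement is read as the relative reduction in fitting MSE, which the text implies but does not state, the 46.36% figure follows from:

\frac{0.1169 - 0.0627}{0.1169} = \frac{0.0542}{0.1169} \approx 0.4636 = 46.36\%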

(3) Because the proposed method takes test coverage information into account, its fitting MSE is only 0.0627. Without test coverage information, the fitting MSE rises to 0.3473.

(4) The fitting MSE of the BP neural network on this data set is large in every case, and its mean relative error E for extrapolation prediction exceeds 10%. This further shows that the BP neural network has inherent shortcomings when handling small-sample data, and the models it yields have poor usability.

Claims

1. A small-sample software reliability prediction method based on a grey generalized regression neural network, characterized by comprising the following steps:

Step 1: Collect test data;

test data are collected through a software reliability growth test, where t_i is the failure time, C_i the test coverage and N_i the cumulative failure count, i = 1, ..., N, and N is the number of collected test data items;

Step 3: Expand the data using the improved Bootstrap method;

the expanded test coverage data are obtained, where M is the sample size after expansion and M ≥ N;

then, the empirical distribution function method is used to construct the expanded failure time-unreliability curve and the expanded test coverage-unreliability curve; finally, from these two curves, the unreliability is multiplied by the total cumulative failure count to convert it into the corresponding cumulative failure count, giving the expanded cumulative failure count data

from which the three-dimensional curve of expanded failure time, test coverage and cumulative failure count is constructed; the set of data points on this three-dimensional curve

is the expanded test data;

Step 4: Process the data with the grey model GM(1,1);

the grey model GM(1,1) is used to process the expanded cumulative failure count data obtained in Step 3, giving the grey-model predicted sequence of the cumulative failure count data

and hence the regularized expanded test data;

Step 5: Build the generalized regression neural network;

the network inputs and outputs are determined first: the failure time t_i and test coverage C_i form the network input vector, and the corresponding cumulative failure count N_i is the target vector; a generalized regression neural network with 2 input units and 1 output unit is built, and the initial network parameter values are set to the default initial values;

Step 6: Train the generalized regression neural network;

the regularized expanded test data

are used to train the network that has been built; the input vector set is

and the corresponding target vector set is

Step 7: Use the trained neural network for prediction.

2. The small-sample software reliability prediction method based on a grey generalized regression neural network according to claim 1, characterized in that, in Step 6, the initial parameter values of the neural network are the default initial values.

3. The small-sample software reliability prediction method based on a grey generalized regression neural network according to claim 1, characterized in that the leave-one-out cross validation training of the neural network in Step 6 is specifically as follows: first, in each round of cross training, the value of the network parameter spread rate is adjusted iteratively within the empirical solution space according to the convergence speed and error accuracy requirements of training, until the best-performing network is obtained, i.e. the mean squared error between predicted and expected values on the validation set approaches 0; then, among the M neural networks obtained from the M rounds of cross training, the one with the smallest mean squared error between predicted and expected values on the validation set is selected as the optimal network; finally, it is judged whether this optimal network meets the expected prediction accuracy requirement; if it does, it is taken as the trained neural network; otherwise, the range of the empirical solution space is adjusted and the neural network is rebuilt and retrained.