CN102289430B - Method for analyzing latent semantics of fusion probability of multi-modality data

Info

Publication number: CN102289430B
Application number: CN2011101800250A
Authority: CN
Inventors: 苗振江; 钟岑岑
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2011-06-29
Filing date: 2011-06-29
Publication date: 2013-11-13
Anticipated expiration: 2031-06-29
Also published as: CN102289430A

Abstract

The invention discloses a fused probabilistic latent semantic analysis method for multi-modal data, in the technical field of probabilistic latent semantic analysis. Starting from the nature of multi-modal data and introducing a cross structure, the method extends the standard probabilistic latent semantic analysis model, which applies only to single-modality data, to multiple modalities. It models both the differing contributions of the modalities to the latent semantic space and the relevance between their contents, and so provides a more accurate analysis and description of multi-modal data. The method has the following advantages: global parameter updating yields more accurate parameter estimates; a basis is provided for selecting a suitable range of topic numbers for each modality; and the workload of manual selection is reduced.

Description

Fused probabilistic latent semantic analysis method for multi-modal data

Technical field

The invention belongs to the technical field of probabilistic latent semantic analysis, and in particular relates to a fused probabilistic latent semantic analysis method for multi-modal data.

Background technology

Probabilistic latent semantic analysis is a generative model of observations (word-document co-occurrences). By introducing a latent semantic space, it models the co-occurrence probability as a mixture of multinomial and conditional distributions. It is now widely used in research fields such as information retrieval, natural language processing, and audio and video processing.

The probabilistic latent semantic analysis model is normally applicable only to single-modality data and has limitations for more complex multi-modal data. Multi-modal data comprise several interrelated modalities that attempt to describe the same content but differ in how much each contributes to expressing it. Given the co-occurrence matrix obtained for each modality, existing probabilistic latent semantic analysis methods can proceed in one of two ways: (1) concatenate the co-occurrence matrices of the different modalities with equal weights and then perform standard semantic analysis; or (2) in an asymmetric manner, estimate the latent semantic space of only one modality and then generate the observations of all modalities from that space.

These two approaches either ignore the essential differences between modalities or describe the relevance of the data only one-sidedly. Neither fully captures the nature of multi-modal data, namely the differing contributions of the modalities to the latent semantic space and the relevance between their contents. A probabilistic latent semantic analysis model designed for multi-modal data is therefore needed.

Summary of the invention

To address the shortcomings of the existing methods described in the background above, such as ignoring the essential differences between modalities, the present invention proposes a fused probabilistic latent semantic analysis method for multi-modal data.

The technical solution of the present invention is a fused probabilistic latent semantic analysis method for multi-modal data, characterized in that the method comprises the following steps:

Step 1: establish a standard probabilistic latent semantic analysis model for each modality, and build the fusion model on this basis;

Step 2: determine the working space of the fusion model and select the numbers of topics;

Step 3: decompose the fusion model into asymmetric probabilistic latent semantic analysis models, and compute the initial parameter values of these asymmetric models from the input values of the fusion model and the selected numbers of topics;

Step 4: update the initial parameter values with the expectation-maximization algorithm to obtain the final parameters;

Step 5: use the final parameters to analyze the modality to be tested.

The likelihood function maximized by the expectation-maximization algorithm is:

L = \prod_{i=1}^{M} \left( \prod_{p=1}^{N_{A}} p(w_{p}^{A}, d_{i})^{n(w_{p}^{A}, d_{i})} \prod_{q=1}^{N_{V}} p(w_{q}^{V}, d_{i})^{n(w_{q}^{V}, d_{i})} \right)

where:

L is the value of the likelihood function;

p(w_{p}^{A}, d_{i}) is the co-occurrence probability for modality A;

p(w_{q}^{V}, d_{i}) is the co-occurrence probability for modality V;

n(w_{p}^{A}, d_{i}) and n(w_{q}^{V}, d_{i}) are the known observations, i.e., the entries of the observed co-occurrence matrices of modality A and modality V;

N_{A} is the number of words of modality A;

N_{V} is the number of words of modality V;

w_{p}^{A} is the p-th word of modality A;

w_{q}^{V} is the q-th word of modality V;

d_{i} is the i-th document;

M is the number of documents in the document set.

Compared with existing methods, the present invention has the following advantages:

By modeling each modality, the present invention captures the differing contributions of the modalities to the latent semantic space; the cross structure describes the relevance between modalities; and the global parameter update yields more accurate parameter estimates. The fusion model therefore starts from the nature of multi-modal data and gives more reasonable and accurate analysis results. In addition, the estimation of the model's working space provides a basis for selecting a suitable range of topic numbers for each modality, which reduces the workload of manual selection.

Description of drawings

Fig. 1 is a structural diagram of the fused probabilistic latent semantic analysis model of the present invention;

Fig. 2 is a structural diagram of an embodiment of the fused probabilistic latent semantic analysis model of the present invention for two modalities;

Fig. 2a shows the construction of the fusion model from two standard probabilistic latent semantic analysis models; Fig. 2b shows the determination of the working space of the fusion model from the topic numbers of the standard probabilistic latent semantic analysis models and their maximum value; Fig. 2c shows the decomposition of the fusion model to obtain the final model parameters; Fig. 2d shows the application to new multi-modal data.

Embodiment

The preferred embodiment is described in detail below with reference to the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its application.

The technical problem to be solved by the present invention is to provide a probabilistic latent semantic analysis method for multi-modal data that represents both the differing contributions of the modalities to the semantic space and the relevance between their contents, so that probabilistic latent semantic analysis is extended in a principled way from single-modality data to multi-modal data and describes such data more accurately.

The present invention comprises Steps 1 to 5 set out in the summary above; in Step 5, the final parameters are used to analyze the modality to be tested.

The present invention is explained in further detail below with reference to the drawings and a specific embodiment.

Fig. 1 shows the structure of the present invention, where: K is the total number of modalities; d is the document set in the observations; M is the number of documents in the document set; w^{k} is the vocabulary of the k-th modality in the observations; N_{k} is the number of words in the vocabulary of the k-th modality; z^{k} is the topic space of the k-th modality; and L_{F-k} is the number of topics in that space, obtained from the working-space estimation.

As shown in Fig. 2, this specification describes each module using two-modality data (K = 2) as an example; for multi-modal data with K > 2, the modeling and parameter computation follow the same principle.

The detailed process of the present invention is as follows.

Modeling of the fusion model:

As shown in Fig. 2a, a standard probabilistic latent semantic analysis model is built for modality A and for modality V, expressed as d → z^{A} → w^{A} and d → z^{V} → w^{V} respectively. On this basis, cross generation relations between topics and words are established between modality A and modality V, namely z^{A} → w^{V} and z^{V} → w^{A}, which completes the modeling of the fusion model.

Working space estimation:

The working space of the fusion model, i.e., the value ranges of the topic numbers (L_{F-A}, L_{F-V}) of the modalities in the multi-modal data, is determined from the topic numbers (L_{A}, L_{V}) selected in the standard probabilistic latent semantic analysis models of modality A and modality V and from their maximum value. The two topic numbers must simultaneously satisfy two range constraints (the constraint formulas are missing from this text). Topic numbers are then selected within this range for the subsequent parameter estimation, as shown in Fig. 2b.

Model parameter estimation:

As shown in Fig. 2c, the fusion model is first decomposed into two asymmetric probabilistic latent semantic analysis models, expressed as d → z^{A} → w^{A} + w^{V} and d → z^{V} → w^{A} + w^{V} respectively. Based on the known observed co-occurrence matrices and the selected topic numbers (L_{F-A} and L_{F-V}), the initial parameter values of the fusion model are estimated; these comprise the topic conditional probabilities (P(z^{A} | d), P(z^{V} | d)) and the word conditional probabilities (P(w^{A} | z^{A}), P(w^{V} | z^{A}), P(w^{V} | z^{V}), P(w^{A} | z^{V})). The initial values are then updated globally with the expectation-maximization algorithm to obtain the final model parameters. This global update consists of two parts, expectation computation and parameter re-estimation, as follows.

The expectation-maximization algorithm maximizes the likelihood function of formula (1), and the model parameters are obtained by iteration:

L = \prod_{i=1}^{M} \left( \prod_{p=1}^{N_{A}} p(w_{p}^{A}, d_{i})^{n(w_{p}^{A}, d_{i})} \prod_{q=1}^{N_{V}} p(w_{q}^{V}, d_{i})^{n(w_{q}^{V}, d_{i})} \right) \qquad (1)

where:

L is the value of the likelihood function;

p(w_{p}^{A}, d_{i}) = p(d_{i}) \left( \sum_{m=1}^{L_{F-A}} p(w_{p}^{A} \mid z_{m}^{A}) p(z_{m}^{A} \mid d_{i}) + \sum_{n=1}^{L_{F-V}} p(w_{p}^{A} \mid z_{n}^{V}) p(z_{n}^{V} \mid d_{i}) \right)

is the co-occurrence probability for modality A, in which

p(d_{i}) = \frac{\sum_{p=1}^{N_{A}} n(w_{p}^{A}, d_{i}) + \sum_{q=1}^{N_{V}} n(w_{q}^{V}, d_{i})}{\sum_{i=1}^{M} \left( \sum_{p=1}^{N_{A}} n(w_{p}^{A}, d_{i}) + \sum_{q=1}^{N_{V}} n(w_{q}^{V}, d_{i}) \right)}

is the probability that document d_{i} occurs; p(d_{i}) is a fixed value and is not updated during the iteration;

p(w_{q}^{V}, d_{i}) = p(d_{i}) \left( \sum_{m=1}^{L_{F-A}} p(w_{q}^{V} \mid z_{m}^{A}) p(z_{m}^{A} \mid d_{i}) + \sum_{n=1}^{L_{F-V}} p(w_{q}^{V} \mid z_{n}^{V}) p(z_{n}^{V} \mid d_{i}) \right)

is the co-occurrence probability for modality V;

n(w_{p}^{A}, d_{i}) and n(w_{q}^{V}, d_{i}) are the known observations, i.e., the entries of the observed co-occurrence matrices of modality A and modality V;

N_{A} is the number of words of modality A;

N_{V} is the number of words of modality V;

w_{p}^{A} is the p-th word of modality A;

w_{q}^{V} is the q-th word of modality V;

d_{i} is the i-th document;

M is the number of documents in the document set.
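As a rough illustration of formula (1) and the two co-occurrence probabilities, the following Python sketch evaluates the logarithm of L for given parameter tables. The function names and the table layout (`n_A[i][p]` for n(w_p^A, d_i), `par["wA_zV"][n][p]` for p(w_p^A | z_n^V), and so on) are assumptions made for this example; the logarithm is used only to avoid numerical underflow and has the same maximizer as L.

```python
import math

def doc_probs(n_A, n_V):
    """p(d_i): share of all counts, over both modalities, that falls in document i."""
    totals = [sum(a) + sum(v) for a, v in zip(n_A, n_V)]
    grand = sum(totals)
    return [t / grand for t in totals]

def joint_prob(i, word, w_zA, w_zV, zA_d, zV_d, p_d):
    """p(w, d_i) for one word, mixing the topics of both modalities.

    w_zA[m][word] = p(w | z_m^A) and w_zV[n][word] = p(w | z_n^V).
    """
    mix = sum(w_zA[m][word] * zA_d[i][m] for m in range(len(w_zA)))
    mix += sum(w_zV[n][word] * zV_d[i][n] for n in range(len(w_zV)))
    return p_d[i] * mix

def log_likelihood(n_A, n_V, par):
    """log L of formula (1); zero counts contribute a factor of 1 and are skipped."""
    p_d = doc_probs(n_A, n_V)
    total = 0.0
    for i in range(len(n_A)):
        for p, count in enumerate(n_A[i]):
            if count:
                total += count * math.log(joint_prob(
                    i, p, par["wA_zA"], par["wA_zV"], par["zA_d"], par["zV_d"], p_d))
        for q, count in enumerate(n_V[i]):
            if count:
                total += count * math.log(joint_prob(
                    i, q, par["wV_zA"], par["wV_zV"], par["zA_d"], par["zV_d"], p_d))
    return total
```

Tracking this value across iterations gives a simple convergence check for the iteration described in this section.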

1. Expectation computation: from the current parameter values (the initial values in the first iteration), compute the four expectations E(\varphi_{pm}^{A}), E(\varphi_{pn}^{A}), E(\varphi_{qm}^{V}) and E(\varphi_{qn}^{V}). They give the expectation of a topic of the same modality or of the other modality when an observation of one modality is known:

E(\varphi_{pm}^{A}) = \frac{p(w_{p}^{A} \mid z_{m}^{A}) p(z_{m}^{A} \mid d_{i}) p(d_{i})}{p(w_{p}^{A}, d_{i})} \qquad (2)

E(\varphi_{pn}^{A}) = \frac{p(w_{p}^{A} \mid z_{n}^{V}) p(z_{n}^{V} \mid d_{i}) p(d_{i})}{p(w_{p}^{A}, d_{i})} \qquad (3)

E(\varphi_{qm}^{V}) = \frac{p(w_{q}^{V} \mid z_{m}^{A}) p(z_{m}^{A} \mid d_{i}) p(d_{i})}{p(w_{q}^{V}, d_{i})} \qquad (4)

E(\varphi_{qn}^{V}) = \frac{p(w_{q}^{V} \mid z_{n}^{V}) p(z_{n}^{V} \mid d_{i}) p(d_{i})}{p(w_{q}^{V}, d_{i})} \qquad (5)

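In formula (2):

E(\varphi_{pm}^{A}) is the expectation of topic z_{m}^{A} given the known observation of word w_{p}^{A} in document d_{i};

\varphi_{pm}^{A} is the probability of topic z_{m}^{A} for that observation;

p(w_{p}^{A} \mid z_{m}^{A}) is the conditional probability of the p-th word of modality A given the m-th topic of modality A;

z_{m}^{A} is the m-th topic of modality A;

p(z_{m}^{A} \mid d_{i}) is the conditional probability of the m-th topic of modality A given the i-th document;

p(d_{i}) is the probability that the i-th document occurs;

p(w_{p}^{A}, d_{i}) is the co-occurrence probability for modality A.

Formulas (3) to (5) are interpreted analogously, that is:

E(\varphi_{pn}^{A}) is the expectation of topic z_{n}^{V} given the known observation of word w_{p}^{A} in document d_{i};

E(\varphi_{qm}^{V}) is the expectation of topic z_{m}^{A} given the known observation of word w_{q}^{V} in document d_{i};

E(\varphi_{qn}^{V}) is the expectation of topic z_{n}^{V} given the known observation of word w_{q}^{V} in document d_{i}.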
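The expectation computation can be sketched in Python for a single observed word as follows. The function name `e_step_word` and the argument layout are assumptions for illustration. Because p(d_i) appears in both the numerator and the denominator p(w, d_i) of formulas (2)-(5), it cancels and is left out of the code.

```python
def e_step_word(i, word, w_zA, w_zV, zA_d, zV_d):
    """Expectations over all topics for one observed word in document i.

    Pass the tables for a word of modality A to get formulas (2) and (3),
    or the tables for a word of modality V to get formulas (4) and (5).
    w_zA[m][word] = p(w | z_m^A), w_zV[n][word] = p(w | z_n^V).
    """
    num_A = [w_zA[m][word] * zA_d[i][m] for m in range(len(w_zA))]
    num_V = [w_zV[n][word] * zV_d[i][n] for n in range(len(w_zV))]
    norm = sum(num_A) + sum(num_V)   # equals p(w, d_i) / p(d_i)
    if norm == 0.0:
        # Word impossible under every topic: return zeros instead of dividing by zero.
        return [0.0] * len(num_A), [0.0] * len(num_V)
    return [x / norm for x in num_A], [x / norm for x in num_V]
```

For any word with non-zero probability, the returned values sum to one across the topics of both modalities, which is what lets every observation redistribute its count over the whole fused topic space.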

2. Parameter re-estimation: using formulas (2)-(5), compute the updated probability values. The conditional probabilities of topic z_{m}^{A} and topic z_{n}^{V} are computed as:

p(z_{m}^{A} \mid d_{i}) = \frac{\xi_{mi}^{A} + \xi_{mi}^{V}}{\sum_{m=1}^{L_{F-A}} (\xi_{mi}^{A} + \xi_{mi}^{V}) + \sum_{n=1}^{L_{F-V}} (\xi_{ni}^{A} + \xi_{ni}^{V})} \qquad (6)

p(z_{n}^{V} \mid d_{i}) = \frac{\xi_{ni}^{A} + \xi_{ni}^{V}}{\sum_{m=1}^{L_{F-A}} (\xi_{mi}^{A} + \xi_{mi}^{V}) + \sum_{n=1}^{L_{F-V}} (\xi_{ni}^{A} + \xi_{ni}^{V})} \qquad (7)

where \xi_{mi}^{A}, \xi_{mi}^{V}, \xi_{ni}^{A} and \xi_{ni}^{V} are intermediate variables:

\xi_{mi}^{A} = \sum_{p=1}^{N_{A}} n(w_{p}^{A}, d_{i}) E(\varphi_{pm}^{A});

\xi_{mi}^{V} = \sum_{q=1}^{N_{V}} n(w_{q}^{V}, d_{i}) E(\varphi_{qm}^{V});

\xi_{ni}^{A} = \sum_{p=1}^{N_{A}} n(w_{p}^{A}, d_{i}) E(\varphi_{pn}^{A});

\xi_{ni}^{V} = \sum_{q=1}^{N_{V}} n(w_{q}^{V}, d_{i}) E(\varphi_{qn}^{V});

L_{F-A} is the number of topics in the topic space of modality A;

L_{F-V} is the number of topics in the topic space of modality V.

Formula (7) is interpreted in the same way as formula (6).
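A minimal Python sketch of formulas (6) and (7) for one document follows. The function name `update_topics` and the argument layout are assumptions: `E_pm[p][m]`, `E_pn[p][n]`, `E_qm[q][m]` and `E_qn[q][n]` hold the four expectations of formulas (2)-(5) for that document, and `counts_A`, `counts_V` are its rows of the two co-occurrence matrices.

```python
def update_topics(counts_A, counts_V, E_pm, E_pn, E_qm, E_qn):
    """Re-estimate p(z_m^A | d_i) and p(z_n^V | d_i) for one document."""
    L_FA, L_FV = len(E_pm[0]), len(E_pn[0])
    # Intermediate variables xi: expected counts assigned to each topic
    xi_mA = [sum(c * E_pm[p][m] for p, c in enumerate(counts_A)) for m in range(L_FA)]
    xi_mV = [sum(c * E_qm[q][m] for q, c in enumerate(counts_V)) for m in range(L_FA)]
    xi_nA = [sum(c * E_pn[p][n] for p, c in enumerate(counts_A)) for n in range(L_FV)]
    xi_nV = [sum(c * E_qn[q][n] for q, c in enumerate(counts_V)) for n in range(L_FV)]
    # Shared denominator of formulas (6) and (7)
    denom = sum(xi_mA) + sum(xi_mV) + sum(xi_nA) + sum(xi_nV)
    p_zA = [(a + v) / denom for a, v in zip(xi_mA, xi_mV)]   # formula (6)
    p_zV = [(a + v) / denom for a, v in zip(xi_nA, xi_nV)]   # formula (7)
    return p_zA, p_zV
```

Because both formulas share one denominator, the topic probabilities of the two modalities for a document sum to one jointly rather than separately, so a modality that explains more of the document's counts receives more of the probability mass.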

The conditional probabilities of a word of the same modality or of the other modality, given a topic of one modality, are computed as:

p(w_{p}^{A} \mid z_{m}^{A}) = \frac{\sum_{i=1}^{M} n(w_{p}^{A}, d_{i}) E(\varphi_{pm}^{A})}{\sum_{i=1}^{M} (\xi_{mi}^{A} + \xi_{mi}^{V})} \qquad (8)

p(w_{q}^{V} \mid z_{m}^{A}) = \frac{\sum_{i=1}^{M} n(w_{q}^{V}, d_{i}) E(\varphi_{qm}^{V})}{\sum_{i=1}^{M} (\xi_{mi}^{A} + \xi_{mi}^{V})} \qquad (9)

p(w_{p}^{A} \mid z_{n}^{V}) = \frac{\sum_{i=1}^{M} n(w_{p}^{A}, d_{i}) E(\varphi_{pn}^{A})}{\sum_{i=1}^{M} (\xi_{ni}^{A} + \xi_{ni}^{V})} \qquad (10)

p(w_{q}^{V} \mid z_{n}^{V}) = \frac{\sum_{i=1}^{M} n(w_{q}^{V}, d_{i}) E(\varphi_{qn}^{V})}{\sum_{i=1}^{M} (\xi_{ni}^{A} + \xi_{ni}^{V})} \qquad (11)

Formulas (9) to (11) are interpreted in the same way as formula (8), that is:

p(w_{q}^{V} \mid z_{m}^{A}) is the conditional probability of word w_{q}^{V} given topic z_{m}^{A};

p(w_{p}^{A} \mid z_{n}^{V}) is the conditional probability of word w_{p}^{A} given topic z_{n}^{V};

p(w_{q}^{V} \mid z_{n}^{V}) is the conditional probability of word w_{q}^{V} given topic z_{n}^{V}.
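The word updates can be sketched in Python with one function that serves both topic sets. The name `update_words` and the layout `E_A[i][p][t]`, `E_V[i][q][t]` (expectations for topic t, document i) are assumptions for this example. The code relies on the fact that the denominator, the sum over documents of the two ξ terms for the topic, equals the sum of all numerators over both vocabularies.

```python
def update_words(n_A, n_V, E_A, E_V, num_topics):
    """Re-estimate the word conditionals for the topics of one modality.

    With expectations for the topics z^A this yields formulas (8) and (9);
    with expectations for the topics z^V it yields formulas (10) and (11).
    n_A[i][p] and n_V[i][q] are the co-occurrence counts.
    """
    M = len(n_A)
    w_A, w_V = [], []
    for t in range(num_topics):
        num_A = [sum(n_A[i][p] * E_A[i][p][t] for i in range(M))
                 for p in range(len(n_A[0]))]
        num_V = [sum(n_V[i][q] * E_V[i][q][t] for i in range(M))
                 for q in range(len(n_V[0]))]
        # Sum over documents of the xi terms; fall back to 1.0 for an empty topic
        denom = (sum(num_A) + sum(num_V)) or 1.0
        w_A.append([x / denom for x in num_A])
        w_V.append([x / denom for x in num_V])
    return w_A, w_V
```

The second returned table for the topics z^A, and the first for the topics z^V, are the cross terms of the model: they let a topic of one modality generate words of the other.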

The expectation computation step and the parameter re-estimation step are repeated alternately until formula (1) converges, which yields the final model parameters.

Computation of model test parameters:

As shown in Fig. 2d, for new multi-modal data d^{New}, the known conditions are the co-occurrence matrices of each modality of the data and the word conditional probabilities (P(w^{A} | z^{A}), P(w^{V} | z^{A}), P(w^{V} | z^{V}), P(w^{A} | z^{V})) obtained in module 3. The expectation-maximization algorithm is again used, but the word conditional probabilities are held fixed in this process: only formulas (2)-(5) and (6)-(7) are iterated until formula (1) converges, which yields the topic conditional probabilities (P(z^{A} | d^{New}), P(z^{V} | d^{New})) of the new data.
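The test-time procedure can be sketched in Python for a single new document as follows. The function name `fold_in`, the uniform starting point, the iteration cap and the tolerance are assumptions for illustration, and the sketch further assumes that every observed word has non-zero probability under at least one topic. Since p(d^New) is constant for a single document, the convergence check uses the log-likelihood without that factor.

```python
import math

def fold_in(counts_A, counts_V, wA_zA, wV_zA, wA_zV, wV_zV, max_iter=200, tol=1e-9):
    """Estimate P(z^A | d_new) and P(z^V | d_new) with all word conditionals fixed."""
    L_FA, L_FV = len(wA_zA), len(wA_zV)
    zA = [1.0 / (L_FA + L_FV)] * L_FA   # uniform start (assumption)
    zV = [1.0 / (L_FA + L_FV)] * L_FV
    # One entry per observed word: (count, p(w | z^A) per topic, p(w | z^V) per topic)
    observed = [(c, [row[p] for row in wA_zA], [row[p] for row in wA_zV])
                for p, c in enumerate(counts_A) if c]
    observed += [(c, [row[q] for row in wV_zA], [row[q] for row in wV_zV])
                 for q, c in enumerate(counts_V) if c]
    total = sum(c for c, _, _ in observed)
    previous = -math.inf
    for _ in range(max_iter):
        new_A, new_V, loglik = [0.0] * L_FA, [0.0] * L_FV, 0.0
        for c, from_A, from_V in observed:
            # Expectations, formulas (2)-(5), for the current topic estimates
            num_A = [w * z for w, z in zip(from_A, zA)]
            num_V = [w * z for w, z in zip(from_V, zV)]
            norm = sum(num_A) + sum(num_V)
            loglik += c * math.log(norm)
            for m, x in enumerate(num_A):
                new_A[m] += c * x / norm
            for n, x in enumerate(num_V):
                new_V[n] += c * x / norm
        # Formulas (6)-(7); the shared denominator equals the total word count
        zA = [x / total for x in new_A]
        zV = [x / total for x in new_V]
        if loglik - previous < tol:
            break
        previous = loglik
    return zA, zV
```

Only the topic distribution of the new document is updated, so the model learned during training is left untouched and each new document can be processed independently.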

The fused probabilistic latent semantic model for multi-modal data of the present invention models the topic space of each modality (z^{A}, z^{V}) separately, which reflects the differing contributions of the modalities to the semantic space. At the same time, the word conditional probabilities introduced by its cross structure (P(w^{V} | z^{A}), P(w^{A} | z^{V})) describe the relevance between the modalities. The fusion model is thus built on the characteristics of multi-modal data itself and provides more reasonable and accurate analysis results. In addition, the working-space estimation bounds the range of the ideal topic number for each modality, which avoids the inaccuracy of blind selection and reduces the computation needed to find the optimal value by repeated enumeration, thereby improving efficiency.

The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.

Claims

1. A fused probabilistic latent semantic analysis method for multi-modal data, characterized in that the method comprises the following steps:

Step 1: establish a standard probabilistic latent semantic analysis model for each modality and build the fusion model on this basis; that is, build a standard probabilistic latent semantic analysis model for modality A and for modality V, expressed as d → z^{A} → w^{A} and d → z^{V} → w^{V} respectively, and then establish cross generation relations between topics and words between modality A and modality V, namely z^{A} → w^{V} and z^{V} → w^{A}, which completes the modeling of the fusion model;

The computing formula of the expectation-maximization algorithm is: