
CN112802445B - Cross-audiovisual information conversion method based on semantic reservation - Google Patents

Cross-audiovisual information conversion method based on semantic reservation

Info

Publication number
CN112802445B
CN112802445B (application CN202110140393.6A)
Authority
CN
China
Prior art keywords
image
cross
sound
network
modal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110140393.6A
Other languages
Chinese (zh)
Other versions
CN112802445A (en)
Inventor
袁媛
宁海龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202110140393.6A priority Critical patent/CN112802445B/en
Publication of CN112802445A publication Critical patent/CN112802445A/en
Application granted granted Critical
Publication of CN112802445B publication Critical patent/CN112802445B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L13/027 - Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a cross-audio-visual information conversion method based on semantic preservation. The method treats information conversion between the visual and auditory modalities as a similarity-learning problem over low-dimensional representations: semantic features are extracted from images, cross-modal conversion of the features is performed in the low-dimensional space, and the low-dimensional cross-modal features are finally mapped to speech waveforms based on human language. The method addresses the limitation of existing visual-to-auditory cross-modal conversion approaches, which cannot accurately generate human-language speech waveforms in an unconstrained environment; by generating such waveforms for unconstrained environments, the output better matches real-world conditions.

Description

Cross-audiovisual information conversion method based on semantic reservation
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to a cross-audiovisual information conversion method.
Background
Visual-to-auditory cross-modal information conversion helps visually impaired people better perceive the world around them, and is therefore of strong practical value for this group. However, because of the heterogeneous semantic gap between the audio and visual modalities and the complex data structures within each modality, effective cross-audio-visual information conversion is very difficult to achieve. Relatively little research has addressed visual-to-auditory cross-modal conversion, and the existing work generally follows the same pipeline: first extract semantic features from the visual data, then predict a spectrogram of the auditory data, and finally generate the sound waveform. These studies typically generate instrument sounds, impact (knocking) sounds, or ambient background sounds for a specific environment:
1) Cross-modal conversion based on instrument-sound generation: Chen et al. propose a method based on a conditional generative adversarial network in "L. Chen, S. Srivastava, Z. Duan, and C. Xu, Deep Cross-Modal Audio-Visual Generation, in Proceedings of the Thematic Workshops of ACM Multimedia, 2017, pp. 349-357." The method generates a sound spectrogram from an input instrument image, encodes the spectrogram back into an instrument image, discriminates the generated instrument image to optimize the generated spectrogram, and finally obtains the corresponding instrument sound waveform from the optimized spectrogram.
2) Cross-modal conversion based on impact-sound generation: Owens et al. propose a recurrent-neural-network method for generating the impact sounds of objects struck in a video in "A. Owens, P. Isola, J. McDermott, A. Torralba, E. Adelson, and W. Freeman, Visually Indicated Sounds, in the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2405-2413." The method uses a recurrent neural network to predict sound features from the video and then uses an example-based synthesis method to generate the corresponding sound waveform from those features.
3) Cross-modal conversion based on ambient background-sound generation: Zhou et al. propose a video background-sound generation method based on an encoder-decoder structure in "Y. Zhou, Z. Wang, C. Fang, T. Bui, and T. Berg, Visual to Sound: Generating Natural Sound for Videos in the Wild, in the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3550-3558." The method first encodes video features with frame-to-frame, sequence-to-sequence, and optical-flow-based schemes, and then decodes them with a SampleRNN to generate the sound waveform.
Each of the above methods has its own limitations; for example, cross-modal conversion based on impact-sound generation can only produce regular knocking sounds. It remains difficult to accurately generate human-language speech waveforms in unconstrained environments, which limits the practical usefulness of these methods.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a cross-audio-visual information conversion method based on semantic preservation. The method treats information conversion between the visual and auditory modalities as a similarity-learning problem over low-dimensional representations: semantic features are extracted from images, cross-modal conversion of the features is performed in the low-dimensional space, and the low-dimensional cross-modal features are finally mapped to speech waveforms based on human language. The invention addresses the limitation of existing visual-to-auditory cross-modal conversion methods, which cannot accurately generate human-language speech waveforms in an unconstrained environment; by generating such waveforms for unconstrained environments, the output better matches real-world conditions.
The technical solution adopted by the invention to solve the above problem comprises the following steps:
Step 1: the image-audio description data set contains N image-audio pairs. Take this data set as the training set and normalize each image in the training set according to formula (1) to obtain the normalized image:
[formula (1): reproduced only as an image in the original publication]
where X_tr denotes an image in the training set, μ and θ are the mean and standard deviation of all images in the training set, respectively, and K is the number of pixels of a single training image;
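As an illustration of step 1, the following Python sketch performs a per-image standardization using the quantities defined above (μ, θ, K). Because formula (1) is reproduced only as an image in the original publication, the exact expression used here, (X_tr - μ) / max(θ, 1/√K), is an assumption consistent with the defined quantities, not necessarily the patented formula.

    import numpy as np

    def normalize_image(x_tr, mu, theta):
        # x_tr  : a training image as a NumPy array
        # mu    : mean of all training images
        # theta : standard deviation of all training images
        # Assumed form of formula (1): (X_tr - mu) / max(theta, 1/sqrt(K)),
        # where K is the number of pixels of the single image.
        k = x_tr.size
        adjusted_std = max(float(theta), 1.0 / np.sqrt(k))
        return (x_tr - mu) / adjusted_std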
Step 2: for the normalized images, learn the image semantic features Γ_i^v through the image coding network, where i is the image-audio pair index, i = 1, 2, ..., N;
The objective function is:
[formula (2): reproduced only as an image in the original publication]
where l_i and l̂_i are, respectively, the true semantic label of the input image and the semantic label predicted by the image coding network;
When objective function (2) reaches its minimum, the image coding network has completed training; the output of the image coding network is the image semantic feature Γ_i^v.
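A minimal PyTorch sketch of the image coding network follows, assuming the preferred embodiment of claim 2 (VGG16 with all fully connected layers replaced by one randomly initialized fully connected layer). Since formula (2) is reproduced only as an image, the cross-entropy classification loss between l_i and the predicted label, and the use of the network output as Γ_i^v, are assumptions; the batch size and number of classes are illustrative.

    import torch
    import torch.nn as nn
    from torchvision.models import vgg16

    class ImageEncoder(nn.Module):
        """Image coding network: VGG16 whose fully connected layers are all
        replaced by a single randomly initialized fully connected layer."""

        def __init__(self, num_classes):
            super().__init__()
            backbone = vgg16()                      # convolutional backbone
            self.features = backbone.features
            self.avgpool = backbone.avgpool
            self.fc = nn.Linear(512 * 7 * 7, num_classes)  # single FC layer

        def forward(self, x):
            h = self.avgpool(self.features(x)).flatten(1)
            return self.fc(h)                       # taken here as Γ_i^v

    encoder = ImageEncoder(num_classes=10)
    criterion = nn.CrossEntropyLoss()               # assumed form of objective (2)
    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

    images = torch.randn(8, 3, 224, 224)            # toy batch of images
    labels = torch.randint(0, 10, (8,))             # true semantic labels l_i
    loss = criterion(encoder(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()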
Step 3: for the sounds of the image-audio description data set, learn the sound semantic features Γ_i^s through the sound coding network.
The sound coding network consists of 6 sequentially connected fully connected layers; training is driven by the error between the input sound and the reconstructed sound, and the output of the 3rd fully connected layer is taken as the sound semantic feature.
The objective function is:
[formula (3): reproduced only as an image in the original publication]
where s_i denotes a sound from the image-audio description data set and ŝ_i denotes the sound reconstructed by the sound coding network;
When objective function (3) reaches its minimum, the sound coding network has completed training; the output of the sound coding network is the sound semantic feature Γ_i^s.
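The sound coding network can be sketched as a 6-layer fully connected autoencoder whose 3rd-layer output serves as Γ_i^s. Formula (3) is reproduced only as an image, so the mean-squared reconstruction error used below is an assumption; all layer widths and the waveform length are illustrative.

    import torch
    import torch.nn as nn

    class SoundEncoder(nn.Module):
        """Sound coding network: 6 sequentially connected fully connected
        layers; the 3rd layer's output is the sound semantic feature."""

        def __init__(self, sound_dim, feat_dim=128):
            super().__init__()
            self.fc1 = nn.Linear(sound_dim, 1024)
            self.fc2 = nn.Linear(1024, 512)
            self.fc3 = nn.Linear(512, feat_dim)      # Γ_i^s is taken here
            self.fc4 = nn.Linear(feat_dim, 512)
            self.fc5 = nn.Linear(512, 1024)
            self.fc6 = nn.Linear(1024, sound_dim)    # reconstructed sound
            self.act = nn.ReLU()

        def forward(self, s):
            h = self.act(self.fc1(s))
            h = self.act(self.fc2(h))
            feat = self.fc3(h)                       # sound semantic feature
            h = self.act(self.fc4(feat))
            h = self.act(self.fc5(h))
            return feat, self.fc6(h)

    net = SoundEncoder(sound_dim=16000)
    s = torch.randn(4, 16000)                        # input sounds s_i
    feat, recon = net(s)
    loss = nn.functional.mse_loss(recon, s)          # assumed form of objective (3)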
Step 4: from the image semantic feature Γ_i^v and the sound semantic feature Γ_i^s, learn the cross-modal feature expression Γ_i through the cross-modal mapping network.
The cross-modal mapping network consists of 2 stacked fully connected layers.
The objective function is:
min Σ_i (1 - Γ_i^s · Γ_i / (||Γ_i^s|| · ||Γ_i||))    (4)
where ||·|| denotes the modulus (norm) of a vector;
When objective function (4) reaches its minimum, the cross-modal mapping network has completed training; the output of the cross-modal mapping network is the cross-modal feature expression Γ_i.
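A sketch of the cross-modal mapping network and of objective (4) follows. Step 6 indicates that the network takes the image semantic feature Γ_i^v as input; the two fully connected layers and the cosine-similarity loss come from the text, while the feature dimensions are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossModalMapping(nn.Module):
        """Cross-modal mapping network: 2 stacked fully connected layers
        mapping the image semantic feature to the cross-modal expression."""

        def __init__(self, img_feat_dim, feat_dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(img_feat_dim, 256),
                nn.ReLU(),
                nn.Linear(256, feat_dim),
            )

        def forward(self, gamma_v):
            return self.net(gamma_v)                 # cross-modal feature Γ_i

    mapper = CrossModalMapping(img_feat_dim=10)
    gamma_v = torch.randn(4, 10)                     # from the image coding network
    gamma_s = torch.randn(4, 128)                    # from the sound coding network
    gamma = mapper(gamma_v)
    # Objective (4): sum over pairs of (1 - cosine similarity of Γ_i^s and Γ_i).
    loss = (1.0 - F.cosine_similarity(gamma_s, gamma, dim=1)).sum()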
Step 5: from the cross-modal feature expression Γ_i, compute the sound waveform through the cross-modal feature network.
The cross-modal feature network consists of 3 stacked residual blocks followed by 2 sequentially connected fully connected layers.
The objective function is:
[formula (5): reproduced only as an image in the original publication]
where x denotes any real number;
When objective function (5) reaches its minimum, the cross-modal feature network has completed training; the output of the cross-modal feature network is the generated sound waveform.
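The cross-modal feature network can be sketched as 3 stacked residual blocks followed by 2 fully connected layers that emit the waveform. Formula (5) is reproduced only as an image; given the clause "x represents any real number", a smooth-L1 (Huber-style) regression loss between generated and reference waveforms is one plausible reading and is used below only as an assumption; layer widths and waveform length are illustrative.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """A simple fully connected residual block (sizes are illustrative)."""

        def __init__(self, dim):
            super().__init__()
            self.fc1 = nn.Linear(dim, dim)
            self.fc2 = nn.Linear(dim, dim)
            self.act = nn.ReLU()

        def forward(self, x):
            return self.act(x + self.fc2(self.act(self.fc1(x))))

    class CrossModalFeatureNet(nn.Module):
        """Cross-modal feature network: 3 stacked residual blocks followed by
        2 fully connected layers producing the sound waveform."""

        def __init__(self, feat_dim=128, wave_len=16000):
            super().__init__()
            self.blocks = nn.Sequential(*[ResidualBlock(feat_dim) for _ in range(3)])
            self.fc1 = nn.Linear(feat_dim, 1024)
            self.fc2 = nn.Linear(1024, wave_len)

        def forward(self, gamma):
            return self.fc2(torch.relu(self.fc1(self.blocks(gamma))))

    net = CrossModalFeatureNet()
    gamma = torch.randn(4, 128)                      # cross-modal features Γ_i
    target = torch.randn(4, 16000)                   # reference waveforms
    loss = nn.functional.smooth_l1_loss(net(gamma), target)  # assumed form of (5)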
Step 6: input the image to be tested into the image coding network trained in step 2 to obtain its image semantic features; input these image semantic features into the cross-modal mapping network trained in step 4 to obtain the cross-modal feature expression of the test image; finally, input this cross-modal feature expression into the cross-modal feature network trained in step 5 to obtain the sound waveform converted from the test image.
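Chaining the three trained networks reproduces the step-6 test procedure; the sketch below assumes the illustrative module classes defined above.

    import torch

    @torch.no_grad()
    def image_to_sound(test_image, image_encoder, cross_modal_mapper, feature_net):
        # Step 6: image -> image semantic feature -> cross-modal feature -> waveform.
        gamma_v = image_encoder(test_image)          # image coding network (step 2)
        gamma = cross_modal_mapper(gamma_v)          # cross-modal mapping network (step 4)
        waveform = feature_net(gamma)                # cross-modal feature network (step 5)
        return waveform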
Preferably, the image coding network employs a modified VGG16 network, i.e., all fully connected layers of the VGG16 network are replaced by one randomly initialized fully connected layer.
The beneficial effects of the invention are as follows:
1. Through semantic preservation at the low-dimensional feature level, the invention achieves direct conversion from visual data to auditory data with high robustness and accuracy.
2. By effectively decoding the low-dimensional features, the method can generate speech waveforms based on human language, making it better suited to helping visually impaired people accurately perceive their surroundings.
3. The invention effectively addresses the difficulty of generating human-language speech waveforms from visual data; training is fast and the generated speech is highly intelligible, with a highest STOI value of 0.9682.
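For reference, intelligibility scores such as the STOI value of 0.9682 cited above can be computed with the open-source pystoi package; the snippet below is only an illustration (the package choice, sample rate, and toy signals are not part of the patent).

    import numpy as np
    from pystoi import stoi                          # pip install pystoi

    fs = 16000                                       # assumed sample rate in Hz
    reference = np.random.randn(fs * 2)              # ground-truth speech (toy signal)
    generated = reference + 0.05 * np.random.randn(fs * 2)  # converted waveform (toy)

    score = stoi(reference, generated, fs, extended=False)
    print(f"STOI = {score:.4f}")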
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
The invention treats information conversion between the visual and auditory modalities as a similarity-learning problem over low-dimensional representations: semantic features are extracted from images, cross-modal conversion of the features is performed in the low-dimensional space, and the low-dimensional cross-modal features are finally mapped to speech waveforms based on human language. The invention addresses the limitation of existing visual-to-auditory cross-modal conversion methods, which cannot accurately generate human-language speech waveforms in an unconstrained environment; by generating such waveforms for unconstrained environments, the output better matches real-world conditions.
As shown in FIG. 1, the cross-audiovisual information conversion method based on semantic reservation includes the following steps:
Step 1: the image-audio description data set contains N image-audio pairs. Take this data set as the training set and normalize each image in the training set according to formula (1) to obtain the normalized image:
[formula (1): reproduced only as an image in the original publication]
where X_tr denotes an image in the training set, μ and θ are the mean and standard deviation of all images in the training set, respectively, and K is the number of pixels of a single training image;
Step 2: for the normalized images, learn the image semantic features Γ_i^v through the image coding network, where i is the image-audio pair index, i = 1, 2, ..., N;
The objective function is:
[formula (2): reproduced only as an image in the original publication]
where l_i and l̂_i are, respectively, the true semantic label of the input image and the semantic label predicted by the image coding network;
When objective function (2) reaches its minimum, the image coding network has completed training; the output of the image coding network is the image semantic feature Γ_i^v.
Step 3: for the sounds of the image-audio description data set, learn the sound semantic features Γ_i^s through the sound coding network.
The sound coding network consists of 6 sequentially connected fully connected layers; training is driven by the error between the input sound and the reconstructed sound, and the output of the 3rd fully connected layer is taken as the sound semantic feature.
The objective function is:
[formula (3): reproduced only as an image in the original publication]
where s_i denotes a sound from the image-audio description data set and ŝ_i denotes the sound reconstructed by the sound coding network;
When objective function (3) reaches its minimum, the sound coding network has completed training; the output of the sound coding network is the sound semantic feature Γ_i^s.
Step 4: from the image semantic feature Γ_i^v and the sound semantic feature Γ_i^s, learn the cross-modal feature expression Γ_i through the cross-modal mapping network.
The cross-modal mapping network consists of 2 stacked fully connected layers.
The objective function is:
min Σ_i (1 - Γ_i^s · Γ_i / (||Γ_i^s|| · ||Γ_i||))    (4)
where ||·|| denotes the modulus (norm) of a vector;
When objective function (4) reaches its minimum, the cross-modal mapping network has completed training; the output of the cross-modal mapping network is the cross-modal feature expression Γ_i.
Step 5: from the cross-modal feature expression Γ_i, compute the sound waveform through the cross-modal feature network.
The cross-modal feature network consists of 3 stacked residual blocks followed by 2 sequentially connected fully connected layers.
The objective function is:
[formula (5): reproduced only as an image in the original publication]
where x denotes any real number;
When objective function (5) reaches its minimum, the cross-modal feature network has completed training; the output of the cross-modal feature network is the generated sound waveform.
Step 6: input the image to be tested into the image coding network trained in step 2 to obtain its image semantic features; input these image semantic features into the cross-modal mapping network trained in step 4 to obtain the cross-modal feature expression of the test image; finally, input this cross-modal feature expression into the cross-modal feature network trained in step 5 to obtain the sound waveform converted from the test image.
Specific examples:
1. simulation conditions
In this embodiment, the simulation is performed with Python and related toolkits on a machine with an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20 GHz, 500 GB of memory, and the Ubuntu operating system.
The data used in the simulation are image-audio description data sets constructed by adding audio descriptions to existing data sets.
2. Simulation content
Model training and testing were performed on the MNIST, CIFAR-10, and CIFAR-100 audio-description data sets.
To demonstrate the effectiveness of the algorithm, the DCMAVG, CMCGAN, and I2T2A models were chosen for comparison. The DCMAVG model is presented in "L. Chen, S. Srivastava, Z. Duan, and C. Xu, Deep Cross-Modal Audio-Visual Generation, in Proceedings of the Thematic Workshops of ACM Multimedia, 2017, pp. 349-357"; the CMCGAN model is presented in "W. Hao, Z. Zhang, and H. Guan, CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation, in Thirty-Second AAAI Conference on Artificial Intelligence, 2018, pp. 6886-6893"; the I2T2A model is obtained by generating a text description of the image with the method of "L. Liu, J. Tang, X. Wan, and Z. Guo, Generating Diverse and Descriptive Image Captions Using Visual Paraphrases, in 2019 IEEE International Conference on Computer Vision, 2019, pp. 4239-4248" and converting the text description to audio with text-to-speech software. The comparison results are shown in Table 1.
TABLE 1. Results of the invention (reproduced only as an image in the original publication)
As can be seen from Table 1, the performance indicators of the invention are superior to those of the other methods in most cases. Through semantic preservation at the low-dimensional feature level, the invention reduces the intra-modal semantic gap and the inter-modal heterogeneous gap, accurately achieves direct conversion from visual data to auditory data, and improves the robustness and accuracy of the algorithm. At the same time, the invention can generate speech waveforms based on human language, which is of strong practical value for visually impaired people.

Claims (2)

1. A cross-audiovisual information conversion method based on semantic reservation, characterized by comprising the following steps:
Step 1: the image-audio description data set contains N image-audio pairs. Take this data set as the training set and normalize each image in the training set according to formula (1) to obtain the normalized image:
[formula (1): reproduced only as an image in the original publication]
where X_tr denotes an image in the training set, μ and θ are the mean and standard deviation of all images in the training set, respectively, and K is the number of pixels of a single training image;
Step 2: for the normalized images, learn the image semantic features Γ_i^v through the image coding network, where i is the image-audio pair index, i = 1, 2, ..., N;
The objective function is:
[formula (2): reproduced only as an image in the original publication]
where l_i and l̂_i are, respectively, the true semantic label of the input image and the semantic label predicted by the image coding network;
order of the Chinese medicineWhen the standard function (2) is minimum, the image coding network finishes training; the output of the image coding network is the image semantic feature Γ i v
Step 3: for the sounds of the image-audio description data set, learn the sound semantic features Γ_i^s through the sound coding network.
The sound coding network consists of 6 sequentially connected fully connected layers; training is driven by the error between the input sound and the reconstructed sound, and the output of the 3rd fully connected layer is taken as the sound semantic feature.
The objective function is:
[formula (3): reproduced only as an image in the original publication]
where s_i denotes a sound from the image-audio description data set and ŝ_i denotes the sound reconstructed by the sound coding network;
When objective function (3) reaches its minimum, the sound coding network has completed training; the output of the sound coding network is the sound semantic feature Γ_i^s.
Step 4: from the image semantic feature Γ_i^v and the sound semantic feature Γ_i^s, learn the cross-modal feature expression Γ_i through the cross-modal mapping network.
The cross-modal mapping network consists of 2 stacked fully connected layers.
The objective function is:
min Σ_i (1 - Γ_i^s · Γ_i / (||Γ_i^s|| · ||Γ_i||))    (4)
where ||·|| denotes the modulus (norm) of a vector;
When objective function (4) reaches its minimum, the cross-modal mapping network has completed training; the output of the cross-modal mapping network is the cross-modal feature expression Γ_i.
Step 5: from the cross-modal feature expression Γ_i, compute the sound waveform through the cross-modal feature network.
The cross-modal feature network consists of 3 stacked residual blocks followed by 2 sequentially connected fully connected layers.
The objective function is:
[formula (5): reproduced only as an image in the original publication]
where x denotes any real number;
When objective function (5) reaches its minimum, the cross-modal feature network has completed training; the output of the cross-modal feature network is the generated sound waveform.
Step 6: input the image to be tested into the image coding network trained in step 2 to obtain its image semantic features; input these image semantic features into the cross-modal mapping network trained in step 4 to obtain the cross-modal feature expression of the test image; finally, input this cross-modal feature expression into the cross-modal feature network trained in step 5 to obtain the sound waveform converted from the test image.
2. The semantic reservation-based cross-audiovisual information conversion method according to claim 1, wherein the image coding network adopts a modified VGG16 network, i.e., all fully connected layers of the VGG16 network are replaced by one randomly initialized fully connected layer.
CN202110140393.6A 2021-02-02 2021-02-02 Cross-audiovisual information conversion method based on semantic reservation Active CN112802445B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110140393.6A CN112802445B (en) 2021-02-02 2021-02-02 Cross-audiovisual information conversion method based on semantic reservation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110140393.6A CN112802445B (en) 2021-02-02 2021-02-02 Cross-audiovisual information conversion method based on semantic reservation

Publications (2)

Publication Number Publication Date
CN112802445A (en) 2021-05-14
CN112802445B (en) 2023-06-30

Family

ID=75813564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110140393.6A Active CN112802445B (en) 2021-02-02 2021-02-02 Cross-audiovisual information conversion method based on semantic reservation

Country Status (1)

Country Link
CN (1) CN112802445B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115154216B (en) * 2022-06-02 2024-10-29 Beijing University of Technology Blind guiding method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018188240A1 (en) * 2017-04-10 2018-10-18 北京大学深圳研究生院 Cross-media retrieval method based on deep semantic space
CN111597298A (en) * 2020-03-26 2020-08-28 浙江工业大学 Cross-modal retrieval method and device based on deep confrontation discrete hash learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018188240A1 (en) * 2017-04-10 2018-10-18 北京大学深圳研究生院 Cross-media retrieval method based on deep semantic space
CN111597298A (en) * 2020-03-26 2020-08-28 浙江工业大学 Cross-modal retrieval method and device based on deep confrontation discrete hash learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Audio-visual related multimodal concept detection; 奠雨洁; 金琴; Journal of Computer Research and Development (Issue 05); full text *

Also Published As

Publication number Publication date
CN112802445A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
Zhou et al. Vision-infused deep audio inpainting
Song et al. Multimodal sparse transformer network for audio-visual speech recognition
Lin et al. Audiovisual transformer with instance attention for audio-visual event localization
US10679643B2 (en) Automatic audio captioning
Zhao et al. Multi-modal multi-cultural dimensional continues emotion recognition in dyadic interactions
Mun et al. Text-guided attention model for image captioning
JP2023537705A (en) AUDIO-VISUAL EVENT IDENTIFICATION SYSTEM, METHOD AND PROGRAM
CA3175428A1 (en) Multimodal analysis combining monitoring modalities to elicit cognitive states and perform screening for mental disorders
CN112053690A (en) Cross-modal multi-feature fusion audio and video voice recognition method and system
US20220172710A1 (en) Interactive systems and methods
Ma et al. Unpaired image-to-speech synthesis with multimodal information bottleneck
WO2023226239A1 (en) Object emotion analysis method and apparatus and electronic device
Misra et al. A Comparison of Supervised and Unsupervised Pre-Training of End-to-End Models.
CN115658954B (en) Cross-modal search countermeasure method based on prompt learning
Li et al. Detection of multiple steganography methods in compressed speech based on code element embedding, Bi-LSTM and CNN with attention mechanisms
CN112802445B (en) Cross-audiovisual information conversion method based on semantic reservation
Ruan et al. Accommodating audio modality in CLIP for multimodal processing
WO2021028236A1 (en) Systems and methods for sound conversion
Djilali et al. Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping
Li et al. Robust audio-visual ASR with unified cross-modal attention
CN117370934B (en) Multi-mode data enhancement method of sensitive information discovery model
Hong et al. When hearing the voice, who will come to your mind
CN113743267A (en) Multi-mode video emotion visualization method and device based on spiral and text
Chen et al. Cross-modal dynamic sentiment annotation for speech sentiment analysis
US20230290371A1 (en) System and method for automatically generating a sign language video with an input speech using a machine learning model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant