CN110472384A

CN110472384A - Big data watermarking method and device based on artificial intelligence

Info

Publication number: CN110472384A
Application number: CN201910746344.XA
Authority: CN
Inventors: 邓高见; 张慧; 李萌
Original assignee: Zhongke Tianyu (suzhou) Technology Co Ltd
Current assignee: Zhongke Tianyu (suzhou) Technology Co Ltd
Priority date: 2019-08-13
Filing date: 2019-08-13
Publication date: 2019-11-19

Abstract

The present invention relates to a big data watermarking method and device based on artificial intelligence. The main steps are: a content perception analysis module performs language understanding analysis on the big data content to obtain the data content type; an artificial intelligence watermark library module stores natural language libraries of coding elements by category; a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library; a watermark embedding module embeds the natural language coding elements into the original big data text content; a watermark extraction module extracts the natural language coding elements from the watermarked big data text content; and a watermark decoding module converts the natural language coding elements back into the watermark information. Targeting the unformatted nature of big data text content, the invention uses an artificial intelligence algorithm to embed a watermark encoded in natural language elements. It does not depend on any other information channel, can resist attacks on big data text such as editing, copying, cutting, and merging, is highly robust, and can effectively protect the intellectual property of big data.

Description

Big data watermarking method and device based on artificial intelligence

Technical field

The present invention relates to a big data watermarking method and device, and in particular to a big data watermarking method and device based on artificial intelligence. It belongs to the field of information security.

Background art

With the development of big data technology, the security of big data content is increasingly important, and privacy protection, intellectual property, and leak tracing are top priorities. Digital watermarking is an important method of protecting data assets. It typically embeds a digital signal in a digital product; the signal may be an image, text, a symbol, a number, or any other information that can serve as an identifier or mark. Its purposes include copyright protection, proof of ownership, fingerprinting (tracking the distribution of multiple copies), and integrity protection.

Traditional digital watermarking embeds identification information (the digital watermark) into a digital carrier (including multimedia, documents, and software), either directly or indirectly (by modifying the structure of a specific region). It does so without affecting the use value of the original carrier and without being easily detected or modified, while remaining identifiable by its producer. The information hidden in the carrier can be used to confirm the content creator or purchaser, to transmit secret information, or to determine whether the carrier has been tampered with. Digital watermarking is an effective way to protect information security and to achieve anti-counterfeiting traceability and copyright protection, and it is an important branch and research direction of information hiding technology.

However, big data differs from traditional information systems. Most of the data is plain text, that is, unformatted information, and it must remain usable during large-scale sharing and computation. The watermark therefore cannot be hidden in the format or document properties, as it can for images, audio and video, or even document formats such as PDF.

Therefore, a new digital watermarking technology is needed to ensure the security of information content in big data scenarios. Artificial intelligence technology, of which natural language processing (NLP) is one characteristic form, can transform text at the level of linguistic knowledge. Without affecting overall recognition, reading, or understanding of the content, it incorporates specific language elements (linguistic units such as high-frequency synonyms, near-synonyms, similar-looking forms, splits, merges, and negated antonyms, including characters, words, phrases, and short sentences) to embed watermark information that protects the big data content. A digital watermark implemented with artificial intelligence methods can, given the massive volume of information in big data, resist attacks and damage such as editing and processing of the text content.

Summary of the invention

In view of this, the invention discloses a big data watermarking method and device based on artificial intelligence. The main steps are: a content perception analysis module performs language understanding analysis on the big data content to obtain the data content type; an artificial intelligence watermark library module stores natural language libraries of coding elements by category; a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library; a watermark embedding module embeds the natural language coding elements into the original big data text content; a watermark extraction module extracts the natural language coding elements from the watermarked big data text content; and a watermark decoding module converts the natural language coding elements back into the watermark information. Targeting the unformatted nature of big data text content, the invention uses an artificial intelligence algorithm to embed a watermark encoded in natural language elements. It does not depend on any other information channel, can resist attacks on big data text such as editing, copying, cutting, and merging, is highly robust, and can effectively protect the intellectual property of big data.

The technical solution of the present invention is as follows: a big data watermarking method based on artificial intelligence, the steps of which include:

1) a content perception analysis module performs language understanding analysis on the big data content to obtain the data content type;

2) an artificial intelligence watermark library module stores natural language libraries of coding elements by category;

3) a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library;

4) a watermark embedding module embeds the natural language coding elements into the original big data text content;

5) a watermark extraction module extracts the natural language coding elements from the watermarked big data text content;

6) a watermark decoding module converts the natural language coding elements into the watermark information.

Further, the content perception analysis module identifies and analyzes the big data content to obtain the linguistic attributes of the data content, including features of multiple types such as Chinese, English, classical Chinese, modern text, astronomy, geography, law, official documents, and prose.
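The patent does not specify how the content type is determined. As a minimal sketch, assuming the only distinction needed is Chinese versus English, the analysis can be approximated by counting characters in the CJK Unified Ideographs range against ASCII letters. The function name `detect_content_type` and the labels it returns are hypothetical; a real system would presumably use an NLP classifier that also covers genre (law, prose, and so on).

```python
def detect_content_type(text: str) -> str:
    """Classify text as 'chinese', 'english', or 'unknown' by character share.

    Heuristic stand-in for the content perception analysis module:
    it only looks at which script dominates, not at genre or register.
    """
    # Count characters in the basic CJK Unified Ideographs block.
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    # Count ASCII letters as a proxy for English text.
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    if cjk == 0 and latin == 0:
        return "unknown"
    return "chinese" if cjk >= latin else "english"
```

The returned label would then select which natural language library to use for the watermark.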

Further, the artificial intelligence watermark library module establishes different natural language coding elements for different language types. These are linguistic units in each language category, such as high-frequency synonyms, near-synonyms, similar-looking forms, splits, merges, and negated antonyms, and include characters, words, phrases, and short sentences.
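One possible way to organize such a library, offered as an assumption rather than the patent's actual data structure, is a mapping from content type to pairs of interchangeable expressions, where the first member of a pair stands for bit 0 and the second for bit 1. The pairs below, the name `WATERMARK_LIBRARY`, and the helper `build_lookup` are all illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical library: content type -> list of (bit-0 form, bit-1 form) pairs.
# Each pair holds two expressions assumed to be interchangeable in context.
WATERMARK_LIBRARY: Dict[str, List[Tuple[str, str]]] = {
    "english": [("big", "large"), ("begin", "start"), ("quick", "fast")],
    "chinese": [("因此", "所以"), ("但是", "然而")],
}


def build_lookup(content_type: str) -> Dict[str, int]:
    """Map every coding element of one content type to the bit it represents."""
    lookup: Dict[str, int] = {}
    for zero_form, one_form in WATERMARK_LIBRARY[content_type]:
        lookup[zero_form] = 0
        lookup[one_form] = 1
    return lookup
```

Keeping the two forms of each pair side by side means the same table serves both directions: choosing a form when writing a bit, and reading a bit back from whichever form appears in the text.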

Further, the watermark encoding module converts the binary code of the watermark information into natural language codes from the artificial intelligence watermark library. To increase security, the binary code of the watermark may first be encrypted with an encryption algorithm.
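The sketch below illustrates the first half of this step: turning watermark information into a bit sequence, with optional encryption. The patent does not name an encryption algorithm, so XOR with a SHA-256-derived keystream is used here purely as an illustrative stand-in and should not be read as a vetted cipher. All function names are hypothetical, and the mapping from bits to concrete language elements is assumed to happen at embedding time.

```python
import hashlib
from typing import List


def keystream(key: str, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from a key (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(f"{key}:{counter}".encode("utf-8")).digest()
        counter += 1
    return out[:length]


def xor_with_key(data: bytes, key: str) -> bytes:
    """XOR data with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def bytes_to_bits(data: bytes) -> List[int]:
    """Expand bytes into bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits: List[int]) -> bytes:
    """Pack bits back into bytes, ignoring an incomplete trailing group."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def encode_watermark(message: str, key: str = "") -> List[int]:
    """Convert watermark text to bits, encrypting first if a key is given."""
    data = message.encode("utf-8")
    if key:
        data = xor_with_key(data, key)
    return bytes_to_bits(data)


def decode_watermark(bits: List[int], key: str = "") -> str:
    """Inverse of encode_watermark: pack bits, decrypt if needed, decode."""
    data = bits_to_bytes(bits)
    if key:
        data = xor_with_key(data, key)
    return data.decode("utf-8")
```

For example, `encode_watermark("ZK", key="secret")` yields 16 bits, and `decode_watermark` with the same key recovers `"ZK"`.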

Further, the watermark embedding module substitutes the natural language coding elements of the watermark into the original big data text content.

Further, the watermark extraction module and the watermark decoding module perform the inverse of watermark embedding: they extract the natural language coding elements from the watermarked big data text content and convert them back into the original watermark information.

The present invention also proposes a big data watermarking device based on artificial intelligence, comprising a content perception analysis module, an artificial intelligence watermark library module, a watermark encoding module, a watermark embedding module, a watermark extraction module, and a watermark decoding module, wherein:

The content perception analysis module identifies and analyzes the big data content to obtain the linguistic attributes of the data content, including features of multiple types such as Chinese, English, classical Chinese, modern text, astronomy, geography, law, official documents, and prose;

The artificial intelligence watermark library module establishes different natural language coding elements for different language types; these are linguistic units in each language category, such as high-frequency synonyms, near-synonyms, similar-looking forms, splits, merges, and negated antonyms, and include characters, words, phrases, and short sentences;

The watermark encoding module converts the binary code of the watermark information into natural language codes from the artificial intelligence watermark library;

The watermark embedding module substitutes the natural language coding elements of the watermark into the original big data text content;

The watermark extraction module extracts the natural language coding elements from the watermarked big data text content;

The watermark decoding module converts the natural language coding elements into the watermark information; if the original watermark binary code was encrypted, decoding ends with a binary decryption step.

The beneficial effects of the present invention are:

The present invention provides a big data watermarking method and device based on artificial intelligence. The content perception analysis module performs language understanding analysis on the big data content to obtain the data content type; the artificial intelligence watermark library module stores natural language libraries of coding elements by category; the watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library; the watermark embedding module embeds the natural language coding elements into the original big data text content; the watermark extraction module extracts the natural language coding elements from the watermarked big data text content; and the watermark decoding module converts the natural language coding elements into the watermark information. Targeting the unformatted nature of big data text content, the invention uses an artificial intelligence algorithm to embed a watermark encoded in natural language elements. It does not depend on any other information channel, can resist attacks on big data text such as editing, copying, cutting, and merging, is highly robust, and can effectively protect the intellectual property of big data.

Brief description of the drawings

Figure 1 is a flowchart of big data watermark embedding based on artificial intelligence according to the present invention.

Figure 2 is a flowchart of big data watermark extraction based on artificial intelligence according to the present invention.

Specific embodiment

The invention is further described below with reference to the accompanying drawings and embodiments.

An embodiment of the invention discloses a big data watermarking method and device based on artificial intelligence, which are further explained below through the specific examples shown in the drawings.

As shown in Figure 1, big data watermark embedding based on artificial intelligence mainly comprises the following steps:

1. The content perception analysis module identifies and analyzes the big data content to obtain the linguistic attributes of the data content, including features of multiple types such as Chinese, English, classical Chinese, modern text, astronomy, geography, law, official documents, and prose;

2. The artificial intelligence watermark library module establishes different natural language coding elements for different language types; these are linguistic units in each language category, such as high-frequency synonyms, near-synonyms, similar-looking forms, splits, merges, and negated antonyms, and include characters, words, phrases, and short sentences;

3. The watermark encoding module converts the binary code of the watermark information into natural language codes from the artificial intelligence watermark library; to increase security, the binary code of the watermark may first be encrypted with an encryption algorithm;

4. The watermark embedding module substitutes the natural language coding elements of the watermark into the original big data text content.
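The substitution in the final embedding step can be sketched as follows, under the assumption that each coding element is one member of a synonym pair and that the pair member chosen carries one watermark bit. The pairs in `PAIRS` and the function `embed` are hypothetical; the patent does not disclose a concrete substitution rule, and this sketch matches lowercase ASCII words only.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical coding elements: (form for bit 0, form for bit 1).
PAIRS: List[Tuple[str, str]] = [
    ("big", "large"), ("begin", "start"), ("quick", "fast"),
]


def embed(text: str, bits: List[int],
          pairs: Sequence[Tuple[str, str]] = PAIRS) -> Tuple[str, int]:
    """Rewrite carrier words so that each one encodes the next watermark bit.

    Returns the watermarked text and the number of bits actually embedded.
    """
    # Map every known word to the pair it belongs to.
    variants: Dict[str, Tuple[str, str]] = {}
    for pair in pairs:
        for word in pair:
            variants[word] = pair

    used = 0

    def substitute(match: "re.Match[str]") -> str:
        nonlocal used
        word = match.group(0)
        pair = variants.get(word)
        if pair is None or used >= len(bits):
            return word  # not a carrier word, or no bits left to embed
        replacement = pair[bits[used]]
        used += 1
        return replacement

    watermarked = re.sub(r"[A-Za-z]+", substitute, text)
    return watermarked, used
```

With the input `"We begin with a big data set and a quick scan of the big table."` and bits `[1, 0, 1, 1]`, the four carrier words become `start`, `big`, `fast`, and `large`. Capacity is bounded by the number of carrier words in the text, so a practical scheme would also need a convention for marking where the payload ends.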

As shown in Figure 2, big data watermark extraction based on artificial intelligence proceeds as follows:

2. The watermark extraction module, using the natural language library in the artificial intelligence watermark library module, extracts the natural language coding elements from the watermarked big data text content;

3. The watermark decoding module converts the natural language coding elements into the watermark information; if the original watermark binary code was encrypted, decoding ends with a binary decryption step.
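A minimal sketch of extraction and decoding, assuming the watermark was carried by the choice between two members of a synonym pair (first member for bit 0, second for bit 1). The pairs and the function names `extract` and `bits_to_text` are hypothetical, and the optional decryption step is omitted because the patent does not name the cipher.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical coding elements: (form for bit 0, form for bit 1).
PAIRS: List[Tuple[str, str]] = [
    ("big", "large"), ("begin", "start"), ("quick", "fast"),
]


def extract(text: str,
            pairs: Sequence[Tuple[str, str]] = PAIRS) -> List[int]:
    """Read one bit from every carrier word, in order of appearance."""
    lookup: Dict[str, int] = {}
    for zero_form, one_form in pairs:
        lookup[zero_form] = 0
        lookup[one_form] = 1
    words = re.findall(r"[A-Za-z]+", text)
    return [lookup[word] for word in words if word in lookup]


def bits_to_text(bits: List[int]) -> str:
    """Pack bits into bytes (MSB first) and decode them as UTF-8.

    An incomplete trailing group of fewer than 8 bits is ignored.
    """
    usable = len(bits) - len(bits) % 8
    data = bytes(
        int("".join(str(bit) for bit in bits[i:i + 8]), 2)
        for i in range(0, usable, 8)
    )
    return data.decode("utf-8", errors="replace")
```

Applied to `"We start with a big data set and a fast scan of the large table."`, `extract` reads the carrier words `start`, `big`, `fast`, and `large` in order. Because every carrier word in the text yields a bit, the extractor cannot by itself tell payload bits from incidental ones; that boundary would have to be agreed separately.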

The specific embodiments described above are intended to aid understanding of the present invention and do not limit its scope of protection. Any modification, variation, or equivalent substitution made within the spirit and principles of the invention shall fall within the scope of protection of the claims.

Claims

1. A big data watermarking method based on artificial intelligence, the steps of which include:

2. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the content perception analysis module identifies and analyzes the big data content to obtain the linguistic attributes of the data content, including features of multiple types such as Chinese, English, classical Chinese, modern text, astronomy, geography, law, official documents, and prose.

3. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the artificial intelligence watermark library module establishes different natural language coding elements for different language types, these being linguistic units in each language category, such as high-frequency synonyms, near-synonyms, similar-looking forms, splits, merges, and negated antonyms, and including characters, words, phrases, and short sentences.

4. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the watermark encoding module converts the binary code of the watermark information into natural language codes from the artificial intelligence watermark library, and, to increase security, the binary code of the watermark may be encrypted with an encryption algorithm.

5. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the watermark embedding module substitutes the natural language coding elements of the watermark into the original big data text content.

6. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the watermark extraction module and the watermark decoding module perform the inverse of watermark embedding, extracting the natural language coding elements from the watermarked big data text content and converting them into the original watermark information.

7. A big data watermarking device based on artificial intelligence, comprising a content perception analysis module, an artificial intelligence watermark library module, a watermark encoding module, a watermark embedding module, a watermark extraction module, and a watermark decoding module,