CN110472384A - Big data watermarking method and device based on artificial intelligence - Google Patents
Big data watermarking method and device based on artificial intelligence
- Publication number
- CN110472384A (application number CN201910746344.XA)
- Authority
- CN
- China
- Prior art keywords
- watermark
- big data
- module
- artificial intelligence
- natural language
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/16—Program or content traceability, e.g. by watermarking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
- G06Q50/184—Intellectual property management
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Technology Law (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- Operations Research (AREA)
- Economics (AREA)
- Computer Security & Cryptography (AREA)
- Entrepreneurship & Innovation (AREA)
- Computer Hardware Design (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Editing Of Facsimile Originals (AREA)
- Image Processing (AREA)
Abstract
The present invention relates to a big data watermarking method and device based on artificial intelligence. The main steps are as follows: a content-aware analysis module performs language-understanding analysis on the big data content to obtain the data content type; an artificial intelligence watermark library module stores natural language libraries of coding elements by category; a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library; a watermark embedding module embeds the natural language coding elements into the original big data text content; a watermark extraction module extracts the natural language coding elements from the watermarked big data text content; and a watermark decoding module converts the natural language coding elements back into the watermark information. Addressing the unformatted nature of big data text content, the invention uses artificial intelligence algorithms to embed a watermark encoded in natural language elements. It does not depend on any other information channel, resists attacks such as editing, copying, cutting, and merging of the big data text, is highly robust, and can effectively protect the intellectual property of big data.
Description
Technical field
The present invention relates to a big data watermarking method and device, and in particular to a big data watermarking method and device based on artificial intelligence, and belongs to the field of information security.
Background art
With the development of big data technology, the security of big data content is increasingly important, especially privacy protection, intellectual property protection, and leak tracing. Digital watermarking is an important method of protecting data assets. It typically embeds a digital signal in a digital product; the signal may be an image, text, a symbol, a number, or any other information that can serve as an identifier or mark. Its purposes include copyright protection, proof of ownership, fingerprinting (tracking multiple distributed copies), and integrity protection.
Traditional digital watermarking embeds identification information (the digital watermark) in a digital carrier (multimedia, documents, software, and so on), either directly or indirectly (by modifying the structure of a specific region). The embedding does not affect the use value of the original carrier and is not easily detected or modified again, yet the watermark can still be identified by its producer. The information hidden in the carrier can be used to confirm the content creator or purchaser, to transmit secret information, or to determine whether the carrier has been tampered with. Digital watermarking is an effective means of protecting information security, anti-counterfeiting tracing, and copyright protection, and is an important branch and research direction of information hiding technology.
Big data, however, differs from traditional information systems. The data are mostly plain text without formatting information, and must remain usable during large-scale sharing and computation. A watermark therefore cannot be hidden in the format or document properties, as it can for pictures, audio, video, or document formats such as PDF.
A new digital watermarking technology is therefore needed to safeguard information content in big data scenarios. Artificial intelligence technology, of which natural language processing (NLP) is one characteristic example, can perform transformations at the level of linguistic knowledge. Without affecting the recognition, reading, or comprehension of the overall content, specific language elements are incorporated (linguistic units such as high-frequency synonyms, near-synonyms, visually similar forms, splits, merges, and negated antonyms, including characters, words, phrases, and short sentences) to embed watermark information that protects the big data content. A digital watermark realized by artificial intelligence methods can resist attacks on the text content such as editing and processing, even at the scale of big data.
Summary of the invention
In view of this, the invention discloses a big data watermarking method and device based on artificial intelligence. The main steps are as follows: a content-aware analysis module performs language-understanding analysis on the big data content to obtain the data content type; an artificial intelligence watermark library module stores natural language libraries of coding elements by category; a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library; a watermark embedding module embeds the natural language coding elements into the original big data text content; a watermark extraction module extracts the natural language coding elements from the watermarked big data text content; and a watermark decoding module converts the natural language coding elements back into the watermark information. Addressing the unformatted nature of big data text content, the invention uses artificial intelligence algorithms to embed a watermark encoded in natural language elements. It does not depend on any other information channel, resists attacks such as editing, copying, cutting, and merging of the big data text, is highly robust, and can effectively protect the intellectual property of big data.
The technical solution of the invention is as follows: a big data watermarking method based on artificial intelligence, the steps of which comprise:
1) a content-aware analysis module performs language-understanding analysis on the big data content to obtain the data content type;
2) an artificial intelligence watermark library module stores natural language libraries of coding elements by category;
3) a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library;
4) a watermark embedding module embeds the natural language coding elements into the original big data text content;
5) a watermark extraction module extracts the natural language coding elements from the watermarked big data text content;
6) a watermark decoding module converts the natural language coding elements into the watermark information.
Further, the content-aware analysis module identifies and analyzes the big data content to obtain the linguistic attributes of the data content, including features of many types such as Chinese, English, classical Chinese, modern text, astronomy, geography, law, official documents, and prose.
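The language-identification part of this step could be sketched as follows. This is a minimal sketch only: the patent does not specify the analysis algorithm, and the helper name `detect_language` and the 50% threshold are assumptions made for illustration. Domain features such as law or astronomy would need a trained classifier and are not covered here.

```python
def detect_language(text: str) -> str:
    """Return 'chinese', 'english', or 'unknown' (hypothetical heuristic).

    The decision is based on the share of CJK ideographs among all
    alphabetic characters in the text.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "unknown"
    # Count characters in the CJK Unified Ideographs block.
    cjk = sum(1 for ch in letters if "\u4e00" <= ch <= "\u9fff")
    return "chinese" if cjk / len(letters) > 0.5 else "english"


if __name__ == "__main__":
    print(detect_language("The contract covers a large data set."))
    print(detect_language("大数据水印方法"))
```

The returned label would then select which natural language library the later modules consult.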
Further, the artificial intelligence watermark library module establishes different natural language coding elements for each language type. These are linguistic units within each language category, such as high-frequency synonyms, near-synonyms, visually similar forms, splits, merges, and negated antonyms, and include characters, words, phrases, and short sentences.
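One possible in-memory layout for such a library is sketched below. The structure, the name `WATERMARK_LIBRARY`, and the example word pairs are all assumptions for illustration; the patent only states that coding elements are stored by category. Here each entry is a pair of interchangeable forms, where the first form stands for bit 0 and the second for bit 1.

```python
from typing import Dict, List, Tuple

# Hypothetical library: content type -> list of (bit-0 form, bit-1 form) pairs.
WATERMARK_LIBRARY: Dict[str, List[Tuple[str, str]]] = {
    "english": [("big", "large"), ("begin", "start"), ("quick", "fast")],
    "chinese": [("迅速", "快速"), ("开始", "起始")],
}


def build_lookup(content_type: str) -> Dict[str, Tuple[int, int]]:
    """Map every stored form to (pair index, encoded bit) for fast lookup."""
    lookup: Dict[str, Tuple[int, int]] = {}
    for index, (zero_form, one_form) in enumerate(WATERMARK_LIBRARY[content_type]):
        lookup[zero_form] = (index, 0)
        lookup[one_form] = (index, 1)
    return lookup


if __name__ == "__main__":
    print(build_lookup("english")["large"])  # pair index and bit for "large"
```

Keying the library by content type keeps, for example, legal and literary vocabularies separate, which matches the per-category storage the text describes.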
Further, the watermark encoding module converts the binary code of the watermark information into natural language codes from the artificial intelligence watermark library. To increase security, the binary code of the watermark may first be encrypted with an encryption algorithm.
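The conversion of a watermark to its binary code, with the optional encryption step, might look like the sketch below. The patent does not name an encryption algorithm; the SHA-256 based XOR keystream used here is a stand-in chosen only to keep the example self-contained, and the function names `keystream` and `watermark_to_bits` are hypothetical. A real deployment would use a vetted cipher.

```python
import hashlib
from typing import List, Optional


def keystream(key: bytes, n: int) -> bytes:
    """Derive n bytes by hashing key || counter (illustrative, not a vetted cipher)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def watermark_to_bits(watermark: str, key: Optional[bytes] = None) -> List[int]:
    """Turn the watermark string into a list of bits, optionally encrypting first."""
    data = watermark.encode("utf-8")
    if key is not None:
        # XOR with the keystream; applying the same XOR again decrypts.
        data = bytes(b ^ k for b, k in zip(data, keystream(key, len(data))))
    # Emit each byte most significant bit first.
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


if __name__ == "__main__":
    print(watermark_to_bits("A"))
```

Each resulting bit is later carried by one natural language coding element, so the payload length directly determines how many substitutable units the text must contain.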
Further, the watermark embedding module substitutes the natural language coding elements of the watermark into the original big data text content.
Further, the watermark extraction module and the watermark decoding module perform the inverse of the embedding process: the natural language coding elements are extracted from the watermarked big data text content and converted back into the original watermark information.
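The final decoding stage, from recovered bits back to the watermark string, can be sketched as the inverse of a byte-wise binary encoding. The function names and the SHA-256 based XOR keystream are assumptions for illustration (the patent does not name a cipher); the bit order assumed is most significant bit first.

```python
import hashlib
from typing import List, Optional


def keystream(key: bytes, n: int) -> bytes:
    """Derive n bytes by hashing key || counter (illustrative, not a vetted cipher)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def bits_to_watermark(bits: List[int], key: Optional[bytes] = None) -> str:
    """Rebuild the watermark string from bits, decrypting last if a key is given."""
    if len(bits) % 8 != 0:
        raise ValueError("bit count must be a multiple of 8")
    # Pack each group of eight bits (most significant first) into one byte.
    data = bytes(
        int("".join(str(b) for b in bits[i:i + 8]), 2)
        for i in range(0, len(bits), 8)
    )
    if key is not None:
        # XOR with the same keystream that was used for encryption.
        data = bytes(b ^ k for b, k in zip(data, keystream(key, len(data))))
    return data.decode("utf-8")


if __name__ == "__main__":
    print(bits_to_watermark([0, 1, 0, 0, 0, 0, 0, 1]))
```

Decryption is applied after the bits are packed into bytes, mirroring the statement that decoding ends with a binary decryption step when the original watermark was encrypted.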
The invention also proposes a big data watermarking device based on artificial intelligence, comprising a content-aware analysis module, an artificial intelligence watermark library module, a watermark encoding module, a watermark embedding module, a watermark extraction module, and a watermark decoding module, wherein:
the content-aware analysis module identifies and analyzes the big data content to obtain the linguistic attributes of the data content, including features of many types such as Chinese, English, classical Chinese, modern text, astronomy, geography, law, official documents, and prose;
the artificial intelligence watermark library module establishes different natural language coding elements for each language type, these being linguistic units within each language category, such as high-frequency synonyms, near-synonyms, visually similar forms, splits, merges, and negated antonyms, including characters, words, phrases, and short sentences;
the watermark encoding module converts the binary code of the watermark information into natural language codes from the artificial intelligence watermark library;
the watermark embedding module substitutes the natural language coding elements of the watermark into the original big data text content;
the watermark extraction module extracts the natural language coding elements from the watermarked big data text content;
the watermark decoding module converts the natural language coding elements into the watermark information; if the binary code of the original watermark was encrypted, the decoded binary code is finally decrypted.
The beneficial effects of the invention are as follows:
The invention provides a big data watermarking method and device based on artificial intelligence. A content-aware analysis module performs language-understanding analysis on the big data content to obtain the data content type; an artificial intelligence watermark library module stores natural language libraries of coding elements by category; a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library; a watermark embedding module embeds the natural language coding elements into the original big data text content; a watermark extraction module extracts the natural language coding elements from the watermarked big data text content; and a watermark decoding module converts the natural language coding elements back into the watermark information. Addressing the unformatted nature of big data text content, the invention uses artificial intelligence algorithms to embed a watermark encoded in natural language elements. It does not depend on any other information channel, resists attacks such as editing, copying, cutting, and merging of the big data text, is highly robust, and can effectively protect the intellectual property of big data.
Brief description of the drawings
Fig. 1 is a flowchart of big data watermark embedding based on artificial intelligence according to the present invention.
Fig. 2 is a flowchart of big data watermark extraction based on artificial intelligence according to the present invention.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings and embodiments.
The big data watermarking device based on artificial intelligence disclosed in one embodiment of the invention operates in the following steps:
1) a content-aware analysis module performs language-understanding analysis on the big data content to obtain the data content type;
2) an artificial intelligence watermark library module stores natural language libraries of coding elements by category;
3) a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library;
4) a watermark embedding module embeds the natural language coding elements into the original big data text content;
5) a watermark extraction module extracts the natural language coding elements from the watermarked big data text content;
6) a watermark decoding module converts the natural language coding elements into the watermark information.
The big data watermarking method and device based on artificial intelligence are further explained below through the specific examples shown in the drawings.
As shown in Fig. 1, the main steps of big data watermark embedding based on artificial intelligence are:
1) the content-aware analysis module identifies and analyzes the big data content to obtain the linguistic attributes of the data content, including features of many types such as Chinese, English, classical Chinese, modern text, astronomy, geography, law, official documents, and prose;
2) the artificial intelligence watermark library module establishes different natural language coding elements for each language type, these being linguistic units within each language category, such as high-frequency synonyms, near-synonyms, visually similar forms, splits, merges, and negated antonyms, including characters, words, phrases, and short sentences;
3) the watermark encoding module converts the binary code of the watermark information into natural language codes from the artificial intelligence watermark library; to increase security, the binary code of the watermark may first be encrypted with an encryption algorithm;
4) the watermark embedding module substitutes the natural language coding elements of the watermark into the original big data text content.
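The substitution step of the embedding flow can be illustrated with a small synonym-replacement sketch for English text. Everything named here is an assumption for illustration: the pair list `PAIRS`, the function `embed`, and the rule that the first form of a pair encodes bit 0 and the second encodes bit 1. The sketch lowercases substituted words and performs no grammatical or contextual check, both of which a practical system would need to handle.

```python
import re
from typing import List

# Hypothetical synonym pairs: the first form encodes bit 0, the second bit 1.
PAIRS = [("big", "large"), ("begin", "start"), ("quick", "fast")]
VARIANT_TO_PAIR = {word: pair for pair in PAIRS for word in pair}


def embed(text: str, bits: List[int]) -> str:
    """Rewrite each carrier word so that it encodes the next watermark bit."""
    remaining = list(bits)

    def substitute(match: "re.Match[str]") -> str:
        word = match.group(0)
        pair = VARIANT_TO_PAIR.get(word.lower())
        if pair is None or not remaining:
            return word  # not a carrier word, or payload already embedded
        return pair[remaining.pop(0)]

    return re.sub(r"[A-Za-z]+", substitute, text)


if __name__ == "__main__":
    print(embed("We begin with a big set and a quick scan.", [1, 0, 1]))
```

Because the watermark lives in word choice rather than in file format or metadata, it survives plain-text copying, which is the property the description emphasizes for unformatted big data text.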
As shown in Fig. 2, the steps of big data watermark extraction based on artificial intelligence are:
1) the content-aware analysis module identifies and analyzes the big data content to obtain the linguistic attributes of the data content, including features of many types such as Chinese, English, classical Chinese, modern text, astronomy, geography, law, official documents, and prose;
2) the watermark extraction module, using the natural language libraries in the artificial intelligence watermark library module, extracts the natural language coding elements from the watermarked big data text content;
3) the watermark decoding module converts the natural language coding elements into the watermark information; if the binary code of the original watermark was encrypted, the decoded binary code is finally decrypted.
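The extraction flow can be sketched as a scan over the watermarked text that looks up each word in the library and reads off the bit it encodes. As before, the pair list `PAIRS` and the function `extract_bits` are hypothetical, and the sketch assumes the extractor knows the payload length `n_bits` in advance, which the patent does not discuss.

```python
import re
from typing import List

# Hypothetical synonym pairs: the first form encodes bit 0, the second bit 1.
PAIRS = [("big", "large"), ("begin", "start"), ("quick", "fast")]
VARIANT_TO_BIT = {word: bit for pair in PAIRS for bit, word in enumerate(pair)}


def extract_bits(text: str, n_bits: int) -> List[int]:
    """Read up to n_bits watermark bits from the carrier words found in text."""
    bits: List[int] = []
    for word in re.findall(r"[A-Za-z]+", text):
        if len(bits) >= n_bits:
            break
        bit = VARIANT_TO_BIT.get(word.lower())
        if bit is not None:
            bits.append(bit)
    return bits


if __name__ == "__main__":
    print(extract_bits("We start with a big set and a fast scan.", 3))
```

The recovered bit list would then be passed to the decoding module, which packs it into bytes and decrypts it if the watermark was encrypted before embedding.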
The specific embodiments described above are intended to aid understanding of the invention and do not limit its scope of protection. Any modification, variation, or equivalent replacement made within the spirit and essential principles of the invention shall fall within the scope of protection of the claims.
Claims (7)
1. A big data watermarking method based on artificial intelligence, the steps of which comprise:
1) a content-aware analysis module performs language-understanding analysis on the big data content to obtain the data content type;
2) an artificial intelligence watermark library module stores natural language libraries of coding elements by category;
3) a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library;
4) a watermark embedding module embeds the natural language coding elements into the original big data text content;
5) a watermark extraction module extracts the natural language coding elements from the watermarked big data text content;
6) a watermark decoding module converts the natural language coding elements into the watermark information.
2. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the content-aware analysis module identifies and analyzes the big data content to obtain the linguistic attributes of the data content, including features of many types such as Chinese, English, classical Chinese, modern text, astronomy, geography, law, official documents, and prose.
3. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the artificial intelligence watermark library module establishes different natural language coding elements for each language type, these being linguistic units within each language category, such as high-frequency synonyms, near-synonyms, visually similar forms, splits, merges, and negated antonyms, including characters, words, phrases, and short sentences.
4. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the watermark encoding module converts the binary code of the watermark information into natural language codes from the artificial intelligence watermark library, and that, to increase security, the binary code of the watermark may be encrypted with an encryption algorithm.
5. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the watermark embedding module substitutes the natural language coding elements of the watermark into the original big data text content.
6. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the watermark extraction module and the watermark decoding module perform the inverse of the embedding process, extracting the natural language coding elements from the watermarked big data text content and converting them into the original watermark information.
7. A big data watermarking device based on artificial intelligence, comprising a content-aware analysis module, an artificial intelligence watermark library module, a watermark encoding module, a watermark embedding module, a watermark extraction module, and a watermark decoding module, wherein:
the content-aware analysis module identifies and analyzes the big data content to obtain the linguistic attributes of the data content, including features of many types such as Chinese, English, classical Chinese, modern text, astronomy, geography, law, official documents, and prose;
the artificial intelligence watermark library module establishes different natural language coding elements for each language type, these being linguistic units within each language category, such as high-frequency synonyms, near-synonyms, visually similar forms, splits, merges, and negated antonyms, including characters, words, phrases, and short sentences;
the watermark encoding module converts the binary code of the watermark information into natural language codes from the artificial intelligence watermark library;
the watermark embedding module substitutes the natural language coding elements of the watermark into the original big data text content;
the watermark extraction module extracts the natural language coding elements from the watermarked big data text content;
the watermark decoding module converts the natural language coding elements into the watermark information; if the binary code of the original watermark was encrypted, the decoded binary code is finally decrypted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910746344.XA CN110472384A (en) | 2019-08-13 | 2019-08-13 | Big data watermarking method and device based on artificial intelligence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910746344.XA CN110472384A (en) | 2019-08-13 | 2019-08-13 | Big data watermarking method and device based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110472384A (en) | 2019-11-19 |
Family
ID=68510597
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910746344.XA (pending), published as CN110472384A (en) | 2019-08-13 | 2019-08-13 | Big data watermarking method and device based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110472384A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111324871A (en) * | 2020-03-09 | 2020-06-23 | 河南大学 | Big data watermarking method and device based on artificial intelligence |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101957810A (en) * | 2009-07-16 | 2011-01-26 | 西安腾惟科技有限公司 | Method and device for embedding and detecting watermark in document by using computer system |
CN102194205A (en) * | 2010-03-18 | 2011-09-21 | 湖南大学 | Method and device for text recoverable watermark based on synonym replacement |
CN102254126A (en) * | 2011-07-29 | 2011-11-23 | 西安交通大学 | Robust-based natural language Hash domain spread spectrum watermarking coding algorithm for |
CN105205355A (en) * | 2015-11-05 | 2015-12-30 | 南通大学 | Embedding method and extracting method for text watermark based on semantic role position mapping |
US20190130080A1 (en) * | 2017-10-27 | 2019-05-02 | Telefonica Digital Espana, S.L.U. | Watermark embedding and extracting method for protecting documents |
Non-Patent Citations (1)
Title |
---|
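He Lu et al.: "Robustness analysis and evaluation of natural language watermarking" (自然语言水印鲁棒性分析与评估), Chinese Journal of Computers (《计算机学报》) * |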
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20191119 |