CN118051631B - Information analysis management method and system for digital new media based on big data - Google Patents

Information analysis management method and system for digital new media based on big data

Info

Publication number
CN118051631B
Authority
CN
China
Prior art keywords
digital media
data
information
word
public opinion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410200785.0A
Other languages
Chinese (zh)
Other versions
CN118051631A (en)
Inventor
王雪羿
姜宇璇
陈琳晔
刘成国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT
Priority to CN202410200785.0A
Publication of CN118051631A
Application granted
Publication of CN118051631B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/44 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/45 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Probability & Statistics with Applications (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a big-data-based information analysis and management method and system for digital new media, relating to the field of digital new media. The method comprises acquiring digital media data; obtaining digital media standard data from the digital media data through data preprocessing; and obtaining digital media data characteristic information from the digital media classification database through data mining. Preprocessing the digital media data improves the efficiency and accuracy of subsequent keyword extraction. Data mining reveals the demand information, behavior habits and attitude tendencies of digital media users. Public opinion monitoring keyword information is used to monitor target-related public opinion on social media, news websites, forums and other media. The digital media data public opinion correlation index is used to judge whether public opinion attention is too high, and the digital media data public opinion comment index is used to adjust strategy in time, improving the acceptance of market feedback.

Description

Information analysis management method and system for digital new media based on big data
Technical Field
The invention relates to the field of digital new media, in particular to an information analysis management method and system of digital new media based on big data.
Background
With the rapid development of computer and network technology, acquiring information through informatized systems has become a necessary means of carrying out work in many industries and fields. Against the background of digital new media, the construction of informatized systems has begun to develop vigorously. Digital media technology mainly comprises scene design, character design, game programming, multimedia post-processing and human-computer interaction technology, and is mainly oriented toward design work such as game development, website art and creative design.
Digital new media is a comprehensive interdisciplinary field led by information science and digital technology and grounded in mass communication theory. It integrates culture and art and applies information communication technology to fields where science and culture are highly integrated, such as culture, art, entertainment, commerce, education and management. Digital new media takes various forms, including images, text, audio and video, and both the form and the content of its transmission are digitized, that is, information acquisition, access, processing, management and distribution are digital processes. Digital new media has become the newest and most widespread information carrier in the information society, penetrating almost all aspects of people's life and work.
At present, the processing of digital new media data still has the following problems: keywords in the data cannot be accurately extracted, which affects the subsequent classified storage of the data; public opinion information related to the data on social media, news websites, forums and other media cannot be monitored in time; and strategies cannot be adjusted, nor crises handled, in time according to public attention, feedback and attitude tendency.
Disclosure of Invention
The present technical scheme solves the problems raised in the background art: keywords in digital new media data cannot be accurately extracted, which affects the subsequent classified storage of the data; public opinion information related to the data on social media, news websites, forums and other media cannot be monitored in time; and strategies cannot be adjusted, nor crises handled, in time according to public attention, feedback and attitude tendency.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the information analysis management method of the digitalized new media based on big data comprises the following steps:
acquiring digital media data, wherein the digital media data comprises text data, picture data and video data;
acquiring digital media standard data based on data preprocessing according to the digital media data, wherein the data preprocessing comprises data cleaning, data denoising, data word segmentation and word frequency statistics;
Obtaining standard data keyword information according to the digital media standard data;
acquiring digital media classification data based on data classification according to standard data keyword information;
Acquiring a digital media classification database based on distributed storage according to the digital media classification data;
Acquiring digital media data characteristic information based on data mining according to a digital media classification database, wherein the digital media data characteristic information comprises digital media user demand information, behavior habit information and attitude tendency information;
acquiring the characteristic visualization information of the digital media data based on the visualization processing according to the characteristic information of the digital media data;
obtaining public opinion monitoring keyword information according to the actual development requirements and public opinion monitoring of the digital new media;
Judging whether the digital media standard data is public opinion related data or not according to the standard data keyword information and public opinion monitoring keyword information, if not, recording the digital media standard data, and if so, analyzing the digital media standard data to obtain the digital media data public opinion related information;
Acquiring digital media data public opinion correlation indexes according to the digital media data public opinion correlation information;
Acquiring a digital media data public opinion correlation index threshold and a digital media data public opinion comment threshold according to the actual demand of the digital new media;
Judging whether the public opinion attention is too high according to the digital media data public opinion correlation index and the digital media data public opinion correlation index threshold, if not, recording the digital media data public opinion correlation index to obtain public opinion correlation index data;
according to the public opinion related index data, based on visual processing, obtaining public opinion related index curve information;
And outputting and displaying the public opinion correlation index curve according to the public opinion correlation index curve information, wherein the public opinion correlation index curve information comprises public opinion correlation index information, public opinion comment proportion information and public opinion attitude tendency information.
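The relevance judgment in the steps above (comparing standard data keywords against public opinion monitoring keywords) can be sketched as follows. This is a minimal illustration only: the patent text does not say how the two keyword sets are compared, so the set-intersection test, the case folding, and the names `split_by_relevance`, `docs` and `monitor_keywords` are all assumptions.

```python
def split_by_relevance(docs, monitor_keywords):
    """Split documents into public-opinion-related and merely recorded ones.

    docs: dict mapping a document id to its list of extracted keywords.
    monitor_keywords: iterable of public opinion monitoring keywords.
    ASSUMPTION: a document counts as related when it shares at least one
    keyword (case-insensitive) with the monitoring list.
    """
    monitor = {k.lower() for k in monitor_keywords}
    related, recorded = [], []
    for doc_id, keywords in docs.items():
        if monitor & {k.lower() for k in keywords}:
            related.append(doc_id)   # goes on to public opinion analysis
        else:
            recorded.append(doc_id)  # only recorded
    return related, recorded


if __name__ == "__main__":
    sample = {"a": ["Recall", "battery"], "b": ["concert"]}
    print(split_by_relevance(sample, ["recall"]))
```

Documents in the first list would proceed to public opinion analysis; those in the second are only recorded, as in the step above.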
Preferably, the determining whether the public opinion attention is too high according to the digital media data public opinion correlation index and the digital media data public opinion correlation index threshold further includes:
If the digital media data public opinion correlation index exceeds the digital media data public opinion correlation index threshold, outputting and displaying the information of overhigh public opinion attention;
acquiring digital media data public opinion comment information according to the digital media data public opinion related information, wherein the digital media data public opinion comment information comprises positive comment information, negative comment information and neutral comment information;
Acquiring digital media data public opinion comment indexes according to the digital media data public opinion comment information;
judging whether the digital media data public opinion comment index exceeds the digital media data public opinion comment threshold according to the digital media data public opinion comment index and the digital media data public opinion comment threshold, if not, recording the digital media data public opinion comment index, and if so, acquiring strategy adjustment scheme information based on public opinion analysis according to the digital media data public opinion comment index;
According to the strategy adjustment scheme information, the strategy is adjusted;
the calculation formula of the digital media data public opinion comment index is as follows:
[formula not reproduced in this text]
wherein Q is the digital media data public opinion comment index, and the remaining symbols denote, in order: an error term; the scaling factor of negative comments in the digital media data public opinion comment information; the weight of negative comments; the impact evaluation value of negative comments (in the range 0 to 100); the proportionality coefficient of neutral comments; the weight of neutral comments; the proportion of negative attitude tendency among neutral comments; and the proportion of positive attitude tendency among neutral comments.
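The two-stage threshold logic described above can be sketched as a small decision function. The index values are taken as inputs because their formulas are not reproduced in this text; the names `Thresholds` and `evaluate` and the action labels are hypothetical, and "exceeds" is assumed to mean strictly greater than.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    correlation: float  # public opinion correlation index threshold
    comment: float      # public opinion comment threshold


def evaluate(correlation_index, comment_index, thresholds):
    """Return the ordered list of actions for one monitoring cycle.

    ASSUMPTION: both indices are already computed elsewhere and
    "exceeds" means strictly greater than the threshold.
    """
    if correlation_index <= thresholds.correlation:
        # Attention is not too high: only record the index for the curve.
        return ["record_correlation_index"]
    actions = ["alert_attention_too_high"]
    if comment_index > thresholds.comment:
        actions.append("derive_strategy_adjustment")
    else:
        actions.append("record_comment_index")
    return actions


if __name__ == "__main__":
    limits = Thresholds(correlation=0.5, comment=0.6)
    print(evaluate(0.8, 0.9, limits))
```

Keeping the thresholds in one object mirrors the text, where both are set together from the actual requirements of the digital new media.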
Preferably, the acquiring digital media standard data based on data preprocessing according to the digital media data specifically includes:
Acquiring character information of the digital media data based on character recognition according to the digital media data;
removing HTML tag data and special character data according to the character information of the digital media data to obtain digital media cleaning data;
acquiring digital media word segmentation data based on JIEBA Chinese word segmentation and NLTK English word segmentation according to the digital media cleaning data;
acquiring digital media filtering data based on stop word filtering according to the digital media word segmentation data;
acquiring digital media data entity recognition information based on Stanford NER named entity recognition according to the digital media filtering data;
Acquiring word frequency information of digital media data based on word frequency statistics according to the digital media filtering data;
and acquiring digital media standard data based on data standardization and format conversion according to the digital media data entity identification information, the digital media data word frequency information and the digital media filtering data.
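The cleaning, segmentation, stop word filtering and word frequency steps above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the patented implementation: a regular-expression tokenizer stands in for JIEBA and NLTK (Chinese text is split per character here), named entity recognition is omitted, and `STOP_WORDS` is a tiny hypothetical list.

```python
import html
import re
from collections import Counter

# ASSUMPTION: a tiny illustrative stop word list; a real system would load one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}

TAG_RE = re.compile(r"<[^>]+>")
# ASSUMPTION: stand-in tokenizer; ASCII words, or single CJK characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def clean(raw):
    """Strip HTML tags, decode entities and collapse whitespace."""
    text = html.unescape(TAG_RE.sub(" ", raw))
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split text into tokens; special characters are dropped."""
    return TOKEN_RE.findall(text)


def preprocess(raw):
    """Return filtered tokens and their word frequencies."""
    tokens = tokenize(clean(raw))
    filtered = [t for t in tokens if t.lower() not in STOP_WORDS]
    freq = Counter(t.lower() for t in filtered)
    return {"tokens": filtered, "word_freq": dict(freq)}


if __name__ == "__main__":
    print(preprocess("<p>The media &amp; the data of media</p>"))
```

In a fuller version, `tokenize` would delegate to the segmenters named in the text, and an entity recognition step would run on the filtered tokens before standardization.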
Preferably, the obtaining the digital media classification data based on the data classification according to the standard data keyword information specifically includes:
acquiring digital media filtering data according to the digital media standard data;
Obtaining digital media word information according to the digital media filtering data;
According to the digital media word information, acquiring a digital media word case index:
[formula not reproduced in this text] where the symbols denote, in order: the digital media word case index of word t; the number of occurrences of word t; the number of uppercase occurrences of word t; the number of lowercase occurrences of word t; and the maximum of the uppercase and lowercase counts of word t;
Acquiring a digital media word position index according to the digital media word information:
[formula not reproduced in this text] where the symbols denote, in order: the digital media word position index of word t; the central position of all sentences containing word t in the document; and the position in the document of the i-th sentence containing word t; n is the total number of sentences containing word t;
according to the digital media word information, acquiring word frequency indexes of the digital media words:
[formula not reproduced in this text] where the symbols denote, in order: the digital media word frequency index of word t; the frequency of occurrence of word t; the mean word frequency over all words; and the standard deviation of word frequency over all words;
based on the sliding window, sliding traversal is performed on the digital media filtering data, and digital media word co-occurrence indexes are obtained:
[formula not reproduced in this text] where the symbols denote, in order: the digital media word co-occurrence index of word t; the number of different words that appear within a window of size w; the set of all words in the document; the window size; and the number of co-occurrences of word t and word k;
according to the digital media word information, acquiring a digital media word sentence frequency index:
[formula not reproduced in this text] where the symbols denote, in order: the digital media word sentence frequency index of word t; and the number of sentences containing word t;
Acquiring a digital media word key index according to the digital media word case index, the digital media word position index, the digital media word frequency index, the digital media word co-occurrence index and the digital media word sentence frequency index:
[formula not reproduced in this text] where the symbols denote, in order: the digital media word key index of word t; the weight of the digital media word case index; the weight of the digital media word frequency index; the weight of the digital media word co-occurrence index; the weight of the digital media word sentence frequency index; and the weight of the digital media word position index;
acquiring a digital media word key index threshold based on actual analysis requirements;
judging whether the word is a keyword according to the digital media word key index and the digital media word key index threshold, and if the digital media word key index is higher than the digital media word key index threshold, obtaining standard data keyword information by using the word as a digital media standard data keyword;
classifying the digital media standard data based on the digital media standard data keywords according to the standard data keyword information to obtain digital media classification data;
If the digital media standard data keywords are not unique, the digital media standard data can be classified into multiple categories.
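The sliding-window co-occurrence count, the keyword threshold test and the multi-category classification above can be sketched as follows. The index formulas are not reproduced in this text, so `key_index` assumes a plain weighted sum of the five indices; that combination, the symmetric window, and all function and parameter names are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(tokens, window):
    """For each word t, count how often each word k occurs within
    `window` tokens on either side of an occurrence of t."""
    counts = defaultdict(Counter)
    for i, t in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[t][tokens[j]] += 1
    return counts


def key_index(indices, weights):
    """ASSUMPTION: the key index is a linear weighted sum of the
    per-word indices (case, position, frequency, co-occurrence,
    sentence frequency); the source formula is not available."""
    return sum(weights[name] * value for name, value in indices.items())


def select_keywords(key_indices, threshold):
    """Keep the words whose key index is above the threshold."""
    return {word for word, score in key_indices.items() if score > threshold}


def classify(doc_keywords, category_keywords):
    """Return every category sharing a keyword with the document,
    so a document with several keywords may land in several categories."""
    return sorted(c for c, kws in category_keywords.items() if kws & doc_keywords)


if __name__ == "__main__":
    counts = cooccurrence_counts(["media", "data", "media", "opinion"], window=1)
    print(dict(counts["media"]), len(counts["media"]))
    print(classify({"media", "concert"}, {"news": {"media"}, "music": {"concert"}}))
```

Here `len(counts[t])` gives the number of different words seen in the window around t, and `counts[t][k]` gives the co-occurrence count of t and k, the two quantities named in the co-occurrence index definition.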
Preferably, the acquiring the characteristic information of the digital media data based on the data mining according to the digital media classification database specifically includes:
acquiring topic information of digital media data according to the digital media classification database;
Acquiring digital media classification data of the subject according to the subject information of the digital media data;
acquiring digital media user keyword information based on keyword extraction according to the digital media classification data;
Acquiring digital media user demand information and behavior habit information based on expert evaluation according to the digital media user keyword information;
Acquiring attitude tendency information of a digital media user based on emotion analysis according to the digital media classification data;
And acquiring the characteristic information of the digital media data according to the digital media user demand information, the behavior habit information and the digital media user attitude tendency information.
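The emotion analysis step above can be illustrated with a lexicon-based sketch that labels each comment and reports the share of positive, negative and neutral attitudes. The patent does not specify the emotion analysis technique; the word lists `POSITIVE` and `NEGATIVE` and the scoring rule are assumptions chosen only to keep the example self-contained.

```python
import re
from collections import Counter

# ASSUMPTION: tiny hypothetical lexicons; a real system would use a trained model.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}
LABELS = ("positive", "negative", "neutral")


def attitude(comment):
    """Label one comment by counting positive minus negative lexicon hits."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tendency_proportions(comments):
    """Return the share of each attitude label among the comments."""
    if not comments:
        return {label: 0.0 for label in LABELS}
    counts = Counter(attitude(c) for c in comments)
    return {label: counts[label] / len(comments) for label in LABELS}


if __name__ == "__main__":
    print(tendency_proportions(["great show, love it", "bad audio", "it aired on tuesday"]))
```

The resulting proportions are one possible form of the attitude tendency information that is combined with demand and behavior habit information in the last step.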
Preferably, the obtaining the digital media data public opinion correlation index according to the digital media data public opinion correlation information specifically includes:
acquiring digital media public opinion related data according to the digital media data public opinion related information;
According to the digital media public opinion related data, affective factor index information, influence factor index information and timeliness information are obtained;
acquiring digital media data public opinion correlation indexes according to emotion factor index information, influence factor index information and timeliness information;
The digital media data public opinion correlation index calculation formula is as follows:
[formula not reproduced in this text] where the symbols denote, in order: the digital media data public opinion correlation index; the weight of the digital media trend data; the digital media trend data; the weight of the digital media neutral data; the digital media neutral data; the digital media standard data; the emotion index coefficient of the digital media trend data; the influence coefficient of the digital media trend data; and the acquisition time of the digital media trend data.
Furthermore, an information analysis management system of a digitalized new medium based on big data is provided, which is used for implementing the information analysis management method, including:
the main control module is used for acquiring the digital media data public opinion correlation index according to the digital media data public opinion related information; acquiring the digital media data public opinion comment index according to the digital media data public opinion comment information; acquiring the digital media data public opinion correlation index threshold and the digital media data public opinion comment threshold according to the actual requirements of the digital new media; acquiring public opinion correlation index curve information through visualization processing of the public opinion correlation index data; outputting and displaying the public opinion correlation index curve according to the public opinion correlation index curve information; and acquiring strategy adjustment scheme information through public opinion analysis according to the digital media data public opinion comment index;
The information acquisition module is used for acquiring digital media data, performing data cleaning, data denoising, data word segmentation and word frequency statistics on the digital media data, and classifying the digital media standard data according to standard data keyword information to form a digital media classification database;
The keyword module is used for acquiring a digital media word keyword index according to the digital media word case index, the digital media word position index, the digital media word frequency index, the digital media word co-occurrence index and the digital media word sentence frequency index, extracting keywords according to the digital media word keyword index, judging whether the digital media standard data is public opinion related data according to the standard data keyword information and the public opinion monitoring keyword information, and transmitting the digital media standard data to the main control module;
the display module is used for outputting and displaying standard data keyword information, public opinion related index curve information and strategy adjustment scheme information.
Optionally, the main control module specifically includes:
The control unit is used for acquiring a digital media data public opinion correlation index threshold and a digital media data public opinion comment threshold according to the actual requirement of the digital new media, acquiring public opinion correlation index curve information based on visual processing according to the public opinion correlation index data, outputting and displaying a public opinion correlation index curve according to the public opinion correlation index curve information, and acquiring strategy adjustment scheme information based on public opinion analysis according to the public opinion comment index of the digital media data;
the information receiving unit is interacted with the information acquisition module and the keyword module, and is used for receiving the digital media classification database, the standard data keyword information and the public opinion monitoring keyword information and transmitting the digital media classification database, the standard data keyword information and the public opinion monitoring keyword information to the computing unit;
The computing unit is used for acquiring digital media data public opinion correlation indexes according to the digital media data public opinion correlation information and acquiring digital media data public opinion comment indexes according to the digital media data public opinion comment information.
Optionally, the information acquisition module specifically includes:
the information acquisition unit is used for acquiring digital media data and carrying out data cleaning, data denoising, data word segmentation and word frequency statistics on the digital media data;
The information storage unit is used for classifying the digital media standard data according to the standard data keyword information, obtaining digital media classified data, and performing distributed storage on the digital media classified data to form a digital media classified database.
Optionally, the keyword module specifically includes:
The keyword extraction unit is used for obtaining a digital media word keyword index according to the digital media word case index, the digital media word position index, the digital media word frequency index, the digital media word co-occurrence index and the digital media word sentence frequency index, and extracting keywords according to the digital media word keyword index;
And the public opinion monitoring keyword recognition unit is used for judging whether the digital media standard data is public opinion related data according to the standard data keyword information and the public opinion monitoring keyword information and transmitting the digital media standard data to the main control module.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides an information analysis management method and system for digital new media based on big data, which eliminates useless information in the data by preprocessing the digital media data, improves the subsequent keyword extraction efficiency and accuracy, knows the demand information, behavior habit and attitude tendency of digital media users by data mining, monitors public opinion information related to targets on social media, news websites, forums and other media by public opinion monitoring keyword information, judges whether the public opinion attention is too high by digital media data public opinion correlation indexes, timely adjusts strategies by digital media data public opinion comment indexes, and improves market feedback receiving degree.
Drawings
Fig. 1 and fig. 2 are flowcharts of an information analysis management method for a digital new media based on big data according to the present invention;
FIG. 3 is a flow chart of the digital media data preprocessing in the present invention;
FIG. 4 is a flow chart of a method for classifying data according to standard data keyword information in the present invention;
FIG. 5 is a flow chart of the invention for obtaining digital media data characteristic information based on data mining according to a digital media classification database;
fig. 6 is a block diagram of an information analysis management system for a new digitalized media based on big data according to the present invention.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art.
Referring to fig. 1-5, an information analysis management method for a digitalized new medium based on big data according to an embodiment of the present invention includes:
acquiring digital media data, wherein the digital media data comprises text data, picture data and video data;
acquiring digital media standard data based on data preprocessing according to the digital media data, wherein the data preprocessing comprises data cleaning, data denoising, data word segmentation and word frequency statistics;
specifically, the method for preprocessing the digital media data specifically comprises the following steps:
Acquiring character information of the digital media data based on character recognition according to the digital media data;
removing HTML tag data and special character data according to the character information of the digital media data to obtain digital media cleaning data;
acquiring digital media word segmentation data based on JIEBA Chinese word segmentation and NLTK English word segmentation according to the digital media cleaning data;
acquiring digital media filtering data based on stop word filtering according to the digital media word segmentation data;
acquiring digital media data entity recognition information based on Stanford NER named entity recognition according to the digital media filtering data;
Acquiring word frequency information of digital media data based on word frequency statistics according to the digital media filtering data;
and acquiring digital media standard data based on data standardization and format conversion according to the digital media data entity identification information, the digital media data word frequency information and the digital media filtering data.
In this scheme, tag data and special character data are removed from the recognized text so that no useless words are produced during later word segmentation. The digital media data is segmented with JIEBA Chinese word segmentation and NLTK English word segmentation, and stop words are removed. Stanford NER named entity recognition identifies specific entities such as person names, place names and organization names, and word frequencies are counted, which facilitates later keyword extraction and improves its efficiency.
Obtaining standard data keyword information according to the digital media standard data;
acquiring digital media classification data based on data classification according to standard data keyword information;
Specifically, classifying the digital media standard data through the standard data keyword information comprises the following steps:
acquiring digital media filtering data according to the digital media standard data;
Obtaining digital media word information according to the digital media filtering data;
According to the digital media word information, acquiring a digital media word case index:
In the formula, the quantities are, in order: the digital media word case index of word t; the number of occurrences of word t; the number of capitalized occurrences of word t; the number of lowercase occurrences of word t; and the maximum of the capitalized and lowercase counts of word t;
Acquiring a digital media word position index according to the digital media word information:
In the formula, the quantities are, in order: the digital media word position index of word t; the central position in the document of all sentences containing word t; the position in the document of the i-th sentence containing word t; and n, the total number of sentences containing word t;
according to the digital media word information, acquiring word frequency indexes of the digital media words:
In the formula, the quantities are, in order: the digital media word frequency index of word t; the frequency of occurrence of word t; the mean word frequency over all words; and the standard deviation of word frequency over all words;
based on the sliding window, sliding traversal is performed on the digital media filtering data, and digital media word co-occurrence indexes are obtained:
In the formula, the quantities are, in order: the digital media word co-occurrence index of word t; the number of different words that appear within a window of size w; the set of all words in the document; the window size w; and the number of co-occurrences of word t and word k;
according to the digital media word information, acquiring a digital media word sentence frequency index:
In the formula, the quantities are, in order: the digital media word sentence frequency index of word t; and the number of sentences containing word t;
Acquiring a digital media word key index according to the digital media word case index, the digital media word position index, the digital media word frequency index, the digital media word co-occurrence index and the digital media word sentence frequency index:
In the formula, the quantities are, in order: the digital media word key index of word t; the weight of the digital media word case index; the weight of the digital media word frequency index; the weight of the digital media word co-occurrence index; the weight of the digital media word sentence frequency index; and the weight of the digital media word position index;
acquiring a digital media word key index threshold based on actual analysis requirements;
judging whether the word is a keyword according to the digital media word key index and the digital media word key index threshold, and if the digital media word key index is higher than the digital media word key index threshold, obtaining standard data keyword information by using the word as a digital media standard data keyword;
classifying the digital media standard data based on the digital media standard data keywords according to the standard data keyword information to obtain digital media classification data;
If the digital media standard data keywords are not unique, the digital media standard data can be classified into multiple categories.
In this scheme, the digital media word key index of a word is calculated from the case of the word, the position of the sentences containing it, its word frequency, its co-occurrence with other words and its sentence frequency, and the key index is used to judge whether the word is a keyword. This improves the accuracy of keyword extraction and avoids wrongly extracted or missed keywords, which would bias later analysis of the data.
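As an illustration of the sliding-window co-occurrence step and the threshold-based keyword decision, the following sketch counts co-occurrences within a window and then combines five per-word indexes. The combination is assumed here to be a plain weighted sum; the index values, weights, threshold and all function names are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def cooccurrence_counts(tokens, window=2):
    """For each word, count how often every other word appears within
    `window` positions of it (counts are stored symmetrically)."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, t in enumerate(tokens):
        for k in tokens[i + 1 : i + 1 + window]:
            if k != t:
                counts[t][k] += 1
                counts[k][t] += 1
    return counts


def key_index(indexes, weights):
    """Assumed combination: weighted sum of the five per-word indexes."""
    return sum(weights[name] * indexes[name] for name in weights)


def extract_keywords(per_word_indexes, weights, threshold):
    """Keep the words whose key index is higher than the threshold."""
    return [
        word
        for word, indexes in per_word_indexes.items()
        if key_index(indexes, weights) > threshold
    ]


tokens = ["phone", "camera", "phone", "battery"]
neighbours = cooccurrence_counts(tokens, window=1)

# Hypothetical weights and per-word index values.
weights = {"case": 0.1, "position": 0.2, "frequency": 0.3,
           "cooccurrence": 0.2, "sentence": 0.2}
per_word = {
    "phone": {"case": 0.0, "position": 0.5, "frequency": 1.0,
              "cooccurrence": 1.0, "sentence": 1.0},
    "great": {"case": 0.2, "position": 0.2, "frequency": 0.2,
              "cooccurrence": 0.2, "sentence": 0.2},
}
keywords = extract_keywords(per_word, weights, threshold=0.5)
print(keywords)
```

Because every word above the threshold is kept, one item of standard data can yield several keywords and therefore fall into several categories.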
Acquiring a digital media classification database based on distributed storage according to the digital media classification data;
Acquiring digital media data characteristic information based on data mining according to a digital media classification database, wherein the digital media data characteristic information comprises digital media user demand information, behavior habit information and attitude tendency information;
Specifically, further classifying the digital media classification data through data mining comprises:
acquiring topic information of digital media data according to the digital media classification database;
Acquiring digital media classification data of the subject according to the subject information of the digital media data;
acquiring digital media user keyword information based on keyword extraction according to the digital media classification data;
Acquiring digital media user demand information and behavior habit information based on expert evaluation according to the digital media user keyword information;
Acquiring attitude tendency information of a digital media user based on emotion analysis according to the digital media classification data;
And acquiring the characteristic information of the digital media data according to the digital media user demand information, the behavior habit information and the digital media user attitude tendency information.
In the scheme, certain topic information is selected through a digital media classification database, digital media classification data of the topic is obtained according to the topic information of the digital media data, and digital media user demand information, behavior habit information and digital media user attitude tendency information reflected by the data are mined through keyword extraction and emotion analysis of the digital media classification data.
Acquiring the characteristic visualization information of the digital media data based on the visualization processing according to the characteristic information of the digital media data;
obtaining public opinion monitoring keyword information according to the actual development requirements and public opinion monitoring of the digital new media;
Judging whether the digital media standard data is public opinion related data or not according to the standard data keyword information and public opinion monitoring keyword information, if not, recording the digital media standard data, and if so, analyzing the digital media standard data to obtain the digital media data public opinion related information;
Acquiring digital media data public opinion correlation indexes according to the digital media data public opinion correlation information;
Specifically, calculating the digital media data public opinion correlation index by analyzing the digital media data public opinion related information comprises:
acquiring digital media public opinion related data according to the digital media data public opinion related information;
According to the digital media public opinion related data, affective factor index information, influence factor index information and timeliness information are obtained;
acquiring digital media data public opinion correlation indexes according to emotion factor index information, influence factor index information and timeliness information;
The digital media data public opinion correlation index calculation formula is as follows:
In the formula, the quantities are, in order: the digital media data public opinion correlation index; the weight of the digital media trend data; the digital media trend data; the weight of the digital media neutral data; the digital media neutral data; the digital media standard data; the emotion index coefficient of the digital media trend data; the influence coefficient of the digital media trend data; and the acquisition time of the digital media trend data.
In this scheme, the data related to public opinion on the target item in the digital media data are evaluated and the digital media data public opinion correlation index is calculated, so that excessively high public opinion attention is detected in time. Introducing timeliness information prevents outdated data from distorting the evaluation result.
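The effect of the timeliness information can be illustrated with a small sketch. It does not reproduce the formula of the method: it assumes that each trend item contributes the product of its emotion and influence factors, discounted by an exponential time decay, and that the result is normalized by the amount of standard data. The half-life, the default weights and all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TrendItem:
    emotion: float    # emotion factor, assumed to lie in [0, 1]
    influence: float  # influence factor, assumed to lie in [0, 1]
    age_days: float   # time elapsed since the item was acquired


def correlation_index(trend_items, neutral_count, total_count,
                      w_trend=0.7, w_neutral=0.3, half_life_days=7.0):
    """Assumed index: time-decayed trend score plus weighted neutral count,
    normalized by the total amount of standard data."""
    if total_count == 0:
        return 0.0
    decay = math.log(2) / half_life_days
    trend_score = sum(
        item.emotion * item.influence * math.exp(-decay * item.age_days)
        for item in trend_items
    )
    return (w_trend * trend_score + w_neutral * neutral_count) / total_count


fresh = correlation_index([TrendItem(1.0, 1.0, 0.0)], 0, 1)
stale = correlation_index([TrendItem(1.0, 1.0, 7.0)], 0, 1)
print(fresh, stale)
```

Under these assumptions an item that is one half-life old contributes half as much as a fresh one, so outdated data pull the index less.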
Acquiring a digital media data public opinion correlation index threshold and a digital media data public opinion comment threshold according to the actual demand of the digital new media;
Judging whether the public opinion attention is too high according to the digital media data public opinion correlation index and the digital media data public opinion correlation index threshold, if not, recording the digital media data public opinion correlation index to obtain public opinion correlation index data;
according to the public opinion related index data, based on visual processing, obtaining public opinion related index curve information;
According to public opinion related index curve information, outputting and displaying a public opinion related index curve, wherein the public opinion related index curve information comprises public opinion related index information, public opinion comment proportion information and public opinion attitude tendency information;
If the digital media data public opinion correlation index exceeds the digital media data public opinion correlation index threshold, outputting and displaying the information of overhigh public opinion attention;
acquiring digital media data public opinion comment information according to the digital media data public opinion related information, wherein the digital media data public opinion comment information comprises positive comment information, negative comment information and neutral comment information;
Acquiring digital media data public opinion comment indexes according to the digital media data public opinion comment information;
judging whether the digital media data public opinion comment index exceeds the digital media data public opinion comment threshold according to the digital media data public opinion comment index and the digital media data public opinion comment threshold, if not, recording the digital media data public opinion comment index, and if so, acquiring strategy adjustment scheme information based on public opinion analysis according to the digital media data public opinion comment index;
According to the strategy adjustment scheme information, the strategy is adjusted;
the calculation formula of the digital media data public opinion comment index is as follows:
wherein Q is the digital media data public opinion comment index, and the remaining quantities are, in order: an error term; the proportion coefficient of negative comments in the digital media data public opinion comment information; the weight of negative comments; the influence evaluation value of negative comments (ranging from 0 to 100); the proportion coefficient of neutral comments; the weight of neutral comments; the proportion of negative attitude tendency within the neutral comments; and the proportion of positive attitude tendency within the neutral comments.
In this scheme, the digital media data public opinion comment index is obtained from the digital media data public opinion comment information, and whether the influence of the public opinion comments is too high is judged from the comment index and the digital media data public opinion comment threshold. The relevant strategy is adjusted according to the result, so that public attention, feedback and attitude tendency are grasped in time and crises are responded to promptly.
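The two threshold decisions described above can be sketched as a single function. The index and threshold values are taken as given inputs; the function name and the action strings are hypothetical labels used only for illustration.

```python
def review_public_opinion(correlation_index, correlation_threshold,
                          comment_index, comment_threshold):
    """Return the actions implied by the two threshold comparisons."""
    actions = []

    # Public opinion attention check.
    if correlation_index > correlation_threshold:
        actions.append("display: public opinion attention too high")
    else:
        actions.append("record correlation index")

    # Public opinion comment check.
    if comment_index > comment_threshold:
        actions.append("derive strategy adjustment scheme")
    else:
        actions.append("record comment index")

    return actions


print(review_public_opinion(0.8, 0.6, 30.0, 50.0))
```

Keeping both checks in one place makes it clear that the attention warning and the strategy adjustment are triggered independently of each other.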
Referring to fig. 6, further, in combination with the above method for information analysis and management of digital new media based on big data, an information analysis and management system of digital new media based on big data is provided, including:
the main control module is used for acquiring digital media data public opinion correlation indexes according to digital media data public opinion comment information, acquiring digital media data public opinion comment indexes according to digital media data public opinion comment information, acquiring digital media data public opinion correlation index threshold values and digital media data public opinion comment threshold values according to digital new media actual requirements, acquiring public opinion correlation index curve information based on visual processing according to the public opinion correlation index data, outputting and displaying a public opinion correlation index curve according to the public opinion correlation index curve information, and acquiring strategy adjustment scheme information based on public opinion analysis according to the digital media data public opinion comment indexes;
The information acquisition module is used for acquiring digital media data, performing data cleaning, data denoising, data word segmentation and word frequency statistics on the digital media data, and classifying the digital media standard data according to standard data keyword information to form a digital media classification database;
The keyword module is used for acquiring a digital media word keyword index according to the digital media word case index, the digital media word position index, the digital media word frequency index, the digital media word co-occurrence index and the digital media word sentence frequency index, extracting keywords according to the digital media word keyword index, judging whether the digital media standard data is public opinion related data according to the standard data keyword information and the public opinion monitoring keyword information, and transmitting the digital media standard data to the main control module;
the display module is used for outputting and displaying standard data keyword information, public opinion related index curve information and strategy adjustment scheme information.
The main control module specifically comprises:
The control unit is used for acquiring a digital media data public opinion correlation index threshold and a digital media data public opinion comment threshold according to the actual requirement of the digital new media, acquiring public opinion correlation index curve information based on visual processing according to the public opinion correlation index data, outputting and displaying a public opinion correlation index curve according to the public opinion correlation index curve information, and acquiring strategy adjustment scheme information based on public opinion analysis according to the public opinion comment index of the digital media data;
the information receiving unit is interacted with the information acquisition module and the keyword module, and is used for receiving the digital media classification database, the standard data keyword information and the public opinion monitoring keyword information and transmitting the digital media classification database, the standard data keyword information and the public opinion monitoring keyword information to the computing unit;
The computing unit is used for acquiring digital media data public opinion correlation indexes according to the digital media data public opinion correlation information and acquiring digital media data public opinion comment indexes according to the digital media data public opinion comment information.
The information acquisition module specifically comprises:
the information acquisition unit is used for acquiring digital media data and carrying out data cleaning, data denoising, data word segmentation and word frequency statistics on the digital media data;
The information storage unit is used for classifying the digital media standard data according to the standard data keyword information, obtaining digital media classified data, and performing distributed storage on the digital media classified data to form a digital media classified database.
The keyword module specifically comprises:
The keyword extraction unit is used for obtaining a digital media word keyword index according to the digital media word case index, the digital media word position index, the digital media word frequency index, the digital media word co-occurrence index and the digital media word sentence frequency index, and extracting keywords according to the digital media word keyword index;
And the public opinion monitoring keyword recognition unit is used for judging whether the digital media standard data is public opinion related data according to the standard data keyword information and the public opinion monitoring keyword information and transmitting the digital media standard data to the main control module.
In summary, the invention has the following advantages. HTML tag data and special character data are removed after character recognition, which prevents useless tokens from being produced when the data are segmented later. The digital media data are segmented with JIEBA Chinese word segmentation and NLTK English word segmentation, and stop words are removed. Specific entity information such as person names, place names and organization names is recognized with Stanford NER named entity recognition, and word frequencies are counted, which facilitates later keyword extraction and improves its efficiency and accuracy. The demand information, behavior habits and attitude tendencies of digital media users are learned through data mining. Public opinion information related to the target on social media, news websites, forums and other media is monitored through the public opinion monitoring keyword information. Whether public opinion attention is too high is judged through the digital media data public opinion correlation index, and strategies are adjusted in time through the digital media data public opinion comment index, which improves responsiveness to market feedback.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made therein without departing from the spirit and scope of the invention, which is defined by the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (2)

1. The information analysis management method of the digital new media based on big data is characterized by comprising the following steps:
acquiring digital media data, wherein the digital media data comprises text data, picture data and video data;
acquiring digital media standard data based on data preprocessing according to the digital media data, wherein the data preprocessing comprises data cleaning, data denoising, data word segmentation and word frequency statistics;
Obtaining standard data keyword information according to the digital media standard data;
acquiring digital media classification data based on data classification according to standard data keyword information;
Acquiring a digital media classification database based on distributed storage according to the digital media classification data;
Acquiring digital media data characteristic information based on data mining according to a digital media classification database, wherein the digital media data characteristic information comprises digital media user demand information, behavior habit information and attitude tendency information;
acquiring the characteristic visualization information of the digital media data based on the visualization processing according to the characteristic information of the digital media data;
obtaining public opinion monitoring keyword information according to the actual development requirements and public opinion monitoring of the digital new media;
Judging whether the digital media standard data is public opinion related data or not according to the standard data keyword information and public opinion monitoring keyword information, if not, recording the digital media standard data, and if so, analyzing the digital media standard data to obtain the digital media data public opinion related information;
Acquiring digital media data public opinion correlation indexes according to the digital media data public opinion correlation information;
Acquiring a digital media data public opinion correlation index threshold and a digital media data public opinion comment threshold according to the actual demand of the digital new media;
Judging whether the public opinion attention is too high according to the digital media data public opinion correlation index and the digital media data public opinion correlation index threshold, if not, recording the digital media data public opinion correlation index to obtain public opinion correlation index data;
according to the public opinion related index data, based on visual processing, obtaining public opinion related index curve information;
According to public opinion related index curve information, outputting and displaying a public opinion related index curve, wherein the public opinion related index curve information comprises public opinion related index information, public opinion comment proportion information and public opinion attitude tendency information;
If the digital media data public opinion correlation index exceeds the digital media data public opinion correlation index threshold, outputting and displaying the information of overhigh public opinion attention;
acquiring digital media data public opinion comment information according to the digital media data public opinion related information, wherein the digital media data public opinion comment information comprises positive comment information, negative comment information and neutral comment information;
Acquiring digital media data public opinion comment indexes according to the digital media data public opinion comment information;
judging whether the digital media data public opinion comment index exceeds the digital media data public opinion comment threshold according to the digital media data public opinion comment index and the digital media data public opinion comment threshold, if not, recording the digital media data public opinion comment index, and if so, acquiring strategy adjustment scheme information based on public opinion analysis according to the digital media data public opinion comment index;
According to the strategy adjustment scheme information, the strategy is adjusted;
the calculation formula of the digital media data public opinion comment index is as follows:
wherein Q is the digital media data public opinion comment index, and the remaining quantities are, in order: an error term; the proportion coefficient of negative comments in the digital media data public opinion comment information; the weight of negative comments; the influence evaluation value of negative comments (ranging from 0 to 100); the proportion coefficient of neutral comments; the weight of neutral comments; the proportion of negative attitude tendency within the neutral comments; and the proportion of positive attitude tendency within the neutral comments;
wherein acquiring the digital media standard data based on data preprocessing according to the digital media data specifically comprises:
Acquiring character information of the digital media data based on character recognition according to the digital media data;
removing HTML tag data and special character data according to the character information of the digital media data to obtain digital media cleaning data;
acquiring digital media word segmentation data based on JIEBA Chinese word segmentation and NLTK English word segmentation according to the digital media cleaning data;
acquiring digital media filtering data based on stop-word filtering according to the digital media word segmentation data;
acquiring digital media data entity identification information based on Stanford NER named entity recognition according to the digital media filtering data;
Acquiring word frequency information of digital media data based on word frequency statistics according to the digital media filtering data;
acquiring digital media standard data based on data standardization and format conversion according to digital media data entity identification information, digital media data word frequency information and digital media filtering data;
the obtaining the digital media classification data based on the data classification according to the standard data keyword information specifically comprises the following steps:
acquiring digital media filtering data according to the digital media standard data;
Obtaining digital media word information according to the digital media filtering data;
According to the digital media word information, acquiring a digital media word case index:
In the formula, the quantities are, in order: the digital media word case index of word t; the number of occurrences of word t; the number of capitalized occurrences of word t; the number of lowercase occurrences of word t; and the maximum of the capitalized and lowercase counts of word t;
Acquiring a digital media word position index according to the digital media word information:
In the formula, the quantities are, in order: the digital media word position index of word t; the central position in the document of all sentences containing word t; the position in the document of the i-th sentence containing word t; and n, the total number of sentences containing word t;
according to the digital media word information, acquiring word frequency indexes of the digital media words:
In the formula, the quantities are, in order: the digital media word frequency index of word t; the frequency of occurrence of word t; the mean word frequency over all words; and the standard deviation of word frequency over all words;
based on the sliding window, sliding traversal is performed on the digital media filtering data, and digital media word co-occurrence indexes are obtained:
In the formula, the quantities are, in order: the digital media word co-occurrence index of word t; the number of different words that appear within a window of size w; the set of all words in the document; the window size w; and the number of co-occurrences of word t and word k;
according to the digital media word information, acquiring a digital media word sentence frequency index:
In the formula, the quantities are, in order: the digital media word sentence frequency index of word t; and the number of sentences containing word t;
Acquiring a digital media word key index according to the digital media word case index, the digital media word position index, the digital media word frequency index, the digital media word co-occurrence index and the digital media word sentence frequency index:
In the formula, the quantities are, in order: the digital media word key index of word t; the weight of the digital media word case index; the weight of the digital media word frequency index; the weight of the digital media word co-occurrence index; the weight of the digital media word sentence frequency index; and the weight of the digital media word position index;
acquiring a digital media word key index threshold based on actual analysis requirements;
judging whether the word is a keyword according to the digital media word key index and the digital media word key index threshold, and if the digital media word key index is higher than the digital media word key index threshold, obtaining standard data keyword information by using the word as a digital media standard data keyword;
classifying the digital media standard data based on the digital media standard data keywords according to the standard data keyword information to obtain digital media classification data;
If the digital media standard data keywords are not unique, the digital media standard data can be classified into multiple categories;
the method for acquiring the characteristic information of the digital media data based on the data mining according to the digital media classification database specifically comprises the following steps:
acquiring topic information of digital media data according to the digital media classification database;
Acquiring digital media classification data of the subject according to the subject information of the digital media data;
acquiring digital media user keyword information based on keyword extraction according to the digital media classification data;
Acquiring digital media user demand information and behavior habit information based on expert evaluation according to the digital media user keyword information;
Acquiring attitude tendency information of a digital media user based on emotion analysis according to the digital media classification data;
Acquiring digital media data characteristic information according to digital media user demand information, behavior habit information and digital media user attitude tendency information;
the obtaining the digital media data public opinion correlation index according to the digital media data public opinion correlation information specifically includes:
acquiring digital media public opinion related data according to the digital media data public opinion related information;
According to the digital media public opinion related data, affective factor index information, influence factor index information and timeliness information are obtained;
acquiring digital media data public opinion correlation indexes according to emotion factor index information, influence factor index information and timeliness information;
The digital media data public opinion correlation index calculation formula is as follows:
In the formula, the quantities are, in order: the digital media data public opinion correlation index; the weight of the digital media trend data; the digital media trend data; the weight of the digital media neutral data; the digital media neutral data; the digital media standard data; the emotion index coefficient of the digital media trend data; the influence coefficient of the digital media trend data; and the acquisition time of the digital media trend data.
2. An information analysis management system for digital new media based on big data, for implementing the information analysis management method as set forth in claim 1, comprising:
the main control module is used for acquiring digital media data public opinion correlation indexes according to digital media data public opinion comment information, acquiring digital media data public opinion comment indexes according to digital media data public opinion comment information, acquiring digital media data public opinion correlation index threshold values and digital media data public opinion comment threshold values according to digital new media actual requirements, acquiring public opinion correlation index curve information based on visual processing according to the public opinion correlation index data, outputting and displaying a public opinion correlation index curve according to the public opinion correlation index curve information, and acquiring strategy adjustment scheme information based on public opinion analysis according to the digital media data public opinion comment indexes;
The information acquisition module is used for acquiring digital media data, performing data cleaning, data denoising, data word segmentation and word frequency statistics on the digital media data, and classifying the digital media standard data according to standard data keyword information to form a digital media classification database;
The keyword module is used for acquiring a digital media word keyword index according to the digital media word case index, the digital media word position index, the digital media word frequency index, the digital media word co-occurrence index and the digital media word sentence frequency index, extracting keywords according to the digital media word keyword index, judging whether the digital media standard data is public opinion related data according to the standard data keyword information and the public opinion monitoring keyword information, and transmitting the digital media standard data to the main control module;
the display module is used for outputting and displaying the standard data keyword information, the public opinion correlation index curve information and the strategy adjustment scheme information;
wherein, the main control module specifically includes:
The control unit is used for acquiring a digital media data public opinion correlation index threshold and a digital media data public opinion comment threshold according to the actual requirement of the digital new media, acquiring public opinion correlation index curve information based on visual processing according to the public opinion correlation index data, outputting and displaying a public opinion correlation index curve according to the public opinion correlation index curve information, and acquiring strategy adjustment scheme information based on public opinion analysis according to the public opinion comment index of the digital media data;
the information receiving unit interacts with the information acquisition module and the keyword module, and is used for receiving the digital media classification database, the standard data keyword information and the public opinion monitoring keyword information and transmitting them to the computing unit;
The computing unit is used for acquiring digital media data public opinion correlation indexes according to the digital media data public opinion correlation information and acquiring digital media data public opinion comment indexes according to the digital media data public opinion comment information;
The information acquisition module specifically comprises:
the information acquisition unit is used for acquiring digital media data and carrying out data cleaning, data denoising, data word segmentation and word frequency statistics on the digital media data;
The information storage unit is used for classifying the digital media standard data according to the standard data keyword information to obtain digital media classified data, and performing distributed storage on the digital media classified data to form a digital media classified database;
The keyword module specifically comprises:
The keyword extraction unit is used for obtaining a digital media word keyword index according to the digital media word case index, the digital media word position index, the digital media word frequency index, the digital media word co-occurrence index and the digital media word sentence frequency index, and extracting keywords according to the digital media word keyword index;
and the public opinion monitoring keyword recognition unit is used for judging whether the digital media standard data is public opinion related data according to the standard data keyword information and the public opinion monitoring keyword information, and transmitting the digital media standard data to the main control module.
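
As an illustration only, the keyword-related steps recited in claim 2 (word segmentation and word frequency statistics, combining five per-word indices into a keyword index, extracting keywords, and checking them against public opinion monitoring keywords) might be sketched in Python as follows. The claims do not state how the five indices are weighted or how segmentation is performed, so the equal weights, the regular-expression tokenizer (a stand-in for a real word segmenter, suitable only for space-delimited text) and every function name below are assumptions made for this sketch and are not taken from the patent.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Clean the text, split it into lowercase tokens and count them.

    Assumption: a regex tokenizer stands in for the claimed word segmentation.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def keyword_index(case, position, frequency, cooccurrence, sentence_frequency,
                  weights=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """Combine the five per-word indices into a single keyword index.

    Assumption: a weighted sum with equal weights; the claims name the five
    inputs but do not give the combining rule.
    """
    parts = (case, position, frequency, cooccurrence, sentence_frequency)
    return sum(w * p for w, p in zip(weights, parts))


def extract_keywords(index_by_word, top_n=3):
    """Return the top_n words ranked by keyword index (ties broken by word)."""
    ranked = sorted(index_by_word.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def is_public_opinion_related(keywords, monitoring_keywords):
    """Flag data as public opinion related if any extracted keyword is monitored."""
    return bool(set(keywords) & set(monitoring_keywords))


if __name__ == "__main__":
    freqs = word_frequencies("Recall notice: the recall affects two models.")
    # Hypothetical per-word indices; only the frequency term uses real counts.
    scores = {
        word: keyword_index(0.0, 0.5, count / sum(freqs.values()), 0.0, 0.5)
        for word, count in freqs.items()
    }
    keywords = extract_keywords(scores, top_n=2)
    print(keywords, is_public_opinion_related(keywords, {"recall"}))
```

In this sketch the monitoring keywords are a plain set, so the relatedness check is a set intersection; a deployed system would likely need a proper segmenter for Chinese text and a tuned weighting of the five indices.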
CN202410200785.0A 2024-02-23 2024-02-23 Information analysis management method and system for digital new media based on big data Active CN118051631B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410200785.0A CN118051631B (en) 2024-02-23 2024-02-23 Information analysis management method and system for digital new media based on big data

Publications (2)

Publication Number Publication Date
CN118051631A (en) 2024-05-17
CN118051631B (en) 2024-09-27

Family

ID=91044392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410200785.0A Active CN118051631B (en) 2024-02-23 2024-02-23 Information analysis management method and system for digital new media based on big data

Country Status (1)

Country Link
CN (1) CN118051631B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751458A (en) * 2009-12-31 2010-06-23 暨南大学 Network public sentiment monitoring system and method
CN110134849A (en) * 2019-05-20 2019-08-16 瑞森网安(福建)信息科技有限公司 A kind of network public-opinion monitoring method and system

Family To Family Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765442A (en) * 2018-06-25 2021-05-07 中译语通科技股份有限公司 Network emotion fluctuation index monitoring and analyzing method and system based on news big data
CN116186422A (en) * 2022-12-12 2023-05-30 浙江大学 Disease-related public opinion analysis system based on social media and artificial intelligence

Also Published As

Publication number Publication date
CN118051631A (en) 2024-05-17

Similar Documents

Publication Title
CN110297988B (en) Hot topic detection method based on weighted LDA and improved Single-Pass clustering algorithm
US10977447B2 (en) Method and device for identifying a user interest, and computer-readable storage medium
CN110162593B (en) Search result processing and similarity model training method and device
Tang et al. Big data in forecasting research: a literature review
US9514216B2 (en) Automatic classification of segmented portions of web pages
CN112581006B (en) Public opinion information screening and enterprise subject risk level monitoring public opinion system and method
CN111310011B (en) Information pushing method and device, electronic equipment and storage medium
US7827133B2 (en) Method and arrangement for SIM algorithm automatic charset detection
US20110066650A1 (en) Query classification using implicit labels
CN111914087B (en) Public opinion analysis method
CN104820629A (en) Intelligent system and method for emergently processing public sentiment emergency
CN111694958A (en) Microblog topic clustering method based on word vector and single-pass fusion
AU2015310494A1 (en) Sentiment rating system and method
TW201839628A (en) Method, system and apparatus for discovering and tracking hot topics from network media data streams
CN107544988B (en) Method and device for acquiring public opinion data
CN103914478A (en) Webpage training method and system and webpage prediction method and system
CN109086355B (en) Hot-spot association relation analysis method and system based on news subject term
Sun et al. Efficient event detection in social media data streams
CN116362811A (en) Automatic advertisement delivery management system based on big data
CN110019763B (en) Text filtering method, system, equipment and computer readable storage medium
CN113641788B (en) Unsupervised long and short film evaluation fine granularity viewpoint mining method
CN118051631B (en) Information analysis management method and system for digital new media based on big data
CN117150116A (en) Method for constructing personalized message recommendation system facing user interests
CN116304128A (en) Multimedia information recommendation system based on big data
CN115510269A (en) Video recommendation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant