Disclosure of Invention
The present invention is directed to a computer data processing system based on big data analysis, so as to solve the problems set forth in the background art.
In order to solve the above technical problems, the invention provides the following technical scheme: a computer data processing system based on big data analysis comprises a storage space analysis module, a target data selection module and a state information analysis module. The storage space analysis module acquires the video download data stored on the current computer as analysis data. If the storage space occupied by the analysis data is larger than a storage space threshold, the target data selection module analyzes the analysis data: if the interval between the time a given video download data in the analysis data was last triggered and the current time is longer than a time length threshold, that video download data is target data. The state information analysis module analyzes the state information of the target data and judges whether the target data is to be deleted.
Further, the state information analysis module comprises a sorting module, a demarcation data selection module, a set dividing module, a set analysis module and a deletion control module. The sorting module acquires the download initiation time of each video download data historically downloaded by the computer and sorts the video download data from earliest to latest download initiation time to obtain a sort order. The demarcation data selection module selects demarcation data in the sort order: if the interval between the download initiation times of two adjacent video download data in the sort order is larger than an interval threshold, the earlier of the two in the sort order is demarcation data. The set dividing module divides the sort order into a plurality of download sets according to the positions of the demarcation data, wherein one download set includes the video download data located between two adjacent demarcation data in the sort order together with the later of those two demarcation data. The set analysis module sets the download set containing the target data as the center set, wherein a download set located before the center set in the sort order is a reference set and a download set located after the center set in the sort order is an influence set, analyzes the center set, the reference sets and the influence sets, and judges the type of the target data. When the target data is first data, inquiry information asking whether to delete the target data is pushed to the user; when the target data is second data, the target data is deleted directly.
Further, the set analysis module comprises an influence threshold comparison module, a preferred data selection module, a reference index calculation module, a first index comparison module and an effective analysis module. The influence threshold comparison module counts the number of influence sets; if the number of influence sets is smaller than an influence threshold, the target data is first data. Otherwise, the preferred data selection module acquires how each video download data has been effectively triggered, wherein a video download data is effectively triggered each time the user opens and views it; if the interval between the time a video download data was last effectively triggered and the current time is less than or equal to a duration threshold, that video download data is preferred data. The reference index calculation module calculates the reference index of the center set, wherein m is the number of reference sets, C_i is the number of video download data in the i-th reference set, F_i is the number of video download data in the i-th reference set that are preferred data, and H_i is the number of video download data in the i-th reference set. The first index calculation module calculates a first index P=u/v of the center set, wherein u is the number of preferred data in the center set and v is the number of video download data in the center set. The first index comparison module compares the first index of the center set with the reference index; if the first index of the center set is smaller than the reference index, the target data is first data; otherwise, the effective analysis module analyzes how the videos of the reference set have been effectively triggered.
Further, the effective analysis module comprises a passive data judgment module, a passive index calculation module and an attention passive index calculation module. The passive data judgment module judges as follows: if, within the latest preset time period, a preferred data in the reference set is effectively triggered on some occasion after a video download data in the influence set has been effectively triggered, that preferred data is data of interest, and the data of interest as effectively triggered on that occasion is passive data. The passive index calculation module takes the number of video download data in the influence set that were effectively triggered consecutively before the data of interest was effectively triggered, on an occasion when it is passive data, as the influence factor of that occasion, and calculates the passive index of the data of interest, wherein e is the number of times the data of interest is passive data in the latest preset time period, N is the number of times the data of interest is effectively triggered in the latest preset time period, and w is the average of the influence factors over the occasions on which the data of interest is passive data in the latest preset time period. The attention passive index calculation module calculates the attention passive index of the preferred data of the reference set, wherein S is the number of data of interest among the preferred data, T_x is the average of the passive indexes of all data of interest, and R is the number of preferred data. If the attention passive index of the preferred data of the reference set is smaller than a passive threshold, the target data is first data; otherwise, the target data is second data.
Further, the data processing system adopts a data processing method, and the data processing method comprises the following steps:
acquiring video download data stored by a current computer as analysis data, if the storage space occupied by the analysis data is larger than a storage space threshold value,
if the time interval between the time when a certain video download data is triggered last time and the current time in the analysis data is longer than the time length threshold value, the video download data is target data,
and analyzing the state information of the target data, and judging whether the target data is to be deleted.
Further, the analyzing the status information of the target data includes:
acquiring the download initiation time of each video download data historically downloaded by the computer, and sorting the video download data from earliest to latest download initiation time to obtain the sort order,
in the sort order, if the time interval between download initiation times of adjacent two video download data is greater than the interval threshold, the one of the two video download data that is located before in the sort order is the demarcation data,
dividing a plurality of downloading sets according to the positions of the demarcation data in the sorting order, wherein the video downloading data included in one downloading set is the video downloading data between two adjacent demarcation data in the sorting order and the demarcation data of the two adjacent demarcation data positioned at the rear in the sorting order,
setting the download set of the target data as the center set, wherein the download set in front of the center set in the sorting order is the reference set, the download set in back of the center set in the sorting order is the influencing set,
analyzing the center set, the reference set and the influence set, judging the type of the target data,
if the target data is the first data, the inquiry information of whether to delete the target data is pushed to the user;
and if the target data is the second data, directly deleting the target data.
Further, the analyzing the center set, the reference set, and the influence set includes:
if the number of influence sets is less than the influence threshold, the target data is first data,
otherwise, acquiring how each video download data has been effectively triggered, wherein a video download data is effectively triggered each time the user opens and views it; if the interval between the time a video download data was last effectively triggered and the current time is less than or equal to the duration threshold, that video download data is preferred data,
calculating a reference index of the center set, wherein m is the number of reference sets, C_i is the number of video download data in the i-th reference set, F_i is the number of video download data in the i-th reference set that are preferred data, and H_i is the number of video download data in the i-th reference set,
calculating a first index P=u/v of the center set, wherein u is the number of preferred data in the center set, and v is the number of video download data in the center set;
if the first index of the center set is less than the reference index, then the target data is the first data,
otherwise, analyzing the condition that the video of the reference set is effectively triggered.
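The decision cascade above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the string labels, and the `last_viewed` mapping are hypothetical, and the reference index is passed in as a plain number because its formula is not reproduced in this text.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def first_index(center_set: List[str],
                last_viewed: Dict[str, datetime],
                now: datetime,
                duration_threshold: timedelta) -> float:
    """First index P = u / v of the center set.

    u: items of the center set that are preferred data, i.e. last opened
       and viewed no longer ago than duration_threshold.
    v: total number of items in the center set.
    Items absent from last_viewed are assumed never to have been viewed.
    """
    v = len(center_set)
    u = sum(
        1 for name in center_set
        if name in last_viewed and now - last_viewed[name] <= duration_threshold
    )
    return u / v if v else 0.0


def classify_by_sets(influence_set_count: int,
                     influence_threshold: int,
                     p: float,
                     reference_index: float) -> str:
    """Apply the two comparisons described in the text, in order."""
    if influence_set_count < influence_threshold:
        return "first"  # few newer downloads: ask the user before deleting
    if p < reference_index:
        return "first"
    return "analyze_reference_set"  # fall through to the passive analysis
```

For example, a center set of three videos of which one was viewed within the threshold gives P = 1/3; with a reference index of 0.5 the target data would be classified as first data.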
Further, the analyzing the effectively triggered condition of the video of the reference set includes:
if, within the latest preset time period, a preferred data in the reference set is effectively triggered on some occasion after a video download data in the influence set has been effectively triggered, that preferred data is data of interest, and the data of interest as effectively triggered on that occasion is passive data,
taking the number of video download data in the influence set that were effectively triggered consecutively before a data of interest was effectively triggered, on an occasion when it is passive data, as the influence factor of that occasion, and calculating the passive index of the data of interest, wherein e is the number of times the data of interest is passive data in the latest preset time period, N is the number of times the data of interest is effectively triggered in the latest preset time period, and w is the average of the influence factors over the occasions on which the data of interest is passive data in the latest preset time period,
calculating the attention passive index of the preferred data of the reference set, wherein S is the number of data of interest among the preferred data, T_x is the average of the passive indexes of all data of interest, and R is the number of preferred data;
the target data is the first data if the passive index of interest of the preferred data of the reference set is less than the passive threshold, otherwise the target data is the second data.
Further, the last triggered time of the certain video download data includes:
if the video download data is viewed by the user, then the time the video download data was last triggered is the time the video download data was last viewed,
otherwise, the last time the video download data was triggered is the time the video download data was downloaded.
Compared with the prior art, the invention has the following beneficial effects: the invention judges the probability that the subsequent user looks at the video again by analyzing the video which is not watched for a long time, and directly deletes the video under the condition of lower probability, thereby reducing the occupation of idle video data to the storage space of the computer, ensuring the normal operation of the computer and improving the operation efficiency of the computer.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the present invention provides the following technical solution: a computer data processing system based on big data analysis comprises a storage space analysis module, a target data selection module and a state information analysis module. The storage space analysis module acquires the video download data stored on the current computer as analysis data. If the storage space occupied by the analysis data is larger than a storage space threshold, the target data selection module analyzes the analysis data: if the interval between the time a given video download data in the analysis data was last triggered and the current time is longer than a time length threshold, that video download data is target data. The state information analysis module analyzes the state information of the target data and judges whether the target data is to be deleted.
The state information analysis module comprises a sorting module, a demarcation data selection module, a set dividing module, a set analysis module and a deletion control module. The sorting module acquires the download initiation time of each video download data historically downloaded by the computer and sorts the video download data from earliest to latest download initiation time to obtain a sort order. The demarcation data selection module selects demarcation data in the sort order: if the interval between the download initiation times of two adjacent video download data in the sort order is larger than an interval threshold, the earlier of the two in the sort order is demarcation data. The set dividing module divides the sort order into a plurality of download sets according to the positions of the demarcation data, wherein one download set includes the video download data located between two adjacent demarcation data in the sort order together with the later of those two demarcation data. The set analysis module sets the download set containing the target data as the center set, wherein a download set located before the center set in the sort order is a reference set and a download set located after the center set in the sort order is an influence set, analyzes the center set, the reference sets and the influence sets, and judges the type of the target data. When the target data is first data, inquiry information asking whether to delete the target data is pushed to the user; when the target data is second data, the target data is deleted directly.
The set analysis module comprises an influence threshold comparison module, a preferred data selection module, a reference index calculation module, a first index comparison module and an effective analysis module. The influence threshold comparison module counts the number of influence sets; if the number of influence sets is smaller than an influence threshold, the target data is first data. Otherwise, the preferred data selection module acquires how each video download data has been effectively triggered, wherein a video download data is effectively triggered each time the user opens and views it; if the interval between the time a video download data was last effectively triggered and the current time is less than or equal to a duration threshold, that video download data is preferred data. The reference index calculation module calculates the reference index of the center set, wherein m is the number of reference sets, C_i is the number of video download data in the i-th reference set, F_i is the number of video download data in the i-th reference set that are preferred data, and H_i is the number of video download data in the i-th reference set. The first index calculation module calculates a first index P=u/v of the center set, wherein u is the number of preferred data in the center set and v is the number of video download data in the center set. The first index comparison module compares the first index of the center set with the reference index; if the first index of the center set is smaller than the reference index, the target data is first data; otherwise, the effective analysis module analyzes how the videos of the reference set have been effectively triggered.
The effective analysis module comprises a passive data judgment module, a passive index calculation module and an attention passive index calculation module. The passive data judgment module judges as follows: if, within the latest preset time period, a preferred data in the reference set is effectively triggered on some occasion after a video download data in the influence set has been effectively triggered, that preferred data is data of interest, and the data of interest as effectively triggered on that occasion is passive data. The passive index calculation module takes the number of video download data in the influence set that were effectively triggered consecutively before the data of interest was effectively triggered, on an occasion when it is passive data, as the influence factor of that occasion, and calculates the passive index of the data of interest, wherein e is the number of times the data of interest is passive data in the latest preset time period, N is the number of times the data of interest is effectively triggered in the latest preset time period, and w is the average of the influence factors over the occasions on which the data of interest is passive data in the latest preset time period. The attention passive index calculation module calculates the attention passive index of the preferred data of the reference set, wherein S is the number of data of interest among the preferred data, T_x is the average of the passive indexes of all data of interest, and R is the number of preferred data. If the attention passive index of the preferred data of the reference set is smaller than a passive threshold, the target data is first data; otherwise, the target data is second data.
The data processing system adopts a data processing method, and the data processing method comprises the following steps:
acquiring video download data stored by a current computer as analysis data, when the storage space occupied by the analysis data is larger than a storage space threshold value,
if the interval between the time a given video download data in the analysis data was last triggered and the current time is longer than a time length threshold, that video download data is target data, wherein the time the video download data was last triggered is determined as follows: if the video download data has been viewed by the user, it is the time the user last viewed it; otherwise, it is the time the video download data was downloaded; if a video download data has never been opened for viewing since it was downloaded, or was last opened for viewing long ago, the user most likely no longer needs it;
and analyzing the state information of the target data, and judging whether the target data is to be deleted.
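The selection of target data described in these steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the `VideoDownload` record, its field names, and the function `select_target_data` are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class VideoDownload:
    """Hypothetical record for one piece of video download data."""
    name: str
    size_bytes: int
    downloaded_at: datetime
    last_viewed_at: Optional[datetime] = None  # None: never viewed

    def last_triggered(self) -> datetime:
        # Viewed at least once: the last viewing time; otherwise the
        # download time, as stated in the method.
        if self.last_viewed_at is not None:
            return self.last_viewed_at
        return self.downloaded_at


def select_target_data(videos: List[VideoDownload],
                       now: datetime,
                       storage_threshold_bytes: int,
                       time_length_threshold: timedelta) -> List[VideoDownload]:
    """Return the target data, or an empty list if storage is not exceeded."""
    occupied = sum(v.size_bytes for v in videos)
    if occupied <= storage_threshold_bytes:
        return []  # analysis only starts above the storage space threshold
    return [v for v in videos
            if now - v.last_triggered() > time_length_threshold]
```

The returned items are only candidates; whether each one is deleted directly or confirmed with the user is decided by the later state information analysis.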
The state information of the analysis target data includes:
acquiring the download initiation time of each video download data historically downloaded by the computer, and sorting the video download data from earliest to latest download initiation time to obtain the sort order,
in the sort order, if the time interval between download initiation times of adjacent two video download data is greater than the interval threshold, the one of the two video download data that is located before in the sort order is the demarcation data, such as the sort order: video 1, video 2, video 3, video 4, the time interval between video 1, video 2 download initiation time is smaller than the interval threshold, the time interval between video 2, video 3 download initiation time is smaller than the interval threshold, the time interval between video 3, video 4 download initiation time is greater than the interval threshold, then video 3 is demarcation data;
dividing a plurality of downloading sets according to the positions of the demarcation data in the sorting order, wherein the video downloading data included in one downloading set is the video downloading data between two adjacent demarcation data in the sorting order and the demarcation data of the two adjacent demarcation data positioned at the rear in the sorting order,
setting a downloading set in which target data is located as a central set, wherein the downloading set in front of the central set in the sorting order is used as a reference set, and the downloading set in back of the central set in the sorting order is used as an influencing set, for example, the sorting order is as follows: video 1, video 2, video 3, video 4, video 5, video 6, video 7, video 8, video 9, the target data is video 6, the demarcation data is video 3, video 6,
video 1, video 2, video 3 are a download set, video 4, video 5, video 6 are a download set, video 7, video 8, video 9 are a download set,
then video 1, video 2, video 3 are reference sets, video 4, video 5, video 6 are center sets, video 7, video 8, video 9 are influence sets,
analyzing the center set, the reference set and the influence set, judging the type of the target data,
if the target data is the first data, the probability that the target data is used by the user in the later period is high, so that inquiry information about whether to delete the target data is pushed to the user;
if the target data is the second data, the probability that the target data is used later by the user is small, and the target data is directly deleted.
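The sorting, demarcation and set division steps above can be sketched in Python. The function names and the `(name, download initiation time)` tuples are hypothetical. Following the nine-video example, items after the last demarcation data are assumed to form a download set of their own.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def partition_download_sets(videos: List[Tuple[str, datetime]],
                            interval_threshold: timedelta) -> List[List[str]]:
    """Split downloads into download sets at each demarcation data.

    An item is demarcation data when the next item in the sort order was
    started more than interval_threshold later; it closes its own set.
    """
    ordered = sorted(videos, key=lambda item: item[1])  # earliest first
    sets: List[List[str]] = []
    current: List[str] = []
    for idx, (name, started) in enumerate(ordered):
        current.append(name)
        has_next = idx + 1 < len(ordered)
        if has_next and ordered[idx + 1][1] - started > interval_threshold:
            sets.append(current)  # `name` is demarcation data
            current = []
    if current:
        sets.append(current)  # assumption: trailing items form a set
    return sets


def assign_roles(sets: List[List[str]], target: str):
    """Return (reference sets, center set, influence sets) for the target."""
    center_idx = next(i for i, s in enumerate(sets) if target in s)
    return sets[:center_idx], sets[center_idx], sets[center_idx + 1:]
```

With nine videos downloaded in three bursts and video 6 as the target data, this yields videos 1 to 3 as the reference set, videos 4 to 6 as the center set, and videos 7 to 9 as the influence set, matching the example above.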
The analyzing the center set, the reference set, and the influence set includes:
if the number of influence sets is less than the influence threshold, little video download data has been downloaded since the target data, so the user may still remember the target data and may still use it, and the target data is therefore first data,
otherwise, acquiring how each video download data has been effectively triggered, wherein a video download data is effectively triggered each time the user opens and views it; if the interval between the time a video download data was last effectively triggered and the current time is less than or equal to the duration threshold, that video download data is preferred data,
calculating a reference index of the center set, wherein m is the number of reference sets, C_i is the number of video download data in the i-th reference set, F_i is the number of video download data in the i-th reference set that are preferred data, and H_i is the number of video download data in the i-th reference set,
calculating a first index P=u/v of the center set, wherein u is the number of preferred data in the center set, and v is the number of video download data in the center set;
if the first index of the center set is smaller than the reference index, the target data is first data. A first index smaller than the reference index means that, even though the computer has stored newer video download data, the user still watches previously downloaded data, so the probability that the user will later watch the video download data of the center set is relatively high. The likelihood that the user will watch the target data is thus judged from how the user has watched earlier video download data: if the user often watches earlier downloads, the user is more likely to watch the target data, so the user needs to be asked whether to delete it, which prevents erroneous deletion. The more data a reference set contains, the more heavily it counts in the analysis, so each reference set is weighted accordingly, which makes the reference index more reasonable and the judgment more accurate. In practice, a threshold may be set according to the reference index, and if the first index of the center set is smaller than this threshold, the target data is first data;
otherwise, analyzing the effectively triggered condition of the video of the reference set; the analyzing the video of the reference set by the effective triggering condition comprises the following steps:
if a certain preferred data in the reference set is effectively triggered a certain time within the latest preset time period, after a certain video download data in the influence set is effectively triggered, the certain preferred data is concerned data, the concerned data effectively triggered a certain time is passive data, such as video 1, video 2 and video 3 are reference sets, video 7, video 8 and video 9 are influence sets, if a certain time within the latest preset time period is that video 2 is seen after video 7 and video 9 are seen, video 2 is concerned data, and the concerned data is passive data, if no video in the influence set is seen before video 2 is seen within the latest preset time period, then the concerned data is not passive data;
taking the number of video download data in the influence set that were effectively triggered consecutively before a data of interest was effectively triggered, on an occasion when it is passive data, as the influence factor of that occasion, and calculating the passive index of the data of interest,
wherein e is the number of times the data of interest is passive data in the latest preset time period, N is the total number of times the data of interest is effectively triggered in the latest preset time period, and w is the average of the influence factors over the occasions on which the data of interest is passive data in the latest preset time period. For example, if video 2 is watched 3 times in the latest preset time period and on two of those occasions it is watched after videos of the influence set, then e=2 and N=3. A smaller e/N means the data of interest is actively watched relatively more often, i.e. the user's initiative in watching it is stronger. A larger w means the user had to watch several videos of the influence set before thinking of a video in the reference set, whereas a smaller w means the user readily thought of watching the video in the reference set. Therefore, the smaller the passive index, the stronger the user's initiative in watching videos of the reference set and the higher the probability of watching;
calculating the attention passive index of the preferred data of the reference set, wherein S is the number of data of interest among the preferred data, T_x is the average of the passive indexes of all data of interest, and R is the number of preferred data; the smaller this index, the larger the share of the preferred data that the user watches actively, and hence the greater the probability that the user will later actively view the target data;
if the attention passive index of the preferred data of the reference set is smaller than the passive threshold, the target data is first data, because a smaller attention passive index means a higher probability that the user actively views earlier videos; otherwise, the target data is second data.
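The quantities e, N and w that feed the passive index can be gathered from a viewing log, as in the following hedged Python sketch. The function name, the flat list of viewed names, and the two set arguments are hypothetical. The sketch assumes that a viewing is passive only when it is immediately preceded by at least one influence-set viewing, and it stops at e, N and w because the passive index formula itself is not reproduced in this text.

```python
from typing import Dict, List, Set, Tuple


def passive_statistics(view_log: List[str],
                       reference_preferred: Set[str],
                       influence_videos: Set[str]) -> Dict[str, Tuple[int, int, float]]:
    """Return {video: (e, N, w)} for preferred data of the reference set.

    view_log lists viewed video names in time order, restricted to the
    latest preset time period.
    """
    counts: Dict[str, Dict[str, list]] = {}
    run = 0  # consecutive influence-set viewings just before this one
    for name in view_log:
        if name in influence_videos:
            run += 1
            continue
        if name in reference_preferred:
            entry = counts.setdefault(name, {"views": [0], "factors": []})
            entry["views"][0] += 1
            if run > 0:
                entry["factors"].append(run)  # passive occasion
        run = 0  # any non-influence viewing breaks the run
    result: Dict[str, Tuple[int, int, float]] = {}
    for name, entry in counts.items():
        e = len(entry["factors"])
        n = entry["views"][0]
        w = sum(entry["factors"]) / e if e else 0.0
        result[name] = (e, n, w)
    return result
```

For the log video 7, video 9, video 2, video 2, video 8, video 2, video 2 is viewed three times, twice right after influence-set videos, so e=2 and N=3 as in the example above, with influence factors 2 and 1 and therefore w=1.5.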
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that the foregoing describes only preferred embodiments of the present invention and does not limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or replace some of their technical features with equivalents. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.