
CN104657444B - Microblogging homepage data auto recommending method - Google Patents

Microblogging homepage data auto recommending method

Info

Publication number
CN104657444B
Authority
CN
China
Prior art keywords
microblogging
picture
blog article
size
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510059763.8A
Other languages
Chinese (zh)
Other versions
CN104657444A (en)
Inventor
尹柳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongsou Cloud Business Network Technology Co ltd
Original Assignee
Beijing Zhongsou Cloud Business Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongsou Cloud Business Network Technology Co Ltd
Priority to CN201510059763.8A
Publication of CN104657444A
Application granted
Publication of CN104657444B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention relates to a method for automatically recommending microblog homepage data. The method comprises: (1) filtering a microblog list out of a massive volume of microblogs; (2) extracting microblog topic sentences, where a topic sentence of the corresponding length is extracted from the blog post according to the picture size; and (3) selecting the picture closest to the target picture size and cropping it automatically. The invention automatically recommends the newest and hottest post pictures and summaries to the homepage to meet user needs. Filling the homepage with data through automatic statistical screening improves the freshness, breadth, and update cycle of the data and saves manpower and cost. According to manual inspection, the quality of picture screening and cropping reaches 99.9%, and the accuracy of the recommended microblog summaries exceeds 98%.

Description

Microblogging homepage data auto recommending method
Technical field
The present invention relates to a recommendation method, and in particular to a method for automatically recommending microblog homepage data.
Background art
A microblog is a network service that has emerged in recent years. It is a platform for sharing, spreading, and obtaining information based on user relationships. Users can post text through web, mobile phone, and various intelligent networked clients and share it instantly. Microblogs are simple and convenient to use, support open multi-platform access, and update and spread information quickly. They attracted hundreds of millions of users worldwide in just five years; by the first half of 2011, the number of microblog users in China had reached 195 million. Compared with traditional social networks, microblogs have stronger information dissemination and member organization capabilities. This unique advantage has quickly made them one of the main social media today and a very important source and channel of information, and they play a key role in more and more social events.
Vertical services of all kinds that integrate microblog content have sprung up like mushrooms after rain. The quality of a homepage depends on the quality of the homepage data. A good homepage raises the quality of the whole service, directly shows the content orientation of the vertical service, guides and stimulates user interest, and improves the page click-through rate, so a good homepage is essential. Current homepage data recommendation relies mainly on manual recommendation: the newest and hottest data are found by manual reading, and pictures and text that fit the homepage design are selected or made by hand.
The drawbacks of manual recommendation are high cost, poor timeliness, slow updates, and a narrow range of content. When the newest and hottest data are found manually, the number of people invested and the breadth and speed of their reading determine the speed and quality of discovery. Obtaining newer and better homepage data and shortening the update cycle therefore requires a large amount of manpower, which greatly increases cost.
Summary of the invention
In view of the deficiencies of the prior art, the present invention proposes a method for automatically recommending microblog homepage data. Based on microblog characteristics and user needs, it analyzes statistical data and automatically recommends, in rotation, pictures and microblog summaries for different homepage sizes and different channels, saving manpower and maintenance cost.
The object of the present invention is achieved by the following technical solution:
A method for automatically recommending microblog homepage data, the improvement being that the method comprises:
(1) filtering a microblog list out of a massive volume of microblogs;
(2) extracting microblog topic sentences, where a topic sentence of the corresponding length is extracted from the blog post according to the picture size;
(3) selecting the picture closest to the target picture size and cropping it automatically.
Preferably, step (1) comprises: according to the configuration template, and according to the granularity and outer diameter of the data volume, reading the microblog data with pictures of each channel from the database to obtain a data set for each channel; sorting the data set by microblog publish time and repost count; and taking the newest and hottest top N entries to obtain the microblog list TopN of each channel.
Further, each microblog is stored in one node, whose content includes the blog post, the picture, the post publish time, and the post repost count.
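Step (1) can be illustrated with a minimal Python sketch. The `Node` class, the function names, and the ordering rule are assumptions made for illustration: the text only says that the data set is sorted by publish time and repost count, so the sketch reads "newest and hottest" as repost count descending with ties broken by the newer publish time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Node:
    """One microblog, with the fields the text lists for a node (names are hypothetical)."""
    text: str            # blog post text
    picture: str         # picture URL or path, empty if the post has no picture
    published: datetime  # post publish time
    reposts: int         # post repost count


def top_n(posts: List[Node], n: int) -> List[Node]:
    """Return the top n posts that carry a picture.

    Assumption: repost count descending, ties broken by newer publish time.
    The source does not state how the two criteria are combined.
    """
    with_picture = [p for p in posts if p.picture]
    ranked = sorted(with_picture, key=lambda p: (p.reposts, p.published), reverse=True)
    return ranked[:n]


def top_n_per_channel(channels: Dict[str, List[Node]], n: int) -> Dict[str, List[Node]]:
    """Build the TopN list for every channel."""
    return {name: top_n(posts, n) for name, posts in channels.items()}
```

A different weighting, for example a time-decayed repost score, would fit the same interface; only the sort key would change.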
Preferably, step (2) comprises looping through the microblog list in order, taking the blog post out of each node, and extracting the topic sentence of the post.
Preferably, step (2) comprises:
(2.1) preprocessing the blog post;
(2.2) splitting the post into sentences, ranking the sentences according to the post characteristics of the different channels, and choosing the top-ranked sentence, denoted s;
(2.3) calculating the sentence length, denoted len; if len > wordi, truncating s, where wordi is the length of theme (topic sentence) i;
(2.4) judging whether the resulting topic sentence is meaningful;
(2.5) choosing the next node and repeating steps (2.1)-(2.4);
(2.6) ending.
Further, step (2.3) comprises truncating at punctuation marks, with the following priority order of punctuation marks:
(a)“。”
(b)“!”、“”
(c)“;”
(d)“:”
(e)“,”
The integrity of symbols that occur in pairs is ensured; if only half of a paired symbol appears, it is clipped.
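The priority-based truncation of step (2.3) might look like the following Python sketch. Several details are assumptions rather than statements from the text: the second mark of class (b) is blank in the source and is taken here to be a question mark, both full-width and ASCII forms of each mark are accepted, the cut is made just before the chosen mark, and the set of paired symbols in `PAIRS` is illustrative.

```python
# Punctuation classes in priority order (a)-(e); each string lists the
# characters of one class. "?" in class (b) is an assumption.
PRIORITY = ["。", "!!??", ";;", "::", ",,"]

# Paired symbols whose integrity is preserved (illustrative selection).
PAIRS = {"(": ")", "(": ")", "《": "》", "“": "”"}


def strip_unpaired(text: str) -> str:
    """Remove any opening or closing symbol that has no partner in text."""
    closers = {close: open_ for open_, close in PAIRS.items()}
    keep = [True] * len(text)
    stack = []  # (opening symbol, index) for openers still waiting for a partner
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((ch, i))
        elif ch in closers:
            if stack and stack[-1][0] == closers[ch]:
                stack.pop()
            else:
                keep[i] = False      # closer without a matching opener
    for _, i in stack:
        keep[i] = False              # opener that was never closed
    return "".join(ch for ch, k in zip(text, keep) if k)


def truncate(sentence: str, max_len: int) -> str:
    """Shorten sentence to at most max_len characters, cutting at punctuation."""
    if len(sentence) <= max_len:
        return strip_unpaired(sentence)
    window = sentence[:max_len]
    for marks in PRIORITY:
        # Last occurrence of any mark of this class inside the window.
        cut = max(window.rfind(m) for m in marks)
        if cut > 0:
            return strip_unpaired(window[:cut])
    return strip_unpaired(window)    # no punctuation found: hard cut
```

For example, `truncate("今天天气很好,我们去公园玩!然后回家", 10)` cuts at the comma, because the exclamation mark lies outside the first ten characters.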
Further, in step (2.4), whether the topic sentence is meaningful is judged by a character-count check, a Chinese/English check, and a modal-particle check; a meaningless sentence is discarded.
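The text names the three checks but gives no thresholds, so the Python sketch below is only one possible reading. The minimum length, the "at least half Chinese characters" rule, and the particle list are all hypothetical values chosen for illustration.

```python
import re

MIN_CHARS = 4                               # hypothetical minimum length
MODAL_PARTICLES = set("啊呀吧呢嘛哦哈嗯")    # hypothetical particle list
CJK = re.compile(r"[\u4e00-\u9fff]")        # common Chinese characters


def is_meaningful(sentence: str) -> bool:
    """Apply the three checks in turn; any failure marks the sentence meaningless."""
    text = sentence.strip()
    # Character-count check: very short sentences carry too little content.
    if len(text) < MIN_CHARS:
        return False
    # Chinese/English check (assumed rule): at least half the characters are Chinese.
    chinese = len(CJK.findall(text))
    if chinese * 2 < len(text):
        return False
    # Modal-particle check (assumed rule): reject text that is half or more particles.
    particles = sum(ch in MODAL_PARTICLES for ch in text)
    if particles * 2 >= len(text):
        return False
    return True
```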
Preferably, step (3) comprises: from the data set obtained in step (2), taking the pictures out of a node and putting them into the automatic screener; if a picture meets the requirements, it is cropped automatically to the size in the template; otherwise the next picture is taken and screening continues.
Preferably, step (3) comprises:
(3.1) calculating the size of the picture, denoted size;
(3.2) judging whether the number of pictures matched to template picture i has reached the maximum number maxNumi; if not, proceeding to step (3.3); if so, moving on to the next template picture and repeating step (3.2); if the maximum numbers of all template pictures have been reached, jumping to step (3.6);
(3.3) calculating the matching degree between size and the size of template picture i, denoted d;
(3.4) judging whether the matching degree d meets the requirement: if T1 < d < T2, where T1 and T2 are thresholds, performing automatic cropping, adding 1 to the number of pictures matched to template picture i, and jumping to step (3.6); otherwise the picture does not meet the requirement for this template, and steps (3.2) and (3.3) are repeated until the picture has been compared with every kind of picture in the template; if it still does not meet the requirement, continuing with step (3.5);
(3.5) taking the next picture and performing steps (3.1) to (3.4);
(3.6) ending.
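The control flow of steps (3.1)-(3.6) can be sketched in Python as follows. The text does not define the matching degree d or the values of T1 and T2; the sketch assumes d is the ratio of the picture's aspect ratio to the template's aspect ratio and uses 0.8 and 1.25 as placeholder thresholds. The `Template` class and function names are hypothetical, and all dimensions are assumed to be positive.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


@dataclass
class Template:
    width: int
    height: int
    max_num: int   # maxNumi: maximum number of pictures for this template
    used: int = 0  # pictures already matched to this template


def match_degree(size: Size, template: Template) -> float:
    """Assumed definition of d: picture aspect ratio over template aspect ratio."""
    w, h = size
    return (w / h) / (template.width / template.height)


def screen_node(pictures: List[Size], templates: List[Template],
                t1: float = 0.8, t2: float = 1.25) -> Optional[Tuple[int, int]]:
    """Return (picture index, template index) of the first match, or None."""
    for p_idx, size in enumerate(pictures):             # steps 3.1 and 3.5
        if all(t.used >= t.max_num for t in templates):
            return None                                  # every quota is full: step 3.6
        for t_idx, template in enumerate(templates):
            if template.used >= template.max_num:        # step 3.2: quota reached
                continue
            d = match_degree(size, template)             # step 3.3
            if t1 < d < t2:                              # step 3.4
                template.used += 1                       # cropping would happen here
                return p_idx, t_idx
    return None
```

Returning the indices instead of cropping in place keeps the screening logic separate from the image handling.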
Compared with the prior art, the beneficial effects of the present invention are as follows:
The invention automatically recommends the newest and hottest post pictures and summaries to the homepage to meet user needs. Filling the homepage with data through automatic statistical screening improves the freshness, breadth, and update cycle of the data and saves manpower and cost. According to manual inspection, the quality of picture screening and cropping reaches 99.9%, and the accuracy of the recommended microblog summaries exceeds 98%. This is embodied in the following aspects:
1. Several different sizes are designed to accommodate pictures of all kinds whose length and width specifications are inconsistent.
2. The granularity and outer diameter of the data volume are flexibly configurable, which raises the probability that every channel has a picture and a summary to recommend.
3. Multiple strategies are combined to extract post summaries, which are matched with pictures and recommended to the homepage automatically.
4. An automatic picture screener is designed to compress and crop high-quality pictures with a prominent subject and a clear image.
Brief description of the drawings
Fig. 1 is a flow chart of the method for automatically recommending microblog homepage data provided by the present invention.
Fig. 2 is a flow chart of the single-record data operation provided by the present invention.
Detailed description of the embodiments
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings. (The content of the invention is supplemented as specifically as possible, covering the technical means, technical solution, and flow, so that the disclosure is sufficient.)
The structure of the present invention is shown in Fig. 1 and consists mainly of three modules. The first module filters the top entries out of the massive volume of microblogs to obtain the newest and hottest microblog list (TopN). The second module extracts microblog topic sentences; a topic sentence of the corresponding length is extracted according to the picture size (because the topic sentence is displayed embedded in the picture, the picture size determines the topic sentence length). The third module, the automatic picture screener, selects the picture closest to the target picture size and crops it automatically. The single-record data operation flow is shown in Fig. 2. The implementation steps are as follows:
Configuration template:
zdpCfg: path of the downloader initialization file
Haarcascades: path of the initialization file for the automatic picture cropping class
IntervalSec: interval between the system's rotating recommendations
DisRptH: time window within which data is not repeated
urlbak: index file of URLs
tweetbak: index file of blog posts
DBLoop: outer diameter of the data volume
DBCount: granularity of the data volume
OutPath: storage path of the generated static homepage
PicType: number of picture categories
Widthi: width of picture i (i is the number of a picture category, starting from 1 and increasing by one up to the number of picture categories; the same applies below)
Heighti: height of picture i
wordi: length of theme i
maxNumi: maximum number of pictures i
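As an illustration of how the indexed keys fit together, the Python sketch below collects Widthi, Heighti, wordi, and maxNumi into one record per picture category. The text does not describe the file format, so the flat key-to-string mapping, the `PictureTemplate` class, and the sample values are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PictureTemplate:
    width: int    # Widthi
    height: int   # Heighti
    word: int     # wordi: length of theme i
    max_num: int  # maxNumi: maximum number of pictures i


def load_picture_templates(cfg: Dict[str, str]) -> List[PictureTemplate]:
    """Read the per-category keys for i = 1 .. PicType from a flat mapping."""
    count = int(cfg["PicType"])
    return [
        PictureTemplate(
            width=int(cfg[f"Width{i}"]),
            height=int(cfg[f"Height{i}"]),
            word=int(cfg[f"word{i}"]),
            max_num=int(cfg[f"maxNum{i}"]),
        )
        for i in range(1, count + 1)
    ]


# Hypothetical sample values, not taken from the source.
sample_cfg = {
    "PicType": "2",
    "Width1": "400", "Height1": "300", "word1": "20", "maxNum1": "3",
    "Width2": "200", "Height2": "200", "word2": "12", "maxNum2": "6",
}
templates = load_picture_templates(sample_cfg)
```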
Module one:
Calculate the newest and hottest microblog list. According to the configuration template, and according to the granularity and outer diameter of the data volume, the microblog data with pictures of each channel is read from the database to obtain a data set for each channel. Each microblog is stored in one node, whose content includes the blog post, the picture, the post publish time, the post repost count, and so on. The data set is sorted by microblog publish time and repost count, and the newest and hottest top N entries are taken to obtain the microblog list TopN of each channel.
Module two:
Loop through the microblog list in order, take the blog post out of each node, and extract the topic sentence of the post. The topic sentence is selected by importance. The steps are as follows:
1. Preprocess the blog post. The processing is as follows:
(1) transcode certain HTML markup, such as "&lt;";
(2) remove noise, such as "@Li Xiaoming" mentions, emoticons, and runs of spaces;
(3) convert double-byte punctuation marks to single-byte punctuation marks, except for the full stop.
2. Split the post into sentences, rank the sentences according to the post characteristics of the different channels, and choose the top-ranked sentence, denoted s.
3. Calculate the sentence length, denoted len; if len > wordi, truncate s. Truncation is done at punctuation marks, with the following priority order:
(1)“。”
(2)“!”、“”
(3)“;”
(4)“:”
(5)“,”
The integrity of symbols that occur in pairs, such as "()" and "《》", is preserved as far as possible; if only half of a paired symbol appears, it is clipped.
4. Judge whether the resulting topic sentence is meaningful. Possible methods include a character-count check, a Chinese/English check, and a modal-particle check. A meaningless sentence is discarded.
5. Choose the next node and repeat steps 1-4.
6. End.
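The preprocessing in step 1 of module two can be sketched in Python with the standard library. The patterns are assumptions: mentions are taken to be "@" followed by word characters, emoticons are taken to be short bracketed codes such as "[哈哈]", and only full-width characters that map to ASCII punctuation are converted, which leaves the ideographic full stop "。" untouched.

```python
import html
import re
import string

MENTION = re.compile(r"@[\w-]+")            # assumed form of an @mention
EMOTICON = re.compile(r"\[[^\[\]]{1,8}\]")  # assumed bracketed emoticon code
SPACES = re.compile(r"\s+")


def to_half_width_punct(text: str) -> str:
    """Convert full-width punctuation (U+FF01..U+FF5E) to its ASCII form.

    Full-width letters and digits are left alone, and "。" (U+3002) lies
    outside this range, so the full stop is excepted as the text requires.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if 0xFF01 <= code <= 0xFF5E:
            half = chr(code - 0xFEE0)
            out.append(half if half in string.punctuation else ch)
        else:
            out.append(ch)
    return "".join(out)


def preprocess(post: str) -> str:
    text = html.unescape(post)         # (1) decode entities such as "&lt;"
    text = MENTION.sub("", text)       # (2) remove @mentions
    text = EMOTICON.sub("", text)      # (2) remove emoticon codes
    text = to_half_width_punct(text)   # (3) double-byte to single-byte punctuation
    return SPACES.sub(" ", text).strip()  # (2) collapse runs of spaces
```

A mention that runs straight into the following text without a space would be removed together with that text, so a production version would need a stricter mention pattern.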
Module three:
An automatic screener is designed. From the data set obtained in module two, the pictures in a node are taken out and put into the automatic screener. If a picture meets the requirements, it is cropped automatically to the size of the picture in the template; otherwise the next picture is taken and screening continues. The steps for screening the pictures of one node are as follows:
1. Calculate the size of the picture, denoted size.
2. Judge whether the number of pictures matched to template picture i has reached the maximum number maxNumi. If not, proceed to step 3. If so, move on to the next template picture and repeat step 2. If the maximum numbers of all template pictures have been reached, jump to step 6.
3. Calculate the matching degree between size and the size of template picture i, denoted d.
4. Judge whether the matching degree d meets the requirement. If T1 < d < T2 (T1 and T2 are thresholds), perform automatic cropping, add 1 to the number of pictures matched to template picture i, and jump to step 6. Otherwise the picture does not meet the requirement for this template; repeat steps 2 and 3 until the picture has been compared with every kind of picture in the template. If it still does not meet the requirement, continue with step 5.
5. Take the next picture and perform steps 1 to 4.
6. End.
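The text does not say how the crop region is chosen once a picture has matched a template. As a stand-in, the Python sketch below computes the largest centered box with the template's aspect ratio. This is an assumption for illustration, not the cropping class configured through Haarcascades, and the function name is hypothetical.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def center_crop_box(width: int, height: int, target_w: int, target_h: int) -> Box:
    """Largest centered box inside a width x height picture that has the
    aspect ratio target_w : target_h."""
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if width * target_h > height * target_w:
        # Picture is wider than the target: trim the left and right sides.
        new_w = height * target_w // target_h
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Picture is as tall as or taller than the target: trim top and bottom.
    new_h = width * target_h // target_w
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)
```

The resulting box uses the (left, upper, right, lower) order, so it could be passed to an image library's crop call and the result then resized to the template's width and height.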
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the present invention and do not limit it. A person of ordinary skill in the art may, with reference to the above embodiments, still modify the specific embodiments of the invention or make equivalent substitutions; any such modification or equivalent substitution that does not depart from the spirit and scope of the invention falls within the scope of the pending claims of the present invention.

Claims (8)

1. A method for automatically recommending microblog homepage data, characterized in that the method comprises:
(1) filtering a microblog list out of a massive volume of microblogs;
(2) extracting microblog topic sentences, wherein a topic sentence of the corresponding length is extracted from the blog post according to the picture size;
(3) selecting the picture closest to the target picture size and cropping it automatically;
wherein step (2) comprises:
(2.1) preprocessing the blog post;
(2.2) splitting the post into sentences, ranking the sentences according to the post characteristics of the different channels, and choosing the top-ranked sentence, denoted s;
(2.3) calculating the sentence length, denoted len; if len > wordi, truncating s to obtain the topic sentence, where wordi is the length of theme i;
(2.4) judging whether the topic sentence is meaningful;
(2.5) choosing the next node and repeating steps (2.1)-(2.4);
(2.6) ending.
2. The method for automatically recommending microblog homepage data as claimed in claim 1, characterized in that step (1) comprises: according to the configuration template, and according to the granularity and outer diameter of the data volume, reading the microblog data with pictures of each channel from the database to obtain a data set for each channel; sorting the data set by microblog publish time and repost count; and taking the newest and hottest top N entries to obtain the microblog list TopN of each channel.
3. The method for automatically recommending microblog homepage data as claimed in claim 2, characterized in that each microblog is stored in one node, whose content includes the blog post, the picture, the post publish time, and the post repost count.
4. The method for automatically recommending microblog homepage data as claimed in claim 1, characterized in that step (2) comprises looping through the microblog list in order, taking the blog post out of each node, and extracting the topic sentence of the post.
5. The method for automatically recommending microblog homepage data as claimed in claim 1, characterized in that step (2.3) comprises truncating at punctuation marks, with the following priority order of punctuation marks:
(a)“。”
(b)“!”、“”
(c)“;”
(d)“:”
(e)“,”
The integrity of symbols that occur in pairs is ensured; if only half of a paired symbol appears, it is clipped.
6. The method for automatically recommending microblog homepage data as claimed in claim 1, characterized in that, in step (2.4), whether the topic sentence is meaningful is judged by a character-count check, a Chinese/English check, and a modal-particle check; a meaningless sentence is discarded.
7. The method for automatically recommending microblog homepage data as claimed in claim 1, characterized in that step (3) comprises: from the data set obtained in step (2), taking the pictures out of a node and putting them into the automatic screener; if a picture meets the requirements, cropping it automatically to the size in the template; otherwise taking the next picture and continuing to screen.
8. The method for automatically recommending microblog homepage data as claimed in claim 1, characterized in that step (3) comprises:
(3.1) calculating the size of the picture, denoted size;
(3.2) judging whether the number of pictures matched to template picture i has reached the maximum number maxNumi; if not, proceeding to step (3.3); if so, moving on to the next template picture and repeating step (3.2); if the maximum numbers of all template pictures have been reached, jumping to step (3.6);
(3.3) calculating the matching degree between size and the size of template picture i, denoted d;
(3.4) judging whether the matching degree d meets the requirement: if T1 < d < T2, where T1 and T2 are thresholds, performing automatic cropping, adding 1 to the number of pictures matched to template picture i, and jumping to step (3.6); otherwise the picture does not meet the requirement for this template, and steps (3.2) and (3.3) are repeated until the picture has been compared with every kind of picture in the template; if it still does not meet the requirement, continuing with step (3.5);
(3.5) taking the next picture and performing steps (3.1) to (3.4);
(3.6) ending.
CN201510059763.8A 2015-02-04 2015-02-04 Microblogging homepage data auto recommending method Expired - Fee Related CN104657444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510059763.8A CN104657444B (en) 2015-02-04 2015-02-04 Microblogging homepage data auto recommending method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510059763.8A CN104657444B (en) 2015-02-04 2015-02-04 Microblogging homepage data auto recommending method

Publications (2)

Publication Number Publication Date
CN104657444A CN104657444A (en) 2015-05-27
CN104657444B true CN104657444B (en) 2018-05-18

Family

ID=53248572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510059763.8A Expired - Fee Related CN104657444B (en) 2015-02-04 2015-02-04 Microblogging homepage data auto recommending method

Country Status (1)

Country Link
CN (1) CN104657444B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5867164A (en) * 1995-09-29 1999-02-02 Apple Computer, Inc. Interactive document summarization
CN101600025B (en) * 2009-07-14 2012-10-10 深圳市五巨科技有限公司 Method and device for assessing news website by mobile terminal
CN102135996A (en) * 2011-03-15 2011-07-27 魏新成 Method for displaying microsite homepage in micro-window after micro search operation
CN102447520A (en) * 2011-11-07 2012-05-09 北京中广睛彩导航科技有限公司 Information acquisition method and system under mobile internet and broadcast mixed channel
CN102662972A (en) * 2012-03-09 2012-09-12 浙江大学 A visually disabled person-oriented automatic picture description method for web content barrier-free access
CN103473282B (en) * 2013-08-29 2016-10-05 北京奇虎科技有限公司 A kind of apparatus and method generating the Hot Contents page
CN103699525B (en) * 2014-01-03 2016-08-31 江苏金智教育信息股份有限公司 A kind of method and apparatus automatically generating summary based on text various dimensions feature

Also Published As

Publication number Publication date
CN104657444A (en) 2015-05-27

Similar Documents

Publication Publication Date Title
CN104657423B (en) Using content share method and its device
CN108197330B (en) Data digging method and device based on social platform
US20130297694A1 (en) Systems and methods for interactive presentation and analysis of social media content collection over social networks
CN104866557B (en) A kind of personalized instant learning theoretical based on constructive learning supports System and method for
CN103559315B (en) Information screening method for pushing and device
CN105281925B (en) The method and apparatus that network service groups of users divides
CN107391509B (en) Label recommending method and device
CN104636371A (en) Information recommendation method and device
CN106357416A (en) Group information recommendation method, device and terminal
CN105868267B (en) A kind of modeling method of mobile social networking user interest
CN103336766A (en) Short text garbage identification and modeling method and device
CN106033415A (en) A text content recommendation method and device
CN101000627A (en) Method and device for issuing correlation information
CN104199872A (en) Information recommendation method and device
CN104503597B (en) stroke input method, device and system
CN103188348A (en) Linkman management method based on file sharing
CN105574030A (en) Information search method and device
CN109033173A (en) It is a kind of for generating the data processing method and device of multidimensional index data
CN105898425A (en) Video recommendation method and system and server
CN106339382A (en) Method and device for pushing business objects
CN103336765B (en) A kind of markov matrix off-line correction method of text key word
CN103294670A (en) Searching method and system based on word list
CN105005555A (en) Chatting time-based keyword extraction method and device
CN104657444B (en) Microblogging homepage data auto recommending method
CN112667869B (en) Data processing method, device, system and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20170427

Address after: 100086 Beijing, Haidian District, North Third Ring Road West, No. 43, building 5, floor 08-09, No. 2

Applicant after: BEIJING ZHONGSOU CLOUD BUSINESS NETWORK TECHNOLOGY Co.,Ltd.

Address before: Shou Heng Technology Building No. 51 Beijing 100191 Haidian District Xueyuan Road room 0902

Applicant before: BEIJING ZHONGSOU NETWORK TECHNOLOGY Co.,Ltd.

GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180518

Termination date: 20220204