Learning to extract and summarize hot item features from multiple auction web sites

Tak-Lam Wong¹ &
Wai Lam²

Abstract

Auction Web sites contain a vast amount of poorly organized information that changes rapidly, which makes it difficult to digest. We develop a unified framework that automatically extracts product features and summarizes hot item features from multiple auction sites. To deal with the irregular layout of Web pages and to account for the uncertainty involved, we formulate product feature extraction and hot item feature summarization as a single graph labeling problem using conditional random fields. One characteristic of this graphical model is that it can capture the inter-dependence between neighbouring tokens in a Web page, between tokens in different Web pages, and among other information, such as hot item features, across different auction sites. We have conducted extensive experiments on several real-world auction Web sites to demonstrate the effectiveness of our framework.
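
As a rough illustration of the labeling idea only, the sketch below decodes a linear-chain model over the tokens of one listing with the Viterbi algorithm. It is not the authors' model: the paper describes a graph that also links tokens across pages and sites, whereas this chain only captures dependence between neighbouring tokens. The label set, the `FEATURE_HINTS` lexicon, and every weight are hypothetical values chosen by hand for the example, and the scores are unnormalized.

```python
from typing import Dict, List, Tuple

# Hypothetical label set and hand-set weights, for illustration only.
LABELS = ["feature", "other"]
FEATURE_HINTS = {"8gb", "memory"}  # assumed toy lexicon, not from the paper


def emit(token: str, label: str) -> float:
    """Score of assigning `label` to a single token."""
    if label == "feature":
        return 2.0 if token.lower() in FEATURE_HINTS else -1.0
    return 0.0


# Transition scores: neighbouring tokens are nudged toward sharing a label.
TRANS: Dict[Tuple[str, str], float] = {
    (p, y): (0.5 if p == y else 0.0) for p in LABELS for y in LABELS
}


def decode(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence (Viterbi decoding)."""
    if not tokens:
        return []
    # best[i][y]: best score of any labeling of tokens[:i + 1] ending in y
    best = [{y: emit(tokens[0], y) for y in LABELS}]
    back: List[Dict[str, str]] = []
    for tok in tokens[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + TRANS[(p, y)])
            scores[y] = best[-1][prev] + TRANS[(prev, y)] + emit(tok, y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


print(decode(["selling", "8GB", "memory", "card"]))
```

In a trained conditional random field the emission and transition weights would be learned from labeled pages rather than set by hand, and the features would typically include layout cues as well as the token text.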

Author information

Authors and Affiliations

Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong
Tak-Lam Wong
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong
Wai Lam

Corresponding author

Correspondence to Tak-Lam Wong.

Additional information

The work described in this paper is substantially supported by grants from the Research Grant Council of the Hong Kong Special Administrative Region, China (Project Nos: CUHK 4179/03E and CUHK4193/04E) and the Direct Grant of the Faculty of Engineering, CUHK (Project Codes: 2050363 and 2050391). This work is also affiliated with the Microsoft-CUHK Joint Laboratory for Human-centric Computing and Interface Technologies.

About this article

Cite this article

Wong, TL., Lam, W. Learning to extract and summarize hot item features from multiple auction web sites. Knowl Inf Syst 14, 143–160 (2008). https://doi.org/10.1007/s10115-007-0078-2

Received: 12 April 2006
Revised: 14 November 2006
Accepted: 26 January 2007
Published: 31 March 2007
Issue Date: February 2008
DOI: https://doi.org/10.1007/s10115-007-0078-2
