Local Buffer as Source of Web Mining Data

Andrzej Siemiński

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4253)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

The data for Web mining is usually extracted from the WWW server or proxy server log files. The paper examines the advantages and disadvantages of exploiting another source of input data – the browser buffer. The properties of data extracted from different types of sources are compared. The browser buffer contains data about user navigational habits as well as the formal properties and the content of all recently accessed WWW objects. The paper uses the data obtained from this source to examine the statistical properties of different types of texts extracted from HTML pages.

Author information

Authors and Affiliations

Institute for Applied Informatics, Technical University of Wrocław, Wyb. Wyspińskiego 27, 50-370, Wrocław, Poland
Andrzej Siemiński

Editor information

Editors and Affiliations

School of Design, Engineering and Computing, Bournemouth University, UK
Bogdan Gabrys
Centre for SMART Systems, School of Environment and Technology, University of Brighton, BN2 4GJ, Brighton, UK
Robert J. Howlett
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes, 5095, SA, Australia
Lakhmi C. Jain

Cite this paper

Siemiński, A. (2006). Local Buffer as Source of Web Mining Data. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893011_99

DOI: https://doi.org/10.1007/11893011_99
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46542-3
Online ISBN: 978-3-540-46544-7
eBook Packages: Computer Science, Computer Science (R0)
