DOI: 10.1109/IMIS.2012.142
Article

Document Classification through Building Specified N-Gram

Published: 04 July 2012

Abstract

This paper proposes a method for classifying textual documents using a specified n-gram data set. Web documents hold great potential, and the amount of valuable information they contain has grown consistently over the years. However, the sheer size of the web makes it increasingly difficult to find the documents relevant to what users want. Many approaches have been suggested to overcome this obstacle, and the most important task among them is classifying textual documents into predefined categories. Although many statistical approaches have been introduced over the years, none has yet provided a perfect solution. In this paper, we suggest a method for textual document classification using an n-gram model. N-gram frequency data has great potential for finding similarities between documents. For this reason, we construct our own n-gram data sets from research papers. When an unknown document is submitted, the system extracts n-grams from it. These n-grams are then compared with the n-grams in the previously built data sets using the proposed similarity measure. The precision of this method reaches 86%.
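
The pipeline outlined in the abstract (build per-category n-gram data sets, extract n-grams from an unknown document, compare the two) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not define the proposed similarity measure, so cosine similarity over word-bigram frequencies is swapped in as a stand-in. The function names, the tokenization rule, and the toy training sentences are all assumptions made for this example.

```python
from collections import Counter
import math
import re


def extract_ngrams(text, n=2):
    """Count word n-grams (as tuples) in text. Tokenization is an assumption."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def build_profiles(labeled_docs, n=2):
    """Merge the n-gram counts of all training documents of each category."""
    profiles = {}
    for category, text in labeled_docs:
        profiles.setdefault(category, Counter()).update(extract_ngrams(text, n))
    return profiles


def cosine_similarity(a, b):
    """Stand-in measure (NOT the paper's): cosine of two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, profiles, n=2):
    """Return the best-scoring category and the score for every category."""
    doc = extract_ngrams(text, n)
    scores = {cat: cosine_similarity(doc, prof) for cat, prof in profiles.items()}
    return max(scores, key=scores.get), scores


# Toy data invented for this sketch; the paper builds its sets from research papers.
training = [
    ("nlp", "statistical language modeling assigns probabilities to word sequences"),
    ("networks", "wireless sensor networks route packets between mobile nodes"),
]
profiles = build_profiles(training)
label, scores = classify("a statistical language modeling approach to word sequences", profiles)
print(label)  # nlp
```

Merging counts per category keeps the comparison to one vector per class; a real system would need far larger data sets, and its accuracy would depend on the actual similarity measure used in the paper.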
    Published In

    IMIS '12: Proceedings of the 2012 Sixth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing
    July 2012
    974 pages
    ISBN:9780769546841

    Publisher

    IEEE Computer Society

    United States

    Author Tags

    1. Document Classification
    2. N-gram
    3. NLP
    4. Statistical Language Modeling

    Qualifiers

    • Article
