research-article
DOI: 10.1145/2072221.2072227

Using N-grams to identify mathematical topics in MXit lingo

Published: 03 October 2011

Abstract

N-grams are used to quantify the similarity between two documents or between two collections of words. This paper shows how N-grams of length 3 and N-grams of length 4, each coupled with text preprocessing (including stop word removal and stemming according to MXit spelling conventions), can be used to categorize very short mathematical conversations conducted in MXit lingo into broad mathematical groups such as algebra, geometry, trigonometry, and calculus. MXit lingo is an abbreviated form of written English that children, teenagers, and young adults use when communicating through the popular MXit chat mechanism on cell phones. Conversations from the "Dr Math" project were used for this analysis. "Dr Math" is a mathematics tutoring service that links primary and secondary school pupils to tutors from local universities. The tutors assist the pupils with their mathematics homework.
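
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state which similarity measure, stop word list, or stemmer was used, so the sketch assumes character N-grams compared with the Dice coefficient and skips preprocessing entirely. The names `char_ngrams`, `dice_similarity`, `classify`, and the `TOPIC_TERMS` vocabularies are hypothetical and chosen only for illustration.

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character N-grams of length n, word by word."""
    grams = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # underscores mark the word boundaries
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams

def dice_similarity(a: Counter, b: Counter) -> float:
    """Dice coefficient on N-gram multisets: 2 * |A & B| / (|A| + |B|)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    overlap = sum((a & b).values())  # multiset intersection keeps min counts
    return 2.0 * overlap / total

# Hypothetical topic vocabularies (assumption: not taken from the paper).
TOPIC_TERMS = {
    "algebra": "factorise equation solve variable exponent",
    "geometry": "triangle circle angle parallel area",
    "trigonometry": "sin cos tan theta hypotenuse",
    "calculus": "derivative limit integral differentiate gradient",
}

def classify(message: str, n: int = 3) -> str:
    """Return the topic whose N-gram profile is most similar to the message."""
    msg_profile = char_ngrams(message, n)
    scores = {
        topic: dice_similarity(msg_profile, char_ngrams(terms, n))
        for topic, terms in TOPIC_TERMS.items()
    }
    return max(scores, key=scores.get)
```

With these toy vocabularies, `classify("hw do i factrise da equasion")` selects "algebra": misspelled words such as "factrise" and "equasion" still share most of their trigrams with "factorise" and "equation", which is why N-gram matching tolerates abbreviated chat spelling. Passing `n=4` switches the comparison to N-grams of length 4.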

Cited By

  • (2013) A comparison of different calculations for N-gram similarities in a spelling corrector for mobile instant messaging language. Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference, pp. 1-7. DOI: 10.1145/2513456.2513458. Online publication date: 7-Oct-2013.

    Published In

    SAICSIT '11: Proceedings of the South African Institute of Computer Scientists and Information Technologists Conference on Knowledge, Innovation and Leadership in a Diverse, Multidisciplinary Environment
    October 2011
    352 pages
    ISBN: 9781450308786
    DOI: 10.1145/2072221
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    • University of Cape Town
    • SAICSIT: South African Institute of Computer Scientists and Information Technologists

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. C3TO
    2. Dr Math
    3. MXit
    4. N-gram

    Acceptance Rates

    Overall Acceptance Rate 187 of 439 submissions, 43%
