Abstract
Tracking and relating news articles from several sources can help counter misinformation from deceptive news stories, since a single source cannot establish whether a piece of information is true. Preventing misinformation in computer systems is an active research topic in intelligence and security informatics. Association rule mining has recently been applied to this task because of its performance and scalability. This paper explores how the term representation basis, the term weighting, and the association measure affect the quality of relations discovered among news articles from several sources. Twenty-four combinations, formed from two term representation bases, four term weightings, and three association measures, are explored, and their results are compared with human judgement. A number of evaluations compare the performance of each combination against the others with regard to top-k ranks. The experimental results indicate that the combination of bigram (BG), term frequency with inverse document frequency (TFIDF), and confidence (CONF), as well as the combination of BG, TFIDF, and conviction (CONV), performs best at finding related documents, placing them in the upper ranks with a 0.41% rank-order mismatch on the top-50 mined relations. However, the combination of unigram (UG), TFIDF, and lift (LIFT) performs best at placing irrelevant relations in the lower ranks (top-1100), with a rank-order mismatch of 9.63%.
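As a rough illustration of the three association measures named in the abstract, the sketch below computes CONF, LIFT, and CONV for a rule between two documents using the standard textbook definitions. Several things here are assumptions rather than details taken from the paper: each transaction is treated as the set of documents that contain one term, membership is binary (no TFIDF or other term weighting is applied), and the function names and toy data are hypothetical.

```python
from math import inf


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def measures(transactions, x, y):
    """Standard confidence, lift and conviction for the rule x -> y.

    Assumes x occurs in at least one transaction and y in at least one,
    so that neither division is by zero.
    """
    s_x = support(transactions, [x])
    s_y = support(transactions, [y])
    s_xy = support(transactions, [x, y])
    conf = s_xy / s_x                     # P(y | x)
    lift = conf / s_y                     # P(y | x) / P(y)
    # Conviction is unbounded when the rule never fails (conf == 1).
    conv = inf if conf == 1 else (1 - s_y) / (1 - conf)
    return {"CONF": conf, "LIFT": lift, "CONV": conv}


# Toy data (hypothetical): one transaction per term, listing the
# documents in which that term appears.
transactions = [
    {"d1", "d2"},
    {"d1", "d2", "d3"},
    {"d1"},
    {"d3"},
]

m = measures(transactions, "d1", "d2")
print(m)
```

Ranking candidate document pairs by one of these values, and then comparing that ranking with human judgement, is the kind of evaluation the abstract describes; the weighting schemes and the rank-order mismatch metric themselves are not reproduced here.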
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kittiphattanabawon, N., Theeramunkong, T., Nantajeewarawat, E. (2010). Exploration of Document Relation Quality with Consideration of Term Representation Basis, Term Weighting and Association Measure. In: Chen, H., Chau, M., Li, Sh., Urs, S., Srinivasa, S., Wang, G.A. (eds) Intelligence and Security Informatics. PAISI 2010. Lecture Notes in Computer Science, vol 6122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13601-6_15
DOI: https://doi.org/10.1007/978-3-642-13601-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13600-9
Online ISBN: 978-3-642-13601-6
eBook Packages: Computer Science, Computer Science (R0)