[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2389686.2389692acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Feature selection for link prediction

Published: 02 November 2012 Publication History

Abstract

Networks that model relationships in the real world have attracted much attention in the past few years. Link prediction plays a central role in the network area. Supervised learning is an important class of algorithms used to address the link prediction problem. A big challenge in solving link prediction tasks is deciding how to choose relevant features. As an important machine learning technique to select relevant features, feature selection not only enhances classification accuracy, but also improves the efficiency of the training process. Thus, it is especially relevant for link prediction. However, to the best of our knowledge, feature selection under the link prediction scenario remains unstudied. In this paper, we propose FEature Selection for Link Prediction (FESLP), which contains a feature ranking algorithm and a feature weighting algorithm to address link prediction tasks. We measure the discriminative ability of each individual feature and select those features with greatest discriminative power. Simultaneously, we aim to minimize the correlations among features such that redundancy in the learned feature space is as small as possible. Thus, the feature space can accurately preserve the sketch of the original data. Feature weighting and feature ranking problems can be formalized as two quadratic optimization problems. The active set method is used to solve the feature weighting problem (via real number programming) while a greedy policy is applied to solve the feature ranking problem (via integer programming). In experiments, We evaluate FESLP on six large-scale email network datasets from a university. The results show the effectiveness of the FESLP.

References

[1]
M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167--256, 2003.
[2]
D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM, 2003.
[3]
J. Kunegis and A. Lommatzsch and C. Bauckhage. The slashdot zoo: mining a social network with negative edges. In WWW, 2009.
[4]
D. Goldberg and D. Nichols and B. M. Oki and D. Terry. Using collaborative filtering to weave an information tapestry. Communication ACM, 35(12):61--70, 1992.
[5]
H. Yu and P. Braun et al. High-quality binary protein interaction map of the yeast interactome network. Science, 322(5898):104--110, 2008.
[6]
M. A. Hasan and V. Chaoji and S. Salem and M. Zaki. Link prediction using supervised learning. SDM Workshop, 2006.
[7]
C. Wang and V. Satuluri and S. Parthasarathy. Local probabilistic models for link prediction. ICDM, 2007.
[8]
R. N. Lichtenwalter and J. T. Lussier and N. V. Chawla. New perspectives and methods in link prediction. SIGKDD, 2010.
[9]
S. Gao and L. Denoyer and P. Gallinari. Temporal link prediction by integrating content and structure information. CIKM, 2011.
[10]
J. Hopcroft and T. Lou and J. Tang. Who will follow you back? Reciprocal relationship prediction. CIKM, 2011.
[11]
S. Samarasinghe. Neural Networks for Applied Sciences and Engineering. Auerbach Pubilcations, 2007.
[12]
Y. Song and D. Zhou and J. Huang and I. G. Councill and H. Zha and C. L. Giles. Boosting the feature space: Text classification for unstructured data on the web. In ICDM, 2006.
[13]
X. Geng and T.-Y. Liu and T. Qin and H. Li. Feature selection for ranking. In SIGIR, 2007.
[14]
Y. Xu, F. Shen, W. Ping and J. Zhao. TAKES: A Fast Method to Select Features in the Kernel Space. In CIKM, 2011.
[15]
T. Joachims. Making Large-Scale Support Vector Machine Learning Practical. MIT Press Cambridge, 1999.
[16]
W. Dong, M. Charikar, and K. Li. Asymmetric distance estimation with sketches for similarity search in high-dimensional spaces. In SIGIR, 2008.
[17]
J. Weston, A. Elisseeff, and B. Scholkopf. Use of zero-norm with linear models and kernel methods. JMLR, 3:1439--1461, 2003.
[18]
G. Wang, F. H. Lochovsky, and Q. Yang. Feature selection with conditional mutual information maximin in text categorization. In CIKM, 2004.
[19]
Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In ICML, 1997.
[20]
Y. Sun. Iterative relief for feature weighting: algorithms, theories, and applications. TPAMI, 29(6):1035--1051, 2007.
[21]
G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In SIGKDD, 2002.
[22]
G. Jeh and J. Widom. Scaling personalized web search. In WWW, 2003.
[23]
L. Yen and M. Saerens and A. Mantrach and M. Shimbo. A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances. In SIGKDD, 2008.
[24]
E. A. Leicht and P. Holme and M. E. J. Newman. Vertex similarity in networks. Physical Review E, 73(2), 2006.
[25]
Z. Burda and J. Duda and J. M. Luck and B. Waclaw. Localization of the maximal entropy random walk. Physical Review Letters, 102(16), 2008.
[26]
R.-H. Li and J. X. Yu and J. Liu. Link prediction: the power of maximal entropy random walk. In CIKM, 2011.
[27]
L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In the WSDM, 2011.
[28]
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, 2001.
[29]
J. Nocedal and S. J. Wright. Numerical Optimization. Springer Press, 2000.
[30]
G. Wang, F. H. Lochovsky, and Q. Yang. Feature selection with conditional mutual information maximin in text categorization. In CIKM, 2004.
[31]
J. Kleinberg and E. Tardos. Algorithm Design. Addison-Wesley Longman Publishing, 2005.
[32]
L. Yu and H. Liu. Efficient feature selection via analysis of relevance and redundancy. JMLR, 5:1205--1224, 2004.
[33]
L. A. Wolsey. Integer Programming. Wiley Publishing, 1998.
[34]
A. Schrijver. Combinatorial Optimization: Polyhedra and Efficiency. Springer Publishing, 2003.
[35]
G. Cornuejols. Valid inequalities for mixed integer linear programs. Mathematical Programming, 112(1):3--44, 2008.
[36]
L. Wang and M. Sugiyama and Z. H. Zhou and J. Feng. On the margin explanation of boosting algorithms. In COLT, 2008.
[37]
Z. Xu and R. Jin and J. Ye and M. R. Lyu and I. King. Non-monotonic feature selection. In ICML, 2009.
[38]
R. Fletcher. Practical Methods of Optimization. John Wiley Publishing, 1987.
[39]
E. Wong. Active-Set Methods for Quadratic Programming. Ph.D. Thesis, University of California, San Diego, 2011.
[40]
L. Lu and T. Zhou. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390(6):1150--1170, 2011.
[41]
C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. ACM Trans. Intelligent Systems and Technology,2(3):1--27, 2011.
[42]
M. McPherson and L. Smith-Lovin and J. M. Cook. Birds of a feature: homophile in social networks. Annual Review of Sociology, 27:415--444, 2001.
[43]
G. Kossinets and D. J. Watts. Empirical Analysis of an Evolving Social Network. Science, 311:88--90, 2006.

Cited By

View all
  • (2025)Human and Digital Technology RelationsDigital Geographies—Theory, Space, and Communities10.1007/978-981-97-4734-4_3(153-254)Online publication date: 3-Jan-2025
  • (2024)PyMulSim: a method for computing node similarities between multilayer networks via graph isomorphism networksBMC Bioinformatics10.1186/s12859-024-05830-625:1Online publication date: 13-Jun-2024
  • (2022)MorbidGCN: prediction of multimorbidity with a graph convolutional network based on integration of population phenotypes and disease networkBriefings in Bioinformatics10.1093/bib/bbac25523:4Online publication date: 3-Jul-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PIKM '12: Proceedings of the 5th Ph.D. workshop on Information and knowledge
November 2012
108 pages
ISBN:9781450317191
DOI:10.1145/2389686
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 November 2012

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. feature selection
  2. link prediction

Qualifiers

  • Research-article

Conference

CIKM'12
Sponsor:

Acceptance Rates

Overall Acceptance Rate 25 of 62 submissions, 40%

Upcoming Conference

CIKM '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)21
  • Downloads (Last 6 weeks)4
Reflects downloads up to 18 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Human and Digital Technology RelationsDigital Geographies—Theory, Space, and Communities10.1007/978-981-97-4734-4_3(153-254)Online publication date: 3-Jan-2025
  • (2024)PyMulSim: a method for computing node similarities between multilayer networks via graph isomorphism networksBMC Bioinformatics10.1186/s12859-024-05830-625:1Online publication date: 13-Jun-2024
  • (2022)MorbidGCN: prediction of multimorbidity with a graph convolutional network based on integration of population phenotypes and disease networkBriefings in Bioinformatics10.1093/bib/bbac25523:4Online publication date: 3-Jul-2022
  • (2020)Predicting the Signs of the Links in a NetworkJournal of Digital Science10.33847/2686-8296.2.2_22:2(14-22)Online publication date: 29-Dec-2020
  • (2020)Features Identification and Classification of Discussion Threads in Coursera MOOC ForumsTransforming Teaching and Learning in Higher Education10.1007/978-981-15-4980-9_10(189-202)Online publication date: 8-Jul-2020
  • (2020)Link-Sign Prediction in Signed Directed Networks from No Link PerspectiveIntegrated Science in Digital Age 202010.1007/978-3-030-49264-9_26(291-300)Online publication date: 27-May-2020
  • (2019)Reducing features to improve link prediction performance in location based social networks, non-monotonically selected subset from feature clustersProceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining10.1145/3341161.3343853(809-815)Online publication date: 27-Aug-2019
  • (2018)Automatic feature selection for supervised learning in link prediction applicationsKnowledge and Information Systems10.1007/s10115-017-1121-656:1(85-121)Online publication date: 1-Jul-2018
  • (2017)A Survey of Link Recommendation for Social NetworksACM Transactions on Management Information Systems10.1145/31317829:1(1-26)Online publication date: 25-Oct-2017
  • (2016)A supervised approach for intra-/inter-community interaction prediction in dynamic social networksSocial Network Analysis and Mining10.1007/s13278-016-0397-y6:1Online publication date: 27-Sep-2016
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media