Abstract
At LinkedIn, our mission is to connect the world's professionals to make them more productive and successful. Our team, Communities Artificial Intelligence (AI), helps members achieve this goal by providing a platform where communities can form around common interests and shared experiences.
Fostering active communities at LinkedIn can be broken down into the following components:
(1) Discover: Help members find new entities (members, companies, hashtags, and more) to follow that will expose them to communities that share their interests.
(2) Engage: Engage members in the conversations taking place in their communities by recommending content from their areas of interest.
(3) Contribute: Help members effectively engage with the right communities when they create or share content.
These three components form the main pillars of a content-driven ecosystem. Our goal is to use AI to close the loop between Discover (by providing relevant follow recommendations), Engage (by delivering engaging content to members from their areas of interest), and Contribute (by suggesting hashtags that help content creators reach the right audience).
A diverse set of AI techniques is required to address the challenges that arise in each of these components. These techniques include Supervised Learning (XGBoost, Logistic Regression, Linear Regression), Wide and Deep Models, Natural Language Processing (e.g., Word Embeddings, n-gram matching), and Unsupervised Learning.
In this presentation, we will provide an overview of the AI techniques we use to form active communities on LinkedIn. We will describe two solutions in detail. First, we will describe how we built our Follow Recommendations product. Its goal is to recommend entities that a member finds both immediately relevant (i.e., the member is likely to follow the recommended entity) and engaging in the long run (i.e., the recommended entity produces content that the member finds relevant).
Our analysis of our follow recommendation models has shown that nonlinear models outperform their linear counterparts. To manage the explosion of data that results from generating terabytes of features for (viewer, entity) pairs, we use an innovative 2-D hash join algorithm that was developed at LinkedIn.
We are also moving towards a hybrid scoring architecture. This allows us to score candidates offline with complex models and then re-rank those candidates online using more time-sensitive contextual features. The result is more relevant and timely recommendations for members, based on their recent activity across different parts of the LinkedIn ecosystem.
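The offline-score-then-online-re-rank idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Candidate` type, the topic-overlap signal, and the `boost` weight are hypothetical and are not taken from LinkedIn's system, which the abstract does not describe at this level of detail.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    entity_id: str
    offline_score: float  # assumed to come from a heavy model run in batch


def rerank(candidates, recent_topics, entity_topics, boost=0.5, top_k=3):
    """Re-rank precomputed candidates with a cheap contextual signal.

    The online signal here is a hypothetical one: the number of topics an
    entity shares with the member's recent activity.
    """
    def online_score(c):
        topics = entity_topics.get(c.entity_id, set())
        overlap = len(topics & recent_topics)
        return c.offline_score + boost * overlap

    return sorted(candidates, key=online_score, reverse=True)[:top_k]


# Illustrative data only.
candidates = [
    Candidate("company:a", 0.9),
    Candidate("hashtag:ml", 0.7),
    Candidate("member:b", 0.6),
]
entity_topics = {
    "company:a": {"finance"},
    "hashtag:ml": {"ml", "ai"},
    "member:b": {"ml"},
}
recent_topics = {"ml", "ai"}  # what the member engaged with recently

reranked = [c.entity_id for c in rerank(candidates, recent_topics, entity_topics)]
print(reranked)
```

In this toy run, the candidate with the highest offline score drops to last place because it shares no topics with the member's recent activity, which is the kind of timeliness adjustment a hybrid design is meant to allow.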
Second, we will describe our approach to solving the problem of Hashtag Suggestion and Typeahead. Hashtags are a great tool that allows members to expand the reach of their posts to the right audience (or communities). Our Hashtag Suggestion and Typeahead (HST) product was built to help members add hashtags to their posts. We recommend not only hashtags that the member is likely to add to their post, but also hashtags that are likely to bring the member the most online feedback.
We call the latter aspect downstream utility (or engagement). However, before this utility can be realized, the member has to actually select one of the recommended hashtags. Therefore, the HST product combines two models. The first model maximizes the probability that the member will select the suggested hashtag, and the second optimizes for downstream utility. Based on content consumption behavior on LinkedIn, we have a good understanding of the supply and demand of content tagged with a specific hashtag. This information enables us to shape both the inventory and the traffic in individual hashtag domains, thus providing a better experience to content-starved communities.
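One simple way to picture how a selection model and a utility model might be combined is to rank by expected utility, i.e., selection probability multiplied by estimated downstream utility. This is only a sketch under that assumption: the abstract does not state how the two models are actually combined, and the function name, the input tuples, and the numbers below are hypothetical.

```python
def rank_hashtags(candidates, top_k=3):
    """Rank hashtags by expected downstream utility.

    candidates: list of (hashtag, p_select, utility) tuples, where
    p_select is an assumed output of a selection model and utility is an
    assumed output of a downstream-engagement model.
    """
    scored = [(tag, p_select * utility) for tag, p_select, utility in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [tag for tag, _ in scored[:top_k]]


# Illustrative numbers only.
suggestions = rank_hashtags([
    ("#ai", 0.6, 10.0),
    ("#machinelearning", 0.3, 30.0),
    ("#tech", 0.8, 5.0),
])
print(suggestions)
```

Here the hashtag most likely to be selected ("#tech") ranks last because its estimated utility is low, while a less likely but higher-utility hashtag ranks first. A product of the two scores captures the point made above: utility counts only if the member actually selects the hashtag.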