1 Summary
2 Introduction
With the development of the Internet, the ways people read news have expanded from subscribing to print media to accessing a large number of online news sources. News aggregation websites, such as Google News and Yahoo News, collect articles from different news websites and provide an aggregated view. A serious problem for such services is that the number of articles is overwhelming for users, so the challenge is how to help users find the news that interests them.
(Sounds so tempting).
Content-based recommendation is a technique for addressing information overload. Based on a profile of the user's interests and tendencies, the system recommends items that are of interest or value to the user.
Content-based methods play a central role in recommendation systems because they can recommend information that has not been evaluated before and can adapt to differences in users’ personalities. This technology is used in different fields, such as email, news, search, etc. In the field of news, the goal of content-based recommendation technology is to aggregate news based on users’ interests and create a “newspaper” of their own for each user.
(Think about what a newspaper contains and how it solves users’ problems and makes them trustworthy).
We combine a content-based approach with a previously developed collaborative filtering approach to generate personalized news recommendations. The combined method was evaluated online: a portion of Google News’ live traffic was served by the hybrid algorithm, and the results showed a significant improvement. The online experiments also raised some interesting topics, such as recommendations, accidental visits, and user satisfaction.
The nature of news reading makes news recommendation different from content-based recommendation in other fields. When visiting a news site, users are looking for new information that was previously unknown to them, or even surprising.
Since a user’s profile is inferred from past behavior, it is important to know how the user’s news interests have changed and whether it is effective to use past user behavior to predict future behavior.
To understand this problem, we conducted a large-scale log analysis of Google News to measure the stability of users’ news interests. We found that users’ interests change over time and follow trends in news events.
Based on these findings, we built a Bayesian model to predict two things: the interests of a user, based on that user’s behavior; and the news trends, based on the behavior of a group of users.
To recommend news to users, the system takes into account both the user’s real interests and news trends.
As a result, users receive news tailored to their interests without missing important news events, even if these events do not strictly match the user’s particular interests.
The contributions of this article are threefold:
(1) We conducted a large-scale log analysis of the consistency of users’ news interests; (2) we proposed a novel method for predicting user interests from click behavior that combines users’ real interests with news trends; (3) we proposed a personalized news recommendation method that combines content-based recommendation with collaborative filtering, ran experiments on real traffic, and observed improvements.
3 Personalization of Google News
4 Related work
5 Log analysis of user interests
6 Data
7 Click Distribution
Google News classifies news articles into predefined topic categories, including international, sports, entertainment, etc. In log analysis, we calculate the distribution of clicks in each category for each user.
We divide the past period into 12 monthly intervals. For each user u, we calculate the click distribution D(u,t) over the topic categories in each month t, represented by a vector:
D(u,t) = (N1/Ntotal, N2/Ntotal, ..., Nn/Ntotal)
Here, Ni is the number of clicks on articles classified into category Ci, and Ntotal is the total number of clicks of the user in the past time period (not the total clicks of time period t).
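As a minimal sketch of the vector above, the following Python computes a click distribution from a list of clicked categories. The function name `click_distribution` and the sample data are hypothetical, and the sketch normalizes by the number of clicks in whatever list is passed in, which is an assumption about how Ntotal is scoped.

```python
from collections import Counter


def click_distribution(clicked_categories, categories):
    """Return the click distribution as a list of Ni / Ntotal, one entry per category.

    clicked_categories: one category label per click (hypothetical input format).
    categories: the fixed, ordered list of topic categories C1..Cn.
    """
    counts = Counter(clicked_categories)
    # Assumption: Ntotal is the number of clicks in the list we were given.
    n_total = sum(counts[c] for c in categories)
    if n_total == 0:
        # No clicks: return an all-zero vector rather than dividing by zero.
        return [0.0] * len(categories)
    return [counts[c] / n_total for c in categories]


categories = ["world", "sports", "entertainment"]
clicks = ["sports", "sports", "world", "sports"]
dist = click_distribution(clicks, categories)
print(dist)  # [0.25, 0.75, 0.0]
```

The same function could be applied to all clicks from a region to obtain a regional distribution, since only the input list changes.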
8 Changes in users’ news interests at different times
9 News Trends
In addition to the click distribution of individual users, we also calculated the public click distribution in different countries and regions. For each country, public interest can be represented by the distribution of all clicks by users in that region in the past time period t.
Figure 2 shows the click distribution for the US population. For clarity, only four categories are shown. We can see that the public interest of the American population fluctuates, and the graphs for other countries are consistent with this phenomenon. Some categories, such as social, fluctuate more than others, such as health.
We assume that changes in public interest in a country are affected by key events in that country, and log analysis provides empirical evidence for this assumption.
10 The impact of hot events on individual interests
11 Bayesian model for user interest prediction
Log analysis reveals that individual users’ interests are influenced by local news trends. For example, during the European Cup, Spanish users click on more sports news. Similar phenomena have been reported in studies on user interest cycles. Based on these findings, we split a user’s interests into two parts: the user’s real interests and interests influenced by local news. The user’s real interests stem from the user’s characteristics, such as gender, age, and occupation, and are relatively stable. On the other hand, users are influenced by local news when deciding what to read. This effect is short-term and changes over time. The user’s real interests and the news influence correspond to the long-term and short-term interests in [1], respectively. We use a clearer way to predict user interests. More importantly, we model users’ short-term interests from the perspective of news trends by using the public’s click patterns, rather than only the individual user’s feedback.
Using a Bayesian framework, we developed a method that predicts a user’s current interests based on the click patterns of the user and of the user’s location. The predicted interests are then used in news recommendation.
The method is as follows:
(1) The system uses the user’s clicks in each period of time in the past to predict the user’s true interests, regardless of current news trends;
(2) The predictions for each time period are combined to obtain a more accurate estimate of the user’s real interests;
(3) The system predicts the user’s current interests through the user’s real interests and local news trends.
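The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, step 1 is approximated by dividing the user's click distribution by the regional one, step 2 is assumed to be a click-count-weighted average, and step 3 is assumed to multiply the stable interest by the current regional trend and renormalize.

```python
def discount_trend(user_dist, region_dist, eps=1e-9):
    """Step 1 (assumed form): the user's interest in one period with the
    regional trend divided out, i.e. D(u,t)_i / D(t)_i per category."""
    return [u / max(r, eps) for u, r in zip(user_dist, region_dist)]


def combine_periods(per_period, click_counts):
    """Step 2 (assumed form): average the per-period estimates, weighting
    each period by the number of clicks the user made in it."""
    total = sum(click_counts)
    if total == 0:
        raise ValueError("no clicks to combine")
    n = len(per_period[0])
    return [
        sum(w * scores[i] for scores, w in zip(per_period, click_counts)) / total
        for i in range(n)
    ]


def predict_current(stable_interest, current_trend):
    """Step 3 (assumed form): scale the stable interest by the current
    regional trend and normalize so the result sums to 1."""
    raw = [s * c for s, c in zip(stable_interest, current_trend)]
    norm = sum(raw)
    return [x / norm for x in raw] if norm > 0 else raw


# Two categories, two past periods (made-up numbers for illustration).
user_dists = [[0.5, 0.5], [0.8, 0.2]]
region_dists = [[0.5, 0.5], [0.5, 0.5]]
click_counts = [10, 30]

per_period = [discount_trend(u, r) for u, r in zip(user_dists, region_dists)]
stable = combine_periods(per_period, click_counts)
predicted = predict_current(stable, current_trend=[0.25, 0.75])
print(stable, predicted)
```

In this made-up example the user historically prefers the first category, but a strong current trend toward the second category shifts the predicted current interest toward it.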
12 Prediction of users’ real news interests
For each specific time period t in the past, we get the click distribution D(u,t) of a certain user, and the click distribution D(t) of all users in the region, which represents the news trend in the region.
We want to mine the real interests of user u from D(u,t) without being affected by D(t). A user’s true interest in a certain category Ci is modeled as:
That is, the probability that the user clicks on category Ci. Using Bayes’ rule, the above formula becomes:
where:
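A plausible form of these expressions, reconstructed from the surrounding description (the notation below is an assumption, not necessarily the authors' exact notation), is:

```latex
\begin{align}
\mathrm{interest}^{t}(C_i)
  &= p^{t}(\mathrm{click} \mid \mathrm{category} = C_i) \\
  &= \frac{p^{t}(\mathrm{category} = C_i \mid \mathrm{click}) \; p^{t}(\mathrm{click})}
          {p^{t}(\mathrm{category} = C_i)}
\end{align}
```

Under this reading, p^t(category = Ci | click) would be estimated from the user's click distribution D(u,t), p^t(category = Ci) from the regional click distribution D(t), and p^t(click) is the prior probability that the user clicks on any article, regardless of category. Dividing by the D(t) term discounts categories that were popular with everyone in that period, which is how the user's real interest is separated from the news trend.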
13 Prediction of users’ current news interests
14 News Recommendations
15 Online Traffic Experiment
16 Conclusion and follow-up work