Deep Dive into Netflix’s Recommender System

Category: IT Technology · Published: 6 years ago

For most people today, Netflix is synonymous with streaming movies and TV shows. What most people do not know, however, is that Netflix started out in the late 1990s with a subscription-based model, mailing DVDs to people’s homes in the US.

The Netflix Prize

In 2000, Netflix introduced personalised movie recommendations, and in 2006 it launched the Netflix Prize, a machine learning and data mining competition with a $1 million prize. Back then, Netflix used Cinematch, its proprietary recommender system, which had a root mean squared error (RMSE) of 0.9525, and challenged people to beat this benchmark by 10%. The team that achieved this target, or came closest to it after a year, would be awarded prize money.
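To make the benchmark concrete, here is a minimal sketch of how RMSE is computed and what a 10% improvement over Cinematch's 0.9525 works out to. The ratings below are made up purely for illustration.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up star ratings, for illustration only.
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 3.0]
example_rmse = rmse(predicted, actual)

# The Netflix Prize target: an RMSE 10% lower than Cinematch's.
CINEMATCH_RMSE = 0.9525
target_rmse = CINEMATCH_RMSE * (1 - 0.10)
```

The target comes out at roughly 0.857, which shows how small the gap is, in absolute terms, between the benchmark and a prize-winning model.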

The winner of the Progress Prize a year later, in 2007, used a linear combination of Matrix Factorisation (a.k.a. SVD) and Restricted Boltzmann Machines (RBM), achieving an RMSE of 0.88. Netflix then put those algorithms into production after some adaptations to the source code. It is worth noting that although some teams achieved an RMSE of 0.8567 in 2009, the company did not put those algorithms into production because of the engineering effort required for the marginal gain in accuracy. This illustrates an important point about real-life recommender systems: model improvements must always be weighed against the engineering effort they require.

Streaming — the new way of consumption

A more important reason why Netflix did not incorporate the improved models from the Netflix Prize is that it introduced streaming in 2007. With streaming, the amount of data available to Netflix surged dramatically, and it had to change the way its recommender system ingested data and generated recommendations.

Source: Recent Trends in Personalization — A Netflix Perspective

Fast forward to 2020: Netflix has transformed from a service mailing DVDs in the US into a global streaming service with 182.8 million subscribers. Consequently, its recommender system has evolved from a regression problem (predicting ratings) to a ranking problem, then to a page-generation problem, and finally to the problem of maximising user experience (measured as the number of hours streamed, which means personalising everything that can be personalised). The main question that this article aims to address is:

What is Netflix using as its recommender system?

Netflix as a Business

Netflix has a subscription-based model. Simply put, the more members (the term used by Netflix, synonymous with users/subscribers) Netflix has, the higher its revenue. Revenue can be seen as a function of three things:

Acquisition rate of new users
Cancellation rates
Rate at which former members rejoin

How important is Netflix’s Recommender System?

About 80% of stream time comes through Netflix’s recommender system, which is a highly impressive number. Moreover, Netflix believes that a better user experience improves retention, which in turn translates to savings on customer acquisition (estimated at $1B per year as of 2016).

Netflix Recommender System

How does Netflix rank titles?

It is quite clear that Netflix utilises a two-tiered row-based ranking system, where ranking happens:

Within each row (strongest recommendations on the left)
Across rows (strongest recommendations on top)

Source: Netflix Tech Blog

Each row highlights a particular theme (e.g. Top 10, Trending, Horror, etc), and is typically generated using one algorithm. Each member’s homepage consists of approximately 40 rows of up to 75 items, depending on the device the member is using.

Why Rows?

The advantages can be seen from two perspectives: 1) For a user, a row of similar items is more coherent, and it is easier to decide whether he or she is interested in watching something in that category; 2) For the company, it is easier to collect feedback, as a right-scroll on a row indicates interest whilst a scroll-down (ignoring the row) indicates non-interest (not necessarily irrelevance).

Fun fact: did you know that artwork is personalised based on your profile and preferences as well?

What algorithms are used?

Netflix uses a variety of rankers mentioned in its paper, though the specifics of each model’s architecture are not given. Here is a summary of them:

Personalised Video Ranking (PVR): This is a general-purpose algorithm that usually filters the catalog down by a certain criterion (e.g. Violent TV Programmes, US TV shows, Romance, etc), combined with side features including user features and popularity.

Example of PVR generated items

Top-N Video Ranker: Similar to PVR, except that it ranks the entire catalog and focuses only on the head of the ranking. It is optimised using metrics that look at the head of the catalog rankings (e.g. MAP@K, NDCG).

Example of Top-N ranker generated titles
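The head-of-ranking metrics mentioned above can be sketched as follows. This is a minimal illustration, not Netflix's evaluation code: the relevance labels are made up, and average precision is normalised by the number of hits found in the top K (other conventions exist). MAP@K is the mean of this value across members.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def average_precision_at_k(relevances, k):
    """Average precision over the top-k positions, for binary relevance labels."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevances[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Made-up labels: 1 means the member played the title shown at that position.
ranked_relevance = [1, 0, 1, 0, 0]
ap = average_precision_at_k(ranked_relevance, k=5)
ndcg = ndcg_at_k(ranked_relevance, k=5)
```

Both metrics reward placing relevant titles near the front, which matches a ranker that only cares about the first few positions of a row.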

Trending Now Ranker: This algorithm captures temporal trends, which Netflix has found to be strong predictors. These short-term trends can last from a few minutes to a few days. These events/trends are typically:

Events that have a seasonal trend and repeat themselves (e.g. Valentine’s Day leads to an uptick in Romance videos being consumed)
One-off, short-term events (e.g. Coronavirus or other disasters, leading to short-term interest in documentaries about them)

Example of Trending Now ranker generated titles

Continue Watching Ranker: This algorithm looks at items that the member has consumed but has not completed, typically:

Episodic content (e.g. drama series)
Non-episodic content that can be consumed in small bites (e.g. movies that are half-completed, series that are episode independent such as Black Mirror)

The algorithm calculates the probability that the member will continue watching, taking into account other context-aware signals (e.g. time elapsed since viewing, point of abandonment, device watched on, etc).

Example of Continue Watching ranker generated titles
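As a rough illustration of turning such signals into a probability, here is a minimal logistic-style sketch. The feature names, the weights, and the titles are all hypothetical and hand-picked; the source does not say what model Netflix uses here, and a real model would learn its parameters from data.

```python
import math

# Hypothetical, hand-picked weights for illustration only.
WEIGHTS = {
    "bias": 1.0,
    "days_since_last_view": -0.15,  # interest decays as time passes
    "fraction_watched": 1.2,        # further in means more likely to resume
    "same_device": 0.5,             # browsing on the device it was watched on
}


def resume_probability(days_since_last_view, fraction_watched, same_device):
    """Logistic score for how likely the member is to resume a title."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["days_since_last_view"] * days_since_last_view
        + WEIGHTS["fraction_watched"] * fraction_watched
        + WEIGHTS["same_device"] * (1.0 if same_device else 0.0)
    )
    return 1.0 / (1.0 + math.exp(-z))


def rank_continue_watching(candidates):
    """Order unfinished titles by estimated resume probability, highest first."""
    return sorted(
        candidates,
        key=lambda c: resume_probability(
            c["days_since_last_view"], c["fraction_watched"], c["same_device"]
        ),
        reverse=True,
    )


# Made-up unfinished titles.
candidates = [
    {"title": "Series B", "days_since_last_view": 30, "fraction_watched": 0.1, "same_device": False},
    {"title": "Series A", "days_since_last_view": 1, "fraction_watched": 0.5, "same_device": True},
]
ranked = rank_continue_watching(candidates)
```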

In a presentation [2], Justin Basilico discussed the use of RNNs for time-sensitive sequence prediction, which I believe is used in this algorithm. He suggested that Netflix could use a particular member’s past plays, together with contextual information, to predict what the member’s next play might be. In particular, using continuous time together with discrete time context as input performs best.

Source: Clipped slide from [2]

Video-Video Similarity Ranker, a.k.a. Because You Watched (BYW)

This algorithm essentially resembles a content-based filtering algorithm. Based on an item consumed by the member, the algorithm computes other similar items (using an item-item similarity matrix) and returns the most similar ones. Unlike the other algorithms, this one is unpersonalised, as no other side features are utilised. However, it is personalised in the sense that displaying a particular item’s similar items on a member’s homepage is a conscious choice (more details in Page Generation below).

Example of BYW generated titles
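The item-item lookup described above can be sketched with cosine similarity over item feature vectors. The titles and feature values below are made up, and cosine similarity is an assumption on my part; the source does not state how Netflix builds its similarity matrix.

```python
import math

# Hypothetical item feature vectors (e.g. genre weights); all values are made up.
ITEM_FEATURES = {
    "Title A": [1.0, 0.0, 0.8],
    "Title B": [0.9, 0.1, 0.7],
    "Title C": [0.0, 1.0, 0.1],
    "Title D": [0.5, 0.5, 0.5],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def because_you_watched(seed_title, item_features, top_n=2):
    """Return the top_n titles most similar to the one the member watched."""
    seed = item_features[seed_title]
    scored = [
        (title, cosine_similarity(seed, features))
        for title, features in item_features.items()
        if title != seed_title
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [title for title, _ in scored[:top_n]]


byw_row = because_you_watched("Title A", ITEM_FEATURES)
```

Note that nothing about the member enters the computation: the same seed title yields the same row for everyone, which is why the ranker itself is described as unpersonalised.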

Row Generation Process

Each of the above algorithms goes through the row generation process seen in the image below. For example, if PVR is looking at Romance titles, it will find candidates that fit this genre and, at the same time, come up with evidence to support the presentation of a row (e.g. Romance movies that the member has previously watched). From my understanding, this evidence selection algorithm is incorporated into (or used together with) every other ranking algorithm listed above to create a more curated ranking of items (see the image of Netflix’s model workflow below).

This evidence selection algorithm uses “all the information [Netflix] shows on the top left of the page, including the predicted star rating that was the focus on the Netflix prize; the synopsis; other facts displayed about the video, such as any awards, cast or other metadata; and the images [Netflix] use to support [their] recommendations in the rows and elsewhere in the UI. [1] ”

Each of the five algorithms goes through the same row generation process, as seen in the image below.

Source: Netflix Tech Blog [3]

Page Generation

After the algorithms generate candidate rows (with items already ranked within each row), how does Netflix decide which of these tens of thousands of rows to display?

Netflix’s Model Workflow

Historically, Netflix has used a template-based approach to tackle the problem of page generation, which is essentially a massive bloodbath of rows competing for precious screen real estate. The task requires not only accuracy but also diversity, accessibility and stability at the same time. Other considerations include hardware capabilities (what device is being used) and which rows/columns are visible at first glance and upon scrolling.

This means that Netflix wants to accurately predict what members want to watch in that session, without forgetting that they might want to pick up videos they left off halfway. At the same time, it wants to highlight the depth of its catalog by providing something fresh, and perhaps capture trends that are going on in the member’s region. Finally, stability is necessary once members have interacted with Netflix for a while and are used to navigating the page in a certain manner.

With all these requirements, one can see why a template-based approach works quite well as a starting point: it guarantees that a fixed set of criteria is met at all times. However, having many such rules in place naturally led Netflix into a local optimum in terms of providing a good member experience.

How then do we approach this row ranking problem?

Row-based approach

The row-based approach uses existing recommendation or learning-to-rank approaches to score each row, then ranks the rows by those scores. This approach can be relatively fast but lacks diversity. A member might end up seeing a page full of rows that each match his/her interests but are very similar to one another. How do we then incorporate diversity?

Stage-wise approach

An improvement on the row-based approach is a stage-wise approach, where each row is scored as in the method above. However, rows are selected sequentially starting from the first, and whenever a row is selected, the scores of the remaining rows are recomputed to take into account their relationship to both the previous rows and the items already chosen for the page. This is a simple greedy stage-wise approach.
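The difference between row-based ranking and greedy stage-wise selection can be sketched as follows. The row names, scores, Jaccard overlap penalty and `diversity_weight` are all assumptions chosen for illustration; the source does not specify how Netflix rescores rows.

```python
def jaccard(items_a, items_b):
    """Overlap between two rows' item sets, from 0 (disjoint) to 1 (identical)."""
    a, b = set(items_a), set(items_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_rows_row_based(candidate_rows, num_rows):
    """Row-based baseline: sort rows by their own score and take the top ones."""
    ranked = sorted(candidate_rows, key=lambda name: candidate_rows[name]["score"], reverse=True)
    return ranked[:num_rows]


def select_rows_greedy(candidate_rows, num_rows, diversity_weight=0.5):
    """Greedy stage-wise selection: rescore remaining rows after each pick."""
    remaining = set(candidate_rows)
    page = []

    def adjusted_score(name):
        # Penalise a row by its largest overlap with any row already on the page.
        row = candidate_rows[name]
        overlap = max(
            (jaccard(row["items"], candidate_rows[chosen]["items"]) for chosen in page),
            default=0.0,
        )
        return row["score"] - diversity_weight * overlap

    while remaining and len(page) < num_rows:
        best = max(remaining, key=adjusted_score)
        page.append(best)
        remaining.remove(best)
    return page


# Made-up candidate rows with hypothetical relevance scores.
rows = {
    "Romantic Comedies": {"score": 0.90, "items": ["a", "b", "c"]},
    "Romance": {"score": 0.85, "items": ["a", "b", "d"]},
    "Documentaries": {"score": 0.70, "items": ["x", "y", "z"]},
}
row_based_page = select_rows_row_based(rows, num_rows=2)
greedy_page = select_rows_greedy(rows, num_rows=2)
```

With these numbers the baseline fills the page with two near-duplicate romance rows, while the greedy variant swaps the second one for a less similar row.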

We could improve this by using a k-row lookahead approach, where we consider the next k rows when computing the score for each row. However, neither of these approaches is guaranteed to reach a global optimum.

Machine Learning Approach

The approach that Netflix uses is a machine learning one: it aims to create a scoring function by training a model on historical information about the homepages it has created for its members, including what they actually saw, how they interacted with the page and what they played.

Of course, there are many other features and ways that can represent a particular row in the homepage for the algorithm. It could be as simple as using all the item metadata (as an embedding) and aggregating them, or indexing them by position. Regardless of what features are used to represent the page, the main goal is to generate hypothetical pages and see what items the user would have interacted with. Scoring is then done using page-level metrics, such as Precision@m-by-n and Recall@m-by-n (which are adaptations of the Precision@k and Recall@k but in a two-dimensional space).
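One way to read these page-level metrics is sketched below, assuming a page is a list of rows and that the first m rows and n columns form the window a member sees. The page layout and the set of relevant items are made up, and the exact definitions Netflix uses may differ.

```python
def items_in_window(page, m, n):
    """Items in the first m rows and first n columns of a page."""
    return [item for row in page[:m] for item in row[:n]]


def precision_at_m_by_n(page, relevant, m, n):
    """Share of the visible m-by-n window that is relevant to the member."""
    window = items_in_window(page, m, n)
    if not window:
        return 0.0
    hits = sum(1 for item in window if item in relevant)
    return hits / len(window)


def recall_at_m_by_n(page, relevant, m, n):
    """Share of the member's relevant items that appear in the m-by-n window."""
    if not relevant:
        return 0.0
    window = set(items_in_window(page, m, n))
    return len(window & set(relevant)) / len(relevant)


# A made-up page of three rows; the member actually played "a", "e" and "i".
page = [
    ["a", "b", "c"],
    ["d", "e", "f"],
    ["g", "h", "i"],
]
relevant = {"a", "e", "i"}
precision = precision_at_m_by_n(page, relevant, m=2, n=2)
recall = recall_at_m_by_n(page, relevant, m=2, n=2)
```

Varying m and n lets the same hypothetical page be scored for different devices, since a phone shows a much smaller window than a TV.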

Cold-start, Deployment and Big Data

Cold-start Problem

The age-old cold-start problem: Netflix has it too. Traditionally, Netflix tries to mitigate it by asking new members to fill out a survey about their preferences to ‘jump start’ the recommendations [6]. If this step is skipped, the recommendation engine provides a diverse and popular set of titles instead.

Also, during this Covid-19 period, Netflix Party (a third-party Chrome extension) was created. In my opinion, this kind of co-viewing data could do a lot to ease the cold-start problem if it were made available to Netflix for analysis.

Simply put, Netflix used to be a single-person activity (at least as far as Netflix could monitor). You could be at home watching a title alone or with a group of friends, but Netflix had no idea who you were physically watching it with. With data like that from Netflix Party, Netflix could potentially build a graph of who you have interacted with, and apply a collaborative-filtering-like algorithm to make recommendations to new users as well.

It’s All A/Bout Testing

The gap between offline evaluation and online evaluation remains. Whilst offline metrics help evaluate how well a model performs on historical data, there is no guarantee that those results will translate into actual improvements in user experience (i.e. total watch time). As such, the Netflix team has an efficient A/B testing process in place to quickly test the new algorithms it builds.

Do bear in mind that A/B testing is itself an art, as there are many variables to consider, including how to select the control and test groups, how large each group should be, which metrics to use, how to determine whether a result is statistically significant (i.e. whether the change really improves the overall user experience), and many more.
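As one example of a significance check, here is a minimal sketch of a two-proportion z-test. The choice of metric (share of members retained) and every number below are made up for illustration; the source does not describe the statistical tests Netflix actually uses.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value), where a positive z means group B has the higher rate.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up experiment: members retained out of 10,000 in each group.
z, p_value = two_proportion_z_test(
    successes_a=8000, n_a=10000,  # control: current algorithm
    successes_b=8150, n_b=10000,  # test: new algorithm
)
is_significant = p_value < 0.05
```

A small p-value only says the difference is unlikely to be noise; whether the effect is large enough to justify shipping the model is a separate judgement.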

Fundamentally, offline evaluation helps Netflix determine when to put models into an A/B test and which models to test. You can read more about Netflix’s A/B testing experimentation process in [4].

Data, data, data and more data

With online streaming, the amount of data that Netflix manages and has access to is vast. Managing this much data is only possible with the right architecture, namely one that separates offline, online and nearline computation.

With offline computation, there are fewer limitations on the amount of data and the computational complexity of the algorithms, since it runs in batches with relaxed timing requirements. However, its results can easily grow stale between updates because the most recent data is not incorporated. For personalisation architectures, a key issue is combining online and offline computation in a seamless manner.

With online computation, we expect a response to recent events and user interactions, so it has to happen in real time. Therefore, online computation cannot be too complex or computationally costly. A fallback mechanism, such as reverting to a precomputed result, is also necessary.
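The fallback idea can be sketched as an online call with a time budget that reverts to a precomputed (offline) result when the ranker is too slow or fails. The store, the ranker functions and the budget below are hypothetical and only illustrate the pattern.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Hypothetical offline results, e.g. written by a nightly batch job.
PRECOMPUTED = {"member-1": ["Title A", "Title B"]}

_executor = ThreadPoolExecutor(max_workers=4)


def recommend(member_id, online_ranker, budget_seconds=0.2):
    """Try the online ranker within a time budget, else use precomputed results."""
    future = _executor.submit(online_ranker, member_id)
    try:
        return future.result(timeout=budget_seconds)
    except FutureTimeout:
        future.cancel()  # best effort; a call that is already running keeps going
    except Exception:
        pass  # the online ranker failed, so fall through to the fallback
    return PRECOMPUTED.get(member_id, [])


def fast_ranker(member_id):
    """Stand-in for an online model that answers within the budget."""
    return ["Title C", "Title A"]


def slow_ranker(member_id):
    """Stand-in for an online model that misses the budget."""
    time.sleep(0.6)
    return ["Title D"]


fresh = recommend("member-1", fast_ranker)
fallback = recommend("member-1", slow_ranker)
```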

With nearline computation, we have an intermediate compromise between the two approaches: it can perform online-like computations but does not need to serve them in real time, which allows computing and serving to be asynchronous. This opens the door to more complex processing per event, such as updating recommendations to reflect that a movie has been watched immediately after a member begins to watch it. This is useful for incremental learning algorithms.
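The nearline pattern can be sketched as an event queue that is drained separately from serving. The in-memory queue, the event shape and the recommendation store below are hypothetical stand-ins for whatever infrastructure a real system would use.

```python
from collections import deque

# Hypothetical store of recommendations that the serving layer reads from.
recommendations = {"member-1": ["Title A", "Title B", "Title C"]}

# Events are queued when they happen and processed shortly afterwards.
event_queue = deque()


def publish(event):
    """Record an event without blocking the request that produced it."""
    event_queue.append(event)


def process_pending_events():
    """Drain the queue and update stored recommendations per event."""
    processed = 0
    while event_queue:
        event = event_queue.popleft()
        if event["type"] == "play_started":
            current = recommendations.get(event["member_id"], [])
            # Drop the title the member has just started watching.
            recommendations[event["member_id"]] = [
                title for title in current if title != event["title"]
            ]
        processed += 1
    return processed


publish({"type": "play_started", "member_id": "member-1", "title": "Title B"})
before_processing = list(recommendations["member-1"])  # serving still sees the old list
processed_count = process_pending_events()
after_processing = recommendations["member-1"]
```

Publishing returns immediately, and the update lands whenever the queue is next drained, which is what makes computing and serving asynchronous.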

Below is a detailed architecture diagram of Netflix’s system.

Source: System Architectures for Personalization and Recommendation [5]

For a more in-depth view of how these individual components are used, please read the blog post cited above.

Conclusion

Source: UX Planet: Binging on the Algorithm

That said, it’s time to binge-watch boys! Stay safe amid these tough times of Covid-19. Thank God for Netflix.

References

[1] The Netflix Recommender System

[2] Recent Trends in Personalization: A Netflix Perspective

[3] Learning a Personalized Homepage

[4] It’s All A/Bout Testing: The Netflix Experimentation Platform

[5] Selecting the best artwork for videos through A/B testing

[6] How Netflix’s Recommendation System Works
