A Case For Embeddings In Recommendation Problems


Once you have worked on different machine learning problems, most things in the field start to feel very similar. You take your raw input data, map it to a different latent space with fewer dimensions, and then perform your classification/regression/clustering. Recommender systems, new and old, are no different. In the classic collaborative filtering problem, you factorize your partially filled usage matrix to learn user-factors and item-factors, and try to predict user ratings with a dot product of the factors.
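In the usual notation (the standard textbook formulation, not something specific to this post), the predicted rating is the dot product of the two factor vectors, and the factors are fit to the observed entries, typically with an L2 regularizer:

```latex
\hat{r}_{ui} = p_u^\top q_i, \qquad
\min_{P,\,Q} \sum_{(u,i)\ \mathrm{observed}} \left(r_{ui} - p_u^\top q_i\right)^2
  + \lambda \left(\lVert p_u \rVert^2 + \lVert q_i \rVert^2\right)
```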


This has worked well for many people at different companies, and I have also had success with it firsthand at Flipboard . And of course, people try to incorporate more signals into this model to get better performance on cold-start and other domain-specific problems .

However, I didn’t really care about using fancy deep learning techniques for my recommendation problems until a friend asked me a very simple question at a conference a few years ago. If I recall correctly, he questioned my use of a certain regularizer , and I soon realized that his clever suggestion required me to go back to the whiteboard, recompute all the gradients and optimization steps, and essentially reimplement the core algorithm from scratch to test a relatively straightforward modification. I wasn’t writing PyTorch code; I only wished I had been.

Enter AutoGrad and Embeddings

So, as it turns out, the classic matrix-factorization problem can be formulated as a deep learning problem if you just think of the user-factors and item-factors as embeddings . An embedding is simply a mapping from a discrete-valued list to a real-valued, lower-dimensional vector ( cough ). Looking at the problem from this perspective gives you a lot more modelling flexibility, thanks to the many great autograd libraries out there. If you randomly initialize these embeddings and define mean squared error as your loss, backpropagation will get you embeddings that are very similar to what you would get with matrix factorization.
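Here is a minimal sketch of that idea in PyTorch. The model, layer sizes, and the toy batch are all illustrative assumptions, not code from the original post:

```python
# A minimal sketch of matrix factorization as an embedding model.
# Assumes a batch of (user_id, item_id, rating) triples; the variable
# names and hyper-parameters here are illustrative.
import torch
import torch.nn as nn

class MFAsEmbeddings(nn.Module):
    def __init__(self, n_users, n_items, n_factors=32):
        super().__init__()
        # user-factors and item-factors are just two embedding tables
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.item_factors = nn.Embedding(n_items, n_factors)

    def forward(self, user_ids, item_ids):
        # predicted rating is the dot product of the two factor vectors
        u = self.user_factors(user_ids)
        v = self.item_factors(item_ids)
        return (u * v).sum(dim=1)

model = MFAsEmbeddings(n_users=1000, n_items=5000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# toy batch: user 3 rated item 7 with 4.0, user 12 rated item 42 with 2.5
users = torch.tensor([3, 12])
items = torch.tensor([7, 42])
ratings = torch.tensor([4.0, 2.5])

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(users, items), ratings)
    loss.backward()      # autograd computes all the gradients
    optimizer.step()
```

The whole "algorithm" is a forward pass plus a generic optimizer; swapping in a different loss or regularizer is a one-line change rather than a fresh derivation on the whiteboard.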


But as Justin Basilico showed in his informative ICML workshop talk , modelling the problem as a deep feed-forward network makes the learning task a lot trickier: with more parameters and hyper-parameters, it requires more compute while providing only questionable improvements on the actual task. So why should we bother thinking of the problem this way?

I would argue that modelling flexibility and ease of experimentation are nothing to be scoffed at. This perspective allows you to incorporate all sorts of data into the framework fairly easily. Recommendation is also more than just predicting user ratings, and you can solve many other recommendation problems, such as sequence-aware recommendation , a lot more easily. Not to mention that, because of autograd software, you end up with much shorter code that lets you tweak things a lot quicker. I like optimizing my matrix factorization with conjugate gradient as much as the next person, but please don’t ask me to recompute my CG steps after you add some new data and change your regularizer in the year 2020.

Other Ways To Learn Embeddings

The other great thing about embeddings is that there are several different ways of learning this mapping. If you don’t want to learn embeddings through random initialization and backpropagation from an input matrix, one very common approach is skip-gram with negative sampling . This method has been extremely popular in natural language processing, and has also been successful in creating embeddings from non-textual sequences such as graph nodes , video games and Pinterest pins .

The core idea in skip-gram with negative sampling is to build a dataset of positive examples by sliding a context window through a sequence and pairing each central item with the items that co-occur with it, and a dataset of negative examples by randomly sampling items from the entire corpus and pairing them with items they do not usually co-occur with in the same window.
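A rough sketch of that data generation, with a made-up corpus and window size:

```python
# Illustrative sketch of building SGNS training pairs from item sequences.
# The sequences, vocabulary, and sampling scheme are assumptions for the example.
import random

def make_sgns_pairs(sequences, vocab, window=2, negatives_per_positive=2):
    positives, negatives = [], []
    for seq in sequences:
        for i, center in enumerate(seq):
            # positive pairs: items co-occurring with the center item in the window
            context = seq[max(0, i - window): i] + seq[i + 1: i + 1 + window]
            for item in context:
                positives.append((center, item))
                # negative pairs: items randomly sampled from the whole corpus
                for _ in range(negatives_per_positive):
                    negatives.append((center, random.choice(vocab)))
    return positives, negatives

sequences = [["a", "b", "c", "d"], ["b", "d", "e"]]
vocab = ["a", "b", "c", "d", "e"]
pos, neg = make_sgns_pairs(sequences, vocab)
```

In practice you would usually sample negatives from a frequency-weighted distribution and exclude items that actually appear in the current window, but the uniform version above is enough to show the shape of the data.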


Once you have the dataset with both positive and negative examples, you simply train a classifier with a small neural network and learn your embeddings . In this formulation, things that co-occur close to each other end up with similar embeddings, which is usually what we need for most search and recommendation tasks.
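And here is a minimal sketch of that classifier, again in PyTorch with illustrative sizes and IDs: each pair is labelled 1 if it came from a context window and 0 if it was negatively sampled, and the score for a pair is just the dot product of the two embeddings:

```python
# Illustrative SGNS classifier: dot-product scores trained with logistic loss.
import torch
import torch.nn as nn

class SkipGramModel(nn.Module):
    def __init__(self, n_items, dim=64):
        super().__init__()
        self.center = nn.Embedding(n_items, dim)
        self.context = nn.Embedding(n_items, dim)

    def forward(self, center_ids, context_ids):
        # raw logits; a sigmoid turns them into co-occurrence probabilities
        return (self.center(center_ids) * self.context(context_ids)).sum(dim=1)

model = SkipGramModel(n_items=10_000)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

centers = torch.tensor([3, 3, 3])      # center items
contexts = torch.tensor([7, 9, 512])   # two true contexts, one negative sample
labels = torch.tensor([1.0, 1.0, 0.0])

optimizer.zero_grad()
loss = loss_fn(model(centers, contexts), labels)
loss.backward()
optimizer.step()
# model.center.weight now holds the learned item embeddings
```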

For recommendation, there are many different ways to create these sequences. Airbnb has a great paper on how they collect sequences of listings from a user’s sequential clicks on listings during a search/booking session to learn item embeddings. Alibaba has another interesting approach: they maintain an item-item interaction graph, where an edge from item A to item B indicates how often a user clicked on item B after item A, and then use random walks on the graph to generate sequences.
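The random-walk idea is easy to picture with a toy sketch; the graph, weights, and walk length below are made up for illustration, and this shows the general technique rather than Alibaba's exact procedure:

```python
# Toy weighted item-item click graph and random walks over it;
# the resulting walks are sequences you can feed to skip-gram.
import random

graph = {                       # item -> {next_item: click_count}
    "A": {"B": 5, "C": 1},
    "B": {"C": 3, "A": 1},
    "C": {"A": 2},
}

def random_walk(graph, start, length=5):
    walk = [start]
    for _ in range(length - 1):
        neighbors = graph.get(walk[-1])
        if not neighbors:
            break
        items, weights = zip(*neighbors.items())
        # transition probability proportional to how often B followed A
        walk.append(random.choices(items, weights=weights, k=1)[0])
    return walk

sequences = [random_walk(graph, random.choice(list(graph))) for _ in range(100)]
```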

So What’s So Cool About These Embeddings?

In addition to the task at hand that each of those representations helps solve (such as finding similar items ), they are modular and amenable to transfer learning . One great thing about deep learning has been that you almost never have to start solving a problem from scratch, and all of these different embeddings are great starting points for new problems. If you wanted to build a new classifier (say, a spam detector), you could use your item embeddings as a starting point and train a model much quicker with some basic fine-tuning .
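A hedged sketch of what that fine-tuning might look like, assuming you already have a tensor of pretrained item vectors from one of the models above (the names, the classification head, and the random stand-in tensor are illustrative):

```python
# Reusing pretrained item embeddings for a new task (e.g. a spam detector):
# initialize an embedding layer from the existing vectors and fine-tune it
# together with a small classification head.
import torch
import torch.nn as nn

pretrained_item_vectors = torch.randn(10_000, 64)  # stand-in for real embeddings

class SpamClassifier(nn.Module):
    def __init__(self, pretrained):
        super().__init__()
        # start from the existing embeddings instead of random initialization
        self.item_emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.head = nn.Linear(pretrained.size(1), 1)

    def forward(self, item_ids):
        return self.head(self.item_emb(item_ids)).squeeze(-1)

model = SpamClassifier(pretrained_item_vectors)
# train with BCEWithLogitsLoss on (item_id, is_spam) labels as usual
```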

These modular mappings to latent spaces have been extremely useful for me: in addition to solving some recommendation problems, I have also been able to reuse and fine-tune these embeddings to solve many different end tasks. Storing these embeddings in a centralized model store further helps teams reduce redundancy and gives them good foundations to build on for many problems.

While I hadn’t initially bought into the whole deep-learning-for-recommender-systems craze, I am starting to see past the minimal performance gains on the original task, and I highly recommend that everyone play around with this (still relatively new) paradigm in recommender systems!

