Los Angeles Ruby Conference 2013

Building a Recommendation System with Ruby

Ryan Weald

Transcript

Excerpt from the automatic transcript of the video generated by YouTube.

Alright, so today I'm going to talk about building recommendation systems in Ruby. But first, you're probably wondering: who is this guy, and what does he know about recommendation systems? I'm far too young to have a PhD, so what gives me the authority to talk to you guys about the subject?

I'm currently a data scientist at a start-up in San Francisco called Sharethrough. We're a native advertising platform, so basically what we do is take brand content from throughout the web and promote it on other websites. The important thing there is that our goal is to make it feel native, which means it has to feel like content on that site, which means ultimately our ad targeting comes down to content recommendations. So a large part of my job is building content recommendation systems so that we can power the ads we hope to serve.

So before I get going, I have to give you all a warning: there's going to be a little bit of math coming up. I know it's late in the day, so I'll try to keep it like

as light as possible, but obviously recommendation systems are kind of a math-heavy subject, so we're going to have to look at a little bit of math.

My goal with this talk is basically to start off by describing what a recommendation system is; we have to understand what they are. Then I'm going to look at collaborative filtering based recommendation systems, probably the most common kind and probably the term you've heard before. Then I'm going to move on to content-based recommendation systems, and then I'm going to look at hybrid algorithms, basically getting the best of both worlds by combining collaborative filtering and content-based. Then I'm going to touch on how you evaluate recommendation systems, and finally I'm gonna give you some resources and point out a couple of existing libraries, so that if you want to learn more about things I don't have enough time to cover, you can go and find out more information on your own.

So what this talk is not going to be, though: it's not

going to be everything there is to know about recommendation systems. Obviously that's a very complex subject: there are tons of people doing PhDs on it, and companies employ entire departments. My goal is really to give you an overview so you can have a good foundation in what's going on behind the scenes of these common algorithms, so that you know enough to look further on your own and don't have to view the whole thing as a big black box.

And then obviously, as a result, it's not going to be bleeding-edge machine learning. This is not the right audience for that; I like geeking out on that, but you probably wouldn't care and would probably find it a little bit boring. It's also not going to be how to use a specific library. I feel like it's really important when you're doing this kind of stuff to take that step inside the library to understand what the algorithm really does, because otherwise you'll come to a point in time when you're using this library, it doesn't work,

and you have absolutely no idea how to fix it or why it isn't working.

So let's start off with: what is a recommendation system? Simply put, a recommendation system is a program that predicts a user's preferences using information about other users, the user themselves, and the items in your system. These are prevalent throughout the web; pretty much every big company is using them, in every domain. One of the best examples in the social space is LinkedIn: people you may know, organizations you might be interested in, all these things are recommendations. Netflix ran a million-dollar bounty for improving their recommendation system, and you see it pretty much every time you log in: the top 10 movies they think you're most likely to like. It helps them drive engagement with their platform and keep users watching movies. If you guys like Spotify, they're also doing recommendations; radio at its core is really just a recommendation product: what other songs are you likely to want to listen to, based on the track you previously listened to? And finally, the most common is Amazon's "customers who bought this item also bought", which clearly shows I read a lot of machine learning books.

So you're probably wondering: how do I build one of these

things? What is really underneath the surface of all these products that we see every day? It turns out there are really two main categories of algorithms that power most of these recommendation systems: collaborative filtering based algorithms, oftentimes called nearest-neighbor or neighborhood-based algorithms, and content-based algorithms, sometimes just called classification, because at the core you're really just doing a classification task.

So we're going to start out by looking

at collaborative filtering. Collaborative filtering is a way of filling in missing user preferences based on similar users or similar items. So if I haven't rated movie A, we can use information about other users to fill in, or really infer, my preference for movie A.

Within collaborative filtering there are really two types of algorithms: so-called memory-based and model-based. Memory-based uses similarity metrics between users or items to infer these preferences; in this case the data set is usually kept in memory. Then we have model-based, which tends to be much more complicated and trains a classifier or a collaborative filtering algorithm offline; really what you're doing is generating an algorithm that kind of explains the underlying phenomenon and helps fill in these blanks. I'm actually not going to talk about model-based algorithms today, because they tend to be much more complicated and, unfortunately, Ruby at the moment doesn't really have the tools to do model-based algorithms in a 40-minute talk.

So I'm going to talk about memory-based. With memory-based, the most common thing you'll hear is user-based collaborative filtering. User-based collaborative

filtering kind of sounds scary, but when you break it down it's actually really two very simple components: a user-item matrix and a similarity function. Using these two things, what we get back is the top K most similar users to the user you're interested in providing a recommendation for.

The first part of that, the user-item matrix, is actually very simple. Here's an example using videos: we have five videos and five users, and what we can see is just the way users have rated these videos. Basically, the goal of our collaborative filtering is to fill in the values for those two question marks.

So now that we have that user-item matrix, the next thing you're going to need to build one of these algorithms is a similarity function. The similarity function is what you're going to use to determine which users are like you, or like the user you're trying to provide a recommendation for. There are two really common similarity functions you're likely

[ ... ]
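
The two components of user-based collaborative filtering described in the excerpt, a user-item matrix and a similarity function that yields the top K most similar users, can be sketched in Ruby roughly as follows. This is a minimal illustration, not code from the talk: the user names, video ids and ratings are invented, the method names `cosine_similarity` and `top_k_similar_users` are hypothetical, and cosine similarity is only an assumed choice, since the excerpt ends before the speaker names the similarity functions he has in mind.

```ruby
# Hypothetical user-item matrix: user => { video => rating on a 1-5 scale }.
# A missing key means the user has not rated that video yet.
RATINGS = {
  "alice" => { "v1" => 5, "v2" => 3, "v3" => 4 },
  "bob"   => { "v1" => 5, "v2" => 3, "v3" => 4, "v4" => 2 },
  "carol" => { "v1" => 1, "v2" => 5, "v4" => 4 },
  "dave"  => { "v3" => 2, "v4" => 5, "v5" => 1 }
}.freeze

# Cosine similarity between two users' rating hashes (an assumed metric).
# Unrated videos are treated as 0, so only co-rated videos add to the
# dot product.
def cosine_similarity(a, b)
  dot = (a.keys & b.keys).sum { |item| a[item] * b[item] }
  norm_a = Math.sqrt(a.values.sum { |r| r * r })
  norm_b = Math.sqrt(b.values.sum { |r| r * r })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Returns the k users most similar to `user` as [name, score] pairs,
# ordered from most to least similar.
def top_k_similar_users(ratings, user, k)
  target = ratings.fetch(user)
  ratings
    .reject { |other, _| other == user }
    .map { |other, prefs| [other, cosine_similarity(target, prefs)] }
    .max_by(k) { |_, score| score }
end
```

With this made-up data, `top_k_similar_users(RATINGS, "alice", 2)` ranks "bob" first and "carol" second. Those neighbours' ratings are what a user-based recommender would then draw on to fill in alice's missing preferences.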

Note: the remaining 3,487 words of the full transcript have been omitted to comply with YouTube's "fair use" rules.