Which supervised learning algorithms are available for matching?

https://datascience.stackexchange.com/questions/12327

16-10-2019

Problem

I'm working with a non-profit that tries to help prospective university applicants by matching them with alumni who want to share their experience and wisdom. At the moment, the matching is done manually. So I have two tables, one with students and one with alumni (they may have some features in common, but not necessarily all of them):

$\begin{array}{|l|c|c|} \text{Name} & \text{Gender} & \text{Height} \\ \hline \text{Kathy} & F & 165 \\ \hline \text{Tommy} & M & 182 \\ \hline \text{Ruth} & F & 163 \\ \hline ... & ... & ... \\ \end{array}$ $\begin{array}{|l|c|c|} \text{Name} & \text{Gender} & \text{Weight} \\ \hline \text{Miss Lucy} & F & 65 \\ \hline \text{Miss Geraldine} & F & 70 \\ \hline \text{Miss Emily} & F & 60 \\ \hline ... & ... & ... \\ \end{array}$

Currently, we are matching the members of table 1 with those in table 2 by hand. We will also collect feedback after each match ("Was it a good match? Please rate it on a scale from 1 to 10"), so the data will look something like this:
$$ \begin{array}{|l|l|c|} \text{Person \#1} & \text{Person \#2} & \text{Match?} \\ \hline \text{Ruth} & \text{Miss Lucy} & N \\ \hline \text{Tommy} & \text{Miss Emily} & Y \\ \hline \text{Kathy} & \text{Miss Geraldine} & N \\ \hline \text{Ruth} & \text{Miss Emily} & N \\ \hline ... & ... & ... \\ \end{array}$$

I would like to use a learning algorithm for this process. I know a little machine learning but am still very much a novice (so this is also an opportunity for me to learn more), and I can't wrap my head around how you would do this kind of supervised learning when you have two sets, each with multiple features. What sort of matching algorithms are available for this? (Also, I prefer to work in R.)

(By the way, I would be grateful if you could just point me in the right direction; I'll try to read up on it and solve it myself. I also know how deeply frustrating it is to see questions that have already been answered, so if that is the case, please don't hesitate to let me know without answering the question. I have already tried searching Google and StackExchange with various queries, but mostly found lecture slides on graph theory that don't seem to be what I'm looking for, although that may just be because they are a bit over my head. Many thanks!)

Solution

You can try framing this as a recommender-system problem, where your users are the prospective students, your items are the alumni, and you want to recommend one item to each user.

It's not a perfect fit, since you want just one item per user and you don't have previous match data for each user. Still, the idea is worth investigating further: I'm applying these techniques to recruitment, matching users with job offers, and I'm having some success.

Try reading a bit about recommender systems. To start, I recommend chapter 9 of Mining of Massive Datasets; it's quite introductory, but it gives a good overview of the most common techniques.
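One of the techniques that chapter covers is content-based recommendation. Below is a minimal sketch of it, in Python rather than R, assuming each student and each alumnus can be described by the same set of features. The feature names (`wants_medicine`, `wants_engineering`, `first_generation`) and all profile values are hypothetical, made up for illustration. It scores every alumnus by cosine similarity to the student's profile and recommends the best one.

```python
import math
from typing import Dict, List

# Hypothetical column order for the binary profile vectors (not from the question's tables).
FEATURES = ["wants_medicine", "wants_engineering", "first_generation"]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(student: List[float], alumni: Dict[str, List[float]]) -> str:
    """Return the alumnus whose profile is most similar to the student's profile."""
    return max(alumni, key=lambda name: cosine(student, alumni[name]))


# Made-up profiles, for illustration only.
alumni_profiles = {
    "Miss Lucy": [1, 0, 1],
    "Miss Geraldine": [0, 1, 0],
    "Miss Emily": [0, 1, 1],
}
tommy = [0, 1, 1]
print(recommend(tommy, alumni_profiles))  # Miss Emily
```

Note that this recommends each student's favourite alumnus independently, so nothing stops many students from being sent to the same person; that is part of why it is not a perfect fit for one-to-one matching.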

Other tips

I would separate the problem into two parts:

1. Predicting whether a given pair will be a good match.
2. Matching the pairs.

First, let's discuss the prediction problem. I think you should treat it as a supervised learning problem rather than as a recommendation problem: as João Almeida wrote, new students won't have any previous relations with alumni.
Even the alumni will have very few previous relations. I would add to each alumnus some features based on aggregations (e.g., the number of past relations and the ratio of past good matches).
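A minimal sketch of those aggregate features, computed from the example match table in the question (with Y/N mapped to `True`/`False`). The function name `alumnus_aggregates` and the feature names `n_past` and `good_ratio` are illustrative choices, not an established API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Past matches as (student, alumnus, was_good_match), copied from the example table.
history: List[Tuple[str, str, bool]] = [
    ("Ruth", "Miss Lucy", False),
    ("Tommy", "Miss Emily", True),
    ("Kathy", "Miss Geraldine", False),
    ("Ruth", "Miss Emily", False),
]


def alumnus_aggregates(history: List[Tuple[str, str, bool]]) -> Dict[str, Dict[str, float]]:
    """Per-alumnus features: number of past matches and the share of them that were good."""
    n_past: Dict[str, int] = defaultdict(int)
    n_good: Dict[str, int] = defaultdict(int)
    for _student, alumnus, is_good in history:
        n_past[alumnus] += 1
        n_good[alumnus] += int(is_good)
    return {
        alumnus: {"n_past": n_past[alumnus], "good_ratio": n_good[alumnus] / n_past[alumnus]}
        for alumnus in n_past
    }


print(alumnus_aggregates(history)["Miss Emily"])  # {'n_past': 2, 'good_ratio': 0.5}
```

Two cautions: alumni with no history are absent from the result, so you need a default for them (e.g., 0 past matches and an "unknown" ratio), and when building training rows you should compute these aggregates only from matches that happened before the pair in question, otherwise the feature leaks the label.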

After that, build a dataset of the past pairs, using 'Match?' as the concept (the target label). It is not clear whether you will be able to learn a good matching rule, even if one exists; I suspect your dataset is quite small. If the probability of a good match is low, you might also have a class-imbalance problem. As AN6U5 commented, height and weight are rather strange features for matching students to alumni. Compute the relations between the features and the concept (e.g., mutual information, Pearson correlation) to see whether you have useful features.
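The step of turning two tables into one supervised dataset can be sketched as follows: each row is a (student, alumnus) pair, its features are each side's own columns plus cross features comparing the two, and its label is 'Match?'. The rows below are copied from the question's example tables; the `same_gender` cross feature and the helper names are hypothetical. The sketch then checks the positive rate (for imbalance) and the mutual information between `same_gender` and the label.

```python
import math
from collections import Counter
from typing import Dict, Sequence

# Rows copied from the example tables in the question.
students = {
    "Kathy": {"gender": "F", "height": 165},
    "Tommy": {"gender": "M", "height": 182},
    "Ruth": {"gender": "F", "height": 163},
}
alumni = {
    "Miss Lucy": {"gender": "F", "weight": 65},
    "Miss Geraldine": {"gender": "F", "weight": 70},
    "Miss Emily": {"gender": "F", "weight": 60},
}
pairs = [
    ("Ruth", "Miss Lucy", "N"),
    ("Tommy", "Miss Emily", "Y"),
    ("Kathy", "Miss Geraldine", "N"),
    ("Ruth", "Miss Emily", "N"),
]


def pair_features(student: str, alumnus: str) -> Dict[str, int]:
    """One row per pair: each side's own features plus a hypothetical cross feature."""
    s, a = students[student], alumni[alumnus]
    return {
        "student_height": s["height"],
        "alumnus_weight": a["weight"],
        "same_gender": int(s["gender"] == a["gender"]),
    }


X = [pair_features(s, a) for s, a, _ in pairs]
y = [1 if match == "Y" else 0 for _, _, match in pairs]  # the concept (target label)


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Mutual information, in bits, between two discrete sequences of equal length."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


print("positive rate:", sum(y) / len(y))  # 0.25
print("MI(same_gender; match):", round(mutual_information([r["same_gender"] for r in X], y), 3))  # 0.811
```

With only four toy rows, the high mutual information here says nothing about real data; it only shows the mechanics. Continuous features such as height would need to be binned before using this discrete estimator.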

As for the second part, even if you can predict well whether a pair will be a good match, you still face the algorithmic problem of which pairs to use. Consider a "super alumnus" who is a good match for any student. You wouldn't want to match them with a "super student", but with a student who will be hard to match with any other alumnus. Luckily, there are matching algorithms you can use. Build a graph with the students and alumni as nodes (a bipartite graph, since edges only connect students to alumni), create an edge wherever you predict a good match, and run a matching algorithm on it.
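One standard choice is maximum bipartite matching, sketched below with Kuhn's augmenting-path algorithm, assuming each alumnus mentors at most one student. The edges are hypothetical predictions, not data from the question: "Miss Emily" plays the super alumnus who suits everyone, while Kathy's only predicted good match is Miss Emily. Maximizing the number of matched students automatically gives the super alumnus to the student who has no alternative.

```python
from typing import Dict, List, Set


def max_bipartite_matching(edges: Dict[str, List[str]]) -> Dict[str, str]:
    """Maximum bipartite matching via Kuhn's augmenting-path algorithm.

    `edges` maps each student to the alumni predicted to be good matches.
    Returns student -> alumnus, with as many students matched as possible.
    """
    alumnus_to_student: Dict[str, str] = {}

    def try_assign(student: str, visited: Set[str]) -> bool:
        for alumnus in edges.get(student, []):
            if alumnus in visited:
                continue
            visited.add(alumnus)
            current = alumnus_to_student.get(alumnus)
            # Take a free alumnus, or re-route the alumnus's current student elsewhere.
            if current is None or try_assign(current, visited):
                alumnus_to_student[alumnus] = student
                return True
        return False

    for student in edges:
        try_assign(student, set())
    return {student: alumnus for alumnus, student in alumnus_to_student.items()}


# Hypothetical predicted good-match edges; "Miss Emily" is the super alumnus.
predicted_good = {
    "Tommy": ["Miss Emily", "Miss Lucy"],
    "Ruth": ["Miss Emily", "Miss Geraldine"],
    "Kathy": ["Miss Emily"],  # hard to match: only one predicted good alumnus
}
print(max_bipartite_matching(predicted_good))
# {'Kathy': 'Miss Emily', 'Tommy': 'Miss Lucy', 'Ruth': 'Miss Geraldine'}
```

If you predict a score (such as the 1 to 10 rating) rather than a yes/no edge, the weighted version is the assignment problem, which `scipy.optimize.linear_sum_assignment` (with `maximize=True`) or, in R, `clue::solve_LSAP` (with `maximum = TRUE`) can solve. Alumni who can mentor several students would need a capacitated or flow-based formulation instead.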

License: CC-BY-SA with attribution
