Secretary Problem

wide

The “secretary problem”1 (aka marriage problem or dowry problem), with its mysterious origin, first came in public on February 1960 issue of Scientific American in Martin Gardner’s beloved column on recreational mathematics. The problem: “Imagine you’re interviewing a set of applicants for a position as a secretary, and your goal is to maximize the chance of hiring the single best applicant in the pool. While you have no idea how to assign scores to individual applicants, you can easily judge which one you prefer. (A mathematician might say you have access only to ordinal numbers–the relative ranks of the applicants compared to each other–but not to the cardinal numbers, their ratings on some kind of general scale.) You interview the applicants in random order, one at a time. You can decide to offer the job to an applicant at any point and they are guaranteed to accept, terminating the search. But if you pass over an applicant, deciding not to hire them, they are gone forever.”2

Solution

Secretary Problem at a glance:

  1. There is one secretarial position available
  2. The number \(n\) of applicants is known.
  3. The applicants are interviewed sequentially in random order, each order being equally likely.
  4. It is assumed that you can rank all the applicants from best to worst without ties. The decision to accept or reject an applicant must be based only on the relative ranks of those applicants interviewed so far.
  5. An applicant once rejected cannot be recalled.
  6. You are very particular and will be satisfied with nothing but the very best. (That is, your payoff is 1 if you choose the best of the \(n\) applicants and 0 otherwise.)

First, we can interpret the pass-and-reject rule where \(r\) is the number of rejects for \(r \ge 1\), interviewer rejects the first \(r - 1\) applicants, and then chooses the next best in the relative ranking. So, the probability, \(\phi_{n}(r)\), of choosing the best applicant for \(r = 1\), and for \(r > 1\) is

$$ \begin{aligned} \phi_{n}(r) &= \sum_{i = r}^{n} P\text{(ith applicant is selected and is the best)} \\ &= \sum_{i = r}^{n} P\text{(ith applicant is selected or ith applicant is the best)} \cdot P\text{(ith applicant is the best)} \\ &= \left [ \sum_{i = 1}^{r - 1} 0 + \sum_{i = 1}^{n} P\left ( \text{the best of the first i - 1 applicants is in the first r - 1 applicants or ith applicant is the best} \right ) \right ] \cdot \frac{1}{n} \\ &= \left [ \sum_{i = r}^{n} \frac{r - 1}{i - 1} \right ] \cdot \frac{1}{n} \\ &= \frac{r - 1}{n} \sum_{i = r}^{n} \frac{1}{i - 1} \\ \end{aligned} $$

The sum does not include when the number of automatically rejected applicants is \(r = 1\), therefore, the only feasible solution to this is \(\phi_{n}(1) = \frac{1}{n}\). This sum is only valid if and only if the best applicant among the first \(i - 1\) applicants is among the first \(r - 1\) applicants that were rejected. Hence, the selected better candidate can be selected from \(i - 1\) where \(i > r\). Put simply, there has to be a better candidate after the rejected ones, else, no candidate can be selected since the best one among all candidates were already rejected and cannot be bested by the remaining non-rejects.

Integrating the sum letting \(n\) tend to infinity, writing \(x\) as the limit of \(\frac{r - 1}{n}\), using \(t\) for \(\frac{i - 1}{n}\) and \(dt\) for \(\frac{1}{n}\), the sum can be approximated by the integral

$$ \phi_{n}(x) = x \int_{x}^{1} \frac{1}{t}\;dt = -x\ln(x) $$

The value of \(x\) that maximizes the equation can be found by setting the derivative of \(\phi_{n}(x)\) with respect to \(x\) equal to zero, \(\frac{d}{dx}\phi_{n}(x) = 0\), then solving for \(x\) we find that \(x = 1/e = 0.367879 \cdots\).

Thus, the optimal cutoff tends to \(n/e\) as \(n\) increases, and the best applicant is selected with probability \(1/e\) or roughly 37%. Hence, the 37% Rule#.

Conclusion

With this algorithm, there is a 37% chance of optimally finding the best one in any given amount of applicant pool virtually consistent. However, there is, in fact, a 67% chance of failure and thus very tempting to steer away from the solution. But, put things into perspective, brute force solution isn’t remotely any better than the 37% Rule#. For instance, reviewing a hundred applicants by random with no optimization algorithm would only result to only 1% success rate, that is when the best applicant is the last one in the pool, and with a million applicants, only a whopping 0.0001% chance of finding the best one, considering the interviewer have infinite amount of time to review all applicants. Using the optimal solution, the chances of finding the best applicant approaches 37% as the applicant pool grows, or put simply, the more applicants to review the more consistent the result is going to be.

Obviously, the secretary problem is impractical and does not apply to real life situations as there are too many variants. For one, applicant recall is always an option regardless if the interviewer further the search, provided the applicant is still available. Also, there is about 50/50 chance of being rejected by the applicants themselves which further complicates the optimal stopping. Furthermore, the interviewer knows nothing about the applicants other than the relative comparison to each other, but by how much? There is also a risk of passing up an exceptional applicants while in the calibration phase or before the optimal cutoff, and not to mention, a better one after hiring. Finally, a question of preparation over capabilities, as one could be so prepared than the ones who are more capable. There are a lot more variants the secretary problem oversimplified, but the fact we quantify this much is a good basis for other variations to come as did some brilliant mathematicians have, such as the odds algorithm3 or the Threshold Rule#, after the seminal 37% rule.

TL;DR

Problem: "Imagine you're interviewing a set of applicants for a position as a secretary, and your goal is to maximize the chance of hiring the single best applicant in the pool. While you have no idea how to assign scores to individual applicants, you can easily judge which one you prefer. (A mathematician might say you have access only to ordinal numbers--the relative ranks of the applicants compared to each other--but not to the cardinal numbers, their ratings on some kind of general scale.) You interview the applicants in random order, one at a time. You can decide to offer the job to an applicant at any point and they are guaranteed to accept, terminating the search. But if you pass over an applicant, deciding not to hire them, they are gone forever."

Answer: As the 37% rule suggest, reject the first 37% of the total amount of applicants, assuming the applicants are reviewed randomly, to find the relative ranking among them, then hire the first best applicant after.

Resources

  1. https://en.m.wikipedia.org/wiki/Secretary_problem “Secretary Problem”

    ↩︎
  2. Algorithms to Live By by Brian Christian and Tom Griffiths - Optimal Stopping

    ↩︎
  3. https://en.wikipedia.org/wiki/Odds_algorithm “Odds Algorithm”

    ↩︎