From RankNet to LambdaRank to LambdaMART: An Overview (2010)

https://www.microsoft.com/en-us/research/publication/from-ranknet-to-lambdarank-to-lambdamart-an-overview/

1 Introduction

2 RankNet

$s_i = f(x_i)$ and $s_j = f(x_j)$

$P_{ij} = P(U_i > U_j) = \frac{1}{1 + e^{-\sigma(s_i - s_j)}}$

$C = - \bar{P_{ij}} \log{P_{ij}} - (1 - \bar{P_{ij}})\log(1 - P_{ij})$

$\frac{\partial C}{\partial s_i} = \sigma(\frac{1}{2}(1 - S_{ij}) -\frac{1}{1 + e^{\sigma(s_i - s_j)}}) = -\frac{\partial C}{\partial s_j}$

$w_k \rightarrow w_k - \eta \frac{\partial C}{\partial w_k}$

2.1 Factoring RankNet: Speeding Up Ranknet Training

where

where

3 Information Retrieval Measures

NDCG

4 LambdaRank

The key observation of LambdaRank is thus that in order to train a model, we don’t need the costs themselves: we only need the gradients (of the costs with respect to the model scores).

5 MART

GBDT(Gradient Boosting Decision Tree)

MART(Multiple Additive Regression Tree)

6 MART for Two Class Classification

7 LambdaMART

LambdaRank Gradient + MART