A Bayesian learning-to-rank implementation of the Plackett-Luce top-1 listwise model using Infer.NET.
Plackett-Luce is a listwise model: it scores all items in a query jointly rather than comparing pairs or scoring items in isolation. This makes it more label-efficient than pairwise approaches (one observation per query instead of O(n²) pairs) and avoids the imbalance problems of pointwise regression. The top-1 Plackett-Luce likelihood depends only on which item was ranked first, so no full ranking is needed during training; this kind of weak supervision signal is common in real retrieval logs. The probabilistic formulation also enables a full Bayesian treatment: rather than a point estimate of the weight vector, we obtain a posterior distribution that captures uncertainty and can be updated incrementally as new data arrives.
git clone <repository-url>
cd LearningToRank
dotnet build
# Train
dotnet run --project TrainLtR -- data/train.small.ltr model.json
# Predict
dotnet run --project PredictLtR -- model.json data/predict.ltr predictions.csv
Priors:
For each query
The observed winner for each query is the item with the lowest rank label (rank 1 = best in SVM-Light format). Each query contributes a single top-1 observation under the Plackett-Luce model.
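As a rough illustration of the top-1 observation described above, the sketch below computes Plackett-Luce top-1 probabilities as a softmax over item scores. It assumes linear scores (the dot product of a weight vector with each item's features) and a fixed weight vector. Both are assumptions made for illustration; the repository's Infer.NET model infers a posterior over the weights rather than using a point estimate. The function names and the example weights are hypothetical.

```python
import math

def top1_probabilities(scores):
    """Plackett-Luce top-1 probabilities: a softmax over the item scores."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear_scores(weights, items):
    """Assumed scoring rule (illustrative): s_i = w . x_i."""
    return [sum(w * x for w, x in zip(weights, item)) for item in items]

def top1_log_likelihood(weights, items, winner_index):
    """Log-probability that items[winner_index] is ranked first in its query."""
    probs = top1_probabilities(linear_scores(weights, items))
    return math.log(probs[winner_index])

# Two items with three features each; the weights are made up for illustration.
items = [[0.5, 1.0, 0.2], [0.3, 0.8, 0.9]]
weights = [1.0, 1.0, -1.0]
probs = top1_probabilities(linear_scores(weights, items))
log_lik = top1_log_likelihood(weights, items, winner_index=0)
```

Each query contributes one such log-likelihood term, which is why only the winner's index is needed per query.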
Posterior inference over
Item scores are computed as
An
Input — SVM-Light format (feature IDs start at 1):
<rank> qid:<query_id> <feature_id>:<value> ...
1 qid:1 1:0.5 2:1.0 3:0.2
2 qid:1 1:0.3 2:0.8 3:0.9
- Lower rank number = better position (rank 1 = best)
- Items sharing a qid belong to the same query; the item with the lowest rank label is the observed winner
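The input rules above can be sketched as a small parser. This is a minimal illustration written in Python, not the repository's C# reader; the function names are hypothetical. It also assumes that text after `#` on a line is a comment, as in the SVM-Light convention, and that ties on the lowest rank label resolve to the first such item.

```python
def parse_ltr_line(line):
    """Parse '<rank> qid:<query_id> <feature_id>:<value> ...'.

    Returns (rank, qid, features) or None for blank/comment-only lines.
    """
    body = line.split("#", 1)[0].split()  # drop any trailing '#' comment
    if not body:
        return None
    rank = int(body[0])
    qid = body[1].split(":", 1)[1]
    features = {}
    for token in body[2:]:
        fid, value = token.split(":", 1)
        features[int(fid)] = float(value)  # feature IDs start at 1
    return rank, qid, features

def group_by_query(lines):
    """Group parsed items by qid, preserving file order within each query."""
    queries = {}
    for line in lines:
        parsed = parse_ltr_line(line)
        if parsed is not None:
            rank, qid, features = parsed
            queries.setdefault(qid, []).append((rank, features))
    return queries

def observed_winner(items):
    """Index of the item with the lowest rank label (first one on ties)."""
    return min(range(len(items)), key=lambda i: items[i][0])

sample = [
    "1 qid:1 1:0.5 2:1.0 3:0.2",
    "2 qid:1 1:0.3 2:0.8 3:0.9",
]
queries = group_by_query(sample)
winner = observed_winner(queries["1"])
```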
Output — CSV with per-item rank probability distributions (rank 0 = best):
QueryIndex,ItemIndex,Rank0,Rank1,...,Rank9
0,0,0.500222,0.499778,0.000000,...
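One way to consume this output is to collapse each item's rank distribution into an expected rank and sort by it. The sketch below assumes the header layout shown above (QueryIndex, ItemIndex, then one column per rank). The function names are hypothetical, and the second sample row is made up for illustration; only the first row's leading values come from the example above.

```python
import csv
import io

def read_rank_distributions(handle):
    """Map (query_index, item_index) -> list of rank probabilities."""
    reader = csv.reader(handle)
    header = next(reader)
    rank_cols = [i for i, name in enumerate(header) if name.startswith("Rank")]
    result = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        key = (int(row[0]), int(row[1]))
        result[key] = [float(row[i]) for i in rank_cols]
    return result

def expected_rank(probs):
    """Mean rank under the distribution (rank 0 = best)."""
    return sum(r * p for r, p in enumerate(probs))

def rank_items(dists, query_index):
    """Item indices of one query, ordered from best to worst expected rank."""
    items = [item for (q, item) in dists if q == query_index]
    return sorted(items, key=lambda item: expected_rank(dists[(query_index, item)]))

# Illustrative sample with three rank columns instead of ten.
sample = io.StringIO(
    "QueryIndex,ItemIndex,Rank0,Rank1,Rank2\n"
    "0,0,0.500222,0.499778,0.000000\n"
    "0,1,0.499778,0.500222,0.000000\n"
)
dists = read_rank_distributions(sample)
ordering = rank_items(dists, 0)
```

Sorting by expected rank is only one possible summary; sorting by the Rank0 probability is another reasonable choice.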
# Training
dotnet run --project TrainLtR -- <train.ltr> <model.json>
# Prediction (output defaults to predictions.csv)
dotnet run --project PredictLtR -- <model.json> <predict.ltr> [output.csv]
If the prediction file has more features than the training file, extra features are automatically ignored to match the model dimension.
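The feature-matching behaviour can be pictured with the following sketch. It is an assumption about the general idea, not the repository's code: the function name `to_dense` is hypothetical, and filling missing feature IDs with 0.0 is the usual sparse-format convention rather than something stated here.

```python
def to_dense(features, model_dim):
    """Convert {feature_id: value} (1-based IDs) to a list of length model_dim.

    Feature IDs above model_dim are ignored; missing IDs default to 0.0
    (an assumed convention).
    """
    dense = [0.0] * model_dim
    for fid, value in features.items():
        if 1 <= fid <= model_dim:
            dense[fid - 1] = value
    return dense

# A prediction item with four features, scored by a model trained on three.
dense = to_dense({1: 0.5, 2: 1.0, 3: 0.2, 4: 9.9}, model_dim=3)
```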
The data/ folder contains LETOR MQ2008 benchmark datasets:
| File | Description |
|---|---|
| train.small.ltr | Small training set |
| train.ltr | Full training set |
| predict.ltr | Prediction set |
| test.small.ltr / test.sorted.ltr | Test sets |
- T.-Y. Liu, "Learning to Rank for Information Retrieval," Foundations and Trends in Information Retrieval, vol. 3, no. 3, pp. 225–331, 2009. A comprehensive survey covering pointwise, pairwise, and listwise approaches. https://doi.org/10.1561/1500000016
- Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, and H. Li, "Learning to Rank: From Pairwise Approach to Listwise Approach," ICML, 2007. Foundational paper introducing the listwise learning paradigm. https://dl.acm.org/doi/10.1145/1273496.1273513