Databeestjes - 91

Jan Westerdiep, Rahiel Kasim and Jaro Camphuijsen

How we won/lost the data mining competition

Score

overview

Strategy
Implementation

LambdaMART
Feature Engineering

Results

strategy

Blind Orion Searching for the Rising Sun by Nicolas Poussin

Learn from the Best

LambdaMART

implementation

Magic


from rankpy.models import LambdaMART
# ...
model = LambdaMART(
    metric='nDCG@38', max_leaf_nodes=7, shrinkage=0.1,
    estopping=10, n_jobs=-1, min_samples_leaf=50,
    random_state=42
)
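The `metric='nDCG@38'` argument asks the model to optimise normalized discounted cumulative gain over the first 38 results of each search. As a rough illustration of what that metric measures, here is a minimal sketch in plain Python. It is not rankpy's implementation. The gain `2^rel - 1`, the `log2` discount, and the relevance grades (5 for a booking, 1 for a click, 0 otherwise) are assumptions based on the common definition and the competition's description. The helper names `dcg_at_k` and `ndcg_at_k` are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (gain 2^rel - 1)."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so position 1 -> log2(2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=38):
    """nDCG@k for a single search; `relevances` is listed in predicted order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    # A search without any click or booking has no ideal gain; score it as 0.
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Assumed relevance grades: 5 = booked, 1 = clicked, 0 = ignored.
perfect = ndcg_at_k([5, 1, 0, 0])   # booked hotel ranked first
swapped = ndcg_at_k([0, 1, 0, 5])   # booked hotel ranked last
```

Ranking the booked hotel lower reduces the score, which is the behaviour the ranking model is trained to avoid.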

Feature Engineering

Missing Values

Derived Features

$$ \mathtt{count\_window} = \mathtt{srch\_room\_count} \times \max(\mathtt{srch\_booking\_window}) + \mathtt{srch\_booking\_window} $$ [1]
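The formula above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the rows are made-up toy data, the maximum is taken over whatever rows are passed in, and the helper name `add_count_window` is hypothetical.

```python
def add_count_window(rows):
    """Return copies of the rows with a derived `count_window` feature."""
    # max(srch_booking_window) is computed once over the whole data set.
    max_window = max(row["srch_booking_window"] for row in rows)
    return [
        dict(
            row,
            count_window=row["srch_room_count"] * max_window
            + row["srch_booking_window"],
        )
        for row in rows
    ]


# Toy data, not taken from the competition set.
rows = [
    {"srch_room_count": 1, "srch_booking_window": 3},
    {"srch_room_count": 2, "srch_booking_window": 10},
    {"srch_room_count": 1, "srch_booking_window": 0},
]
with_feature = add_count_window(rows)
```

Because the room count is scaled by the largest booking window, every (room count, booking window) pair with at least one room maps to a distinct value.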

We convert the timestamp to four additional features: month, week, day and hour.
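A minimal sketch of such a timestamp split, using only the standard library. The timestamp format, the use of the ISO week number for "week", the day of the month for "day", and the helper name `time_features` are all assumptions made for illustration.

```python
from datetime import datetime


def time_features(timestamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Split a timestamp string into month, week, day and hour."""
    dt = datetime.strptime(timestamp, fmt)
    return {
        "month": dt.month,
        "week": dt.isocalendar()[1],  # ISO week number (assumed meaning of "week")
        "day": dt.day,                # day of the month (assumed meaning of "day")
        "hour": dt.hour,
    }


features = time_features("2013-04-04 08:32:15")
```

The day of the week could be used instead of the day of the month if weekly booking patterns matter more.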

[1] (5th place) X. Liu, B. Xu, Y. Zhang, Q. Yan, L. Pang, Q. Li, H. Sun, and B. Wang. Combination of Diverse Ranking Models for Personalized Expedia Hotel Searches.

Averaged Features

Over all searches, group the values per hotel and take the mean, median and standard deviation.

Do this for $$ \verb|prop_starrating|, \verb|prop_review_score|, \verb|prop_location_score1|, \text{and } \verb|prop_location_score2| $$
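The per-hotel aggregation can be sketched with the standard library as follows. This is an illustrative sketch, not the code used for the competition. The hotel key `prop_id`, the choice to skip missing values, the use of the population standard deviation (so a hotel seen once gets 0.0), and the helper name `per_hotel_stats` are assumptions.

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def per_hotel_stats(rows, feature, key="prop_id"):
    """Mean, median and std of `feature`, grouped by hotel."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(feature)
        if value is not None:  # skip missing values (assumption)
            groups[row[key]].append(value)
    return {
        hotel: {
            "mean": mean(values),
            "median": median(values),
            "std": pstdev(values),  # population std; 0.0 for a single value
        }
        for hotel, values in groups.items()
    }


# Toy data, not taken from the competition set.
rows = [
    {"prop_id": 1, "prop_review_score": 4.0},
    {"prop_id": 1, "prop_review_score": 5.0},
    {"prop_id": 1, "prop_review_score": 3.0},
    {"prop_id": 2, "prop_review_score": 2.5},
]
stats = per_hotel_stats(rows, "prop_review_score")
```

The resulting statistics can then be joined back onto every search row for the same hotel as extra features.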

(1st place) Kaggle Inc. Personalize Expedia Hotel Searches - ICDM 2013, 2016. https://www.kaggle.com/c/expedia-personalized-sort (accessed 11 May 2016).

Results

Scores

Final score:

Feature Importances

Thanks for listening!