Databeestjes - 91

Jan Westerdiep, Rahiel Kasim and Jaro Camphuijsen

How we won/lost the data mining competition



Score

Overview

  • Strategy
  • Implementation
    • LambdaMART
    • Feature Engineering
  • Results

Strategy









Blind Orion Searching for the Rising Sun by Nicolas Poussin

Learn from the Best

LambdaMART

Implementation

Magic


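from rankpy.models import LambdaMART

# ...

# LambdaMART ranker from rankpy, optimizing nDCG over the top 38 results.
# shrinkage is the learning rate; estopping sets the early-stopping patience.
model = LambdaMART(
    metric='nDCG@38', max_leaf_nodes=7, shrinkage=0.1,
    estopping=10, n_jobs=-1, min_samples_leaf=50,
    random_state=42
)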
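For context on the `metric='nDCG@38'` argument, here is a minimal sketch of how nDCG@k can be computed for a single search. The gain `2^rel - 1` with a `log2` discount is one common formulation and is assumed here, as are the relevance grades in the usage lines. The function names are hypothetical and this is not rankpy's implementation.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results (assumed gain: 2^rel - 1)."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels of one result list, in the order the model ranked them
# (illustrative values only).
ranked = [0, 5, 1, 0]
score = ndcg_at_k(ranked, k=38)
```

A perfectly ordered list scores 1.0, and a list without any relevant result is given 0.0 by convention in this sketch.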
                  

Feature Engineering

Missing Values

Derived Features

$$ \mathtt{count\_window} = \mathtt{srch\_room\_count} \times \max(\mathtt{srch\_booking\_window}) + \mathtt{srch\_booking\_window} $$ [1]
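The formula above could be computed along these lines. This is a sketch only: it assumes the data is held as a list of dicts keyed by column name and that the maximum is taken over the whole dataset; the helper name `add_count_window` is hypothetical.

```python
def add_count_window(rows):
    """Add count_window to every row (a dict), following the formula above."""
    # Assumption: the maximum booking window is taken over all rows.
    max_window = max(row["srch_booking_window"] for row in rows)
    for row in rows:
        row["count_window"] = (row["srch_room_count"] * max_window
                               + row["srch_booking_window"])
    return rows

# Two made-up searches for illustration.
searches = [
    {"srch_room_count": 1, "srch_booking_window": 3},
    {"srch_room_count": 2, "srch_booking_window": 10},
]
add_count_window(searches)
```

The result folds the room count and the booking window into a single integer feature.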

We convert the timestamp to four additional features: month, week, day and hour.
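A sketch of the timestamp split using only the standard library. The timestamp format, the use of the ISO week number, and reading "day" as day of the month are all assumptions; the function name and the dictionary keys are hypothetical.

```python
from datetime import datetime

def time_features(timestamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Split a timestamp string into month, ISO week, day of month and hour."""
    dt = datetime.strptime(timestamp, fmt)
    return {
        "month": dt.month,
        "week": dt.isocalendar()[1],  # ISO week number (assumed convention)
        "day": dt.day,                # day of the month (assumed meaning of "day")
        "hour": dt.hour,
    }

# Made-up timestamp for illustration.
features = time_features("2013-04-04 08:32:15")
```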


[1] (5th place) X. Liu, B. Xu, Y. Zhang, Q. Yan, L. Pang, Q. Li, H. Sun, and B. Wang. Combination of Diverse Ranking Models for Personalized Expedia Hotel Searches.

Averaged Features

Across all searches, group the ratings per hotel and take the mean, median and standard deviation.

Do this for $$ \mathtt{prop\_starrating},\ \mathtt{prop\_review\_score},\ \mathtt{prop\_location\_score1}\ \text{and}\ \mathtt{prop\_location\_score2} $$
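The per-hotel aggregation could look roughly like this. It is a sketch under stated assumptions: rows are dicts, hotels are identified by a `prop_id` column, no values are missing, and the population standard deviation is used. The helper name `hotel_stats` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, median, pstdev

def hotel_stats(rows, column, key="prop_id"):
    """Return {hotel: (mean, median, std)} of `column` over all rows of that hotel."""
    values = defaultdict(list)
    for row in rows:
        values[row[key]].append(row[column])
    # pstdev (population std) is an assumption; it is 0.0 for a single row.
    return {hotel: (mean(v), median(v), pstdev(v)) for hotel, v in values.items()}

# Made-up rows for illustration.
rows = [
    {"prop_id": 1, "prop_review_score": 2.0},
    {"prop_id": 1, "prop_review_score": 4.0},
    {"prop_id": 2, "prop_review_score": 3.5},
]
stats = hotel_stats(rows, "prop_review_score")
```

The resulting statistics can then be joined back onto every row of the same hotel as extra feature columns.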






(1st place) Kaggle Inc. Personalize Expedia Hotel Searches - ICDM 2013, 2016. [https://www.kaggle.com/c/expedia-personalized-sort; accessed 11-May-2016].

Results

Scores

Final score:

Feature Importances

Thanks for listening!