Changes from all commits
116 commits
c701940
Merge branch 'hotfix-v3.0.2'
Aug 20, 2017
62a5de7
test new executable in learn/
hksskh Nov 30, 2017
6ae97f2
parsing train/vali/test dataset files
hksskh Dec 1, 2017
c511742
parsing train dataset files
hksskh Dec 1, 2017
e0766c2
temp train and getrandompair
Dec 1, 2017
7749ff2
merged
Dec 1, 2017
0592356
ignore
Dec 1, 2017
035552d
clear
hksskh Dec 1, 2017
3c650d8
add LETOR dataset
hksskh Dec 1, 2017
ab149ab
delete line
Dec 1, 2017
35f67dc
Merge bradnch 'spd' of https://github.com/mihikadave/meta into spd
Dec 1, 2017
455d062
delete line
Dec 1, 2017
2df7c0b
fix
Dec 1, 2017
565bf7e
fix
Dec 1, 2017
3900802
fix
Dec 1, 2017
67163cb
fix
Dec 1, 2017
bd26c15
Finished training individual samples
Dec 1, 2017
fe37512
fix ya yb
Dec 1, 2017
2a3ea45
comments
Dec 1, 2017
65aba3a
use pointer for feature_vector
hksskh Dec 1, 2017
0eb5e1d
Training v2
Dec 1, 2017
032f78f
code cleaning and resource de-allocation
hksskh Dec 1, 2017
eab9748
add MQ2008 dataset for LETOR 4.0
hksskh Dec 1, 2017
6754dd7
Add basic testing logic
hksskh Dec 1, 2017
1a312e2
fix
hksskh Dec 1, 2017
b5ce381
fix
hksskh Dec 1, 2017
50a120f
do no ranking evaluation for queries with documents only in one relev…
hksskh Dec 1, 2017
1fb52be
fix
hksskh Dec 1, 2017
ca344df
add OHSUMED dataset
hksskh Dec 1, 2017
a1b2a9d
add support for different number of features
hksskh Dec 1, 2017
6dcabb2
compute precisions and mean average precision
hksskh Dec 2, 2017
570ea58
fix
hksskh Dec 2, 2017
45c19a0
tune training interation num
hksskh Dec 2, 2017
ebfce89
compute ndcg
hksskh Dec 2, 2017
00714bb
fix
hksskh Dec 2, 2017
025e35b
fix evaluation on 2004 datasets
hksskh Dec 16, 2017
10451d1
fix
hksskh Dec 16, 2017
1afb53a
fix
hksskh Dec 16, 2017
775630e
fix
hksskh Dec 16, 2017
c602925
fix
hksskh Dec 16, 2017
beaf287
fix
hksskh Dec 16, 2017
98464d1
read dataset for svm
hksskh Dec 16, 2017
bef88bd
fix
hksskh Dec 16, 2017
9f4eea2
fix
hksskh Dec 16, 2017
845816c
fix
hksskh Dec 16, 2017
023c184
fix
hksskh Dec 16, 2017
579309c
finish letor with liblinear
hksskh Dec 17, 2017
e4cd008
fix
hksskh Dec 17, 2017
cb389ab
fix
hksskh Dec 17, 2017
e80d738
fix
hksskh Dec 17, 2017
59befd5
fix
hksskh Dec 17, 2017
101da61
fix
hksskh Dec 17, 2017
cd74a4b
fix
hksskh Dec 17, 2017
cf6144b
fix
hksskh Dec 17, 2017
f92ab2c
fix
hksskh Dec 17, 2017
0f424a8
fix
hksskh Dec 17, 2017
1d50bf6
fix
hksskh Dec 17, 2017
9aabd85
save model file when training with libsvm
hksskh Dec 17, 2017
b86a65b
save model file when training with libsvm
hksskh Dec 17, 2017
fcffc09
fix
hksskh Dec 17, 2017
edb2760
fix
hksskh Dec 17, 2017
b317a33
measure running time in ms
hksskh Dec 17, 2017
7af2079
measure running time in seconds
hksskh Dec 17, 2017
5fe8ec2
fix docid
hksskh Dec 17, 2017
9da52da
fix docid
hksskh Dec 17, 2017
2c9ab7a
fix docid
hksskh Dec 17, 2017
2b75f31
fix docid
hksskh Dec 17, 2017
bae0c78
fix docid
hksskh Dec 17, 2017
98b5e63
fix svm dataset
hksskh Dec 17, 2017
1b3491f
fix svm dataset
hksskh Dec 17, 2017
c592268
fix svm dataset
hksskh Dec 17, 2017
92cc5e5
fix svm dataset
hksskh Dec 17, 2017
8a9d0dd
try rank svm
hksskh Dec 17, 2017
a6bcf0e
fix
hksskh Dec 17, 2017
9bae551
fix
hksskh Dec 17, 2017
c7ac751
fix svm dataset
hksskh Dec 17, 2017
95ec1da
fix
hksskh Dec 17, 2017
00f49a7
fix
hksskh Dec 17, 2017
d3dc46a
add loading and saving model file
hksskh Dec 17, 2017
0d7ac3f
add loading and saving model file
hksskh Dec 17, 2017
d060483
fix svm model loading/saving
hksskh Dec 17, 2017
3bd5ffa
fix letor model loading/saving
hksskh Dec 17, 2017
31a365e
fix letor model loading/saving
hksskh Dec 17, 2017
b4ad46b
fix letor model loading/saving
hksskh Dec 18, 2017
fa5326c
fix letor model loading/saving
hksskh Dec 18, 2017
1e77c2a
fix letor model loading/saving
hksskh Dec 18, 2017
9f7e07c
fix letor model loading/saving
hksskh Dec 18, 2017
90d78f3
allow incremental trainig with sgd model
hksskh Dec 18, 2017
c544fa7
allow incremental trainig with sgd model
hksskh Dec 18, 2017
3f19b8b
allow incremental trainig with sgd model
hksskh Dec 18, 2017
119c4dc
fix incremental training with sgd model
hksskh Dec 18, 2017
611e4b6
print time spent on read_data
hksskh Dec 18, 2017
fed50fb
Documentation
Dec 18, 2017
1ed8037
Header file and refactoring
Dec 18, 2017
48fc5dd
Move letor.h to include
Dec 18, 2017
a499a65
Move letor.h to include
Dec 18, 2017
88ebc9e
Add missing files
Dec 18, 2017
5f3defa
Make letor object oriented
Dec 18, 2017
26d449b
Refactor
Dec 18, 2017
0da37d3
Functional Code
Dec 18, 2017
5932615
Refactoring
Dec 18, 2017
2b19dae
Add documentation
Dec 18, 2017
4c4de07
update code style in pairwise letor
hksskh Dec 19, 2017
1ed0b77
fix pairwise letor
hksskh Dec 19, 2017
df9650f
fix pairwise letor
hksskh Dec 19, 2017
1e3c1ef
fix pairwise letor
hksskh Dec 19, 2017
bfffa3d
pairwise letor: save pure letor weights value to letor.weights
hksskh Dec 19, 2017
5002ca2
fix
hksskh Dec 19, 2017
6636332
Merge branch 'develop' of https://github.com/meta-toolkit/meta into spd
hksskh Dec 19, 2017
d8e78fc
fix
hksskh Dec 19, 2017
c5d7086
remove huge data files
hksskh Dec 19, 2017
8ee54ce
add data ignore
hksskh Dec 19, 2017
a0fa86d
fix
hksskh Dec 19, 2017
cd92526
fix submodules in deps/
hksskh Dec 19, 2017
3cb4e97
fix deps
hksskh Dec 19, 2017
64892d9
fix errors from appveyor ci
hksskh Dec 19, 2017
6 changes: 6 additions & 0 deletions .gitignore
@@ -13,6 +13,12 @@ data/ceeaus
data/breast-cancer
data/housing
data/cranfield
data/Gov
data/MQ2007
data/MQ2008
data/OHSUMED
biicode.conf
bii/
bin/
cmake-build-debug/*

37 changes: 37 additions & 0 deletions include/meta/classify/classifier/svm_wrapper.h
@@ -73,6 +73,17 @@ class svm_wrapper : public classifier
svm_wrapper(dataset_view_type docs, const std::string& svm_path,
kernel kernel_opt = kernel::None);

/**
 * Constructor. Should only be used when this wrapper acts as a RankSVM.
 * This constructor assumes that the caller has already written the
 * training documents into the svm-train file.
 * @param svm_path The path to the liblinear/libsvm library
 * @param kernel_opt Which kind of kernel to use (default: None)
 */
svm_wrapper(const std::string& svm_path,
kernel kernel_opt = kernel::None);

/**
* Loads a svm_wrapper from a stream.
* @param in The stream to read from
@@ -81,6 +92,12 @@ class svm_wrapper : public classifier

void save(std::ostream& out) const override;

/**
 * Saves the weights in RankSVM format to a stream. Should only be used
 * when this wrapper acts as a RankSVM.
 * @param out The stream to write to
 */
void save_weights(std::ostream& out) const;

/**
* Classifies a document into a specific group, as determined by
* training data.
@@ -89,6 +106,14 @@ class svm_wrapper : public classifier
*/
class_label classify(const feature_vector& doc) const override;

/**
 * Computes the score of the given document as the dot product of its
 * feature vector with the weights learned by this SVM (should only be
 * used when this wrapper acts as a RankSVM).
 * @param doc The document to score
 * @return the score of this document
 */
double computeScore(feature_vector& doc);

/**
* Classifies a collection document into specific groups, as determined
* by training data.
@@ -121,6 +146,18 @@ class svm_wrapper : public classifier

/** the list of class_labels (mainly for serializing the model) */
std::vector<class_label> labels_;

/** weights learned by this SVM */
std::vector<double> weights_;

/**
 * Loads the weights from the model file written by this SVM during
 * training. Should only be used when this wrapper acts as a RankSVM.
 */
void load_weights();
};

class svm_wrapper_exception : public std::runtime_error
204 changes: 204 additions & 0 deletions include/meta/learn/learntorank/pairwise_letor.h
@@ -0,0 +1,204 @@
/**
* @file pairwise_letor.h
* @author Mihika Dave, Anthony Huang, Rachneet Kaur
* @date 12/18/17
*/

#ifndef META_PAIRWISE_LETOR_H
#define META_PAIRWISE_LETOR_H

#include <functional>
#include <iostream>
#include <vector>
#include <random>
#include <unordered_map>
#include <fstream>
#include <sstream>
#include <cmath>
#include <string>
#include <chrono>
#include <algorithm>

#include "meta/learn/loss/all.h"
#include "meta/learn/loss/hinge.h"
#include "meta/learn/loss/loss_function.h"
#include "meta/learn/loss/loss_function_factory.h"
#include "meta/learn/sgd.h"
#include "meta/learn/instance.h"
#include "meta/learn/dataset.h"
#include "meta/classify/classifier/svm_wrapper.h"
#include "meta/classify/classifier/classifier.h"


using namespace std;
using namespace meta::util;
using namespace meta::classify;

namespace meta
{
namespace learn
{
namespace learntorank
{
/**
* This class implements pairwise learning to rank with binary classifiers.
* The ranker here mainly follows the Stochastic Pairwise Descent algorithm
* based on D. Sculley's paper on 'Large Scale Learning to Rank'.
*
* @see https://static.googleusercontent.com/media/research.google.com/en//
* pubs/archive/35662.pdf
*/
class pairwise_letor {
public:
using tupl = std::tuple<feature_vector, int, string>;

enum DATA_TYPE {
TRAINING,
VALIDATION,
TESTING
};

enum CLASSIFY_TYPE {
LIBSVM,
SPD,
};

typedef struct forwardnode {
operator int() const {
return label;
}
operator feature_vector() const {
return fv;
}
int label;
feature_vector fv;
} forward_node;

public:

/**
* Constructor.
* @param num_features The number of features for the pairwise model
* @param classify_type The type of classifier to use
* @param hasModel Whether the sgd/svm model is loaded from a file
* @param model_file The path to the model file
*/
pairwise_letor(size_t num_features, CLASSIFY_TYPE classify_type,
bool hasModel, string model_file);

~pairwise_letor();

/**
* Train the pairwise ranker model
* @param data_dir Path to directory containing train.txt
*/
void train(string data_dir);

/**
* Train the SVM classifier on pairs of data samples
* @param data_dir Path to directory containing train.txt
* @param svm_path The path to the liblinear/libsvm library
*/
void train_svm(string data_dir, string svm_path);

/**
* Validate the learnt model
* @param data_dir The path to the directory containing vali.txt
*/
void validate(string data_dir);

/**
* Test the model on testing dataset
* @param data_dir The path to the directory containing test.txt
*/
void test(string data_dir);

private:

/// number of features for this letor model
size_t num_features_;

/// type of classifier to use
CLASSIFY_TYPE classify_type_;

/// sgd_model for training and testing
unique_ptr<sgd_model> model_;

/// binary svm wrapper for training and testing
unique_ptr<svm_wrapper> wrapper_;

/**
* Read data from the dataset and store it as nested hash-tables
* @param data_type The type of data (train, vali, or test)
* @param data_dir Path to directory containing train/vali/test.txt
* @param qids Vector to store ids of queries
* @param dataset Map to store nested data mapping: query => label => doc
* @param docids Map to store docids in each query and label
* @param relevance_map Map to store relevance of docs in each query
*/
void read_data(DATA_TYPE data_type,
string data_dir,
vector<string>& qids,
unordered_map<string, unordered_map<int, vector<feature_vector>>>& dataset,
unordered_map<string, unordered_map<int, vector<string>>>& docids,
unordered_map<string, unordered_map<string, int>>& relevance_map);

/**
* Return a random pair of tuples for training the svm classifier.
* Each tuple is of the form (feature_vec, label, qid).
* @param training_qids Vector holding ids of all queries
* @param train_dataset Map holding nested data mapping: query => label => doc
* @param random_seed The random seed used to randomly choose data
* @return the random pair
*/
pair<tupl, tupl> getRandomPair(
vector<string>& training_qids,
unordered_map<string,unordered_map<int,vector<feature_vector>>>& train_dataset,
int random_seed);

/**
* Build nodes from dataset for training svm_wrapper
* @param train_dataset Map holding nested data mapping: query => label => doc
* @param dataset_nodes Vector holding data nodes for SVM training
*/
void build_dataset_nodes(
unordered_map<string,unordered_map<int,vector<feature_vector>>>& train_dataset,
vector<forward_node>& dataset_nodes);

/**
* Compare the relative ranks of two (docid, score) pairs
* @param p1 The first pair to compare
* @param p2 The second pair to compare
* @return whether p1 is ranked before p2
*/
static bool compare_docscore(
const pair<string, double> &p1, const pair<string, double> &p2) {
return p1.second > p2.second;
}

/**
* Compute the DCG
* @param limit The number of positions over which to compute DCG
* @param rankings Vector holding ranking at each position
* @return computed DCG
*/
double compute_dcg(int limit, vector<int> &rankings);

/**
* Evaluate the dataset for precision, mean average precision, NDCG
* @param qids Vector holding id for queries
* @param dataset Map holding nested data mapping: query => label => doc
* @param docids Map holding doc ids for each query and label
* @param relevance_map Map holding relevance of each doc for each query
*/
void evaluate(vector<string>& qids,
unordered_map<string, unordered_map<int, vector<feature_vector>>>& dataset,
unordered_map<string, unordered_map<int, vector<string>>>& docids,
unordered_map<string, unordered_map<string, int>>& relevance_map);

};

} // namespace learntorank
} // namespace learn
} // namespace meta
#endif
5 changes: 5 additions & 0 deletions include/meta/learn/sgd.h
@@ -81,6 +81,11 @@ class sgd_model
*/
void save(std::ostream& out) const;

/**
 * Saves the weights of the current model in non-compact format.
 * @param out The stream to write to
 */
void save_weights(std::ostream& out) const;

/**
* Calibrates the learning rate for the model based on sample data.
* Search strategy inspired by Leon Bottou's SGD package.
6 changes: 6 additions & 0 deletions include/meta/meta.h
@@ -105,6 +105,12 @@ namespace loss
}
}

/**
 * Learning to rank algorithms
 */
namespace learntorank
{
}
/**
* Algorithms for regression.
*/