Quickstart

This repository contains the code for the ICML 2025 submission "From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models". The implementation is based on FuxiCTR.

Environment

conda create -n FuxiCTR_analysis python=3.10 -y
conda activate FuxiCTR_analysis
pip3 install torch torchvision torchaudio
pip3 install -r requirements.txt
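
After installation, a quick sanity check can confirm that the packages are visible to the active interpreter. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and is not part of this repo.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Print the interpreter version so it can be compared with python=3.10 above.
    print("Python", sys.version.split()[0])
    missing = missing_packages(["torch", "torchvision", "torchaudio"])
    print("missing packages:", missing or "none")
```

Run it inside the activated FuxiCTR_analysis environment; an empty result suggests the pip3 installs above succeeded.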

Prepare dataset

bash 1.prepare.sh

Reproduce experiments

The following scripts reproduce the results on DCN V2:

bash 2.reproduce.sh
bash 3.analyze.sh

Advanced usage & Code explanations

Faster training with preprocessed data (Highly Recommended!!!)

After the first run, FuxiCTR generates the dataset in parquet format (it can be found in data/Avazu/avazu_x4_3bbbc4c9). To speed up later runs, change the following entries in the dataset config files. For example, in model_zoo/FM/config/dataset_config.yaml (and similarly for other models), make these changes:

avazu_x4_3bbbc4c9:
   data_format: parquet # original: csv
   ...
   ...
   rebuild_dataset: false # original: true
   test_data: ../../data/Avazu/avazu_x4_3bbbc4c9/test.parquet # original: test.csv
   train_data: ../../data/Avazu/avazu_x4_3bbbc4c9/train.parquet # original: train.csv
   valid_data: ../../data/Avazu/avazu_x4_3bbbc4c9/valid.parquet # original: valid.csv
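
Since the same change has to be repeated for every model, it may help to express it as a small function over one config entry. The sketch below assumes the entry has already been loaded into a plain dict (for example with a YAML library); the function names switch_to_parquet and missing_parquet_files are hypothetical helpers, not part of FuxiCTR.

```python
from pathlib import Path

SPLITS = ("train", "valid", "test")


def switch_to_parquet(entry: dict, parquet_dir: str) -> dict:
    """Return a copy of one dataset-config entry that points at parquet files."""
    base = parquet_dir.rstrip("/")
    updated = dict(entry)  # leave the caller's dict untouched
    updated["data_format"] = "parquet"
    updated["rebuild_dataset"] = False
    for split in SPLITS:
        updated[f"{split}_data"] = f"{base}/{split}.parquet"
    return updated


def missing_parquet_files(parquet_dir: str) -> list:
    """List the split files that do not exist yet under parquet_dir."""
    return [
        f"{split}.parquet"
        for split in SPLITS
        if not (Path(parquet_dir) / f"{split}.parquet").is_file()
    ]
```

Calling missing_parquet_files first is a cheap guard: if it returns a non-empty list, the first run has probably not finished generating the parquet data, and the config should stay on csv.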

Generate other embeddings for analysis

After the experiments finish, model inference can be run from the saved checkpoints (e.g., model_zoo/DeepFM/Avazu/DeepFM_avazu_x4_001/avazu_x4_3bbbc4c9/).

Embeddings that need to be saved must be registered in an init_record function (please refer to model_zoo/DCNv2/src/DCNv2.py). Each attribute should follow the record_XXX naming format, where XXX is the name of the embedding you want to save for later analysis. The following snippet registers the feature embeddings. Note that the embeddings required for the analysis in the paper have already been registered.

def init_record(self):
   self.record_feature_emb = []
   ...

Then change the forward function to record the specified embedding, for example:

def forward(self, inputs):
   X = self.get_inputs(inputs)
   feature_emb = self.embedding_layer(X, flatten_emb=True)
   if self.analyzing:
      self.record_feature_emb.append(
         feature_emb.detach().clone().cpu()
      )
   ...
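
The record_XXX convention makes it straightforward to gather everything a model has recorded. The sketch below illustrates the pattern with a plain-Python stand-in so it runs without PyTorch; ToyModel and collect_records are hypothetical names, and the actual collection logic in this repo may differ.

```python
def collect_records(model) -> dict:
    """Map XXX to the recorded list for every attribute named record_XXX."""
    prefix = "record_"
    return {
        name[len(prefix):]: value
        for name, value in vars(model).items()
        if name.startswith(prefix) and isinstance(value, list)
    }


class ToyModel:
    """Stand-in for a model; a real one would subclass torch.nn.Module."""

    def __init__(self):
        self.analyzing = False
        self.init_record()

    def init_record(self):
        # One list per embedding to be saved, named record_XXX.
        self.record_feature_emb = []

    def forward(self, inputs):
        feature_emb = [x * 2 for x in inputs]  # placeholder for the embedding layer
        if self.analyzing:
            # Store a copy so later changes do not alter the record.
            self.record_feature_emb.append(list(feature_emb))
        return sum(feature_emb)
```

With analyzing left at False, forward records nothing; once it is set to True, each call appends one entry, and collect_records(model) returns them keyed by the XXX part of the attribute name.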
