README for Predictive Modeling in Sequential Recommendation: Bridging Performance Laws with Data Quality Insights
This project provides the PerformanceLaw Python library for evaluating the complexity and information quality of sequence data. The library includes several metrics:
- `actual_entropy(seq)`: Estimates the actual entropy of a sequence.
- `actual_entropy_tq(seq)`: Same as above, but shows a progress bar (needs `tqdm`).
- `ApEn(U, m, r)`: Calculates the Approximate Entropy of a sequence `U`, with embedding dimension `m` and threshold `r`.
- `compression_ratio(data)`: Measures the compressibility of a list of integers; a low ratio means the data is more compressible (less random).
- `shannon_entropy(sequence)`: Computes the Shannon entropy (in bits) of a sequence.
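As a rough illustration of what `shannon_entropy` computes, here is the standard definition sketched in plain Python. This is not the library's implementation, just the textbook formula it corresponds to:

```python
from collections import Counter
import math

def shannon_entropy_sketch(sequence):
    """Shannon entropy in bits: -sum(p * log2(p)) over symbol frequencies."""
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy_sketch([1, 2, 1, 2, 3]))  # ≈ 1.52 bits
```

A sequence of all-identical symbols yields 0 bits, while a uniformly distributed alphabet of size k yields log2(k) bits.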
From the project root (where setup.py is located), run:
```
pip install -e .
```

Make sure you have installed `numpy` (and optionally `tqdm`):

```
pip install numpy tqdm
```

Example usage:

```python
from PerformanceLaw import (
    actual_entropy,
    actual_entropy_tq,
    ApEn,
    compression_ratio,
    shannon_entropy
)

seq = [1, 2, 1, 2, 3]
print(actual_entropy(seq))
print(shannon_entropy(seq))
print(ApEn(seq, m=2, r=0.2))
print(compression_ratio(seq))
```

These functions help measure sequence complexity, data randomness, and information content. They can be used in recommendation systems, time-series analysis, and data quality evaluation.
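To build intuition for the compression-ratio metric, the idea can be sketched with the standard library's `zlib`. This is only an illustration of the concept, not necessarily how `compression_ratio` is implemented in the package:

```python
import zlib

def compression_ratio_sketch(data):
    """Ratio of compressed size to raw size for a list of integers.
    Ratios near 1 suggest incompressible (random-looking) data;
    lower ratios suggest repetitive structure."""
    raw = ",".join(map(str, data)).encode("utf-8")
    compressed = zlib.compress(raw, level=9)
    return len(compressed) / len(raw)

repetitive = [1, 2] * 500          # highly regular sequence
varied = list(range(1000))         # no repeated symbols
print(compression_ratio_sketch(repetitive))  # small ratio
print(compression_ratio_sketch(varied))      # larger ratio
```

In a recommendation context, a user history with a low ratio (highly repetitive behavior) carries less new information per interaction than one with a high ratio.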
This README provides an overview of the project and instructions on how to navigate and utilize the different components available in the /General_Transformer and /Performance_Law_Appendix_Result directories.
The project focuses on advancing sequential recommendation systems through innovative models and performance law fitting strategies. It is divided into two main components:
- General Transformer for Sequential Recommendation: Located in the /General_Transformer directory, this component implements a general transformer architecture for sequential recommendation tasks.
- Performance Law Fitting Analysis: Found in the /Performance_Law_Appendix_Result directory, this component focuses on fitting performance laws for metrics such as HR (Hit Rate) and NDCG (Normalized Discounted Cumulative Gain).
- /General_Transformer: Contains scripts and code for training and evaluating transformer models for recommendation systems.
- /Performance_Law_Appendix_Result: Includes scripts and generated results for performance law fitting analysis, along with supplementary images referenced in the research paper.
The scripts in the /General_Transformer directory are designed to train transformer models tailored for sequential recommendation tasks. The main features are:
- Model Training: Utilizing DDP (Distributed Data Parallel) to efficiently train on multiple GPUs.
- Hyperparameter Configurations: Flexible adjustments for layers, heads, batch sizes, and more.
- Logging and Evaluation: Detailed performance metrics are logged using libraries like WandB.
To learn more about using these scripts, refer to the README provided within the /General_Transformer directory.
Located in the /Performance_Law_Appendix_Result directory, this component analyzes performance laws through an innovative fitting approach. Key elements include:
- Performance Law Fitting: Detailed scripts for fitting performance laws to key metrics.
- Supplementary Images: Includes images such as `PerformanceLaw_HR`, `PerformanceLaw_NDCG`, `ScalingLaw_HR`, and `ScalingLaw_NDCG` for deeper insight and validation of research findings; these images serve as supplementary material for the paper.
To understand the scripts and their execution, refer to the README within the /Performance_Law_Appendix_Result directory.
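The exact functional form of the performance law is defined by the scripts in that directory. As a generic illustration of the fitting idea, a power law y = a·x^b can be fitted by ordinary least squares in log-log space (the function name, data, and law form here are hypothetical, for illustration only):

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by linear least squares on (log x, log y).
    Returns (a, b). Generic illustration only; the actual
    performance-law form is defined in the project's own scripts."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(lx) / n
    my = sum(ly) / n
    # Slope of the log-log regression line gives the exponent b.
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / \
        sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b

# Synthetic data generated from y = 0.5 * x**0.3, so the fit
# should recover those coefficients.
xs = [10, 100, 1000, 10000]
ys = [0.5 * x ** 0.3 for x in xs]
a, b = fit_power_law(xs, ys)
print(round(a, 3), round(b, 3))  # → 0.5 0.3
```

The same log-space trick applies to fitting HR or NDCG against data or model scale, provided the metric follows a power-law regime.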
- Navigate to the relevant directory:
  - For transformer models, explore /General_Transformer.
  - For performance law analysis, visit /Performance_Law_Appendix_Result.
- Install required dependencies: Ensure all necessary Python libraries are installed as indicated in the README files within each directory.
- Run the scripts: Follow the instructions to execute model training, evaluation, or performance fitting as required.
- Explore Results and Graphs: Analyze outputs, performance metrics, and graphical results included in each component.