Releases: jvalegre/robert

v2.1.0

21 Nov 20:02
11147ee

  • Classification problems now accept two categorical class labels (e.g., "active"/"inactive")
  • Changing how the first 10 initial points are selected in Bayesian Optimization (now using Latin Hypercube Sampling)
  • Deleting first the most correlated features in CURATE module
  • Sorting the columns and rows in the csv files to ensure reproducibility
  • Using only KNN imputer if you have more than 100 datapoints
  • Fixing RFECV (each model now has its own set of descriptors after feature selection)
  • Fixing bug in the AQME module when using --csv_test
  • Changing pkg_resources to importlib.resources to avoid deprecation warnings
  • Fixed bug when selecting test set datapoints with the EVEN option
  • Fixed bug in the name of extra_q1 and extra_q5 splitting methods
  • Updating packages versions in setup.py
  • Molssi databases link in easyROB GUI
  • Default split is 'RND' for classification problems
  • The sklearn-intelex accelerator was removed
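
To illustrate the Latin Hypercube Sampling change listed above, here is a minimal sketch of drawing 10 initial points so that each parameter range is split into 10 equal strata and every stratum is sampled exactly once. It uses only the Python standard library and is not ROBERT's implementation; the function name `latin_hypercube` and the parameter bounds are hypothetical.

```python
import random

def latin_hypercube(n_points, bounds, seed=0):
    """Return n_points samples where, in every dimension, each of the
    n_points equal-width strata contains exactly one sample."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum of the unit interval.
        strata = [(i + rng.random()) / n_points for i in range(n_points)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(strata)
        columns.append([low + u * (high - low) for u in strata])
    # Transpose the per-dimension columns into a list of points.
    return [tuple(col[i] for col in columns) for i in range(n_points)]

# Hypothetical hyperparameter ranges, for illustration only.
bounds = [(10, 200), (0.0, 1.0)]
initial_points = latin_hypercube(10, bounds, seed=0)
for point in initial_points:
    print(point)
```

Compared with plain random draws, this guarantees that the initial points cover the whole range of every parameter, which matters when only 10 points are available to seed the optimization.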

v2.0.2

23 May 11:58
d769991

  • Fixed bug in the GENERATE module on macOS and Linux
  • Updating AQME version to 1.7.3
  • Fixing libgfortran library to version 14.2.0

v2.0.1

14 Apr 14:41
692b5a0

  • The AQME and EVALUATE modules are now fully functional and have been reactivated in this version.
  • Fixed classification with external predictions
  • Fixed scores from VERIFY tests in clas
  • Fixed bug in the detection of automatic classification problems
  • Fixed bug in 'load_variables' where the model type and target value were not being saved
  • Fixed bug in 'sort_n_load' to ensure reproducibility of sorted CV across different operating systems

v2.0.0

17 Feb 11:42
e5d92ea

Adaptation of the code to avoid overfitting and to use with low-data problems

  • Fixed a bug in one-hot encoding in the one-hot test
  • Adding the possibility to disable the automatic standardization of descriptors (--std False)
  • Changing CV_test (now it standardizes the full database with sklearn functions)
  • Fixing a bug with the sklearn-intelex accelerator
  • Fixing a threading bug with matplotlib in SHAP
  • train:validation split was replaced by a repeated k-fold CV
  • The program always holds out a test set
  • The average results of the repeated k-fold CV are used to measure predictive ability and to predict new results
  • BayesianOptimization() is used to find the best model, using a combined metric that depends on interpolation and extrapolation of different types of CVs
  • This version does not work with classification problems and the AQME and EVALUATE modules were disabled until v2.0.1.
  • Updated ROBERT score, which is more robust towards small data problems
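
The workflow described above (hold out a test set, then average a repeated k-fold CV on the remaining points) can be sketched as follows. This is a standard-library illustration under stated assumptions, not ROBERT's code: the 20% test size, 5 folds, 10 repetitions, the toy data, and the mean-value "model" are all placeholders.

```python
import random
from statistics import mean

def repeated_kfold(indices, n_splits, n_repeats, seed=0):
    """Yield (train, valid) index lists for a repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)  # a new shuffle for every repetition
        folds = [shuffled[k::n_splits] for k in range(n_splits)]
        for k, valid in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, valid

# Toy target values (assumption: 20 datapoints).
y = [float(v) for v in range(20)]
all_idx = list(range(len(y)))

# The test set is held out first and never enters the CV.
test_idx = random.Random(0).sample(all_idx, 4)
cv_idx = [i for i in all_idx if i not in test_idx]

fold_rmse = []
for train, valid in repeated_kfold(cv_idx, n_splits=5, n_repeats=10):
    baseline = mean(y[i] for i in train)  # stand-in for a fitted model
    fold_rmse.append(mean((y[i] - baseline) ** 2 for i in valid) ** 0.5)

# The average over all folds and repetitions is the reported CV metric.
cv_rmse = mean(fold_rmse)
print(f"{len(fold_rmse)} folds, mean RMSE = {cv_rmse:.2f}")
```

Averaging over many reshuffled folds makes the metric less sensitive to one lucky or unlucky split, which is the main concern with a single train:validation partition in low-data problems.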

v1.2.1

31 Oct 17:47
e96f68d

  • NN solver is now set to 'lbfgs' by default in the MLPRegressor to work with small datasets
  • Thres_x is now set to 0.7 by default in the CURATE module
  • Fixing bug in the PREDICT module when using EVALUATE module (it was not showing the linear model equation)
  • Adding linear model equation in the REPORT module
  • Changing the threshold for correlated features in predict_utils to adjust to the new thres_x
  • Changing the way missing values are treated (previously filled with 0s, now using KNN imputer)
  • Adding .csv in --csv_test in case the user forgets to add it
  • Adding ROBERT score number in the REPORT module
  • Creating --descp_lvl to select which descriptors to use in the AQME-ROBERT workflow (interpret/denovo/full)
  • The AQME-ROBERT workflow now uses interpretable descriptors by default (--descp_lvl interpret)
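
To show why the change from zero-filling to KNN imputation listed above matters, the sketch below fills a missing value with the mean of the same column over the k nearest rows. It is a simplified standard-library illustration of the idea, not the imputer ROBERT calls; the function `knn_impute`, the choice `k=2`, and the toy table are hypothetical.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest rows."""
    def distance(a, b):
        # Root-mean-square difference over columns observed in both rows.
        shared = [(x, z) for x, z in zip(a, b) if x is not None and z is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - z) ** 2 for x, z in shared) / len(shared))

    filled = [list(row) for row in rows]
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows where this column is observed.
            donors = [other for j, other in enumerate(rows)
                      if j != r and other[c] is not None]
            donors.sort(key=lambda other: distance(row, other))
            nearest = donors[:k]
            if nearest:  # leave the gap if no donor exists
                filled[r][c] = sum(d[c] for d in nearest) / len(nearest)
    return filled

# Toy descriptor table with one missing value (None).
table = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [5.0, 6.0, 7.0],
    [0.9, 1.8, 2.9],
]
imputed = knn_impute(table, k=2)
print(imputed[1])
```

Here the missing entry is estimated from the two rows that resemble it, whereas filling it with 0 would place the datapoint far from its neighbors in descriptor space.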

v1.2.0

01 Oct 21:04
c86c564

  • Changing cross-validation (CV) in VERIFY to LOOCV for datasets with fewer than 50 points
  • Changing MAPIE in PREDICT to LOOCV for datasets with fewer than 50 points
  • By default, RFECV uses LOOCV for small datasets and 5-fold CV for larger datasets
  • The external test set is chosen more evenly along the range of y values (not fully random)
  • Changing the format of the VERIFY plot, from donut to bar plots
  • Automatic KN data splitting for databases with fewer than 250 datapoints
  • Change CV_test from ShuffleSplit to KFold
  • Predictions from CV are now represented in a graph and stored in a CSV
  • Changing the ROBERT score to depend more heavily on results from CV
  • Fixing auto_test (now it works as specified in the documentation)
  • Adding clas predictions to report PDF
  • Adding new pytests that cover the ROBERT score section from the report PDF
  • Adding the EVALUATE module to evaluate linear models with user-defined descriptors and partitions
  • Adding Pearson heatmap in PREDICT for the two models, with individual variable correlation analysis
  • Adding y-distribution graphs and analysis of uniformity
  • Major changes to the report PDF file to include sections rather than modules
  • Improving explanation of the ROBERT score on Read the Docs
  • Printing coefficients in MVL models inside PREDICT.dat
  • Fixing bug in RFECV for classification problems, now it uses RandomForestClassifier()
  • Automatic recognition of classification problems
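
The switch between LOOCV and k-fold CV mentioned in several items above can be sketched as a simple size-based rule. This is an illustration only: it reuses the 50-point threshold quoted for VERIFY and the 5-fold default quoted for RFECV, and the helper names `choose_n_folds` and `fold_indices` are hypothetical rather than ROBERT functions.

```python
def choose_n_folds(n_points, loocv_below=50, default_folds=5):
    """LOOCV (one fold per point) for small datasets, k-fold otherwise."""
    return n_points if n_points < loocv_below else default_folds

def fold_indices(n_points, n_folds):
    """Split range(n_points) into n_folds contiguous validation folds."""
    base, extra = divmod(n_points, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        # The first `extra` folds take one additional point.
        size = base + (1 if k < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

small = fold_indices(30, choose_n_folds(30))    # LOOCV: one point per fold
large = fold_indices(120, choose_n_folds(120))  # 5-fold CV
print(len(small), len(large))
```

With LOOCV every model is trained on all points but one, so small datasets lose as little training data as possible in each fold.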

v1.1.2

01 Aug 09:11
e2a5fd2

  • Fixing conda-forge install and making pip install the preferred installation method in Read the Docs

v1.1.1

31 Jul 19:06
f80ede8

  • Hotfix of release 1.1.0 (changes in shap version of setup.py and improving slightly the documentation)

v1.1.0

31 Jul 15:55
703d589

  • Adding RFECV in CURATE to fix the maximum number of descriptors to 1/3 of datapoints
  • Added the possibility to use more than 1 SMILES column in the AQME module
  • Change the scoring criteria in the PFI workflow (from R2 to RMSE)
  • Fixing models where R2 in validation is much better than in training (if the validation set is very small or unrepresentative, the model may appear to perform excellently simply by chance)
  • Fixing PFI_plot bug (now takes all the features into account)
  • Fixing a bad memory allocation issue in GENERATE
  • Fixing bug in classification models when more than 2 classes of the target variable are present
  • Fixing reproducibility when using a specific seed in GENERATE module
  • Change CV_test from KFold to ShuffleSplit and adding a random_state to ensure reproducibility
  • Allows CSV inputs that use ; as separator
  • Fixing CV_test bug in VERIFY (now it uses equal test size to the model tested)
  • Adding variability in the prediction with MAPIE python library
  • Adding sd in the predictions table when using external test set
  • Fixing error_type bug for classification models
  • MCC as default metric for classification models (better to check performance in unbalanced datasets)
  • PFI workflow now uses the same metric as error_type
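
The note above on MCC being better suited to unbalanced datasets can be checked with a small example. The sketch below computes the Matthews correlation coefficient from the binary confusion matrix in plain Python; the toy labels are invented for illustration, and returning 0.0 when the denominator is zero is a common convention assumed here.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: an undefined MCC (zero denominator) is reported as 0.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Unbalanced toy dataset: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100  # a model that never predicts the minority class

print("accuracy:", accuracy(y_true, always_negative))
print("MCC:", mcc(y_true, always_negative))
```

A model that ignores the minority class still reaches 95% accuracy on this data, while its MCC is 0, which is why MCC is the more informative default for unbalanced classification problems.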

v1.0.5

01 Dec 16:01
3e3cad7

  • Fixing some overfitted models with train and validation R2 0.99-1
  • Including the easyROB graphical user interface (GUI)