# Detecting Grokking via Local Intrinsic Dimensions of Contextual Language Models
*Grokking* is the phenomenon where a machine learning model trained on a small dataset learns to generalize well beyond the training set after a long period of overfitting.
We demonstrate that the grokking phenomenon can be detected by a change of the local intrinsic dimension (LID) of the model's hidden states.
This repository is based on an unofficial re-implementation of the paper [Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets](https://arxiv.org/abs/2201.02177) by Power et al.
The original codebase on which our work builds was written by Charlie Snell.
1. (Optional) If required, e.g. when planning to run jobs on a cluster via a custom Hydra launcher, set the correct environment variables in the `.env` file in the project root directory.
1. (Optional) For setting up the repository to support job submissions to a cluster using a Hydra multi-run launcher, follow the instructions in the [Hydra-HPC-Launcher repository](https://github.com/carelvniekerk/Hydra-HPC-Launcher).
## Usage
We define `uv run` commands in the `pyproject.toml` file, which can be used as entry points to run the code.
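For readers who want to inspect or extend these entry points: `uv run` commands of this kind are typically declared under `[project.scripts]` in `pyproject.toml`. A minimal sketch (the module path shown is hypothetical; only the `train_grokk` command name appears in this README):

```toml
[project.scripts]
# Hypothetical mapping from the command name to a callable in the package:
train_grokk = "grokking.train_grokk:main"
```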
The training script uses [Weights And Biases](https://wandb.ai/home) (wandb) by default to generate plots in real time.
If you want to disable wandb, set `wandb.use_wandb=False` in `config/train_grokk.yaml` or pass it as an argument when calling `train_grokk.py`.
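A toggle like this is often wrapped in a small factory so that wandb is only imported when it is actually enabled. The following is a generic sketch that assumes nothing about this repository's internals (the helper name `make_logger` and the project name are illustrative):

```python
def make_logger(use_wandb: bool):
    """Return a metrics-logging callable; a silent no-op when wandb is off."""
    if use_wandb:
        import wandb  # only imported when actually enabled

        wandb.init(project="grokking")  # illustrative project name
        return wandb.log
    # No-op fallback with the same call signature as wandb.log.
    return lambda metrics, step=None: None


log = make_logger(use_wandb=False)
log({"train/loss": 0.42}, step=100)  # silently ignored when wandb is disabled
```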
In our modified version of the repository, the logging includes:
- Training and validation loss curves and accuracy curves;
- Topological local estimates of the hidden states during training (with selected hyperparameters).
Note that since the computation of the local intrinsic dimension is expensive, we only compute it at certain intervals during training.
This can be controlled via the `topological_analysis.compute_estimates_every=500` parameter in the `config/train_grokk.yaml` file.
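The gating itself amounts to a modulus check on the global training step. A minimal sketch (the helper name is hypothetical; the parameter name mirrors the config key):

```python
def should_compute_estimates(step: int, compute_estimates_every: int = 500) -> bool:
    """True exactly on the steps where the expensive LID estimate runs."""
    return step % compute_estimates_every == 0


# Over 2000 training steps, the estimate is computed only four times:
trigger_steps = [s for s in range(2000) if should_compute_estimates(s)]
print(trigger_steps)  # [0, 500, 1000, 1500]
```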

For example, a training run can be launched with:

```shell
uv run train_grokk dataset.frac_train=0.5 wandb.use_wandb=false
```

You can try different operations or learning and architectural hyperparameters by modifying configurations in the `config/` directory.
### Experiments: Local Dimensions Detect Grokking
To reproduce the results in our paper, which compares the onset of grokking with the timing of the drop in local intrinsic dimension, you can run the following command:

The description of the local estimates contains the parameters used for their computation:

- `n-neighbors=64`: Number of neighbors (L) to use for the local intrinsic dimension estimate.
- `mean`: Log the mean of the local intrinsic dimension estimates over all token samples.
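The estimator used in this repository is topological, but the role of the two parameters above can be illustrated with the classical Levina-Bickel maximum-likelihood LID estimator, which likewise works from the `k` nearest neighbors of each sample and is then averaged over samples. This is a generic sketch, not the estimator used in this codebase:

```python
import numpy as np


def local_id_mle(points: np.ndarray, n_neighbors: int = 64) -> np.ndarray:
    """Levina-Bickel MLE of the local intrinsic dimension at every point.

    points: (n_samples, n_features); n_neighbors excludes the query point.
    """
    # Pairwise Euclidean distances (fine for small sample counts).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs**2).sum(axis=-1))
    # Column 0 of each sorted row is the point itself (distance 0),
    # so the k nearest neighbors sit in columns 1..k.
    knn = np.sort(dists, axis=1)[:, 1 : n_neighbors + 1]
    # MLE: inverse of the mean log-ratio of the k-th to the j-th distance.
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    return (n_neighbors - 1) / log_ratios.sum(axis=1)


# Points on a 2-dimensional plane embedded in 8 dimensions: the mean
# local estimate should land near 2 regardless of the ambient dimension.
rng = np.random.default_rng(0)
plane = np.zeros((400, 8))
plane[:, :2] = rng.normal(size=(400, 2))
print(local_id_mle(plane, n_neighbors=20).mean())
```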
Note: We provide scripts for creating the figures in the paper from the wandb logs as part of the [Topo_LLM repository](https://github.com/aidos-lab/Topo_LLM) in `topollm/plotting/wandb_export/`.
## References
Further discussion of the results can be found in our paper [Less is More: Local Intrinsic Dimensions of Contextual Language Models](https://arxiv.org/abs/2506.01034).
```tex
@misc{ruppik2025morelocalintrinsicdimensions,
title={Less is More: Local Intrinsic Dimensions of Contextual Language Models},
author={Benjamin Matthias Ruppik and Julius von Rohrscheidt and Carel van Niekerk and Michael Heck and Renato Vukovic and Shutong Feng and Hsien-chin Lin and Nurul Lubis and Bastian Rieck and Marcus Zibrowius and Milica Gašić},
year={2025},
eprint={2506.01034},
archivePrefix={arXiv},
url={https://arxiv.org/abs/2506.01034},
}
```