Commit ef5f7ca

Merge pull request #26 from ben300694/update/edit_licensing_information_and_update_references
Update/edit licensing information and update references
2 parents 169f969 + 40423eb commit ef5f7ca

34 files changed, 48 additions and 513 deletions

LICENSE

Lines changed: 8 additions & 1 deletion
@@ -1,7 +1,8 @@
 MIT License

 Copyright (c) 2022 Charlie Snell
-Copyright (c) 2025 AUTHOR_1 ([email protected])
+Copyright (c) 2025 Benjamin Matthias Ruppik ([email protected])
+Copyright (c) 2025 Julius von Rohrscheidt ([email protected])

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
@@ -20,3 +21,9 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
+
+Code generation tools and workflows:
+First versions of this code were potentially generated
+with the help of AI writing assistants including
+GitHub Copilot, ChatGPT, Microsoft Copilot, Google Gemini.
+Afterwards, the generated segments were manually reviewed and edited.

README.md

Lines changed: 29 additions & 9 deletions
@@ -1,7 +1,7 @@
-# Predicting Grokking via Local Intrinsic Dimensions of Contextual Language Models
+# Detecting Grokking via Local Intrinsic Dimensions of Contextual Language Models

 *Grokking* is the phenomenon where a machine learning model trained on a small dataset learns to generalize well beyond the training set after a long period of overfitting.
-We demonstrate that the grokking phenomenon can be predicted by the local intrinsic dimension of the model's hidden states.
+We demonstrate that the grokking phenomenon can be detected by a change of the local intrinsic dimension (LID) of the model's hidden states.

 This repository is based on an unofficial re-implementation of the paper [Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets](https://arxiv.org/abs/2201.02177) by Power et al.
 The original codebase that we base our work on was written by Charlie Snell.
@@ -53,18 +53,19 @@ This step can be achieved by running the setup script in the `grokking/setup/` d

 1. (Optional) If required, e.g. when planning to run jobs on a cluster via a custom Hydra launcher, set the correct environment variables in the `.env` file in the project root directory.

-1. (Optional) For setting up the repository to support job submissions to a cluster using a Hydra multi-run launcher, follow the instructions here: [[ANONYMIZED_HYDRA_HPC_LAUNCHER_LINK]].
+1. (Optional) For setting up the repository to support job submissions to a cluster using a Hydra multi-run launcher, follow the instructions in the [Hydra-HPC-Launcher repository](https://github.com/carelvniekerk/Hydra-HPC-Launcher).

 ## Usage

 We define `uv run` commands in the `pyproject.toml` file, which can be used as entry points to run the code.

 The training script uses [Weights And Biases](https://wandb.ai/home) (wandb) by default to generate plots in realtime.
-If you would not like to use wandb, just set `wandb.use_wandb=False` in `config/train_grokk.yaml` or as an argument when calling `train_grokk.py`.
-In our modified version of the repository, this includes:
+If you want to disable wandb, just set `wandb.use_wandb=False` in `config/train_grokk.yaml` or as an argument when calling `train_grokk.py`.

-- Training and validation loss curves and accuracy curves
-- Topological local estimates of the hidden states during training (with selected hyperparameters)
+In our modified version of the repository, the logging includes:
+
+- Training and validation loss curves and accuracy curves;
+- Topological local estimates of the hidden states during training (with selected hyperparameters).

 Note that since the computation of the local intrinsic dimension is expensive, we only compute it in certain intervals during training.
 This can be controlled via the `topological_analysis.compute_estimates_every=500` parameter in the `config/train_grokk.yaml` file.
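Reviewer note: the interval-based computation described in the two context lines above could look roughly like the following. This is only a sketch; the helper name `should_compute_estimates` and the choice to include step 0 are assumptions made for illustration and are not taken from the repository.

```python
def should_compute_estimates(step: int, compute_estimates_every: int = 500) -> bool:
    """Return True on the training steps where the expensive estimate is computed.

    Hypothetical helper: the name and the decision to include step 0 are
    assumptions for illustration, not the repository's implementation.
    """
    if compute_estimates_every <= 0:
        raise ValueError("compute_estimates_every must be a positive integer")
    return step % compute_estimates_every == 0


# Steps of a 2000-step run on which the estimate would be computed.
steps_with_estimates = [
    step for step in range(2001) if should_compute_estimates(step)
]
```

With the default of 500, the estimate is skipped on all but one in every 500 steps, which is the trade-off the README text describes.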
@@ -90,7 +91,7 @@ uv run train_grokk dataset.frac_train=0.5 wandb.use_wandb=false

 You can try different operations or learning and architectural hyperparameters by modifying configurations in the `config/` directory.

-### Experiments: Local Dimensions Predict Grokking
+### Experiments: Local Dimensions Detect Grokking

 To reproduce the results in our paper, which compares the onset of grokking with the timing of the drop in local intrinsic dimension, you can run the following command:

@@ -114,7 +115,26 @@ The description of the local estimates contains the parameters used for its comp
 - `n-neighbors=64`: Number of neighbors (L) to use for the local intrinsic dimension estimate.
 - `mean`: Log the mean of the local intrinsic dimension estimates over all token samples.

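Reviewer note: to make the `n-neighbors` and `mean` parameters above concrete, here is a minimal sketch of a neighborhood-based local intrinsic dimension estimate followed by the mean aggregation. The estimator used by the repository is not shown in this diff; the sketch assumes the Levina-Bickel maximum likelihood estimator as a stand-in, and the function name `local_intrinsic_dimensions` is hypothetical.

```python
import math
import random
from statistics import fmean


def local_intrinsic_dimensions(points, n_neighbors=64):
    """Estimate one local intrinsic dimension per point.

    Stand-in estimator (an assumption, not the repository's code):
    Levina-Bickel maximum likelihood over the n_neighbors nearest neighbors.
    Assumes all points are distinct, so no neighbor distance is zero.
    """
    if n_neighbors < 2:
        raise ValueError("n_neighbors must be at least 2")
    if len(points) <= n_neighbors:
        raise ValueError("need more points than n_neighbors")
    estimates = []
    for i, point in enumerate(points):
        # Distances from this point to every other point, nearest first.
        distances = sorted(
            math.dist(point, other) for j, other in enumerate(points) if j != i
        )
        neighbors = distances[:n_neighbors]
        radius = neighbors[-1]  # distance to the n_neighbors-th neighbor
        log_ratio_sum = sum(math.log(radius / d) for d in neighbors[:-1])
        estimates.append((n_neighbors - 1) / log_ratio_sum)
    return estimates


# Toy data: a flat 2-dimensional sheet embedded in 5-dimensional space.
random.seed(0)
sheet = [(random.random(), random.random(), 0.0, 0.0, 0.0) for _ in range(400)]
per_point = local_intrinsic_dimensions(sheet, n_neighbors=16)
mean_estimate = fmean(per_point)  # the `mean` aggregation over all samples
```

On this toy sheet the mean estimate should land near 2 even though the ambient space has 5 dimensions, which is the sense in which the estimate is "local" and "intrinsic". The brute-force neighbor search is quadratic in the number of points and is only meant for illustration.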
-Note: We provide scripts for creating the figures in the paper from the wandb logs as part of the Topo_LLM repository in `topollm/plotting/wandb_export/`.
+Note: We provide scripts for creating the figures in the paper from the wandb logs as part of the [Topo_LLM repository](https://github.com/aidos-lab/Topo_LLM) in `topollm/plotting/wandb_export/`.
+
+## References
+
+Further discussion of the results can be found in our paper [Less is More: Local Intrinsic Dimensions of Contextual Language Models](https://arxiv.org/abs/2506.01034).
+
+```tex
+@misc{ruppik2025morelocalintrinsicdimensions,
+title={Less is More: Local Intrinsic Dimensions of Contextual Language Models},
+author={Benjamin Matthias Ruppik and Julius von Rohrscheidt and Carel van Niekerk and Michael Heck and Renato Vukovic and Shutong Feng and Hsien-chin Lin and Nurul Lubis and Bastian Rieck and Marcus Zibrowius and Milica Gašić},
+year={2025},
+eprint={2506.01034},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2506.01034},
+note={To appear in NeurIPS 2025},
+}
+```
+
+- [Topo_LLM repository](https://github.com/aidos-lab/Topo_LLM)

 ## Acknowledgements

grokking/analysis/local_estimates_computation/estimator/get_estimator.py

Lines changed: 0 additions & 17 deletions
@@ -1,20 +1,3 @@
-# Copyright 2024-2025
-# [ANONYMIZED_INSTITUTION],
-# [ANONYMIZED_FACULTY],
-# [ANONYMIZED_DEPARTMENT]
-#
-# Authors:
-# AUTHOR_1 (2025) ([email protected])
-# AUTHOR_2 ([email protected])
-#
-# Code generation tools and workflows:
-# First versions of this code were potentially generated
-# with the help of AI writing assistants including
-# GitHub Copilot, ChatGPT, Microsoft Copilot, Google Gemini.
-# Afterwards, the generated segments were manually reviewed and edited.
-#
-
-
 import logging
 import pprint

grokking/analysis/local_estimates_computation/get_n_neighbors_from_array_len_and_pointwise_config.py

Lines changed: 0 additions & 16 deletions
@@ -1,19 +1,3 @@
-# Copyright 2025
-# [ANONYMIZED_INSTITUTION],
-# [ANONYMIZED_FACULTY],
-# [ANONYMIZED_DEPARTMENT]
-#
-# Authors:
-# AUTHOR_1 (2025) ([email protected])
-#
-# Code generation tools and workflows:
-# First versions of this code were potentially generated
-# with the help of AI writing assistants including
-# GitHub Copilot, ChatGPT, Microsoft Copilot, Google Gemini.
-# Afterwards, the generated segments were manually reviewed and edited.
-#
-
-
 import logging

 from grokking.config_classes.local_estimates.pointwise_config import LocalEstimatesPointwiseConfig

grokking/analysis/local_estimates_computation/global_and_pointwise_local_estimates_computation.py

Lines changed: 0 additions & 17 deletions
@@ -1,20 +1,3 @@
-# Copyright 2025
-# [ANONYMIZED_INSTITUTION],
-# [ANONYMIZED_FACULTY],
-# [ANONYMIZED_DEPARTMENT]
-#
-# Authors:
-# AUTHOR_1 (2025) ([email protected])
-# AUTHOR_2 ([email protected])
-#
-# Code generation tools and workflows:
-# First versions of this code were potentially generated
-# with the help of AI writing assistants including
-# GitHub Copilot, ChatGPT, Microsoft Copilot, Google Gemini.
-# Afterwards, the generated segments were manually reviewed and edited.
-#
-
-
 """Compute global and local estimates from prepared embeddings."""

 import logging

grokking/config_classes/constants.py

Lines changed: 0 additions & 18 deletions
@@ -1,20 +1,3 @@
-# Copyright 2024-2025
-# [ANONYMIZED_INSTITUTION],
-# [ANONYMIZED_FACULTY],
-# [ANONYMIZED_DEPARTMENT]
-#
-# Authors:
-# AUTHOR_1 (2025) ([email protected])
-# AUTHOR_2 ([email protected])
-#
-# Code generation tools and workflows:
-# First versions of this code were potentially generated
-# with the help of AI writing assistants including
-# GitHub Copilot, ChatGPT, Microsoft Copilot, Google Gemini.
-# Afterwards, the generated segments were manually reviewed and edited.
-#
-
-
 """Script for setting global variables for the config files."""

 import logging
@@ -23,7 +6,6 @@

 from dotenv import load_dotenv

-
 default_logger: logging.Logger = logging.getLogger(
     name=__name__,
 )

grokking/config_classes/local_estimates/filtering_config.py

Lines changed: 0 additions & 17 deletions
@@ -1,20 +1,3 @@
-# Copyright 2024-2025
-# [ANONYMIZED_INSTITUTION],
-# [ANONYMIZED_FACULTY],
-# [ANONYMIZED_DEPARTMENT]
-#
-# Authors:
-# AUTHOR_1 (2025) ([email protected])
-# AUTHOR_2 ([email protected])
-#
-# Code generation tools and workflows:
-# First versions of this code were potentially generated
-# with the help of AI writing assistants including
-# GitHub Copilot, ChatGPT, Microsoft Copilot, Google Gemini.
-# Afterwards, the generated segments were manually reviewed and edited.
-#
-
-
 """Configurations for specifying filtering of the data for local estimates computation."""

 from pydantic import BaseModel, Field

grokking/config_classes/local_estimates/local_estimates_config.py

Lines changed: 0 additions & 17 deletions
@@ -1,20 +1,3 @@
-# Copyright 2024-2025
-# [ANONYMIZED_INSTITUTION],
-# [ANONYMIZED_FACULTY],
-# [ANONYMIZED_DEPARTMENT]
-#
-# Authors:
-# AUTHOR_1 (2025) ([email protected])
-# AUTHOR_2 ([email protected])
-#
-# Code generation tools and workflows:
-# First versions of this code were potentially generated
-# with the help of AI writing assistants including
-# GitHub Copilot, ChatGPT, Microsoft Copilot, Google Gemini.
-# Afterwards, the generated segments were manually reviewed and edited.
-#
-
-
 """Configuration class for embedding data preparation."""

 from pydantic import BaseModel, Field

grokking/config_classes/local_estimates/noise_config.py

Lines changed: 0 additions & 17 deletions
@@ -1,20 +1,3 @@
-# Copyright 2024-2025
-# [ANONYMIZED_INSTITUTION],
-# [ANONYMIZED_FACULTY],
-# [ANONYMIZED_DEPARTMENT]
-#
-# Authors:
-# AUTHOR_1 (2025) ([email protected])
-# AUTHOR_2 ([email protected])
-#
-# Code generation tools and workflows:
-# First versions of this code were potentially generated
-# with the help of AI writing assistants including
-# GitHub Copilot, ChatGPT, Microsoft Copilot, Google Gemini.
-# Afterwards, the generated segments were manually reviewed and edited.
-#
-
-
 """Configurations for adding artificial noise into the local estimates computation."""

 from pydantic import BaseModel, Field

grokking/config_classes/local_estimates/plot_config.py

Lines changed: 0 additions & 16 deletions
@@ -1,19 +1,3 @@
-# Copyright 2025
-# [ANONYMIZED_INSTITUTION],
-# [ANONYMIZED_FACULTY],
-# [ANONYMIZED_DEPARTMENT]
-#
-# Authors:
-# AUTHOR_1 (2025) ([email protected])
-#
-# Code generation tools and workflows:
-# First versions of this code were potentially generated
-# with the help of AI writing assistants including
-# GitHub Copilot, ChatGPT, Microsoft Copilot, Google Gemini.
-# Afterwards, the generated segments were manually reviewed and edited.
-#
-
-
 from pydantic import BaseModel, Field
