This repository contains the code and data for the paper:

```bibtex
@inproceedings{li2025crabs,
  title={CRABS: A syntactic-semantic pincer strategy for bounding LLM interpretation of Python notebooks},
  author={Li, Meng and McPhillips, Timothy M and Wang, Dingmin and Tsai, Shin-Rong and Lud{\"a}scher, Bertram},
  booktitle={Conference on Language Modeling (COLM)},
  year={2025}
}
```

Repository structure:

```
├── data
│   ├── ProspectiveCorrectGivenSpecification.json: human-annotated ground truth
│   ├── builtin_funcs_and_vars.txt
│   ├── crabs_meta.csv
│   ├── inputs: the 50 notebooks of the CRABS dataset
│   └── outputs
│       ├── informationflowgraph: per-notebook results for information flow graphs
│       ├── cellexecutiondependencygraph: per-notebook results for cell execution dependency graphs
│       ├── records: the prompts together with the LLM's outputs
│       ├── informationflowgraph.csv: merges the per-notebook results from the informationflowgraph folder (average, standard deviation, and "em" for exact matching)
│       └── cellexecutiondependencygraph.csv: merges the per-notebook results from the cellexecutiondependencygraph folder (average, standard deviation, and "em" for exact matching)
├── parsers
│   ├── prompts: prompts for the LLM-based approaches
│   │   ├── baseline.txt
│   │   ├── crabs_subtask1.txt
│   │   ├── crabs_subtask2.txt
│   │   ├── a1.txt
│   │   └── a2.txt
│   ├── baseline.py: baseline approach that queries the LLM with the entire notebook
│   ├── crabs.py: our main approach
│   ├── a1.py: ablation study 1 (w/o syntactic parsing)
│   ├── a2.py: ablation study 2 (w/o cell-by-cell prompting)
│   └── yw_generator.py: syntactic processor
├── key
├── environment.yml: for environment setup
├── 1_model_and_eval.py: extracts information flow graphs and cell execution dependency graphs for the notebooks under the data/inputs folder
├── 2_crabs_llm_acc.py: computes the accuracy of the LLM at resolving the ambiguities left after syntactic analysis of these notebooks
├── 3_metrics_across_notebooks.py: computes the average, standard deviation, and exact matching across all notebooks, taking the files under the informationflowgraph and cellexecutiondependencygraph folders as inputs
└── readme.md
```
Note on the `builtin_funcs_and_vars.txt` file: built-in functions and variables are recognized using a list curated from Python 3.13.2 (retrieved on Mar. 05, 2025).
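As a point of reference, a comparable list can be regenerated from the `builtins` module of the running interpreter (a minimal sketch; the curated file in this repository was produced for Python 3.13.2, so its exact contents may differ from what this prints on your interpreter):

```python
# Sketch: enumerate the names exposed by the builtins module of the current
# interpreter and check whether a given identifier is among them.
# This only approximates the curated builtin_funcs_and_vars.txt file.
import builtins

builtin_names = sorted(dir(builtins))
print(len(builtin_names), "built-in names")

# Example membership checks, e.g. when deciding whether a call in a notebook
# cell refers to a Python built-in or to a user-defined name.
print("print" in builtin_names)      # True
print("my_helper" in builtin_names)  # False ("my_helper" is a hypothetical user-defined name)
```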
- Install conda.
- Create a Python virtual environment:

  ```
  conda env create -f environment.yml -n crabs
  conda activate crabs
  ```

- Data preparation
  - Download the Meta Kaggle Code dataset[^1][^2] and retrieve notebooks from it based on the `kernelVersionId` column of the `data/crabs_meta.csv` file. Note that `data/crabs_meta.csv` contains the metadata of the CRABS dataset, which was created by selecting 50 highly up-voted Python notebooks on Kaggle and linking them with `kernelVersionId` using the Meta Kaggle dataset[^2][^3]. These kernel version IDs identify notebooks in the Meta Kaggle Code dataset, since the notebooks there are named `{kernelVersionId}.ipynb`.
  - Move these notebooks into the `data/inputs` folder and rename them to `{crabsId}-{authorName}:{urlSlug}.ipynb`, where `crabsId`, `authorName`, and `urlSlug` are columns of the `data/crabs_meta.csv` file (see the sketch after this list).
- OpenAI key: put your OpenAI API key in the `key` file.
- Go to the `1_model_and_eval.py` script and update the Parameters section as needed.
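A minimal sketch of the retrieval-and-renaming step, assuming the Meta Kaggle Code dataset has been unpacked into a local `meta_kaggle_code/` directory (that path, and the use of pandas and shutil here, are illustrative assumptions rather than part of this repository):

```python
# Sketch: copy notebooks from a local Meta Kaggle Code download into data/inputs,
# renaming {kernelVersionId}.ipynb to {crabsId}-{authorName}:{urlSlug}.ipynb.
# META_KAGGLE_CODE is an assumed location; point it at your own download.
from pathlib import Path
import shutil

import pandas as pd

META_KAGGLE_CODE = Path("meta_kaggle_code")
INPUTS = Path("data/inputs")
INPUTS.mkdir(parents=True, exist_ok=True)

meta = pd.read_csv("data/crabs_meta.csv")
for row in meta.itertuples():
    src = META_KAGGLE_CODE / f"{row.kernelVersionId}.ipynb"
    dst = INPUTS / f"{row.crabsId}-{row.authorName}:{row.urlSlug}.ipynb"
    shutil.copy(src, dst)
    print(f"copied {src} -> {dst}")
```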
After you have installed and configured CRABS, run the following commands in sequence:
```
python 1_model_and_eval.py
python 2_crabs_llm_acc.py
python 3_metrics_across_notebooks.py
```
The table below maps column names from the output CSV files to the corresponding terms used in the paper for consistency and clarity.
| CSV Column | Paper Terminology |
|---|---|
| MinIOParser | lower estimate |
| MaxIOParser | upper estimate |
| A1 | Ablation study S1 |
| A2 | Ablation study S2 |
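For example, the merged results can be inspected directly with pandas (a minimal sketch; apart from the columns listed in the table above, no assumptions are made about what the CSV contains):

```python
# Sketch: load the merged information flow graph results and show the columns
# that map to the paper terminology above (MinIOParser, MaxIOParser, A1, A2).
import pandas as pd

df = pd.read_csv("data/outputs/informationflowgraph.csv")
print(df.columns.tolist())

mapped = [c for c in ["MinIOParser", "MaxIOParser", "A1", "A2"] if c in df.columns]
print(df[mapped])
```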
[^1]: Jim Plotts and Megan Risdal. (2023). Meta Kaggle Code [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DS/3240808

[^2]: Our experiments are based on the dataset retrieved on Mar. 27, 2025.

[^3]: Megan Risdal and Timo Bozsolik. (2022). Meta Kaggle [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DS/9