This is the repository for the grain size analysis project. The project uses the pyDGS tool to classify image data and create visualizations of grain sizes. This guide explains how to set up and run the project.
- Anaconda
- Python 3.10
- Git
First, the repository must be cloned on your local computer:
```bash
git clone https://github.com/Ecohydraulics/gs-analysis-frozencores.git
cd gs-analysis-frozencores
```

This repository requires a non-packaged dependency, pyDGS (Buscombe, 2013), which in turn needs several packages to run. To install all dependencies needed by the gs-analysis-frozencores code, create and activate the conda environment:

```bash
conda env create -f environment.yml
conda activate gsenv
```
After activating the environment, install the project in editable mode and run the GUI:

```bash
pip install -e .
python src/gs_analysis_frozencores/app.py
```

Alternatively (without installation), run `python src/gs_analysis_frozencores/app.py` directly from the repository root.
The application analyzes grain sizes from orthorectified frozen core images, segments layers, and classifies them. Outputs (enhanced image, layer plots, cropped layers, visualizations) are saved under `output/`.
```
gs-analysis-frozencores/
  pyproject.toml
  README.md
  environment.yml
  models/                 # saved models (.pkl)
  output/                 # generated outputs (images/plots)
  src/
    gs_analysis_frozencores/
      app.py              # entry point (GUI launcher)
      analysis/
        grain.py          # GrainSizeAnalyzer (processing, layer detection, outputs)
      ml/
        textures.py       # TexturAnalyzer (GLCM features, model load/predict)
      ui/
        main_window.py    # PyQt5 MainWindow wiring the workflow
        dialogs.py        # Layer count & Pebbles value dialogs
        worker.py         # QThread worker for analysis
```
- GUI workflow (PyQt5): Load `.tif`/`.tiff` images, run analysis in a background thread with a progress bar, and display results.
- TIFF metadata handling: Reads resolution from `XResolution`/`ResolutionUnit` to compute pixel size; falls back to a reasonable estimate when missing.
- Preprocessing: Denoising (Non-local Means) and contrast enhancement (CLAHE).
- Layer detection: Automatic layer boundary detection using vertical Sobel gradient profiling; user specifies expected number of layers (1–20).
- Outputs:
  - Enhanced image saved to `output/`.
  - Layer boundary plot saved to `output/`.
  - Cropped per-layer images saved to `output/`.
- Texture classification: Loads a scikit-learn model (`.pkl`) from `models/`, computes multi-scale GLCM features, and classifies layers into `Clay/Silt`, `Sand`, and `Pebbles`.
- Grain-size visualization:
  - Side-by-side visualization with scale bar, colored layer boundaries, and per-layer labels.
  - Distribution panel plotting d30/d50/d90 vs. depth (log-x), with default values for `Sand` and `Clay/Silt` and user-entered values for `Pebbles`.
- Persistence: Saves artifacts under `output/` with timestamps.
- Headless-safe plotting: Uses the `Agg` backend to render and save figures without a display.
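As a rough illustration of the layer-detection idea, the sketch below profiles the vertical gradient of an image and keeps the strongest peaks as boundary candidates. It uses a plain row-difference instead of a full Sobel kernel, and `detect_layer_boundaries` is a hypothetical helper name, not project code:

```python
import numpy as np

def detect_layer_boundaries(gray, n_layers, smooth=15):
    """Average the vertical gradient across each row, smooth the
    resulting profile, and keep the n_layers - 1 strongest peaks
    as candidate layer boundaries (rows = depth)."""
    img = gray.astype(float)
    # Vertical gradient (difference between consecutive rows), averaged per row.
    profile = np.abs(np.diff(img, axis=0)).mean(axis=1)
    # Moving-average smoothing to suppress pixel noise.
    kernel = np.ones(smooth) / smooth
    profile = np.convolve(profile, kernel, mode="same")
    # Local maxima of the smoothed profile are boundary candidates.
    peaks = [i for i in range(1, len(profile) - 1)
             if profile[i] >= profile[i - 1] and profile[i] > profile[i + 1]]
    peaks.sort(key=lambda i: profile[i], reverse=True)
    return sorted(peaks[: n_layers - 1])

# Synthetic two-layer "core": dark upper half, bright lower half.
core = np.vstack([np.full((50, 30), 40.0), np.full((50, 30), 200.0)])
print(detect_layer_boundaries(core, n_layers=2))  # one boundary near row 50
```

Note that the moving-average smoothing can shift the detected peak by a few rows; the real implementation in `grain.py` may handle this differently.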
The application follows four steps to obtain the grain size classes, using the orthorectified frozen core image as input:

Step 1. Load Image: Upload a `.tif` image of the orthorectified frozen core.

Step 2. Analyze: Perform the layer segmentation, for instance, between Pebbles and Sand.

Step 3. Load Classification Model: Load the `.pkl` model that classifies grain sizes into the three categories mentioned above.

Step 4. Classify Layers: Apply the `.pkl` model to classify the identified layers of the frozen core image provided in Step 1.
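To make the texture-classification step (Steps 3 and 4) more concrete, here is a minimal gray-level co-occurrence matrix (GLCM) computation in plain NumPy. It is a simplified stand-in for the multi-scale scikit-image features the project actually uses; `glcm` and `glcm_contrast` are illustrative names, not project code:

```python
import numpy as np

def glcm(gray, dx=1, dy=0, levels=8):
    """Simplified gray-level co-occurrence matrix: counts how often
    gray level j occurs at offset (dy, dx) from gray level i,
    then normalizes to a joint probability matrix."""
    q = (gray.astype(float) / 256 * levels).astype(int)  # quantize to `levels` bins
    h, w = q.shape
    m = np.zeros((levels, levels))
    for y in range(h - dy):
        for x in range(w - dx):
            m[q[y, x], q[y + dy, x + dx]] += 1
    return m / m.sum()

def glcm_contrast(m):
    """GLCM 'contrast' feature: expected squared gray-level difference.
    Low for smooth textures (Clay/Silt), high for coarse ones (Pebbles)."""
    i, j = np.indices(m.shape)
    return float((m * (i - j) ** 2).sum())

flat = np.full((20, 20), 100, dtype=np.uint8)                            # uniform patch
noisy = np.random.default_rng(0).integers(0, 256, (20, 20), dtype=np.uint8)  # rough patch
print(glcm_contrast(glcm(flat)), glcm_contrast(glcm(noisy)))  # flat patch gives 0.0
```

Features like this, computed at several offsets and scales, form the feature vector that the `.pkl` classifier maps to `Clay/Silt`, `Sand`, or `Pebbles`.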
- Boundary editing: Add UI to manually adjust detected layer boundaries (drag lines, nudge controls) after auto-detection.
- Model improvement: The `.pkl` model currently classifies between `Clay/Silt`, `Sand`, and `Pebbles` and was trained using `old_codes_franz/trainingsmodel_klassifizierung.py`, but there is no information about the training data or the accuracy of the model (no mention in the BSc thesis, see attached). The model should be vetted with further orthorectified frozen cores to measure its accuracy, and the training data should be identified so that the model is not augmented or evaluated with the same data it was trained on.
- Export data and true grain size values: Write a CSV with per-layer depths and actual d30/d50/d90 values, instead of only the default values for the `Silt/Clay` and `Sand` classes and the BASEGRAIN values for the `Pebbles` layer; optionally include classification probabilities.
- Model/version safety: Validate the selected `.pkl` against the expected scikit-learn version (1.6.1) and feature vector length; show clear errors.
- Pebbles integration: Import d30/d50/d90 directly from BASEGRAIN output files; allow batch entry/import.
- Configurable parameters: Expose kernel sizes, smoothing/CLAHE parameters, and gradient window settings via UI or config file.
- Robust metadata: Improve pixel-size inference and unit handling; warn when units are ambiguous.
- Performance: Optional downscaling for detection, cached intermediate images, and potential GPU/OpenCL paths where available.
- Packaging: Create Windows-friendly packaging (e.g., PyInstaller) and troubleshoot PyQt5 runtime requirements.
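The model/version-safety item above could look roughly like the following sketch. The expected feature count (24) is a made-up placeholder, `ModelError` and `validate_model` are hypothetical names, and a real check would also compare the running `sklearn.__version__`; the demo uses a pickled stand-in object instead of a fitted estimator:

```python
import os
import pickle
import tempfile

EXPECTED_N_FEATURES = 24    # hypothetical length of the GLCM feature vector
EXPECTED_SKLEARN = "1.6.1"  # version pinned in this README

class ModelError(ValueError):
    pass

def validate_model(path, sklearn_version=None):
    """Load a pickled classifier and fail early with a clear message
    instead of a cryptic predict-time error."""
    with open(path, "rb") as f:
        model = pickle.load(f)
    n = getattr(model, "n_features_in_", None)  # set by fitted sklearn estimators
    if n != EXPECTED_N_FEATURES:
        raise ModelError(f"feature vector length {n}, expected {EXPECTED_N_FEATURES}")
    if sklearn_version is not None and sklearn_version != EXPECTED_SKLEARN:
        raise ModelError(f"scikit-learn {sklearn_version}, expected {EXPECTED_SKLEARN}")
    return model

# Demo with a stand-in object instead of a real fitted estimator:
class Dummy:
    n_features_in_ = 24

tmp = tempfile.NamedTemporaryFile(suffix=".pkl", delete=False)
pickle.dump(Dummy(), tmp)
tmp.close()
print(type(validate_model(tmp.name, sklearn_version="1.6.1")).__name__)  # Dummy
os.remove(tmp.name)
```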
In the current workflow, BASEGRAIN should be used to determine d30, d50, and d90 for the cropped image classified as Pebbles after Step 2 (Analyze). The values obtained in BASEGRAIN are then entered by the user in a pop-up window after clicking the last workflow step, Step 4 (Classify Layers). The classification results appear next in a pop-up window, and the d30, d50, and d90 grain sizes are saved to the `output/` directory. Note, however, that the saved values for `Silt/Clay` and `Sand` are defaults for typical mixtures, not computed values, and therefore do not reflect the actual characteristic grain sizes of the frozen core under analysis.
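For reference, turning a measured cumulative grain-size curve (e.g. from sieve data or BASEGRAIN output) into characteristic d30/d50/d90 values is a simple interpolation. The helper below is an illustrative sketch with made-up sieve data, not project code; it interpolates on log-transformed sizes, matching the log-x axis of the distribution panel:

```python
import numpy as np

def d_percentiles(sizes_mm, percent_finer, targets=(30, 50, 90)):
    """Interpolate characteristic grain sizes (d30/d50/d90) from a
    cumulative grain-size distribution (% finer by weight vs. size).
    Interpolation is done on log10(size), since grain-size curves
    are conventionally plotted on a logarithmic size axis."""
    sizes = np.asarray(sizes_mm, dtype=float)
    pf = np.asarray(percent_finer, dtype=float)
    order = np.argsort(pf)  # np.interp requires ascending x values
    log_d = np.interp(targets, pf[order], np.log10(sizes[order]))
    return {f"d{t}": 10 ** v for t, v in zip(targets, log_d)}

# Hypothetical sieve data for a sandy layer: size (mm) vs. % finer.
sizes = [0.063, 0.125, 0.25, 0.5, 1.0, 2.0]
finer = [5, 20, 45, 75, 92, 100]
print(d_percentiles(sizes, finer))
```

Replacing the hard-coded defaults with values computed like this (or imported from BASEGRAIN) would make the saved d30/d50/d90 reflect the actual core.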