HistoPath: Histopathology Image Analysis and Clustering

HistoPath is a toolkit for analyzing histopathology images: it extracts quantitative features from tissue patches, selects the most discriminative features, and clusters patient samples by tissue characteristics. It uses hierarchical clustering together with PCA and UMAP projections to identify patterns and relationships in the data, and it can combine histopathology features with MRI radiomic features.

Project Overview

The goal of this project is to analyze histopathology images through a pipeline that includes:

- Sampling tissue patches from whole slide images
- Extracting morphological and textural features from these patches
- Aggregating features across multiple patches per patient
- Selecting the most discriminative features
- Clustering patients based on selected features
- Visualizing and analyzing the clustering results
- Integrating with MRI radiomics features

Requirements

The following Python libraries are required to run this project:

numpy>=1.20.0
pandas>=1.3.0
scikit-learn>=0.24.0
scikit-image>=0.18.0
matplotlib>=3.4.0
seaborn>=0.11.0
openslide-python>=1.1.0
histomicstk>=1.0.0
umap-learn>=0.5.0
scipy>=1.7.0
Pillow>=8.0.0

Installation

Clone the repository:

git clone https://github.com/BMGLab/HistoPath.git
cd HistoPath

Install the required packages:

pip install -r requirements.txt

Install OpenSlide for whole slide image processing:
- For macOS: brew install openslide
- For Ubuntu: sudo apt-get install openslide-tools

Install HistomicsTK:

HistomicsTK pulls in several heavy dependencies (for example large_image), so installing it in a dedicated conda environment is recommended:

# Create a conda environment (recommended)
conda create -n histopath python=3.8
conda activate histopath

# Install HistomicsTK
pip install histomicstk

# If issues occur with dependencies, install them individually:
pip install large_image
pip install girder-client

For more detailed installation instructions and troubleshooting, see the HistomicsTK documentation.

Project Structure

The project is organized into several modules, each handling a specific step in the analysis pipeline:

HistoPath/
│
├── Sampling/                # Extract tissue patches from whole slide images
│   └── sampling.py
│
├── Feature_Extraction/      # Extract and aggregate features from images
│   ├── feature_extraction_density.py
│   └── mean_features.py
│
├── Feature_selection/       # Select most discriminative features
│   └── feature_selection.py
│
├── Clustering/              # Cluster samples based on selected features
│   └── clustering_selected_features.py
│
├── MR/                      # MRI radiomic features processing
│   └── mr_feature_list.py
│
├── HistoandMR/              # Integration of histopathology and MRI features
│   └── all_features.py
│
├── Cluster_result_analysis/ # Analyze clustering results
│
├── Outputspdf/              # Output visualizations and results
│
└── README.md

Execution Order

The modules should be executed in the following sequence:

Tissue Sampling:
```
python Sampling/sampling.py
```
This extracts tissue patches from whole slide images and saves them to the slides/ directory.
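
As a rough illustration of this step, the sketch below enumerates a regular grid of patch origins and keeps only the patches that contain enough tissue. It is not the code in `Sampling/sampling.py`: the function names, the default 512-pixel patch size, and the brightness threshold of 220 are assumptions chosen for the example, and it operates on a NumPy array instead of reading a slide through OpenSlide.

```python
import numpy as np


def grid_coordinates(width, height, patch_size, stride):
    """Return (x, y) top-left corners of patches that fit fully inside the image."""
    xs = range(0, width - patch_size + 1, stride)
    ys = range(0, height - patch_size + 1, stride)
    return [(x, y) for y in ys for x in xs]


def tissue_fraction(rgb_patch, background_threshold=220):
    """Fraction of pixels darker than the (assumed) near-white background level."""
    gray = rgb_patch[..., :3].mean(axis=-1)
    return float((gray < background_threshold).mean())


def select_tissue_patches(image, patch_size=512, stride=512, min_tissue=0.5):
    """Keep the patch origins whose tissue fraction reaches min_tissue."""
    height, width = image.shape[:2]
    kept = []
    for x, y in grid_coordinates(width, height, patch_size, stride):
        patch = image[y:y + patch_size, x:x + patch_size]
        if tissue_fraction(patch) >= min_tissue:
            kept.append((x, y))
    return kept
```

With a real slide, each patch could be read with OpenSlide's `read_region((x, y), 0, (patch_size, patch_size))`, which returns an RGBA image; the `[..., :3]` slice above drops the alpha channel for that reason.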

Feature Extraction:
```
python Feature_Extraction/feature_extraction_density.py
```
This processes the tissue patches, extracts features, and saves them to the output_density/ directory.
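
To give a feel for what a per-patch feature vector looks like, here is a minimal stand-in that computes a few first-order intensity statistics with NumPy only. The actual script presumably relies on HistomicsTK for morphological and textural features; the feature names, the 32-bin histogram, and the dark-pixel cut-off of 100 below are assumptions made for this example.

```python
import numpy as np


def first_order_features(rgb_patch, bins=32):
    """Compute simple intensity statistics for one RGB patch (uint8, H x W x 3)."""
    gray = rgb_patch[..., :3].astype(np.float64).mean(axis=-1)
    counts, _ = np.histogram(gray, bins=bins, range=(0.0, 255.0))
    probs = counts[counts > 0] / counts.sum()
    return {
        "intensity_mean": float(gray.mean()),
        "intensity_std": float(gray.std()),
        # Shannon entropy (bits) of the grayscale histogram: 0 for a flat patch.
        "intensity_entropy": float(-(probs * np.log2(probs)).sum()),
        # Share of pixels darker than an assumed cut-off, a crude density proxy.
        "dark_pixel_fraction": float((gray < 100).mean()),
    }
```

One dictionary per patch maps directly onto one row of a feature CSV.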

Feature Aggregation:
```
python Feature_Extraction/mean_features.py
```
This aggregates features across multiple patches and creates a combined feature file.
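
The aggregation can be pictured as a group-by over patches. The sketch below averages every numeric feature per patient with pandas; the `patient_id` column name and the extra `n_patches` column are assumptions for illustration, not the layout used by `mean_features.py`.

```python
import pandas as pd


def aggregate_by_patient(patch_features, id_column="patient_id"):
    """Average every numeric feature over all patches of the same patient."""
    # Exclude the identifier itself and any non-numeric columns (e.g. file names).
    numeric = patch_features.drop(columns=[id_column]).select_dtypes(include="number")
    grouped = numeric.groupby(patch_features[id_column]).mean()
    # Record how many patches contributed to each patient's mean.
    grouped["n_patches"] = patch_features.groupby(id_column).size()
    return grouped
```

Keeping the patch count alongside the means makes it easy to spot patients whose averages rest on very few patches.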

Feature Selection:
```
python Feature_selection/feature_selection.py
```
This selects the most discriminative features using multiple methods and saves them to selected_features.csv.
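
This README does not name the selection methods, so the following is only one plausible combination: a variance filter followed by correlation pruning. Both thresholds and the function name are assumptions, and the real script may use different or additional criteria.

```python
import numpy as np
import pandas as pd


def select_features(features, min_variance=1e-8, max_correlation=0.95):
    """Drop near-constant columns, then one column of each highly correlated pair."""
    variances = features.var(axis=0)
    kept = features.loc[:, variances > min_variance]

    corr = kept.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # A column is redundant if it correlates strongly with an earlier column.
    redundant = [col for col in upper.columns if (upper[col] > max_correlation).any()]
    return kept.drop(columns=redundant)
```

Because only the later column of each correlated pair is dropped, the result depends on column order; sorting columns beforehand makes the outcome reproducible.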

Clustering and Visualization:
```
python Clustering/clustering_selected_features.py
```
This performs hierarchical clustering on the selected features and generates visualizations.
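
A minimal sketch of hierarchical clustering on a patient-by-feature matrix, using SciPy. The Ward linkage, the z-score scaling, and the default of two clusters are assumptions; the README only states that hierarchical clustering is used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_patients(feature_matrix, n_clusters=2):
    """Z-score the features, build a Ward linkage, and cut it into n_clusters."""
    X = np.asarray(feature_matrix, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Z = linkage((X - X.mean(axis=0)) / std, method="ward")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return Z, labels
```

The returned linkage matrix `Z` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the dendrogram, while `labels` gives one cluster number per patient.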

MRI Feature Processing:
```
python MR/mr_feature_list.py
```
This processes MRI radiomic features from CaPTk output files.

Histopathology-MRI Integration:
```
python HistoandMR/all_features.py
```
This integrates histopathology features with MRI features for multimodal analysis.
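
Conceptually, the integration is a join of two per-patient tables. The sketch below assumes both tables share a `patient_id` column and prefixes each feature with its modality; the column name, the prefixes, and the choice of an inner join are assumptions, not a description of `all_features.py`.

```python
import pandas as pd


def merge_modalities(histo, mri, id_column="patient_id"):
    """Inner-join two per-patient tables, prefixing columns by modality."""
    histo = histo.set_index(id_column).add_prefix("histo_")
    mri = mri.set_index(id_column).add_prefix("mri_")
    # An inner join keeps only patients that have both modalities.
    return histo.join(mri, how="inner")
```

An inner join silently drops patients missing either modality, so comparing the row count before and after the merge is a quick sanity check.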

Dataset

The toolkit is designed to work with standard histopathology whole slide images (WSIs) in formats supported by OpenSlide (e.g., .svs, .ndpi, .tiff). Sample images are not included in the repository due to size constraints.

Output Files

The project generates various output files:

- Extracted image patches in PNG format
- Feature CSV files for each image and patient
- Combined feature matrices
- Selected feature lists
- Clustering visualizations (dendrogram, PCA, UMAP plots)
- Heatmaps of feature patterns
- Statistical analysis of discriminative features

Acknowledgements

This project makes use of several open-source libraries and tools that deserve acknowledgement:

- HistomicsTK for histopathology image analysis
- OpenSlide for reading whole slide image formats
- scikit-learn for machine learning and feature selection algorithms
- UMAP for dimensionality reduction
- CaPTk (Cancer Imaging Phenomics Toolkit) for MRI feature extraction
- pandas and NumPy for data manipulation
- Matplotlib and seaborn for visualization

Special thanks to the developers and contributors of these libraries for making this research possible.
