DeepBreath AI 🫁 – Lung Cancer Malignancy Detection

Empowering early lung cancer detection with advanced deep learning models. 🚀

DeepBreath AI is an end-to-end solution designed to assist radiologists and healthcare professionals in identifying and classifying lung cancer malignancies from CT scans. By leveraging advanced machine learning and data preprocessing techniques, our goal is to accelerate diagnosis and improve patient outcomes.

About the Project 🏥

Lung cancer (LC) is one of the leading causes of cancer-related deaths worldwide. Early detection is crucial for improving survival rates. However, manual analysis of CT scans can be time-consuming and prone to errors.

What Does DeepBreath AI Do? 💡

Binary Malignancy Classification 🔵
- Distinguishes benign (Classes 1, 2, 3) from malignant (Classes 4, 5) tumors.
Multi-class Malignancy Classification 🌈
- Predicts one of five malignancy scores (1–5).
Full-Slice vs. Zoomed-Slice Analysis 🔍
- Evaluates performance on both the entire CT slice and a zoomed-in nodule slice to better capture tumor details.

Authors 🧑‍💻

Sofia Brunori
Alberto Eusebio
Alessandro Tognotti
Andrea Tomasella

Dataset 📦

Our dataset includes CT scan slices from 2,363 patients, provided in NRRD format with Hounsfield Unit (HU) values.

Each patient has:
1. A full CT slice (512×512 int16 matrix)
2. A zoomed-in slice focusing on the nodule

Class Distribution 📊

Binary: 1,793 benign vs. 570 malignant
Multi-class: 5 imbalance-prone classes (1–5 malignancy scores)

Data avilability🚨: The data is not made publicly available, as consense from the authors of the dataset was negated.

Methods 🧪

Data Preprocessing 🔧

Normalization
- Min-max normalization to scale values between [0, 1].
HU Windowing Transformation
- Emphasizes lung tumors using a window width of 1300 and a window level of -220.
Augmentation
- Applied for both binary and multi-class tasks to improve generalization.
Balancing Techniques
- Binary: Random oversampling
- Multi-class: SMOTE + class weight balancing

Model Architectures 🔮

We utilize EfficientNet variants, each customized for the specific task:

Task	Architecture	Notes
Full-slice Binary	EfficientNetV2S	Fine-tuning last 10 layers
Full-slice Multi-class	EfficientNetB1	Class balancing + advanced preprocessing
Zoomed-slice Binary	EfficientNetV2S	SMOTE augmentation
Zoomed-slice Multi-class	EfficientNetB0	Oversampling + augmentations

Training Details:

Loss Functions: Binary Focal Crossentropy (binary) / Focal Loss (multi-class)
Optimizer: Adam (learning rates from 1e-5 to 1e-3)
Callbacks: EarlyStopping, ReduceLROnPlateau

Results 📈

Task	Accuracy	Precision	Recall	F1 Score	AUC
Full-slice Binary	0.73 ±0.02	0.44 ±0.03	0.52 ±0.05	0.48 ±0.03	0.66 ±0.02
Full-slice Multi-class	0.23 ±0.04	0.20 ±0.04	0.21 ±0.05	0.19 ±0.04	0.51 ±0.04
Zoomed-slice Binary	0.72 ±0.03	0.45 ±0.05	0.63 ±0.08	0.52 ±0.05	0.75 ±0.04
Zoomed-slice Multi-class	0.53 ±0.02	0.46 ±0.04	0.44 ±0.03	0.43 ±0.03	0.74 ±0.02

Key Observations 🔍

Binary classification consistently outperforms multi-class approaches.
Zoomed-slice models often yield higher accuracy, indicating that focusing on the tumor region is beneficial.
Class imbalance severely affects multi-class performance, highlighting the need for robust balancing strategies.

Conclusion 🚀

DeepBreath AI demonstrates the feasibility of automated lung cancer malignancy detection via deep learning. Our experiments show that zoomed-slice binary classification offers the most promising results. However, multi-class classification still faces challenges due to imbalanced datasets and subtle malignancy differences.

Future Directions:

Expanding data with more advanced augmentation.
Investigating transformer-based architectures.
Exploring additional balancing strategies to tackle class imbalance.

Known Issues 🚨

Oversampling Instead of Undersampling
Rationale: When data is highly unbalanced, undersampling should be performed. However, we chose oversampling to preserve as many minority-class examples as possible. Undersampling would have reduced the diversity of the minority classes, potentially leading to worse generalization.
Applying SMOTE Directly to Images
Rationale: While SMOTE is typically designed for tabular data, we adapted it for image data to increase the representation of minority classes without losing important image features. However, this approach may introduce some artifacts in the synthetic images.

Hardware Requirements 🖥️

Our training setup was conducted on a single NVIDIA Tesla P100 GPU provided by Kaggle. While the code can be run on CPU, a GPU is strongly recommended to reduce training times.

Recommended Specifications:

GPU: At least 1 × NVIDIA Tesla P100 (16GB) or equivalent/better
Memory: 16GB RAM (minimum)
Processor: Modern multi-core CPU
Storage: 1Gb storage

Installation & Usage ⚙️

Clone the Repository

git clone https://github.com/yourusername/DeepBreathAI.git
cd DeepBreathAI

Install Dependencies
```
pip install -r requirements.txt
```
Run the notebooks
```
jupiter notebook.ipynb
```

References 📚

For more in-depth explanations, methodology, and research citations, please refer to the associated research report.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
models		models
notebooks		notebooks
README.md		README.md
Report.pdf		Report.pdf
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

DeepBreath AI 🫁 – Lung Cancer Malignancy Detection

Table of Contents

About the Project 🏥

What Does DeepBreath AI Do? 💡

Authors 🧑‍💻

Dataset 📦

Class Distribution 📊

Data avilability🚨: The data is not made publicly available, as consense from the authors of the dataset was negated.

Methods 🧪

Data Preprocessing 🔧

Model Architectures 🔮

Results 📈

Key Observations 🔍

Conclusion 🚀

Known Issues 🚨

Hardware Requirements 🖥️

Installation & Usage ⚙️

References 📚

About

Uh oh!

Releases

Packages

Uh oh!

Languages

AlbertoEusebio/Lung-Cancer-Classification

Folders and files

Latest commit

History

Repository files navigation

DeepBreath AI 🫁 – Lung Cancer Malignancy Detection

Table of Contents

About the Project 🏥

What Does DeepBreath AI Do? 💡

Authors 🧑‍💻

Dataset 📦

Class Distribution 📊

Data avilability🚨: The data is not made publicly available, as consense from the authors of the dataset was negated.

Methods 🧪

Data Preprocessing 🔧

Model Architectures 🔮

Results 📈

Key Observations 🔍

Conclusion 🚀

Known Issues 🚨

Hardware Requirements 🖥️

Installation & Usage ⚙️

References 📚

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages