This repository contains the code and figures for the paper "Information Maximization-Based Clustering of Histopathology Images Using Deep Learning".
- Folder 128 contains the models (in folder models14) for the 14-cluster set using 128x128-pixel patches, the script for training this model (part1_kpc_cae_128_14.ipynb), the clustering output from this model (part2_clustering.ipynb) and UMAP plotting (part3_umap.ipynb) using the high-dimensional latent features obtained from this model. Similarly, folder 64 contains the models (in folder models14) for the 14-cluster set using 64x64-pixel patches, the script for training this model (part1_kpc_cae_64_14.ipynb), the clustering output from this model (part2_clustering.ipynb) and UMAP plotting (part3_umap.ipynb) using the high-dimensional latent features obtained from this model. We show only these two models because the internal validation metrics identified them as the optimal cluster sets. By changing a single place in the scripts, they can also be run for cluster sets of 8, 9, 10, 11, 12, 13, 15, 16, 17 and 18. The place to change is marked in the training scripts (part1_kpc_cae_128_14.ipynb and part1_kpc_cae_64_14.ipynb).
- 'Figures' folder contains every figure used in the manuscript, including those in the Supporting information section. The Supporting information figures shown in the paper are in the Additional Information subfolder. Some high-resolution WSIs can also be found in the Figures folder.
- 'MODULE' folder contains the code used to calculate the internal validation indices (Xie-Beni index, Calinski-Harabasz index, C index, Dunn index, Hartigan index and McClain-Rao index) for both patch sizes (128x128 and 64x64 pixels). See the two scripts Internal_validation_128x128.ipynb and Internal_validation_64x64.ipynb for the index calculations.
- internal_validation_indices_plot.ipynb has the code for creating Figure 5 of the manuscript.
- sequential_patch_creation_from_1_WSI.ipynb contains the code for creating 18252 non-overlapping patches of size 128x128 pixels, taken sequentially from a randomly selected WSI with 5 stainings. We could not upload the resulting dataset (seq_patches_18252_from_1wsi.npy) to GitHub because it is gigabytes in size, but it can be found at this link: seq_patches_18252_from_1wsi.npy. The script also visualizes the WSI that was used for sequential patch creation and one patch from this dataset, each with 5 stainings. 64x64 patches can be created in the same way with minor changes to the script, but we did not do that in this research.

  In sequential_patch_creation_from_1wsi.ipynb, you will notice HE_2, HE_3 and HE_4 besides HE_1. These HE series were taken for alignment and registration of the images. The original tissue was cut into a series of thin slices, which were then stained alternately with HE and the other staining methods, e.g., HE-MT-HE-CD31-HE-Ki67-HE-CK... As a result, every image with another stain should be sandwiched between two neighboring HE images.
- IM_output_64x64_100_1.ipynb and IM_output_of_unused_samples_during_training.ipynb contain the code to generate Figure 4 and Figure 5, respectively, of the Supporting information section of the manuscript.
- original_transformed_128x128.ipynb has the code to visualize the original and transformed versions of a patch (128x128 pixels). See Figure 2 of the Supporting information section of the paper.
- the_learning_curve.ipynb contains the code to create Figure 3 of the Supporting information section of the paper.
- patch_visualization_128x128_64x64.ipynb has the code to create Figure 1 of the manuscript. In addition, it shows the first 20 samples of both datasets with 5 stainings: slice128_Block2_20K.npy contains the 128x128-pixel patches and slice64_Block2_20K.npy contains the 64x64-pixel patches. Both files are gigabytes in size and therefore too large to upload to GitHub, but they can be found at this link: Location of 128x128 and 64x64 patches
- histogram_colormap_WSI.ipynb contains the code to create Figure 10 of the manuscript.
- With the random_patch_creation.ipynb script, random patches can be created from the 191 WSIs. The script shows the code for creating 128x128 patches, but 64x64 patches can also be created with minor changes. The place to change is indicated in the script.
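The two patch-creation notebooks described above differ mainly in how patch positions are chosen: row by row without overlap, or at random. The following is a minimal sketch of both strategies, not the code from the notebooks. The function names, the `patch_size` and `seed` parameters, and the nested-list image are all assumptions made for illustration. It also assumes the stained images are registered, so the same coordinates can be reused for each staining.

```python
import random


def sequential_patch_origins(height, width, patch_size=128):
    """Top-left (row, col) corners of non-overlapping patches, row by row.

    Pixels that do not fill a complete patch at the right or bottom
    edge are skipped.
    """
    rows = range(0, height - patch_size + 1, patch_size)
    cols = range(0, width - patch_size + 1, patch_size)
    return [(r, c) for r in rows for c in cols]


def random_patch_origins(height, width, n_patches, patch_size=128, seed=0):
    """Top-left corners of randomly placed patches (patches may overlap).

    Raises ValueError if the image is smaller than one patch.
    """
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible
    return [
        (rng.randint(0, height - patch_size), rng.randint(0, width - patch_size))
        for _ in range(n_patches)
    ]


def crop(image, origin, patch_size=128):
    """Cut one square patch out of an image stored as a nested list of rows.

    With a NumPy array the equivalent would be
    image[r:r + patch_size, c:c + patch_size].
    """
    r, c = origin
    return [row[c:c + patch_size] for row in image[r:r + patch_size]]
```

In this sketch, switching from 128x128 to 64x64 patches only requires passing `patch_size=64`. For example, a hypothetical 512x384 image yields 12 sequential origins with `patch_size=128` and 48 with `patch_size=64`.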
In several scripts of this repository, you will notice locations that point to where different types of files (datasets or other files) are stored, as shown below:
- /project/dsc-is/mahfujul-r/M/slice128_Block2.20K.ipynb
- models14/model_en_202211280036_4000.ckpt
- models14/model_cl_202211280036_4000.ckpt
- models14/model_de_202211280036_4000.ckpt
- project/dsc-is/mahfujul-r/M/128/14/HHH14 & C14 & D14.csv
- /project/dsc-is/mahfujul-r/M/slice64_Block2.20K.ipynb
- models14/model_encoder_3000
- models14/model_classifier_3000
- models14/model_decoder_3000
- project/dsc-is/mahfujul-r/M/64/14/HHH14 & C14 & D14.csv
- ./64/models14/model_encoder_3000
- ./64/models14/model_classifier_3000
- ./64/models14/model_decoder_3000
- ./128/models14/model_en_202211280036_4000.ckpt
- ./128/models14/model_cl_202211280036_4000.ckpt
- ./128/models14/model_de_202211280036_4000.ckpt
- project/dsc-is/mahfujul-r/M/128/08/HHH08 & C08 & D08.csv
- project/dsc-is/mahfujul-r/M/128/09/HHH09 & C09 & D09.csv
- project/dsc-is/mahfujul-r/M/128/10/HHH10 & C10 & D10.csv
- project/dsc-is/mahfujul-r/M/128/11/HHH11 & C11 & D11.csv
- project/dsc-is/mahfujul-r/M/128/12/HHH12 & C12 & D12.csv
- project/dsc-is/mahfujul-r/M/128/13/HHH13 & C13 & D13.csv
- project/dsc-is/mahfujul-r/M/128/15/HHH15 & C15 & D15.csv
- project/dsc-is/mahfujul-r/M/128/16/HHH16 & C16 & D16.csv
- project/dsc-is/mahfujul-r/M/128/17/HHH17 & C17 & D17.csv
- project/dsc-is/mahfujul-r/M/128/18/HHH18 & C18 & D18.csv
- project/dsc-is/mahfujul-r/M/64/08/HHH08 & C08 & D08.csv
- project/dsc-is/mahfujul-r/M/64/09/HHH09 & C09 & D09.csv
- project/dsc-is/mahfujul-r/M/64/10/HHH10 & C10 & D10.csv
- project/dsc-is/mahfujul-r/M/64/11/HHH11 & C11 & D11.csv
- project/dsc-is/mahfujul-r/M/64/12/HHH12 & C12 & D12.csv
- project/dsc-is/mahfujul-r/M/64/13/HHH13 & C13 & D13.csv
- project/dsc-is/mahfujul-r/M/64/15/HHH15 & C15 & D15.csv
- project/dsc-is/mahfujul-r/M/64/16/HHH16 & C16 & D16.csv
- project/dsc-is/mahfujul-r/M/64/17/HHH17 & C17 & D17.csv
- project/dsc-is/mahfujul-r/M/64/18/HHH18 & C18 & D18.csv
- ./64/models14/hist_modelS_3000
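Many of these locations start with a directory on the machine where the scripts were originally run, so they will probably need to be adjusted before the notebooks can be run elsewhere. The following is a minimal sketch of one way to do that, assuming the downloaded files keep the same layout under a local folder. The `remap` helper and the `local_root` argument are hypothetical and are not part of the repository.

```python
from pathlib import PurePosixPath

# Prefix that appears in the locations listed above.
ORIGINAL_ROOT = PurePosixPath("/project/dsc-is/mahfujul-r/M")


def remap(path, local_root, original_root=ORIGINAL_ROOT):
    """Rewrite a location under original_root so it points under local_root.

    Paths that do not start with original_root (for example the
    repository-relative ./128/models14/... paths) are returned unchanged,
    apart from pathlib's normalisation of a leading "./".
    """
    text = str(path)
    # Some listed locations omit the leading slash; treat them the same way.
    if text.startswith("project/"):
        text = "/" + text
    p = PurePosixPath(text)
    try:
        relative = p.relative_to(original_root)
    except ValueError:
        return p
    return PurePosixPath(local_root) / relative
```

For example, `remap("/project/dsc-is/mahfujul-r/M/slice128_Block2.20K.ipynb", "data")` gives `data/slice128_Block2.20K.ipynb`, where `data` is a placeholder for wherever the files were saved.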