Hi, thanks for this interesting new approach for studying single-cell trajectories. I was following the tutorial notebook at https://github.com/Teichlab/Genes2Genes/blob/main/notebooks/Tutorial.ipynb and ran into errors during the Clustering alignments step:
df = ClusterUtils.run_clustering(aligner, metric='levenshtein', experiment_mode=True)
errors with:
IndexError Traceback (most recent call last)
<ipython-input-141-2242a2d1f27f> in <module>
----> 1 df = ClusterUtils.run_clustering(aligner, metric='levenshtein', experiment_mode=True)
/mnt/volume/resources/miniconda3/envs/jupyter/lib/python3.9/site-packages/genes2genes/ClusterUtils.py in run_clustering(aligner, metric, DIST_THRESHOLD, experiment_mode)
115 eval_dists = []
116 for D_THRESH in tqdm(dist_thresholds):
--> 117 gene_clusters, cluster_ids, silhouette_score, silhouette_score_mode, n_small_cluster = run_agglomerative_clustering(E, aligner.gene_list, D_THRESH)
118
119 if(len(gene_clusters.keys())==1):
/mnt/volume/resources/miniconda3/envs/jupyter/lib/python3.9/site-packages/genes2genes/ClusterUtils.py in run_agglomerative_clustering(E, gene_list, DIST_THRESHOLD, linkage)
53 silhouette_score = sklearn.metrics.silhouette_score(X=E , labels = model.labels_, metric='precomputed')
54 silhouette_score_samples = sklearn.metrics.silhouette_samples(X=E , labels = model.labels_, metric='precomputed')
---> 55 silhouette_score_mode = scipy.stats.mode(silhouette_score_samples)[0][0]
56
57 n_clusters_less_members = []
IndexError: invalid index to scalar variable.
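This error in scipy.stats.mode appears to be related to the return-shape changes that began with SciPy v1.9: the keepdims argument was added in 1.9 and its default became False in 1.11, so the mode of a 1-D array now comes back as a scalar rather than a length-1 array, and the second [0] fails. The documentation (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mode.html) also notes: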
Beginning in SciPy 1.9, np.matrix inputs (not recommended for new code) are converted to np.ndarray before the calculation is performed. In this case, the output will be a scalar or np.ndarray of appropriate shape rather than a 2D np.matrix. Similarly, while masked elements of masked arrays are ignored, the output will be a scalar or np.ndarray rather than a masked array with mask=False.
This is fixed by replacing line 55 in ClusterUtils.py:
silhouette_score_mode = scipy.stats.mode(silhouette_score_samples)[0][0]
with
silhouette_score_mode = scipy.stats.mode(silhouette_score_samples)[0]
or checking generally with something like:
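import numpy as np  # only needed if ClusterUtils.py does not already import it

mode_result = scipy.stats.mode(silhouette_score_samples)
# .mode is a scalar on newer SciPy and a length-1 array on older versions;
# np.atleast_1d normalises both cases so that [0] is always valid.
silhouette_score_mode = np.atleast_1d(mode_result.mode)[0]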
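Another option might be to request the old output shape explicitly. The sketch below assumes SciPy >= 1.9, where scipy.stats.mode accepts keepdims; the function name silhouette_mode and the sample values are hypothetical and only there for illustration, not taken from ClusterUtils.py.

```python
import numpy as np
import scipy.stats


def silhouette_mode(samples):
    """Return the most common value in a 1-D array of silhouette samples.

    Hypothetical helper for illustration. Assumes SciPy >= 1.9, where
    `keepdims` is available; older releases would reject the argument.
    """
    # keepdims=True keeps the reduced axis with size one, so .mode is a
    # length-1 array and indexing with [0] is valid.
    result = scipy.stats.mode(samples, keepdims=True)
    return result.mode[0]


# Made-up values standing in for silhouette_score_samples.
example_samples = np.array([0.2, 0.5, 0.5, 0.1])
example_mode = silhouette_mode(example_samples)
```

With keepdims=True the original [0][0] indexing would also keep working, at the cost of requiring SciPy 1.9 or newer.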