The main goal of this project was to address a fundamental limitation of traditional counterfactual explanation generation: the search is typically posed as an optimization problem over the entire input feature space. However, not all input features are equally necessary for generating meaningful counterfactual explanations. To address this, we use Self-Explaining Neural Networks (SENN), which extract important concepts from the input data and assign each concept a relevance score. Using these learned concepts as a foundation, we reduce the search space from the entire input feature set to a smaller set of relevant concepts.
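To make the idea of searching over concepts rather than pixels concrete, here is a minimal NumPy sketch. It is not the code from this repository: it assumes a SENN-style linear aggregator in which class logits are the product of concept activations `h` and relevance scores `theta`, and it holds `theta` fixed during the search, which is a simplification. The function name `concept_counterfactual` and all of its parameters are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def concept_counterfactual(h, theta, target, lr=0.1, l2_weight=0.05, steps=500):
    """Gradient search in concept space for a vector classified as `target`.

    h      : (k,) concept activations of the original input
    theta  : (k, n_classes) relevance scores, held fixed (simplifying assumption)
    target : index of the desired counterfactual class
    """
    h_cf = h.astype(float).copy()
    onehot = np.zeros(theta.shape[1])
    onehot[target] = 1.0
    for _ in range(steps):
        probs = softmax(h_cf @ theta)
        if probs.argmax() == target:
            break  # stop as soon as the prediction flips
        # Gradient of cross-entropy toward the target class, plus an L2
        # penalty that keeps the counterfactual close to the original.
        grad = theta @ (probs - onehot) + 2.0 * l2_weight * (h_cf - h)
        h_cf -= lr * grad
    return h_cf


# Toy example: 3 concepts, 3 classes, each concept votes for one class.
theta = np.eye(3)
h = np.array([2.0, 0.0, 0.0])  # originally predicted as class 0
h_cf = concept_counterfactual(h, theta, target=1)
print("original class:", int(np.argmax(h @ theta)))
print("counterfactual class:", int(np.argmax(h_cf @ theta)))
```

The search variable here has only as many dimensions as there are concepts, instead of the 784 pixels of an MNIST image, which is the reduction in search space described above.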
The dataset used is MNIST.
The code is heavily annotated. Docstrings explain the details of almost every major part of the code.
For SENN classification:
- accuracy

For counterfactual generation:
- a robustness check
- L2 distance
- sparsity
- concept change
- concept relevance scores
- visual aids for comparison
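As an illustration of how three of the counterfactual metrics above could be computed, here is a short NumPy sketch. The exact definitions used in this project are not stated here, so the ones below are assumptions: sparsity is taken as the percentage of input features that change by more than a tolerance, and concept change as the signed sum of differences in concept activations. All function names are hypothetical.

```python
import numpy as np


def l2_distance(x, x_cf):
    """Euclidean distance between the original input and its counterfactual."""
    return float(np.linalg.norm((x_cf - x).ravel()))


def sparsity(x, x_cf, tol=1e-2):
    """Percentage of features changed by more than `tol` (assumed definition)."""
    changed = np.abs(x_cf - x).ravel() > tol
    return float(100.0 * changed.mean())


def concept_change(h, h_cf):
    """Signed total change in concept activations (assumed definition)."""
    return float(np.sum(h_cf - h))


# Tiny worked example with a 4-feature "image" and 2 concepts.
x = np.zeros(4)
x_cf = np.array([3.0, 4.0, 0.0, 0.0])
h = np.array([1.0, 2.0])
h_cf = np.array([0.5, 2.0])

print(f"L2={l2_distance(x, x_cf):.2f}; "
      f"Sparsity={sparsity(x, x_cf):.1f}%; "
      f"Concept Δ={concept_change(h, h_cf):.2f}")
```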
Metrics Summary:
Class 0→1: L2=11.69; Sparsity=55.5%; Concept Δ=-1.32
Class 0→2: L2=10.48; Sparsity=42.5%; Concept Δ=-0.60
Class 0→3: L2=9.16; Sparsity=35.1%; Concept Δ=-0.52
Class 0→4: L2=10.00; Sparsity=41.6%; Concept Δ=-1.57
Class 0→5: L2=6.03; Sparsity=22.4%; Concept Δ=-0.09
Class 0→6: L2=7.69; Sparsity=32.7%; Concept Δ=-1.13
Class 0→7: L2=8.24; Sparsity=35.8%; Concept Δ=-0.62
Class 0→8: L2=7.89; Sparsity=36.1%; Concept Δ=-0.31
Class 0→9: L2=8.28; Sparsity=35.1%; Concept Δ=-0.75
Metrics Summary:
Class 1→0: L2=10.37; Sparsity=30.9%; Concept Δ=-0.98
Class 1→2: L2=9.61; Sparsity=31.9%; Concept Δ=-0.45
Class 1→3: L2=9.39; Sparsity=34.3%; Concept Δ=-0.42
Class 1→4: L2=5.61; Sparsity=21.0%; Concept Δ=0.24
Class 1→5: L2=8.68; Sparsity=29.8%; Concept Δ=-0.26
Class 1→6: L2=9.05; Sparsity=32.0%; Concept Δ=-0.82
Class 1→7: L2=7.87; Sparsity=26.8%; Concept Δ=-0.56
Class 1→8: L2=8.09; Sparsity=25.5%; Concept Δ=-0.11
Class 1→9: L2=6.96; Sparsity=27.6%; Concept Δ=-0.47

Metrics Summary:
Class 2→0: L2=6.65; Sparsity=24.9%; Concept Δ=-0.16
Class 2→1: L2=8.63; Sparsity=42.2%; Concept Δ=-0.92
Class 2→3: L2=7.03; Sparsity=31.1%; Concept Δ=0.24
Class 2→4: L2=9.95; Sparsity=48.3%; Concept Δ=-1.92
Class 2→5: L2=8.79; Sparsity=35.6%; Concept Δ=-0.30
Class 2→6: L2=7.75; Sparsity=29.5%; Concept Δ=-0.82
Class 2→7: L2=10.83; Sparsity=43.5%; Concept Δ=-1.62
Class 2→8: L2=7.30; Sparsity=26.9%; Concept Δ=0.07
Class 2→9: L2=9.57; Sparsity=41.3%; Concept Δ=-1.83
The remaining source classes produce similar outputs.
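For readers who want to compare class pairs, the summary lines can be parsed into records with a small helper. This is a hypothetical sketch based only on the line format shown above, not a utility shipped with the project. The sample uses two lines copied from the class 0 summary.

```python
import re

# Matches lines such as: Class 0→1: L2=11.69; Sparsity=55.5%; Concept Δ=-1.32
LINE_RE = re.compile(
    r"Class (\d)→(\d): L2=([-\d.]+); Sparsity=([-\d.]+)%; Concept Δ=([-\d.]+)"
)


def parse_metrics(text):
    """Return one dict per metrics line found in `text`."""
    rows = []
    for match in LINE_RE.finditer(text):
        src, tgt, l2, sp, delta = match.groups()
        rows.append({
            "source": int(src),
            "target": int(tgt),
            "l2": float(l2),
            "sparsity": float(sp),
            "concept_delta": float(delta),
        })
    return rows


log = """Metrics Summary:
Class 0→1: L2=11.69; Sparsity=55.5%; Concept Δ=-1.32
Class 0→5: L2=6.03; Sparsity=22.4%; Concept Δ=-0.09
"""

rows = parse_metrics(log)
closest = min(rows, key=lambda r: r["l2"])  # target with the smallest L2
print(closest["source"], "->", closest["target"], closest["l2"])
```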
Alvarez-Melis & Jaakkola (2018) - Towards Robust Interpretability with Self-Explaining Neural Networks
AmanDaVinci et al. - Self-Explaining Neural Networks: A Review with Extensions

