@SevKod (Contributor) commented Nov 6, 2025

Introduces the ability to capture activations and apply steering on the post-instruction tokens (as described in the appendix of https://arxiv.org/pdf/2406.11717).
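
As a rough illustration of what "post-instruction tokens" means here: in the referenced paper they are the chat-template tokens that follow the user instruction. The sketch below is not sdialog's implementation. It locates those positions in a tokenized prompt and adds a steering vector to the activations at those positions only. The function names, the toy token IDs, and the list-based activations are all hypothetical, and it assumes the template ends the prompt with a fixed token sequence.

```python
from typing import List, Sequence


def post_instruction_positions(prompt_ids: Sequence[int],
                               suffix_ids: Sequence[int]) -> List[int]:
    """Return the indices of the template tokens that trail the instruction.

    Assumption (hypothetical): the chat template ends the prompt with a fixed
    token sequence `suffix_ids` placed right after the user instruction.
    """
    n, k = len(prompt_ids), len(suffix_ids)
    if k == 0 or k > n or list(prompt_ids[n - k:]) != list(suffix_ids):
        raise ValueError("prompt does not end with the expected template suffix")
    return list(range(n - k, n))


def steer(activations: List[List[float]],
          positions: Sequence[int],
          direction: Sequence[float],
          strength: float = 1.0) -> List[List[float]]:
    """Add `strength * direction` to the activation vector at each position."""
    steered = [list(vec) for vec in activations]  # copy, leave the input untouched
    for pos in positions:
        steered[pos] = [a + strength * d for a, d in zip(steered[pos], direction)]
    return steered


# Toy prompt of 5 tokens whose last 2 tokens are the (made-up) template suffix.
prompt_ids = [101, 7, 8, 9001, 9002]
suffix_ids = [9001, 9002]
positions = post_instruction_positions(prompt_ids, suffix_ids)

# One 2-dimensional activation vector per prompt token, all zeros for clarity.
activations = [[0.0, 0.0] for _ in prompt_ids]
steered = steer(activations, positions, direction=[1.0, -1.0], strength=2.0)
```

In a real model the activations would be tensors captured by forward hooks rather than nested lists. The point of the sketch is only that steering is restricted to the trailing template positions.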

codecov bot commented Nov 6, 2025

Codecov Report

❌ Patch coverage is 7.96020% with 185 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.83%. Comparing base (75a2743) to head (3865a98).
⚠️ Report is 181 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/sdialog/interpretability/__init__.py | 7.18% | 168 Missing ⚠️ |
| src/sdialog/agents.py | 15.00% | 17 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
+ Coverage   46.82%   53.83%   +7.01%     
==========================================
  Files          20       34      +14     
  Lines        4171     6133    +1962     
==========================================
+ Hits         1953     3302    +1349     
- Misses       2218     2831     +613     

| Files with missing lines | Coverage Δ |
|---|---|
| src/sdialog/agents.py | 50.28% <15.00%> (-1.79%) ⬇️ |
| src/sdialog/interpretability/__init__.py | 15.57% <7.18%> (-5.91%) ⬇️ |

... and 16 files with indirect coverage changes
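
The percentages in the report can be checked from the line counts it lists. The sketch below recomputes them, assuming coverage is simply hits divided by total lines. The patch total of 201 lines (16 covered) does not appear in the report. It is inferred from the 185 missing lines and the 7.96020% figure. The function name is hypothetical.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, assuming coverage = hits / lines."""
    return 100.0 * hits / lines


# Project coverage on the base and head commits, from the diff block.
base = coverage_percent(1953, 4171)
head = coverage_percent(3302, 6133)

# Patch coverage: 185 lines are reported missing; the total of 201 changed
# lines is an inference, not a number stated in the report.
patch_lines = 201
patch_hits = patch_lines - 185
patch = coverage_percent(patch_hits, patch_lines)

print(f"base={base:.4f}% head={head:.4f}% patch={patch:.5f}%")
```

The head value computes to slightly under 53.84%, which suggests the report truncates rather than rounds to two decimals.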

# Negative index means "from the end of generated tokens", not from the end of all tokens
input_response = self.agent._hooked_responses[self.response_index]['input'][0]
if self.token_index < 0:
    # Convert negative index to positive relative to generated tokens

@sergioburdisso (Member) commented Nov 13, 2025

This block of 3 lines of code can be replaced simply by `activation_index = self.token_index`.
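
The suggestion relies on Python's native negative indexing. Assuming the stored activations cover only the generated tokens (the snippet shown does not confirm this), a negative `token_index` already counts from the end of the generated tokens, so a manual conversion returns the same element. A minimal sketch with hypothetical names and placeholder values:

```python
def manual_lookup(generated, token_index):
    """Convert a negative index to a positive one before indexing."""
    if token_index < 0:
        activation_index = len(generated) + token_index
    else:
        activation_index = token_index
    return generated[activation_index]


def direct_lookup(generated, token_index):
    """Let Python's sequence indexing handle negative values."""
    activation_index = token_index
    return generated[activation_index]


# Placeholder "activations", one per generated token.
generated = ["act_0", "act_1", "act_2", "act_3"]

# Compare both lookups for a mix of negative and non-negative indices.
results = [(i, manual_lookup(generated, i), direct_lookup(generated, i))
           for i in (-1, -3, 0, 2)]
```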

@sergioburdisso sergioburdisso merged commit 45a99d9 into idiap:main Nov 14, 2025
4 checks passed