-
Notifications
You must be signed in to change notification settings - Fork 21
Adding post instruction tokens for steering #103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #103 +/- ##
==========================================
+ Coverage 46.82% 53.83% +7.01%
==========================================
Files 20 34 +14
Lines 4171 6133 +1962
==========================================
+ Hits 1953 3302 +1349
- Misses 2218 2831 +613
🚀 New features to boost your workflow:
|
…i-inspector capabilities
| # Negative index means "from the end of generated tokens", not from the end of all tokens | ||
| input_response = self.agent._hooked_responses[self.response_index]['input'][0] | ||
| if self.token_index < 0: | ||
| # Convert negative index to positive relative to generated tokens |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this block of 3 lines of code can be replaced simply by activation_index = self.token_index
Introducing the ability to get activations and steering on the post-instruction tokens (as described in the appendix of https://arxiv.org/pdf/2406.11717 )