@shahzadhaider1
Contributor

This PR addresses false positive issues in the Twilio detector by making the regex patterns more context-aware and removing an overly generic keyword.

Changes Made

1. Added Context-Aware Regex Patterns

Updated both sidPat and keyPat to require a contextual keyword in the 40 characters preceding the credential:

// Before
sidPat = regexp.MustCompile(`\bAC[0-9a-f]{32}\b`)
keyPat = regexp.MustCompile(`\b[0-9a-f]{32}\b`)

// After
sidPat = regexp.MustCompile(detectors.PrefixRegex([]string{"twilio", "account", "sid"}) + `\b(AC[0-9a-f]{32})\b`)
keyPat = regexp.MustCompile(detectors.PrefixRegex([]string{"twilio", "auth", "token", "key"}) + `\b([0-9a-f]{32})\b`)

Why: The previous keyPat matched any standalone 32-character lowercase hexadecimal string, which is extremely common in codebases (MD5 hashes, UUIDs written without dashes, etc.). By requiring proximity to Twilio-related keywords, we significantly reduce false matches while maintaining detection of legitimate credentials.
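To illustrate the difference, here is a minimal standalone sketch rather than the detector's actual code. `prefixRegex` is a local stand-in that assumes `detectors.PrefixRegex` expands to a case-insensitive keyword alternation followed by a lazy gap of up to 40 characters; the `captured` helper and the sample strings are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// prefixRegex is a hypothetical stand-in for detectors.PrefixRegex.
// Assumed shape: case-insensitive keyword alternation, then up to 40
// arbitrary characters (matched lazily) before the credential.
func prefixRegex(keywords []string) string {
	return `(?i:` + strings.Join(keywords, "|") + `)(?:.|[\n\r]){0,40}?`
}

var (
	// Old pattern: any standalone 32-character lowercase hex string.
	bareKeyPat = regexp.MustCompile(`\b[0-9a-f]{32}\b`)
	// New pattern: the same hex string, but only after a contextual keyword.
	contextKeyPat = regexp.MustCompile(
		prefixRegex([]string{"twilio", "auth", "token", "key"}) + `\b([0-9a-f]{32})\b`)
)

// captured returns capture group 1 of every match of pat in s.
func captured(pat *regexp.Regexp, s string) []string {
	var out []string
	for _, m := range pat.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	checksum := "md5sum: 9e107d9d372bb6826bd81d3542a419d6"       // unrelated MD5 digest
	config := "TWILIO_AUTH_TOKEN=0123456789abcdef0123456789abcdef" // fake token

	fmt.Println(len(bareKeyPat.FindAllString(checksum, -1))) // 1: the bare pattern flags the digest
	fmt.Println(len(captured(contextKeyPat, checksum)))      // 0: no keyword nearby
	fmt.Println(captured(contextKeyPat, config))             // the fake token is still found
}
```

The keyword list includes generic words such as "key" and "token", so a hex string near those words would still match even without "twilio" nearby; the sketch only shows that strings with no keyword context are filtered out.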

2. Removed "sid" from Keywords

// Before
func (s Scanner) Keywords() []string {
    return []string{"sid", "twilio"}
}

// After
func (s Scanner) Keywords() []string {
    return []string{"twilio"}
}

Why: The keyword "sid" is extremely common in code (session IDs, database fields, variable names like user_sid, request_sid, etc.) and was causing the detector to run unnecessarily on a large percentage of scanned files. Since Twilio Account SIDs always start with "AC" and our regex already requires contextual keywords, keeping only "twilio" as the trigger is sufficient and improves performance.
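As a rough illustration of why the keyword list matters, the sketch below models the keyword pre-filter as a plain case-insensitive substring check. `shouldRun` is a hypothetical helper written for this example; the real engine's keyword matching is assumed to behave similarly but is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldRun is a hypothetical, simplified pre-filter: a detector is invoked
// on a chunk only if the chunk contains at least one of its keywords.
func shouldRun(keywords []string, chunk string) bool {
	lower := strings.ToLower(chunk)
	for _, kw := range keywords {
		if strings.Contains(lower, kw) {
			return true
		}
	}
	return false
}

func main() {
	before := []string{"sid", "twilio"}
	after := []string{"twilio"}

	unrelated := "SELECT user_sid FROM sessions WHERE request_sid = ?"
	twilioEnv := "TWILIO_ACCOUNT_SID=AC<redacted>"

	fmt.Println(shouldRun(before, unrelated)) // true: "sid" triggers the detector
	fmt.Println(shouldRun(after, unrelated))  // false: detector is skipped
	fmt.Println(shouldRun(after, twilioEnv))  // true: "twilio" is still present
}
```

One consequence of this design, under the same assumption, is that a chunk containing a credential but no occurrence of "twilio" would no longer reach the detector at all.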

3. Switched to FindAllStringSubmatch

Updated the pattern matching to use FindAllStringSubmatch instead of FindAllString:

// Before
keyMatches := keyPat.FindAllString(dataStr, -1)
sidMatches := sidPat.FindAllString(dataStr, -1)

// After
keyMatches := keyPat.FindAllStringSubmatch(dataStr, -1)
sidMatches := sidPat.FindAllStringSubmatch(dataStr, -1)

for _, sidMatch := range sidMatches {
    sid := sidMatch[1]  // Extract capture group

Why: With the addition of capturing groups in the regex patterns, we need FindAllStringSubmatch to properly extract just the credential values (capture group [1]) without the surrounding context keywords that are used for filtering.
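The difference between the two calls can be seen in a small standalone sketch. The pattern below writes the prefix inline, assuming that is roughly what `detectors.PrefixRegex` produces; `fullMatches`, `sids`, and the sample input are hypothetical names and data used only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// sidPat approximates the PR's SID pattern. The inline prefix is an
// assumption about the output of detectors.PrefixRegex.
var sidPat = regexp.MustCompile(
	`(?i:twilio|account|sid)(?:.|[\n\r]){0,40}?\b(AC[0-9a-f]{32})\b`)

// fullMatches returns whole matches, which include the keyword context.
func fullMatches(s string) []string {
	return sidPat.FindAllString(s, -1)
}

// sids returns only capture group 1, i.e. the SID itself.
func sids(s string) []string {
	var out []string
	for _, m := range sidPat.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	data := `twilio_account_sid = "AC0123456789abcdef0123456789abcdef"` // fake SID

	fmt.Println(fullMatches(data)[0]) // keyword, separator, and SID together
	fmt.Println(sids(data)[0])        // AC0123456789abcdef0123456789abcdef
}
```

With `FindAllString`, the value passed on for verification would carry the leading `twilio_account_sid = "` text, which is why the capture group is needed once the prefix is part of the pattern.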

Impact

  • Reduces false positives: Only matches hex strings that appear near Twilio-related keywords
  • Improves performance: Detector runs less frequently by removing the generic "sid" keyword
  • Maintains detection accuracy: Legitimate Twilio credentials will still be detected as they typically appear with contextual keywords like "twilio_auth_token", "TWILIO_ACCOUNT_SID", etc.

Testing

Verified that the detector still matches valid Twilio credentials in common formats while filtering out unrelated hex strings and reducing unnecessary detector invocations.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?

@shahzadhaider1 shahzadhaider1 requested a review from a team as a code owner October 23, 2025 12:26
@shahzadhaider1 shahzadhaider1 changed the title added contextual keywords in regex and removed unnecessary detector k… Reduce False Positives in Twilio Detector Oct 24, 2025
@shahzadhaider1 shahzadhaider1 requested a review from a team October 24, 2025 11:45
@shahzadhaider1 shahzadhaider1 force-pushed the INS-65-enhance-twilio-detector branch from d48eec3 to 7dfc3d6 Compare October 24, 2025 13:42
Contributor

@camgunz camgunz left a comment

I think your analysis here is correct--the profile indicated that regexp was allocating a lot and the O(N²) combined with a very common pattern is probably why that was happening. Let's merge this and be very careful w/ the rollout--I'll sync up w/ you on Slack
