-
Notifications
You must be signed in to change notification settings - Fork 0
fix(dashboard-components): sequence code X is now accepted as valid when parseAmbiguousSymbols=true #1066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
| const nucleotideChars = 'ACGTRYKMSWBDHVN'; | ||
| const aminoAcidChars = 'ACDEFGHIKLMNPQRSTVWY'; | ||
| // Ambiguous IUPAC symbols (excluded from standard parsing but can be enabled via parseAmbiguousSymbols flag) | ||
| const ambiguousNucleotideChars = 'X'; // Unknown nucleotide |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I think we need all the nucleotideChars that are not ATCG:
| const ambiguousNucleotideChars = 'X'; // Unknown nucleotide | |
| const ambiguousNucleotideChars = 'RYKMSWBDHVN'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I guess we shouldn't have them in nucleotideChars then anymore.
| } | ||
| } | ||
|
|
||
| export function toMutation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method unused?
| expect(SubstitutionClass.parse('gene1:A234X')).to.equal(null); | ||
| expect(SubstitutionClass.parse('gene1:A234X', false, false)).to.equal(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that the desired behavior? I thought that ambiguity codes should always be allowed in the substitutionValue?
| }; | ||
| } | ||
| const maybeSubstitution = SubstitutionClass.parse(code); | ||
| const maybeSubstitution = SubstitutionClass.parse(code, false, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const maybeSubstitution = SubstitutionClass.parse(code, false, true); | |
| const maybeSubstitution = SubstitutionClass.parse(code, { segmentIsOptional: false, parseAmbiguousSymbols: true }); |
I don't really like unnamed boolean arguments. Having them named with this "options object workaround" also helps with making the order of optional arguments irrelevant.
resolves #1064
Summary
In some cases we do want X to be rejected, so I opted instead to make it possible to parse with ambiguity. This is used now for the mutations-over-time component, which fixes the problem we had before.
Screenshot
The previously failing plot now working:
Claude code plan
Read the Claude Code plan below for more details on the investigation of the bug as well as finding an appropriate solution.
Details
Bug Investigation: X234T Mutation Validation Error
Problem Summary
Mutations like "X234T" supplied in
includeMutationsare incorrectly rejected as invalid inqueryMutationsOverTime.ts, even though 'X' is a valid IUPAC character representing an unknown nucleotide or amino acid.Data Flow Trace
From includeMutations to Error
Key Files
components/src/query/queryMutationsOverTime.ts:269components/src/utils/mutations.ts:17-18components/src/utils/mutations.ts:82-98Why We're Parsing Mutations
The code parses mutations in two scenarios:
1. User-Supplied Mutations (includeMutations)
In
codeToEmptyEntry()(queryMutationsOverTime.ts:70-78):Purpose: Create empty entry objects (with count: 0) for mutations that users want to display, even if they don't appear in the API response. This ensures the mutations-over-time grid shows these mutations with zero prevalence rather than omitting them entirely.
2. API Response Mutations
In
parseMutationCode()(queryMutationsOverTime.ts:261-270):Purpose: Convert mutation code strings from the API into typed
SubstitutionClassorDeletionClassobjects. These objects provide:Root Cause of Bug
Incomplete IUPAC Character Sets
In
components/src/utils/mutations.ts:17-18:Per IUPAC standards (https://www.bioinformatics.org/sms/iupac.html):
Regex Validation Failure
The regex in
buildSubstitutionRegex()(mutations.ts:28-37):For "X234T":
nucleotideChars→ nucleotide regex failsaminoAcidChars→ amino acid regex failsBroader Impact Analysis
Where Mutation Parsing is Used
The mutation parsing logic (
SubstitutionClass.parse()andDeletionClass.parse()) is used in 8 files:queryMutationsOverTime.ts(lines 60-80, 260-269)codeToEmptyEntry()- Creates empty entries for user-supplied mutationsparseMutationCode()- Primary entry point for parsing API response mutationsqueryWastewaterMutationsOverTime.ts(line 60)transformMutations()- Parses wastewater mutation frequency dataparseAndValidateMutation.ts(lines 35-76)mutation-comparison-venn.tsx(line 147)getMutationsGridData.ts(lines 3-7)basesexport to initialize grid columnsmutations-grid.tsx(line 46)basesarray7-8. Test files:
mutations.spec.ts,queryWastewaterMutationsOverTime.spec.ts, etc.Documentation Inconsistency Found
Critical finding: The documentation explicitly states that 'X' is supported:
From
mutation-filter-info.tsx:140-142:However, the implementation does not support 'X' - this is an inconsistency between docs and code.
Is the Exclusion Intentional?
Evidence suggests it may be intentional for domain-specific reasons:
Coverage Calculations (queryMutationsOverTime.ts:226-228):
// 'coverage' in the API resp. is the number of seqs. that have a non-ambiguous symbol at positionThe system explicitly distinguishes between ambiguous and non-ambiguous symbols for statistical calculations.
Proportion Calculations (mutation-info.tsx):
Y(meaning T or C) are excluded from proportion calculationsUI Display (
basesarray in mutations.ts:262-287):The UI grid only shows concrete bases, not ambiguous symbols.
LAPIS Backend Design:
Potential Impact if 'X' Were Added
Critical dependencies:
basesarray would need updating (creates new grid column)Recent Related Changes
Stop Codon Support (PR #987, Sept 2025):
*(stop codon) support to character setsFiles That Would Need Changes
If 'X' support were to be added:
mutations.ts:17-18- Add 'X' to character setsmutations.ts:262-287- Add 'X' tobasesarray (creates UI column)mutations.spec.ts- Add test cases for 'X' parsinggetMutationsGridData.ts- Verify grid initialization handles 'X'Recommended Solution
Add an optional
parseAmbiguousSymbolsparameter to the parsing logic that allows ambiguous symbols (like 'X') to be parsed when explicitly enabled.Design Approach
Key Principle: Maintain backward compatibility by defaulting to current behavior (rejecting ambiguous symbols), but allow specific use cases to opt-in.
Benefits:
includeMutationsto accept user-specified ambiguous mutationsImplementation Plan
1. Update Character Sets (
mutations.ts:16-18)Add ambiguous character definitions:
2. Update Regex Builders (
mutations.ts:28-38, 101-108, 163-172)Modify
buildSubstitutionRegex,buildDeletionRegex, andbuildInsertionRegexto accept a parameter:Apply similar changes to
buildDeletionRegexandbuildInsertionRegex.3. Update Regex Initialization (
mutations.ts:40-42, 110-111, 174-175)Create two sets of regexes:
Do the same for deletion and insertion regexes.
4. Update Parse Methods (
mutations.ts:82-98)Add
parseAmbiguousSymbolsparameter toSubstitutionClass.parse(),DeletionClass.parse(), andInsertionClass.parse():5. Update Call Sites
Enable for includeMutations (
queryMutationsOverTime.ts:70-78):Keep disabled (default) for all other call sites:
parseMutationCode()in queryMutationsOverTime.ts:260-269transformMutations()in queryWastewaterMutationsOverTime.ts:60parseAndValidateMutation()in parseAndValidateMutation.ts:35-766. Add Tests (
mutations.spec.ts)Add test cases for ambiguous symbol parsing:
Add similar tests for
DeletionClassandInsertionClass.7. Add Integration Tests (
queryMutationsOverTime.spec.ts)Test that includeMutations works with 'X':
8. Update Documentation
Update
mutation-filter-info.tsxto clarify when 'X' is supported:Add code comments in
mutations.ts:Files to Modify
components/src/utils/mutations.tsbuildSubstitutionRegexsignature and implementationSubstitutionClass.parse()signaturecomponents/src/query/queryMutationsOverTime.tscodeToEmptyEntry()to passparseAmbiguousSymbols=truecomponents/src/utils/mutations.spec.tscomponents/src/query/queryMutationsOverTime.spec.tsSummary
This solution:
PR Checklist