-
Notifications
You must be signed in to change notification settings - Fork 1.8k
SSA: Add a shared signature for SSA and a module to implement it. #20646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a shared signature for SSA (Static Single Assignment) and a module to implement it, as a step towards a unified SSA API. The change enables code sharing across languages by providing standardized signatures and implementations that eliminate the need for individual languages to cache SSA predicates and add wrapper classes.
Key changes:
- Added a comprehensive
SsaSigsignature defining the complete SSA class hierarchy and predicates - Introduced a
MakeSsamodule that implements theSsaSigsignature using existing SSA infrastructure - Updated the
InputSigto use generic type signatures instead of specificBasicBlockSig
caaaf36 to
b196714
Compare
hvitved
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, some comments.
| * For SSA definitions occurring at the beginning of a basic block, such as | ||
| * phi definitions, this will get the first control flow node of the basic block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that is the case for existing languages. Would it be problematic for this predicate to not return a value for phi nodes (as well as SsaImplicitEntryDefinitions)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that this is a change for several languages, but Java depends on this in a couple of places, and I actually also made the change independently for C# over in #20558, since I needed it for nullness.
And in a vacuum I think it does make sense to always return a value - not returning a value for phis and entries present a bit of a gotcha, I think.
shared/ssa/codeql/ssa/Ssa.qll
Outdated
| * variable would be visible. | ||
| */ | ||
| class SsaPhiDefinition extends SsaDefinition { | ||
| /** Holds if `inp` is an input to the phi definition along the edge originating in `bb`. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the phi -> this phi
| * point in the flow graph where otherwise two or more definitions for the | ||
| * variable would be visible. | ||
| */ | ||
| class SsaPhiDefinition extends SsaDefinition { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest copying the example from
codeql/ruby/ql/lib/codeql/ruby/dataflow/SSA.qll
Lines 370 to 381 in 7b32cd4
| * A phi node. For example, in | |
| * | |
| * ```rb | |
| * if b | |
| * x = 0 | |
| * else | |
| * x = 1 | |
| * end | |
| * puts x | |
| * ``` | |
| * | |
| * a phi node for `x` is inserted just before the call `puts x`. |
shared/ssa/codeql/ssa/Ssa.qll
Outdated
| /** Holds if `inp` is an input to the phi definition along the edge originating in `bb`. */ | ||
| predicate hasInputFromBlock(SsaDefinition inp, BasicBlock bb); | ||
|
|
||
| /** Gets an input of this phi definition. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy example from
| final Definition getAnInput() { this.hasInputFromBlock(result, _) } |
shared/ssa/codeql/ssa/Ssa.qll
Outdated
| */ | ||
| Expr getValue(); | ||
|
|
||
| /** Holds if this write is an initialization of a parameter. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a parameter -> parameter p.
shared/ssa/codeql/ssa/Ssa.qll
Outdated
| * pseudo-read of the captured variable at the point of capture. | ||
| */ | ||
| predicate variableCapture( | ||
| SourceVariable captured, SourceVariable closureVar, BasicBlock bb, int i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that in C# and Java, we tag local variables with a callable to account for captures, but I forgot why that is needed? In both Ruby and Rust, we simply have SourceVariable = LocalVariable, and that works perfectly fine, but would that work with the signature above (i.e., captured = closureVar)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I seem to recall that it was necessary - at least at some point, but I also forgot exactly why. Looking at the shared SSA lib, I don't think it's necessary there. It might be necessary (or perhaps just convenient) in the language-specific portions for C# and Java.
This signature is indeed an example where it would break if SourceVariables were not tied to a callable - we could fix it by adding a column with e.g. the first BasicBlock of the closure to implicitly identify the callable. Do you think we should do that here? Otherwise we'll indeed require tagging if this is to be implemented with anything other than none().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was also thinking if we should replace closureVar with the callable representing the closure or, as you suggest, its entry basic block.
| * phi nodes, this will get the first control flow node of the basic block. | ||
| */ | ||
| ControlFlowNode getControlFlowNode() { | ||
| exists(BasicBlock bb, int i | this.definesAt(_, bb, i) | result = bb.getNode(0.maximum(i))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I'm not convinced that we want 0.maximum(i).
shared/ssa/codeql/ssa/Ssa.qll
Outdated
| */ | ||
| class SsaWriteDefinition extends SsaDefinition instanceof WriteDefinition { } | ||
|
|
||
| private predicate explicitWrite(WriteDefinition def, VariableWrite write) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would cache this.
shared/ssa/codeql/ssa/Ssa.qll
Outdated
| Expr getValue() { result = this.getDefinition().getValue() } | ||
| } | ||
|
|
||
| private predicate parameterInit(WriteDefinition def, Parameter p) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would cache this.
shared/ssa/codeql/ssa/Ssa.qll
Outdated
| SsaImplicitEntryDefinition() { this.definesAt(_, any(EntryBasicBlock bb), -1) } | ||
|
|
||
| /** Holds if this is a closure definition that captures the value of `capturedvar`. */ | ||
| predicate captures(SsaDefinition capturedvar) { captures(this, capturedvar) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry that this predicate may give the wrong impression that this definition will always read the same value as capturedvar (I suspect it you included it, because that is actually the case in Java?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit back and forth on this. Firstly, I think it's useful to be able to get to the values of the outer scope in ad-hoc def-use chains like getAnUltimateDefinition, and Java certainly makes good use of it, even though it's technically a bit wrong for Kotlin (Kotlin allows post-capture modifications IIRC).
Another option would be to extend it to get all the possible inputs from the outer scope in case the variable is modified post-capture - that way the Java case would stay the same and other languages would see potentially multiple origins for a capture init - a bit like a phi node.
x = 0
f = () => { .. x .. } // currently sees x=0, could be extended to also see x=1
x = 1
What do you think? OTOH maybe that's silly, if we don't also include the following
x = 0
f () => { .. x .. } // can see x=0, but what about x=2 from g?
g () => { .. x=2; .. }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C# used to have rather complex logic for doing this kind of thing, but I got rid of it when switching over to the shared capture library. I think it is hard to make a good analysis for this kind of thing, without ultimately resolving to saying that all other SSA definitions could potentially be a source, which would make it useless.
So, if you still want it for Java, I think I would prefer to have that a Java-specific thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, then I think I'll just delete it from the shared class.
Unused so far, but this is intended as a step towards a unified SSA api, and a way to share more code by removing the need for individual languages to cache the SSA predicates and add wrapper classes.