Configurable and extended call and function parameter handling #52
base: master
Preliminary support for marking parameters of arbitrary functions static. The added interface enables the known-ness of parameters of arbitrary functions to be specified. The implementation builds on the memory range functionality and generalizes the function configuration interface such that the handling of rewriting entry functions and all other functions is unified.
Add configuration options specifying that a function should not be entered by the emulator. The function call can either be passed through (static parameter values are loaded before the call) or executed, optionally marking its return value as static.
Useful for e.g. specialized handling of libc functions. Currently only implemented for memcpy. Refactor call instruction handling (move bypassEmu stuff into a separate function).
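As a rough illustration of the configuration described above, the following sketch models a per-function record with a bitmask of static parameters and flags that control whether the emulator enters the function. All names here (`FnConfig`, `FN_BYPASS_EMU`, `FN_KEEP_CALL`, `FN_RETURN_STATIC`, `fn_set_par_static`, `fn_call_action`) are hypothetical and chosen for illustration only; they are not DBrew's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags; these are NOT DBrew's real identifiers. */
enum {
    FN_BYPASS_EMU    = 1 << 0, /* emulator must not enter the function */
    FN_KEEP_CALL     = 1 << 1, /* pass the call through to generated code */
    FN_RETURN_STATIC = 1 << 2  /* treat the return value as known */
};

typedef struct {
    const char *name;
    uint64_t    addr;        /* entry address of the function */
    uint32_t    static_pars; /* bit i set => parameter i+1 is static */
    uint32_t    flags;
} FnConfig;

/* Mark the 1-based parameter `par` as static (known at rewrite time). */
void fn_set_par_static(FnConfig *fc, unsigned par)
{
    assert(par >= 1 && par <= 32);
    fc->static_pars |= UINT32_C(1) << (par - 1);
}

bool fn_par_is_static(const FnConfig *fc, unsigned par)
{
    assert(par >= 1 && par <= 32);
    return (fc->static_pars >> (par - 1)) & 1u;
}

/* What the rewriter does when it reaches a call to this function. */
typedef enum { CALL_ENTER, CALL_PASS_THROUGH, CALL_EXECUTE } CallAction;

CallAction fn_call_action(const FnConfig *fc)
{
    if (!(fc->flags & FN_BYPASS_EMU))
        return CALL_ENTER;          /* default: emulate the callee */
    return (fc->flags & FN_KEEP_CALL) ? CALL_PASS_THROUGH : CALL_EXECUTE;
}
```

With one record per function, entry functions and ordinary callees can share the same lookup, which is the unification the description refers to.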
|
Some comments (not just about this MR but partly more general):
|
On 13/05/17 06:20, aengelke wrote:
> Some comments (not just about this MR but partly more general):
> - I'm worried about calling an arbitrary function at emulation time,
>   this can have unintended effects including memory access (we should
>   maintain a hashmap for emulating memory access, s.t. the real memory
>   is never modified) and will "fail" (crash) for functions like
>   `call $; pop rax; ...`. (Yes, probably no compiler will emit this and
>   it is the user's fault if the flag is enabled, but we can't be sure.)
Yes, of course it's dangerous, but I also think there are very valid
use cases for doing special handling of functions which simply provide
input to the specialization process in this manner. It's invaluable for
the MPI code, for instance. The code currently ensures that the state of
input parameters to the function matches the input parameter map defined
for the function. From there, it's up to the user to ensure that calling
the function during emulation is safe as we have no generic way to, as
you say, ensure that arbitrary functions are referentially transparent.
A safer way to implement this would be to perform a non-capturing
emulation where all memory reads return static values and memory writes
are tracked. I absolutely think we should do this.
My devel branch will never perform an arbitrary memory read or write
unless its address a) is on the stack, b) points to a defined static
memory region, c) was calculated from a stack relative offset or d)
calculated relative to the instruction pointer. Furthermore, it tracks
mutable memory as part of the EmuState which was required to generate
correct code in some cases.
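
The access rule (a) to (d) above could be expressed as a single predicate. This is only a sketch under assumptions: the `Range` and `AddrOrigin` types and the `access_allowed` function are hypothetical names, and how the emulator actually records the origin of an address is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t start, end; } Range; /* half-open [start, end) */

/* How the emulator derived an address (hypothetical classification). */
typedef enum {
    ORIGIN_UNKNOWN,
    ORIGIN_STACK_RELATIVE,
    ORIGIN_IP_RELATIVE
} AddrOrigin;

/* True if [addr, addr+len) lies inside r; written to avoid overflow. */
static bool in_range(const Range *r, uint64_t addr, uint64_t len)
{
    return addr >= r->start && addr <= r->end && len <= r->end - addr;
}

bool access_allowed(uint64_t addr, uint64_t len, AddrOrigin origin,
                    const Range *stack,
                    const Range *static_regions, size_t n_regions)
{
    assert(stack != NULL);
    if (in_range(stack, addr, len))                      /* (a) on the stack */
        return true;
    for (size_t i = 0; i < n_regions; i++)               /* (b) static region */
        if (in_range(&static_regions[i], addr, len))
            return true;
    return origin == ORIGIN_STACK_RELATIVE               /* (c) */
        || origin == ORIGIN_IP_RELATIVE;                 /* (d) */
}
```

Any access that fails this check would have to be treated as unknown rather than performed on real memory.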
We should probably take it all the way and, as you suggest, implement a
copy-on-write scheme which prevents the actual memory from being
modified. I opted not to do this as it required more work without
contributing to the results.
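
For reference, a copy-on-write layer of the kind discussed here could look roughly like the sketch below: emulated stores go into a small hash table keyed by address, and emulated loads consult the table before falling back to real memory. It is byte-granular and fixed-capacity to keep the example short, and every name in it (`ShadowMem`, `shadow_write8`, `shadow_read8`) is hypothetical rather than part of DBrew.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SHADOW_SLOTS 1024 /* power of two; fixed capacity for this sketch */

typedef struct {
    uintptr_t addr[SHADOW_SLOTS];
    uint8_t   val[SHADOW_SLOTS];
    bool      used[SHADOW_SLOTS];
    size_t    count;
} ShadowMem;

/* Open addressing with linear probing; returns the slot holding `a`
 * or the first free slot on its probe sequence. */
static size_t shadow_slot(const ShadowMem *m, uintptr_t a)
{
    uint64_t h = (uint64_t)a * UINT64_C(0x9E3779B97F4A7C15);
    size_t i = (size_t)(h >> 32) & (SHADOW_SLOTS - 1);
    while (m->used[i] && m->addr[i] != a)
        i = (i + 1) & (SHADOW_SLOTS - 1);
    return i;
}

void shadow_init(ShadowMem *m) { memset(m, 0, sizeof *m); }

/* Emulated store: recorded in the overlay, real memory is untouched. */
void shadow_write8(ShadowMem *m, uint8_t *p, uint8_t v)
{
    size_t i = shadow_slot(m, (uintptr_t)p);
    if (!m->used[i]) {
        /* Keep one slot free so probing always terminates. */
        assert(m->count < SHADOW_SLOTS - 1);
        m->used[i] = true;
        m->addr[i] = (uintptr_t)p;
        m->count++;
    }
    m->val[i] = v;
}

/* Emulated load: overlay value if present, otherwise real memory. */
uint8_t shadow_read8(const ShadowMem *m, const uint8_t *p)
{
    size_t i = shadow_slot(m, (uintptr_t)p);
    return m->used[i] ? m->val[i] : *p;
}
```

Discarding the table after rewriting leaves the program's memory exactly as it was before the emulation started.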
> - There are (currently?) three pools of parameters: GP regs, SSE regs, stack. These should be handled correctly. (Ideally, we would model the stack as another STATIC2 memory region, but I guess no one will ever implement this.)
Yes they should. GP and (partially) stack params are currently handled.
Never got around to implementing SSE pars. I guess it'll be done when
somebody finds a need for it.
> - What is the purpose of `FC_SetReturnDynamic`?
The compiled MPI DDT copying code depends on the return value of memcpy
(i.e. the rax register) even though the source code does not. When the
memcpy call is kept (i.e. not inlined) we cannot generate code which
depends on a static value of rax. Therefore this flag.
> - I'm still not too convinced about handling libc functions with pseudo-instructions.
We've discussed this already. Feel free to provide an alternative
implementation :) The current approach provides the flexibility of
allowing the user to say that a function is semantically identical to a
standard library function and should be handled as such.
> - The build in Travis fails (unused function, missing field `returnOrigOnFail`).
Yes. I'll get around to fixing the PRs. I missed a commit, I suppose.
|
Btw, usage examples of these new config interfaces are found in https://github.com/truls/dbrew-ddtbench/blob/master/src_c/utilities.c
|
Am 13.05.2017 um 18:03 schrieb Truls Asheim:
On 13/05/17 06:20, aengelke wrote:
> - I'm worried about calling an arbitrary function at emulation time,
> ...
Yes, of course it's dangerous, but I also think there are very valid
use cases
I did not yet check the patch. Is this about calling an arbitrary
function in a native way? Then I am with Alexis, this never should be
done.
The current capturing process already can go into every arbitrary
function. So what is different? Do we want a config that says that
when we go into some function, all memory reads should be assumed
to be static, and memory writes can be ignored (or only need to
be tracked during processing of this function).
If all function parameters are static, too, no capturing should be
done anyways, and we only get a static return value back, or?
A safer way to implement this would be to perform a non-capturing
emulation where all memory reads return static values and memory writes
are tracked. I absolutely think we should do this.
Of course this special mode can be implemented, but I am yet not sure
if we actually need it.
My devel branch will never perform an arbitrary memory read or write
unless its address a) is on the stack, b) points to a defined static
memory region, c) was calculated from a stack relative offset or d)
calculated relative to the instruction pointer. Furthermore, it tracks
mutable memory as part of the EmuState which was required to generate
correct code in some cases.
Regarding this mutable memory: who decides/specifies which part
of memory should be tracked?
We should probably take it all the way and, as you suggest, implement a
copy-on-write scheme which prevents the actual memory from being
modified. I opted not to do this as it required more work without
contributing to the results.
Writing to memory is a side effect, and the pure rewriting process of
dbrew never should have a side effect, yes. But what do you get by
tracking memory writes during rewriting, if it is not captured? The
original code did a write, and any rewriting should also do the write,
to get this side effect. Otherwise the result is not the same. The
stack is special here, as it becomes invalid if the function returns.
So, stack writes into the current frame can be seen as having no side
effect...
> - There are (currently?) three pools of parameters: GP regs, SSE regs, stack. These should be handled correctly. (Ideally, we would model the stack as another STATIC2 memory region, but I guess no one will ever implement this.)
How would this be possible? The stack can be written to with non-static
data...
Yes they should. GP and (partially) stack params are currently handled.
Never got around to implementing SSE pars. I guess it'll be done when
somebody finds a need for it.
Yes, at some point, the vector register state should be tracked, too.
A general comment: Can we have (Github) issues for features that should be
implemented at some point? They can link to this discussion.
|
I meant that any fixed parameter region on the stack should be marked as STATIC2, not the whole stack. LLVM has attributes for
|
On 14/05/17 04:08, weidendo wrote:
Am 13.05.2017 um 18:03 schrieb Truls Asheim:
> On 13/05/17 06:20, aengelke wrote:
>> - I'm worried about calling an arbitrary function at emulation time,
>> ...
> Yes, of course it's dangerous, but I also think there are very valid
> use cases
I did not yet check the patch. Is this about calling an arbitrary
function in a native way? Then I am with Alexis, this never should be
done.
I agree, it's nicer to handle it through the emulator (as discussed
below). Executing functions natively was a quick hack, if you will, that
was less work than modifying the emulator. The way I'm using it
currently, though, is perfectly safe.
The current capturing process already can go into every arbitrary
function. So what is different? Do we want a config that says that
when we go into some function, all memory reads should be assumed
to be static, and memory writes can be ignored (or only need to
be tracked during processing of this function).
Yes, we want that, although we cannot always simply ignore memory writes.
Code may depend on being able to modify its memory. We should instead
handle memory writes through a copy-on-write layer such that we can
prevent the rewriting process from modifying general program memory.
If all function parameters are static, too, no capturing should be
done anyways, and we only get a static return value back, or?
Currently some "noise" (function pro- and epilogues, stack adjustment,
etc...) is always captured when the emulator enters a function. But
yes, ideally a function where everything is static should not cause
anything to be captured, but that's a separate issue.
The problem solved by this emulation bypass mode arises when the return
value of a function depends on memory that was written before the
rewriting process started and which we have no way of marking static in
DBrew. In that case we need a way to get its return value into DBrew.
There are more details about this in the report.
> A safer way to implement this would be to perform a non-capturing
> emulation where all memory reads return static values and memory writes
> are tracked. I absolutely think we should do this.
Of course this special mode can be implemented, but I am yet not sure
if we actually need it.
Then suggest an alternative way to handle, for example, the
MPIR_ToPointer function that works equally well.
> My devel branch will never perform an arbitrary memory read or write
> unless its address a) is on the stack, b) points to a defined static
> memory region, c) was calculated from a stack relative offset or d)
> calculated relative to the instruction pointer. Furthermore, it tracks
> mutable memory as part of the EmuState which was required to generate
> correct code in some cases.
Regarding this mutable memory: who decides/specifies which part
of memory should be tracked?
The user. All defined, mutable memory areas are tracked. "Tracked" may
be the wrong word here. It simply means that defined mutable memory
areas are treated as a part of the EmuState and saved and restored as
the emulation progresses in the same way as, e.g., the stack.
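
To make "saved and restored" concrete, here is a minimal sketch of snapshotting a user-declared mutable region alongside the rest of the emulator state. The `MutableRegion` type and the `region_save`, `region_restore` and `region_free` functions are hypothetical; the actual EmuState layout in the devel branch is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t *base;  /* start of a user-declared mutable region */
    size_t   len;   /* size of the region in bytes */
    uint8_t *saved; /* snapshot buffer, owned by this struct */
} MutableRegion;

/* Save the region's current contents, e.g. before emulating one path.
 * Returns 0 on success and -1 if the snapshot buffer cannot be allocated. */
int region_save(MutableRegion *r)
{
    uint8_t *buf = realloc(r->saved, r->len ? r->len : 1);
    if (!buf)
        return -1;
    r->saved = buf;
    memcpy(r->saved, r->base, r->len);
    return 0;
}

/* Restore the contents recorded by the last successful region_save(). */
void region_restore(const MutableRegion *r)
{
    assert(r->saved != NULL);
    memcpy(r->base, r->saved, r->len);
}

void region_free(MutableRegion *r)
{
    free(r->saved);
    r->saved = NULL;
}
```

An emulator that explores several paths could call `region_save` when it stores a state and `region_restore` when it resumes from that state, in the same way it would handle its copy of the stack.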
> We should probably take it all the way and, as you suggest, implement a
> copy-on-write scheme which prevents the actual memory from being
> modified. I opted not to do this as it required more work without
> contributing to the results.
Writing to memory is a side effect, and the pure rewriting process of
dbrew never should have a side effect, yes. But what do you get by
tracking memory writes during rewriting, if it is not captured? The
original code did a write, and any rewriting should also do the write,
to get this side effect. Otherwise the result is not the same. The
stack is special here, as it becomes invalid if the function returns.
So, stack writes into the current frame can be seen as having no side
effect...
This is about ensuring that the specialization process does not change
any memory. For example, if we rewrite a function and then subsequently
execute the rewritten function, it should see the world exactly as if
the rewriting hadn't taken place. That is, the rewriting process should,
in effect, not change any memory that isn't local to the functions being
rewritten. This is not about changing how writes to non-static memory
are captured, so these considerations only apply to the proposed
non-capturing all-static-memory emulation mode.
A general comment: Can we have (Github) issues for features that should be
implemented at some point? They can link to this discussion.
Yes, that would be nice.
|
This PR contains support for:
These features have a lot of cross-dependencies. Thus the combined PR.