Configurable and extended call and function parameter handling #52

truls · 2017-05-12T14:13:46Z

This PR contains support for:

- More than 6 function parameters
- Changing function parameter state from dynamic to static upon function invocation (I'm not sure this should be included)
- A new and more flexible API for specifying the static-ness and role of function parameters (I'm not sure I like my actual implementation, for several reasons)
- Special call handling, including executing a function outside of DBrew and getting its return value in DBrew, and keeping a function call unchanged

These features have a lot of cross-dependencies, hence the combined PR.

Preliminary support for marking parameters of arbitrary functions static. The added interface enables the known-ness of parameters of arbitrary functions to be specified. The implementation builds on the memory range functionality and generalizes the function configuration interface such that the handling of entry functions being rewritten and of all other functions is unified.
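A minimal sketch of how per-parameter known-ness could be packed into a single map value, in the spirit of the "param map macros" mentioned in the commits. All names here (`ParState`, `ParMap`, `parmap_set`, `parmap_get`) are hypothetical and chosen for illustration; they are not DBrew's actual interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not DBrew's real API. */
typedef enum {
    PAR_DYNAMIC    = 0,  /* value unknown at rewrite time */
    PAR_STATIC     = 1,  /* value known at rewrite time */
    PAR_STATIC_PTR = 2   /* pointer to memory assumed to be known */
} ParState;

#define PAR_BITS 2u
#define PAR_MAX  (64u / PAR_BITS)  /* 32 parameters fit in one 64-bit map */

typedef uint64_t ParMap;

/* Return a copy of m with the state of parameter idx replaced by s. */
static ParMap parmap_set(ParMap m, unsigned idx, ParState s)
{
    assert(idx < PAR_MAX);
    unsigned shift = idx * PAR_BITS;
    m &= ~((uint64_t)3 << shift);       /* clear the old 2-bit field */
    return m | ((uint64_t)s << shift);  /* store the new state */
}

/* Read the state of parameter idx. */
static ParState parmap_get(ParMap m, unsigned idx)
{
    assert(idx < PAR_MAX);
    return (ParState)((m >> (idx * PAR_BITS)) & 3u);
}
```

Shifting by a computed field width instead of writing binary literals keeps the encoding readable and makes it independent of the six-register limit.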

Add configuration options specifying that a function should not be entered by the emulator. The function call can either be passed through (static parameter values are loaded before the call) or executed, optionally marking its return value as static.

Useful e.g. for specialized handling of libc functions. Currently only implemented for memcpy. Refactor call instruction handling (move the bypassEmu code into a separate function).

aengelke · 2017-05-13T13:20:43Z

Some comments (not just about this MR but partly more general):

- I'm worried about calling an arbitrary function at emulation time. This can have unintended effects, including memory access (we should maintain a hashmap for emulating memory access, such that the real memory is never modified), and it will "fail" (crash) for functions like `call $; pop rax; ...`. (Yes, probably no compiler will emit this and it is the user's fault if the flag is enabled, but we can't be sure.)
- There are (currently?) three pools of parameters: GP regs, SSE regs, stack. These should be handled correctly. (Ideally, we would model the stack as another STATIC2 memory region, but I guess no one will ever implement this.)
- What is the purpose of `FC_SetReturnDynamic`?
- I'm still not too convinced about handling libc functions with pseudo-instructions.
- The build in Travis fails (unused function, missing field `returnOrigOnFail`).
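For reference, the three parameter pools follow from the System V AMD64 calling convention: up to six integer or pointer arguments go in `rdi, rsi, rdx, rcx, r8, r9`, up to eight floating-point arguments go in `xmm0` to `xmm7`, and the rest go on the stack. The sketch below assigns each argument to a pool under a deliberately simplified model (scalar arguments only; no structs, `long double`, or varargs). The type and function names are illustrative, not taken from DBrew.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARG_INT, ARG_FLOAT } ArgClass;       /* scalar classes only */
typedef enum { POOL_GP, POOL_SSE, POOL_STACK } Pool;
typedef struct { Pool pool; unsigned index; } ArgLoc;

/* Integer argument registers in System V AMD64 order. */
static const char *const gpArgRegs[6] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/* Assign each of the n arguments to a pool and an index within that pool.
 * Stack indices count 8-byte slots in argument order. */
static void assign_args(const ArgClass *cls, size_t n, ArgLoc *out)
{
    unsigned gp = 0, sse = 0, stack = 0;
    assert(cls != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cls[i] == ARG_INT && gp < 6)
            out[i] = (ArgLoc){ POOL_GP, gp++ };
        else if (cls[i] == ARG_FLOAT && sse < 8)
            out[i] = (ArgLoc){ POOL_SSE, sse++ };
        else
            out[i] = (ArgLoc){ POOL_STACK, stack++ };
    }
}
```

With eight integer arguments and one floating-point argument, the seventh and eighth integers land in stack slots 0 and 1, which is the case the "more than 6 parameters" support has to cover.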

truls · 2017-05-13T16:03:02Z

On 13/05/17 06:20, aengelke wrote:

> Some comments (not just about this MR but partly more general):
>
> - I'm worried about calling an arbitrary function at emulation time, this can have unintended effects including memory access (we should maintain a hashmap for emulating memory access, s.t. the real memory is never modified) and will "fail" (crash) for functions like `call $; pop rax; ...`. (Yes, probably no compiler will emit this and it is the user's fault if the flag is enabled, but we can't be sure.)

Yes, of course it's dangerous, but I also think there are very valid use cases for special handling of functions which simply provide input to the specialization process in this manner. It's invaluable for the MPI code, for instance. The code currently ensures that the state of the input parameters to the function matches the input parameter map defined for the function. From there, it's up to the user to ensure that calling the function during emulation is safe, since we have no generic way to, as you say, ensure that arbitrary functions are referentially transparent.

A safer way to implement this would be to perform a non-capturing emulation where all memory reads return static values and memory writes are tracked. I absolutely think we should do this. My devel branch will never perform an arbitrary memory read or write unless its address a) is on the stack, b) points to a defined static memory region, c) was calculated from a stack-relative offset, or d) was calculated relative to the instruction pointer. Furthermore, it tracks mutable memory as part of the EmuState, which was required to generate correct code in some cases. We should probably take it all the way and, as you suggest, implement a copy-on-write scheme which prevents the actual memory from being modified. I opted not to do this as it required more work without contributing to the results.
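The copy-on-write idea discussed here can be sketched as a small overlay: writes made during emulation go into a hash table keyed by address, and reads consult the table before falling back to real memory. This is a minimal illustration with a fixed capacity and byte granularity; the `Shadow` type and its functions are hypothetical and do not exist in DBrew.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SHADOW_SLOTS 1024u  /* power of two; small fixed capacity for the sketch */

typedef struct {
    uintptr_t addr[SHADOW_SLOTS];
    uint8_t   val[SHADOW_SLOTS];
    uint8_t   used[SHADOW_SLOTS];
    unsigned  count;
} Shadow;

static void shadow_init(Shadow *s) { memset(s, 0, sizeof *s); }

/* Linear probing: return the slot holding address a, or the free slot
 * where it would be inserted. */
static unsigned shadow_slot(const Shadow *s, uintptr_t a)
{
    unsigned i = (unsigned)(((uint64_t)a * 0x9E3779B97F4A7C15ull) >> 32)
                 & (SHADOW_SLOTS - 1);
    while (s->used[i] && s->addr[i] != a)
        i = (i + 1) & (SHADOW_SLOTS - 1);
    return i;
}

/* Record a byte write in the overlay; real memory is left untouched. */
static void shadow_write(Shadow *s, void *p, uint8_t v)
{
    uintptr_t a = (uintptr_t)p;
    unsigned i = shadow_slot(s, a);
    if (!s->used[i]) {
        assert(s->count < SHADOW_SLOTS - 1);  /* keep one slot free so probing ends */
        s->used[i] = 1;
        s->addr[i] = a;
        s->count++;
    }
    s->val[i] = v;
}

/* Read a byte: the overlay wins, otherwise fall through to real memory. */
static uint8_t shadow_read(const Shadow *s, const void *p)
{
    unsigned i = shadow_slot(s, (uintptr_t)p);
    return s->used[i] ? s->val[i] : *(const uint8_t *)p;
}
```

Discarding the overlay after rewriting leaves the program's memory exactly as it was, which is the property the thread asks for.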

> - There are (currently?) three pools of parameters: GP regs, SSE regs, stack. These should be handled correctly. (Ideally, we would model the stack as another STATIC2 memory region, but I guess no one will ever implement this.)

Yes, they should. GP and (partially) stack parameters are currently handled. I never got around to implementing SSE parameters. I guess it'll be done when somebody finds a need for it.

> - What is the purpose of `FC_SetReturnDynamic`?

The compiled MPI DDT copying code depends on the return value of memcpy (i.e. the rax register) even though the source code does not. When the memcpy call is kept (i.e. not inlined), we cannot generate code which depends on a static value of rax. Hence this flag.
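To illustrate how compiled code can depend on memcpy's return value when the source does not: `memcpy` returns its destination pointer, which on x86-64 System V arrives in `rax`. In the sketch below, a compiler may satisfy `return dst` by reusing the `rax` left behind by the call instead of keeping `dst` in another register. The function `copy_block` is a made-up example, not code from the MPI library.

```c
#include <assert.h>
#include <string.h>

/* The source ignores memcpy's return value, but since memcpy returns dst,
 * a compiler may return the call's rax here rather than preserving dst. */
static char *copy_block(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
    return dst;
}
```

If the rewriter keeps the call but treats `rax` as a known constant afterwards, code generated from such a function would be wrong, which is why the return value has to be markable as dynamic.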

> - I'm still not too convinced about handling libc functions with pseudo-instructions.

We've discussed this already. Feel free to provide an alternative implementation :) The current approach provides the flexibility of allowing the user to say that a function is semantically identical to a standard library function and should be handled as such.

> - The build in Travis fails (unused function, missing field `returnOrigOnFail`).

Yes. I'll get around to fixing the PRs. I missed a commit, I suppose.

…


truls · 2017-05-13T16:15:51Z

Btw, usage examples of these new config interfaces are found in https://github.com/truls/dbrew-ddtbench/blob/master/src_c/utilities.c

weidendo · 2017-05-14T11:08:43Z

On 13.05.2017 at 18:03, Truls Asheim wrote:

> On 13/05/17 06:20, aengelke wrote:
>> - I'm worried about calling an arbitrary function at emulation time,
>> ...
>
> Yes, of course it's dangerous, but I also think there are very valid usecases

I have not checked the patch yet. Is this about calling an arbitrary function in a native way? Then I am with Alexis: this should never be done. The current capturing process can already go into any arbitrary function. So what is different? Do we want a config that says that when we go into some function, all memory reads should be assumed to be static, and memory writes can be ignored (or only need to be tracked during processing of this function)? If all function parameters are static too, no capturing should be done anyway, and we only get a static return value back, right?

> A safer way to implement this would be to perform a non-capturing emulation where all memory reads return static values and memory writes are tracked. I absolutely think we should do this.

Of course this special mode can be implemented, but I am not yet sure if we actually need it.

> My devel branch will never perform an arbitrary memory read or write unless its address a) is on the stack, b) points to a defined static memory region, c) was calculated from a stack relative offset or d) calculated relative to the instruction pointer. Furthermore, it tracks mutable memory as part of the EmuState which was required to generate correct code in some cases.

Regarding this mutable memory: who decides/specifies which part of memory should be tracked?

> We should probably take it all the way and, as you suggest, implement a copy-on-write scheme which prevents the actual memory from being modified. I opted not to do this as it required more work without contributing to the results.

Writing to memory is a side effect, and the pure rewriting process of dbrew never should have a side effect, yes. But what do you get by tracking memory writes during rewriting, if it is not captured? The original code did a write, and any rewriting should also do the write, to get this side effect. Otherwise the result is not the same. The stack is special here, as it becomes invalid if the function returns. So, stack writes into the current frame can be seen as having no side effect...

> - There are (currently?) three pools of parameters: GP regs, SSE regs, stack. These should be handled correctly. (Ideally, we would model the stack as another STATIC2 memory region, but I guess no one will ever implement this.)

How would this be possible? The stack can be written to with non-static data...

> Yes they should. GP and (partially) stack params are currently handled. Never got around to implementing SSE pars. I guess it'll be done when somebody finds a need for it.

Yes, at some point, the vector register state should be tracked, too. A general comment: Can we have (Github) issues for features that should be implemented at some point? They can link to this discussion.

aengelke · 2017-05-14T12:48:29Z

> How would this be possible? The stack can be written to with non-static data...

I meant that any fixed parameter region on the stack should be marked as STATIC2, not the whole stack.

LLVM has attributes for readnone, readonly, argmemonly and even speculatable; maybe a scheme like this would be useful?
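A rough sketch of what such an attribute scheme could look like on the configuration side. The flag names mirror the LLVM attributes mentioned above; the enum, the function `can_run_natively`, and the policy it encodes (only run a function at rewrite time if it is declared not to write memory and all its arguments are static) are assumptions made for illustration, not an existing DBrew or LLVM interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-function effect flags, modeled on LLVM attribute names. */
typedef enum {
    FX_NONE         = 0,
    FX_READNONE     = 1 << 0,  /* neither reads nor writes memory */
    FX_READONLY     = 1 << 1,  /* may read memory but never writes it */
    FX_ARGMEMONLY   = 1 << 2,  /* only touches memory reachable from pointer args */
    FX_SPECULATABLE = 1 << 3   /* safe to execute even if the original would not */
} FnEffects;

/* Assumed policy: executing a function natively during rewriting is only
 * acceptable when it is declared write-free and every argument is static. */
static bool can_run_natively(unsigned fx, bool allArgsStatic)
{
    if (!allArgsStatic)
        return false;
    return (fx & (FX_READNONE | FX_READONLY)) != 0;
}
```

A declaration like `FX_READONLY | FX_ARGMEMONLY` would then document exactly which guarantee the user is vouching for, instead of a single "bypass the emulator" switch.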

truls · 2017-05-14T16:48:45Z

On 14/05/17 04:08, weidendo wrote:

> On 13.05.2017 at 18:03, Truls Asheim wrote:
>> On 13/05/17 06:20, aengelke wrote:
>>> - I'm worried about calling an arbitrary function at emulation time,
>>> ...
>>
>> Yes, of course it's dangerous, but I also think there are very valid
>> usecases
>
> I did not yet check the patch. Is this about calling an arbitrary function in a native way? Then I am with Alexis, this never should be done.

I agree, it's nicer to handle it through the emulator (as discussed below). Executing functions natively was a quick hack, if you will, that was less work than modifying the emulator. The way I'm using it currently, though, is perfectly safe.

> The current capturing process already can go into every arbitrary function. So what is different? Do we want a config that says that when we go into some function, all memory reads should be assumed to be static, and memory writes can be ignored (or only need to be tracked during processing of this function).

Yes, we want that, although we cannot always simply ignore memory writes. Code may depend on being able to modify its memory. We should instead handle memory writes through a copy-on-write layer so that we can prevent the rewriting process from modifying general program memory.

> If all function parameters are static, too, no capturing should be done anyways, and we only get a static return value back, right?

Currently some "noise" (function prologues and epilogues, stack adjustment, etc.) is always captured when the emulator enters a function. But yes, ideally a function where everything is static should not cause anything to be captured; that's a separate issue, though. The problem solved by this emulation bypass mode arises when the return value of a function depends on memory that was written before the rewriting process started and which we have no way of marking static in DBrew. In that case we need a way to get its return value into DBrew. There are more details about this in the report.

>> A safer way to implement this would be to perform a non-capturing
>> emulation where all memory reads return static values and memory writes
>> are tracked. I absolutely think we should do this.
>
> Of course this special mode can be implemented, but I am not yet sure if we actually need it.

Then suggest an alternative way to handle, for example, the MPIR_ToPointer function that works equally well.

>> My devel branch will never perform an arbitrary memory read or write
>> unless its address a) is on the stack, b) points to a defined static
>> memory region, c) was calculated from a stack relative offset or d)
>> calculated relative to the instruction pointer. Furthermore, it tracks
>> mutable memory as part of the EmuState which was required to generate
>> correct code in some cases.
>
> Regarding this mutable memory: who decides/specifies which part of memory should be tracked?

The user. All defined, mutable memory areas are tracked. "Tracked" may be the wrong word here. It simply means that defined mutable memory areas are treated as part of the EmuState and are saved and restored as the emulation progresses, in the same way as, e.g., the stack.
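Saving and restoring a user-declared mutable region alongside the emulator state can be sketched as a plain snapshot and restore. The types `MutRegion` and `RegionSnapshot` and both functions are hypothetical stand-ins; the actual EmuState layout in the devel branch is not shown in this thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A user-declared mutable memory region (hypothetical type). */
typedef struct { void *base; size_t len; } MutRegion;

/* A saved copy of a region's contents, kept with the emulator state. */
typedef struct { MutRegion region; unsigned char *copy; } RegionSnapshot;

/* Copy the region's current contents so they can be restored later. */
static RegionSnapshot region_save(MutRegion r)
{
    RegionSnapshot s = { r, NULL };
    assert(r.base != NULL && r.len > 0);
    s.copy = malloc(r.len);
    assert(s.copy != NULL);
    memcpy(s.copy, r.base, r.len);
    return s;
}

/* Write the saved contents back and release the snapshot. */
static void region_restore(RegionSnapshot *s)
{
    assert(s != NULL && s->copy != NULL);
    memcpy(s->region.base, s->copy, s->region.len);
    free(s->copy);
    s->copy = NULL;
}
```

Saving before emulating one path and restoring before emulating another gives each path the same view of the region, just as is done for the emulated stack.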

>> We should probably take it all the way and, as you suggest, implement a
>> copy-on-write scheme which prevents the actual memory from being
>> modified. I opted not to do this as it required more work without
>> contributing to the results.
>
> Writing to memory is a side effect, and the pure rewriting process of dbrew never should have a side effect, yes. But what do you get by tracking memory writes during rewriting, if it is not captured? The original code did a write, and any rewriting should also do the write, to get this side effect. Otherwise the result is not the same. The stack is special here, as it becomes invalid if the function returns. So, stack writes into the current frame can be seen as having no side effect...

This is about ensuring that the specialization process does not change any memory. For example, if we rewrite a function and then subsequently execute the rewritten function, it should see the world exactly as if the rewriting hadn't taken place. That is, the rewriting process should, in effect, not change any memory that isn't local to the functions being rewritten. This is not about changing how writes to non-static memory are captured, so these considerations only apply to the proposed non-capturing all-static-memory emulation mode.

> A general comment: Can we have (Github) issues for features that should be implemented at some point? They can link to this discussion.

Yes, that would be nice.

truls added 21 commits May 11, 2017 21:26

Move param reg info function to instr.c

43d98d7

IT_HINT_CALLRET support

62c19c6

CONFIG: Add experimental new API for mapping fun params

1bfb0c5

Fix off-by-one in application of new parameter API

4232de7

Shift the PNone param field like the others

09f1db2

Support more than 6 parameters for rewritten functions

442bb67

Rewrite param map macros for clarity and to avoid binary literals

0bcbc3d

Add option for setting return value of preserved calls dynamic

7e78d15

Add config option for not rewriting large CALL addrs

77eb13c

Refactor call instruction emu bypass handling

dabfcf7

Add support for replacing function calls with intrinsic hints

2b0579a

Useful e.g. for specialized handling of libc functions. Currently only implemented for memcpy. Refactor call instruction handling (move the bypassEmu code into a separate function).

Add support for per-function loop unrolling inhibition

7896792

Still load static vals to registers when fun parCount > 6

4382e02

Rename function getRegIndex -> getParregindex

9d063ad

Allow setting function static in calls made during emulation

f9ac79a

Ensure param ms before running function bypassing emu

3905c71

Make changing dynamic params to static during runtime configurable

72b88c8

Support more than 6 parameters for rewritten functions

6527089

Fix call instruction generation and partial printing fix

dbb7da5

truls force-pushed the call-pr branch from 807868c to dbb7da5 Compare May 12, 2017 14:14
