-
Notifications
You must be signed in to change notification settings - Fork 379
Fix SLANG_ENABLE_ASAN CMake option
#8980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This PR modifies
This version number helps maintain compatibility when loading serialized IR modules. |
ncelikNV
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also changed SLANG_ENABLE_ASAN to also enable UBSan when using Clang or GCC, not sure if we should introduce a separate CMake option for it instead.
| ArtifactDescUtil::makeDescForCompileTarget(options.targetType)); | ||
| ArtifactUtil::addAssociated(artifact, compileDiagnostics); | ||
| *outArtifact = artifact.detach(); | ||
| return SLANG_OK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels weird to return SLANG_OK here, but iirc returning SLANG_FAIL instead caused tests/cpp-compiler/c-compile-error.c to fail.
If the compile or link command fails, it will only be handled back up in EndToEndCompileRequest::executeActionsInner here (only if the compiler/linker printed a diagnostic/anything?):
slang/source/slang/slang-end-to-end-request.cpp
Lines 266 to 269 in 664fecd
| // Generate output code, in whatever format was requested | |
| generateOutput(); | |
| if (getSink()->getErrorCount() != 0) | |
| return SLANG_FAIL; |
In this case, the Slang compiler just prints (0): error 98: cannot access as a blob, which isn't a helpful error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should return SLANG_FAIL and fix the problem with the test, tests/cpp-compiler/c-compile-error.c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into it again, and since there are a lot of SLANG_RETURN_ON_FAILs which can cause CommandLineDownstreamCompiler::compile to return early, it returning SLANG_OK or not basically indicates whether the compiler/linker diagnostics (if the compiler/linker printed anything) could be parsed and outArtifact was written to or not. In other words, SLANG_OK here just means there wasn't an internal/unexpected error, and the parsed diagnostics indicate whether the compiler/linker really succeeded. I think this makes sense.
I guess if the compiler/linker ever exited with a non-zero status code or didn't produce the output file but didn't print anything either, for whatever reason, then the user might get an obscure error message, but this really shouldn't happen, especially since Slang runs clang++ -v, g++ -v, etc, beforehand, and checks their console output to make sure it found a suitable C++ compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had gotten slangc to print (0): error 98: cannot access as a blob by editing GCCDownstreamCompilerUtil::calcArgs to add an invalid argument to the Clang/GCC command line.
It turns out that this Clang error message:
clang++: error: unknown argument: '-fasdfasdf'
hits the following branch in _parseGCCFamilyLine (which is called by GCCDownstreamCompilerUtil::parseOutput):
slang/source/compiler-core/slang-gcc-compiler-util.cpp
Lines 326 to 336 in d9b92cc
| else if (split.getCount() == 4) | |
| { | |
| // Probably a link error, give the source line | |
| String ext = Path::getPathExt(split[0]); | |
| // Maybe a bit fragile -> but probably okay for now | |
| if (ext != "o" && ext != "obj") | |
| { | |
| outLineParseResult = LineParseResult::Ignore; | |
| return SLANG_OK; | |
| } |
So in this case, Slang incorrectly parsed Clang's stderr output and didn't notice there was an error-severity diagnostic, which is why it didn't forward Clang's stderr output and instead ended up printing an obscure error message after hitting an error somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what is your justification of this code behavior.
To me, your code is returning OK when the result is non-zero. That looks incorrect.
And I am having a trouble understanding why you think "this makes sense."
Let me ask questions regarding this code block:
- it looks like these lines didn't exist before, why did you need to add these regarding ASAN enablement. I am not sure what "problem" you are trying to solve here.
- is returning SLANG_OK, same behavior as before? or is it a new behavior?
- where are the calling sites that call this
compile()function? - what is the reason why cannot/wouldn't return SLANG_FAIL? What happens if you do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refer to what was done regarding the behavior on the calling sites.
06c442f#diff-f83b43e52193a88da24b6dcc85fbe5d8dd62046289151ae1bb8894b462c6fb8b
Basically, it is not a good idea to keep the unreasonable behavior only because the unreasonable behavior is expected by the calling site.
For these cases, we have full control of making modifications to both calling sites and callee.
So my suggestion is to make the necessary change on the calling site to get thing "right" on the callee side.
I also think we need explanations of why we want to apply certain option only to the linking step as a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refer to what was done regarding the behavior on the calling sites. 06c442f#diff-f83b43e52193a88da24b6dcc85fbe5d8dd62046289151ae1bb8894b462c6fb8b
That commit you've linked introduces a potential null pointer dereference.
There are cases where NVRTCDownstreamCompiler::compile can return without writing to outArtifact, e.g.:
slang/source/compiler-core/slang-nvrtc-compiler.cpp
Lines 992 to 1008 in 1152af7
| SlangResult NVRTCDownstreamCompiler::compile( | |
| const DownstreamCompileOptions& inOptions, | |
| IArtifact** outArtifact) | |
| { | |
| if (!isVersionCompatible(inOptions)) | |
| { | |
| // Not possible to compile with this version of the interface. | |
| return SLANG_E_NOT_IMPLEMENTED; | |
| } | |
| CompileOptions options = getCompatibleVersion(&inOptions); | |
| // This compiler can only deal with a single artifact | |
| if (options.sourceArtifacts.count != 1) | |
| { | |
| return SLANG_FAIL; | |
| } |
The struct version check exists because these compilers are implemented in shared libraries (libslang-compiler.so and libslang-llvm.so) and compile() is called outside of them. See this comment above the declaration of DownstreamCompileOptions:
slang/source/compiler-core/slang-downstream-compiler.h
Lines 143 to 153 in 1152af7
| /* Downstream compile options | |
| NOTE! This type is trafficed across shared library boundaries and *versioned*. | |
| In particular | |
| * The struct can only contain types that can be trivially memcpyd (checked by static_assert); | |
| * New fields can only be added to the end of the struct | |
| * New fields must take into account alignment/padding such that they do not share bytes in previous | |
| version sizes | |
| */ | |
| struct DownstreamCompileOptions |
isVersionCompatible is a confusing name by the way. To me, "compatible" implies it's a greater-or-equal check or something similar, but it's actually just an equality check (which to me implies backward compatibility isn't really guaranteed):
slang/source/compiler-core/slang-downstream-compiler.h
Lines 131 to 141 in 1152af7
| template<typename T> | |
| bool isVersionCompatible(const VersionedStruct& ver) | |
| { | |
| return ver.identifier == T::kVersionIdentifier; | |
| } | |
| template<typename T> | |
| bool isVersionCompatible(const T& in) | |
| { | |
| return isVersionCompatible<T>(in.version); | |
| } |
So, I'm assuming we really shouldn't call ArtifactUtil::createArtifactForCompileTarget(options.targetType) before this version check.
So do you think we should just pick SLANG_TARGET_UNKNOWN as target type so we can still create an artifact in that case? Or do you think it would be acceptable to add a comment on IDownstreamCompiler::compile saying that outArtifact is only written to if the function didn't return SLANG_E_NOT_IMPLEMENTED? The latter would require implementations to not have a SLANG_RETURN_ON_FAIL(...) that can cause them to return this result code for other reasons, which might be hard to enforce, so I guess using SLANG_TARGET_UNKNOWN would be better.
I also think we need explanations of why we want to apply certain option only to the linking step as a comment.
Sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The potential null pointer dereference is due to artifact being dereferenced down this call chain by the way:
slang/source/slang/slang-code-gen.cpp
Line 1021 in 1152af7
| SLANG_RETURN_ON_FAIL(passthroughDownstreamDiagnostics(getSink(), compiler, artifact)); |
slang/source/slang/slang-pass-through.cpp
Line 205 in 1152af7
| auto diagnostics = findAssociatedRepresentation<IArtifactDiagnostics>(artifact); |
slang/source/compiler-core/slang-artifact.h
Line 520 in 1152af7
| artifact->findRepresentation(IArtifact::ContainedKind::Associated, T::getTypeGuid())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to mention, there are other cases where NVRTCDownstreamCompiler and other compilers (DXCDownstreamCompiler, GlslangDownstreamCompiler, MetalDownstreamCompiler, TintDownstreamCompiler) can return without writing to outArtifact, and I doubt you want me to fix them all in this PR.
But at first glance, the !isVersionCompatible case is the only that requires a little bit of thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is already too long and please make separate PRs if possible.
I think the expected behavior from the user perspective should be something like,
if (SLANG_FAILED(compile(..., sink)))
{
print error from sink;
return SLANG_FAILED;
}
The rest of implementation should change to make the interface contract working.
|
|
||
| ExecuteResult exeRes; | ||
|
|
||
| #if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw there are many #if 0s across the Slang codebase and also code that is commented out (which is harder to grep for). Cleaning them all out would be out of this PR's scope but I thought I would remove these two since I was editing code nearby.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't object to this change.
But code clean up should happen in a separate PR especially when a PR is already so big like this one.
You should minimize your PR as much as possible and keep the "cleanup" changes separate from your intended code changes.
This helps for reader to focus on what changes are intended and it can also help us to revert/bisect changes that causes unexpected regressions.
| #if defined(__has_feature) | ||
| #if __has_feature(address_sanitizer) | ||
| if (options.targetType != SLANG_OBJECT_CODE) | ||
| { | ||
| #if SLANG_CLANG || SLANG_GCC | ||
| // Assume ASan was enabled through the SLANG_ENABLE_ASAN CMake option, meaning UBSan was | ||
| // enabled as well. | ||
| cmdLine.addArg("-fsanitize=address,undefined"); | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we should really assume that ASan being enabled + the compiler being either Clang or GCC means that UBSan is also enabled.
| break; | ||
|
|
||
| value = value * base + digit; | ||
| value = static_cast<uint64_t>(value) * base + digit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes UBSan error: signed integer overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean an overflow actually happened?
Why does casting to uint64 and storing it back to signed-int64 solve the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean an overflow actually happened?
Yes, e.g.:
/home/ncelik/projects/slang/source/compiler-core/slang-lexer.cpp:746:23: runtime error: signed integer overflow: 1152921504606846975 * 16 cannot be represented in type 'IntegerLiteralValue' (aka 'long')
Why does casting to uint64 and storing it back to signed-int64 solve the problem?
Because signed integer overflow is undefined behavior in C++ (https://eel.is/c++draft/basic.fundamental#note-2) and UBSan reports this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you answered my question.
I understand that when the operation happens in a higher bit space, you may be able to avoid overflow.
But casting down from uin64 to int64 will make the resulting value negative, which is as bad as overflow.
Your current implementation is just to make ASAN happy without improving the situation.
I think you should trace back to where the overflow happens and prevent it from overflowing in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that when the operation happens in a higher bit space, you may be able to avoid overflow.
No, that's not the point of this change. The point is just that signed arithmetic overflow is undefined behavior, whereas unsigned arithmetic is defined to be performed modulo 2^N. See the note from the (draft of the) C++ standard I linked above:
Unsigned arithmetic does not overflow. Overflow for signed arithmetic yields undefined behavior ([expr.pre]).
https://eel.is/c++draft/basic.fundamental#note-2
But in practice, I don't know any compiler or (modern) CPU architecture where signed integer arithmetic doesn't just wrap around. Yet, for some reason, the C++ standards committee doesn't want to define it as such:
- Status-quo If a signed operation would naturally produce a value that is not within the range of the result type, the behavior is undefined. The author had hoped to make this well-defined as wrapping (the operations produce the same value bits as for the corresponding unsigned type), but WG21 had strong resistance against this.
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r4.html#intro
But casting down from uin64 to int64 will make the resulting value negative, which is as bad as overflow.
Your current implementation is just to make ASAN happy without improving the situation.I think you should trace back to where the overflow happens and prevent it from overflowing in the first place.
So yes, in practice, I'm just making UBSan happy here.
The real issue is that Slang's IntegerLiteralValue is an alias for int64_t, and the Slang compiler doesn't even emit a warning if the value of an integer literal doesn't fit into this type. E.g., there is no warning when compiling this Slang code:
int x = 18446744073709551616;By the way, there is no warning if the value of an integer literal doesn't fit into the type of the variable it's initializing, either.
For comparison, I know Clang uses an arbitrary-precision integer type (llvm::APInt) to store integer literals, and both Clang and GCC emit the following errors/warnings even without any -W... flags:
$ cat a.c
int x = 18446744073709551616;
$ clang -c a.c
a.c:1:9: error: integer literal is too large to be represented in any integer type
1 | int x = 18446744073709551616;
| ^
1 error generated.
$ gcc -c a.c
a.c:1:9: warning: integer constant is too large for its type
1 | int x = 18446744073709551616;
| ^~~~~~~~~~~~~~~~~~~~
$ cat b.c
int x = 4294967296;
$ clang -c b.c
b.c:1:9: warning: implicit conversion from 'long' to 'int' changes value from 4294967296 to 0 [-Wconstant-conversion]
1 | int x = 4294967296;
| ~ ^~~~~~~~~~
1 warning generated.
$ gcc -c b.c
b.c:1:9: warning: overflow in conversion from ‘long int’ to ‘int’ changes value from ‘4294967296’ to ‘0’ [-Woverflow]
1 | int x = 4294967296;
| ^~~~~~~~~~
But don't you think fixing all this would be too much for this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overflow with any integer type is doom to happen no matter what we try. That's just a limitation of the current computer architecture, I guess.
Yes, unless we use an arbitrary-precision integer type to store integer literals, which would only be worth it if we planned to support integer types wider than 64 bits.
The question here is not about the theoretical solution for all overflow in general. Even when the overflow is going to happen, in practice it shouldn't happen if we are doing the regular/expected things.
What do you mean? We can't prevent users from passing int x = 18446744073709551616; to the Slang compiler. Either we silently let the overflow/wrap-around happen (and just use uint64_t instead of int64_t to make UBSan happy), or we emit an error like Clang does (or a warning like GCC does).
We need to identify which test triggers the behavior of the overflow and find a reasonable solution. We don't need to solve the problem in this PR but it is important to understand why ASAN warns us. And we need to understand more than just "overflow is happening here". We need to know if it is expected to happen or if we have a bug or mistake to cause the warning even when things are happening in an expected way.
I already explained this. We have to expect that the user might pass something like int x = 18446744073709551616; to the compiler.
I like to suggest to create a new github issue to track this. And I don't think we want to have any code change just to make ASAN happy; it will be better to find a way to snooze the ASAN warning if sensible.
I can create a separate issue for this. Would a TODO comment above this change be enough, or do you really want something like this:
#if SLANG_CLANG || SLANG_GCC
__attribute__((no_sanitize("undefined")))
#endif
IntegerLiteralValue getIntegerLiteralValue((which should probably have a TODO comment anyway)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't prevent users from passing int x = 18446744073709551616; to the Slang compiler.
I am talking about the case where the testing is done by our hand-crafted limited number of tests; not an arbitrary random inputs from the users.
Our current tests aren't supposed to use an invalid value like that; unless it is explicitly intended to test the case.
We need to figure out if the overflow is coming from a test that is expected to overflow or not expected to have overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't prevent users from passing int x = 18446744073709551616; to the Slang compiler.
Slangc should obviously warn about overflowing literals. It wouldn't be the first time I do something like this in my C++ code:
uint64_t bitmask { 0x1f'ff'ff'ff'ff'ff'ff'ff'ff };
I'd be quite unhappy if the compiler decided to interpret this as 0xff'ff'ff'ff'ff'ff'ff'ff without a warning since I added one too many bytes.
I think that this function should be properly fixed. Possibly, this function should emit a warning in case of overflowing literal.
We need to figure out if the overflow is coming from a test that is expected to overflow or not expected
Emitted warning and some test-side checks for the warning should do the trick.
I can create a separate issue for this.
This would be ok to me. But still, I would add a TODO comment with a brief explanation here why this somewhat unexpected cast is done. Something like:
// The input may contain an overflowing literal value. This static_cast prevents the related UB.
// TODO: fix properly (https://github.com/shader-slang/slang/issues/12345)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing. This function should arguably return std::uint64_t instead of IntegerLiteralValue, since I don't see that potential minus sign handled here for negative values. I'm presuming that parsing literal INT64_MIN works by accident, since this function would overflow to INT64_MIN when reading the digits part, and then the calling function doing the negation would overflow again, since INT64_MIN cannot be negated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm presuming that parsing literal INT64_MIN works by accident, since this function would overflow to INT64_MIN when reading the digits part, and then the calling function doing the negation would overflow again, since INT64_MIN cannot be negated.
Yes, I think this is what happens.
| ComponentKey key; | ||
| key.typeName = UnownedStringSlice(type->getName()); | ||
| switch (type->getKind()) | ||
| if (type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes UBSan error: member call on null pointer.
| ComPtr<ISlangWriter> writer( | ||
| new FileWriter(stderr, WriterFlag::IsUnowned | WriterFlag::AutoFlush)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stderr was being closed when exiting main, which meant sanitizers would fail to print diagnostics.
| const SlangPassThrough cppCompilers[] = { | ||
| SLANG_PASS_THROUGH_VISUAL_STUDIO, | ||
| SLANG_PASS_THROUGH_GCC, | ||
| #if SLANG_CLANG | ||
| SLANG_PASS_THROUGH_CLANG, | ||
| #else | ||
| SLANG_PASS_THROUGH_GCC, | ||
| #endif | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here was that I was building Slang with Clang but a test was being built with GCC and the sanitizer runtime of one is incompatible with the other's. Maybe this code should just be edited to only test the compiler that is being used to build Slang instead of MSVC + either Clang or GCC, this was more of a quick fix.
| // compute() | ||
| { | ||
| SLANG_CHECK(MD5::compute(nullptr, 0).toString() == "d41d8cd98f00b204e9800998ecf8427e"); | ||
| const String str("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod " | ||
| "tempor incididunt ut labore et dolore magna aliqua."); | ||
| SLANG_CHECK( | ||
| MD5::compute(str.getBuffer(), str.getLength()).toString() == | ||
| "818c6e601a24f72750da0f6c9b8ebe28"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was getting a UBSan error in MD5::compute (null pointer passed as argument 2, which is declared to never be null) and noticed that MD5 is never actually used anywhere. Should I open another PR to fully remove MD5?
| ComPtr<slang::IBlob> code; | ||
| linkedProgram->getTargetCode(0, code.writeRef(), diagnosticBlob.writeRef()); | ||
| SLANG_CHECK(code != nullptr); | ||
| SLANG_CHECK_ABORT(code != nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't have the spirv-tools (Ubuntu) package installed and didn't have the LD_LIBRARY_PATH for libslang-glslang.so, so compiling/linking the Slang program failed and I got this UBSan error: member call on null pointer of type 'ISlangBlob'.
| if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang") | ||
| # Clang defaults to statically linking the sanitizer runtime (except on macOS/Darwin), | ||
| # which is not compatible with -Wl,--no-undefined, so we need to use dynamic linking | ||
| # instead (-shared-libsan). | ||
| target_compile_options( | ||
| ${target} | ||
| PRIVATE -fsanitize=address,undefined -shared-libsan | ||
| ) | ||
| target_link_options( | ||
| ${target} | ||
| PUBLIC -fsanitize=address,undefined -shared-libsan | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to mention, the downside of -shared-libsan is the .so's are stored somewhere that is basically guaranteed to not be in LD_LIBRARY_PATH (e.g., /usr/lib/llvm-20/lib/clang/20/lib/linux for LLVM 20 on Ubuntu), so that directory has to be added to LD_LIBRARY_PATH before running CMake. But I've also had to add build/<CMake config, e.g., Debug>/lib to LD_LIBRARY_PATH for a test to find libslang-glslang.so.
jkwak-work
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't review all of the changes.
But I want you to handle the ASAN error cases more properly.
The current implementations look more like hacks to make ASAN happy.
| if(${target} STREQUAL slang-llvm) | ||
| if(CMAKE_SYSTEM_NAME STREQUAL "Darwin") | ||
| add_supported_cxx_linker_flags( | ||
| ${target} | ||
| PRIVATE | ||
| "-Wl,-undefined,suppress" | ||
| ) | ||
| endif() | ||
| else() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure where the WAR is supposed to be applied to.
Is it intentional to not set any option for slang-llvm on platforms other than Darwin?
Shouldn't the logic more be like,
if (${target} STREQUAL slang-llvm AND SLANG_ENABLE_ASN AND CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
suppress warning
else
...
|
|
||
| ExecuteResult exeRes; | ||
|
|
||
| #if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't object to this change.
But code clean up should happen in a separate PR especially when a PR is already so big like this one.
You should minimize your PR as much as possible and keep the "cleanup" changes separate from your intended code changes.
This helps for reader to focus on what changes are intended and it can also help us to revert/bisect changes that causes unexpected regressions.
| if (PlatformUtil::isFamily(PlatformFamily::Unix, platformKind)) | ||
| { | ||
| // Position independent | ||
| cmdLine.addArg("-fPIC"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why this is needed.
If needed, can we apply to all platform without making a divergence based on the target platform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's needed to fix this kind of link error:
/usr/bin/ld: /tmp/slang-generated-lhMVPe.o: warning: relocation against `globalCounter' in read-only section `.text'
/usr/bin/ld: /tmp/slang-generated-lhMVPe.o: relocation R_X86_64_PC32 against symbol `intGlobal' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: bad value
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
Objects linked into shared libraries must be compiled with -fPIC.
Note that with this change, position-independent code will be enabled for all objects (there's no way to tell just from SLANG_OBJECT_CODE whether we ultimately want to link a shared library or an executable or if we really just want to compile an object), but there doesn't seem to be a reason not to.
If needed, can we apply to all platform without making a divergence based on the target platform?
Right.
| ArtifactDescUtil::makeDescForCompileTarget(options.targetType)); | ||
| ArtifactUtil::addAssociated(artifact, compileDiagnostics); | ||
| *outArtifact = artifact.detach(); | ||
| return SLANG_OK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should return SLANG_FAIL and fix the problem with the test, tests/cpp-compiler/c-compile-error.c.
| break; | ||
|
|
||
| value = value * base + digit; | ||
| value = static_cast<uint64_t>(value) * base + digit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean an overflow actually happened?
Why does casting to uint64 and storing it back to signed-int64 solve the problem?
| RawBlob(const void* data, size_t size) { memcpy(m_data.allocateTerminated(size), data, size); } | ||
| RawBlob(const void* data, size_t size) | ||
| { | ||
| if (data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a proper fix.
This will only hide the real problem.
We should fix the place where nullptr is provided.
I think this should be an assertion,
SLANG_ASSERT(data); // or just assert(data) if SLANG_ASSERT is not available.
| if (!m_current) | ||
| { | ||
| return _allocateAlignedFromNewBlock(sizeInBytes, kMinAlignment); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a good way to handle it.
m_current shouldn't be a nullptr in the first place.
We should add assert(m_current) and fix where the problem comes from.
| ArtifactDescUtil::makeDescForCompileTarget(options.targetType)); | ||
| ArtifactUtil::addAssociated(artifact, compileDiagnostics); | ||
| *outArtifact = artifact.detach(); | ||
| return SLANG_OK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what is your justification of this code behavior.
To me, your code is returning OK when the result is non-zero. That looks incorrect.
And I am having a trouble understanding why you think "this makes sense."
Let me ask questions regarding this code block:
- it looks like these lines didn't exist before, why did you need to add these regarding ASAN enablement. I am not sure what "problem" you are trying to solve here.
- is returning SLANG_OK, same behavior as before? or is it a new behavior?
- where are the calling sites that call this
compile()function? - what is the reason why cannot/wouldn't return SLANG_FAIL? What happens if you do?
| break; | ||
|
|
||
| value = value * base + digit; | ||
| value = static_cast<uint64_t>(value) * base + digit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you answered my question.
I understand that when the operation happens in a higher bit space, you may be able to avoid overflow.
But casting down from uin64 to int64 will make the resulting value negative, which is as bad as overflow.
Your current implementation is just to make ASAN happy without improving the situation.
I think you should trace back to where the overflow happens and prevent it from overflowing in the first place.
| if (!m_current) | ||
| { | ||
| return _allocateAlignedFromNewBlock(sizeInBytes, kMinAlignment); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just adding an assertion is good enough.
Adding more of error handling will require "if (m_current)" which will have a negative impact on the runtime performance. And these types of situations aren't supposed to happen in the first place if used in a specific way that they are supposed to be used.
So I think "assert" makes sense.
If needed, we could add a comment explicitly saying that "this function must be called after Bla Blas is done".
I understand that adding an error case handling as implemented is the easiest way to make ASAN happy. But we should fix the root-cause of the problem and we should avoid the problem in the first place.
| char* getData() const | ||
| { | ||
| static char empty[] = ""; | ||
| return m_buffer ? m_buffer->getData() : empty; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not comfortable adding static keyword especially it is in a header file.
I am not sure the exact behavior and that's where my fear grows.
static here sounds like the variable will be instantiated per compilation unit; in other words, every "*.o" files will have its own instance of the data. If this is the case, this isn't ideal solution just to make ASAN happy.
I am guessing that the problem might be a mismatching behavior of when the function is inlined in one place but not inline in the other place.
I wonder if it will be resolved if we explicitly mark the function as "force-inline" or "never-inline". By making the behavior consistent in either way, it may resolve the problem.
| bool operator==(const char* strbuffer) const | ||
| { | ||
| const char* volatile b = begin(); | ||
| return (strcmp(b, strbuffer) == 0); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes no sense to me.
And it look dangerous to use volatile in this way especially when it doesn't makes sense to solve a problem like this.
source/core/slang-uint-set.cpp
Outdated
| if (auto* buffer = m_buffer.getBuffer()) | ||
| ::memset(buffer, 0, m_buffer.getCount() * sizeof(Element)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be needing this in the first place.
I searched for where clear() can be called from and the following implementation may be the source of this problem.
template<typename TKey, typename TValue>
class OrderedDictionary
{
UIntSet m_marks;
void clear()
{
m_count = 0;
m_kvPairs.clear();
m_marks.clear(); // <== THIS SHOULD BE `m_marks.resize(0);` or similar.
}
I think we should rename UInsetSet::clear() to something more like unsetAll() or something similar in order to avoid confusion like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the call sites. Do you want me to rename it to unsetAll() in this PR as well?
|
|
||
| DeclRef<FunctionDeclBase> funcDeclRef = defaultFuncDeclRef.as<FunctionDeclBase>(); | ||
| auto funcThisType = getTypeForThisExpr(visitor, funcDeclRef); | ||
| if (!funcThisType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think return would mean that there was no error/problems, which is not the case here.
I think you should add either SLANG_ASSERT or print a proper error message.
| if (argCount != 2) \ | ||
| return nullptr; \ | ||
| resultValue = \ | ||
| static_cast<uint64_t>(constArgVals[0]) OP static_cast<uint64_t>(constArgVals[1]); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will cause a problem when the values are actually negative values.
I see a line like this:
CASE(*);
What do you expect to happen when one of the argument is a negative and the other is positive?
Clang was taking around 10 minutes to compile `slang-embedded-core-module-source.cpp` with optimizations. Disabling optimizations for these functions reduced the compilation time to about 12 seconds on my machine (Intel Core Ultra 7 165H), and had no noticeable impact on run-time performance. Run-time performance with optimizations: ``` $ hyperfine --shell=none './build/generators/Release/bin/slang-bootstrap -archive-type riff-lz4 -save-core-module-bin-source slang-core-module-generated.h -save-glsl-module-bin-source slang-glsl-module-generated.h' Benchmark 1: ./build/generators/Release/bin/slang-bootstrap -archive-type riff-lz4 -save-core-module-bin-source slang-core-module-generated.h -save-glsl-module-bin-source slang-glsl-module-generated.h Time (mean ± σ): 2.545 s ± 0.035 s [User: 2.333 s, System: 0.210 s] Range (min … max): 2.496 s … 2.620 s 10 runs ``` Run-time performance without optimizations: ``` $ hyperfine --shell=none './build/generators/Release/bin/slang-bootstrap -archive-type riff-lz4 -save-core-module-bin-source slang-core-module-generated.h -save-glsl-module-bin-source slang-glsl-module-generated.h' Benchmark 1: ./build/generators/Release/bin/slang-bootstrap -archive-type riff-lz4 -save-core-module-bin-source slang-core-module-generated.h -save-glsl-module-bin-source slang-glsl-module-generated.h Time (mean ± σ): 2.564 s ± 0.039 s [User: 2.350 s, System: 0.213 s] Range (min … max): 2.512 s … 2.614 s 10 runs ``` Disabling optimizations also makes `slang-embedded-core-module-source.cpp.o` slightly smaller: - 7.84 MiB with optimizations, - 7.37 MiB without optimizations. Fixes shader-slang#9054.
… result still unsigned"
SLANG_ENABLE_ASAN CMake option
| add_supported_cxx_linker_flags( | ||
| ${target} | ||
| PRIVATE | ||
| # Don't assume that symbols will be resolved at runtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes such as this should be commented in the PR description. It is not obvious to the reviewer why you removed this flag. Presumably there was a good reason.
| asCharSlice(toSlice("slang-generated")), | ||
| lockFile.writeRef()))) | ||
| { | ||
| *outArtifact = resultArtifact.detach(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know this part of slangc very well (yet), but I'd just like to double-check that this is intentional.
Usually, a failure should rollback whatever changes was done by the function. In this case, it feels a bit odd that we're returning a failure but still also returning the artifact.
See also https://en.wikipedia.org/wiki/Exception_safety
Addendum. I see from above that apparently this is the intention. But I do wonder if the call site shouldn't try to use the artifact in case this function failed, unless it's for some sort of diagnostics (= error output). I this case, I would appreciate a code comment.
| options.modulePath = SliceUtil::asTerminatedCharSlice(modulePath); | ||
| } | ||
|
|
||
| // Compile stage: compile source to object code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this newly added code just as an ASAN fix? (I presume we do. The PR description would be the place to explain why. Not at all obvious.)
| break; | ||
|
|
||
| value = value * base + digit; | ||
| value = static_cast<uint64_t>(value) * base + digit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't prevent users from passing int x = 18446744073709551616; to the Slang compiler.
Slangc should obviously warn about overflowing literals. It wouldn't be the first time I do something like this in my C++ code:
uint64_t bitmask { 0x1f'ff'ff'ff'ff'ff'ff'ff'ff };
I'd be quite unhappy if the compiler decided to interpret this as 0xff'ff'ff'ff'ff'ff'ff'ff without a warning since I added one too many bytes.
I think that this function should be properly fixed. Possibly, this function should emit a warning in case of overflowing literal.
We need to figure out if the overflow is coming from a test that is expected to overflow or not expected
Emitted warning and some test-side checks for the warning should do the trick.
I can create a separate issue for this.
This would be ok to me. But still, I would add a TODO comment with a brief explanation here why this somewhat unexpected cast is done. Something like:
// The input may contain an overflowing literal value. This static_cast prevents the related UB.
// TODO: fix properly (https://github.com/shader-slang/slang/issues/12345)
| break; | ||
|
|
||
| value = value * base + digit; | ||
| value = static_cast<uint64_t>(value) * base + digit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing. This function should arguably return std::uint64_t instead of IntegerLiteralValue, since I don't see that potential minus sign handled here for negative values. I'm presuming that parsing literal INT64_MIN works by accident, since this function would overflow to INT64_MIN when reading the digits part, and then the calling function doing the negation would overflow again, since INT64_MIN cannot be negated.
|
|
||
| void ensureUniqueStorageWithCapacity(Index capacity); | ||
|
|
||
| RefPtr<StringRepresentation> m_buffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note: Modern std::string implementations don't work like this. Instead, they usually do something like:
union
{
RefPtr<StringRepresentation> largeString; // assume size <= 31, probably 16
struct
{
std::array<char, 31> smallStringData; // max size 30, not counting the terminating '\0'
std::uint8_t smallStringSize; // value == 255 means we're using largeString, instead
};
} m_buffer;
The benefit of this is that this is generally much faster, since we're allocating dynamic memory only for large strings. Most strings are small. In addition, we could trivially let getData() return a pointer to smallStringData[0] for empty strings.
|
|
||
| private: | ||
| char* getData() const { return m_buffer ? m_buffer->getData() : (char*)""; } | ||
| char* getData() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note: This probably shouldn't be a const member. This literally allows modifying the data of a const String, which is likely not intended.
| bool operator==(const char* strbuffer) const | ||
| { | ||
| const char* volatile b = begin(); | ||
| return (strcmp(b, strbuffer) == 0); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
volatile here is the incorrect fix. We shouldn't use it even as a workaround.
I would first try moving the function-static "empty" as a class static member. If that doesn't make GCC happy, let's then figure out something that doesn't involve volatile.
| } | ||
|
|
||
| return UnownedStringSlice(fileStat.m_filename).trim('/'); | ||
| return String(UnownedStringSlice(fileStat.m_filename).trim('/')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like an actual bug fix. We've been lucky not to have unexplained crashes.
|
|
||
| #define EMIT_LINE_DIRECTIVE() sb << "#line " << (__LINE__ + 1) << " \"" << path << "\"\n" | ||
|
|
||
| #if SLANG_CLANG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is possibly a reasonable change, this definitely should be a separate PR. Seems completely unrelated to ASAN/UBSAN fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a separate PR: #9069.
I think I forgot to git switch -c before committing, so I created a new branch after committing and didn't think of removing the commit from the original branch.
|
Splitting this PR into multiple smaller ones to make it easier to review.
|
Fixes broken CMake checks for compiler support of sanitizer flags, which
caused sanitizers to silently not be enabled.
Changes
SLANG_ENABLE_ASANto enable both ASan and UBSan on Linuxinstead of only enabling ASan.
Fixes sanitizer errors reported by ASan and UBSan when running
slang-test -api cpuon Linux.Fixes #8646.