@Mrhs121 Mrhs121 commented Nov 13, 2025

Purpose of this pull request

close #10059

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

…uses RecordBuffer infinite loop during shutdown
Contributor

@davidzollo davidzollo left a comment

Good job.
CI is not passing yet; you can refer to https://github.com/apache/seatunnel/pull/10060/checks?check_run_id=55259688662.

Each CI run takes a few hours ^_^

Can you add an E2E test with 2PC enabled?

@davidzollo
Contributor

When will the flushing=false state be reset?

@Mrhs121
Author

Mrhs121 commented Nov 14, 2025

When will the flushing=false state be reset?

Thanks for pointing this out. The flushing flag should be reset to false immediately after the flush operation completes. I'll include the fix for resetting the flushing state along with the new E2E test in the next commit.

FYI, I noticed the CI failure was due to the job exceeding the 10-minute timeout limit.

@davidzollo
Contributor

davidzollo commented Nov 14, 2025

When will the flushing=false state be reset?

Thanks for pointing this out. The flushing flag should be reset to false immediately after the flush operation completes. I'll include the fix for resetting the flushing state along with the new E2E test in the next commit.

FYI, I noticed the CI failure was due to the job exceeding the 10-minute timeout limit.

Good. You can also help fix the CI ^_^
Reviewers usually review a new PR once CI has passed.

By the way, I think we could talk in more depth to help you get familiar with SeaTunnel. Feel free to contact me on LinkedIn (David Zollo) or WeChat (taskflow). When adding me, please let me know your GitHub ID.

@Mrhs121
Author

Mrhs121 commented Nov 14, 2025

I have provided a pure test case, #10069, that reproduces #10059.

@github-actions github-actions bot added the e2e label Nov 15, 2025
    }
} catch (Exception e) {
    throw new RuntimeException(e);
} finally {
Author

@Mrhs121 Mrhs121 Nov 15, 2025

In Spark 2.4, the DataWriter interface does not extend Closeable, so when the test case runs on the Spark 2.4 engine and the job fails, the close() method of the DorisSinkWriter is never invoked. As a result, the threads inside the DorisSinkWriter stay alive and prevent the SeaTunnel job from terminating. That is why I release the resources here.
I'm not sure whether this is a good fix.
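
To illustrate the concern, here is a minimal sketch with hypothetical names (SketchWriter, runTask). It uses neither the Spark nor the SeaTunnel API; it only assumes a writer that owns a non-daemon background thread, which keeps the JVM alive until close() is called, and shows the writer being released in a finally block when a write fails.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class WriterCleanupSketch {
    /** Hypothetical sink writer that owns a background (non-daemon) thread. */
    static class SketchWriter {
        private final ExecutorService flusher = Executors.newSingleThreadExecutor();

        SketchWriter() {
            // Submitting a task starts the worker thread, which then idles forever.
            flusher.submit(() -> {});
        }

        void write(String row) {
            if (row == null) {
                throw new IllegalStateException("write failed");
            }
        }

        void close() {
            // Stops the worker thread so it no longer blocks JVM shutdown.
            flusher.shutdownNow();
        }

        boolean isClosed() {
            return flusher.isShutdown();
        }
    }

    /** Writes all rows and always releases the writer, even when a write throws. */
    static boolean runTask(SketchWriter writer, String... rows) {
        try {
            for (String row : rows) {
                writer.write(row);
            }
            return true;
        } catch (RuntimeException e) {
            return false;
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) {
        SketchWriter writer = new SketchWriter();
        boolean ok = runTask(writer, "a", null); // the null row makes the task fail
        System.out.println("ok=" + ok + ", closed=" + writer.isClosed());
    }
}
```

If the finally block is dropped, the failed task returns but the executor thread keeps running, which is the same symptom as the job that never terminates.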

Author

@davidzollo I made some new changes, please help review it when you have time, thank you.

Member

Calling close in abortPrepare is not allowed, and no other connector has such an implementation.
How about implementing Closeable for Spark 2.4?

@zhangshenghang
Member

I have provided a pure test case, #10069, that reproduces #10059.

We can merge #10069 into the current PR to verify that this issue no longer occurs.


public RespContent stopLoad() throws IOException {
    loading = false;
    flushing = true;
Member

Why add a separate "flushing" instead of just using "loading"?

Author

@Mrhs121 Mrhs121 Nov 18, 2025

FYI, you may want to read the root-cause description first: #10059 (comment).

As the following code shows, the error message from the HTTP response is only retrieved while the stream load is in the loading state, and loading is reset to false during flush. Therefore, if the flush runs before the HTTP response is returned, errorMessage is always null and an infinite loop occurs, preventing the SeaTunnel task from stopping.

public String getLoadFailedMsg() {
    if (!loading) {
        return null;
    }
    if (this.getPendingLoadFuture() != null && this.getPendingLoadFuture().isDone()) {
        String errorMessage;
        try {
            errorMessage = handlePreCommitResponse(pendingLoadFuture.get()).getMessage();
        } catch (Exception e) {
            errorMessage = ExceptionUtils.getMessage(e);
        }
        recordStream.setErrorMessageByStreamLoad(errorMessage);
        return errorMessage;
    } else {
        return null;
    }
}

Another way to fix it is to reset loading to false only after endInput, so that loading is considered finished only once the stream is truly closed.
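
The two orderings can be compared in a small, self-contained sketch. Everything in it is hypothetical: the class LoadingFlagSketch, the two stopLoad... methods, and the CompletableFuture<String> that stands in for the HTTP response. It only mirrors the loading guard from getLoadFailedMsg() and is not the actual DorisStreamLoad code or the change made in this PR.

```java
import java.util.concurrent.CompletableFuture;

class LoadingFlagSketch {
    private volatile boolean loading = true;
    // Stands in for the pending HTTP response; completes with the error message.
    private final CompletableFuture<String> pendingLoadFuture = new CompletableFuture<>();

    /** Same guard as above: nothing is reported once loading is false. */
    String getLoadFailedMsg() {
        if (!loading) {
            return null;
        }
        return pendingLoadFuture.isDone() ? pendingLoadFuture.join() : null;
    }

    /** Root-cause ordering: the flag is cleared before the response arrives. */
    String stopLoadResetFirst(String response) {
        loading = false;                      // reset during flush
        pendingLoadFuture.complete(response); // response comes back afterwards
        return getLoadFailedMsg();            // guard hides the error
    }

    /** Second plan: clear the flag only after the stream has really ended. */
    String stopLoadResetLast(String response) {
        pendingLoadFuture.complete(response); // stream closed, response is in
        String message = getLoadFailedMsg();  // still loading, so the error is visible
        loading = false;
        return message;
    }

    public static void main(String[] args) {
        System.out.println("reset first: "
                + new LoadingFlagSketch().stopLoadResetFirst("load failed"));
        System.out.println("reset last: "
                + new LoadingFlagSketch().stopLoadResetLast("load failed"));
    }
}
```

Run as is, this prints `reset first: null` and `reset last: load failed`. In the sketch a caller that polls getLoadFailedMsg() for an error would wait forever under the first ordering, which matches the infinite loop described above.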

Member

I approve of your second plan.

Author

I approve of your second plan.

Done

Author

I approve of your second plan.

@zhangshenghang I made some new changes, please help review it when you have time, thank you.

@github-actions github-actions bot added the Spark label Nov 20, 2025
@Mrhs121
Author

Mrhs121 commented Nov 21, 2025

@zhangshenghang I made some new changes, please help review it when you have time, thank you.

Development

Successfully merging this pull request may close these issues.

[Bug] [connector-doris] DorisStreamLoad loading state mismanagement causes RecordBuffer infinite loop during shutdown
