-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
fs: support io_uring with tokio::fs::read
#7696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
f4c97c2 to
7e73f17
Compare
tokio/src/fs/read.rs
Outdated
| let mut offset = 0; | ||
|
|
||
| while size_read < size { | ||
| let left_to_read = (size - size_read) as u32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let left_to_read = (size - size_read) as u32; | |
| let left_to_read = u32::try_from(size - size_read).unwrap_or(u32::MAX); |
To properly support files bigger than 4GB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max read size at a time is u32::MAX, we read the rest in other next iterations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, if we know we're reading more than u32::MAX then we can batch 2 read requests to avoid extra syscalls
6237a4c to
636cfb8
Compare
ADD-SP
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks complicated, I will review it incrementally.
6e0abae to
8120700
Compare
mox692
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't checked all of the details in read_uring.rs, but left some comments I've noticed so far.
e6c6ce7 to
b9c3885
Compare
b29ed97 to
2acee44
Compare
6261e0e to
974be76
Compare
tokio/tests/fs_uring_read.rs
Outdated
| std::thread::spawn(move || { | ||
| let rt: Runtime = rx.recv().unwrap(); | ||
| rt.shutdown_timeout(Duration::from_millis(300)); | ||
| done_tx.send(()).unwrap(); | ||
| }); | ||
|
|
||
| tx.send(rt).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is equivalent to just moving the runtime.
| std::thread::spawn(move || { | |
| let rt: Runtime = rx.recv().unwrap(); | |
| rt.shutdown_timeout(Duration::from_millis(300)); | |
| done_tx.send(()).unwrap(); | |
| }); | |
| tx.send(rt).unwrap(); | |
| std::thread::spawn(move || { | |
| rt.shutdown_timeout(Duration::from_millis(300)); | |
| done_tx.send(()).unwrap(); | |
| }); | |
If you wanted an actual sleep for the loop to make some progress, then what you have now does not achieve that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Darksonn Should I sleep in the std::thread so the loop can make progress? Is that allowed or I shouldn't do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Poll the
read(path).awaitmanually to submit it to the kernel. - Then increase the semaphore.
- Keep the spawn task pending for ever.
So that we can wait on the semaphore and then shutdown the runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this work for multi-thread runtime? For current thread runtime, this won't work. But I believe we can apply the similar pattern to the current thread runtime.
rt.spawn(async move {
let path = path[0].clone();
let mut futs = vec![];
// spawning a bunch of uring operations.
for _ in 0..N {
let path = path.clone();
let cl = Arc::clone(&cl);
let fut = Box::pin(read(path));
assert_pending!(fut.poll(cx));
futs.push(fut);
}
pending_forever().await;
});
std::thread::spawn(move || {
rt.shutdown_timeout(Duration::from_millis(300));
done_tx.send(()).unwrap();
});
done_rx.recv().unwrap();902191e to
52dcdca
Compare
b599115 to
44f8f98
Compare
1475f00 to
b6cfb9b
Compare
31431a2 to
4e69595
Compare
67ac09f to
b7ccee7
Compare
mox692
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me.
b7ccee7 to
ca00f84
Compare
| Ok(size_read) => { | ||
| *offset += size_read as u64; | ||
| read_len -= size_read; | ||
|
|
||
| if read_len == 0 || size_read == 0 { | ||
| return Ok((size_read, r_fd, r_buf)); | ||
| } | ||
|
|
||
| buf = r_buf; | ||
| fd = r_fd; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this makes sense.
| Ok(size_read) => { | |
| *offset += size_read as u64; | |
| read_len -= size_read; | |
| if read_len == 0 || size_read == 0 { | |
| return Ok((size_read, r_fd, r_buf)); | |
| } | |
| buf = r_buf; | |
| fd = r_fd; | |
| } | |
| Ok(size_read) => return Ok((size_read, r_fd, r_buf)), |
The current logic means that unless the buffer fits the file exactly, we perform the final read of length zero twice:
- We call
op_read, which performs a partial read and then a length-zero EOF read. It returns a non-zero length to the caller. - Since the caller got a non-zero length, it now needs to call
op_readagain to check for EOF.
Checking for EOF twice does not make sense.
d219de6 to
a90b5c7
Compare
tokio/tests/fs_uring_read.rs
Outdated
| // If io_uring is enabled (and not falling back to the thread pool), | ||
| // the first poll should return Pending. | ||
| assert_pending!(fut.as_mut().poll(cx)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the fut.as_mut().poll() registers the waker, this assertion will be polled twice. Consider introduce noop waker here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced the contex of poll_fn with a noop waker @ADD-SP
Reading 64 MB chunks at a time and keeping the kernel busy surpases
std::fs::read time with unoptimized io_uring one being 1.12% fast
Fix tests fail on allocation failure
Doc fix and fix double subtraction of offset
Fix typos and use assert the stopped task's first poll
Add comments and panic on failed u32 conversion
Check the EOF internally in the `op_read` function
Fix test with noop waker
e1909d0 to
dfe83fe
Compare
Motivation
We slowly use io uring everywhere here I made the simple change of supporting
fs::readhowever it might not be as simple. Let me know if the unsafe I used is correct or not.We currently use the blocking
std::fs::MetaDatato obtain file size for buffer capacity and extend the length of the vector according to the bytes read in the CQE. This implementation sounds good on paper to me.Later we should implement an internal statx helper, in this PR or a seperate PR to make our uring implementation less painful to use. As this pr failed #7616
Lets put statx helper in different PR to avoid merging an inefficient read implementation given io uring is about being more efficient in file IO
Solution
Continue adopting io uring
strace on a
tokio::fs::readafter this change