-
Notifications
You must be signed in to change notification settings - Fork 356
Fix DecompressResponse middleware for multiple encodings and keep updated content-length header #809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix DecompressResponse middleware for multiple encodings and keep updated content-length header #809
Conversation
If multiple Content-Encodings were applied restoring the original message requires undoing all steps in reverse order. Thus we cannot continue when encountering an unsupported codec. Just skipping over one step will most likely just lead to decoding errors in the next supported step or incorrect results otherwise.
HEAD requests can be used to check the size of remote content to decide ahead of time whether it is worth fetching. Of course the size after decompression likely differs from the transfer size indicated in the content-length header, but depending on use case only the transfer size might be relevant. This obsoletes the empty-body special case in decompress_body previously added in 5bc9b82 since HEAD requests are now handled earlier. If we get an invalid empty body in a non-HEAD request we want to fail loudly.
Depending on context presence of this header is mandatory or at least strongly encouraged in HTTP/1.0 and HTTP/1.1 and some later processing steps might rely on or profit from its presence
fa0c87b to
7f4a78b
Compare
|
Updated the patch to explicitly check for But now I’m not sure what to do with the existing empty-body special case in decompression added in 5bc9b82. Should we keep or drop it? It was motivated by |
|
If it's a HEAD request, it's caught earlier. If it's not a HEAD request with empty body + content-encoding, it should fail loudly. Unless I misunderstood https://datatracker.ietf.org/doc/html/rfc9110#name-head Should we merge it now 🤔 |
|
btw, when in doubt, keep adding tests 😄 |
7f4a78b to
3fda9ab
Compare
Alright, updated the patch to remove the pre-existing empty-body special case then as it’s no longer needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 💜 Thank you so much for helping!
…dated content-length header (#809) * fix: stop decompressing response on first unknown codec If multiple Content-Encodings were applied restoring the original message requires undoing all steps in reverse order. Thus we cannot continue when encountering an unsupported codec. Just skipping over one step will most likely just lead to decoding errors in the next supported step or incorrect results otherwise. * fix: keep original content length and encoding headers for HEAD requests HEAD requests can be used to check the size of remote content to decide ahead of time whether it is worth fetching. Of course the size after decompression likely differs from the transfer size indicated in the content-length header, but depending on use case only the transfer size might be relevant. This obsoletes the empty-body special case in decompress_body previously added in 5bc9b82 since HEAD requests are now handled earlier. If we get an invalid empty body in a non-HEAD request we want to fail loudly. * fix: update existing content-length header after decompression Depending on context presence of this header is mandatory or at least strongly encouraged in HTTP/1.0 and HTTP/1.1 and some later processing steps might rely on or profit from its presence Signed-off-by: Yordis Prieto <[email protected]>
This addresses two issues:
First, until now if multiple encodings have been applied to the response body, the middleware would just skip over the unsupported algorithm and try to continue with the next. This likely crashes because the body now has the wrong data format for the next decompressor and is incorrect either way.
Furthermore, users may wish to use
HEADrequests to check how much data would be transferred for aGETto decide whether to (proactively) load a resource or not. But the middleware used to just delete anycontent-lengthheaders. Additionally,content-lengthheaders are mandatory in HTTP1.0 and unless doing chunked transfers HTTP1.1, thus its absence might confuse later processing steps even for other requests types. The second commit changes the middleware to instead retain or update thecontent-lengthheader.For the second part, there are two points worth pointing out:
content-lengthheader, this will now add one. I think this should be fine and it lead to simpler code, but if you think this should be changed let me knowHEADrequest and thus (effectively) also keep the originalcontent-encodingheader. Whether or not the content is compressed might also be of interest when doingHEADrequests (though i guess you could also argue one should omit the middleware entirely in this case, but then one needs to manually keep theaccept-encodingheader in sync with what the middleware supports)