-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#8570 Increase HTTP protocol length limits #12094
base: trunk
Are you sure you want to change the base?
Conversation
please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for the changes.
I left more comments inline.
I think they are an improvement over what we have in trunk
it is very important to have automated test for these values.
Let me know if you need help with the autoamted tests.
@@ -23,7 +23,7 @@ see in particular `Git and GitHub learning resources <https://help.github.com/ar | |||
$ git clone https://github.com/twisted/twisted twisted | |||
$ python3 -m venv ./venv | |||
$ . venv/bin/activate | |||
$ pip install -e .[dev] | |||
$ pip install -e .[dev,test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. We can have this for now.
In an ideal case, dev will also install test.
But for another PR.
Free free to send a PR if you have time to help with this.
src/twisted/protocols/basic.py
Outdated
@@ -419,7 +422,7 @@ class LineOnlyReceiver(protocol.Protocol): | |||
|
|||
_buffer = b"" | |||
delimiter = b"\r\n" | |||
MAX_LENGTH = 16384 | |||
MAX_LENGTH = _MAX_LINE_LENGTH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this as a change to all LineOnlyReceiver
and LineReceiver
uses seems pretty excessive. The issue is with certain HTTP servers. Assuming the right fix is to make Twisted more liberal in what it accepts from HTTP servers, this doesn't mean all line-based protocols implemented with Twisted should have the same change applied.
I'd move this fix into the HTTP implementation and leave LineReceiver
/ LineOnlyReceiver
alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I have now limited the scope of the changes to HTTPClientParser
.
please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good. Many thanks!
I think that this is very close to being ready to merge.
I left a few minor comments, but that needs to be fixed before merge.
"""Test that HTTP responses with header lines up to 65536 bytes long | ||
are allowed.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this is a copy/paste error.
Maybe something like this?
"""Test that HTTP responses with header lines up to 65536 bytes long | |
are allowed.""" | |
""" | |
When the remote server returns an HTTP header that has a long line, will disconnected the connection. | |
""" |
Is disconnection everything that it does?
I was expecting to also see an error and the reason for disconnection.
Is it possible to also test the error message in this test?
In yes, it wouldhelp to update the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually originally going to ask about that. I was first checking by testing the resulting headers, which I will probably restore in combination with the disconnect check based on #12094 (comment), and was going to ask for a better way to check that things failed, as I don’t know. The relevant code simply calls “loseConnection” on the transport as far as I can tell.
prefix = b"a: " | ||
line = prefix + b"a" * (65536 - len(prefix) + 1) + b"\r\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this tests, does it matter the format of the header?
Maybe just write random long data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given https://github.com/twisted/twisted/pull/12094/files/3ade9422f18248993f2921a1439649bd39440c15#r1473704052, maybe it makes sense to keep this a valid header, so that checking that the parsed header list is empty has more value (i.e. it is not empty because it was not a valid header, it is empty due to the length limit).
Scope and purpose
Fixes #8570, fixes scrapy/scrapy#355, closes scrapy/scrapy#5911.
Increase the length limit affecting the HTTP client parser to support scenarios seen in the wild, such as https://www.vapestore.co.uk/ returning a header with a 28k+ char value.
It is an arbitrary increase mimicking CPython’s, though, so there is a chance that in the future some new scenario for which these new limits are not enough warrants either increasing the limits further or making it easier to increase the limits from user code, which is not straightforward at the moment without relying on private APIs.
I also made a minor change on the contributor docs, switching
pip install -e .[dev]
topip install -e .[dev,test]
based on my experience. Not sure if that’s OK to do here, or if that’s the right change (maybe you preferpip install -e .[test]
to be suggested on the following section about running tests).Regarding tests, there are no existing tests that check the specific values, which I think may make sense given these are trivially-selected limits. Also, there are already tests for the limits themselves (those tests use custom values, for performance reasons I imagine).Post-merge to-do