Percent encoding `|` in paths · Issue #3565 · encode/httpx · GitHub
Percent encoding | in paths #3565
Open
@nathaniel-daniel

I got no response, so I'm opening this issue for more visibility.

OS: Windows 11
python --version: Python 3.12.8
httpx version: 0.28.1

I believe | should be percent-encoded in paths, which httpx does not currently do. If I'm reading RFC 3986 correctly, each path character is a pchar, which is one of: unreserved, pct-encoded, sub-delims, ":", or "@". unreserved is ALPHA, DIGIT, "-", ".", "_", or "~". pct-encoded is a percent-encoding sequence ("%" followed by two hex digits). sub-delims is "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", or "=". The | character appears nowhere in this set, so it has to be percent-encoded.
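To make the grammar above concrete, here is a minimal sketch that builds the set of characters a path segment may contain literally and checks a few characters against it. The names PCHAR_LITERALS and needs_percent_encoding are mine, purely for illustration; they are not part of httpx or any other library.

```python
import string

# RFC 3986 character classes, transcribed from the grammar described above.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")

# pchar also allows pct-encoded triplets ("%" plus two hex digits). Those are
# sequences rather than single characters, so they are not part of this set.
PCHAR_LITERALS = UNRESERVED | SUB_DELIMS | set(":@")


def needs_percent_encoding(ch: str) -> bool:
    """Return True if `ch` may not appear literally in a path segment."""
    return ch not in PCHAR_LITERALS


print(needs_percent_encoding("|"))  # True
print(needs_percent_encoding(" "))  # True
print(needs_percent_encoding("~"))  # False
```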

To simplify my problem: httpx appears to process URLs with its internal urlparse function, so here is an example that calls it directly. This function normally percent-encodes characters as needed, such as spaces:

httpx._urlparse.urlparse('http://example.com/ ')

will return

ParseResult(scheme='http', userinfo='', host='example.com', port=None, path='/%20', query=None, fragment=None)

However, this does not happen for |:

httpx._urlparse.urlparse('http://example.com/|')

will return

ParseResult(scheme='http', userinfo='', host='example.com', port=None, path='/|', query=None, fragment=None)

In Firefox and Google Chrome, encodeURI percent-encodes |:

encodeURI('http://example.com/|') 

will return

"http://example.com/%7C"

In the requests library, | is also percent-encoded:

requests.utils.requote_uri('http://example.com/|')

will return

'http://example.com/%7C'

The rfc3986 library also percent-encodes |:

rfc3986.urlparse('http://example.com/|')

will return

ParseResult(scheme='http', userinfo=None, host='example.com', port=None, path='/%7C', query=None, fragment=None)

Using urllib itself, | also seems to be percent-encoded for path components:

urllib.parse.quote('/|')

will return

'/%7C'
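As a possible stopgap, the path can be encoded with urllib before it is handed to a client. This is only a sketch: the helper name encode_path and the PCHAR_SAFE constant are hypothetical, and the safe set is my transcription of the RFC 3986 pchar rule, not something taken from httpx.

```python
from urllib.parse import quote

# Characters left untouched: "/" separates segments, "%" keeps existing
# escapes intact, and the rest are the RFC 3986 sub-delims plus ":" and "@".
# urllib.parse.quote never quotes letters, digits, "-", ".", "_" or "~".
PCHAR_SAFE = "/%!$&'()*+,;=:@"


def encode_path(path: str) -> str:
    """Percent-encode every character that is not allowed in a path."""
    return quote(path, safe=PCHAR_SAFE)


print(encode_path("/|"))      # /%7C
print(encode_path("/a b|c"))  # /a%20b%7Cc
print(encode_path("/%7C"))    # /%7C (already encoded, left as-is)
```

Keeping "%" in the safe set avoids double-encoding, at the cost of also passing through a stray "%" that is not part of a valid escape.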

I'm fairly certain I've interpreted the RFC correctly, and I think | should be removed from httpx's PATH_SAFE set. Its current value is: "!$%&'()*+,-./0123456789:;=@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_abcdefghijklmnopqrstuvwxyz|~".
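For reference, the quoted PATH_SAFE value can be compared mechanically against the RFC 3986 pchar set. The sketch below assumes the value above is a Python repr (so "\\" stands for a single backslash) and treats "/" and "%" as allowed, since they act as the segment separator and the escape introducer. The variable names are illustrative only.

```python
import string

# PATH_SAFE as quoted in this issue (assumed to be a repr, so "\\" is one backslash).
PATH_SAFE = (
    "!$%&'()*+,-./0123456789:;=@ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "[\\]^_abcdefghijklmnopqrstuvwxyz|~"
)

# RFC 3986 pchar literals: unreserved + sub-delims + ":" + "@".
pchar = set(string.ascii_letters + string.digits + "-._~" + "!$&'()*+,;=" + ":@")

# "/" separates path segments and "%" introduces a pct-encoded triplet.
allowed = pchar | {"/", "%"}

# Characters that PATH_SAFE leaves unencoded but pchar does not permit.
extras = sorted(set(PATH_SAFE) - allowed)
print(extras)  # ['[', '\\', ']', '^', '|']
```

This issue is only about |; the other characters in the result are shown for completeness.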

Potential Fix: nathaniel-daniel@a2f327f
