-
Notifications
You must be signed in to change notification settings - Fork 660
Baidu: signature authentication support #298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
changes for baidu geocoder & pytest For signature authentication, see http://lbsyun.baidu.com/index.php?title=lbscloud/api/appendix
I have done the pytest with my api_keys with/without security key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've left a couple of notes. Overall it looks good!
One thing I'm wondering about is the status
code changes: was the source code incorrect before? I mean, it used to be strings, but now they're integers. Was it a bug, or a change in the API?
geopy/geocoders/baidu.py
Outdated
@@ -78,6 +84,8 @@ def __init__( | |||
) | |||
self.api_key = api_key | |||
self.api = '%s://api.map.baidu.com/geocoder/v2/' % self.scheme | |||
self.api_path = self.api.split('baidu.com')[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe that would be better?
self.api_path = '/geocoder/v2/'
self.api = '%s://api.map.baidu.com%s' % (self.scheme, self.api_path)
geopy/geocoders/baidu.py
Outdated
raise GeocoderQueryError( | ||
'IP/SN/SCODE/REFERER Illegal:' | ||
) | ||
elif status == '2xx': | ||
elif status in range(200, 300): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Python2 range
is a list, which seems inefficient.
What about this?
elif 200 <= status <= 300:
test/geocoders/baidu.py
Outdated
@@ -27,6 +27,7 @@ class BaiduTestCase(GeocoderTestBase): | |||
def setUpClass(cls): | |||
cls.geocoder = Baidu( | |||
api_key=env['BAIDU_KEY'], | |||
security_key=env['BAIDU_SEC_KEY'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to create a separate test class instead, which does use the SK, so we could test both setups. I think that having two simple calls (1 geocode and 1 reverse) for this new class would be enough to be sure that SK works.
Also, if I understood correctly, that should be a different BAIDU_KEY
(for which the SK auth is enabled), shouldn't it? If so, it should have a different ENV name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. Are BAIDU_KEY_REQUIRES_SK and BAIDU_SEC_KEY ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that's fine.
""" | ||
|
||
# Since quoted_params is escaped by urlencode, don't apply quote twice | ||
raw = self.api_path + '?' + quoted_params + self.security_key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the docs at http://lbsyun.baidu.com/index.php?title=lbscloud/api/appendix , the quote(params, safe="/:=&?#+!$,;'@()*[]")
code is used for getting a raw string, while there we have urlencode(params)
. Wouldn't that cause different hashes for peculiar requests? I think the relevant test cases should be added. For example, a one with a comma: locator.geocode("beijing, china")
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The quoted_params
in self._url_with_signature
was given by urlencode
. urlencode
has already called quote_plus
on parameters internall
8000
y. Except parameters, quote
never escape the characters in site/path/key parts, because they are not special characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the list of safe
characters match the one used in urlencode
internally? For example, I see that comma is in that list, however urlencode uses percent encoding on commas, IIRC.
A test wouldn't harm in either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this patch code doesn't match the given sample code in docs. But it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sure works, but for some cases it might not work. Could you please add the test with a comma, and fix the code, if that fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to stick to docs. I'm going to change urlencode(params)
to urlencode(params, safe="!*'();:@&=+$,/?#[]", quote_via=quote)
. Will you accept this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the test first :)
But yeah, that would be better than crafting the query string manually. Even better if all of this is not required and a simple urlencode
works.
Comments for See 状态码定义 ( |
I'm not sure if I understood your reply regarding I've just checked, API returns ints. It means that the current implementation (I mean, without this patch) doesn't seem to respect the |
May the new Invalid AK test case help. It's hard to cover all failure status tests due to the poor Baidu docs. |
For |
test/geocoders/baidu.py
Outdated
else: | ||
assert_raise_msg = self.assertRaisesRegexp | ||
|
||
with assert_raise_msg(GeocoderAuthenticationFailure, 'Invalid AK'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way to do the same without tinkering around compatibility:
with self.assertRaises(GeocoderAuthenticationFailure) as cm:
...
self.assertEqual(str(cm.exception), 'Invalid AK')
To me it looks better than assertRaisesRegex
. What do you think?
Also I'm wondering: does the self.reverse_run
command ever run? I feel that self.geocode_run
causes an exception being thrown, and the self.reverse_run
call is never tested. To fix that, each call should be wrapped in their own assertRaises
/assertRaisesRegex
context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, you have a better solution.
self.reverse_run
cannot be reached, and I will remove this line afterwards.
""" | ||
|
||
# Since quoted_params is escaped by urlencode, don't apply quote twice | ||
raw = self.api_path + '?' + quoted_params + self.security_key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sure works, but for some cases it might not work. Could you please add the test with a comma, and fix the code, if that fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment this looks good to me, except the urlencode vs quote_plus problem, discussed in the thread #298 (comment)
A test with a comma in the query is absolutely required, because this is an edge case, dividing the two possible implementations. So in order to avoid any bugs for now and in the future, such test must be added.
Do you think the comma test is still a must? |
Sorry for the delay. Yes, a comma test is definitely a must. Do you need any help or a guidance with it? Please don't hesitate to ask. Basically all what's needed here is a simple forward geocoding request, which contains a comma in its query. If signature is wrong, an exception would be thrown and the test would fail, proving that there's a bug in the signature implementation. The cf9a30e...1009fe2 change is somewhat disturbing to me. I would better to not construct the query string manually by escaping individual parts and joining them with It's important to distinguish between input for the signature algorithm and resulting URL. The former requires the aforementioned |
I append 1009fe2 because cf9a30e failed in py2.7 & py 3.4. No safe parameter in py2.7; New quote_via parameter since 3.5. |
Would you mind if I finished this by myself? :) |
Never mind. It's my pleasure. |
@KostyaEsmukov Is new test appropriate? |
I'm wondering if at one point Baidu changes their search algorithm such as the Actually the reason why I wanted to take this over is that there clearly is a confusion between composing a raw body for the signature and urlencoding the params for url. As I said in my previous comment #298 (comment) , urlencode should be used for the latter, instead of manual string construction. But for the raw body, apparently, the string must be constructed manually (unless, as said before, their backend uses urlencoded data for signature verification, where this It's just easier for me to finish this by myself than trying to explain this in text. Sorry for that. You've done a great job, a little is left. Is that okay if I finish this by myself? 😉Of course you'd still be credited as a contributor and would get a green box. 😀 |
It's ok you take it over and go for final shot. :-) |
changes for baidu geocoder & pytest
For signature authentication, see
http://lbsyun.baidu.com/index.php?title=lbscloud/api/appendix