Issue parsing base64/png · Issue #7 · goose3/goose3 · GitHub

Issue parsing base64/png #7


Closed
blakdronzer opened this issue Jul 13, 2017 · 4 comments

Comments

@blakdronzer
blakdronzer commented Jul 13, 2017

I was trying to extract the following URL:
https://www.missmalini.com/2017/07/13/tried-tested-dip-powder-nails/

Extraction fails because the page has a background image declared as background:url(data:image/png;base64 .... )

article = g.extract(url=url)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goose3/__init__.py", line 75, in extract
    return self.crawl(cc)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goose3/__init__.py", line 91, in crawl
    raise e
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goose3/__init__.py", line 85, in crawl
    article = crawler.crawl(crawl_candidate)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goose3/crawler.py", line 120, in crawl
    return self.process(raw_html, parse_candidate.url, parse_candidate.link_hash)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goose3/crawler.py", line 189, in process
    self.get_image()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goose3/crawler.py", line 211, in get_image
    self.article.top_image = self.image_extractor.get_best_image(doc, top_node)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goose3/extractors/images.py", line 82, in get_best_image
    image = self.check_large_images(topNode, 0, 0)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goose3/extractors/images.py", line 116, in check_large_images
    good_images = self.get_image_candidates(node)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goose3/extractors/images.py", line 277, in get_image_candidates
    good_images = self.get_images_bytesize_match(filtered_images)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goose3/extractors/images.py", line 293, in get_images_bytesize_match
    local_image = self.get_local_image(src)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goose3/extractors/images.py", line 337, in get_local_image
    return ImageUtils.store_image(self.fetcher, self.link_hash, src, self.config)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goose3/utils/images.py", line 64, in store_image
    data = http_client.fetch(src)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goose3/network.py", line 52, in fetch
    response = self._connection.get(url, timeout=self.config.http_timeout, headers=self.headers)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 515, in get
    return self.request('GET', url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 502, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 606, in send
    adapter = self.get_adapter(url=request.url)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 697, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAB0AAAAaCAYAA
ABLlle3AAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAABSBJREFUeNrkVltPY1UY/VpaaMu1kFAiw6Wl3Nva0mHoTEAGhsgwCSTERIwTJ4Y3n3zwB/ADzPg
8Og9GzBhJxmBICI4iNShyKVIuQoGWW4FyGSiFMlCgBdfetgQYCzzpgydpetpzzl7fWutb3z6Ck5MT+rcPIf0Hh+jsj4qKytPzo6MjEolEdHx8TEKhkH+gSrzf7087ODi4he97Y
rHYGBUVlS4URjQKhYJvLy4eUtFs7goPevEQCATsS40Cbh0eHhpRgFYmkxkzM5WJhYUFpNEUUmdnJw0NDX8MiGxcjzz7rEQiQd2iafz88ipQGSrMB5Myn2+/FAw1cnmiKiMjQ6z
Xv0l6vZ6USiUpFClgL6BXr/ZoeHjEdPNmsQmsg8oIKBA4oZkZB21ubvZeCorqPgWr2wAtUKmUcq1WS7m5uWCkodTUVIqOjn6twry8XPwfQ48efUA6nY5LygTy+Q6pqamJ5ufn9
y71dH9//yOTyRTd0PAuxcbGUFSUhNRq9aVNkZ+fT3FxcTQ2NkbFxcVnCAhDvp5c2r3w7afkZAUBGLK46cmTz2lxcTEsIJOSsddqNTQ4+AckDZxeY+fh4ngOFKa3Dw9baWNjA93
rp6SkJBoYsLBiTu/xer00OjpGLS0t9PjxZ+RwzFBZWSn8m0GByxzoquyfkzcyMrLf5XJt2e0OeWZmBuQyQnIfLS8v09zcHAfY2dkhmSya2HVmQ0pKCkmlEoqIiKCnT7/gdshkU
qqpqeH/XQmK3C14PB7r+Ph4pU73HnV1ddHCwiL5/YeUmJjIG4XFRCKRopAlmp2dQ1yGqLy8nNDd5PFsgXUZW4cDnpX7ssjsQOJ+q3Wosrr6bf7w3bvllJOTzc8nJmz04sWPtLq
6yoeHXJ6A7s2n+Ph4MhiKkFl2bYUSEhJwTR7K+eWgzAssPgB5fVhUUlVVRf39/dTc/DVYeNA0MsrOzqaiIgPFxMTwj9PppOnpaW5Fc/NXAP6Z7ty5Tenp6dcDDUo8CgAnJM5h/
jF2BoMebHN4Ubu7Xt4wvb19tL6+zp8pL3+LsrJUyPIbGBrJuN/A5Q3XUK+B4mYnvBixWCw5tbW1nBmbLMxfl2sVjbXHu1qtzqIHD2q4lCxWLJdGo5FaW1uR8Viqq6vjk+lKUCY
HqvNjnP06NTX9DpgIzeZfuE8MnDG6cSOVe2i326m7uxuN5uQN8/Dh+3xEdnT8wHMeGonX8jQYnd9crpVteCW/f78a3evnBc3OzlJbWxutra3xRVUqFWfLimAdzdhjyPNma2z88
HQqXSlv8JjE9vXn+PhEmVaro+fPv+P5TE5OxtjLo/r6egwSMTxdQ3Rc1NPzO98GWW41mgKy2Wz8+rU9Dcq8D2J9o6OjZSw6paWlPDa8mslJ+GvmA4NNqpQUBaRXg3UWbFgDkIB
WVlZoe3v7esPhwkjscTjsnwQCx8K4uFh69uwbHn62o6SlpVNlZQWX1evdAbNJam9vR3TsPL/MW5TOs/1PbMOCospBt3trYWrKpmQDgA11hUKBjoygpaVFvA2YyWod5uPR7XbzY
WAylVBJSQkviv3/8uU68z7q2qDYvFdR5QD8UjLpWCxaW78n5Jc10iqkDYCJgHnJdhqxWEQjIyNksQzyxvt7BJ5Apbj1a3Vv0NcAJo65r6+/AQvtYWDYsJCVxQlgNqlU6sf9gtA
zXu/uaZeHJhHui8Cp58qcXmA7BEbtADMDpAOST7FiQnvlhSL5y1u40XcO53/z3vuXAAMAGHpywSLn9WUAAAAASUVORK5CYII='

How can we address this?
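
For context, the traceback ends in requests raising InvalidSchema because the image source is a data: URI, which has no HTTP connection adapter. One possible guard is to skip any source that is not a plain http(s) URL before attempting a download. This is a minimal sketch only: `is_fetchable` is a hypothetical helper, not part of goose3, and where such a check would belong in the image extractor is an assumption.

```python
from urllib.parse import urlparse


def is_fetchable(src):
    """Return True only for sources an HTTP client can download.

    Hypothetical helper for illustration; not a goose3 function.
    """
    # urlparse lowercases the scheme, so "DATA:" and "data:" compare equal.
    return urlparse(src.strip()).scheme in ("http", "https")


# Illustrative candidate list; the second entry mimics the failing src.
candidates = [
    "https://example.com/photo.jpg",
    "data:image/png;base64, iVBORw0KGgo=",
]
fetchable = [src for src in candidates if is_fetchable(src)]
```

Note that this check also rejects scheme-relative sources such as `//cdn.example.com/a.png`, so those would need to be resolved against the page URL first.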

@lababidi
Contributor

Apologies for the slow response. Have you found how to tackle the issue?

@lababidi
Contributor
lababidi commented Oct 3, 2017

@blakdronzer Try the approach in the following link. Feel free to submit a PR with any attempts to get this working.

https://stackoverflow.com/questions/33048636/saving-image-from-url-using-python-requests-url-type-error
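
As a rough illustration of handling such a source without an HTTP request, a data: URI can be decoded in place using only the standard library. This is a sketch under assumptions: `decode_data_uri` is a hypothetical name, it is not goose3 code, and it is not necessarily the approach described in the linked answer.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(src):
    """Return (mime_type, raw_bytes) for a data: URI, or None otherwise.

    Hypothetical helper for illustration; not a goose3 function.
    """
    if not src.lower().startswith("data:"):
        return None
    header, sep, payload = src[len("data:"):].partition(",")
    if not sep:
        return None  # malformed: no comma between header and payload
    parts = header.split(";")
    mime_type = parts[0] or "text/plain"
    if "base64" in (p.strip().lower() for p in parts[1:]):
        # The URI in the traceback has a space after the comma,
        # so drop all whitespace before decoding.
        payload = "".join(payload.split())
        return mime_type, base64.b64decode(payload)
    # Non-base64 payloads are percent-encoded.
    return mime_type, unquote_to_bytes(payload)


# "iVBORw0KGgo=" is the base64 form of the 8-byte PNG file signature.
result = decode_data_uri("data:image/png;base64, iVBORw0KGgo=")
```

A caller could try this first and fall back to the normal HTTP fetch when it returns None.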

@barrust
Collaborator
barrust commented Nov 22, 2017

@blakdronzer I started looking into this and was unable to reproduce it. Can you confirm that it is still happening?

@barrust
Collaborator
barrust commented Dec 5, 2017

New example of the issue: https://varvy.com/pagespeed/base64-images.html
