requests.get() not retrieving correct url in python 2.7


I'm trying to access a URL and then parse its contents based on tags. My code:

import requests
from lxml import html

page = requests.get('')
self.tree = html.fromstring(page.content)
names = self.tree.xpath("//span[@class='truncate_name']//text()")

Problem: the variable page contains the data of a different URL, ''. I'm new to Python 2.7 and to the whole issue of file encodings. I'm using unicode-escape as my default encoding. The encoding of the resource at '' is utf-8, whereas the encoding of the resource at '' is variable. Does this have something to do with the problem? Please suggest a solution.

| osx   | python-2.7   | python-requests   2016-09-29 05:09 1 Answers

Answers to requests.get() not retrieving correct url in python 2.7 ( 1 )

  1. 2016-09-29 10:09

    It has nothing to do with encoding. What you are looking for is created dynamically, so it is not in the source you get back; a series of Ajax calls populates the data. To get the product names etc. from the carousel where you see span.truncate_name in your browser, request the JSON directly:

    import requests

    params = {"page": "products",
              "locale": "en_US",
              "doctype": "DOWNLOADS"}
    # .content is the raw response body as a byte string
    js = requests.get("", params=params).content

    Normally we could call .json() on the response object, but in this case we need to decode the body with "unicode_escape" first and then call loads:

    from json import loads
    # decode the backslash escapes in the raw body, then parse the JSON
    js2 = loads(js.decode("unicode_escape"))

    Which gives you a huge dict of data like:

    {u'products': [{u'name': u'Servers and Enterprise', u'urlpath': u'serversandenterprise', u'order': u'', u'products': .............
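
Why a plain JSON parse can fail while the unicode_escape route works can be illustrated offline. The sketch below uses a made-up payload, not the real response, containing a \x26 escape: valid as a Python-style escape but not as a JSON escape. The helper name parse_escaped_json is hypothetical, and whether the real response fails for this exact reason is an assumption.

```python
import json

# Made-up payload for illustration only. "\x26" is a Python-style escape
# for "&"; JSON allows \uXXXX escapes but not \xXX.
raw = b'{"products": [{"name": "Servers \\x26 Enterprise"}]}'


def parse_escaped_json(raw_bytes):
    """Try a normal JSON parse first; fall back to unicode_escape decoding."""
    try:
        return json.loads(raw_bytes.decode("utf-8"))
    except ValueError:
        # Turn backslash escapes such as \x26 into real characters, then parse.
        return json.loads(raw_bytes.decode("unicode_escape"))


data = parse_escaped_json(raw)
print(data["products"][0]["name"])
```

Note that unicode_escape treats the bytes as Latin-1 and also unescapes sequences such as \" and \n, so it can corrupt bodies that contain non-ASCII text or escaped quotes. It is a workaround for this kind of response rather than a general replacement for .json().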

    You can see the request in Chrome's developer tools.
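
To compare your own request with the one the browser sends, it can help to print the URL that a params dict expands to. This is a minimal sketch using only the standard library; BASE_URL is a placeholder (the real URL does not appear in this post), build_url is a hypothetical helper, and the three keys are the ones shown in the answer. requests builds its query string in a similar way, though the parameter order may differ.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7

# Placeholder base URL; substitute the endpoint seen in the browser.
BASE_URL = "https://example.invalid/search"

params = {"page": "products",
          "locale": "en_US",
          "doctype": "DOWNLOADS"}


def build_url(base_url, query_params):
    """Append the URL-encoded parameters, sorted by key for stable output."""
    return base_url + "?" + urlencode(sorted(query_params.items()))


print(build_url(BASE_URL, params))
```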

    We leave off callback:ACDownloadSearch.customCallBack as we want to get back valid JSON.
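
If the callback parameter were sent, the server would presumably wrap the JSON in a function call (JSONP), which json.loads rejects. Below is a minimal sketch of unwrapping such a body, assuming the usual name(...) wrapping. The sample strings are invented and parse_json_or_jsonp is a hypothetical helper, not part of requests.

```python
import json
import re

# Matches "some.callback( <payload> )" with an optional trailing semicolon.
_JSONP = re.compile(r'^\s*[\w.$]+\s*\((.*)\)\s*;?\s*$', re.DOTALL)


def parse_json_or_jsonp(text):
    """Parse plain JSON, or strip a JSONP wrapper first if one is present."""
    match = _JSONP.match(text)
    payload = match.group(1) if match else text
    return json.loads(payload)


wrapped = 'ACDownloadSearch.customCallBack({"products": []});'
print(parse_json_or_jsonp(wrapped))
```

Leaving the callback parameter off, as the answer does, avoids the need for this step entirely.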
