Missing source page information using urllib2

Question 1

Well, I don't know if I'm missing something, but it's working for me using requests:

import json
import requests

# Getting html code
url = "http://store.steampowered.com/app/251570/"
html = requests.get(url).text

Better still, the requested data is embedded in the page as JSON, so it is easy to extract like this:

# Extract the JavaScript object (a JSON-like array literal)
start_tag = 'InitAppTagModal( 251570,'
end_tag = '],'
startIndex = html.find(start_tag) + len(start_tag)
endIndex = html.find(end_tag, startIndex) + len(end_tag) - 1
raw_data = html[startIndex:endIndex]

# Parse the raw data into Python objects (a list of dicts)
data = json.loads(raw_data)

You will get a nice JSON list like this (this is the information you need, right?):

[
  {
    "count": 283,
    "browseable": true,
    "tagid": 1662,
    "name": "Survival"
  },
  {
    "count": 274,
    "browseable": true,
    "tagid": 1659,
    "name": "Zombies"
  },
  {
    "count": 248,
    "browseable": true,
    "tagid": 1702,
    "name": "Crafting"
  }......
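
Once parsed, the result is an ordinary Python list of dicts, so the usual list tools apply. A minimal sketch, assuming data holds only the three entries shown above (the real list is longer), that orders the tags by their count value and keeps the names:

```python
# Sample standing in for the parsed list shown above (truncated to three entries)
data = [
    {"count": 283, "browseable": True, "tagid": 1662, "name": "Survival"},
    {"count": 274, "browseable": True, "tagid": 1659, "name": "Zombies"},
    {"count": 248, "browseable": True, "tagid": 1702, "name": "Crafting"},
]

# Sort by "count", highest first, then keep only the names
tags_by_count = sorted(data, key=lambda tag: tag["count"], reverse=True)
tag_names = [tag["name"] for tag in tags_by_count]

print(tag_names)  # ['Survival', 'Zombies', 'Crafting']
```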

I hope it helps....
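
One caveat about the extraction above: using '],' as the end marker assumes that sequence never appears inside the array, and html.find() quietly returns -1 when the start marker is missing. The following is only a sketch of a more defensive variant, not something the page requires. It lets json.JSONDecoder.raw_decode work out where the array ends. The function name extract_tag_list and the sample HTML string are made up for illustration, and the real page markup may differ.

```python
import json


def extract_tag_list(html, page_id):
    """Return the list passed as the second argument to InitAppTagModal(...)."""
    start_tag = 'InitAppTagModal( %s,' % page_id
    start = html.find(start_tag)
    if start == -1:
        # Marker not found (for example, an age-check page was returned instead)
        raise ValueError("InitAppTagModal marker not found for page %s" % page_id)

    # raw_decode does not skip leading whitespace, so jump to the opening bracket
    array_start = html.index('[', start + len(start_tag))

    # raw_decode parses one JSON value and ignores whatever follows it
    data, _end = json.JSONDecoder().raw_decode(html, array_start)
    return data


# Hypothetical page fragment, used here instead of a live request
sample_html = (
    '<script>InitAppTagModal( 251570,\n'
    '\t[{"tagid":1662,"name":"Survival","count":283,"browseable":true}],\n'
    '\t[],\n'
    '\t"other arguments");</script>'
)
print(extract_tag_list(sample_html, 251570)[0]["name"])  # Survival
```

Raising ValueError when the marker is absent makes a failed fetch obvious instead of passing a meaningless slice to json.loads.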

UPDATED:

OK, I see your problem now: it is specific to page 224600. For that app, the store asks you to confirm your age before it shows the game's info. It is easy to solve by posting the form that confirms the age. Here is the updated code (wrapped in a function):

import json
import requests

def extract_info_games(page_id):
    # Create a session (it keeps cookies across requests)
    session = requests.session()

    # Get the initial HTML
    html = session.get("http://store.steampowered.com/app/%s/" % page_id).text

    # Check whether we landed on the age-check page (the age-check form is in the HTML)
    if ('<form action="http://store.steampowered.com/agecheck/app/%s/"' % page_id) in html:
        # We were redirected to the age-check page,
        # so confirm the age with a POST:
        post_data = {
            'snr': '1_agecheck_agecheck__age-gate',
            'ageDay': 1,
            'ageMonth': 'January',
            'ageYear': '1960'
        }
        html = session.post('http://store.steampowered.com/agecheck/app/%s/' % page_id, data=post_data).text

    # Extract the JavaScript object (a JSON-like array literal)
    start_tag = 'InitAppTagModal( %s,' % page_id
    end_tag = '],'
    startIndex = html.find(start_tag) + len(start_tag)
    endIndex = html.find(end_tag, startIndex) + len(end_tag) - 1
    raw_data = html[startIndex:endIndex]

    # Parse the raw data into Python objects (a list of dicts)
    data = json.loads(raw_data)
    return data

And to use it:

extract_info_games(224600)
extract_info_games(251570)

Enjoy!

Question 2

With urllib2 and read(), read repeatedly in chunks until you hit EOF (read() returns an empty string) to make sure you get the entire HTML source.

import urllib2
url = "http://store.steampowered.com/app/224600/"  # 7 Days to Die page
url_handle = urllib2.urlopen(url)
data = ""
while True:
    chunk = url_handle.read()
    if not chunk:
        break
    data += chunk
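
The loop above calls read() with no size argument. Passing an explicit size makes the chunking visible and keeps each read bounded. Here is a small sketch of that pattern. The helper name read_all and the 8192-byte chunk size are arbitrary choices, and io.BytesIO stands in for the object returned by urllib2.urlopen() so the example runs without a network connection.

```python
import io


def read_all(handle, chunk_size=8192):
    """Read a file-like object to EOF in fixed-size chunks."""
    chunks = []
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:
            # An empty result means EOF
            break
        chunks.append(chunk)
    # Joining once at the end avoids repeated string concatenation
    return b"".join(chunks)


# io.BytesIO is a stand-in for the handle returned by urlopen()
fake_handle = io.BytesIO(b"<html>" + b"x" * 20000 + b"</html>")
data = read_all(fake_handle)
print(len(data))  # 20013
```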

An alternative is to use the requests module together with BeautifulSoup:

import requests
from bs4 import BeautifulSoup

r = requests.get('http://store.steampowered.com/app/251570/')
soup = BeautifulSoup(r.text, 'html.parser')