Well, I don't know if I'm missing something, but it's working for me using requests:
import requests
# Getting html code
url = "http://store.steampowered.com/app/251570/"
html = requests.get(url).text
And even more, the data requested is in json format, so it's easy to extract it in this way:
# Extracting javscript object (a json like object)
start_tag = 'InitAppTagModal( 251570,'
end_tag = '],'
startIndex = html.find(start_tag) + len(start_tag)
endIndex = html.find(end_tag, startIndex) + len(end_tag) - 1
raw_data = html[startIndex:endIndex]
# Load raw data as python json object
data = json.loads(raw_data)
You will see a beatiful json object like this (this is the info that you need, right?):
[
{
"count": 283,
"browseable": true,
"tagid": 1662,
"name": "Survival"
},
{
"count": 274,
"browseable": true,
"tagid": 1659,
"name": "Zombies"
},
{
"count": 248,
"browseable": true,
"tagid": 1702,
"name": "Crafting"
}......
I hope it helps....
UPDATED:
Ok, I see your problem right now, it seems that the problem is in the page 224600. In this case the webpage requires that you confirm your age before to show you the games info. Anyway, easy to solve it just posting the form that confirm the age. Here is the code updated (and I created a function):
def extract_info_games(page_id):
# Create session
session = requests.session()
# Get initial html
html = session.get("http://store.steampowered.com/app/%s/" % page_id).text
# Checking if I'm in the check age page (just checking if the check age form is in the html code)
if ('<form action="http://store.steampowered.com/agecheck/app/%s/"' % page_id) in html:
# I'm being redirected to check age page
# let's confirm my age with a POST:
post_data = {
'snr':'1_agecheck_agecheck__age-gate',
'ageDay':1,
'ageMonth':'January',
'ageYear':'1960'
}
html = session.post('http://store.steampowered.com/agecheck/app/%s/' % page_id, post_data).text
# Extracting javscript object (a json like object)
start_tag = 'InitAppTagModal( %s,' % page_id
end_tag = '],'
startIndex = html.find(start_tag) + len(start_tag)
endIndex = html.find(end_tag, startIndex) + len(end_tag) - 1
raw_data = html[startIndex:endIndex]
# Load raw data as python json object
data = json.loads(raw_data)
return data
And to use it:
extract_info_games(224600)
extract_info_games(251570)
Enjoy!