Question
I am having a problem accessing the next page in each subcategory, and I need to soup the information page by page. With my code I am able to soup only the first page of each subcategory. Can anyone tell me how to access the next pages in the subcategories? Thank you in advance.
import re
import urllib
import urllib2
import time
import sys
from datetime import datetime, date
from BeautifulSoup import BeautifulSoup

# lists
categories = []
details = []
tools = []
pages_details = []
division = []
links = []
link = []
subcategory = []
info = []
Data = {}

url = 'http://www.share-hit.de/'  # website
pageHTML = urllib.urlopen(url).read()
soup = BeautifulSoup(pageHTML)

# find the main categories and append them to a list
for category in soup.find('td', {'class': 'linkmenu'}).findAll('a'):
    categories.append('http://www.share-hit.de/' + category['href'])
print categories  # print all the categories

try:
    for i in categories:
        del subcategory[:]  # reset the subcategory list for each main category
        try:
            pageHTML = urllib2.urlopen(i).read()
            soup2 = BeautifulSoup(pageHTML)
            table = soup2.find('table', attrs={'id': 'kategoriemenu'})
            division = table.findAll('div', attrs={'align': 'left'})
            # find the subcategories of each main category
            for sub_cate in division:
                try:
                    subcategory.append('http://www.share-hit.de/' + sub_cate.find("a")["href"])
                    print subcategory
                    # Inside each subcategory I get the application links of the first page only.
                    # I need to know how to find the next page in each subcategory.
                    for apps in subcategory:
                        pageHTML = urllib2.urlopen(apps).read()
                        soup2 = BeautifulSoup(pageHTML)
                        tools = soup2.findAll('span', attrs={'class': 'Stil2'})
                        del links[:]  # reset the links list for each page
                        # append the applications listed on each page
                        for tool in tools:
                            try:
                                links.append('http://www.share-hit.de/' + tool.find("a")["href"])
                                print links
                            except Exception:
                                print 'No Apps'
                    # Details: from each application link I manage to soup the details of the application
                except Exception:
                    print 'No Sub Categories'
        except Exception:
            print 'No Categories'
except Exception:
    print 'Finish'
Solution
You can use a visual scraper like IRobotSoft to handle such problems. It includes many options you can easily use to navigate through next pages. The query for your next link is:
<a (tx like 'Vorw%')>
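The same idea can be sketched in plain Python without IRobotSoft: look for an anchor whose link text starts with "Vorw" (the site's German "Vorwärts"/forward link) and keep following it until no such link is left. The page contents and URLs below are made-up stand-ins for the real share-hit.de pages; in the real script you would fetch each URL with urllib2 and match the anchor with BeautifulSoup rather than a regex.

```python
import re

# Hypothetical pages: maps a URL to the HTML it would return. On the real
# site, each subcategory page's forward link points at the next page.
PAGES = {
    'page1.html': '<a href="page2.html">Vorw&auml;rts</a>',
    'page2.html': '<a href="page3.html">Vorw&auml;rts</a>',
    'page3.html': 'last page, no forward link',
}

# Matches an anchor whose link text starts with "Vorw" -- the same idea
# as the IRobotSoft query <a (tx like 'Vorw%')>.
NEXT_LINK = re.compile(r'<a\s+href="([^"]+)"[^>]*>Vorw')

def walk_pages(start):
    """Yield each page URL, following the forward link until none is left."""
    url = start
    while url:
        yield url
        match = NEXT_LINK.search(PAGES[url])
        url = match.group(1) if match else None

print(list(walk_pages('page1.html')))  # visits all three pages in order
```

In your loop over `subcategory`, this means souping each page as you do now, then checking the page for a "Vorw..." anchor and appending its href (prefixed with `http://www.share-hit.de/`) to the work list until no forward link is found.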