extracting more information from webdriver

Question 1

This should get you started. I would use a while loop with sleep calls so that the whole page is loaded before you extract information from it.

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Firefox()
driver.get("http://www.flipkart.com/mobiles/pr?p%5B%5D=sort%3Dfeatured&sid=tyy%2C4io&ref=659eb948-c365-492c-99ef-59bd9f0427c6")
time.sleep(3)  # give the initial page time to load
for i in range(5):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")  # scroll to bottom of page
    time.sleep(2)  # wait for the next batch of results to load
# Click the "load more" button; this needs to be repeated until you reach the end.
driver.find_element(By.XPATH, '//*[@id="show-more-results"]').click()
elems = driver.find_elements(By.XPATH, './/div[@class="pu-title fk-font-13"]')
for e in elems:
    print(e.text)
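
The fixed number of scrolls above can be replaced by a loop that keeps scrolling until the page height stops growing. This is only a minimal sketch, assuming the page increases document.body.scrollHeight whenever it loads more results; scroll_until_stable, pause and max_rounds are hypothetical names, not part of Selenium:

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=50):
    """Scroll to the bottom repeatedly until the page height stops changing.

    `driver` is any object with a Selenium-style execute_script() method.
    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude wait for the next batch to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded during this round
        last_height = new_height
    return max_rounds  # gave up after max_rounds to avoid looping forever
```

Call it as scroll_until_stable(driver) right after driver.get(...). If the height stalls because the site is waiting for a click on the "load more" button, click the button and call the function again.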

Question 2

OK, this is going to be a major hack, but here goes. The site loads more phones as you scroll down by hitting an AJAX script that returns 20 more each time. The script it's hitting is this:

http://www.flipkart.com/mobiles/pr?p[]=sort%3Dpopularity&sid=tyy%2C4io&start=1&ref=8aef4a5f-3429-45c9-8b0e-41b05a9e7d28&ajax=true

Notice the start parameter. You can set it to whatever offset you want with:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()

# format() is used instead of the % operator because the URL already
# contains literal % escapes (%3D, %2C) that % formatting would choke on.
url = "http://www.flipkart.com/mobiles/pr?p[]=sort%3Dpopularity&sid=tyy%2C4io&start={}&ref=8aef4a5f-3429-45c9-8b0e-41b05a9e7d28&ajax=true"

num = 1
# This condition will need to be updated to the maximum number of models
# you're interested in (or, if you're feeling brave, try to extract this
# from the top of the page).
while num <= 2450:
    driver.get(url.format(num))
    elems = driver.find_elements(By.XPATH, './/div[@class="pu-title fk-font-13"]')
    for e in elems:
        print(e.text)
    num += 20  # each request returns the next 20 results

You'll be making 123 GET requests (one for every 20 results up to 2450), so this will be quite slow.
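
To check how many requests a given limit implies before starting the browser, the list of URLs can be built up front. This is a rough sketch that reuses the AJAX URL from above; page_urls, max_items, page_size and first are hypothetical names chosen for illustration:

```python
# The AJAX URL from above, with a placeholder for the start offset.
BASE_URL = (
    "http://www.flipkart.com/mobiles/pr?p[]=sort%3Dpopularity&sid=tyy%2C4io"
    "&start={start}&ref=8aef4a5f-3429-45c9-8b0e-41b05a9e7d28&ajax=true"
)


def page_urls(max_items, page_size=20, first=1):
    """Return one URL per batch of page_size results, up to max_items."""
    return [
        BASE_URL.format(start=offset)
        for offset in range(first, max_items + 1, page_size)
    ]


urls = page_urls(2450)
request_count = len(urls)  # number of GET requests the loop would make
```

Each entry in urls can then be passed to driver.get() in turn.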

Question 3

You can get the full source of the page and do all your analysis on it:

page_text = driver.page_source

The page source will contain the current content, including whatever was generated by JavaScript. Be careful to grab it only once all rendering has completed (for example, wait for the presence of some string that is rendered last).
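
One way to wait for such a string is a small polling helper. This is a sketch under the assumption that a plain substring check on the page source is good enough for your page; wait_until, timeout and poll are hypothetical names, not part of Selenium:

```python
import time


def wait_until(predicate, timeout=10.0, poll=0.5):
    """Call predicate() every `poll` seconds until it returns a true value.

    Returns True as soon as the predicate succeeds, or False once
    `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

For example, wait_until(lambda: "pu-title" in driver.page_source) returns True once the product titles have appeared in the source, after which driver.page_source can be read for analysis.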