Extract information from a webpage [closed]

https://stackoverflow.com/questions/23648709

22-07-2023

Question

Closed. This question needs to be more focused. It is not currently accepting answers.

Closed 9 years ago.

Could you please help me get the mobile model and its price from the following page using Python? I want to extract the name Moto E (Black) and the price Rs. 6999 from the page. I tried this using Selenium in Python (I am a beginner with Selenium). Here is my code. Please help me out.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get("http://www.kart123.com/mobiles/pr?p%5B%5D=sort%3Dfeatured&sid=tyy%2C4io&ref=68c7d088-ae7f-4310-aa4c-a7ee176d168d")
elem = driver.find_element_by_xpath("//div[@class='product-unit unit-4 browse-product']")
elem1 = elem.find_element_by_xpath("//div[@class='pu-details lastUni']")
elem2 = elem1.find_element_by_xpath("//div[@class='pu-title fk-font-13']")
print(elem2.find_element_by_xpath(".//a[@class='fk-display-block']").text)
driver.close()

<div class=' product-unit unit-4  browse-product  ' data-pid="MOBDVHC6XKKPZ3GZ" data-tracking-products=";MOBDVHC6XKKPZ3GZ;1;6999;;eVar22=Mobile" data-size="store-grid-new-4">
    <div class='pu-visual-section'>
        <a data-tracking-id="prd_img"  class='pu-image fk-product-thumb ' href="/moto-e/p/itmdvuwsybgnbtha?pid=MOBDVHC6XKKPZ3GZ&srno=b_1&ref=83c37824-b74d-4121-8be0-27731ddccde2">
        <img alt="Moto E: Mobile" data-error-url="http://img1a.flixcart.com/mob/thumb/mobile.jpg" onload="img_onload(this);" onerror="img_onerror(this);" src="http://img5a.flixcart.com/image/mobile/3/g/z/motorola-xt1022-125x125-imadvvfknshcywk5.jpeg"></img>
        </a>
    </div>
    <div class="pu-details lastUnit">
        <div class="pu-title fk-font-13">
            <a class="fk-display-block" data-tracking-id="prd_title" href="/moto-e/p/itmdvuwsybgnbtha?pid=MOBDVHC6XKKPZ3GZ&srno=b_1&ref=83c37824-b74d-4121-8be0-27731ddccde2" title="Moto E (Black)">
            Moto E (Black)
            </a>
        </div>
        <div class='pu-variants  fk-font-11'>
            and <a href="/moto-e/p/itmdvuwsybgnbtha?pid=MOBDVHC6XKKPZ3GZ&srno=b_1&ref=83c37824-b74d-4121-8be0-27731ddccde2">1 more variant</a>
        </div>
        <div class="pu-extra fk-font-11">
        </div>
        <div class="pu-rating" data-ratingfor="ITMDVUWSYBGNBTHA#MOBDVHC6XKKPZ3GZ#moto-e">
            <div class='fk-stars-small' title ='4.7 stars'>
                <div class='rating' style='width:94%;'>
                </div>
            </div>
            (852 ratings)<span class="ugc-summary-icon"></span>
        </div>
        <div class="pu-price">
            <div class="pu-border-top">
                <div class="pu-final">
                    <span class="fk-font-17 fk-bold">Rs. 6999</span>
                </div>
                <div class="pu-emi fk-font-12">EMI from Rs. 626</div>
                <div class="pu-personal">
                </div>
                <ul class="pu-offers">
                </ul>
            </div>
        </div>
        <div class="pu-border-top">
            <ul class="pu-usp">
                <li><span class="text">Android v4.4 OS</span></li>
                <li><span class="text">4.3-inch Touchscreen</span></li>
                <li><span class="text">1 GB RAM</span></li>
                <li><span class="text">Dual SIM (GSM + GSM)</span></li>
            </ul>
        </div>
        <div class="pu-compare pu-border-top">
            <input type="checkbox" class="compare-checkbox" data-uniqid="83c37824-b74d-4121-8be0-27731ddccde2" id="MOBDVHC6XKKPZ3GZ" display_vertical='Mobiles' vertical="mobile"  vertical_url_map='/mobiles'><label for="MOBDVHC6XKKPZ3GZ" class="compare-label">Add to compare</label>
        </div>
    </div>
</div>
</div>
<div class="gd-col gu3">

Solution

There are a few tools out there for what you're trying to do.

Scrapy (http://doc.scrapy.org) is a great tool for writing web crawlers and keeping your data up to date. You can use XPath expressions to access data; for instance, //div[@class='pu-final']/span/text() would give you Rs. 6999.
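To see what that XPath expression selects without installing anything, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors. It is not Scrapy itself: ElementTree supports only a subset of XPath (there is no text() step, so the text is read from the matched element instead), and the FRAGMENT string is a trimmed, well-formed copy of the HTML from the question, which is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the product markup from the question
# (assumption: the real page would need an HTML-tolerant parser).
FRAGMENT = """
<div class="pu-details lastUnit">
  <div class="pu-title fk-font-13">
    <a class="fk-display-block" title="Moto E (Black)">
      Moto E (Black)
    </a>
  </div>
  <div class="pu-price">
    <div class="pu-final">
      <span class="fk-font-17 fk-bold">Rs. 6999</span>
    </div>
  </div>
</div>
""".strip()

root = ET.fromstring(FRAGMENT)

# Same path as in the answer, minus the text() step: match the span
# inside the pu-final div, then read its text.
price = root.find(".//div[@class='pu-final']/span").text.strip()

# The predicate compares the whole class attribute, so it must be
# written exactly as it appears in the markup.
title = root.find(".//div[@class='pu-title fk-font-13']/a").text.strip()

print(title, price)
```

Note that [@class='...'] is an exact string comparison in XPath, so extra or differently ordered class names in the real page will make the match fail.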

If you don't need all of Scrapy's features and performance isn't a concern (for a one-time import script, say), there is also BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/bs4/doc/), which is really simple to use.
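As a rough sketch of the BeautifulSoup route, assuming the bs4 package is installed and that the page source has already been downloaded into a string: the HTML constant below is a trimmed copy of the markup from the question, and product_info is a hypothetical helper name, not part of any library.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: in practice
# this string would be the downloaded page source).
HTML = """
<div class=' product-unit unit-4  browse-product  ' data-pid="MOBDVHC6XKKPZ3GZ">
    <div class="pu-details lastUnit">
        <div class="pu-title fk-font-13">
            <a class="fk-display-block" href="/moto-e/p/itmdvuwsybgnbtha">
            Moto E (Black)
            </a>
        </div>
        <div class="pu-price">
            <div class="pu-final">
                <span class="fk-font-17 fk-bold">Rs. 6999</span>
            </div>
            <div class="pu-emi fk-font-12">EMI from Rs. 626</div>
        </div>
    </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")


def product_info(unit):
    """Return (title, price) for one product-unit block (hypothetical helper)."""
    title = unit.find("div", class_="pu-title").find("a").get_text(strip=True)
    price = unit.find("div", class_="pu-final").find("span").get_text(strip=True)
    return title, price


# class_ matches any single class name, so the stray spaces in the
# class attribute of the outer div do not matter here.
products = [product_info(u) for u in soup.find_all("div", class_="product-unit")]
print(products)
```

Matching on a single class name, rather than on the whole class attribute, keeps the lookup working even when the site adds or reorders classes.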

Those are just two of the many tools you could use, but both are well documented. I'm sure many people here could recommend other great tools; choose whichever fits your needs best.

Good luck.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow