Question

I have to parse the following HTML page:

This is my parsing code using Fizzler. What I want to get is the title, the rates, the days (sometimes empty) and the price, namely the second price, the one after the span. But when I run my code, it only gets 2 objects from the listRoomDetails divs. In the sample below there are Room Type 1 promotion 10% and Room Type 2 promotion 60%, but it skips Room Type 2 promotion 60% and instead takes the first element of the second listRoomDetails (Room Type 1 promotion 90%).

I want to get all of the room types in both listRoomDetails divs.

Is there also a way to detect whether the days value exists, so that I get it if it does and ignore it otherwise?

<!-- HTML file -->
<div class="ListItem">
     <div class="ListRoom">
          <span class="title">
             <strong>Super Room</strong>
          </span>
      </div>            

     <!-- section to get details of room -->
     <div class="listRoomDetails">
        <table>
            <thead>
                <tr>
                    <th>Days</th>
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td class = "rates">
                        Room Type 1 promotion 10%
                    </td>
                    <td class = "days">
                        261.00
                    </td>
                    <td class = "days">

                    </td>
                    <td class="price">
                        <span>290.00&euro;</span>
                        261.00&euro; //get this money
                    </td>

                </tr>
                <tr>
                    <td class = "rates">
                        Room Type 2 promotion 60%
                    </td>
                    <td class = "days">

                    </td>
                    <td class = "days">
                        261.00
                    </td>
                    <td class="price">
                        <span>290.00&euro;</span>
                        261.00&euro; // get this money
                    </td>

                </tr>
            </tbody>
        </table>
    </div>
    <div class="listRoomDetails">
        <table>
            <thead>
                <tr>
                    <th>Days</th>
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td class = "rates">
                        Room Type 1 promotion 90%
                    </td>
                    <td class = "days">

                    </td>
                    <td class = "rates">
                        261.00
                    </td>
                    <td class="price">
                        <span>290.00&euro;</span>
                        261.00&euro;
                    </td>
                </tr>
                <tr>
                    <td class = "rates">
                        Room Type 2 promotion 0 % // type of room
                    </td>
                    <td class = "days">
                        261.00
                    </td>
                    <td class="price">
                        <span>290.00&euro;</span>
                        261.00&euro;
                    </td>

                </tr>
            </tbody>
        </table>
    </div>
</div>

Source Code:

        var source = File.ReadAllText("TestHtml/HotelWithAvailability.html");

        var html = new HtmlDocument(); // with HTML Agility pack
        html.LoadHtml(source);

        var doc = html.DocumentNode;

        var rooms = (from listR in doc.QuerySelectorAll(".ListItem")
                     from listR2 in doc.QuerySelectorAll("tbody")
                     select new HotelAvailability
                     {
                         HotelName = listR.QuerySelector(".title").InnerText.Trim(), //get room name

                         TypeRooms = listR2.QuerySelector("tr td.rates").InnerText.Trim(), //get room type

                         Price = listR2.QuerySelector("tr td.price").InnerText.Trim(), // get price

                     }).ToArray();

Solution

You should query for the room details of the current room (i.e. the current ListItem) instead of the whole document:

var rooms = from r in doc.QuerySelectorAll(".ListItem")
            from rd in r.QuerySelectorAll(".listRoomDetails tbody tr")
            select new HotelAvailability {
                HotelName = r.QuerySelector(".title").InnerText.Trim(),
                TypeRooms = rd.QuerySelector(".rates").InnerText.Trim(),
                Price = rd.QuerySelector(".price span").InnerText.Trim()
            };

For your sample HTML it produces:

[
  {
     HotelName: "Super Room",
     Price: "290.00&euro;",
     TypeRooms: "Room Type 1 promotion 10%"
  },
  {
    HotelName: "Super Room",
    Price: "290.00&euro;",
    TypeRooms: "Room Type 2 promotion 60%"
  },
  {
    HotelName:  "Super Room",
    Price: "290.00&euro;",
    TypeRooms: "Room Type 1 promotion 90%"
  },
  {
    HotelName: "Super Room",
    Price: "290.00&euro;",
    TypeRooms: "Room Type 2 promotion 0 % // type of room"
  }
]
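The query above returns the price inside the `<span>` (290.00), with `&euro;` still encoded, whereas the question asks for the price that follows the span and for the days value only when it is present. Below is a minimal sketch of one way to get both, assuming the HtmlAgilityPack and Fizzler.Systems.HtmlAgilityPack packages are referenced as in the question and C# 9 or later (for the `record`). The `RoomRow` record and the `RoomRowParser`, `FirstNonEmpty`, `PriceAfterSpan` and `Clean` names are hypothetical helpers, not part of either library.

```csharp
// Assumes the HtmlAgilityPack and Fizzler.Systems.HtmlAgilityPack NuGet packages.
// RoomRow and RoomRowParser are hypothetical names used only for this sketch.
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

// One result per <tr>; Days and Price are null when the cell is empty or missing.
public record RoomRow(string? HotelName, string? Rates, string? Days, string? Price);

public static class RoomRowParser
{
    public static List<RoomRow> Parse(string source)
    {
        var html = new HtmlDocument();
        html.LoadHtml(source);

        return (from item in html.DocumentNode.QuerySelectorAll(".ListItem")
                // Scope the row query to the current ListItem, as in the answer above.
                from row in item.QuerySelectorAll(".listRoomDetails tbody tr")
                select new RoomRow(
                    Clean(item.QuerySelector(".title")?.InnerText),
                    Clean(row.QuerySelector("td.rates")?.InnerText),
                    // Several td.days cells may exist; keep the first one that has text.
                    FirstNonEmpty(row.QuerySelectorAll("td.days").Select(td => td.InnerText)),
                    PriceAfterSpan(row.QuerySelector("td.price"))))
               .ToList();
    }

    // Returns the first candidate that is non-blank after cleaning, or null if all are blank.
    static string? FirstNonEmpty(IEnumerable<string?> texts) =>
        texts.Select(Clean).FirstOrDefault(t => !string.IsNullOrEmpty(t));

    // The wanted price is the text node after the <span>; the span holds the old price.
    static string? PriceAfterSpan(HtmlNode? priceCell)
    {
        if (priceCell == null) return null;
        var textAfterSpan = priceCell.ChildNodes
            .SkipWhile(n => n.Name != "span")   // skip everything before the <span>
            .Skip(1)                            // skip the <span> itself
            .Where(n => n.NodeType == HtmlNodeType.Text)
            .Select(n => n.InnerText);
        return FirstNonEmpty(textAfterSpan);
    }

    // Decodes entities such as &euro; and trims surrounding whitespace.
    static string? Clean(string? text) =>
        text == null ? null : HtmlEntity.DeEntitize(text).Trim();
}
```

It can be called with the same `source` string as in the question, e.g. `var rows = RoomRowParser.Parse(source);`. A row whose `td.days` cells are all blank gets `Days == null`, so checking `row.Days != null` answers the "does the days value exist" part. With the sample HTML above, the inline notes such as `//get this money` are ordinary cell text, so they also end up in `Price`.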
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow