Question

I am trying to extract information from an HTML page using a VB script. This is the HTML from which I am trying to extract the information.

<div id="profile-education">

  <div class="position  first education vevent vcard" id="xxxxxx">
  University 1
  <span class="degree">Ph.D.</span>
  <span class="major">Computer Science</span>
  <p class="period">
  <abbr class="dtstart" title="2005-01-01">2005</abbr> &#8211; <abbr class="dtend" title="2012-12-31">2012</abbr>
  </p>
  </div>

  <div class="position  education vevent vcard" id="xxxxxx">  
  University 2                  
  <span class="degree">M.Eng.</span> 
  <span class="major">Computer Science</span>
  <p class="period">
  <abbr class="dtstart" title="2000-01-01">2000</abbr> &#8211; <abbr class="dtend" title="2004-12-31">2004</abbr>
  </p>
  </div>

</div>

I want to extract the information in the below format.

  • University Name: University 1
  • Degree Name: Phd
  • Major: Computer Science
  • Period: 2005 - 2012

  • University Name: University 2
  • Degree Name: M.Eng
  • Major: Computer Science
  • Period: 2000 - 2004

In my VB script, I have the following code, which extracts all of the text into a single variable.

Dim openedpage as String
openedpage = iedoc1.getElementById("profile-education").innerText

However, if I use the following statement instead, I can get the text of one particular span.

openedpage = iedoc1.getElementById("profile-education").getElementsByTagName("span")(0).innerText

The above code gives me Ph.D. as the output. However, I will not know the total number of spans beforehand, so I cannot simply hard-code span(0) and span(1). I would also like to extract the information for every div, and I will not know how many of those there are either. Basically, I want a loop that iterates over the div tags inside the element with the id profile-education, so that I can extract the span information from each one.


Solution

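Dim divs, div

' Every <div> nested inside #profile-education is one education entry
Set divs = iedoc1.getElementById("profile-education").getElementsByTagName("div")

For Each div In divs
    Debug.Print "*************************************"
    ' The university name is the text node that precedes the first <span>;
    ' a text node exposes its text through nodeValue (it has no toString)
    Debug.Print Trim(div.ChildNodes(0).nodeValue)
    Debug.Print div.getElementsByTagName("span")(0).innerText   ' degree
    Debug.Print div.getElementsByTagName("span")(1).innerText   ' major
    '  etc...
Next div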
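
Since the question says the number of spans is not known in advance, the loop can also walk each span collection by its Length instead of using fixed indexes. The following is only a sketch under some assumptions: PrintEducation and CleanText are hypothetical names, doc stands for the loaded document (iedoc1 in the question), and the markup is assumed to match the sample above (university name as the leading text node, dates in two abbr tags).

```vba
Option Explicit

' Hypothetical helper: replaces line breaks with spaces and trims the result,
' because the text node in the sample markup carries its surrounding whitespace.
Private Function CleanText(ByVal s As String) As String
    s = Replace(s, vbCr, " ")
    s = Replace(s, vbLf, " ")
    CleanText = Trim$(s)
End Function

' Hypothetical routine: prints one labelled block per education entry.
' doc is assumed to be the already-loaded HTML document (iedoc1 in the question).
Public Sub PrintEducation(ByVal doc As Object)
    Dim entry As Object
    Dim spans As Object
    Dim abbrs As Object
    Dim i As Long

    For Each entry In doc.getElementById("profile-education").getElementsByTagName("div")
        Debug.Print "*************************************"

        ' Only read the leading node if it exists and is a text node (nodeType 3)
        If Not entry.firstChild Is Nothing Then
            If entry.firstChild.nodeType = 3 Then
                Debug.Print "University Name: " & CleanText(entry.firstChild.nodeValue)
            End If
        End If

        ' Loop over however many spans this entry has; the class name
        ' ("degree" or "major" in the sample) is used as the label
        Set spans = entry.getElementsByTagName("span")
        For i = 0 To spans.Length - 1
            Debug.Print spans.Item(i).className & ": " & spans.Item(i).innerText
        Next i

        ' The period is assumed to be the first two <abbr> tags (dtstart, dtend)
        Set abbrs = entry.getElementsByTagName("abbr")
        If abbrs.Length >= 2 Then
            Debug.Print "Period: " & abbrs.Item(0).innerText & " - " & abbrs.Item(1).innerText
        End If
    Next entry
End Sub
```

It would be called as PrintEducation iedoc1 once the page has finished loading. Using the class name as the label avoids hard-coding which span comes first, at the cost of printing "degree" and "major" rather than the exact labels asked for in the question.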
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow