Question

So I'm trying to write a Greasemonkey script to place tiles for me in an online game. I've got the tile placement figured out, but in order to extend the script I need to limit the loop to the number of moves. I can't figure out the best way to extract this information from the page's HTML:

<h2>5</h2>Level:<font size="4px" color="red"> 1455</font><br><br>Moves:<font size="4px" color="red"> 0</font><br>Total:<font size="4px" color="red"> 688</font><br><br><a href="logout.php">

I'm just looking for pointers on how to tackle this beast. Regex?

Edit: the full code for this div is:

   <div id="info">



<img src="images/mmosbg_title.png" onclick = "getinfo('boardinfo.php', 'info')"; height="48" width="138" border="0"><br><br><a href="board5.php?size=5&border=0"><img src="boxes/990000.gif" border="0" width="5 px" height="5 px" onmouseover="Tip('Micro Board Size', BGCOLOR, '#FFCC00', WIDTH, -200, OPACITY, 95, SHADOW, true, SHADOWWIDTH, 7)" onmouseout="UnTip()"></a><a href="board5.php?size=10&border=0"><img src="boxes/990000.gif" border="0" width="10 px" height="10 px" onmouseover="Tip('Small Board Size', BGCOLOR, '#FFCC00', WIDTH, -200, OPACITY, 95, SHADOW, true, SHADOWWIDTH, 7)" onmouseout="UnTip()"></a><a href="board5.php?size=16"><img src="boxes/990000.gif" border="0" width="16 px" height="16 px" onmouseover="Tip('Medium Board Size', BGCOLOR, '#FFCC00', WIDTH, -200, OPACITY, 95, SHADOW, true, SHADOWWIDTH, 7)" onmouseout="UnTip()"></a><a href="board5.php?size=32"><img src="boxes/990000.gif" border="0" width="32 px" height="32 px" onmouseover="Tip('Large Board Size', BGCOLOR, '#FFCC00', WIDTH, -200, OPACITY, 95, SHADOW, true, SHADOWWIDTH, 7)" onmouseout="UnTip()"></a><h2>5</h2>Level:<font size="4px" color="red"> 1455</font><br><br>Moves:<font size="4px" color="red"> 0</font><br>Total:<font size="4px" color="red"> 688</font><br><br><a href="logout.php"><img src="images/logout.png" border="0" onmouseover="Tip('Logout', BGCOLOR, '#FFCC00', WIDTH, -200, OPACITY, 95, SHADOW, true, SHADOWWIDTH, 7)" onmouseout="UnTip()"></a><a href="history.php"><img src="images/pastwinners.png" border="0" onmouseover="Tip('Past Winners', BGCOLOR, '#FFCC00', WIDTH, -200, OPACITY, 95, SHADOW, true, SHADOWWIDTH, 7)" onmouseout="UnTip()"></a><br><br><font color="red" font="5px">Current Rankings</font><img src="images/questionsmall.png" onmouseover="Tip('Current Rankings<br>(rank)(name)(total)(moves)', BGCOLOR, '#FFCC00', WIDTH, -300, OPACITY, 95, SHADOW, true, SHADOWWIDTH, 7)" onmouseout="UnTip()"></a><br><br><font color="red">1530</font> of 1600 (96 %)<br><br>1 <font 
color="red">iannis5</font> <font color="red">795</font> <font color="black">292</font><br><img src="boxes/0000CD.gif" width="16" height="16" ><br>2 <font color="black">5</font> <font color="red">688</font> <font color="black">0</font><br><img src="boxes/990000.gif" width="16" height="16" ><br>3 <font color="darkred">yellowfestiva5</font> <font color="red">47</font> <font color="black">6</font><br><img src="boxes/FFDAB9.gif" width="16" height="16" ><br>
</div>

It's ugly, I know.


Solution

The question HTML looks suspiciously malformed and incomplete. What is the containing node for all that?

Anyway, for extracting info from poor HTML, you can use blunt-force regex for a quick and dirty solution:

var moves       = 0;

var movesMatch  = document.body.textContent.match (/Moves:\s*(\d+)(?:\D)/);
if (movesMatch  &&  movesMatch.length > 1) {
    moves       = parseInt (movesMatch[1], 10);
}
console.log ("The number of moves left is: ", moves);

And that may work in this case, but it's brittle (likely to "find" the wrong information) for all but the simplest pages.
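To make the brittleness concrete, here is a small sketch that wraps the regex above, unchanged, in a function and runs it on sample strings. The name `extractMoves` and both sample strings are made up for this illustration; the second string is an assumed example of a busier page where unrelated text containing "Moves:" appears before the real counter.

```javascript
// Hypothetical helper: returns the first number that follows "Moves:" in the
// given text, or null when the pattern does not match.
function extractMoves (text) {
    var movesMatch = text.match (/Moves:\s*(\d+)(?:\D)/);
    return movesMatch ? parseInt (movesMatch[1], 10) : null;
}

// Roughly what the question's snippet looks like once the tags are stripped.
var infoText = "5Level: 1455Moves: 0Total: 688";

// Assumed example: an unrelated "Moves:" shows up earlier on the page.
var busyText = "Chat: Moves: 99 is my record! Level: 1455Moves: 0Total: 688";

console.log (extractMoves (infoText));           // 0
console.log (extractMoves (busyText));           // 99, which is the wrong number
console.log (extractMoves ("no counter here"));  // null
```

The regex always takes the first match, so the more text it is given, the more chances it has to pick up something other than the intended counter.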


The best process is to narrow down the text as much as possible with DOM techniques:

  1. Identify unique and durable nodes, if possible, that ideally contain the desired information or are near it in a stable way.

    Look for id attributes (best), or class names (good), or other attributes (can be okay). You want to get a good "CSS path" to the desired information. This can be fed to querySelector or jQuery. Note that Firebug will give you a raw CSS path, which you can use as a start.

    For example, for HTML like this:

    <div id="dress-sizes">
        <ul>
            <li>
                <span class="dSize" data-color="green">13</span>
            </li>
            <li>
                <span class="dSize" data-color="green">8</span>
            </li>
        </ul>
    </div>
    

    a good selector to find the sizes of the green dresses would be:

    "#dress-sizes ul li span.dSize[data-color='green']"
    
  2. Failing to find a good CSS path, you may have to fall back on the XPath (which Firebug or Chrome will give you). But I've only had to do that one time.

  3. Once you've found a good way to select the exact node (ideal), or the parent node, or a reliable sibling node, you will have much less (or no) extra cruft to filter with RegEx. This reduces the likelihood of false hits.
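As a rough sketch of step 1, the selector from the dress-size example can be passed to `querySelectorAll`. The helper name `readSizes` and its `root` parameter are inventions for this illustration; in a userscript, `root` would normally just be `document`. Note that both spans in the sample HTML carry `data-color="green"`, so the selector matches both of them.

```javascript
// Hypothetical helper: collect the numeric text of every node that matches
// `selector` under `root` (any object offering querySelectorAll, e.g. document).
function readSizes (root, selector) {
    var nodes = root.querySelectorAll (selector);
    return Array.prototype.map.call (nodes, function (node) {
        return parseInt (node.textContent.trim (), 10);
    });
}

var greenSelector = "#dress-sizes ul li span.dSize[data-color='green']";

// Only runs where a DOM exists (browser, Greasemonkey); skipped elsewhere.
if (typeof document !== "undefined") {
    console.log (readSizes (document, greenSelector));
}
```

Run against the sample HTML above, this should log the two sizes, 13 and 8. Because the selector already isolates the exact nodes, no regex is needed at all here.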


In this case, the only unique-ish node given is the logout link <a href="logout.php">. This looks to be durable. That is, it's unlikely to change much when the site gets modified. But there may be more than one logout link.

So keying off that node, this is the best we can do with the HTML given so far:

var anchorNode  = document.querySelector ("a[href='logout.php']");
// Guard against the link being absent, so the script doesn't throw.
var siblingText = anchorNode ? anchorNode.parentNode.textContent : "";
var moves       = 0;

var movesMatch  = siblingText.match (/Moves:\s*(\d+)(?:\D)/);
if (movesMatch  &&  movesMatch.length > 1) {
    moves       = parseInt (movesMatch[1], 10);
}
console.log ("The number of moves left is: ", moves);


Update: Now that the container is known, and it nicely has an id, use:

var containerNode   = document.querySelector ("#info");
// Fall back to an empty string if the container is missing.
var containerText   = containerNode ? containerNode.textContent : "";
var moves           = 0;

var movesMatch      = containerText.match (/Moves:\s*(\d+)(?:\D)/);
if (movesMatch  &&  movesMatch.length > 1) {
    moves           = parseInt (movesMatch[1], 10);
}
console.log ("The number of moves left is: ", moves);
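
With the container known, it is also possible to narrow things down one step further and read the value from the node next to the "Moves:" label, rather than matching against all of the container's text. The following is only a sketch under an assumption about how the browser parses the question's HTML: that "Moves:" ends up as a text node directly inside `#info`, immediately followed by the `<font>` element holding the number. The helper name `movesFromLabel` is made up for this illustration.

```javascript
// Hypothetical helper: find the text node labelled "Moves:" directly inside
// `container` and parse the number from the node that follows it.
function movesFromLabel (container) {
    var kids = container.childNodes;
    for (var i = 0; i < kids.length; i++) {
        var node = kids[i];
        // nodeType 3 is a text node.
        if (node.nodeType === 3  &&  /Moves:\s*$/.test (node.nodeValue)) {
            var valueNode = node.nextSibling;
            if (valueNode) {
                var n = parseInt (valueNode.textContent, 10);
                return isNaN (n) ? null : n;
            }
        }
    }
    return null;    // Label not found, or no usable number after it.
}

// Only runs where a DOM exists (browser, Greasemonkey); skipped elsewhere.
if (typeof document !== "undefined") {
    var infoNode = document.querySelector ("#info");
    if (infoNode) {
        console.log ("The number of moves left is: ", movesFromLabel (infoNode));
    }
}
```

This trades the regex over a large block of text for a dependency on the exact node layout, so it would need adjusting if the site changes its markup.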
Licensed under: CC-BY-SA with attribution