Parsing information from content for database

Question 1

Here is a DOM-based approach, using DOMDocument and DOMXPath:

$results = array();

$fields = array('name', 'img', 'url', 'blurb');

$queries = array('name'  => '//img/@alt',
                 'img'   => '//img[@class = "picture"]/@style |
                             //img/@src |
                             //div[@class = "picture"]/@style',
                 'url'   => '//div[@class = "blurb"]//a/@href',
                 'blurb' => '//div[@class = "blurb"]');

$imgPattern = <<<'EOD'
~
(?|
    .*? background-image:url\( [^)]*? ([^?="\')/]+ \.(?:png|jpe?g|gif) ).*
  | 
    .*? ([^=;/]+)$
)
~ix
EOD;

// $data is assumed to be an array of HTML strings, one per database row
foreach ($data as $html) {
    $srcDom = new DOMDocument();
    @$srcDom->loadHTML($html);

    $elts = $srcDom->getElementsByTagName("body")->item(0)->childNodes;

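    // Reset per-row state so nothing leaks over from the previous row
    $bbnode = null;
    $tmp = array('other' => '') + array_fill_keys($fields, '');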
    foreach ($elts as $elt) {
        if ( $elt->nodeType === XML_ELEMENT_NODE &&
             $elt->hasAttribute('class') &&
             $elt->getAttribute('class') == 'bottom-block' )
            $bbnode = $elt;
        else
            $tmp['other'] .= $srcDom->saveHTML($elt);
    }
    if ( $bbnode ):
        $bbDom = new DOMDocument();
        $bbDom->appendChild($bbDom->importNode($bbnode, true));

        $xpath = new DOMXPath($bbDom);

        foreach ($fields as $field) {
            $nodes = $xpath->query($queries[$field]);

            if ($field == 'blurb') {
                // Keep the blurb's inner HTML, not just its text content
                $tmp[$field] = '';
                if ($nodes->length) {
                    foreach ($nodes->item(0)->childNodes as $child) {
                        $tmp[$field] .= $bbDom->saveHTML($child);
                    }
                }
            } else {
                // First matching attribute value, or an empty string if none
                $tmp[$field] = ($nodes->length) ? $nodes->item(0)->nodeValue : '';
            }
        }
        $tmp['img'] = preg_replace($imgPattern, '$1', $tmp['img']);
    endif;
    $results[] = $tmp;
}

echo htmlspecialchars(print_r($results, true));
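
To see what $imgPattern does on its own, here is a small sketch that runs it against a few attribute values. The sample values are made up for illustration and are not taken from the real data. The first branch pulls the file name out of a background-image:url(...) style, and the second branch falls back to the last path segment of a plain src value.

```php
<?php
// Same pattern as above, repeated so this snippet runs on its own
$imgPattern = <<<'EOD'
~
(?|
    .*? background-image:url\( [^)]*? ([^?="\')/]+ \.(?:png|jpe?g|gif) ).*
  |
    .*? ([^=;/]+)$
)
~ix
EOD;

// Made-up attribute values (assumptions, not real data)
$samples = array(
    "background-image:url('/images/photos/john.jpg?v=2');", // style attribute
    'background-image:url(/img/bob.gif)',                    // unquoted url()
    '/images/photos/jane.png',                               // plain src attribute
);

// Reduce each value to the bare image file name
$fileNames = array();
foreach ($samples as $value) {
    $fileNames[] = preg_replace($imgPattern, '$1', $value);
}

print_r($fileNames);
```

Because both branches sit inside a branch-reset group (?| ... ), the file name always ends up in $1, whichever branch matched.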

Question 2

I do not know how to do this with an SQL query alone, but here is how I would do it in PHP. The basic idea is to run five separate regular-expression matches against each row and then print the results. The matches are:

Bottom Block Contents
Images
URLs
Blurbs
Names

Here is some code to demonstrate.

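// GET THE BOTTOM BLOCK CONTENT ($mysql_row IS ASSUMED TO HOLD THE HTML OF ONE ROW)
preg_match('~(?<=<div class="bottom-block">).*?(?=</div>$)~ims', $mysql_row, $bottom_block_array);
// FALL BACK TO AN EMPTY STRING IF THE ROW HAS NO BOTTOM BLOCK
$string = isset($bottom_block_array[0]) ? $bottom_block_array[0] : '';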

// GRAB THE IMAGES
preg_match_all('~[A-Z0-9_]+\.(?:jpg|jpeg|gif|png)(?=\'|")~i', $string, $images);
$images = $images[0];

// GRAB THE URLS
preg_match_all('~(?<=href=").*?(?=")~ims', $string, $urls);
$urls = $urls[0];

// GRAB THE BLURBS
preg_match_all('~(?<=<div class="blurb">).*?(?=</div>)~ims', $string, $blurbs);
$blurbs = $blurbs[0];

// GRAB THE NAMES
preg_match_all('~(?<=alt=").*?(?=")~ims', $string, $names);
$names = $names[0];



// LOOP THROUGH AND PRINT OUT ALL OF THE NAMES (OR ONLY ONE, IF DESIRED)
if ($names) {
    foreach ($names AS $name) {print "\nName: ".$name;} // USE THIS IF YOU WANT ALL OF THE NAMES
    // print "\nName: ".$names[0]; // USE THIS IF YOU ONLY WANT ONE POSSIBLE NAME TO SHOW UP
}
else {print "\nName:";}


if ($urls) {
    foreach ($urls AS $url) {print "\nUrl: ".$url;} // PRINT OUT ALL URLS
    // print "\nUrl: ".$urls[0]; // PRINT OUT ONLY ONE URL    
}
else {print "\nUrl:";}


if ($images) {
    foreach ($images AS $image) {print "\nImageName: ".$image;} // PRINT OUT ALL THE IMAGES
    // print "\nImageName: ".$images[0]; // PRINT OUT ONLY ONE IMAGE
}
else {print "\nImageName:";}


if ($blurbs) {
    foreach ($blurbs AS $blurb) {print "\nBlurb: ".$blurb;} // PRINT OUT ALL OF THE BLURBS
    // print "\nBlurb: ".$blurbs[0]; // PRINT OUT ONLY ONE BLURB
}
else {print "\nBlurb:";}


print "\n\n\n\n\n";
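
If you only want one value per field, the same five patterns can be bundled into a single function. This is only a sketch: the function name extractFirstMatches and the sample row are made up for illustration. Note that the bottom-block pattern stops at the first </div> that sits at the end of a line, so the sample keeps the blurb's closing tag and the block's closing tag on the same line.

```php
<?php
// Hypothetical helper: runs the five regex lookups and returns only the
// first match for each field (an empty string when nothing matches).
function extractFirstMatches($row)
{
    $fields = array('name' => '', 'url' => '', 'img' => '', 'blurb' => '');

    // Bottom block contents; return the empty defaults if there is no block
    if (!preg_match('~(?<=<div class="bottom-block">).*?(?=</div>$)~ims', $row, $m)) {
        return $fields;
    }
    $block = $m[0];

    // Same patterns as in the code above, keyed by field
    $patterns = array(
        'name'  => '~(?<=alt=").*?(?=")~ims',
        'url'   => '~(?<=href=").*?(?=")~ims',
        'img'   => '~[A-Z0-9_]+\.(?:jpg|jpeg|gif|png)(?=\'|")~i',
        'blurb' => '~(?<=<div class="blurb">).*?(?=</div>)~ims',
    );

    foreach ($patterns as $field => $pattern) {
        if (preg_match($pattern, $block, $m)) {
            $fields[$field] = $m[0];
        }
    }
    return $fields;
}

// Made-up sample row (an assumption, for illustration only)
$sampleRow = '<p>Intro text</p>'
    . '<div class="bottom-block">'
    . '<img class="picture" src="/images/jane_doe.png" alt="Jane Doe">'
    . '<div class="blurb"><a href="http://example.com/jane">Jane</a> writes code.</div>'
    . '</div>';

print_r(extractFirstMatches($sampleRow));
```

Returning an array keyed by field name makes it straightforward to pass the values on to whatever insert or update statement you use afterwards.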
