How to extract latest articles from DBLP

Question 1

For parse xml there are XML Parser, XMLReader and SimpleXML. XML Parser and XMLReader are used for big file, SimpleXML - for small files (<1Mb).

function startElement($parser, $tag, $attrs) {
    global $articles, $isArticle, $i, $globTag;
    $globTag = $tag;
    if ($tag == 'article') {
        $isArticle = true;
        if (isset ( $attrs ['mdate'] )) {
            // add date from attribute in article
            $articles [$i] ['mdate'] = $attrs ['mdate'];
        }
    }
}
function endElement($parser, $tag) {
    global $articles, $isArticle, $i, $globTag;
    if ($tag == 'article') {
        $isArticle = false;
        ++ $i;
    }
}
function getElement($parser, $data) {
    global $articles, $isArticle, $i, $globTag;
    if ($isArticle) {
        $articles [$i] = $articles [$i] + [ 
                $globTag => $data 
        ];
    }
}
global $articles, $isArticle, $i, $globTag;
$articles = [ ];
$i = 0;
$isArticle = false;
$url = 'http://dblp.uni-trier.de/rec/bibtex/';
$key = 'journals/acta/BayerM72';
$url .= $key;
$parser = xml_parser_create ();

xml_set_element_handler ( $parser, "startElement", "endElement" );
xml_set_character_data_handler ( $parser, 'getElement' );
xml_parser_set_option ( $parser, XML_OPTION_CASE_FOLDING, false );

$file = fopen ( $url, 'rb' );
if ($file === false) {
    die ( "File isnt!!" );
}
$clasterSize = 8192;

while ( $data = fread ( $file, $clasterSize ) ) {
    if (! xml_parse ( $parser, $data, feof ( $file ) )) {
        die ( sprintf ( "XML error: %s at line %d", xml_error_string ( xml_get_error_code ( $parser ) ), xml_get_current_line_number ( $parser ) ) );
    }
}
xml_parser_free ( $parser );
fclose ( $file );

This is example in XML Parser.

<?php
    $url = 'http://dblp.uni-trier.de/rec/bibtex/';
    $key = 'journals/acta/BayerM72';
    $content = file_get_contents($url . $key);
$xml = new SimpleXMLElement($content);

/* Search for <dblp><article> */
$result = $xml->xpath('/dblp/article');
// $result is  an array of SimpleXMLElement objects
   var_dump($result);
    ?>

There is SimpleXML example. You get an array of SimpleXMLElement objects in result. Look a manual to get SimpleXMLElement attributes SimpleXMLElement->attributes();.

Question 2

If the API doesn't offer a clean way to get updates, you'll have to cache the document and extract the articles from the changes.

DBLP2RSS, a project that creates RSS feeds from DBLP does this with a shell script:

#!/bin/sh

id="$1"
name="$2"
cache="$3"

test -d "$cache" || exit 1

curlit() {
    in-dcs && curl --proxy wwwcache.dcs.gla.ac.uk:8080 "$@" || curl --proxy "" "$@"
}

prefix="http://dblp.uni-trier.de/rec/bibtex"

echo "<dblp-content name=\"$1\">"
curlit "http://www.informatik.uni-trier.de/~ley/db/$id/index.html" 2> /dev/null | tidy -n -asxml 2> /dev/null | xml sel -N html=http://www.w3.org/1999/xhtml -t  -m '//html:a' -v '@href' -n | grep "^$name" | while read path; do 
    # Should cache here
    cachefile="$cache/$id/$path"

    if ! test -f "$cachefile"; then
    mkdir -p "$(dirname $cachefile)"
    curlit "http://www.informatik.uni-trier.de/~ley/db/$id/$path" 2> /dev/null | tidy -n -asxml 2> /dev/null >  $cachefile
    echo "Got $cachefile"

    fi
    cat "$cachefile" | xml sel -N html=http://www.w3.org/1999/xhtml -t  -m '//html:a' -v '@href' -n | egrep '^'"$prefix"'.*\.xml$' | sed -e 's#^'"$prefix"/'#<dblpkey>#' -e 's/\.xml$/<\/dblpkey>/'

done 

echo "</dblp-content>"

This doesn't get the articles, but you could take the same approach.