Creating a CSV file from an HTML page

https://stackoverflow.com/questions/9320715

26-10-2019

Question

I extracted records from a database and stored them on an HTML page as text. Each record is stored in a <p> paragraph, with its fields separated by a <br /> line break and the records separated by an <hr> rule. For example:

Company Name<br/>
555-555-555<br />
Address Line 1<br />
Address Line 2<br />
Website: www.example.com<br />

I just need to put these records into a CSV file. I tried fputcsv in combination with array() and file_get_contents(), but it wrote the entire source code of the web page into the .csv file, and a lot of data was missing as well. There are multiple records stored in the same format: each complete record block like the one above is followed by an <hr> tag as a separator. I want the company name to go in the Name column, the phone number in the Phone column, the address lines in the Address column, and the website in the Website column, as shown below.

http://i.stack.imgur.com/00Gxw.png
How can I do this?

HTML snippet:

            1 Stop Signs<br />
            480-961-7446<br />
500 N. 56th Street<br />
        Chandler, AZ  85226<br />

<br />
                Website: www.1stopsigns.com<br />
            <br />
            </p><br /><hr><br />

It is spaced like this in the HTML source.

Solution

Assuming the HTML shown above is well formed, I would approach this problem in two phases. First, clean up the HTML text a little so the information is easier to export or manage. Keep the items you want to save and delete those you know you will not need in the near future.

$html = preg_replace('|[ \t]{2,}|', ' ', $html);   // collapse runs of spaces and tabs (newlines are left alone)
$html = preg_replace('|\n{2,}|', "\n", $html);      // collapse consecutive line breaks into one
$html = preg_replace('|<br\s*/?>|i', '##', $html);  // replace <br>, <br/> and <br /> with a marker

Then you'll have cleaner HTML to work with, similar to this:

1 Stop Signs##
480-961-7446##
500 N. 56th Street##
Chandler, AZ 85226##
Website: www.1stopsigns.com##
##
</p>##<hr>##

Second, you can now explode the text into fields, or implode those fields into comma-separated values to form a CSV line.

// split on the marker: the individual fields end up in the array $csv_parts
$csv_parts = explode("##", $html);

// join the fields with commas to get a CSV-style string such as: 1 Stop Signs,480-961-7446,..
$csv = implode(",", $csv_parts);

Now you have two ways to work with the HTML: extracting the individual fields or exporting the CSV.

Hope this helps or gives you an idea for developing what you need.
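One caveat with a plain implode(","): a field that itself contains a comma, such as "Chandler, AZ 85226", would be read as two columns by most CSV parsers. A minimal sketch of one way around that, assuming the ##-delimited text produced by the clean-up step above, is to let fputcsv() do the quoting. The helper name markerTextToCsvLine and the use of a php://memory stream are illustrative choices, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the answer): "##"-delimited text in, one CSV line out.
function markerTextToCsvLine(string $text): string
{
    $fields = [];
    foreach (explode('##', $text) as $piece) {
        // Remove leftover tags such as </p> or <hr>, then surrounding whitespace.
        $piece = trim(strip_tags($piece));
        if ($piece !== '') {
            $fields[] = $piece; // keep only non-empty fields
        }
    }

    // Write to an in-memory stream so that fputcsv() takes care of quoting.
    $stream = fopen('php://memory', 'w+');
    fputcsv($stream, $fields, ',', '"', '\\');
    rewind($stream);
    $line = stream_get_contents($stream);
    fclose($stream);

    // fputcsv() appends a line ending; strip it for use as a plain string.
    return rtrim($line, "\r\n");
}

// Sample input modelled on the cleaned-up text shown above.
$sample = "1 Stop Signs##\n480-961-7446##\n500 N. 56th Street##\n"
        . "Chandler, AZ 85226##\nWebsite: www.1stopsigns.com##\n##\n</p>##<hr>##";
echo markerTextToCsvLine($sample), PHP_EOL;
```

Fields containing the delimiter come out wrapped in double quotes, so the city line stays in a single column when the file is read back.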

Other tips

Assuming that your data follows a pattern where every record is separated by an <hr> tag and every field within it is separated by a <br />, you should be able to split out the data.

There are loads of ways to do this, but a naive way that might work using explode() might be something like:

// open a file pointer to csv
$fp = fopen('records.csv', 'w');

// first, split each record into a separate array element
$records = explode('<hr>', $str);

// then iterate over this array
foreach ($records as $record) {

    // strip tags and trim enclosing whitespace
    $stripped = trim(strip_tags($record));

    // skip empty chunks, e.g. whatever follows the last <hr>
    if ($stripped === '') {
        continue;
    }

    // explode by end-of-line
    $fields = explode(PHP_EOL, $stripped);

    // array walk over each field and trim whitespace
    array_walk($fields, function(&$field) {
        $field = trim($field);
    });

    // create row
    $row = array(
        $fields[0], // name
        $fields[1], // phone
        sprintf('%s, %s', $fields[2], $fields[3]), // address
        $fields[6], // web
    );

    // write cleaned array of fields to csv
    fputcsv($fp, $row);
}

// done
fclose($fp);

Where $str is the page data you are parsing. Hope this helps.

EDIT

Didn't notice the specific field requirements originally. Updated the example.
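The fixed positions used above (such as $fields[6]) match the snippet in the question but shift as soon as a record has an extra blank line or only one address line. A slightly more defensive sketch follows. It assumes that the first non-empty line is the name, the second is the phone number, a line starting with "Website:" is the site, and everything else is address. The function name parseRecords and these layout assumptions are illustrative and not confirmed by the question.

```php
<?php
// Hypothetical helper: split the page into records and map lines to columns.
function parseRecords(string $html): array
{
    $rows = [];
    // Records are separated by <hr> (<hr/> and <hr /> are tolerated too).
    foreach (preg_split('~<hr\s*/?>~i', $html) as $record) {
        // Treat every <br> variant as a line break, then drop remaining tags.
        $text = strip_tags(preg_replace('~<br\s*/?>~i', "\n", $record));
        // Trim each line and keep only the non-empty ones.
        $lines = array_values(array_filter(
            array_map('trim', preg_split('~\R~', $text)),
            'strlen'
        ));
        if (count($lines) < 2) {
            continue; // not enough data to be a record
        }

        $name    = array_shift($lines);
        $phone   = array_shift($lines);
        $website = '';
        $address = [];
        foreach ($lines as $line) {
            if (stripos($line, 'Website:') === 0) {
                // Assumption: store the site without its "Website:" label.
                $website = trim(substr($line, strlen('Website:')));
            } else {
                $address[] = $line;
            }
        }
        $rows[] = [$name, $phone, implode(', ', $address), $website];
    }
    return $rows;
}

// Usage: print a header row and the parsed records as CSV.
$html = "<p>\n 1 Stop Signs<br />\n 480-961-7446<br />\n500 N. 56th Street<br />\n"
      . " Chandler, AZ  85226<br />\n\n<br />\n Website: www.1stopsigns.com<br />\n"
      . " <br />\n </p><br /><hr><br />";
$out = fopen('php://output', 'w');
fputcsv($out, ['Name', 'Phone', 'Address', 'Website'], ',', '"', '\\');
foreach (parseRecords($html) as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
fclose($out);
```

Replacing php://output with a file path such as 'records.csv' would write the same rows to disk.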

By far the easiest way would be to take the block, drop everything from the <hr> tag onward, then split the string into an array on the <br /> tags.
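For a single record block, that suggestion could look roughly like the sketch below. The function name blockToFields is made up for illustration, and the pattern deliberately accepts <br>, <br/> and <br /> because the question shows more than one spelling.

```php
<?php
// Hypothetical helper: one record block in, list of field strings out.
function blockToFields(string $block): array
{
    // Drop everything from the first <hr onward, if there is one.
    $pos = stripos($block, '<hr');
    if ($pos !== false) {
        $block = substr($block, 0, $pos);
    }

    // Split on <br> tags, then clean up each piece.
    $fields = [];
    foreach (preg_split('~<br\s*/?>~i', $block) as $part) {
        $part = trim(strip_tags($part));
        if ($part !== '') {
            $fields[] = $part; // ignore pieces that were only tags or whitespace
        }
    }
    return $fields;
}

// Sample block modelled on the example in the question.
print_r(blockToFields(
    "Company Name<br/>\n555-555-555<br />\nAddress Line 1<br />\n</p><br /><hr><br />"
));
```

The resulting array can be passed straight to fputcsv(), which handles quoting of fields that contain commas.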

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow