Question

Hello I would like to create a page html and php that is able to take the data in the table contained at this link: http://www.comuni-italiani.it/province.html

I would love to have any tips, I would use the file_get_content but then I do not know how to take all the various data

Was it helpful?

Solution

Can you explain us more clearly what you want to exactly take from this page?

Anyway, to do the trick, you can use file_get_contents to fetch the page then, according to what you want to take from the page (I suppose you want to take every <td> element from the page inside a table), you may use PHP regular expressions (preg_match, preg_match_all) to fetch all the data you need.

Example for your case:

$page = file_get_contents("http://www.comuni-italiani.it/province.html");

$output = array();
preg_match_all('/<td.*.<\/td>/',$page,$output);

print_r($output);

This will output:

Array ( [0] => Array ( [0] =>    [1] => [2] => Agrigento [3] => Alessandria [4] => Ancona [5] => Aosta [6] => Arezzo [7] => Ascoli Piceno [8] => Asti [9] => Avellino [10] => Bari [11] => Barletta-Andria-Trani [12] => Belluno [13] => Benevento [14] => Bergamo [15] => Biella [16] => Bologna [17] => Bolzano [18] => Brescia [19] => Brindisi [20] => Cagliari [21] => Caltanissetta [22] => Campobasso [23] => Carbonia-Iglesias [24] => Caserta [25] => Catania [26] => Catanzaro [27] => Chieti [28] => Como [29] => Cosenza [30] => Cremona [31] => Crotone [32] => Cuneo [33] => Enna [34] => Fermo [35] => Ferrara [36] => Firenze [37] => Foggia [38] => Forlì-Cesena [39] => Frosinone [40] => Genova [41] => Gorizia [42] => Grosseto [43] => Imperia [44] => Isernia [45] => La Spezia [46] => L'Aquila [47] => Latina [48] => Lecce [49] => Lecco [50] => Livorno [51] => Lodi [52] => Lucca [53] => Macerata [54] => Mantova [55] => Massa-Carrara [56] => Matera [57] => Messina [58] => Milano [59] => Modena [60] => Monza e della Brianza [61] => Napoli [62] => Novara [63] => Nuoro [64] => Olbia-Tempio [65] => Oristano [66] => Padova [67] => Palermo [68] => Parma [69] => Pavia [70] => Perugia [71] => Pesaro e Urbino [72] => Pescara [73] => Piacenza [74] => Pisa [75] => Pistoia [76] => Pordenone [77] => Potenza [78] => Prato [79] => Ragusa [80] => Ravenna [81] => Reggio Calabria [82] => Reggio Emilia [83] => Rieti [84] => Rimini [85] => Roma [86] => Rovigo [87] => Salerno [88] => Medio Campidano [89] => Sassari [90] => Savona [91] => Siena [92] => Siracusa [93] => Sondrio [94] => Taranto [95] => Teramo [96] => Terni [97] => Torino [98] => Ogliastra [99] => Trapani [100] => Trento [101] => Treviso [102] => Trieste [103] => Udine [104] => Varese [105] => Venezia [106] => Verbano-Cusio-Ossola [107] => Vercelli [108] => Verona [109] => Vibo Valentia [110] => Vicenza [111] => Viterbo [112] => CercaNel Sito e sul WebPagine UtiliElenco Province per PopolazionePrincipali Città ItalianeLista Alfabetica RegioniAmministrazioni LocaliScuole in Italia [113] =>   ) )

which can, of course, be filtered.

In your case, for example, by adding a little foreach loop... :

$page = file_get_contents("http://www.comuni-italiani.it/province.html");

    $output = array();
    preg_match_all('/<td.*.<\/td>/',$page,$output);

    $provinces = array();

    foreach ($output as $id => $list) {
        for ($i = 2; $i <= 111; $i++) {
            array_push($provinces,$list[$i]);
        }
    }

    print_r($provinces);

Will give you this:

Array ( [0] => Agrigento [1] => Alessandria [2] => Ancona [3] => Aosta [4] => Arezzo [5] => Ascoli Piceno [6] => Asti [7] => Avellino [8] => Bari [9] => Barletta-Andria-Trani [10] => Belluno [11] => Benevento [12] => Bergamo [13] => Biella [14] => Bologna [15] => Bolzano [16] => Brescia [17] => Brindisi [18] => Cagliari [19] => Caltanissetta [20] => Campobasso [21] => Carbonia-Iglesias [22] => Caserta [23] => Catania [24] => Catanzaro [25] => Chieti [26] => Como [27] => Cosenza [28] => Cremona [29] => Crotone [30] => Cuneo [31] => Enna [32] => Fermo [33] => Ferrara [34] => Firenze [35] => Foggia [36] => Forlì-Cesena [37] => Frosinone [38] => Genova [39] => Gorizia [40] => Grosseto [41] => Imperia [42] => Isernia [43] => La Spezia [44] => L'Aquila [45] => Latina [46] => Lecce [47] => Lecco [48] => Livorno [49] => Lodi [50] => Lucca [51] => Macerata [52] => Mantova [53] => Massa-Carrara [54] => Matera [55] => Messina [56] => Milano [57] => Modena [58] => Monza e della Brianza [59] => Napoli [60] => Novara [61] => Nuoro [62] => Olbia-Tempio [63] => Oristano [64] => Padova [65] => Palermo [66] => Parma [67] => Pavia [68] => Perugia [69] => Pesaro e Urbino [70] => Pescara [71] => Piacenza [72] => Pisa [73] => Pistoia [74] => Pordenone [75] => Potenza [76] => Prato [77] => Ragusa [78] => Ravenna [79] => Reggio Calabria [80] => Reggio Emilia [81] => Rieti [82] => Rimini [83] => Roma [84] => Rovigo [85] => Salerno [86] => Medio Campidano [87] => Sassari [88] => Savona [89] => Siena [90] => Siracusa [91] => Sondrio [92] => Taranto [93] => Teramo [94] => Terni [95] => Torino [96] => Ogliastra [97] => Trapani [98] => Trento [99] => Treviso [100] => Trieste [101] => Udine [102] => Varese [103] => Venezia [104] => Verbano-Cusio-Ossola [105] => Vercelli [106] => Verona [107] => Vibo Valentia [108] => Vicenza [109] => Viterbo )

(Sorry for the huge arrays).

It is, however, keeping the links inside the array so, if you want to take the values only and NOT the anchor associated to it, just feel free to use another regular expression.

Hope this helps.

(take this as an example, keep in mind that this foreach trick may not work anymore if the page gets changed, I posted it just to give you an idea on how you may have solved that case).

OTHER TIPS

Try to look more into DOMDocument reference : http://php.net/manual/en/class.domdocument.php

Also Those questions might be helpful for you:

Getting DOM elements by classname

PHP Parse HTML code

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top