Question

I have tons of HTML files saved from a website, with data in tables in a specific format. How can I retrieve the data from those files and import it into Excel or write it to a CSV file?

The files are stored on my hard drive.

Solution

You need to read these files and parse them, then write the results to a CSV file. Please give more information so I can give a better answer. Do you have FTP access to the server where these files are stored?

EDIT:

Use PHP to iterate over the directory and find the .html files (or whichever files you need), and store the list in a variable. Then loop over that variable with foreach(), open each file, and parse it with a library such as a PHP HTML parser. Finally, write the parsed results to a CSV file.
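
The answer above suggests PHP. As a rough illustration of the same three steps (list the files, parse each one, write CSV), here is a minimal sketch in Node.js instead, to match the JavaScript used elsewhere on this page. The helper names (`decodeEntities`, `extractRows`, `toCsvLine`, `convertDirectory`) are made up for this example, and the regex-based table extraction is a simplifying assumption that only suits simple, non-nested tables. A proper HTML parser would be more robust.

```javascript
// Node.js sketch with hypothetical helper names; uses only built-in modules.
const fs = require('fs');
const path = require('path');

// Decode the few HTML entities most likely to appear in table cells.
function decodeEntities(text) {
    return text
        .replace(/&nbsp;/g, ' ')
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&quot;/g, '"')
        .replace(/&amp;/g, '&');
}

// Naive extraction: returns one array of cell strings per <tr>.
// Assumption: simple, non-nested tables with explicit closing tags.
function extractRows(html) {
    const rows = [];
    const rowPattern = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
    let rowMatch;
    while ((rowMatch = rowPattern.exec(html)) !== null) {
        const cells = [];
        const cellPattern = /<t[dh]\b[^>]*>([\s\S]*?)<\/t[dh]>/gi;
        let cellMatch;
        while ((cellMatch = cellPattern.exec(rowMatch[1])) !== null) {
            // Drop any markup inside the cell, then decode entities.
            const text = cellMatch[1].replace(/<[^>]*>/g, '');
            cells.push(decodeEntities(text).trim());
        }
        if (cells.length > 0) {
            rows.push(cells);
        }
    }
    return rows;
}

// Quote every field and double any embedded double quotes.
function toCsvLine(fields) {
    return fields.map(function (field) {
        return '"' + String(field).replace(/"/g, '""') + '"';
    }).join(',');
}

// Reads every .html file in `dir`, writes all table rows to `outFile`,
// and returns the number of rows written.
function convertDirectory(dir, outFile) {
    const lines = [];
    fs.readdirSync(dir)
        .filter(function (name) {
            return path.extname(name).toLowerCase() === '.html';
        })
        .sort()
        .forEach(function (name) {
            const html = fs.readFileSync(path.join(dir, name), 'utf8');
            extractRows(html).forEach(function (cells) {
                lines.push(toCsvLine(cells));
            });
        });
    const body = lines.map(function (line) { return line + '\r\n'; }).join('');
    fs.writeFileSync(outFile, body);
    return lines.length;
}

// Example call (paths are placeholders):
// convertDirectory('./htmlFiles', 'tables.csv');
```

Excel can open the resulting file directly. If the tables carry a header row, every file contributes its own copy, so you may want to skip the first row of all files but the first.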

Other tips

This assumes you have 20000 files whose names follow a convention like file1.html, file2.html, etc.

The HTML for the export link is

    <a class="export" id="export" href="#">Export</a>

Here is the JS, written based on those assumptions.

    // with the help of http://jsfiddle.net/terryyounghk/KPEGU/
    function exportTableToCSV($table, filename) {
        var $rows = $table.find('tr:has(td)'),
        // Temporary delimiter characters unlikely to be typed by keyboard
        // This is to avoid accidentally splitting the actual contents
        tmpColDelim = String.fromCharCode(11), // vertical tab character
        tmpRowDelim = String.fromCharCode(0), // null character

        // actual delimiter characters for CSV format
        colDelim = '","',
        rowDelim = '"\r\n"',

        // Grab text from table into CSV formatted string
        csv = '"' + $rows.map(function (i, row) {
            var $row = $(row),
                $cols = $row.find('td');

            return $cols.map(function (j, col) {
                var $col = $(col),
                    text = $col.text();

                return text.replace(/"/g, '""'); // escape every double quote, not just the first

            }).get().join(tmpColDelim);

        }).get().join(tmpRowDelim)
            .split(tmpRowDelim).join(rowDelim)
            .split(tmpColDelim).join(colDelim) + '"',

        // Data URI
        csvData = 'data:application/csv;charset=utf-8,' + encodeURIComponent(csv);

        $(this).attr({
            'download': filename,
            'href': csvData,
            'target': '_blank'
        });
    }
    // Based on http://www.2ality.com/2013/11/initializing-arrays.html
    // Returns [1, 2, ..., n] so the numbers match file1.html, file2.html, ...
    function fillArrayWithNumbers(n) {
        var arr = Array.apply(null, Array(n));
        return arr.map(function (x, i) {
            return i + 1;
        });
    }

    // This must be a hyperlink
    $(".export").on('click', function (event) {
        // CSV
        var that = this;
        var data = fillArrayWithNumbers(20000);
        // Async.js is a JS library; async.each runs the function once per item
        async.each(data, function (i, cb) {
            $.get(["./htmlFiles/file", i, ".html"].join('')).done(function (html) {
                var tables = $(html).find('table');
                $.each(tables, function () {
                    var table = $(this);
                    // Writing to individual csv file. If all the data structures are same you can merge all strings and download one.
                    // IF CSV, don't do event.preventDefault() or return false
                    // We actually need this to be a typical hyperlink
                    // Note: every call overwrites the same link's href, so only
                    // the table processed last remains as the download target.
                    exportTableToCSV.apply(that, [table, 'export.csv']);
                });
                // Signal completion once per file, not once per table
                cb();
            }).fail(function () {
                // Skip files that are missing or fail to load
                cb();
            });
        });
    });
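
The comment in the click handler mentions merging all the strings into one download when every table has the same structure. A minimal sketch of that idea follows, assuming the cell text of each table has already been collected into plain arrays. `rowsToCsv` and `buildMergedCsvUri` are hypothetical names for this example, not part of jQuery or Async.js.

```javascript
// Hypothetical helpers. `tables` is an array of tables; each table is an
// array of rows; each row is an array of cell strings taken from the HTML.

// Turn one table's rows into CSV text, quoting every field.
function rowsToCsv(rows) {
    return rows.map(function (cells) {
        return cells.map(function (cell) {
            // Double any embedded double quotes, then wrap the field in quotes.
            return '"' + String(cell).replace(/"/g, '""') + '"';
        }).join(',');
    }).join('\r\n');
}

// Concatenate the CSV text of every non-empty table and wrap it in a
// data URI that can be assigned to the link's href attribute.
function buildMergedCsvUri(tables) {
    var merged = tables
        .filter(function (rows) { return rows.length > 0; })
        .map(function (rows) { return rowsToCsv(rows); })
        .join('\r\n');
    return 'data:application/csv;charset=utf-8,' + encodeURIComponent(merged);
}
```

One way to use this would be to push each table's rows into a shared array inside the `$.get(...).done(...)` handler, then set the link's `href` once, in the optional final callback that `async.each` accepts as its third argument. With 20000 files the merged data URI could become very large, and browsers limit how long such URIs may be, so this is only practical for modest amounts of data.
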
License: CC-BY-SA with attribution
Not affiliated with StackOverflow