Question

I'm trying to find a way to delete all but the repeated rows in a file. I know it's possible to delete the duplicates in a file using something like this:

$worksheet = $objPHPExcel->getActiveSheet();
foreach ($worksheet->getRowIterator() as $row) {
    $rowIndex = $row->getRowIndex();
    foreach ($worksheet->getRowIterator($rowIndex + 1) as $testRow) {
        if ($testRow == $row) {
            $worksheet->removeRow($rowIndex);
        }
    }
}

However, I need to do pretty much the opposite: delete all the rows that don't have any duplicates in the file.

Why I need this: the duplicate rows are the intersection between M x M sets of data. It's way easier to do it directly on the file than to consider the M x M possible results.


Solution

This function removes the rows whose value in the chosen column is unique:

function removeUniqueRows($objPHPExcel, $column = 'A') {
    $worksheet = $objPHPExcel->getActiveSheet();

    $cells = array();

    foreach ($worksheet->getRowIterator() as $row) {
        $rowIndex = $row->getRowIndex();
        $cellValue = $worksheet->getCell($column.$rowIndex)->getValue();
        array_push($cells, $cellValue);       
    }

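    // array_diff_assoc() leaves the second and later copies of each repeated value;
    // array_diff() against those copies leaves only the values that occur exactly once.
    $toRemove = array_keys(array_diff($cells, array_diff_assoc($cells, array_unique($cells))));

    // Delete from the bottom up so the remaining row numbers do not shift;
    // +1 converts the 0-based array position into a 1-based row number.
    for ($i = count($toRemove)-1; $i > -1; $i--) {
        $worksheet->removeRow($toRemove[$i]+1);
    }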

    return $objPHPExcel;
}
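
To see what the array expression in this function selects, it can be run on a plain array, with no spreadsheet involved. This is only a minimal sketch: `uniquePositions` is a hypothetical helper name used for illustration, not part of PHPExcel.

```php
<?php
// Hypothetical helper: 0-based positions of the values that occur exactly once.
function uniquePositions(array $cells)
{
    // Second and later occurrences of every repeated value (keys preserved).
    $laterCopies = array_diff_assoc($cells, array_unique($cells));
    // Values that never show up among those copies occur only once in $cells.
    return array_keys(array_diff($cells, $laterCopies));
}

$cells = array('a', 'b', 'a', 'c', 'b', 'd');
print_r(uniquePositions($cells)); // positions 3 and 5, i.e. 'c' and 'd'
```

In the function, those positions become worksheet rows 4 and 6 after the `+1`.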

If you need to remove rows based on values in several columns, you can easily modify this code to concatenate the cell values into one string, or maybe work directly on whole rows (I don't know if this is possible in PHPExcel, I didn't try it).
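
On the whole-row idea, one possible sketch is to build the comparison key from every cell of a row. The example below works on plain arrays of row values (for instance what `$worksheet->toArray()` gives you, which is an assumption about how you would feed it) and uses `json_encode` rather than a separator, so a cell that happens to contain the separator character cannot make two different rows look identical. `rowKey` and `uniqueRowNumbers` are hypothetical names.

```php
<?php
// Hypothetical helper: one string key per row, built from all of its cells.
function rowKey(array $row)
{
    // json_encode keeps cell boundaries unambiguous, unlike a plain separator.
    return json_encode(array_values($row));
}

// Hypothetical helper: 1-based row numbers whose content occurs only once.
function uniqueRowNumbers(array $rows)
{
    $keys = array_map('rowKey', array_values($rows));
    $counts = array_count_values($keys);
    $result = array();
    foreach ($keys as $index => $key) {
        if ($counts[$key] === 1) {
            $result[] = $index + 1; // worksheet rows start at 1
        }
    }
    return $result;
}

$rows = array(
    array('x', 'y|z'),
    array('x|y', 'z'),   // would collide with the other rows if joined with '|'
    array('x', 'y|z'),
);
print_r(uniqueRowNumbers($rows)); // row 2 only
```

This compares more strictly than string concatenation does: the number 1 and the string '1' produce different keys.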

UPDATE:

Here are two functions. The first removes duplicated rows (leaving one row from each set of duplicates) and the second removes all unique rows completely. Both accept either a single column letter or an array of column letters, and compare rows by the values in those columns.

First function:

function removeDuplicateRows($objPHPExcel, $columns = 'A', $separator = '|') {
    $worksheet = $objPHPExcel->getActiveSheet();

    $cells = array();

    foreach ($worksheet->getRowIterator() as $row) {
        $rowIndex = $row->getRowIndex();
        if (is_array($columns)) {
            $cellValue = '';
            foreach ($columns as $column) {
                $cellValue .= $worksheet->getCell($column.$rowIndex)->getValue() . $separator;
            }
        } else {
            $cellValue = $worksheet->getCell($columns.$rowIndex)->getValue();
        }
        array_push($cells, $cellValue);       
    }

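    // array_unique() keeps the first occurrence of each key, so the difference
    // holds the 0-based positions of every later copy.
    $toRemove = array_keys(array_diff_assoc($cells, array_unique($cells)));

    // Delete from the bottom up so the remaining row numbers do not shift.
    for ($i = count($toRemove)-1; $i > -1; $i--) {
        $worksheet->removeRow($toRemove[$i]+1);
    }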

    return $objPHPExcel;
}

Second function:

function removeUniqueRows($objPHPExcel, $columns = 'A', $separator = '|') {
    $worksheet = $objPHPExcel->getActiveSheet();

    $cells = array();

    foreach ($worksheet->getRowIterator() as $row) {
        $rowIndex = $row->getRowIndex();
        if (is_array($columns)) {
            $cellValue = '';
            foreach ($columns as $column) {
                $cellValue .= $worksheet->getCell($column.$rowIndex)->getValue() . $separator;
            }
        } else {
            $cellValue = $worksheet->getCell($columns.$rowIndex)->getValue();
        }
        array_push($cells, $cellValue);       
    }

    // 0-based positions of the keys that occur exactly once.
    $toRemove = array_keys(array_diff($cells, array_diff_assoc($cells, array_unique($cells))));

    for ($i = count($toRemove)-1; $i > -1; $i--) {
        $worksheet->removeRow($toRemove[$i]+1);
    }

    return $objPHPExcel;
}

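If I understood correctly what you need, you have to do something like this, which first drops the rows that have no duplicate and then keeps one row from each remaining group of duplicates: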

$columns = array('A', 'B', 'G', 'H');
removeDuplicateRows(removeUniqueRows($objPHPExcel, $columns), $columns);
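
As a sanity check on that combined call, its net effect can be imitated on plain arrays: rows whose key never repeats disappear, and each repeated key survives exactly once. This is a minimal sketch, not the PHPExcel code itself; `survivingRowNumbers` is a hypothetical helper that works on precomputed keys (one string per row).

```php
<?php
// Hypothetical helper: original 1-based row numbers that survive both passes.
function survivingRowNumbers(array $keys)
{
    $keys = array_values($keys);
    $counts = array_count_values($keys);
    $seen = array();
    $kept = array();
    foreach ($keys as $index => $key) {
        // Equivalent of removeUniqueRows: skip keys that occur only once.
        if ($counts[$key] < 2) {
            continue;
        }
        // Equivalent of removeDuplicateRows: keep the first row of each group.
        if (!isset($seen[$key])) {
            $seen[$key] = true;
            $kept[] = $index + 1;
        }
    }
    return $kept;
}

$keys = array('a|1|', 'b|2|', 'a|1|', 'c|3|', 'b|2|', 'a|1|');
print_r(survivingRowNumbers($keys)); // original rows 1 and 2
```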
Licensed under: CC-BY-SA with attribution