Question

I have a CSV file as follows

***Client Name: abc***,
,
# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,
Date/Time (GMT),abc
6/6/2013,1
6/11/2013,3
6/12/2013,2
6/13/2013,1
6/14/2013,2
6/15/2013,4
6/17/2013,4
6/18/2013,8
6/19/2013,7
# *** Interval: Daily ***,
,
***Client Name: abc***,
,
# ----------------------------------------,
# Facebook Insights : Likes by Source,
# ----------------------------------------,
Sources,Likes
Mobile,3602
Page Profile,470
Recommended Pages,86
Ads,64
Like Story,49
Mobile Sponsored Page You May Like,44
Page Browser,33
Search,22
Timeline,16
Mobile Page Suggestions On Liking,15
3 more sources,48
,
***Client Name: xyz***,
,
# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,
Date/Time (GMT),xyz
6/12/2013,1
# *** Interval: Daily ***,
,
***Client Name: pqr***,
,
# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,
Date/Time (GMT),pqr
6/6/2013,2
6/7/2013,3
6/9/2013,6
6/10/2013,1
6/12/2013,4
6/13/2013,1
6/14/2013,9
6/15/2013,5
6/16/2013,1
6/18/2013,2
6/19/2013,2
# *** Interval: Daily ***,

From this I want to extract the Twitter : Mentions - Count data and save it in a database.

I want the content between

# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,

and

 # *** Interval: Daily ***,

How can I match this pattern in PHP? Is there a PHP class that matches a pattern in a file, or how can I do this with a regex?

I don't have any experience with pattern matching. So far I have only tried reading the CSV file with fgetcsv():

$file = fopen($uploaded_file_path, 'r');
echo "<pre>";
while (($line = fgetcsv($file)) !== FALSE) {
    print_r($line);
}
fclose($file);

Solution

Description

This regex finds each "Twitter : Mentions - Count" section header and captures the section body into group 1.

^\#\sTwitter\s:\sMentions\s-\sCount,[\s\r\n]+    # match the header
^\#\s----------------------------------------,[\s\r\n]+   # match the separator line
(^(?:(?!\#\s\*\*\*\sInterval:\sDaily\s\*\*\*,).)*)    # match the rest of the string up to the first Interval: Daily line


Expanded

  • The first section simply finds the start of each block. It is a lot of characters, but it is largely straightforward.

    • ^ matches the start of a line; this requires the multiline option, which is usually m
    • \#\sTwitter\s:\sMentions\s-\sCount, matches this exact string. Each \s matches a space character and the # is escaped; I do this because I like to use the ignore-whitespace option, which is usually x
    • [\s\r\n]+ matches one or more whitespace or newline characters (\s already covers \r and \n, so \s+ alone would behave the same)
    • ^\#\s----------------------------------------,[\s\r\n]+ matches the separator line, from the start of the line ^ to the newline character at the end
  • The second section captures the body of the block, and is where the real magic happens.

    • ( starts capture group 1
    • ^ ensures we begin at the start of a line, so that the lookahead which follows validates properly
    • (?: starts a non-capture group. This group terminates itself when it reaches the undesirable string inside the negative lookahead, so it ends up consuming every character between the section title above and the finish string.
    • (?! starts the negative lookahead, which validates that we do not travel into the closing text that marks the end of the section
    • \#\s\*\*\*\sInterval:\sDaily\s\*\*\*, matches the undesirable text. If this is found at the current position, the negative lookahead fails
    • ) closes the negative lookahead
    • . matches any character; this expects the "dot matches new line" option, which is usually s
    • ) closes the non-capture group
    • * allows the non-capture group to repeat zero or more times
    • ) closes capture group 1. Since all of this happens inside the capture group, every matched . is stored here.
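
The tempered lookahead is easier to follow on a tiny input. Below is a minimal sketch, assuming made-up START and END markers in place of the real header and Interval lines; the marker names and the $text sample are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical text with two blocks; START and END are made-up markers.
$text = "START\na,1\nb,2\nEND\nnoise\nSTART\nc,3\nEND\n";

// m: ^ matches at each line start, s: dot also matches newlines,
// x: whitespace and comments inside the pattern are ignored.
$pattern = '/^START\s+          # opening marker and the line break after it
            ((?:(?!^END).)*)    # take one character at a time until END starts a line
           /msx';

preg_match_all($pattern, $text, $matches);

// $matches[1] holds one captured body per block; "noise" is never captured.
print_r($matches[1]);
```

The lookahead is checked before every single character, which is why the capture stops exactly where END begins rather than at the last END in the string.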

PHP Example

Live Example: http://www.rubular.com/r/stgaiBeSE1

Sample Text

(The same CSV text as shown in the question.)

Code

<?php
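// Placeholder: load the whole CSV here, e.g. with file_get_contents($uploaded_file_path).
$sourcestring="your source string";
// Flags: i = case-insensitive, m = ^ matches at each line start,
// s = dot matches newlines, x = whitespace in the pattern is ignored.
preg_match_all('/^\#\sTwitter\s:\sMentions\s-\sCount,[\s\r\n]+
^\#\s----------------------------------------,[\s\r\n]+
(^(?:(?!\#\s\*\*\*\sInterval:\sDaily\s\*\*\*,).)*)/imsx',$sourcestring,$matches);
// $matches[1] holds the captured body of each section.
echo "<pre>".print_r($matches,true)."</pre>";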
?>

Matches from Capture Group 1

[0] => Date/Time (GMT),abc
    6/6/2013,1
    6/11/2013,3
    6/12/2013,2
    6/13/2013,1
    6/14/2013,2
    6/15/2013,4
    6/17/2013,4
    6/18/2013,8
    6/19/2013,7

[1] => Date/Time (GMT),xyz
    6/12/2013,1

[2] => Date/Time (GMT),pqr
    6/6/2013,2
    6/7/2013,3
    6/9/2013,6
    6/10/2013,1
    6/12/2013,4
    6/13/2013,1
    6/14/2013,9
    6/15/2013,5
    6/16/2013,1
    6/18/2013,2
    6/19/2013,2

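
Since the goal is to store the data, the captured text still has to be split into rows. A rough sketch of that step follows, assuming a shortened excerpt of the question's CSV. The $rows layout (client, date, mentions) is my own assumption, and the separator is matched with -+ instead of a fixed run of hyphens to keep the pattern short.

```php
<?php
// Shortened excerpt of the CSV from the question (nowdoc, so nothing is interpolated).
$csv = <<<'CSV'
# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,
Date/Time (GMT),xyz
6/12/2013,1
# *** Interval: Daily ***,
,
# ----------------------------------------,
# Twitter : Mentions - Count,
# ----------------------------------------,
Date/Time (GMT),pqr
6/6/2013,2
6/7/2013,3
# *** Interval: Daily ***,
CSV;

// Same idea as the regex above; only the separator line is abbreviated to -+.
$pattern = '/^\#\sTwitter\s:\sMentions\s-\sCount,\s+
^\#\s-+,\s+
(^(?:(?!\#\s\*\*\*\sInterval:\sDaily\s\*\*\*,).)*)/msx';
preg_match_all($pattern, $csv, $matches);

$rows = [];
foreach ($matches[1] as $body) {
    // Split the captured body into lines; the first one is the column header.
    $lines  = preg_split('/\R/', trim($body));
    $header = str_getcsv(array_shift($lines), ',', '"', '\\');
    $client = $header[1];   // second header column carries the client name

    foreach ($lines as $line) {
        [$date, $count] = str_getcsv($line, ',', '"', '\\');
        $rows[] = ['client' => $client, 'date' => $date, 'mentions' => (int) $count];
    }
}

print_r($rows);
```

Each entry in $rows could then be bound to a prepared INSERT statement (for example through PDO); the table layout is left open here because the question does not describe it.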

OTHER TIPS

Try this helper, which reads every row of the CSV into an array:

// Declared public static, so it belongs inside a class.
public static function csv_to_array($filename = '', $delimiter = ',')
{
    if (!file_exists($filename) || !is_readable($filename)) {
        return FALSE;
    }

    $data = array();
    if (($handle = fopen($filename, 'r')) !== FALSE) {
        // 1000 is the maximum line length fgetcsv() reads at once.
        while (($row = fgetcsv($handle, 1000, $delimiter)) !== FALSE) {
            $data[] = $row;
        }
        fclose($handle);
    }
    return $data;
}
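
Reading line by line can also isolate the Twitter sections without a regex, as long as the loop tracks whether it is currently inside one. A minimal sketch, assuming the marker lines appear exactly as in the question; extract_twitter_mentions() is a hypothetical helper name, and the php://memory stream only stands in for the uploaded file.

```php
<?php
/**
 * Hypothetical helper: collects the rows of every "Twitter : Mentions - Count"
 * section from an open file handle.
 */
function extract_twitter_mentions($handle): array
{
    $sections = [];
    $current  = null;    // rows of the section being read, or null when outside one
    $sawTitle = false;   // true right after the Twitter title line

    while (($line = fgets($handle)) !== false) {
        $line = trim($line);
        if ($line === '# Twitter : Mentions - Count,') {
            $sawTitle = true;                  // the separator line comes next
        } elseif ($sawTitle && strpos($line, '# ---') === 0) {
            $sawTitle = false;
            $current  = [];                    // data starts on the following line
        } elseif ($current !== null && $line === '# *** Interval: Daily ***,') {
            $sections[] = $current;            // closing marker: store the section
            $current    = null;
        } elseif ($current !== null && $line !== '') {
            $current[] = str_getcsv($line, ',', '"', '\\');
        }
    }
    return $sections;
}

// Demo on an in-memory stream; a real script would pass fopen($path, 'r') instead.
$handle = fopen('php://memory', 'w+');
fwrite($handle,
    "# ----------------------------------------,\n" .
    "# Twitter : Mentions - Count,\n" .
    "# ----------------------------------------,\n" .
    "Date/Time (GMT),xyz\n" .
    "6/12/2013,1\n" .
    "# *** Interval: Daily ***,\n" .
    ",\n" .
    "# ----------------------------------------,\n" .
    "# Facebook Insights : Likes by Source,\n" .
    "# ----------------------------------------,\n" .
    "Sources,Likes\n" .
    "Mobile,3602\n"
);
rewind($handle);
$sections = extract_twitter_mentions($handle);
fclose($handle);

print_r($sections);
```

The Facebook block is skipped because its title never sets $sawTitle, so its rows are never collected.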
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow