Question

Here is an example of the data I'm trying to parse:

//lots of html source code
['unimportant','example data']
$(function() {

 var graph = new lineChart('chart-im-looking-for',
 {
  'width' : 1100, 'height' : 600,
  'font' : 'Arial',
  //more javascript code here
  'groups':
    [
    {
    //more javascript code here
    'values' : [
        {'x':1386374400, 'y':49.82, 'hover':['12/7',49.82], 'xlabel':'12/7'},
        {'x':1386460800, 'y':47.67, 'hover':['12/8',47.67], 'xlabel':'12/8'},
        {'x':1386547200, 'y':45.43, 'hover':['12/9',45.43], 'xlabel':'12/9'},
        {'x':1386633600, 'y':47, 'hover':['12/10',47], 'xlabel':'12/10'},
        {'x':1386720000, 'y':46.81, 'hover':['12/11',46.81], 'xlabel':'12/11'}
    ]
    }
    ]
  });
['unimportant','example data']
//lots of html source code

I need to get the data from the 'hover' arrays into two capture groups (the date label and the value). Note that there are more charts in the same format before and after the chart whose data I need, so I need to identify the right one by its id ('chart-im-looking-for' in this example).

I tried the following regex:

(?<=lineChart\('chart-im-looking-for'.*?'values'.*?)\['(.*?)',(.*?)]

If I use it with the 'dot matches newline' option, it captures all the hover data in the values array. However, I don't know how to limit the matching to the values array only, so it also captures the ['unimportant','example data'] pair that comes after the end of the array. I'm testing with RegexBuddy and will be using the .NET regex engine. Any help, please?

Edit: I'd rather avoid making any assumptions in the regex about the formatting of the document, such as the position of whitespace (including line breaks).
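To reproduce the problem outside RegexBuddy, here is a small sketch in JavaScript, chosen because the page's own script is JavaScript and because, like .NET, it allows variable-length lookbehind (since ES2018). The s flag stands in for the 'dot matches newline' option, and the variable names page, questionRe and pairs are only illustrative. It runs the regex from the question over the sample above and lists what each match captures.

```javascript
// Sample page from the question (the variable name `page` is illustrative).
const page = `//lots of html source code
['unimportant','example data']
$(function() {

 var graph = new lineChart('chart-im-looking-for',
 {
  'width' : 1100, 'height' : 600,
  'font' : 'Arial',
  //more javascript code here
  'groups':
    [
    {
    //more javascript code here
    'values' : [
        {'x':1386374400, 'y':49.82, 'hover':['12/7',49.82], 'xlabel':'12/7'},
        {'x':1386460800, 'y':47.67, 'hover':['12/8',47.67], 'xlabel':'12/8'},
        {'x':1386547200, 'y':45.43, 'hover':['12/9',45.43], 'xlabel':'12/9'},
        {'x':1386633600, 'y':47, 'hover':['12/10',47], 'xlabel':'12/10'},
        {'x':1386720000, 'y':46.81, 'hover':['12/11',46.81], 'xlabel':'12/11'}
    ]
    }
    ]
  });
['unimportant','example data']
//lots of html source code
`;

// The question's pattern; the `s` (dotAll) flag plays the role of "dot matches newline".
const questionRe = /(?<=lineChart\('chart-im-looking-for'.*?'values'.*?)\['(.*?)',(.*?)\]/gs;

// Collect [group 1, group 2] for every match.
const pairs = [...page.matchAll(questionRe)].map((m) => [m[1], m[2]]);
console.log(pairs);
```

The first five pairs are the hover data; the sixth comes from the ['unimportant','example data'] line after the chart, because nothing in the pattern stops at the closing ] of the values array.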


Solution

Use this regex to grab a specific chart:

new lineChart\(['"]([^"']+)["'],\s*(.+?})\s*\)\s*;

With RegexOptions.Singleline added, this matches up to the first occurrence of }); (whitespace is allowed between the three characters), which is as robust as I can come up with on the spot.

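The first capture group gives you the chart name, so you can loop over all matches and keep the one named 'chart-im-looking-for'. The second capture group gives you the complete options object. Note that it is a JavaScript object literal rather than strict JSON (it uses single-quoted strings and may contain // comments), so it needs a lenient parser that accepts those; then you can access 'groups' > 'values' > 'hover' at will.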

See it live (in Ruby, but it should work for .NET as well).
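A sketch of the whole approach in JavaScript, hedged in a few ways: the page variable, the extra 'some-other-chart' chart and the helper names findChartOptions and extractHovers are hypothetical; the s flag stands in for RegexOptions.Singleline; and because the captured object is not strict JSON, the sketch swaps the JSON parser for a second, narrower regex that pulls the hover pairs out of the selected chart's text.

```javascript
// Hypothetical page: an extra chart ('some-other-chart') before the one we want.
const page = `//lots of html source code
['unimportant','example data']
$(function() {
 var other = new lineChart('some-other-chart',
 {
  'groups': [ { 'values' : [ {'x':1, 'y':1, 'hover':['1/1',1], 'xlabel':'1/1'} ] } ]
 });
});
$(function() {
 var graph = new lineChart('chart-im-looking-for',
 {
  'width' : 1100, 'height' : 600,
  //more javascript code here
  'groups':
    [
    {
    'values' : [
        {'x':1386374400, 'y':49.82, 'hover':['12/7',49.82], 'xlabel':'12/7'},
        {'x':1386460800, 'y':47.67, 'hover':['12/8',47.67], 'xlabel':'12/8'},
        {'x':1386547200, 'y':45.43, 'hover':['12/9',45.43], 'xlabel':'12/9'},
        {'x':1386633600, 'y':47, 'hover':['12/10',47], 'xlabel':'12/10'},
        {'x':1386720000, 'y':46.81, 'hover':['12/11',46.81], 'xlabel':'12/11'}
    ]
    }
    ]
  });
['unimportant','example data']
`;

// The answer's pattern: group 1 = chart id, group 2 = the options object literal.
const chartRe = /new lineChart\(['"]([^"']+)["'],\s*(.+?\})\s*\)\s*;/gs;

// Hypothetical helper: walk every chart and return the options text of the wanted id.
function findChartOptions(source, chartId) {
  for (const m of source.matchAll(chartRe)) {
    if (m[1] === chartId) return m[2];
  }
  return null;
}

// Hypothetical second pass used instead of a lenient JSON parser:
// capture the label and value of every 'hover' array in one chart's options.
const hoverRe = /'hover'\s*:\s*\[\s*'([^']*)'\s*,\s*([^\]]*?)\s*\]/g;

function extractHovers(options) {
  return [...options.matchAll(hoverRe)].map((m) => ({ label: m[1], value: Number(m[2]) }));
}

const hovers = extractHovers(findChartOptions(page, "chart-im-looking-for"));
console.log(hovers);
```

Splitting the work this way keeps each pattern simple: the hover pattern only ever sees one chart's text, so it cannot leak into the data that follows, and its \s* pieces tolerate any whitespace or line breaks, in line with the question's edit.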

OTHER TIPS

You have to change the pattern slightly so that it 1) only matches (hover) arrays inside the 'values' array, and 2) only matches those in the correct values array. This is my attempt:

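(?<=lineChart\('chart-im-looking-for'[^\$]*?'values'\s+:\s+\[(?:[^\[\]]|\[[^\[\]]*\])*)\['(.*?)',(.*?)]

For point 1), (?:[^\[\]]|\[[^\[\]]*\])* only steps over characters that are not brackets and over complete [...] sub-arrays, so it cannot run past the closing ] of the values array. For point 2), [^\$]*? followed by 'values'\s+:\s+\[ ties the match to the values array of the right chart. The label and value of each hover array end up in capture groups 1 and 2.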

This regex makes some assumptions about the structure of the document: that there is no $ sign between 'chart-im-looking-for' and its 'values', and that 'values' is followed by <whitespace>:<whitespace>[ (at least one whitespace character on each side of the colon).

edit: it should match all hover arrays now.
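A quick check of this values-only lookbehind in JavaScript, which, like .NET, allows variable-length lookbehind (since ES2018). The sketch runs the pattern with the repetition written as (?:[^\[\]]|\[[^\[\]]*\])* next to a looser variant written with [^\[] instead. Since [^\[] also matches ], the looser repetition can step over the closing bracket of the values array and pick up the trailing ['unimportant','example data'] pair as well. The page variable and the pairsOf helper are illustrative.

```javascript
// Sample page from the question (the variable name `page` is illustrative).
const page = `//lots of html source code
['unimportant','example data']
$(function() {

 var graph = new lineChart('chart-im-looking-for',
 {
  'width' : 1100, 'height' : 600,
  'font' : 'Arial',
  //more javascript code here
  'groups':
    [
    {
    //more javascript code here
    'values' : [
        {'x':1386374400, 'y':49.82, 'hover':['12/7',49.82], 'xlabel':'12/7'},
        {'x':1386460800, 'y':47.67, 'hover':['12/8',47.67], 'xlabel':'12/8'},
        {'x':1386547200, 'y':45.43, 'hover':['12/9',45.43], 'xlabel':'12/9'},
        {'x':1386633600, 'y':47, 'hover':['12/10',47], 'xlabel':'12/10'},
        {'x':1386720000, 'y':46.81, 'hover':['12/11',46.81], 'xlabel':'12/11'}
    ]
    }
    ]
  });
['unimportant','example data']
//lots of html source code
`;

// [^\[\]] excludes both brackets, so the lookbehind can only step over plain
// characters and complete [...] sub-arrays inside 'values'.
const tightRe =
  /(?<=lineChart\('chart-im-looking-for'[^\$]*?'values'\s+:\s+\[(?:[^\[\]]|\[[^\[\]]*\])*)\['(.*?)',(.*?)\]/g;

// Looser variant: [^\[] also matches ], so the repetition can run past the array's end.
const looseRe =
  /(?<=lineChart\('chart-im-looking-for'[^\$]*?'values'\s+:\s+\[(?:[^\[]|\[[^\[]*\])*)\['(.*?)',(.*?)\]/g;

// Collect [label, value] for every match of a pattern.
const pairsOf = (re) => [...page.matchAll(re)].map((m) => [m[1], m[2]]);

console.log(pairsOf(tightRe).length); // 5
console.log(pairsOf(looseRe).length); // 6
```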

Licensed under: CC-BY-SA with attribution