Question

I am trying to use PSQL, specifically AWS Redshift, to parse a line. Sample data follows:

{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}
{"appId":"sx-voice-call","b.level":76,"foreground":9}

I am trying the following regex to extract the appId field, but my query is returning empty fields.

'appId\":\"[\w*]\",'

Query

SELECT app_params,
   regexp_substr(app_params, 'appId\":\"[\w*]\",')
FROM sample;

Solution

You can do that as follows:

(\"appId\":\"[^"]*\")(?:,)

Demo: http://regex101.com/r/xP0hW3

The first extracted group is what you want.
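Your regex was not matching because \w does not match -. In addition, [\w*] is a character class that matches only a single character, since the * inside the brackets is a literal rather than a quantifier.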
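To make the failure concrete, here is a minimal sketch in Python that runs the question's pattern and this answer's pattern against the two sample rows. It assumes Python's re engine, which is not the engine Redshift uses, so treat it as an illustration of the regex logic rather than of Redshift's behavior. The variable names are made up for the example.

```python
import re

# Sample rows from the question.
rows = [
    '{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}',
    '{"appId":"sx-voice-call","b.level":76,"foreground":9}',
]

# Pattern from the question: [\w*] is a character class, so it matches
# exactly one character (a word character or a literal *).
original = re.compile(r'appId":"[\w*]",')

# Even with a + quantifier, \w stops at the hyphen in the values.
word_only = re.compile(r'appId":"\w+",')

# Pattern from this answer: group 1 is the key/value pair, and the trailing
# (?:,) requires a comma right after the closing quote.
fixed = re.compile(r'("appId":"[^"]*")(?:,)')

original_hits = [original.search(r) for r in rows]
word_only_hits = [word_only.search(r) for r in rows]
fixed_groups = [fixed.search(r).group(1) for r in rows]

print(original_hits)   # no match on either row
print(word_only_hits)  # no match on either row
print(fixed_groups)    # the "appId":"..." pair from each row
```

Note that group 1 still includes the key and the surrounding quotes, and that the trailing (?:,) means a row where appId is the last key would not match.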

OTHER TIPS

Adding this here despite this being an old question since it may help someone viewing this down the road...

If your lines of data are valid JSON, you can use Redshift's JSON_EXTRACT_PATH_TEXT function to extract the value of a given key. Emphasis on the JSON being valid: if even one line cannot be parsed, the query fails and Redshift throws a JSON parsing error.

Example using given data:

select json_extract_path_text('{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}','appId');

returns sx-calllog
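The same parse-then-look-up-a-key idea, and the way a single malformed line breaks it, can be sketched outside Redshift with Python's standard json module. This is not Redshift code. The helper name extract_app_id and the third, deliberately broken row are assumptions added for illustration.

```python
import json

rows = [
    '{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}',
    '{"appId":"sx-voice-call","b.level":76,"foreground":9}',
    '{"appId":"broken-row"',  # hypothetical malformed line, not from the question
]

def extract_app_id(line):
    """Return the appId value, or None when the line is not valid JSON."""
    try:
        return json.loads(line).get("appId")
    except json.JSONDecodeError:
        # Without this guard, one bad line would abort the whole run.
        return None

app_ids = [extract_app_id(line) for line in rows]
print(app_ids)
```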

This is especially useful since Redshift's regex is POSIX, which does not support lookahead/lookbehind or extracting groups.

You can try using a look behind and a look ahead to isolate just the text inside the quotes for the appId: (?<=appId\":\")(?=.*\",)[^\"]*. I tested this against the examples you provided.

To explain the regex a bit more: (?<=appId\":\")(?=.*\",)[^\"]*

  1. (?<=appId\":\"): positive look behind for appId":". Since you don't want the appId text itself being returned (just the value), you can preface the regex with a look behind to say "find me the following regex, but only when it follows the look behind text".
  2. (?=.*\",): positive look ahead for the ending ",. You don't want quotes to be returned in your match, but as with number 1 you want your regex to be bounded a bit and a look ahead does that.
  3. [^\"]*: The actual matching portion. You want to find the string of chars that are NOT ". This will match the entire value and stop matching right before the closing ".

EDIT: Changed the 3rd step a little bit: removed the , from that last piece, since it is not needed and would break the match if the value were to actually contain a ,.
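The three pieces of this regex can be checked with a short Python sketch. It assumes Python's re engine, which supports look behind and look ahead, so it says nothing about whether the pattern is accepted by Redshift's own regex engine. The row with appId as the last key is a hypothetical input added to show a limitation of the look ahead.

```python
import re

# Sample rows from the question.
rows = [
    '{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}',
    '{"appId":"sx-voice-call","b.level":76,"foreground":9}',
]

# Look behind for appId":", look ahead for a later ",, then match everything
# up to (but not including) the next double quote.
value_only = re.compile(r'(?<=appId":")(?=.*",)[^"]*')

values = [value_only.search(r).group(0) for r in rows]
print(values)

# Hypothetical row where appId is the last key: no ", follows the value,
# so the look ahead fails and nothing is matched.
last_key_row = '{"b.level":76,"appId":"sx-last"}'
last_key_hit = value_only.search(last_key_row)
print(last_key_hit)
```

Dropping the look ahead, leaving (?<=appId\":\")[^\"]*, would also cover the last-key case, since the [^\"]* part already stops at the closing quote.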

Licensed under: CC-BY-SA with attribution