SED extract value

https://stackoverflow.com//questions/12688898

12-12-2019

Question

Can anybody please help me use sed to get the values of time, lat and lon from the text below?

{"class":"TPV","tag":"MID2","device":"/dev/ttyUSB0","mode":3,"time":"2012-10-02T10:43:21.000Z","ept":0.005,"lat":55.190682291,"lon":25.265912847,"alt":19.149,"epx":58.300,"epy":74.796,"epv":144.575,"track":148.2723,"speed":1.623,"climb":-1.471,"eps":149.59}

Solution

This is fairly trivial with GNU awk:

awk -F, '{ for (i=1; i<=NF; i++) if ($i ~ /time|lat|lon/) { match($i, /^\"([^\"]+)\":\"?([^\"]+)\"?/, array); printf "%s: %s\n", array[1], array[2] } }' file.txt

Results:

time: 2012-10-02T10:43:21.000Z
lat: 55.190682291
lon: 25.265912847
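The three-argument form of match() used above is a GNU awk extension. Where only a POSIX awk is available, something like the following sketch may work instead. It assumes one flat JSON object per line with no commas inside values, uses a shortened copy of the sample record, and the function name extract_fields is made up for illustration.

```shell
# Shortened copy of the record from the question (assumption: one flat JSON object per line).
line='{"class":"TPV","mode":3,"time":"2012-10-02T10:43:21.000Z","ept":0.005,"lat":55.190682291,"lon":25.265912847,"alt":19.149}'

# Hypothetical helper: prints "key: value" for time, lat and lon using only POSIX awk features.
extract_fields() {
  awk -F, '{
    for (i = 1; i <= NF; i++) {
      field = $i
      gsub(/[{}"]/, "", field)        # drop braces and double quotes
      pos = index(field, ":")         # the first colon separates key from value
      key = substr(field, 1, pos - 1)
      val = substr(field, pos + 1)
      if (key == "time" || key == "lat" || key == "lon")
        printf "%s: %s\n", key, val
    }
  }'
}

result=$(printf '%s\n' "$line" | extract_fields)
printf '%s\n' "$result"
```

Comparing the key exactly, rather than with a regex, avoids accidental hits on keys that merely contain "lat" or "lon".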

OTHER TIPS

$ grep -oP '"lat":\K[\d.]+' file
$ grep -oP '"lon":\K[\d.]+' file
$ grep -oP '"time":"\K[^"]+' file
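The -P option and \K need a grep built with PCRE support. As a rough fallback, the same values can probably be captured into shell variables with plain sed. This is only a sketch on a shortened copy of the sample record, and the variable names are arbitrary.

```shell
# Shortened copy of the record from the question.
line='{"class":"TPV","mode":3,"time":"2012-10-02T10:43:21.000Z","ept":0.005,"lat":55.190682291,"lon":25.265912847,"alt":19.149}'

# -n suppresses automatic printing; the p flag prints only when the substitution matched.
lat=$(printf '%s\n' "$line" | sed -n 's/.*"lat":\([0-9.-]*\).*/\1/p')
lon=$(printf '%s\n' "$line" | sed -n 's/.*"lon":\([0-9.-]*\).*/\1/p')
ts=$(printf '%s\n' "$line" | sed -n 's/.*"time":"\([^"]*\)".*/\1/p')

printf 'time=%s lat=%s lon=%s\n' "$ts" "$lat" "$lon"
```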

With egrep and sed

<infile egrep -o '"(lat|lon|time)":"?[^,]*' | sed 's/[^:]*://'

Output:

"2012-10-02T10:43:21.000Z"
55.190682291
25.265912847

Append tr -d '"' to the pipeline if you don't like double-quotes.
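For reference, the whole pipeline with the tr step appended might look like the sketch below. It uses grep -E in place of egrep (recent GNU grep releases warn that egrep is obsolescent) and a shortened copy of the sample record.

```shell
# Shortened copy of the record from the question.
line='{"class":"TPV","mode":3,"time":"2012-10-02T10:43:21.000Z","ept":0.005,"lat":55.190682291,"lon":25.265912847,"alt":19.149}'

values=$(printf '%s\n' "$line" |
  grep -Eo '"(lat|lon|time)":"?[^,]*' |   # one "key":value match per line
  sed 's/[^:]*://' |                      # strip everything up to the first colon
  tr -d '"')                              # remove the remaining double quotes

printf '%s\n' "$values"
```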

With sed alone

<infile sed -r 's/"(lat|lon|time)":"?([^,"]*)/\n\2\n/g' | sed -n '2~2p'

Output:

2012-10-02T10:43:21.000Z
55.190682291
25.265912847

The first sed separates matches so they will be on every other line, the second picks them out.
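The 2~2p address is a GNU sed extension. With other sed implementations, 'n;p' under -n should pick the same even-numbered lines. The sketch below demonstrates only that second stage, using made-up placeholder lines (prefix, middle, suffix) to stand in for the fragments the first sed leaves around the values.

```shell
# Hypothetical stand-in for the first sed's output: the wanted values sit on lines 2, 4 and 6.
split_output=$(printf '%s\n' 'prefix' '2012-10-02T10:43:21.000Z' 'middle' '55.190682291' ',' '25.265912847' 'suffix')

# n loads the next line into the pattern space, p prints it; repeat for each pair of lines.
picked=$(printf '%s\n' "$split_output" | sed -n 'n;p')

printf '%s\n' "$picked"
```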

With tr and grep

<infile tr ',' '\n' | grep 'time\|lon\|lat'

Output:

"time":"2012-10-02T10:43:21.000Z"
"lat":55.190682291
"lon":25.265912847
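Note that 'time\|lon\|lat' matches those strings anywhere in a field, and \| in a basic regular expression is a GNU extension. A slightly stricter variant, sketched here on a shortened copy of the sample record, anchors the match to the quoted key at the start of each line.

```shell
# Shortened copy of the record from the question.
line='{"class":"TPV","mode":3,"time":"2012-10-02T10:43:21.000Z","ept":0.005,"lat":55.190682291,"lon":25.265912847,"alt":19.149}'

# Split on commas, then keep only lines that start with one of the three quoted keys.
pairs=$(printf '%s\n' "$line" | tr ',' '\n' | grep -E '^"(time|lat|lon)":')

printf '%s\n' "$pairs"
```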

I would do (as a sed script):

#!/bin/sed -f

h;G;G

s/[^\n]*"lat"\s*:\s*\([0-9.]*\)[^\n]*/\1/
s/[^\n]*"lon"\s*:\s*\([0-9.]*\)[^\n]*/\1/
s/\n[^\n]*"time"\s*:\s*"\([^"]*\)".*$/\
\1/

The three commands on the first line (h;G;G) copy the line twice. They do this by copying the input line into an auxiliary buffer (called the hold space) with the "h" command, and then appending the contents of the hold space to the pattern space (i.e. the working buffer) with the "G" command, twice. Each "G" appends a newline followed by the hold space, so we now have three copies of the line separated by newlines.
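To see the effect of h;G;G in isolation, it can be run on a short throwaway line (the input abc is just a placeholder):

```shell
# h copies the pattern space to the hold space; each G appends a newline plus the hold space.
copies=$(printf '%s\n' 'abc' | sed 'h;G;G')

printf '%s\n' "$copies"
```

This should print abc three times, one copy per line.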

For simplicity and generality, there are three separate commands to extract the data, but they all follow the same format:

Skip some characters until we find our key. Beware that in the first two commands we must skip only characters that aren't newlines ([^\n]*); otherwise, because matching is greedy, they would affect the lines below them (i.e. if you skip as many characters as you can before finding a "lat", you will skip the first two lines because the third line also contains "lat"). In the last command, you may skip any character (.*), but you must first skip a newline character to prevent it from matching the previous lines.
Skip the key
Skip zero or more white space characters (\s*)
Skip the colon
Skip more optional white space characters
Capture the data. A capture is specified by the backslashed parentheses (i.e. the \( and the \)), and it stores the input that matches the expression between the parentheses in an auxiliary "variable" called \1 (if you have more than one capture group, the second is called \2, the third \3, and so on up to \9). In the first two commands we match a series of digits or periods ([0-9.]*). In the last command, we capture any characters that aren't a double quote ([^"]*), but we also skip one double quote before and one after the capture group (i.e. the opening and closing double quotes).
Skip more characters. We skip as many characters that aren't a newline as we can, so we effectively skip to the end of the line.

Finally, in each command we replace the match with the capture result. In the last command, since we match and therefore skip the newline separating the second and third lines, we must include it in the replacement. To include it, we add a backslash followed by an actual newline character. That's why the replacement is split across two lines.
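The two ideas above, capturing a value while tolerating optional white space and putting a newline in the replacement, can be tried separately on small inputs. This is only a sketch: it uses [[:space:]] in place of \s (which is a GNU extension), and both input strings are invented for the demonstration.

```shell
# Capture: skip to the key, optional spaces, the colon, optional spaces, then keep the number.
lat=$(printf '%s\n' '{"lat" : 55.190682291,"lon":25.265912847}' |
  sed 's/.*"lat"[[:space:]]*:[[:space:]]*\([0-9.]*\).*/\1/')
printf '%s\n' "$lat"

# Newline in the replacement: a backslash followed by a real line break.
two_lines=$(printf '%s\n' 'first,second' | sed 's/,/\
/')
printf '%s\n' "$two_lines"
```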

Hope this helps =)

Licensed under: CC-BY-SA with attribution