Question
Given the following data, how do I pull out the numbers between the physical-blocks tags?
Raw data:
"6917: <physical-blocks> 573653840</physical-blocks>"
"8954: <physical-blocks>573653841</physical-blocks>"
"8991: <physical-blocks>573653842</physical-blocks>"
"9028: <physical-blocks>573653843</physical-blocks>"
"9065: <physical-blocks>573653844</physical-blocks>"
"9102: <physical-blocks>573653845</physical-blocks>"
Desired output (an array of):
573653840 573653841 573653842 573653843 573653844 573653845
I simply want to extract the data between <physical-blocks> and </physical-blocks>. Note: the full dataset includes many strings with angle brackets; I specifically need the data between this specific pair of tags.
Solution
An awk version:
awk '{sub(/[^>]*>/,"");sub(/<.*/,"");$1=$1}1' file
573653840
573653841
573653842
573653843
573653844
573653845
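The one-liner above deletes everything up to the first ">" and everything from the next "<", so it would also pick up values from other tags on other lines. Since the question mentions that the dataset contains many other angle-bracket strings, here is a minimal sketch that only looks at the physical-blocks pair. It assumes at most one pair per line and uses only POSIX awk features; the function name extract_blocks and the tag variable are made up for illustration.

```shell
# extract_blocks: hypothetical helper; reads lines on stdin and prints the
# text found between <physical-blocks> and </physical-blocks>, one per line.
# Assumes at most one such pair per input line.
extract_blocks() {
  awk -v tag="physical-blocks" '
    BEGIN { otag = "<" tag ">"; ctag = "</" tag ">" }
    {
      start = index($0, otag)            # position of the opening tag
      if (start == 0) next               # line has no opening tag: skip it
      rest = substr($0, start + length(otag))
      stop = index(rest, ctag)           # position of the closing tag
      if (stop == 0) next                # no closing tag: skip it
      val = substr(rest, 1, stop - 1)
      gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim surrounding whitespace
      print val
    }
  '
}
```

Run it as extract_blocks < file. Lines that carry other tags are skipped instead of being stripped down to their contents.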
OTHER TIPS
With GNU awk:
gawk 'RT=="</physical-blocks>"' RS='</?physical-blocks>' ORS=' ' file
If you want a newline after the output, use the version below:
$ cat file
"6917: <physical-blocks>573653840</physical-blocks>"
"8954: <physical-blocks>573653841</physical-blocks>"
"8991: <physical-blocks>573653842</physical-blocks>"
"9028: <physical-blocks>573653843</physical-blocks>"
"9065: <physical-blocks>573653844</physical-blocks>"
"9102: <physical-blocks>573653845</physical-blocks>"
$ gawk 'RT=="</physical-blocks>";END{print "\n"}' RS='</?physical-blocks>' ORS=' ' file
573653840 573653841 573653842 573653843 573653844 573653845
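You can also use a simple lookbehind and lookahead. This needs a PCRE-style regex engine, and as written it matches digits between any ">" and "<", not only the physical-blocks tags: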
(?<=\>)(\s*)(\d*)(?=\<)