Ignoring escaped delimiters (commas) with awk?

https://stackoverflow.com/questions/1468210

13-09-2019
|

Question

If I had a string with escaped commas like so:

a,b,{c\,d\,e},f,g

How might I use awk to parse that into the following items?

a
b
{c\,d\,e}
f
g

Solution

{
   split($0, a, /,/)
   j=1
   for(i=1; i<=length(a); ++i) {
      if(match(b[j], /\\$/)) {
         b[j]=b[j] "," a[i]
      } else {
         b[++j] = a[i]
      }
   }
   for(k=2; k<=length(b); ++k) {
      print b[k]
   }
}

Split into array a, using ',' as delimiter
Build array b from a, merging lines that end in '\'
Print array b (Note: Starts at 2 since first item is blank)

This solution presumes (for now) that ',' is the only character that is ever escaped with '\'--that is, there is no need to handle any \\ in the input, nor weird combinations such as \\\,\\,\\\\,,\,.

OTHER TIPS

{
  gsub("\\\\,", "!Q!")
  n = split($0, a, ",")
  for (i = 1; i <= n; ++i) {
    gsub("!Q!", "\\,", a[i])
    print a[i]
  }
}

I don't think awk has any built-in support for something like this. Here's a solution that's not nearly as short as DigitalRoss's, but should have no danger of ever accidentally hitting your made-up string (!Q!). Since it tests with an if, you could also extend it to be careful about whether you actually have \\, at the end of your string, which should be an escaped slash, not comma.

BEGIN {
    FS = ","
}

{
    curfield=1
    for (i=1; i<=NF; i++) {
        if (substr($i,length($i)) == "\\") {
            fields[curfield] = fields[curfield] substr($i,1,length($i)-1) FS
        } else {
            fields[curfield] = fields[curfield] $i
            curfield++
        }
    }
    nf = curfield - 1
    for (i=1; i<=nf; i++) {
        printf("%d: %s   ",i,fields[i])
    }
    printf("\n")
}

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow