Question

I have my string in $LINE and I want $ITEMS to be the array version of this, split on single tabs and retaining blanks. Here's where I'm at now:

IFS=$'\n' ITEMS=($(echo "$LINE" | tr "\t" "\n"))

The issue here is that IFS is one-or-more so it gobbles up new-lines, tabs, whatever. I've tried a few other things based on other questions posted here but they assume that there will always be a value in all fields, never blank. And the one that seems to hold the key is far beyond me and operating on an entire file (I am just splitting a single string).

My preference here is a pure-BASH solution.


Solution

IFS is only one-or-more if the characters are whitespace. Non-whitespace characters are single delimiters. So a simple solution, if there is some non-whitespace character which you are confident is not in your string, is to translate tabs to that character and then split on it:

IFS=$'\2' read -ra ITEMS <<<"${LINE//$'\t'/$'\2'}"
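The difference between whitespace and non-whitespace IFS characters can be seen in a small sketch. The sample value of LINE and the array name collapsed are assumptions made up for this illustration:

```shell
#!/usr/bin/env bash
LINE=$'a\t\tb\tc'   # assumed sample input: the second field is empty

# Tab is IFS whitespace, so the two adjacent tabs act as one delimiter.
IFS=$'\t' read -ra collapsed <<<"$LINE"

# \2 is not whitespace, so every occurrence delimits a field.
IFS=$'\2' read -ra ITEMS <<<"${LINE//$'\t'/$'\2'}"

echo "IFS=tab:  ${#collapsed[@]} fields"   # 3 fields, the empty one is lost
echo "IFS=0x02: ${#ITEMS[@]} fields"       # 4 fields, the empty one is kept
```

Only the second form keeps the blank field the question asks for.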

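Unfortunately, assumptions like "there is no instance of \2 in the input" tend to fail in the long run, where "in the long run" translates to "at the worst possible time". So you might want to do it in two steps: first swap tabs and \2 characters and split on \2, then turn the tabs left inside each field back into \2: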

IFS=$'\2' read -ra TEMP < <(tr $'\t\2' $'\2\t' <<<"$LINE")
ITEMS=("${TEMP[@]//$'\t'/$'\2'}")
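To see why the swap is safe, here is a short run of the two-step version on an input that already contains a \2 byte. The sample value of LINE is an assumption chosen for the illustration:

```shell
#!/usr/bin/env bash
LINE=$'a\2x\t\tb'   # assumed sample: field 0 contains a literal \2, field 1 is empty

# Step 1: swap tab and \2, then split on \2 (which now marks the original tabs).
IFS=$'\2' read -ra TEMP < <(tr $'\t\2' $'\2\t' <<<"$LINE")

# Step 2: turn the tabs inside each field back into the original \2 bytes.
ITEMS=("${TEMP[@]//$'\t'/$'\2'}")

# %q prints each field in a shell-quoted form so control bytes are visible.
printf '%q\n' "${ITEMS[@]}"
```

The result has three fields: the first still contains its \2 byte, and the empty second field is kept.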

OTHER TIPS

One possibility: instead of splitting with IFS, use the -d option to read tab-terminated "lines" from the string. However, you need to ensure that your string ends with a tab as well, or you will lose the last item.

items=()
while IFS='' read -r -d$'\t' x; do
   items+=( "$x" )
done <<< $'   foo   \t  bar\nbaz \t   foobar\t'

printf "===%s===\n" "${items[@]}"

Ensuring a trailing tab without adding an extra field can be accomplished with

if [[ $str != *$'\t' ]]; then str+=$'\t'; fi

if necessary.
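A possible way to package the loop is sketched below. The function name split_tabs is hypothetical, and unlike the conditional above it always appends one tab, on the assumption that a tab at the end of the input should count as introducing a final empty field:

```shell
#!/usr/bin/env bash
# split_tabs is a hypothetical helper name, not part of the answer above.
# It splits $1 on single tabs into the global array `items`, keeping
# empty fields and embedded newlines.
split_tabs() {
    local str=$1 field
    items=()
    # printf appends exactly one tab so the last field is terminated,
    # and unlike a here-string it adds no trailing newline.
    while IFS= read -r -d $'\t' field; do
        items+=("$field")
    done < <(printf '%s\t' "$str")
}

split_tabs $'a\t\tb c\t'
declare -p items
```

With this input the array holds four elements: "a", an empty string, "b c", and a final empty string for the trailing tab.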

Special characters for IFS are written with ANSI-C quoting. From the Bash manual:

Words of the form $'string' are treated specially.  The word expands to
string, with backslash-escaped characters replaced as specified by  the
ANSI  C  standard.  Backslash escape sequences, if present, are decoded
as follows:
       \a     alert (bell)
       \b     backspace
       \e
       \E     an escape character
       \f     form feed
       \n     new line
       \r     carriage return
       \t     horizontal tab
       \v     vertical tab
       \\     backslash
       \'     single quote
       \"     double quote
       \?     question mark
       \nnn   the eight-bit character whose value is  the  octal  value
              nnn (one to three digits)
       \xHH   the  eight-bit  character  whose value is the hexadecimal
              value HH (one or two hex digits)
       \uHHHH the Unicode (ISO/IEC 10646) character whose value is  the
              hexadecimal value HHHH (one to four hex digits)
       \UHHHHHHHH
              the  Unicode (ISO/IEC 10646) character whose value is the
              hexadecimal value HHHHHHHH (one to eight hex digits)
       \cx    a control-x character 

The expanded result is single-quoted, as if the dollar sign had not been present.

A double-quoted string preceded by a dollar sign ($"string") will cause the string to be translated according to the current locale. If the current locale is C or POSIX, the dollar sign is ignored. If the string is translated and replaced, the replacement is double-quoted.
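This is why the answers here write IFS=$'\t' rather than IFS='\t': only the $'...' form produces a real tab. A quick check, with variable names made up for the illustration:

```shell
#!/usr/bin/env bash
ansi=$'\t'     # ANSI-C quoting: one character, a real tab
single='\t'    # single quotes: a backslash and a t, two characters
double="\t"    # double quotes: also a backslash and a t

printf '%d %d %d\n' "${#ansi}" "${#single}" "${#double}"   # prints: 1 2 2
```

With IFS='\t' the shell would split on backslashes and on the letter t instead of on tabs.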

A pure bash solution that will only split on tabs, and preserve newlines and other funny symbols, if any:

IFS=$'\t' read -r -a arr -d '' < <(printf '%s' "$line")

Try it:

$ line=$'zero\tone with\nnewlines\ttwo\t     three   \n\t\tfive\n'
$ IFS=$'\t' read -r -a arr -d '' < <(printf '%s' "$line")
$ declare -p arr
declare -a arr='([0]="zero" [1]="one with
newlines" [2]="two" [3]="     three   
" [4]="five
")'

As you can see, this preserves everything inside the fields (spaces, newlines, etc.) and splits only at the tab characters.

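There's one drawback: it doesn't handle “empty fields”. Tab is an IFS whitespace character, so a run of consecutive tabs counts as a single delimiter. Observe that there are two consecutive tabs in line; we would expect to get an empty field in arr, but that's not the case.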

There's another, less obvious drawback: the return code of read is 1, because it reaches the end of the input without finding the delimiter, so technically, for Bash, this command fails. That's not a problem unless you're using set -e or an ERR trap (set -E only makes such a trap inherited by functions and subshells), but this is not recommended anyway (so you shouldn't).

If you can live with these two minor drawbacks, this might be the ideal solution.

Note that we're using < <(printf '%s' "$line") and not <<< "$line" to feed read, as the latter inserts a trailing newline.
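If the script does run under set -e, one common guard is to append || true to the read. The sketch below uses an assumed sample line and shows both drawbacks at once:

```shell
#!/usr/bin/env bash
set -e   # enabled only to demonstrate the guard

line=$'zero\tone with\nnewline\t\tthree'   # assumed sample; note the two adjacent tabs

# read reaches end of input before finding the NUL delimiter and returns 1;
# `|| true` keeps set -e from aborting the script at this point.
IFS=$'\t' read -r -d '' -a arr < <(printf '%s' "$line") || true

declare -p arr
echo "fields: ${#arr[@]}"   # 3: the empty field between the two tabs is dropped
```

The array is still filled in even though read reports failure.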

line=$'zero\tone\ttwo'
IFS=$'\t' read -r -a arr <<< "${line}"
declare -p arr

Output is

declare -a arr='([0]="zero" [1]="one" [2]="two")'

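Note: this doesn't deal with newlines in line (read stops at the first one), and because tab is IFS whitespace, consecutive tabs are collapsed, so empty fields are lost.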
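For the original requirement (split on single tabs, keep blanks), a different technique from the ones above is mapfile with its -d option. This is only a sketch: it assumes Bash 4.4 or later, where mapfile accepts -d, and the sample line is made up for the illustration:

```shell
#!/usr/bin/env bash
# Assumes Bash 4.4+; older versions of mapfile have no -d option.
line=$'zero\t\ttwo with\nnewline\tthree'   # assumed sample input

# -d makes tab the field terminator; -t strips it from each element.
# printf '%s' avoids the trailing newline a here-string would add.
mapfile -t -d $'\t' arr < <(printf '%s' "$line")

declare -p arr
```

Here arr receives four elements, including the empty one between the two adjacent tabs, and the embedded newline stays inside its field.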

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow