I have my string in $LINE and I want $ITEMS to be the array version of it, split on every single tab and retaining empty fields. Here's where I'm at now:

IFS=$'\n' ITEMS=($(echo "$LINE" | tr "\t" "\n"))

The issue here is that IFS treats whitespace separators as one-or-more, so it gobbles up consecutive newlines, tabs, whatever, and the empty fields disappear. I've tried a few other things based on other questions posted here, but they assume every field always has a value, never a blank one. And the one that seems to hold the key is far beyond me and operates on an entire file (I am just splitting a single string).
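To make the problem concrete, here is the question's command run on a sample value with one empty field. The sample string is an assumption for illustration, not from the question.

```shell
#!/usr/bin/env bash
LINE=$'a\t\tc'   # hypothetical sample: three fields, the middle one empty

# The question's command. Because this line contains only assignments,
# IFS is changed for the rest of the shell, not just for this line.
IFS=$'\n' ITEMS=($(echo "$LINE" | tr "\t" "\n"))
unset IFS        # restore default splitting

# Newline is IFS whitespace, so the two consecutive newlines produced
# by tr count as one separator and the empty middle field is lost.
echo "${#ITEMS[@]}"   # prints 2, not 3
```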


IFS is only one-or-more if the characters are IFS whitespace (space, tab, newline). Non-whitespace characters in IFS are single delimiters. So a simple solution, if there is some non-whitespace character that you are confident is not in your string, is to translate tabs to that character and then split on it:

IFS=$'\2' read -ra ITEMS <<<"${LINE//$'\t'/$'\2'}"
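A quick check of that one-liner on a sample string (the sample is an assumption) shows empty fields and surrounding spaces surviving. Two caveats worth knowing: without -d, read stops at the first newline, so this assumes LINE contains no newlines; and a trailing tab does not produce a trailing empty field.

```shell
#!/usr/bin/env bash
# Assumes the byte $'\2' never occurs in LINE.
LINE=$'one\t\tthree\t four '

# Tabs become \2; \2 is not IFS whitespace, so each one is its own
# delimiter and spaces inside fields are not stripped.
IFS=$'\2' read -ra ITEMS <<<"${LINE//$'\t'/$'\2'}"

printf '[%s]\n' "${ITEMS[@]}"
```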

Unfortunately, assumptions like "there is no instance of \2 in the input" tend to fail in the long run, where "in the long run" translates to "at the worst possible time". So you might want to do it in two steps:
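The two-step code itself is not preserved here. One possible reading, offered as an assumption rather than as the answer's own code, is to first verify that the placeholder byte is absent and only then translate and split. The variable name sep and the sample LINE are hypothetical.

```shell
#!/usr/bin/env bash
LINE=$'a\t\tc'   # hypothetical sample input
sep=$'\2'        # placeholder byte; any non-whitespace character would do

# Step 1: refuse to continue if the placeholder already occurs in LINE.
if [[ $LINE == *"$sep"* ]]; then
    echo "LINE already contains the placeholder; pick another one" >&2
    exit 1
fi

# Step 2: translate tabs to the placeholder and split on it.
IFS=$sep read -ra ITEMS <<<"${LINE//$'\t'/$sep}"
```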

One possibility: instead of splitting with IFS, use the -d option to read tab-terminated "lines" from the string. However, you need to ensure that your string ends with a tab as well, or you will lose the last item.
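The loop for this answer did not survive here. Judging from the comments below, which mention a while loop, IFS='', read -r -d$'\t' x and items+=("$x"), it was presumably close to this sketch. Supplying the trailing tab with printf '%s\t' is an assumption; the sample string is hypothetical.

```shell
#!/usr/bin/env bash
LINE=$'one\t\t three \tfour'

ITEMS=()
# IFS='' keeps leading/trailing whitespace, -r keeps backslashes,
# and -d $'\t' makes each tab-terminated chunk one "line".
while IFS='' read -r -d $'\t' x; do
    ITEMS+=("$x")
done < <(printf '%s\t' "$LINE")   # append a final tab so the last item is not lost
```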

It's funny, I saw the -d and tried to make something of it myself unsuccessfully; I see the key was using a loop (I tried combining with -a). One question: why do you set IFS='' beforehand?
– Neil C. Obremski, Nov 2 '13 at 1:31

It's necessary if one of the tab-delimited strings starts or ends with whitespace: with the default value of IFS, read would strip that whitespace before assigning the value to x.
– chepner, Nov 2 '13 at 3:50
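A small sketch of the difference described in that reply, using a hypothetical padded field:

```shell
#!/usr/bin/env bash
input=$'  padded  \t'

# Default IFS (space, tab, newline): leading/trailing spaces are stripped.
read -r -d $'\t' with_default <<<"$input"

# Empty IFS: the field is kept exactly as it appears between delimiters.
IFS='' read -r -d $'\t' with_empty <<<"$input"

printf '[%s]\n' "$with_default" "$with_empty"
```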

To cope with a missing trailing delimiter (here a tab), you can replace the read in the while test with IFS='' read -r -d$'\t' x || [[ $x ]], or just add items+=( "$x" ) after the while loop.
– gniourf_gniourf, Oct 30 '14 at 9:29
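A sketch of the first suggestion. When read reaches end of input without finding a tab it returns 1 but still assigns what it read to x, so the extra test lets the loop body run once more for that last chunk. The sample string is hypothetical.

```shell
#!/usr/bin/env bash
LINE=$'one\t\tthree'    # note: no trailing tab

ITEMS=()
# "|| [[ $x ]]" keeps an unterminated final chunk; the loop stops once
# read fails with nothing left in x.
while IFS='' read -r -d $'\t' x || [[ $x ]]; do
    ITEMS+=("$x")
done < <(printf '%s' "$LINE")
```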

items+=("$x") after the loop will append an empty string if the input isn't missing the final delimiter, so you'd need a guard like (( $? )) && items+=("$x"). (Not tested, and there are tricky corner cases, so I'm not sure that's 100% correct.)
– chepner, Oct 30 '14 at 12:52
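One caution on that guard: per the Bash manual, a while loop's exit status is that of the last command executed in its body (or zero if the body never ran), not that of the read that ended the loop, so (( $? )) right after done would not see read's failure. A sketch of an alternative guard that checks x directly (not the commenters' code; sample string hypothetical):

```shell
#!/usr/bin/env bash
LINE=$'one\t\tthree'

ITEMS=()
while IFS='' read -r -d $'\t' x; do
    ITEMS+=("$x")
done < <(printf '%s' "$LINE")

# The final read failed at end of input but left any unterminated
# remainder in x; append it only if there was one.
[[ -n $x ]] && ITEMS+=("$x")
```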

This works well: it preserves everything else (spaces, newlines, etc.) and splits only at the tab characters.
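The command this answer discusses is not reproduced here. Based on the surrounding text (the names line and arr, feeding read with < <(printf '%s' "$line"), and read returning 1), it was presumably along these lines; the sample value is hypothetical.

```shell
#!/usr/bin/env bash
# Sample input: spaces, a newline, and two consecutive tabs.
line=$'zero\t one \t\tthree\nstill three'

# -d '' reads up to a NUL byte, i.e. the whole input; IFS=$'\t' splits on tabs.
IFS=$'\t' read -r -d '' -a arr < <(printf '%s' "$line")

printf '[%s]\n' "${arr[@]}"
```

The spaces and the embedded newline survive, but the two consecutive tabs yield a single split, so arr has three elements rather than four.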

There's one drawback: it doesn't handle empty fields. If line contains two consecutive tabs, we would expect to get an empty field in arr, but that's not the case: tab is an IFS whitespace character, so a run of tabs counts as a single separator.

There's another, less obvious drawback: read returns 1 (it reaches end of input before finding the NUL delimiter requested by -d ''), so technically, as far as Bash is concerned, this command fails. That's not a problem unless you're using set -e, or an ERR trap (which set -E extends into functions), but these are not recommended anyway (so you shouldn't).
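If you do have to run under set -e, one common workaround (not part of the original answer) is to append || true so the expected failure of read does not abort the script:

```shell
#!/usr/bin/env bash
set -e   # exit on the first failing command

line=$'a\tb'
# read returns 1 here because it hits end of input before a NUL byte;
# "|| true" stops set -e from aborting at this point.
IFS=$'\t' read -r -d '' -a arr < <(printf '%s' "$line") || true

echo "still running: ${#arr[@]} fields"
```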

If you can live with these two minor drawbacks, this might be the ideal solution.

Note that we're using < <(printf '%s' "$line") rather than <<< "$line" to feed read, because a here-string appends a trailing newline, which is not in IFS and would therefore end up in the last field.
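A side-by-side sketch of that difference, on a hypothetical two-field string:

```shell
#!/usr/bin/env bash
line=$'a\tb'

# The here-string adds "\n"; since newline is not in IFS=$'\t',
# it stays attached to the last field.
IFS=$'\t' read -r -d '' -a with_herestring <<<"$line"

# printf '%s' emits the string exactly, with no trailing newline.
IFS=$'\t' read -r -d '' -a with_printf < <(printf '%s' "$line")

declare -p with_herestring with_printf
```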

For reference, the $'...' quoting used above ($'\t', $'\2') is described in the QUOTING section of man bash:

Words of the form $'string' are treated specially. The word expands to
string, with backslash-escaped characters replaced as specified by the
ANSI C standard. Backslash escape sequences, if present, are decoded
as follows:
    \a          alert (bell)
    \b          backspace
    \e, \E      an escape character
    \f          form feed
    \n          new line
    \r          carriage return
    \t          horizontal tab
    \v          vertical tab
    \\          backslash
    \'          single quote
    \"          double quote
    \?          question mark
    \nnn        the eight-bit character whose value is the octal value
                nnn (one to three digits)
    \xHH        the eight-bit character whose value is the hexadecimal
                value HH (one or two hex digits)
    \uHHHH      the Unicode (ISO/IEC 10646) character whose value is the
                hexadecimal value HHHH (one to four hex digits)
    \UHHHHHHHH  the Unicode (ISO/IEC 10646) character whose value is the
                hexadecimal value HHHHHHHH (one to eight hex digits)
    \cx         a control-x character

The expanded result is single-quoted, as if the dollar sign had not
been present.

A double-quoted string preceded by a dollar sign ($"string") will cause
the string to be translated according to the current locale. If the
current locale is C or POSIX, the dollar sign is ignored. If the
string is translated and replaced, the replacement is double-quoted.
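A short sketch of the two escapes the answers rely on: \t for the tab, and the octal form \nnn for the \2 placeholder, which is the same byte as the hex form \x02. The variable names are illustrative.

```shell
#!/usr/bin/env bash
tab=$'\t'         # horizontal tab
stx=$'\2'         # octal escape \nnn: the byte with value 2
also_stx=$'\x02'  # hex escape \xHH: the same byte

# printf %q shows how Bash itself would quote each value.
printf '%q\n' "$tab" "$stx"
```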