My question is: how do I format TSV (and most likely CSV) files so that, when a cell reaches a maximum length (set with a parameter), the text is not simply broken onto a new line, but continues on a new line with the same indentation?

1 Answer

I'm not sure I understand what you want to achieve, but perhaps this awk program is useful to you:

Content of script.awk:

{
    ## Number of blocks printed to output.
    block = 0

    ## Count the columns by splitting the line on tabs. Subtract one
    ## because each line has a tab at the end, and split() counts the
    ## empty string after it as an extra column.
    col_nums = split( $0, dummy, /\t+/ )
    --col_nums

    ## A line without any tab is malformed. Skip it.
    if ( col_nums < 1 ) {
        next
    }

    ## Number of chars in each output block. 'max_cell_length' is an input
    ## argument provided by the user: the number of chars per output line.
    ## Truncate to an integer, and skip the line if the result is below
    ## one, because the loop below would never advance.
    chars = int( max_cell_length / col_nums )
    if ( chars < 1 ) {
        next
    }

    ## For each column...
    for ( i = 1; i <= NF; i++ ) {

        ## Index where the next substring begins. substr() counts from
        ## one, so the first char is at index 1.
        begin_idx = 1

        ## Cut each column into blocks of 'chars' characters, and repeat
        ## until the end of the column.
        while ( begin_idx <= length( $i ) ) {
            column = substr( $i, begin_idx, chars )

            ## Move the index to where the last block ended.
            begin_idx += chars

            ## Print block to output.
            printf "%s ", column

            ## Once as many blocks as there are columns have been
            ## printed, move to the next line.
            if ( ++block % col_nums == 0 ) {
                printf "\n"
            }
        }
    }
}

{
    ## For each line, print an extra newline for a pretty output.
    printf "\n"
}

Use the variable max_cell_length to set the number of characters per output line (not counting blanks); you can pass it with awk's -v option, for example -v max_cell_length=30. I assume it evenly divides the number of characters in the original data; otherwise the output will be badly formatted. I tested it with 30, as you can see in this post, and with 50. Both seem correct, but many other odd values do not.
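If the goal is instead to keep every cell in its own column while wrapping, the following is a minimal sketch of one possible reading of the question, not a replacement for script.awk. It assumes tab-separated input without a trailing tab, and wrap_tsv is a hypothetical helper name introduced here only for illustration. Each cell is cut into pieces of at most WIDTH characters, and the extra pieces go on continuation rows in the same column, with empty cells where a column has no text left.

```shell
# wrap_tsv WIDTH: read TSV on stdin and wrap every cell to WIDTH characters.
# (Hypothetical helper for illustration; assumes no trailing tab per line.)
wrap_tsv() {
    awk -F '\t' -v width="$1" '
    # A width below one cannot hold any text, so give up early.
    width + 0 < 1 { exit 1 }
    {
        # Count the rows needed by the longest cell of this record.
        rows = 1
        for (i = 1; i <= NF; i++) {
            n = int((length($i) + width - 1) / width)
            if (n > rows) rows = n
        }
        # Row r carries the r-th slice of each cell. substr() returns an
        # empty string once a cell is used up, so short cells stay blank.
        for (r = 0; r < rows; r++) {
            line = ""
            for (i = 1; i <= NF; i++) {
                piece = substr($i, r * width + 1, width)
                if (i == 1)
                    line = piece
                else
                    line = line "\t" piece
            }
            print line
        }
    }'
}
```

For example, `printf 'abcdefgh\txy\n' | wrap_tsv 3` prints three rows: "abc" and "xy", then "def" and an empty cell, then "gh" and an empty cell, each pair separated by a tab. Because every row keeps the same number of fields, the result is still valid TSV.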