Question

I'm using ssconvert in Gnumeric to convert a bunch of ODS files to CSV files with the command:

ssconvert -O 'separator=; quoting-mode=never' "f.ods" "f.txt";

which works out great ... most of the time. Sometimes, there are cells where the user has entered a newline character within the cell (in OpenOffice and LibreOffice on Mac, you achieve this by pressing cmd+enter). This results in the generated CSV file getting an extra row, so instead of

This is some text. Here comes a newline that should be ignored;Some data;Some more data

I get

This is some text. Here comes a newline
that should be ignored;Some data; Some more data

Is it possible in the conversion process to replace all these newline characters within cells with something else, for example a *?

Or can I somehow set the computer to ignore all the newline characters within cells?


Solution

Here's your problem:

ssconvert -O 'separator=; quoting-mode=never' "f.ods" "f.txt";

By preventing ssconvert from quoting where necessary, you're shooting yourself in the foot here, and your problem is not limited to newlines. For example, this spreadsheet:

example.ods

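[Screenshot: a two-row spreadsheet. Row 1 holds A1, B1 and C1; row 2 holds "A2;XX", a cell with "B2" and "YY" on separate lines, and C2.]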

is converted by your ssconvert command to this:

example.txt

A1;B1;C1
A2;XX;B2
YY;C2

Good luck untangling that.
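To make the ambiguity concrete, here is a minimal sketch in Python (my choice of language, not something the question uses) that feeds the output above to the standard csv module. The sample string is just the three lines shown above; nothing here calls ssconvert.

```python
import csv
import io

# The unquoted ssconvert output shown above, copied verbatim.
unquoted = "A1;B1;C1\nA2;XX;B2\nYY;C2\n"

# QUOTE_NONE mirrors quoting-mode=never: no character acts as a quote.
reader = csv.reader(
    io.StringIO(unquoted, newline=""),
    delimiter=";",
    quoting=csv.QUOTE_NONE,
)
rows = list(reader)

# Print the field count next to each parsed record.
for row in rows:
    print(len(row), row)
```

The parser sees three records with 3, 3 and 2 fields, and nothing in the text says that the spreadsheet actually had two rows of three cells.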

Rather than attempting to undo the mess after conversion (which is going to be impossible to do reliably), or somehow pre-processing your source ODS file prior to conversion (which is insane: if you're converting to CSV, it's presumably because you want to avoid messing with ODS documents), you need to use a CSV dialect that doesn't have this kind of fundamental flaw.

That means you need your data to be quoted. It turns out that on its default setting, ssconvert quotes the cell containing a newline but isn't intelligent enough to quote cells containing the separator:

$ ssconvert -O 'separator=;' example.ods example-2.txt
$ cat example-2.txt
A1;B1;C1
A2;XX;"B2
YY";C2
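As a rough check of what a consumer makes of this default output, the following Python sketch (again using only the standard csv module, with the text above pasted in as a string) parses it with `;` as the delimiter.

```python
import csv
import io

# The default-quoting output shown above, copied verbatim.
default_output = 'A1;B1;C1\nA2;XX;"B2\nYY";C2\n'

# The default dialect treats " as the quote character, so the quoted
# newline is kept inside one field.
rows = list(csv.reader(io.StringIO(default_output, newline=""), delimiter=";"))

for row in rows:
    print(len(row), row)
```

The embedded newline survives because that cell was quoted, but the second record comes back with four fields instead of three, since the unquoted `A2;XX` is split at the separator.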

... so you're going to need to quote everything:

$ ssconvert -O 'separator=; quoting-mode=always' example.ods example-3.txt
$ cat example-3.txt
"A1";"B1";"C1"
"A2;XX";"B2
YY";"C2"
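A quick way to convince yourself that this form is unambiguous is to read it back. This is a minimal Python sketch using the standard csv module, with the output above pasted in as a string.

```python
import csv
import io

# The quoting-mode=always output shown above, copied verbatim.
quoted = '"A1";"B1";"C1"\n"A2;XX";"B2\nYY";"C2"\n'

rows = list(csv.reader(io.StringIO(quoted, newline=""), delimiter=";"))

for row in rows:
    print(len(row), row)
```

Both records come back with exactly three fields, and the separator and the newline stay inside the cells they belong to.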

There's no reliable way around this with CSV; any solution you come up with other than quoting your data properly is going to come back and bite you at some point, because unquoted CSV is fundamentally broken as a data format.

To reiterate: Do not attempt to work around this fundamental flaw in unquoted CSV. Even if you think you've worked around all the problems you created for yourself by using an ambiguous data format, at some point a circumstance you didn't anticipate will come along, and you will repent at your leisure.
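Once the data is properly quoted, the original request (replacing newlines inside cells with something like a *) becomes a safe post-processing step. The sketch below is one way to do it in Python; the function name `replace_cell_newlines` and the choice to write the result back with every field quoted are my own assumptions, not part of ssconvert.

```python
import csv
import io


def replace_cell_newlines(quoted_csv: str, replacement: str = "*") -> str:
    """Return the CSV text with newlines inside cells replaced.

    Hypothetical helper: expects ';'-separated, properly quoted input.
    """
    reader = csv.reader(io.StringIO(quoted_csv, newline=""), delimiter=";")
    out = io.StringIO(newline="")
    # Keep quoting everything so cells containing ';' stay unambiguous.
    writer = csv.writer(
        out, delimiter=";", quoting=csv.QUOTE_ALL, lineterminator="\n"
    )
    for row in reader:
        cleaned = [
            cell.replace("\r\n", replacement)
            .replace("\n", replacement)
            .replace("\r", replacement)
            for cell in row
        ]
        writer.writerow(cleaned)
    return out.getvalue()


# The quoting-mode=always output from the answer above.
quoted = '"A1";"B1";"C1"\n"A2;XX";"B2\nYY";"C2"\n'
result = replace_cell_newlines(quoted)
print(result, end="")
```

For a real file, the same function can be applied to the contents of example-3.txt, opened with `newline=""` so that Python does not translate line endings before the csv module sees them.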

OTHER TIPS

Another solution (in this case for xlsx files) is:

  1. Install xlsx2csv if it is not yet installed (using apt or pip).

  2. Run it with the -e option, which replaces each newline inside a multi-line cell with a literal \n.

Reusing @ZeroPiraeus's example,

$ xlsx2csv -e -d ';' example.xlsx

A1;B1;C1
A2;XX;B2\nYY;C2
Licensed under: CC-BY-SA with attribution