Question

I want to quantify the space savings I can get by changing the format of a file.

I have a sparse matrix stored in a text file (30% sparsity). Columns are separated by tabs.

Following an idea from an SO answer, I will change the format to row_id, col_id for the non-zero terms only. I know how much space a float takes, but my question is: how much space does a tab take?
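One way to quantify the saving is to serialize the same matrix both ways and compare byte counts. The sketch below is a minimal Python illustration under assumptions that are not in the question: it generates a hypothetical random 200 by 200 matrix, treats "30% sparsity" as 30% of entries being non-zero (set `density=0.7` if you meant 30% zeros), prints values with the `g` format, and adds a value column to each sparse line, since row_id and col_id alone don't carry the number. The helper names are made up for this example.

```python
import random

def dense_tsv(matrix):
    """Serialize every entry, tab-separated, one matrix row per line."""
    return "\n".join("\t".join(f"{x:g}" for x in row) for row in matrix) + "\n"

def sparse_tsv(matrix):
    """Serialize only non-zero entries as row_id<TAB>col_id<TAB>value lines.

    The value column is an assumption; the question only mentions row_id, col_id.
    """
    lines = [
        f"{i}\t{j}\t{x:g}"
        for i, row in enumerate(matrix)
        for j, x in enumerate(row)
        if x != 0
    ]
    return "\n".join(lines) + "\n"

def make_matrix(n_rows, n_cols, density, seed=0):
    """Hypothetical test data: roughly `density` of the entries are non-zero."""
    rng = random.Random(seed)
    return [
        [round(rng.uniform(-1, 1), 4) if rng.random() < density else 0.0
         for _ in range(n_cols)]
        for _ in range(n_rows)
    ]

m = make_matrix(200, 200, density=0.3)
dense_bytes = len(dense_tsv(m).encode("utf-8"))
sparse_bytes = len(sparse_tsv(m).encode("utf-8"))
print(dense_bytes, sparse_bytes, f"sparse/dense = {sparse_bytes / dense_bytes:.0%}")
```

Whether the sparse format comes out smaller depends on the density and on how many characters each index and value take, so it's worth running this with the real matrix in place of `make_matrix`.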


Solution

CouchDeveloper is correct in his comment: it's impossible to tell from the data you provide.
In a single-byte character set encoding, you'd save 1 byte per separator compared with the current ", ".
In a multibyte encoding, it would depend on how each of those characters is encoded, and you could theoretically even lose space. Say a tab were encoded as 4 bytes and a comma and a space as 1 byte each: you'd end up using 2 more bytes per separator.
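The per-separator arithmetic can be checked by encoding the two separators in Python. The encodings listed are just common examples; the `-le` variants are used so that Python doesn't prepend a byte-order mark, which would inflate the counts. In all four, a tab takes exactly half the bytes of ", ", so the net-loss case above needs a hypothetical encoding in which the tab is wider than the comma and space.

```python
# Example encodings; "-le" variants avoid a byte-order mark in the output.
ENCODINGS = ["latin-1", "utf-8", "utf-16-le", "utf-32-le"]

def separator_sizes(encodings=ENCODINGS):
    """Map each encoding to (bytes for a tab, bytes for comma+space)."""
    return {
        enc: (len("\t".encode(enc)), len(", ".encode(enc)))
        for enc in encodings
    }

for enc, (tab, comma_space) in separator_sizes().items():
    print(f"{enc:10} tab={tab}  ', '={comma_space}  saved={comma_space - tab}")
# latin-1    tab=1  ', '=2  saved=1
# utf-8      tab=1  ', '=2  saved=1
# utf-16-le  tab=2  ', '=4  saved=2
# utf-32-le  tab=4  ', '=8  saved=4
```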
Unless you have many separators and relatively little data, I wouldn't worry about it either way; it would be a micro-optimisation.
If you do, a binary encoding scheme might be more relevant.
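As one illustration of a binary scheme, each non-zero entry can be packed into a fixed-width record with Python's `struct` module. The layout here is an assumption chosen for the example: little-endian unsigned 32-bit row and column indices plus a 32-bit float, 12 bytes per entry with no separators at all. Use `"<IId"` instead if you need double precision (16 bytes per entry), and note that uint32 indices cap out at about 4.29 billion.

```python
import struct

# Hypothetical record layout: little-endian uint32 row, uint32 col, float32 value.
RECORD = struct.Struct("<IIf")

def pack_triplets(triplets):
    """Pack (row, col, value) tuples into one fixed-width binary blob."""
    return b"".join(RECORD.pack(r, c, v) for r, c, v in triplets)

def unpack_triplets(blob):
    """Inverse of pack_triplets; values come back rounded to float32 precision."""
    return list(RECORD.iter_unpack(blob))

triplets = [(0, 1, 1.5), (1, 0, 2.0), (199, 42, -0.1234)]
blob = pack_triplets(triplets)
print(RECORD.size, len(blob))  # 12 36
```

For comparison, the text line for the last triplet, `199\t42\t-0.1234\n`, is already 15 bytes on its own.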

OTHER TIPS

1 byte, but significantly less if you're using compression: depending on how common tabs are, less than a bit each on average. Use compression.
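A quick way to see the effect is to compress a sample file with Python's standard `zlib` module (DEFLATE) and compare sizes. The generator below is a hypothetical stand-in for the real matrix; `gzip`, `bz2` or `lzma` from the standard library could be swapped in the same way. The printed figure is an average over all bytes of the file, not a measured per-tab cost, since DEFLATE doesn't attribute output bits to individual characters.

```python
import random
import zlib

def tsv_text(n_rows=200, n_cols=200, density=0.3, seed=0):
    """Hypothetical test data: tab-separated matrix, ~density of entries non-zero."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        row = [f"{rng.uniform(-1, 1):.4f}" if rng.random() < density else "0"
               for _ in range(n_cols)]
        rows.append("\t".join(row))
    return ("\n".join(rows) + "\n").encode("utf-8")

raw = tsv_text()
packed = zlib.compress(raw, 9)  # level 9 = best compression
tabs = raw.count(b"\t")
print(f"raw={len(raw)} compressed={len(packed)} tabs={tabs}")
print(f"compressed bits per original byte: {8 * len(packed) / len(raw):.2f}")
```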

Licensed under: CC-BY-SA with attribution