In some text editors, tabs are displayed like that. The contents of the file are correct, it's just a matter of how the file is displayed on screen. It happens with tabs but not with |
which is why you don't see it happening when you use |
.
tab-delimited file output inconsistent
-
29-09-2022 - |
Question
I am attempting to write elements from a nested list to individual lines in a file, with each element separated by tab characters. Each of the nested lists is of the following form:
('A', 'B', 'C', 'D')
The final output should be of the form:
A B C D
E F G H
. . . .
. . . .
However, my output seems to have reproducible inconsistencies such that the output is of the general form:
A B C D
E F G H
I J K L
M N O P
. . . .
. . . .
I've inspected the lists before writing and they seem identical in form. The code I'm using to write is:
with open("letters.txt", 'w') as outfile:
outfile.writelines('\t'.join(line) + '\n' for line in letter_list)
Importantly, if I replace '\t' with, for example, '|', the file is created without such inconsistencies. I know whitespace parsing can become an issue for certain file I/O operations, but I don't know how to troubleshoot it here.
Thanks for the time.
EDIT: Here is some actual input data (in nested-list form) and output:
IN
('5', '+', '5752624-5752673', 'alt_region_8161'), ('1', '+', '621461-622139', 'alt_region_67'), ('1', '+', '453907-454063', 'alt_region_60'), ('1', '+', '539611-539815', 'alt_region_61'), ('4', '+', '14610049-14610103', 'alt_region_6893'), ('4', '+', '14610049-14610144', 'alt_region_6895'), ('4', '+', '14610049-14610144', 'alt_region_6897'), ('4', '+', '14610049-14610144', 'alt_region_6896')]
OUT
4 + 12816011-12816087 alt_region_6808
1 + 21214720-21214747 alt_region_2377
4 + 9489968-9490833 alt_region_7382
1 + 12121545-12126263 alt_region_650
4 + 9489968-9490811 alt_region_7381
4 + 12816011-12816087 alt_region_6807
1 + 2032338-2032740 alt_region_157
5 + 4695084-4695628 alt_region_9316
1 + 22294677-22295134 alt_region_2424
1 + 22294677-22295139 alt_region_2425
1 + 22294677-22295139 alt_region_2426
1 + 22294677-22295139 alt_region_2427
1 + 22294677-22295134 alt_region_2422
1 + 22294677-22295134 alt_region_2423
1 + 22294384-22295198 alt_region_2428
1 + 22294384-22295198 alt_region_2429
5 + 20845105-20845211 alt_region_9784
5 + 20845105-20845206 alt_region_9783
3 + 2651447-2651889 alt_region_5562
EDIT: Thanks to everyone who commented. Sorry if the question was poorly phrased. I appreciate the help in clarifying the issue (or, apparently, non-issue).
Solution 2
OTHER TIPS
There are no spaces (' '
)in your output, only tabs ('\t'
).
>>> print(repr('1 + 21214720-21214747 alt_region_2377'))
'1\t+\t21214720-21214747\talt_region_2377'
^^ ^^ ^^
Tabs are not equivalent to a fixed number of spaces (in most editors). Rather, they move the character following the tab to the next available multiple of x
characters from the left margin, where x
varies - x
is most commonly 8, though it is 4 here on SO.
>>> for i in range(7):
print('x'*i+'\tx')
x
x x
xx x
xxx x
xxxx x
xxxxx x
xxxxxx x
If you want your output to appear aligned to the naked eye, you should use string formatting:
>>> for line in data:
print('{:4} {:4} {:20} {:20}'.format(*line))
5 + 5752624-5752673 alt_region_8161
1 + 621461-622139 alt_region_67
1 + 453907-454063 alt_region_60
1 + 539611-539815 alt_region_61
4 + 14610049-14610103 alt_region_6893
4 + 14610049-14610144 alt_region_6895
4 + 14610049-14610144 alt_region_6897
4 + 14610049-14610144 alt_region_6896
Note, however, that this will not necessarily be readable by code that expects a tab-separated value file.