Question

I have a matrix (5800 rows and 350 columns) of numbers. Each cell is one of:

0 / 0
1 / 1
2 / 2

What is the fastest way to remove all spaces in each cell, to have:

0/0
1/1
2/2

Sed, R, anything that will do it fastest.

Solution

If you are going for efficiency, you should probably use coreutils tr for such a simple task:

tr -d ' ' < infile
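As a minimal sketch of how this might be used end to end: tr takes no file arguments and reads only standard input, so the result has to be redirected to a second file. The names sample.txt and sample_nospace.txt are hypothetical, and tab-separated cells are an assumption about the input, not something stated in the question.

```shell
# Build a tiny sample file (hypothetical name; tab-separated cells are an assumption).
printf '0 / 0\t1 / 1\n2 / 2\t0 / 0\n' > sample.txt

# tr reads only standard input, so redirect the file in and the result out.
tr -d ' ' < sample.txt > sample_nospace.txt

# Show the result.
cat sample_nospace.txt
```

Redirecting the output onto the input file itself would truncate it before tr reads it, so write to a different file and rename it afterwards if needed.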

I compared the posted answers against a 300K file, using GNU awk, GNU sed, perl v5.14.2 and GNU coreutils v8.13. Each test was run 30 times; these are the averages:

awk  - 1.52s user 0.01s system 99% cpu 1.529 total
sed  - 0.89s user 0.00s system 99% cpu 0.900 total
perl - 0.59s user 0.00s system 98% cpu 0.600 total
tr   - 0.02s user 0.00s system 90% cpu 0.020 total

All tests were run as above (cmd < infile), with the output redirected to /dev/null.

Other tips

Using sed:

sed "s/ \/ /\//g" input.txt

It means:

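Replace each occurrence of the string " / " (written " \/ " in the pattern, because the slash must be escaped) with a single slash (written "\/"), and do it for every match on the line (the g flag).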
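The escaping can be avoided, since sed accepts a delimiter other than the slash for the s command. The following is a sketch only; the file names sed_sample.txt and sed_output.txt are hypothetical, and the tab-separated layout is an assumption.

```shell
# Hypothetical sample file; tab-separated cells are an assumption.
printf '0 / 0\t1 / 1\n2 / 2\t0 / 0\n' > sed_sample.txt

# With | as the s-command delimiter, the slash needs no backslash.
sed 's| / |/|g' sed_sample.txt > sed_output.txt

# Show the result.
cat sed_output.txt
```

Because only the exact sequence space, slash, space is replaced, any other spaces in the file are left alone.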

Here's an awk alternative that removes every space on each line:

awk '{gsub(" ",""); print}' input.txt > output.txt

Explanations:

  • awk '{...}': invoke awk, then for each line do the stuff enclosed by braces.
  • gsub(" ","");: replace all space chars (single or multiple in a row) with the empty string.
  • print: print the entire line
  • input.txt: specifying your input file as argument to awk
  • > output.txt: redirect output to a file.
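One consequence of the gsub call described above is that it removes every space, including any that separate cells. If the columns happened to be separated by single spaces (an assumption; the question does not say), a narrower pattern might be preferable. A sketch, with hypothetical file names:

```shell
# Hypothetical sample in which cells are separated by single spaces (an assumption).
printf '0 / 0 1 / 1\n2 / 2 0 / 0\n' > awk_sample.txt

# Removing every space also removes the separators between cells.
awk '{gsub(" ",""); print}' awk_sample.txt > awk_all.txt

# Replacing only " / " with "/" keeps the single-space separators.
awk '{gsub(" / ","/"); print}' awk_sample.txt > awk_cells.txt

# Show both results for comparison.
cat awk_all.txt awk_cells.txt
```

With tab-separated or comma-separated columns the two commands give the same result, so the simpler form is enough.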

A perl solution could look like this:

perl -pwe 'tr/ //d' input.txt > output.txt

You can add the -i switch to edit the file in place.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow