Question

I have a big text file whose data is in 5 columns, but I need just the first and the last column.

Entering the data from these two columns one by one into another file would take many days and would probably introduce mistakes. Is there a fast way to do this? For example:

     1   1.0000000000000000         0.0000000000 S {0}
     2   1.5000000000000000         0.3010299957 C {2}
     3   1.7500000000000000         0.6020599913 S {0,2}
     4   2.0000000000000000         0.7781512504 C {3}
     5   2.3333333333333333         1.0791812460 C {3,2}
     6   2.5000000000000000         1.3802112417 S {3,0,2}
     7   2.5277777777777778         1.5563025008 S {0,3}
     8   2.5833333333333333         1.6812412374 S {3,0,0,2}
     9   2.8000000000000000         1.7781512504 C {5,2}
    10   3.0000000000000000         2.0791812460 C {5,0,2}

I need the first column (the numbering) and the last column (the values inside { }).


Solution

You can import it into Excel and manipulate it there.

OTHER TIPS

ALT + Left Mouse Click puts you in Column Mode Select. It's quite a useful shortcut that may help you.

In Notepad++, you can use a regular-expression find and replace.

Find what:

^( +\d+).+\{([\d,]+)\}$

Replace with:

\1 \2

This changes:

 1   1.0000000000000000         0.0000000000 S {0}
 2   1.5000000000000000         0.3010299957 C {2}
 3   1.7500000000000000         0.6020599913 S {0,2}
 4   2.0000000000000000         0.7781512504 C {3}
 5   2.3333333333333333         1.0791812460 C {3,2}
 6   2.5000000000000000         1.3802112417 S {3,0,2}
 7   2.5277777777777778         1.5563025008 S {0,3}
 8   2.5833333333333333         1.6812412374 S {3,0,0,2}
 9   2.8000000000000000         1.7781512504 C {5,2}
10   3.0000000000000000         2.0791812460 C {5,0,2}

to:

     1 0
     2 2
     3 0,2
     4 3
     5 3,2
     6 3,0,2
     7 0,3
     8 3,0,0,2
     9 5,2
    10 5,0,2

If you do not want the leading spaces, move them outside the first group.

Find what:

^ *(\d+).+\{([\d,]+)\}$

Replace with:

\1 \2

The result is then:

1 0
2 2
3 0,2
4 3
5 3,2
6 3,0,2
7 0,3
8 3,0,0,2
9 5,2
10 5,0,2
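
The same pattern can also be applied outside Notepad++. Here is a minimal Python sketch, assuming every line follows the layout shown in the question. The function name `extract_columns` is made up for illustration, and the pattern uses ` *` so that lines without leading spaces match as well.

```python
import re

# Same idea as the Notepad++ pattern: capture the leading number and
# the comma-separated list inside the braces at the end of the line.
LINE_RE = re.compile(r"^ *(\d+).+\{([\d,]+)\}$")


def extract_columns(lines):
    """Yield 'number list' strings for every line that matches the pattern."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\r\n"))
        if match:  # lines that do not fit the layout are skipped
            yield f"{match.group(1)} {match.group(2)}"


sample = [
    "     1   1.0000000000000000         0.0000000000 S {0}",
    "     6   2.5000000000000000         1.3802112417 S {3,0,2}",
    "    10   3.0000000000000000         2.0791812460 C {5,0,2}",
]
for row in extract_columns(sample):
    print(row)
```

For a real file, an open file handle can be passed in place of `sample`, since iterating over a file yields its lines one at a time and so the whole file never has to fit in memory.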

You should use awk or gawk, which is also available on Windows. Use gawk "{print $1,$5}" inpfile > outfile. I copied your file and named it 'one'. The output below consists of the 1st and 5th columns of your file.

>gawk "{print $1, $5}"  one
1 {0}
2 {2}
3 {0,2}
4 {3}
5 {3,2}
6 {3,0,2}
7 {0,3}
8 {3,0,0,2}
9 {5,2}
10 {5,0,2}
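
If the braces should go as well, the same whitespace splitting can be sketched in Python. This is only an illustration under the assumption that the braces column is always the last field on the line; `first_and_last` is a hypothetical helper name, not part of any tool mentioned here.

```python
def first_and_last(line):
    """Return (first field, last field without braces), or None for blank lines."""
    fields = line.split()  # splits on any run of whitespace, like awk's default
    if not fields:
        return None
    return fields[0], fields[-1].strip("{}")


print(first_and_last("     3   1.7500000000000000         0.6020599913 S {0,2}"))
```

Taking the last field instead of a fixed fifth field keeps the sketch working if the number of middle columns ever changes.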

If you are using .NET, FileHelpers may save you a lot of time. From your post we can't tell what technology you are hoping to use to accomplish this.

UltraEdit has a column-selection tool and opens large files (I tried a 900 MB file on a 2008 desktop and it opened in 3 minutes). I think it has a fully functional demo version. Excel could work if you do not have too many rows.

There is only one way to process ungodly amounts of data: the command prompt.

$ cat text.txt | sed 's/  */ /g;s/[{}]//g' | awk '{print $1",\""$5"\""}' > clean_text.csv

This 15-second fix is not available out of the box on Windows. It will take you less time to download and install Linux on that old dead computer in your closet than it will to get your data in and out of Excel.
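
Where Python is at hand, a comparable CSV can be produced on any OS. The sketch below assumes the braces column is the last field on each line; `to_csv` is a hypothetical name chosen for illustration. The standard `csv` module quotes any value that itself contains commas, so each row still has exactly two columns.

```python
import csv
import io


def to_csv(lines, out):
    """Write 'number,list' rows to the file-like object `out`."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        writer.writerow([fields[0], fields[-1].strip("{}")])


buffer = io.StringIO()
to_csv(
    [
        "     4   2.0000000000000000         0.7781512504 C {3}",
        "     5   2.3333333333333333         1.0791812460 C {3,2}",
    ],
    buffer,
)
print(buffer.getvalue(), end="")
```

To write to disk instead of an in-memory buffer, open the output file with `newline=""` and pass it as `out`.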

Happy coding!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow