Question

I've managed to extract data (from an HTML page) that goes into a table, and I've isolated the columns of that table into a text file that contains the lines below:

[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]

Each bracketed list of numbers represents a column. What I'd like to do is turn these lists into actual columns that I can work with in different data formats. I'd also like to keep the blank entries of these lists (i.e., "[,,,]") as empty cells.

This is basically what I'm trying to accomplish:

30 28 -7  0
30  6    
32  6     
35 50 43  3
34 58 71  5
43 56 30  1.5
52 64 23  1
 .  .  .  .
 .  .  .  .
 .  .  .  .

I'm parsing data from a web page, and ultimately planning to make the process as automated as possible so I can easily work with the data after I output it to a nice format.

Anyone know how to do this, have any suggestions, or thoughts on scripting this?

Solution 2

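awk-based solution (it strips the brackets and the trailing comma, and tracks the longest column so blank cells stay aligned):

awk -F, '{gsub(/\[|\]|,$/, ""); if (NF>n) n=NF; for (i=1; i<=NF; i++) a[i,NR]=$i}
      END {for (i=1; i<=n; i++) {r=a[i,1]; for (j=2; j<=NR; j++) r=r OFS a[i,j]; print r}}' file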
30 28 -7 0
30 6  
32 6  
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
..........
..........

OTHER TIPS

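Since you have your lists in Python, just do it in Python:

from itertools import zip_longest

l = [["30", "30", "32"], ["28", "6", "6"], ["-7", "", ""], ["0", "", ""]]
# zip_longest pads shorter columns with "" instead of stopping at the shortest one
for row in zip_longest(*l, fillvalue=""):
    print("\t".join(row))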

produces

30      28      -7      0
30      6
32      6
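
The snippet above starts from lists that already exist in Python. If the starting point is the text file from the question, a minimal sketch for getting there might look like the following. The names parse_columns and to_rows are hypothetical, and the sketch assumes every line has the form "[a,b,c]" with an optional trailing comma.

```python
from itertools import zip_longest

def parse_columns(text):
    """Turn lines like '[30,30,32],' into lists of strings, keeping blanks."""
    columns = []
    for line in text.splitlines():
        line = line.strip().rstrip(",")   # drop the comma that follows ']'
        if not line:
            continue                      # skip empty lines
        # Splitting on ',' keeps blank cells as empty strings.
        columns.append(line.strip("[]").split(","))
    return columns

def to_rows(columns):
    """Transpose columns into rows, padding shorter columns with ''."""
    return list(zip_longest(*columns, fillvalue=""))

# Shortened sample in the same shape as the question's data.
sample = "[30,30,32],\n[28,6,6],\n[-7,,,43],\n[0,,]"
for row in to_rows(parse_columns(sample)):
    print("\t".join(row))
```

Because the cells stay strings, blanks survive as empty strings; converting to numbers is left as a separate step.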

Another solution, though it only works for a file with exactly 4 lines:

$ paste \
<(sed -n '1{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '2{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '3{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '4{s,\[,,g;s,\],,g;s|,|\n|g;p}' t)
30  28  -7  0
30  6       
32  6       
35  50  43  3
34  58  71  5
43  56  30  1.5
52  64  23  1
68  87  28  1.5
88  99  13  0.5
97  110 13  0.5
105 116 10  0
107 119 11  0.5
107 120 12  0.5
105 117 11  0.5
101 114 13  0.5
93  113 22  1
88  103 17  0.5
80  82  3   0
69  6       -0.5
55  47  -15 -0.5
        -20 2.5

        38  
        71  

Updated: or another version with preprocessing:

$ sed 's|\[||;s|\][,]\?||' t >t2
$ paste \
<(sed -n '1{s|,|\n|g;p}' t2) \
<(sed -n '2{s|,|\n|g;p}' t2) \
<(sed -n '3{s|,|\n|g;p}' t2) \
<(sed -n '4{s|,|\n|g;p}' t2)
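
Both paste variants hardcode one sed call per input line. If the number of columns can vary, a sketch along the following lines could write the result as CSV instead. The function name columns_to_csv is hypothetical, and the input is assumed to be an iterable of lines in the "[a,b,c]," form from the question.

```python
import csv
import io
from itertools import zip_longest

def columns_to_csv(lines):
    """Convert any number of '[a,b,c],' lines to CSV text, one column per line."""
    columns = [
        line.strip().rstrip(",").strip("[]").split(",")
        for line in lines
        if line.strip()
    ]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # zip_longest keeps the rows that only the longer columns reach.
    writer.writerows(zip_longest(*columns, fillvalue=""))
    return out.getvalue()

print(columns_to_csv(["[30,30],", "[28,6],", "[-7,,43],", "[0,]"]))
```

An open file object can be passed directly, since iterating over it yields lines, for example: with open("t") as f: print(columns_to_csv(f)).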

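If a file named data contains the data given in the question (exactly as shown above), then the following bash command line will transpose it. Note that because the commas are turned into spaces, blank entries are not preserved, so values that follow a blank shift up within their column: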

$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T

Example:

$ cat data
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]

$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T
30 28 -7 0
30 6 43 3
32 6 71 5
35 50 30 1.5
34 58 23 1
43 56 28 1.5
52 64 13 0.5
68 87 13 0.5
88 99 10 0
97 110 11 0.5
105 116 12 0.5
107 119 11 0.5
107 120 13 0.5
105 117 22 1
101 114 17 0.5
93 113 3 0
88 103 -15 -0.5
80 82 -20 -0.5
69 6 38 2.5
55 47 71
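
In the output above, 30 and 6 are followed by 43 and 3 rather than by two blanks. A small Python illustration of the likely cause, assuming the downstream tool splits its input on runs of whitespace (the sample line is a shortened version of the third column):

```python
line = "[-7,,,43,71]"
body = line.strip("[]")

# Splitting on commas keeps the blank cells as empty strings.
by_comma = body.split(",")

# Turning commas into spaces and splitting on whitespace drops the blanks,
# because consecutive spaces count as a single separator.
by_space = body.replace(",", " ").split()

print(by_comma)   # ['-7', '', '', '43', '71']
print(by_space)   # ['-7', '43', '71']
```

If the blanks matter, one option is to substitute a placeholder for empty fields before transposing, or to split on commas as in the other answers.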

Licensed under: CC-BY-SA with attribution