Question

I have a dataset whose first column is a name enclosed in double quotes. Is it possible to capture the name as a single field?

"Mazda RX4" 21 6 160 110 3.9 2.62 16.46 0 1 4 4
"Mazda RX4 Wag" 21 6 160 110 3.9 2.875 17.02 0 1 4 4
"Datsun 710" 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
"Hornet 4 Drive" 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
"Hornet Sportabout" 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
"Valiant" 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
"Duster 360" 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4
"Merc 240D" 24.4 4 146.7 62 3.69 3.19 20 1 0 4 2
"Merc 230" 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2
"Merc 280" 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4

Note that the name is sometimes a single word (like "Valiant"), sometimes two (like "Mazda RX4"), and sometimes three (like "Mazda RX4 Wag").

So, based on the number of fields, I came up with this awk code, which works as I wanted. However, I wonder whether there is a more systematic way to do it?

awk '{name=$1; for (i=2; i<=NF-11; i++) name=name " " $i; printf "%s\n", name}' data/mtcars.dat | head
Mazda RX4
Mazda RX4 Wag
Datsun 710
Hornet 4 Drive
Hornet Sportabout
Valiant
Duster 360
Merc 240D
Merc 230
Merc 280

Solution

You could use the double quote (") as the input field separator. That assigns an empty field to $1, the full name to $2, and the rest of the line to $3.

$ awk 'BEGIN{FS="\""}{print $2}' < test.dat
Mazda RX4
Mazda RX4 Wag
Datsun 710
Hornet 4 Drive
Hornet Sportabout
Valiant
Duster 360
Merc 240D
Merc 230
Merc 280
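
If the numeric columns are needed as well, the same idea can be extended by splitting $3 on whitespace. The following is only a sketch under assumptions: the function name `split_name`, the `col` array, and the `|`-separated output format are made up for illustration, and the sample rows are fed on standard input rather than read from a file.

```shell
# split_name: hypothetical helper that reads rows on standard input.
# With FS set to a double quote, $1 is empty, $2 is the name,
# and $3 is everything after the closing quote.
split_name() {
  awk -F'"' '{
      # Splitting on a single space behaves like default field splitting:
      # runs of blanks separate values and leading blanks are ignored.
      n = split($3, col, " ")
      printf "%s|%s|%d\n", $2, col[1], n
  }'
}

printf '%s\n' \
  '"Mazda RX4" 21 6 160 110 3.9 2.62 16.46 0 1 4 4' \
  '"Valiant" 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1' | split_name
# Mazda RX4|21|11
# Valiant|18.1|11
```

Each output line shows the name, the first numeric column, and how many numeric columns were found.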

Other tips

Just to make it as short as possible:

awk -F\" '$0=$2' file
Mazda RX4
Mazda RX4 Wag
Datsun 710
Hornet 4 Drive
Hornet Sportabout
Valiant
Duster 360
Merc 240D
Merc 230
Merc 280
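
One caveat with the short form: the assignment is used as the pattern, so a line is printed only when the assigned value is non-empty and not zero. The sketch below illustrates this with a made-up row whose name is empty; the function names `sample`, `short_form`, and `explicit_form` are hypothetical and not part of the original answer.

```shell
# Hypothetical input: the second row has an empty name.
sample() {
  printf '%s\n' '"Valiant" 18.1 6' '"" 14.3 8' '"Merc 230" 22.8 4'
}

# Short form: the assignment doubles as the pattern, so the line is
# printed only when the value assigned to $0 is truthy.
short_form() { awk -F'"' '$0=$2'; }

# Explicit form: assign inside an action and print unconditionally.
explicit_form() { awk -F'"' '{ $0 = $2; print }'; }

sample | short_form    | awk 'END { print NR " lines" }'   # 2 lines
sample | explicit_form | awk 'END { print NR " lines" }'   # 3 lines
```

For the data in the question this makes no difference, since every name is non-empty.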

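Or, somewhat more robust, since it prints every line regardless of the name's value:

awk -F\" '{$0=$2}1' file

With GNU awk (gawk), FPAT can describe what a field looks like instead of what separates fields. Here each field is a run of characters other than ", and NF=1 keeps only the first one:

awk NF=1 FPAT='[^"]+' file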

Result

Mazda RX4
Mazda RX4 Wag
Datsun 710
Hornet 4 Drive
Hornet Sportabout
Valiant
Duster 360
Merc 240D
Merc 230
Merc 280
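
For a more general take on the FPAT idea, a pattern that accepts either a quoted string or a run of non-space characters keeps the name as $1 and the numbers as $2 onward. This is a sketch only: it assumes GNU awk is installed as `gawk` (FPAT is a gawk extension), and the function name `fields_by_pattern` and the output format are invented for illustration.

```shell
# fields_by_pattern: hypothetical helper, requires GNU awk (gawk).
# A field is either "..." (quotes included) or a run of non-space characters.
fields_by_pattern() {
  gawk 'BEGIN { FPAT = "(\"[^\"]*\")|([^ ]+)" }
        {
            name = $1
            gsub(/"/, "", name)      # strip the surrounding quotes
            printf "%s|%s|%d\n", name, $2, NF
        }'
}

# Run the demo only when gawk is available.
if command -v gawk >/dev/null 2>&1; then
  printf '%s\n' '"Mazda RX4 Wag" 21 6 160 110 3.9 2.875 17.02 0 1 4 4' |
    fields_by_pattern
  # Mazda RX4 Wag|21|12
fi
```

Unlike the quote-as-separator approach, this leaves every column addressable by its usual field number.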
License: CC-BY-SA with attribution
Not affiliated with StackOverflow