Question

I'm having a very difficult time opening a tab-delimited file. The client says that it is definitely tab delimited, but it seems like there are no text qualifiers.

I am running this statement:

 awk '{OF=OFS="\t"}{print $1,$2,$10,$12,$14,$20}' AECPRDA.TAB | head -10

and the output that I get:

+-----------------------------------------------------------------------+
| 22746528        BKEN48DVD       NEVER   050     R       N             |
| 22746535        BKEN48BR        NEVER   050     R       N             |
| 25584998        WD1194190DVD    DTS)    /       DOL     29.99         |
| 21548598        DSND001906102.2 /       001     11.49   8             |
| 25812794        WHV1000292717BR /       050     PG13    N             |
| 25812787        WHV1000284958DVD        SPEC    GRAVITY /       PG13  |
| 21425462        PBSDMST64400DVD SEASON  (3PC)   CLASSIC:        050   |
| 25584974        WD1194170BR     (WS     DTS     DIGC)   AC3           |
| 21388262        HBO1000394029DVD        3       OF      SEASON  59.98 |
| 25688450        WD11955700DVD   /       DOL)    THE     050           |
+-----------------------------------------------------------------------+

I don't believe that the columns are correctly "tabbed".

Here is a pure-text sample of the file:

22746528    BKEN48DVD   AW40    48  18  METALLICA (2PC) THROUGH THE NEVER (2PC) 050 090     R   12.99   19.98   85611500487 01/28/2014          N   N   30      1   A   1   11/27/2013  01/24/2014  11/27/2013  11/27/2013
22746535    BKEN48BR    AW40    48  BR  METALLICA (2PC) THROUGH THE NEVER (2PC) 050 090     R   16.25   24.98   85611500488 01/28/2014          N   N   30      1   A   2   11/27/2013  01/24/2014  11/27/2013  11/27/2013
25584998    WD1194190DVD    0819    1194190 18  FROZEN / (WS DOL DTS)   FROZEN / (WS DOL DTS)   050 110     G   21.25   29.99   78693683896 03/18/2014          N   N   0       2   A   3   12/20/2013  03/20/2014  12/20/2013  12/20/2013
21548598    DSND001906102.2 0107    001906102   02  FROZEN / O.S.T. FROZEN / O.S.T. 001 024         11.49   13.95   05008729574 11/25/2013          N   N   8       1   E   4   10/07/2013  03/20/2014  10/07/2013  10/07/2013
25812794    WHV1000292717BR 0526    1000292717  BR  GRAVITY / (UVDC)    GRAVITY / (UVDC)    050 093     PG13    29.49   35.99   88392924457 02/25/2014          N   N   30      1   E   5   01/16/2014  02/11/2014  01/16/2014  01/16/2014

Am I doing something wrong with my awk command? Why aren't the tabs being set correctly? Is there a hidden "space" qualifier that I am missing?

Here is an explanation that I got from someone, but I would like to implement it using awk, not Excel (God forbid):

Tab-delimited text will probably not line up. The tab character is defined differently in different operating systems. Usually it is defined as 4 or 5 spaces when displayed. So if you have an artist name that is 5 characters, then the tab character, then the title would start at character position 9. If on the next line the artist is 20 characters long, then the tab character, then the title would appear at position 24. Hope this helps. (Another thought: tell the user to open a blank spreadsheet in Excel and use the Text Import.)
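
Since on-screen alignment says little about the separators, one way to check the file itself is to count tab-separated fields per line and to print each tab as a visible marker. A minimal sketch, assuming a small made-up sample in place of AECPRDA.TAB (the temporary file and the `<TAB>` marker are illustrative only):

```shell
# Hypothetical two-line sample standing in for AECPRDA.TAB; fields are separated by real tabs.
sample=$(mktemp)
printf '22746528\tBKEN48DVD\tMETALLICA (2PC) THROUGH THE NEVER (2PC)\t050\n' >  "$sample"
printf '21548598\tDSND001906102.2\tFROZEN / O.S.T.\t001\n' >> "$sample"

# Count the tab-separated fields on each line.
field_counts=$(awk -F'\t' '{print NF}' "$sample")
echo "$field_counts"

# Replace each tab with a visible marker so screen alignment no longer matters.
visible=$(awk '{gsub(/\t/, "<TAB>"); print}' "$sample")
echo "$visible"

rm -f "$sample"
```

If every line reports the same field count, the file is very likely tab delimited, even when the columns look ragged in a terminal.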

Thanks so much for your guidance!

Solution 2

I've solved it!

Instead of using this kind of tab character: "\t"

I needed to use $'\t'

I haven't found documentation on why, though.

Final answer:

awk -F$'\t' 'BEGIN{OFS=FS}{print $1,$2,$10,$12,$14,$20}' AECPRDA.TAB | head -10
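
A likely explanation, offered as a sketch rather than a confirmed diagnosis: the command in the question never set the input separator FS (it assigned to OFS and to an unrelated variable named OF), so awk fell back to splitting on any run of blanks. Titles containing spaces were then broken into several fields. `$'\t'` is Bash's ANSI-C quoting for a literal tab; passing `-F'\t'` should behave the same way, because awk interprets the escape itself. The one-line sample below is made up for illustration:

```shell
# Hypothetical one-line sample; the third field contains spaces.
line=$(printf '22746528\tBKEN48DVD\tMETALLICA (2PC) THROUGH THE NEVER (2PC)\t050')

# Default FS: any run of blanks separates fields, so the title is broken apart.
default_count=$(printf '%s\n' "$line" | awk '{print NF}')

# FS set to a tab: only tabs separate fields, so the title stays in one piece.
tab_count=$(printf '%s\n' "$line" | awk -F'\t' '{print NF}')
tab_third=$(printf '%s\n' "$line" | awk -F'\t' '{print $3}')

echo "default: $default_count fields"
echo "tab:     $tab_count fields, third field: $tab_third"
```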

OTHER TIPS

This should do:

awk 'BEGIN {FS=OFS="\t"} NR<=10 {print $1,$2,$10,$12,$14,$20}' AECPRDA.TAB
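
Setting FS to a tab also keeps empty columns in place, which matters for a file like this one where some rows leave a column blank. A small sketch with made-up rows (the values and column positions are illustrative, not taken from AECPRDA.TAB):

```shell
# Hypothetical rows: the second one has an empty third column (two tabs in a row).
rows=$(printf 'A1\t050\tR\t12.99\nA2\t001\t\t11.49')

# With FS set to a tab, the empty column still counts, so $4 is the price on both rows.
prices=$(printf '%s\n' "$rows" | awk 'BEGIN {FS=OFS="\t"} {print $1, $4}')
echo "$prices"
```

With the default separator, the two adjacent tabs would collapse into one gap and the price on the second row would shift to `$3`.
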
Licensed under: CC-BY-SA with attribution