Question

There is a website with a table of Australian weather stations that I want to load into an R data.frame. The first few rows (excluding the header) look like this:

  23034 ADELAIDE AIRPORT                         -34.9524  138.5204 Apr 1995 Mar 2012   16.7   81  36.8   Y
  23046 ADELAIDE AIRPORT OLD SITE                -34.9566  138.5356 Aug 2002 Jan 2005    2.4   89  37.8   Y

It looks like a tab-delimited file, but when I save it as stations.txt and try read.delim, read.table or readLines, I just end up with everything in one column.

I also tried copying and pasting into Excel, but none of the delimiter options separated the data correctly.
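A quick way to test the tab-delimited guess is to look for tab characters directly. This is an illustrative sketch: the helper name `has_tabs` and the two sample lines are assumptions, one padded with spaces (fixed width) and one joined with real tabs, for comparison.

```r
# Hypothetical helper: TRUE for each line that contains at least one tab
has_tabs <- function(lines) grepl("\t", lines, fixed = TRUE)

# Assumed stand-ins for lines of stations.txt
fixed_line <- sprintf("%7s %-39s %9s", "23034", "ADELAIDE AIRPORT", "-34.9524")  # space-padded
tab_line   <- paste("23034", "ADELAIDE AIRPORT", "-34.9524", sep = "\t")            # tab-separated

has_tabs(c(fixed_line, tab_line))   # FALSE TRUE
```

On the saved file, `any(has_tabs(readLines("stations.txt")))` returning FALSE means there is nothing for read.delim to split on; if `table(nchar(readLines("stations.txt")))` also shows the data lines sharing one length, that points to a fixed-width layout.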


Solution

# set the file path & column widths
fn <- "http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_269.txt"
# in read.fwf, a negative width skips that many characters,
# so each -1 drops the single space between two columns
file.widths <- c( 7 , -1 , 39 , -1 , 9 , -1 , 9 , -1 , 8 , -1 , 8 , -1 , 6 , -1 , 4 , -1 , 5 , -1 , 3 )
# note: finding the file.widths is often the most
# annoying part of reading in an ASCII data set;
# if you have a SAS import script,
# check ?parse.SAScii in the R SAScii package :)


# read the header row (the 3rd line, after skipping 2) as plain text
headers <- 
    read.fwf( 
        fn ,
        widths = file.widths ,
        skip = 2 ,
        colClasses = "character" ,
        nrows = 1
    )

# remove spaces from column names
# and convert it to a character vector
cn <- gsub( " " , "" , headers[ 1 , ] )

# the % isn't a valid column name, so change that
cn[ 8 ] <- 'Pct'

# read the data rows (skipping the first 4 lines) using the cleaned-up names
yourdata <-
    read.fwf( 
        fn ,
        widths = file.widths ,
        skip = 4 ,
        comment.char = "" ,
        nrows = 535 ,
        col.names = cn
    )
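To check the widths and the negative-skip mechanics without downloading anything, here is a minimal offline sketch. It rebuilds two records from the question with sprintf(), padded so that they match the layout implied by file.widths (an assumption about the real file), and the column names are assumed too, since the script above takes them from the header row.

```r
# Two records from the question, padded to the layout implied by file.widths
file.widths <- c(7, -1, 39, -1, 9, -1, 9, -1, 8, -1, 8, -1, 6, -1, 4, -1, 5, -1, 3)
fmt <- "%7s %-39s %9s %9s %8s %8s %6s %4s %5s %3s"
sample_lines <- c(
  sprintf(fmt, "23034", "ADELAIDE AIRPORT", "-34.9524", "138.5204",
          "Apr 1995", "Mar 2012", "16.7", "81", "36.8", "Y"),
  sprintf(fmt, "23046", "ADELAIDE AIRPORT OLD SITE", "-34.9566", "138.5356",
          "Aug 2002", "Jan 2005", "2.4", "89", "37.8", "Y")
)

# Assumed column names (the real script reads them from the header row)
cn <- c("Site", "Name", "Lat", "Lon", "Start", "End", "Years", "Pct", "Obs", "AWS")

stations <- read.fwf(
  textConnection(sample_lines),
  widths = file.widths,     # each -1 skips one separating space
  col.names = cn,
  comment.char = "",        # passed on to read.table()
  strip.white = TRUE,       # trim the padding from character fields
  stringsAsFactors = FALSE
)
str(stations)
```

Without strip.white = TRUE, the padding stays on character columns such as Name and AWS.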

OTHER TIPS

This is old-style punch-card formatting, i.e. fixed width. Use the read.fwf function in utils:

df2 <- read.fwf(textConnection("  23034 ADELAIDE AIRPORT                         -34.9524  138.5204 Apr 1995 Mar 2012   16.7   81  36.8   Y
  23046 ADELAIDE AIRPORT OLD SITE                -34.9566  138.5356 Aug 2002 Jan 2005    2.4   89  37.8   Y"), widths = c(49, 9, 9, 9, 9, 7, 7, 6, 2))
df2
#-----------------------
                                                 V1       V2       V3        V4        V5   V6 V7   V8 V9
1   23034 ADELAIDE AIRPORT                          -34.9524 138.5204  Apr 1995  Mar 2012 16.7 81 36.8  Y
2   23046 ADELAIDE AIRPORT OLD SITE                 -34.9566 138.5356  Aug 2002  Jan 2005  2.4 89 37.8  Y
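With widths = c(49, ...) the station number and the name share V1. A minimal sketch for pulling them apart afterwards, assuming (as in the output above) a 7-character right-aligned site number followed by the name; `v1` is a stand-in for df2$V1, rebuilt with sprintf() so the snippet runs on its own:

```r
# Stand-in for df2$V1: 7-char site number, a space, then the padded name (49 chars total)
v1 <- sprintf("%7s %-41s", c("23034", "23046"),
              c("ADELAIDE AIRPORT", "ADELAIDE AIRPORT OLD SITE"))

site <- as.integer(trimws(substr(v1, 1, 7)))   # characters 1-7: station number
name <- trimws(substring(v1, 8))               # character 8 onward: station name

data.frame(site, name)
```

If V1 came back as a factor (the default for strings before R 4.0.0), wrap it in as.character() first.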

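Anthony deserves the checkmark: his read.fwf call is better coded. Here's what I was about to post, although, as the str() output shows, my widths drift out of step with the columns from Lon onward, which is why Start, End, Years, pct and Obs come in as mangled factors and AWS is all NA: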

df2 <- read.fwf(url("http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_269.txt"),
                col.names = c('Site', 'Name', 'Lat', 'Lon', 'Start', 'End',
                              'Years', 'pct', 'Obs', 'AWS'),
                widths = c(7, 41, 9, 9, 9, 9, 7, 7, 6, 2),
                row.names = NULL, skip = 4, nrows = 535, comment.char = "")
str(df2)
#-----------------
'data.frame':   535 obs. of  10 variables:
 $ Site : int  23034 23046 23090 90180 9999 9741 68241 72160 15590 33295 ...
 $ Name : Factor w/ 533 levels " ADELAIDE (KENT TOWN)                    ",..: 2 3 1 4 5 6 7 8 9 10 ...
 $ Lat  : num  -35 -35 -34.9 -38.5 -34.9 ...
 $ Lon  : num  139 139 139 144 118 ...
 $ Start: Factor w/ 231 levels "0 Aug 200","0 Dec 199",..: 86 134 135 84 194 49 9 206 7 77 ...
 $ End  : Factor w/ 72 levels "0 Aug 201","0 Jul 200",..: 39 13 26 9 16 39 70 21 6 56 ...
 $ Years: Factor w/ 65 levels "0    0.","0    1.",..: 32 47 33 36 16 33 28 35 34 31 ...
 $ pct  : Factor w/ 173 levels "0   24 ","0   26 ",..: 116 66 59 108 51 41 153 15 139 27 ...
 $ Obs  : Factor w/ 262 levels "  1.0 ","  1.6 ",..: 145 151 223 242 230 216 141 130 138 81 ...
 $ AWS  : logi  NA NA NA NA NA NA ...
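Values such as "0 Aug 200" are the usual sign that a widths vector has drifted out of step with the columns. A minimal sketch for checking a widths vector against a single line before reading the whole file; the `show_fields` helper is hypothetical, and the record is rebuilt with sprintf() in the layout implied by the accepted answer's file.widths (an assumption about the real file):

```r
# Hypothetical helper: the slice of `line` that each positive width picks up
# (negative widths are skipped, as read.fwf does)
show_fields <- function(line, widths) {
  ends   <- cumsum(abs(widths))
  starts <- ends - abs(widths) + 1
  keep   <- widths > 0
  substring(line, starts[keep], ends[keep])
}

# One record in the assumed fixed-width layout
line <- sprintf("%7s %-39s %9s %9s %8s %8s %6s %4s %5s %3s",
                "23034", "ADELAIDE AIRPORT", "-34.9524", "138.5204",
                "Apr 1995", "Mar 2012", "16.7", "81", "36.8", "Y")

this_answer <- show_fields(line, c(7, 41, 9, 9, 9, 9, 7, 7, 6, 2))
accepted    <- show_fields(line, c(7, -1, 39, -1, 9, -1, 9, -1, 8, -1, 8, -1, 6, -1, 4, -1, 5, -1, 3))
this_answer[5]   # "4 Apr 199"
accepted[5]      # "Apr 1995"
```

The fifth slice from this answer's widths picks up the last digit of Lon plus a truncated date, the same pattern as the Start factor levels above.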
Licensed under: CC-BY-SA with attribution