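# set the file path & column widths
# (a negative width tells read.fwf to skip that many characters, here the one-character gap between fields)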
fn <- "http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_269.txt"
file.widths <- c( 7 , -1 , 39 , -1 , 9 , -1 , 9 , -1 , 8 , -1 , 8 , -1 , 6 , -1 , 4 , -1 , 5 , -1 , 3 )
# note: finding the file.widths is often the most
# annoying part of reading in an ASCII data set
# if you have a SAS import script,
# check ?parse.SAScii in the R SAScii package :)
# find the headers
headers <-
read.fwf(
fn ,
widths = file.widths ,
skip = 2 ,
colClasses = "character" ,
nrows = 1
)
# remove spaces from the column names
# and convert the one-row data.frame to a character vector
cn <- gsub( " " , "" , unlist( headers[ 1 , ] ) )
# '%' isn't a syntactically valid column name, so rename it
cn[ 8 ] <- 'Pct'
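# read the data rows, skipping the title and header lines
# (nrows = 535 matched the station count when this was written)
yourdata <-
read.fwf(
fn ,
widths = file.widths ,
skip = 4 ,
comment.char = "" , # don't treat '#' as the start of a comment
nrows = 535 ,
col.names = cn ,
strip.white = TRUE # trim the padding around character fields such as the names
)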
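Since working out `file.widths` is the fiddly part, it can help to write down where each field starts and ends and let R compute the widths. The sketch below uses a hypothetical helper, `fwf_widths()`, which is not part of any package. The start/end positions in the usage lines are simply the ones implied by `file.widths` above; they have not been checked independently against the BOM file.

```r
# Hypothetical helper (not from any package): convert 1-based, inclusive
# start/end positions of the fields you want into a read.fwf() widths vector,
# inserting negative widths for the gaps that should be skipped.
fwf_widths <- function(start, end) {
  stopifnot(length(start) == length(end), all(end >= start),
            all(start[-1] > end[-length(end)]))  # fields must not overlap
  widths <- numeric(0)
  pos <- 1  # next character position not yet accounted for
  for (i in seq_along(start)) {
    gap <- start[i] - pos
    if (gap > 0) widths <- c(widths, -gap)      # skip the gap before this field
    widths <- c(widths, end[i] - start[i] + 1)  # width of the kept field
    pos <- end[i] + 1
  }
  widths
}

# Field positions implied by file.widths above (Site, Name, Lat, Lon, ...).
starts <- c(1, 9, 49, 59, 69, 78, 87, 94, 99, 105)
ends   <- c(7, 47, 57, 67, 76, 85, 92, 97, 103, 107)
fwf_widths(starts, ends)
# same as file.widths: 7 -1 39 -1 9 -1 9 -1 8 -1 8 -1 6 -1 4 -1 5 -1 3
```

Positions are often easier to read off a text editor's column indicator than widths, which is why this direction of conversion is the convenient one.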
Unable to create R data.frame from web text table
18-03-2022
Question
There is a website with a table of Australian weather stations that I wish to load into an R data.frame. The first few rows (excluding the header) look like this:
23034 ADELAIDE AIRPORT -34.9524 138.5204 Apr 1995 Mar 2012 16.7 81 36.8 Y
23046 ADELAIDE AIRPORT OLD SITE -34.9566 138.5356 Aug 2002 Jan 2005 2.4 89 37.8 Y
It looks like a tab-delimited file, but when I save it as stations.txt and try read.delim, read.table or readLines, I just end up with everything in one column.
I also tried copying and pasting into Excel, but none of the delimiter options separated the data correctly.
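One way to see what is going on is to inspect the raw lines directly. The sketch below builds two hypothetical lines with `sprintf()` so the padding is exact (the real file's column widths may differ); in practice you would call `readLines()` on the saved stations.txt instead.

```r
# Two stand-in lines padded with spaces, mimicking a fixed-width layout
# (hypothetical widths: 7-char site id, 39-char name, 9-char latitude).
lines <- c(
  sprintf("%-7s %-39s %9s", "23034", "ADELAIDE AIRPORT", "-34.9524"),
  sprintf("%-7s %-39s %9s", "23046", "ADELAIDE AIRPORT OLD SITE", "-34.9566")
)

any(grepl("\t", lines, fixed = TRUE))  # FALSE: no tabs, so read.delim() sees one column
nchar(lines)                           # 57 57: every line has the same length
count.fields(textConnection(lines))    # 4 6: whitespace splitting breaks up the names
```

No tabs, identical line lengths, and field counts that change with the number of words in a station name together point to a fixed-width file rather than a delimited one.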
Solution
OTHER TIPS
This is old-style punch-card formatting, i.e. fixed width. Use the read.fwf function from the utils package:
df2 <- read.fwf(textConnection(" 23034 ADELAIDE AIRPORT -34.9524 138.5204 Apr 1995 Mar 2012 16.7 81 36.8 Y
23046 ADELAIDE AIRPORT OLD SITE -34.9566 138.5356 Aug 2002 Jan 2005 2.4 89 37.8 Y"), widths =c(49,9,9,9,9,7,7,6,2) )
df2
#-----------------------
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 23034 ADELAIDE AIRPORT -34.9524 138.5204 Apr 1995 Mar 2012 16.7 81 36.8 Y
2 23046 ADELAIDE AIRPORT OLD SITE -34.9566 138.5356 Aug 2002 Jan 2005 2.4 89 37.8 Y
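Spacing in text copied from a web page is easily collapsed, which silently shifts every field that follows. Here is a sketch on lines whose padding is exact. The layout is hypothetical, not the BOM file's; it only shows how `read.fwf()` with negative widths and `strip.white = TRUE` produces clean, typed columns.

```r
# Two lines with exact padding (hypothetical layout): 7-char site id, blank,
# 39-char name, blank, 9-char latitude, blank, 9-char longitude.
lines <- c(
  sprintf("%7s %-39s %9s %9s", "23034", "ADELAIDE AIRPORT", "-34.9524", "138.5204"),
  sprintf("%7s %-39s %9s %9s", "23046", "ADELAIDE AIRPORT OLD SITE", "-34.9566", "138.5356")
)

stations <- read.fwf(
  textConnection(lines),
  widths = c(7, -1, 39, -1, 9, -1, 9),  # negative widths skip the blank columns
  col.names = c("Site", "Name", "Lat", "Lon"),
  strip.white = TRUE,       # passed on to read.table(): trims the padded names
  stringsAsFactors = FALSE  # keep Name as character (the default since R 4.0.0)
)
str(stations)
```

`col.names` lists only the kept fields, so it has four entries even though `widths` has seven.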
Anthony deserves the checkmark; his read.fwf code is better. Here's what I was about to post:
df2 <- read.fwf(url("http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_269.txt"),
    col.names = c('Site', 'Name', 'Lat', 'Lon', 'Start', 'End', 'Years', 'pct', 'Obs', 'AWS'),
    widths = c(7, 41, 9, 9, 9, 9, 7, 7, 6, 2),
    row.names = NULL, skip = 4, nrows = 535, comment.char = "")
str(df2)
#-----------------
'data.frame': 535 obs. of 10 variables:
$ Site : int 23034 23046 23090 90180 9999 9741 68241 72160 15590 33295 ...
$ Name : Factor w/ 533 levels " ADELAIDE (KENT TOWN) ",..: 2 3 1 4 5 6 7 8 9 10 ...
$ Lat : num -35 -35 -34.9 -38.5 -34.9 ...
$ Lon : num 139 139 139 144 118 ...
$ Start: Factor w/ 231 levels "0 Aug 200","0 Dec 199",..: 86 134 135 84 194 49 9 206 7 77 ...
$ End : Factor w/ 72 levels "0 Aug 201","0 Jul 200",..: 39 13 26 9 16 39 70 21 6 56 ...
$ Years: Factor w/ 65 levels "0 0.","0 1.",..: 32 47 33 36 16 33 28 35 34 31 ...
$ pct : Factor w/ 173 levels "0 24 ","0 26 ",..: 116 66 59 108 51 41 153 15 139 27 ...
$ Obs : Factor w/ 262 levels " 1.0 "," 1.6 ",..: 145 151 223 242 230 216 141 130 138 81 ...
$ AWS : logi NA NA NA NA NA NA ...
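In the `str()` output above, values such as `"0 Aug 200"` in `Start` suggest the widths are off by a character or two, so each field picks up the tail of its neighbour. A simple way to find the true boundaries is to print a position ruler above a few raw lines. `ruler()` is a hypothetical helper, and `lines` below stands in for something like `readLines(fn, n = 10)`.

```r
# Hypothetical helper: two strings marking character positions 1..n.
# The first shows the tens digit at every 10th position; the second
# shows the ones digit at every position.
ruler <- function(n) {
  pos <- seq_len(n)
  tens <- ifelse(pos %% 10 == 0, as.character((pos %/% 10) %% 10), " ")
  ones <- as.character(pos %% 10)
  c(paste(tens, collapse = ""), paste(ones, collapse = ""))
}

# Stand-in raw line (hypothetical padding); use readLines() on the real file.
lines <- sprintf("%7s %-39s %9s", "23034", "ADELAIDE AIRPORT", "-34.9524")
cat(ruler(nchar(lines[1])), lines, sep = "\n")
```

Reading each field's first and last character position off the ruler gives the numbers needed for a correct `widths` vector.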
Licensed under: CC-BY-SA with attribution