I'm trying to create a table from a comma-separated CSV file. I know that not all rows have the same number of elements, so I plan to write some code to drop those rows. The problem is that some rows contain numbers (in thousands) that themselves include a comma, and I can't split those rows properly. Here's my code:

pURL <- "http://financials.morningstar.com/ajax/exportKR2CSV.html?&callback=?&t=EI&region=FRA&order=asc"
res <- read.table(pURL, header=T, sep='\t', dec = '.', stringsAsFactors=F)
x <- unlist( lapply(keyRatios, function(u) strsplit(u,split='\n')) [[1]] )

Solution

You need to use the quote argument of either read.table or read.delim:

res <- read.delim(pURL, header = FALSE, sep = ",", dec = ".", stringsAsFactors = FALSE, quote = "\"", fill = TRUE, skip = 2)

The separator is "," not "\t". Numbers in the thousands or millions are always quoted in this file, so quote = "\"" makes R ignore the commas inside the quotes. You also want skip = 2 to skip the first two lines, and fill = TRUE to pad uneven lines with blanks.
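To see the effect of quote and fill without hitting the URL, here is a minimal sketch on inline text. The sample content (the "Example Co" preamble, the values, and the deliberately short row) is made up for illustration and only mimics the layout described above; csv_text is a hypothetical name.

```r
# Hypothetical sample: two preamble lines, a header row, quoted thousands,
# and one row with fewer fields than the others.
csv_text <- paste(
  "Growth Profitability and Financial Ratios for Example Co",
  "Financials",
  ",2011-12,2012-12,TTM",
  "Revenue EUR Mil,\"4,190\",\"4,989\",\"5,034\"",
  "Gross Margin %,55.4,55.8,56.1",
  "Short row,1",
  sep = "\n"
)

# Same arguments as above, but reading from text = instead of a URL
res <- read.delim(text = csv_text, header = FALSE, sep = ",", dec = ".",
                  stringsAsFactors = FALSE, quote = "\"", fill = TRUE,
                  skip = 2)

# The quoted value survives as a single field, comma included
print(res[2, 2])
print(dim(res))
```

Because of fill = TRUE, the short row is padded rather than causing an error, so you can filter such rows out afterwards if you don't want them.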

head( res )

#                           2003-12 2004-12 2005-12 2006-12 2007-12 2008-12 2009-12 2010-12 2011-12 2012-12   TTM
#2          Revenue EUR Mil   2,116   2,260   2,424   2,690   2,908   3,074   3,268   3,892   4,190   4,989 5,034
#3           Gross Margin %    60.6    60.3    57.3    58.2    57.6    56.9    56.1    55.5    55.4    55.8  56.1
#4 Operating Income EUR Mil     365     404     394     460     505     515     555     618     683     832   841
#5       Operating Margin %    17.2    17.9    16.2    17.1    17.4    16.7    17.0    15.9    16.3    16.7  16.7
#6       Net Income EUR Mil     200     227     289     331     371     389     402     472     518     584   594
#7   Earnings Per Share EUR    3.90    4.30    5.44    6.22    3.48    3.62    3.78    4.36    4.82    2.77  2.80

I set the column names of res from its first row (the output above already reflects this), like this:

names(res) <- unlist(res[1, ]); res <- res[-1, ]

It gave better formatting.
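If you also want the values as numbers rather than strings like "4,190", one possible follow-up is sketched below. This step is not part of the answer above; the small res data frame is a hypothetical stand-in for the result of read.delim, and the "Metric" label for the first column is an assumption, since that header cell is empty in the file.

```r
# Hypothetical stand-in for `res` as read with header = FALSE
res <- data.frame(
  V1 = c("", "Revenue EUR Mil", "Gross Margin %"),
  V2 = c("2011-12", "4,190", "55.4"),
  V3 = c("2012-12", "4,989", "55.8"),
  stringsAsFactors = FALSE
)

# Promote the first row to column names, then drop that row
hdr <- unlist(res[1, ], use.names = FALSE)
hdr[1] <- "Metric"  # assumed label for the empty first header cell
names(res) <- hdr
res <- res[-1, ]

# Remove thousands separators and convert every value column to numeric
res[-1] <- lapply(res[-1], function(col) {
  as.numeric(gsub(",", "", col, fixed = TRUE))
})

str(res)
```

Cells that are blank or non-numeric become NA under as.numeric, which may or may not be what you want for the uneven rows.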
