Question

Hi, I have a string generated by Python that I need to read into R for analysis.

The only difference between the two strings below is their length (the number of elements in the list). R cannot read the longer one successfully.

textWork <- "[('08/10/2013 01:50:16 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 02:04:23 AM INFO', 'product1', '', '61.12000', '1'), ('08/11/2013 02:29:46 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 12:58:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 01:12:18 AM INFO', 'product1', '', '61.12000', '1'), ('08/13/2013 01:14:57 AM INFO', 'product1', '', '61.12000', '1'), ('08/14/2013 02:01:42 AM INFO', 'product1', '', '61.12000', '1'), ('08/14/2013 02:04:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/15/2013 01:09:23 AM INFO', 'product1', '', '61.12000', '1'), ('08/15/2013 01:22:50 AM INFO', 'product1', '', '61.12000', '1'), ('08/16/2013 12:56:52 AM INFO', 'product1', '', '61.12000', '1'), ('08/16/2013 01:09:38 AM INFO', 'product1', '', '61.12000', '1'), ('08/17/2013 12:54:20 AM INFO', 'product1', '', '61.12000', '1'), ('08/17/2013 01:07:51 AM INFO', 'product1', '', '61.12000', '1'), ('08/18/2013 12:54:14 AM INFO', 'product1', '', '61.12000', '1'), ('08/18/2013 01:09:37 AM INFO', 'product1', '', '61.12000', '1'), ('08/19/2013 12:54:13 AM INFO', 'product1', '', '61.12000', '1'), ('08/19/2013 01:10:06 AM INFO', 'product1', '', '61.12000', '1'), ('08/20/2013 02:09:17 AM INFO', 'product1', '', '61.12000', '1'), ('08/20/2013 02:25:56 AM INFO', 'product1', '', '61.12000', '1'), ('08/21/2013 01:21:03 AM INFO', 'product1', '', '61.12000', '1'), ('08/21/2013 01:34:59 AM INFO', 'product1', '', '61.12000', '1'), ('08/22/2013 01:32:54 AM INFO', 'product1', '', '61.12000', '1'), ('08/22/2013 01:55:25 AM INFO', 'product1', '', '61.12000', '1'), ('08/23/2013 01:23:44 AM INFO', 'product1', '', '61.12000', '1'), ('08/23/2013 01:41:08 AM INFO', 'product1', '', '61.12000', '1'), ('08/24/2013 01:17:46 AM INFO', 'product1', '', '61.12000', '1'), ('08/24/2013 01:31:12 AM INFO', 'product1', '', '61.12000', '1'), ('08/25/2013 12:57:21 AM INFO', 'product1', '', '61.12000', '1'), ('08/25/2013 01:10:55 AM INFO', 'product1', '', '61.12000', '1'), 
('08/26/2013 12:56:37 AM INFO', 'product1', '', '61.12000', '1'), ('08/26/2013 01:11:03 AM INFO', 'product1', '', '61.12000', '1'), ('08/27/2013 01:00:15 AM INFO', 'product1', '', '61.12000', '1'), ('08/27/2013 01:13:09 AM INFO', 'product1', '', '61.12000', '1'), ('08/28/2013 01:07:21 AM INFO', 'product1', '', '61.12000', '1'), ('08/28/2013 01:24:13 AM INFO', 'product1', '', '61.12000', '1'), ('08/29/2013 12:57:08 AM INFO', 'product1', '', '61.12000', '1'), ('08/29/2013 01:10:57 AM INFO', 'product1', '', '61.12000', '1'), ('08/30/2013 12:56:22 AM INFO', 'product1', '', '61.12000', '1'), ('08/30/2013 01:10:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/31/2013 12:53:37 AM INFO', 'product1', '', '61.12000', '1'), ('08/31/2013 01:08:01 AM INFO', 'product1', '', '61.12000', '1'), ('09/01/2013 12:52:11 AM INFO', 'product1', '', '61.12000', '1'), ('09/01/2013 01:06:40 AM INFO', 'product1', '', '61.12000', '1'), ('09/02/2013 12:50:31 AM INFO', 'product1', '', '61.12000', '1'), ('09/02/2013 01:05:16 AM INFO', 'product1', '', '61.12000', '1'), ('09/03/2013 12:54:07 AM INFO', 'product1', '', '61.12000', '1'), ('09/03/2013 01:09:32 AM INFO', 'product1', '', '61.12000', '1'), ('09/04/2013 01:16:11 AM INFO', 'product1', '', '61.12000', '1'), ('09/05/2013 12:59:34 AM INFO', 'product1', '', '61.12000', '1'), ('09/06/2013 12:55:00 AM INFO', 'product1', '', '61.12000', '1'), ('09/07/2013 01:13:40 AM INFO', 'product1', '', '61.12000', '1'), ('09/09/2013 01:07:43 AM INFO', 'product1', '', '61.12000', '1')]"

textNotWork <- "[('08/10/2013 01:50:16 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 02:04:23 AM INFO', 'product1', '', '61.12000', '1'), ('08/11/2013 02:29:46 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 12:58:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 01:12:18 AM INFO', 'product1', '', '61.12000', '1'), ('08/13/2013 01:14:57 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 01:50:16 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 02:04:23 AM INFO', 'product1', '', '61.12000', '1'), ('08/11/2013 02:29:46 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 12:58:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 01:12:18 AM INFO', 'product1', '', '61.12000', '1'), ('08/13/2013 01:14:57 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 01:50:16 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 02:04:23 AM INFO', 'product1', '', '61.12000', '1'), ('08/11/2013 02:29:46 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 12:58:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 01:12:18 AM INFO', 'product1', '', '61.12000', '1'), ('08/13/2013 01:14:57 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 01:50:16 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 02:04:23 AM INFO', 'product1', '', '61.12000', '1'), ('08/11/2013 02:29:46 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 12:58:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 01:12:18 AM INFO', 'product1', '', '61.12000', '1'), ('08/13/2013 01:14:57 AM INFO', 'product1', '', '61.12000', '1'), ('08/14/2013 02:01:42 AM INFO', 'product1', '', '61.12000', '1'), ('08/14/2013 02:04:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/15/2013 01:09:23 AM INFO', 'product1', '', '61.12000', '1'), ('08/15/2013 01:22:50 AM INFO', 'product1', '', '61.12000', '1'), ('08/16/2013 12:56:52 AM INFO', 'product1', '', '61.12000', '1'), ('08/16/2013 01:09:38 AM INFO', 'product1', '', '61.12000', '1'), 
('08/17/2013 12:54:20 AM INFO', 'product1', '', '61.12000', '1'), ('08/17/2013 01:07:51 AM INFO', 'product1', '', '61.12000', '1'), ('08/18/2013 12:54:14 AM INFO', 'product1', '', '61.12000', '1'), ('08/18/2013 01:09:37 AM INFO', 'product1', '', '61.12000', '1'), ('08/19/2013 12:54:13 AM INFO', 'product1', '', '61.12000', '1'), ('08/19/2013 01:10:06 AM INFO', 'product1', '', '61.12000', '1'), ('08/20/2013 02:09:17 AM INFO', 'product1', '', '61.12000', '1'), ('08/20/2013 02:25:56 AM INFO', 'product1', '', '61.12000', '1'), ('08/21/2013 01:21:03 AM INFO', 'product1', '', '61.12000', '1'), ('08/21/2013 01:34:59 AM INFO', 'product1', '', '61.12000', '1'), ('08/22/2013 01:32:54 AM INFO', 'product1', '', '61.12000', '1'), ('08/22/2013 01:55:25 AM INFO', 'product1', '', '61.12000', '1'), ('08/23/2013 01:23:44 AM INFO', 'product1', '', '61.12000', '1'), ('08/23/2013 01:41:08 AM INFO', 'product1', '', '61.12000', '1'), ('08/24/2013 01:17:46 AM INFO', 'product1', '', '61.12000', '1'), ('08/24/2013 01:31:12 AM INFO', 'product1', '', '61.12000', '1'), ('08/25/2013 12:57:21 AM INFO', 'product1', '', '61.12000', '1'), ('08/25/2013 01:10:55 AM INFO', 'product1', '', '61.12000', '1'), ('08/26/2013 12:56:37 AM INFO', 'product1', '', '61.12000', '1'), ('08/26/2013 01:11:03 AM INFO', 'product1', '', '61.12000', '1'), ('08/27/2013 01:00:15 AM INFO', 'product1', '', '61.12000', '1'), ('08/27/2013 01:13:09 AM INFO', 'product1', '', '61.12000', '1'), ('08/28/2013 01:07:21 AM INFO', 'product1', '', '61.12000', '1'), ('08/28/2013 01:24:13 AM INFO', 'product1', '', '61.12000', '1'), ('08/29/2013 12:57:08 AM INFO', 'product1', '', '61.12000', '1'), ('08/29/2013 01:10:57 AM INFO', 'product1', '', '61.12000', '1'), ('08/30/2013 12:56:22 AM INFO', 'product1', '', '61.12000', '1'), ('08/30/2013 01:10:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/31/2013 12:53:37 AM INFO', 'product1', '', '61.12000', '1'), ('08/31/2013 01:08:01 AM INFO', 'product1', '', '61.12000', '1'), ('09/01/2013 
12:52:11 AM INFO', 'product1', '', '61.12000', '1'), ('09/01/2013 01:06:40 AM INFO', 'product1', '', '61.12000', '1'), ('09/02/2013 12:50:31 AM INFO', 'product1', '', '61.12000', '1'), ('09/02/2013 01:05:16 AM INFO', 'product1', '', '61.12000', '1'), ('09/03/2013 12:54:07 AM INFO', 'product1', '', '61.12000', '1'), ('09/03/2013 01:09:32 AM INFO', 'product1', '', '61.12000', '1'), ('09/04/2013 01:16:11 AM INFO', 'product1', '', '61.12000', '1'), ('09/05/2013 12:59:34 AM INFO', 'product1', '', '61.12000', '1'), ('09/06/2013 12:55:00 AM INFO', 'product1', '', '61.12000', '1'), ('09/07/2013 01:13:40 AM INFO', 'product1', '', '61.12000', '1'), ('09/09/2013 01:07:43 AM INFO', 'product1', '', '61.12000', '1')]"


Question (1): As you can see, this is a list of tuples in Python. The original data (textNotWork) actually contains more tuples, so the string is longer, and I cannot read the text successfully. Does anyone know what is going on? How can I read a string that long?

Question (2): How can I turn that into a data frame in R with five variables (one of which seems to be an empty string), so that I can convert it to a time series and analyze it?

Thanks


Solution

One way to transfer your Python structures (I think this approach works for any Python structure) is to save them from Python in JSON format and then read them into R. You can do something like this:

python

import json

# The list of tuples from the question (elided here)
textNotWork = [('08/10/2013 01:50:16 AM INFO', ...]
with open("testing.json", "w") as file:
    json.dump(textNotWork, file)
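
To make the Python step easier to try out, here is a minimal self-contained sketch of the same idea. It uses only two tuples copied from the question, and the temporary directory is my own assumption so the example does not overwrite anything; the answer itself simply writes to "testing.json" in the working directory.

```python
import json
import os
import tempfile

# Two tuples copied from the question; the full list works the same way.
records = [
    ('08/10/2013 01:50:16 AM INFO', 'product1', '', '61.12000', '1'),
    ('08/10/2013 02:04:23 AM INFO', 'product1', '', '61.12000', '1'),
]

# Assumed output location for this sketch only.
path = os.path.join(tempfile.mkdtemp(), "testing.json")

with open(path, "w") as f:
    json.dump(records, f)  # each tuple is serialized as a JSON array

with open(path) as f:
    loaded = json.load(f)  # JSON arrays come back as Python lists

print(loaded[0])
```

Reading the file back in Python is only a sanity check: it shows that every record is stored as a five-element array, which is what the R code reshapes into five columns.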

R

library(rjson)
# Flatten the parsed JSON and reshape it into 5 columns, one row per tuple
matrix(unlist(fromJSON(file='testing.json')),
       ncol=5, byrow=TRUE)

 [1,] "08/10/2013 01:50:16 AM INFO" "product1" ""   "61.12000" "1" 
 [2,] "08/10/2013 02:04:23 AM INFO" "product1" ""   "61.12000" "1" 
 [3,] "08/11/2013 02:29:46 AM INFO" "product1" ""   "61.12000" "1" 
 [4,] "08/12/2013 12:58:43 AM INFO" "product1" ""   "61.12000" "1" 
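
If only the printed string is available rather than the original Python list, a rough sketch of recovering it on the Python side before dumping to JSON could look like the following. It relies on `ast.literal_eval` from the standard library; the sample string is shortened to two tuples from the question, and the column names are hypothetical, since the question does not say what the five fields mean.

```python
import ast
import json

# Shortened sample of the string from the question (two tuples only).
text = ("[('08/10/2013 01:50:16 AM INFO', 'product1', '', '61.12000', '1'), "
        "('08/10/2013 02:04:23 AM INFO', 'product1', '', '61.12000', '1')]")

# literal_eval parses Python literals (lists, tuples, strings, numbers)
# without executing arbitrary code.
rows = ast.literal_eval(text)

# Hypothetical column names, chosen only for illustration.
columns = ["timestamp", "product", "empty", "price", "quantity"]

# One JSON object per tuple, so each field carries a name.
records = [dict(zip(columns, row)) for row in rows]
json_text = json.dumps(records)
```

Writing named records is a design choice rather than a requirement: the unnamed arrays in the answer work just as well with `matrix(..., ncol=5, byrow=TRUE)`.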
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow