Question

Hi, I have a string generated by Python that I need to read into R to analyze.

The only difference between the two strings below is their length (the number of elements in the list), and R cannot read the longer one successfully.

textWork <- "[('08/10/2013 01:50:16 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 02:04:23 AM INFO', 'product1', '', '61.12000', '1'), ('08/11/2013 02:29:46 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 12:58:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 01:12:18 AM INFO', 'product1', '', '61.12000', '1'), ('08/13/2013 01:14:57 AM INFO', 'product1', '', '61.12000', '1'), ('08/14/2013 02:01:42 AM INFO', 'product1', '', '61.12000', '1'), ('08/14/2013 02:04:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/15/2013 01:09:23 AM INFO', 'product1', '', '61.12000', '1'), ('08/15/2013 01:22:50 AM INFO', 'product1', '', '61.12000', '1'), ('08/16/2013 12:56:52 AM INFO', 'product1', '', '61.12000', '1'), ('08/16/2013 01:09:38 AM INFO', 'product1', '', '61.12000', '1'), ('08/17/2013 12:54:20 AM INFO', 'product1', '', '61.12000', '1'), ('08/17/2013 01:07:51 AM INFO', 'product1', '', '61.12000', '1'), ('08/18/2013 12:54:14 AM INFO', 'product1', '', '61.12000', '1'), ('08/18/2013 01:09:37 AM INFO', 'product1', '', '61.12000', '1'), ('08/19/2013 12:54:13 AM INFO', 'product1', '', '61.12000', '1'), ('08/19/2013 01:10:06 AM INFO', 'product1', '', '61.12000', '1'), ('08/20/2013 02:09:17 AM INFO', 'product1', '', '61.12000', '1'), ('08/20/2013 02:25:56 AM INFO', 'product1', '', '61.12000', '1'), ('08/21/2013 01:21:03 AM INFO', 'product1', '', '61.12000', '1'), ('08/21/2013 01:34:59 AM INFO', 'product1', '', '61.12000', '1'), ('08/22/2013 01:32:54 AM INFO', 'product1', '', '61.12000', '1'), ('08/22/2013 01:55:25 AM INFO', 'product1', '', '61.12000', '1'), ('08/23/2013 01:23:44 AM INFO', 'product1', '', '61.12000', '1'), ('08/23/2013 01:41:08 AM INFO', 'product1', '', '61.12000', '1'), ('08/24/2013 01:17:46 AM INFO', 'product1', '', '61.12000', '1'), ('08/24/2013 01:31:12 AM INFO', 'product1', '', '61.12000', '1'), ('08/25/2013 12:57:21 AM INFO', 'product1', '', '61.12000', '1'), ('08/25/2013 01:10:55 AM INFO', 'product1', '', '61.12000', '1'), 
('08/26/2013 12:56:37 AM INFO', 'product1', '', '61.12000', '1'), ('08/26/2013 01:11:03 AM INFO', 'product1', '', '61.12000', '1'), ('08/27/2013 01:00:15 AM INFO', 'product1', '', '61.12000', '1'), ('08/27/2013 01:13:09 AM INFO', 'product1', '', '61.12000', '1'), ('08/28/2013 01:07:21 AM INFO', 'product1', '', '61.12000', '1'), ('08/28/2013 01:24:13 AM INFO', 'product1', '', '61.12000', '1'), ('08/29/2013 12:57:08 AM INFO', 'product1', '', '61.12000', '1'), ('08/29/2013 01:10:57 AM INFO', 'product1', '', '61.12000', '1'), ('08/30/2013 12:56:22 AM INFO', 'product1', '', '61.12000', '1'), ('08/30/2013 01:10:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/31/2013 12:53:37 AM INFO', 'product1', '', '61.12000', '1'), ('08/31/2013 01:08:01 AM INFO', 'product1', '', '61.12000', '1'), ('09/01/2013 12:52:11 AM INFO', 'product1', '', '61.12000', '1'), ('09/01/2013 01:06:40 AM INFO', 'product1', '', '61.12000', '1'), ('09/02/2013 12:50:31 AM INFO', 'product1', '', '61.12000', '1'), ('09/02/2013 01:05:16 AM INFO', 'product1', '', '61.12000', '1'), ('09/03/2013 12:54:07 AM INFO', 'product1', '', '61.12000', '1'), ('09/03/2013 01:09:32 AM INFO', 'product1', '', '61.12000', '1'), ('09/04/2013 01:16:11 AM INFO', 'product1', '', '61.12000', '1'), ('09/05/2013 12:59:34 AM INFO', 'product1', '', '61.12000', '1'), ('09/06/2013 12:55:00 AM INFO', 'product1', '', '61.12000', '1'), ('09/07/2013 01:13:40 AM INFO', 'product1', '', '61.12000', '1'), ('09/09/2013 01:07:43 AM INFO', 'product1', '', '61.12000', '1')]"

textNotWork <- "[('08/10/2013 01:50:16 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 02:04:23 AM INFO', 'product1', '', '61.12000', '1'), ('08/11/2013 02:29:46 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 12:58:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 01:12:18 AM INFO', 'product1', '', '61.12000', '1'), ('08/13/2013 01:14:57 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 01:50:16 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 02:04:23 AM INFO', 'product1', '', '61.12000', '1'), ('08/11/2013 02:29:46 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 12:58:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 01:12:18 AM INFO', 'product1', '', '61.12000', '1'), ('08/13/2013 01:14:57 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 01:50:16 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 02:04:23 AM INFO', 'product1', '', '61.12000', '1'), ('08/11/2013 02:29:46 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 12:58:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 01:12:18 AM INFO', 'product1', '', '61.12000', '1'), ('08/13/2013 01:14:57 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 01:50:16 AM INFO', 'product1', '', '61.12000', '1'), ('08/10/2013 02:04:23 AM INFO', 'product1', '', '61.12000', '1'), ('08/11/2013 02:29:46 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 12:58:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/12/2013 01:12:18 AM INFO', 'product1', '', '61.12000', '1'), ('08/13/2013 01:14:57 AM INFO', 'product1', '', '61.12000', '1'), ('08/14/2013 02:01:42 AM INFO', 'product1', '', '61.12000', '1'), ('08/14/2013 02:04:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/15/2013 01:09:23 AM INFO', 'product1', '', '61.12000', '1'), ('08/15/2013 01:22:50 AM INFO', 'product1', '', '61.12000', '1'), ('08/16/2013 12:56:52 AM INFO', 'product1', '', '61.12000', '1'), ('08/16/2013 01:09:38 AM INFO', 'product1', '', '61.12000', '1'), 
('08/17/2013 12:54:20 AM INFO', 'product1', '', '61.12000', '1'), ('08/17/2013 01:07:51 AM INFO', 'product1', '', '61.12000', '1'), ('08/18/2013 12:54:14 AM INFO', 'product1', '', '61.12000', '1'), ('08/18/2013 01:09:37 AM INFO', 'product1', '', '61.12000', '1'), ('08/19/2013 12:54:13 AM INFO', 'product1', '', '61.12000', '1'), ('08/19/2013 01:10:06 AM INFO', 'product1', '', '61.12000', '1'), ('08/20/2013 02:09:17 AM INFO', 'product1', '', '61.12000', '1'), ('08/20/2013 02:25:56 AM INFO', 'product1', '', '61.12000', '1'), ('08/21/2013 01:21:03 AM INFO', 'product1', '', '61.12000', '1'), ('08/21/2013 01:34:59 AM INFO', 'product1', '', '61.12000', '1'), ('08/22/2013 01:32:54 AM INFO', 'product1', '', '61.12000', '1'), ('08/22/2013 01:55:25 AM INFO', 'product1', '', '61.12000', '1'), ('08/23/2013 01:23:44 AM INFO', 'product1', '', '61.12000', '1'), ('08/23/2013 01:41:08 AM INFO', 'product1', '', '61.12000', '1'), ('08/24/2013 01:17:46 AM INFO', 'product1', '', '61.12000', '1'), ('08/24/2013 01:31:12 AM INFO', 'product1', '', '61.12000', '1'), ('08/25/2013 12:57:21 AM INFO', 'product1', '', '61.12000', '1'), ('08/25/2013 01:10:55 AM INFO', 'product1', '', '61.12000', '1'), ('08/26/2013 12:56:37 AM INFO', 'product1', '', '61.12000', '1'), ('08/26/2013 01:11:03 AM INFO', 'product1', '', '61.12000', '1'), ('08/27/2013 01:00:15 AM INFO', 'product1', '', '61.12000', '1'), ('08/27/2013 01:13:09 AM INFO', 'product1', '', '61.12000', '1'), ('08/28/2013 01:07:21 AM INFO', 'product1', '', '61.12000', '1'), ('08/28/2013 01:24:13 AM INFO', 'product1', '', '61.12000', '1'), ('08/29/2013 12:57:08 AM INFO', 'product1', '', '61.12000', '1'), ('08/29/2013 01:10:57 AM INFO', 'product1', '', '61.12000', '1'), ('08/30/2013 12:56:22 AM INFO', 'product1', '', '61.12000', '1'), ('08/30/2013 01:10:43 AM INFO', 'product1', '', '61.12000', '1'), ('08/31/2013 12:53:37 AM INFO', 'product1', '', '61.12000', '1'), ('08/31/2013 01:08:01 AM INFO', 'product1', '', '61.12000', '1'), ('09/01/2013 
12:52:11 AM INFO', 'product1', '', '61.12000', '1'), ('09/01/2013 01:06:40 AM INFO', 'product1', '', '61.12000', '1'), ('09/02/2013 12:50:31 AM INFO', 'product1', '', '61.12000', '1'), ('09/02/2013 01:05:16 AM INFO', 'product1', '', '61.12000', '1'), ('09/03/2013 12:54:07 AM INFO', 'product1', '', '61.12000', '1'), ('09/03/2013 01:09:32 AM INFO', 'product1', '', '61.12000', '1'), ('09/04/2013 01:16:11 AM INFO', 'product1', '', '61.12000', '1'), ('09/05/2013 12:59:34 AM INFO', 'product1', '', '61.12000', '1'), ('09/06/2013 12:55:00 AM INFO', 'product1', '', '61.12000', '1'), ('09/07/2013 01:13:40 AM INFO', 'product1', '', '61.12000', '1'), ('09/09/2013 01:07:43 AM INFO', 'product1', '', '61.12000', '1')]"

Question(1) As you can see, this is a list of tuples in Python. The original data (textNotWork) contains more tuple elements, so the string is longer, and I cannot read the text successfully. Does anyone know what is going on? How can I read a string that long?

Question(2) How can I turn that into a data frame in R with five variables (one of which seems to be an empty string), so I can convert it to a time series and analyze it?

Thanks

Solution

One way to transfer your Python structures (I think this approach is general for any Python structure built from lists, tuples, dicts, strings and numbers) is to save them from Python in JSON format and then read them in R. You can do something like this:

python

import json

# textNotWork is the list of tuples itself, not its printed string
textNotWork = [('08/10/2013 01:50:16 AM INFO', ...]
with open("testing.json", "w") as file:
    json.dump(textNotWork, file)
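
If the data only exists as the printed string shown in the question, rather than as a live Python list, a minimal sketch along these lines could rebuild the list before dumping it. It assumes the string is a valid Python literal. The helper name `string_to_json`, the two-row sample and the temporary output path are made up for illustration.

```python
import ast
import json
import os
import tempfile


def string_to_json(text, path):
    """Parse a printed list of tuples and write it to `path` as JSON."""
    # literal_eval only accepts Python literals, so no code is executed
    rows = ast.literal_eval(text)
    with open(path, "w") as fh:
        json.dump(rows, fh)  # each tuple is written as a JSON array
    return rows


# Two-row sample in the same shape as the string in the question
sample = (
    "[('08/10/2013 01:50:16 AM INFO', 'product1', '', '61.12000', '1'), "
    "('08/10/2013 02:04:23 AM INFO', 'product1', '', '61.12000', '1')]"
)
out_path = os.path.join(tempfile.gettempdir(), "testing.json")
rows = string_to_json(sample, out_path)
print(len(rows), "rows written to", out_path)
```

Because the tuples become JSON arrays, reading the file back gives lists rather than tuples, which makes no difference once the values are flattened on the R side.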

R

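library(rjson)
# unlist() flattens the parsed tuples into one character vector;
# byrow=TRUE then rebuilds one row per tuple with five columns
matrix(unlist(fromJSON(file='testing.json')),
          ncol=5,byrow=TRUE)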

 [1,] "08/10/2013 01:50:16 AM INFO" "product1" ""   "61.12000" "1" 
 [2,] "08/10/2013 02:04:23 AM INFO" "product1" ""   "61.12000" "1" 
 [3,] "08/11/2013 02:29:46 AM INFO" "product1" ""   "61.12000" "1" 
 [4,] "08/12/2013 12:58:43 AM INFO" "product1" ""   "61.12000" "1" 
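
For Question(2), one alternative is to do the cleanup on the Python side and hand R a CSV with a header row, which R's `read.csv` can load as a data frame. The sketch below is only an illustration under some assumptions: the column names are hypothetical (the question does not say what each field means), every first field is assumed to end in the literal text " INFO", and AM/PM parsing assumes an English locale.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names; the question does not say what each field means.
COLUMNS = ["timestamp", "product", "note", "price", "quantity"]


def rows_to_csv(rows, out):
    """Write the tuples to `out` as CSV with a header and ISO-style timestamps."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for stamp, product, note, price, quantity in rows:
        # Assumes every first field ends in the literal text " INFO"
        parsed = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p INFO")
        writer.writerow([parsed.isoformat(sep=" "), product, note, price, quantity])


rows = [
    ("08/10/2013 01:50:16 AM INFO", "product1", "", "61.12000", "1"),
    ("08/12/2013 12:58:43 AM INFO", "product1", "", "61.12000", "1"),
]
buffer = io.StringIO()  # stands in for a real file opened with newline=""
rows_to_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing the timestamp in 24-hour form means the date column no longer needs an AM/PM-aware format string when it is converted on the R side.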
Licensed under: CC-BY-SA with attribution