Question

I am trying to load a large CSV file (about 18 GB) into RapidMiner to build a classification model. The "import configuration wizard" seems to have difficulty loading the data, so I chose to use "Edit parameter list: data set meta data information" to set up the attribute and label information. However, the UI only allows me to set up that information column by column, and my CSV file has about 80,000 columns. How should I handle this kind of scenario? Thanks.


Solution

I haven't tried it myself yet, but you should be able to load the CSV into a MySQL database. You can then use the Stream Database operator to avoid size limitations. Here is the description from RapidMiner:

In contrast to the Read Database operator, which loads the data into the main memory, the Stream Database operator keeps the data in the database and performs the data reading in batches. This allows RapidMiner to access data sets of arbitrary sizes without any size restrictions.
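As a starting point for the first step, here is a minimal sketch of generating the MySQL statements to create a table from a CSV header and bulk-load the file. The function name `csv_to_mysql_sql` and the blanket `TEXT` column type are my own choices, not anything RapidMiner provides; also note that MySQL caps a table at roughly 4,096 columns (fewer in practice for InnoDB), so a file with ~80,000 columns would likely need to be split across tables or stored in a long/narrow layout first.

```python
import csv

def csv_to_mysql_sql(csv_path, table_name):
    """Generate CREATE TABLE and LOAD DATA statements for a CSV file.

    Hypothetical helper sketch: every column is typed TEXT for
    simplicity; tighten the types once the data is understood.
    """
    # Read only the header row to discover the column names.
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))

    cols = ",\n  ".join(f"`{name}` TEXT" for name in header)
    create = f"CREATE TABLE `{table_name}` (\n  {cols}\n);"

    # LOAD DATA bulk-loads the file server-side; IGNORE 1 LINES
    # skips the header row we already consumed.
    load = (
        f"LOAD DATA LOCAL INFILE '{csv_path}'\n"
        f"INTO TABLE `{table_name}`\n"
        "FIELDS TERMINATED BY ',' ENCLOSED BY '\"'\n"
        "LINES TERMINATED BY '\\n'\n"
        "IGNORE 1 LINES;"
    )
    return create, load
```

The generated statements can be fed to the `mysql` client or any connector; once the data is in MySQL, the Stream Database operator reads it in batches as described above.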

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow