Is it possible to import CSVs into a RapidMiner repository from the command-line?

StackOverflow https://stackoverflow.com/questions/13464234

  •  30-11-2021

Question

I'm considering using RapidMiner to store and analyse a collection of data gathered by a scripted process. Is there a way to import a CSV file into a RapidMiner repository from a command-line script?

Solution

Not directly. However, you can create a process in which a 'Read CSV' operator is connected to a 'Store' operator, and save this process in the repository. The process can then be run from the command line. If the input file and the repository location are static, this is all you need to do.

To specify the input file and the repository location dynamically, however, you need macros. Macros can be set on the command line, but unfortunately only as of RapidMiner version 5.3, which has not been released yet (it should be out in a few weeks). In the meantime, you can use the up-to-date version from the SourceForge SVN repository (Unuk branch).

Process to store CSV in the repository:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
    <process expanded="true" height="190" width="413">
      <operator activated="true" class="read_csv" compatibility="5.3.000" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="%{csv-file}"/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="store" compatibility="5.3.000" expanded="true" height="60" name="Store" width="90" x="179" y="30">
        <parameter key="repository_entry" value="%{repository-location}"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Store" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>

Assuming that you have saved this process as //home/steve/csv-to-repository and that your current directory is the RapidMiner directory, you can call it from the command line like this:

./script/rapidminer //home/steve/csv-to-repository "-Mcsv-file=/path/to/your/csv/file" "-Mrepository-location=//repository/path/to/store/csv"
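
Since the question mentions a scripted process, the single call above could be wrapped in a loop that imports a whole directory of CSV files. The following is only a minimal sketch: the function name `import_csv_dir`, the variables `RAPIDMINER_CMD` and `PROCESS_LOCATION`, and the choice to name each repository entry after the CSV file's base name are assumptions made for illustration, not part of RapidMiner. The launcher path and the `-M` macro arguments are taken from the command above.

```shell
#!/bin/sh
# Sketch: run the stored process once per CSV file in a directory.
# Assumption: the names below (RAPIDMINER_CMD, PROCESS_LOCATION,
# import_csv_dir) are hypothetical helpers, not RapidMiner features.

# Launcher script and stored process location, as in the command above.
RAPIDMINER_CMD="${RAPIDMINER_CMD:-./script/rapidminer}"
PROCESS_LOCATION="${PROCESS_LOCATION:-//home/steve/csv-to-repository}"

# import_csv_dir CSV_DIR REPOSITORY_FOLDER
# Calls the process for each *.csv file in CSV_DIR and stores the data
# under REPOSITORY_FOLDER/<file name without the .csv suffix>.
import_csv_dir() {
    csv_dir=$1
    repo_folder=$2
    for csv in "$csv_dir"/*.csv; do
        # Skip the unexpanded pattern when the directory has no CSV files.
        [ -e "$csv" ] || continue
        entry=$(basename "$csv" .csv)
        # Stop at the first failed import so errors are not silently ignored.
        "$RAPIDMINER_CMD" "$PROCESS_LOCATION" \
            "-Mcsv-file=$csv" \
            "-Mrepository-location=$repo_folder/$entry" || return 1
    done
}
```

After sourcing this file from the RapidMiner directory, a call such as `import_csv_dir /path/to/your/csv/dir //repository/path/to/store` would run the process once per file. Each call starts RapidMiner anew, so this is likely to be slow for large numbers of files.
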
Licensed under: CC-BY-SA with attribution