Question

I want to select specific fields of a table in Cassandra and insert them into another table. In SQL Server I do this like so:

INSERT INTO Users(name,family)
SELECT name,family FROM Users

How do I do this in cassandra-cli or cqlsh?


Solution

COPY keyspace.columnfamily1 (column1, column2,...) TO 'temp.csv';
COPY keyspace.columnfamily2 (column1, column2,...) FROM 'temp.csv';

Here, give your keyspace (schema name); for columnfamily1 use the table you want to copy from, and for columnfamily2 the table you want to copy into. Note that CQL has no INSERT INTO ... SELECT, which is why the data takes a detour through a CSV file.
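As a concrete sketch with made-up names (keyspace ks, source table users, target table users_by_family), the round trip looks like this from within cqlsh:

    COPY ks.users (name, family) TO 'temp.csv';
    COPY ks.users_by_family (name, family) FROM 'temp.csv';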

And yes, this is a solution for CQL; however, I have never tried it with the CLI.

Other tips

For tables that are not very large, skip the intermediate file and use an anonymous pipe:

cqlsh -e "COPY keyspace.src_table (col1, col2, ..., ColN) TO STDOUT WITH HEADER=false" | cqlsh -e "COPY keyspace.target_table (col1, col2, ..., ColN) FROM STDIN"

For very large datasets this won't work; a per-token-range strategy should be explored instead.
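To illustrate the idea (the table, columns, and token boundaries below are arbitrary placeholders), each slice of the ring is read with a range predicate on token() over the partition key:

    SELECT col1, col2, col3
      FROM my_keyspace.src_table
     WHERE token(col1) > -9100000000000000000
       AND token(col1) <= -8000000000000000000;

The script in the next answer generates one such query per local token range.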

For very large tables, cqlsh will have a hard time handling COPY TO/FROM. Here is how to sweep the table by local token ranges and copy the desired columns from one table to another. Because each node only copies its own local data, the script has to be executed on every node of the datacenter:

    #!/bin/bash
    #
    # Script to COPY from a SOURCE table's select columns over a TARGET table
    # Assumes the following:
    #
    # * The SOURCE table is very large - otherwise just try:
    #   cqlsh  -e "COPY keyspace.src_table (col1, col2, ...,ColN ) TO STDOUT WITH HEADER=false" \
    #   |cqlsh -e "COPY keyspace.tgt_table (col1, col2, ...,ColN ) FROM STDIN"
    # * SOURCE AND TARGET TABLES are in the SAME KEYSPACE
    # * TARGET columns are named the SAME as SOURCE
    #
    # The script sweeps thru the local tokens to copy only the local data over to the new table
    # Therefore, this script needs to run on every node on the datacenter
    #
    # Set these variables before executing
    #
    USR=cassandra
    PWD=password
    KSP=my_keyspace
    SRC=src_table
    COL="col1, col2, col3"
    PKY="col1"
    TGT=tgt_table

    CQLSH="cqlsh -u ${USR} -p ${PWD} -k ${KSP}"

    function getTokens(){
       # List this node's tokens from system.local, sorted numerically (one per line)
       for i in $($CQLSH -e "select tokens from system.local;" | awk -F, '/{/{print $0}' | tr -d '{' | tr -d '}' | tr -d ','); do
           echo ${i//\'/}
       done | sort -n
    }

    function getDataByTokenRange(){
       # Emit one SELECT per local token range; together the ranges cover the whole ring
       i=0
       tokens=($(getTokens))
       while [ ${i} -lt ${#tokens[@]} ]; do 
             [ ${i} -eq 0 ]         && echo "SELECT ${COL} FROM ${SRC} WHERE token(${PKY}) <= ${tokens[i]};" 
             [ "${tokens[i+1]}" ]   && echo "SELECT ${COL} FROM ${SRC} WHERE token(${PKY}) >  ${tokens[i]} AND token(${PKY}) <= ${tokens[i+1]};"
             [ ! "${tokens[i+1]}" ] && echo "SELECT ${COL} FROM ${SRC} WHERE token(${PKY}) >  ${tokens[i]};"
             ((i++))
       done
    } 

    function cqlExec(){
       # Read each generated SELECT, strip the decoration cqlsh adds to its output,
       # and pipe the bare rows into COPY ... FROM STDIN on the target table
       while IFS='' read -r cql || [[ -n "$cql" ]]; do
             $CQLSH -e "CONSISTENCY LOCAL_ONE; $cql"                                                                  \
             |awk -F\| '( !/LOCAL_ONE/ && !/'"${COL//, /|}"'/ && !/^\-+/ && !/^\([0-9]+ rows\)/ && !/^$/ ){print $0}' \
             |sed -e 's/^[ ]*//g' -e 's/[ ]*|[ ]*/|/g'                                                                \
             |$CQLSH -e "COPY ${TGT} (${COL}) FROM STDIN WITH DELIMITER = '|' and HEADER=false;"
             [ "$?" -gt 0 ] && echo "ERROR: Failed to import data from query: ${cql}"
       done < "$1"
    }

    main(){
       echo "Begin processing ..."
       getDataByTokenRange > getDataByTokenRange.ddl
       cqlExec getDataByTokenRange.ddl
       echo "End procesing"
    }

    main
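Since each run only moves the data owned by the local node, the script has to be executed on every node of the datacenter. As a minimal sketch (the script name and node names are hypothetical), you could push it out and run it over ssh:

    #!/bin/bash
    # Hypothetical driver: copy the sweep script to each node and run it there
    for node in node1 node2 node3; do
        scp copy_by_token.sh "${node}:/tmp/copy_by_token.sh"
        ssh "${node}" "bash /tmp/copy_by_token.sh"
    done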

If you (like me) have data with complex or user-defined types, it gets tricky to copy data with cqlsh COPY. I solved my problems with DataGrip: you can export the data as INSERT statements, then load the .cql script directly into cqlsh.
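However it was produced, a .cql file of INSERT statements can be replayed with cqlsh's -f option (the file name below is just an example):

    cqlsh -u cassandra -p password -f users_export.cql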

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow