Question

Say I have a CSV file in the following format:

ID, Name, Gender, Q1
1, ABC, Male, "A1;A2"
2, ACB, Male, "A2;A3;A4"
3, BAC, Female, "A1"

I would like to transform it into the following format so that my data visualization tool can process it properly:

ID, Name, Gender, Questions, Responses
1, ABC, Male, Q1, A1
1, ABC, Male, Q1, A2
2, ACB, Male, Q1, A2
2, ACB, Male, Q1, A3
2, ACB, Male, Q1, A4
3, BAC, Female, Q1, A1

Using the Text to Columns feature in LibreOffice, I can easily split the Q1 value A1;A2 into separate columns (A1, A2), but I am stuck at transposing those columns and repeating the rows.

Additional Info:

  • The data is collected via Google Forms. Unfortunately, Google Spreadsheets stores the responses to a multiple-choice question in a single cell, separated by semicolons (A1;A2;A3...). My visualization tool cannot see this underlying structure and treats the cell as a single string, which makes aggregation and grouping difficult.

  • In the actual data (survey results) I have around 5000 entries, each with multiple cells that require such processing, which will result in a table of around 100,000 entries. A way to automate the transformation is needed.

  • The tool I use to analyze and visualize the data is "Tableau Public". It has a data reshaper plugin for Excel that semi-automates such tasks (see the section "Make sure each row contains only one piece of data"), but there is no LibreOffice alternative.

Solution

You can use JavaScript (Google Apps Script) in Google Spreadsheets to transform the data before exporting it to other applications. Here is a quick-and-dirty script I wrote for your sample data:

function transformRows() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var rows = sheet.getDataRange();
  var numRows = rows.getNumRows();
  var values = rows.getValues();

  var newSheet = SpreadsheetApp.getActiveSpreadsheet().insertSheet("Result");
  var header = values[0].slice(0, values[0].length - 1);

  header.push("Question");
  header.push("Answer");
  newSheet.appendRow(header);

  var question = values[0][values[0].length - 1];

  // Note: the loop below is inefficient and may exceed the 6-minute timeout for sheets
  //       with more than 1k rows. Switch to a batch update to speed it up.
  // Ref: https://developers.google.com/apps-script/reference/spreadsheet/range#setValues%28Object%29
  for (var i = 1; i <= numRows - 1; i++) {
    var row = values[i];
    var answers = String(row[row.length - 1]).split(";"); // String() guards against numeric cells
    for (var ansi = 0; ansi < answers.length; ansi++) {
      var newRow = row.slice(0, row.length - 1);
      newRow.push(question);
      newRow.push(answers[ansi]);
      newSheet.appendRow(newRow);
    }
  }
}

To use it:

  1. Open the script editor from the sheet you have open (Tools -> Script editor...)
  2. Create an empty project for the spreadsheet
  3. Paste the code into the editor
  4. Save, and run it (Run -> transformRows)
  5. Return to the spreadsheet; a new sheet named "Result" will have been created and filled with the transformed data.
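
The note in the script about batch updating can be sketched as follows. This is only a rough illustration, not a tested replacement: splitLastColumn is a hypothetical helper name, and it assumes the multi-answer column is the last one, as in the sample data. It builds all output rows in memory from the 2D array that getValues() returns, so the sheet only needs to be written once.

```javascript
// Hypothetical helper (not part of the script above): builds every output
// row in memory instead of calling appendRow() once per row.
function splitLastColumn(values) {
  var header = values[0];
  var last = header.length - 1;   // index of the multi-answer column
  var question = header[last];    // e.g. "Q1"
  var out = [header.slice(0, last).concat(["Question", "Answer"])];

  for (var i = 1; i < values.length; i++) {
    var row = values[i];
    // String() guards against numeric cells; trim() drops stray spaces.
    var answers = String(row[last]).split(";");
    for (var j = 0; j < answers.length; j++) {
      out.push(row.slice(0, last).concat([question, answers[j].trim()]));
    }
  }
  return out;
}

// Sample input shaped like the array getValues() would return.
var sample = [
  ["ID", "Name", "Gender", "Q1"],
  [1, "ABC", "Male", "A1;A2"],
  [3, "BAC", "Female", "A1"]
];
var result = splitLastColumn(sample);
```

Inside transformRows, the whole result could then be written in one call with newSheet.getRange(1, 1, result.length, result[0].length).setValues(result), replacing the appendRow loop.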

OTHER TIPS

I made a more general-purpose version of @SAPikachu's answer. It can convert any number of data columns, assuming that all the data columns are to the right of all the non-data columns. (Not the clearest terminology...)

function onOpen() {
  var ss = SpreadsheetApp.getActive();
  var items = [
    {name: 'Normalize Crosstab', functionName: 'normalizeCrosstab'},
  ];
  ss.addMenu('Normalize', items);
}

/* Converts crosstab format to normalized form. Given columns abcDE, the user puts the cursor somewhere in column D.
The result is a new sheet, NormalizedResult, like this:

a     b     c    Field Value
a1    b1    c1   D     D1
a1    b1    c1   E     E1
a2    b2    c2   D     D2
a2    b2    c2   E     E2
...

*/
function normalizeCrosstab() {
  var sheet = SpreadsheetApp.getActiveSheet(); 
  var rows = sheet.getDataRange();
  var numRows = rows.getNumRows();
  var values = rows.getValues();
  var firstDataCol = SpreadsheetApp.getActiveRange().getColumn();
  var dataCols = values[0].slice(firstDataCol-1);

  if (Browser.msgBox("This will create a new sheet, NormalizedResult. Make sure your cursor is in the first data column.\\n\\n" +
                     "These will be your data columns: " + dataCols, Browser.Buttons.OK_CANCEL) == "cancel") {
    return;
  }


  var resultssheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("NormalizedResult");
  if (resultssheet != null) {
    SpreadsheetApp.getActive().deleteSheet(resultssheet);
  }
  var newSheet = SpreadsheetApp.getActiveSpreadsheet().insertSheet("NormalizedResult");
  var header = values[0].slice(0, firstDataCol - 1);

  var newRows = [];

  header.push("Field");
  header.push("Value");
  newRows.push(header);

  for (var i = 1; i <= numRows - 1; i++) {
    var row = values[i];
    for (var datacol = 0; datacol < dataCols.length; datacol++) {
      var newRow = row.slice(0, firstDataCol - 1); // copy repeating portion of each row
      newRow.push(values[0][firstDataCol - 1 + datacol]); // field name
      newRow.push(values[i][firstDataCol - 1 + datacol]); // field value
      newRows.push(newRow);
    }
  }
  var r = newSheet.getRange(1,1,newRows.length, header.length);
  r.setValues(newRows);
}
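
Since the question mentions several columns that each hold semicolon-separated answers, the two approaches could be combined. The following is a minimal sketch under that assumption: unpivotAndSplit is a hypothetical name, firstDataCol is 1-based like the value returned by getColumn(), and the function works on a plain 2D array so it can be tried outside of Apps Script.

```javascript
// Hypothetical combination of both answers: unpivot every data column and
// split each semicolon-separated cell into one output row per answer.
function unpivotAndSplit(values, firstDataCol) {
  var fixed = firstDataCol - 1;   // number of repeating (non-data) columns
  var header = values[0];
  var out = [header.slice(0, fixed).concat(["Field", "Value"])];

  for (var i = 1; i < values.length; i++) {
    var row = values[i];
    for (var c = fixed; c < header.length; c++) {
      // String() guards against numeric cells; trim() drops stray spaces.
      var parts = String(row[c]).split(";");
      for (var p = 0; p < parts.length; p++) {
        out.push(row.slice(0, fixed).concat([header[c], parts[p].trim()]));
      }
    }
  }
  return out;
}

// Sample input with two multi-answer columns, Q1 and Q2.
var crosstab = [
  ["ID", "Name", "Q1", "Q2"],
  [1, "ABC", "A1;A2", "B1"]
];
var normalized = unpivotAndSplit(crosstab, 3);
```

As in normalizeCrosstab, the returned array can be written to the new sheet with a single setValues call.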
Licensed under: CC-BY-SA with attribution