Question

I have a .tsv file with some fields being ranges like 1 - 4. I want to read these fields as they are textually written. However, upon file opening excel converts automatically those range fields to dates. For instance 1 - 4 is converted to 4-Jan. If I try to format back the cell to another type, the value is already changed and I can only get a useless number (39816). Even if the range fields are within double quotes, the wrong conversion to date still takes place. How to avoid this behavior?

Was it helpful?

Solution 2

My question is in fact a duplicate of at least:

1) Stop Excel from automatically converting certain text values to dates

2) Excel: Default to TEXT rather than GENERAL when opening a .csv file

The possible solutions for Excel are to 1) either writing the fields with special double quotes like "May 16, 2011" as "=""May 16, 2011""" or 2) importing the csv/tsv file with the external data wizard and then selecting manually which columns you want to read as TEXT and not GENERAL (which could convert fields to dates)

As for my use case, I was only using Excel to remove some columns. None of the solutions was appealing to me because I wouldn't like to rewrite the tsv files with special quotes and because I had hundreds of columns and I didn't want to select each manually to be read as TEXT.

Therefore I wrote a scala script to filter tsv files by column names:

package com.jmcejuela.ml

import java.io.InputStream
import java.io.Writer

import scala.io.Codec
import scala.io.Source

import Table._

/**
 * Class to represent tables with a fixed size of columns. All rows have the same columns.
 */
class Table(val rows: Seq[Row]) {
  lazy val numDiffColumns = rows.foldLeft(Set[Int]())((set, row) => set + row.size)

  def toTSV(out: Writer) {
    if (rows.isEmpty) out.write(TableEmpty.toString)
    else {
      out.write(writeLineTSV(rows.head.map(_.name))) //header
      rows.foreach(r => out.write(writeLineTSV(r.map(_.value))))
      out.close
    }
  }

  /**
   * Get a Table with only the given columns.
   */
  def filterColumnsByName(columnNames: Set[String]): Table = {
    val existingNames = rows.head.map(_.name).toSet
    assert(columnNames.forall(n => existingNames.contains(n)), "You want to include column names that do not exist")
    new Table(rows.map { row => row.filter(col => columnNames.contains(col.name)) })
  }

}

object TableEmpty extends Table(Seq.empty) {
  override def toString = "Table(Empty)"
}

object Table {
  def apply(rows: Row*) = new Table(rows)

  type Row = Array[Column]

  /**
   * Column representation. Note that each column has a name and a value. Since the class Table
   * is a sequence of rows which are a size-fixed array of columns, the name field is redundant
   * for Table. However, this column representation could be used in the future to support
   * schemata-less tables.
   */
  case class Column(name: String, value: String)

  private def parseLineTSV(line: String) = line.split("\t")
  private def writeLineTSV(line: Seq[String]) = line.mkString("", "\t", "\n")

  /**
   * It is assumed that the first row gives the names to the columns
   */
  def fromTSV(in: InputStream)(implicit encoding: Codec = Codec.UTF8): Table = {
    val linesIt = Source.fromInputStream(in).getLines
    if (linesIt.isEmpty) TableEmpty
    else {
      val columnNames = parseLineTSV(linesIt.next)
      val padding = {
        //add padding of empty columns-fields to lines that do not include last fields because they are empty
        def infinite[A](x: A): Stream[A] = x #:: infinite(x)
        infinite("")
      }
      val rows = linesIt.map { line =>
        ((0 until columnNames.size).zip(parseLineTSV(line) ++: padding).map { case (index, field) => Column(columnNames(index), field) }).toArray
      }.toStream
      new Table(rows)
    }
  }
}

OTHER TIPS

I think you best use the import facility in excel but you may have to manually change the file extension to a csv.

When importing be sure to select text for all the columns with these values.

Write 01-04 instead of 1-4 in excel..

I had a "text" formatted cell in excel being populated with a chemical casn with the value "8013-07-8" that was being reformatted into a date format. To remedy the problem, I concatenated a single quote to the beginning of the value and it rendered correctly when viewing the results. When you click on the cell, you see the prefixed single-quote, but at least I stopped seeing it as a date.

In my case, When I typed 5-14 in my D2 excel cell, is coverts to date 14 May. With a help from somebody , I was able to change the date format to the number range (5-14) using the following approach and wanted to share it with you. (I will use my case an example).

  1. Using cell format in excel, I converted the date format in D2 (14 May) to number first ( in my case it gave me 43599).
  2. then used the formula below ,in excel, to convert it 5-14. =IF (EXACT (D2, 43599), "5-14", D2).
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top