Batch: Convert .csv to tab-delimited text, only some fields are quoted, contain commas between quotes (eBay order file)

StackOverflow https://stackoverflow.com/questions/20663822

  •  19-09-2022
  •  | 
  •  

Question

I'm trying to convert the eBay File Exchange download into a tab-delimited format my shipping software can read.

If each and every column were quoted, this would be easy--but they're not. Only some columns (name, item listing title, etc) are quoted, and some quoted columns contain commas. The rest are bare of quotes.

I need a way to parse and convert this in a .bat file, but using comma as a delimiter splits the quoted fields if they contain a comma too, giving me unusable data. I'm certain there's a simple fix for this, I just can't figure it out.

Was it helpful?

Solution

Eric J is correct - solving this kind of problem with batch is not simple. But it is possible :-)

The main problem is how to differentiate between quoted and unquoted commas - jeb solved a similar problem with quoted vs. unquoted semicolons at 'Pretty print' windows %PATH% variable - how to split on ';' in CMD shell. The code below looks very different, but the fundamental concept is the same.

The code below should work for pretty much any CSV as long as all lines are less than ~8000 bytes long. Batch variable values are limited to 8191 bytes, and some characters are temporarily expanded to two bytes.

The code assumes there are not any existing TABs within the CSV file.

It does not modify any existing quotes.

As I say, the code should work, but it will be painfully SLOW if you have a large file. You would be much better off with a .NET solution as Eric J suggested.

@echo off
setlocal disableDelayedExpansion

set "file=optionalPathinfo\yourFile.csv"

:: Define a TAB variable
for /f "delims=" %%A in (
  'forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0x09"'
) do set "TAB=%%A"


:: Read each line from CSV, convert it, and write to new file with .new extension
>"%file%.new" (
  for /f usebackq^ delims^=^ eol^= %%A in ("%file%") do (
    set "line=%%A"
    call :processLine
  )
)
exit /b


:processLine
setlocal enableDelayedExpansion

:: Protect problem characters
set "line=!line:@=@A!"
set "line=!line:^=@K!"
set "line=!line:&=@M!"
set "line=!line:|=@P!"
set "line=!line:<=@L!"
set "line=!line:>=@G!"

:: Mark commas with leading caret (escape)
set "line=!line:,=^,!"

:: Remove mark from unquoted commas, but first temporarily
:: disable delayed expansion to protect any ! characters
setlocal disableDelayedExpansion
set ^"line=%line%"
setlocal enableDelayedExpansion

:: Protect remaining marked commas
set "line=!line:^,=@C!"

:: Convert remaining commas to TAB
set "line=!line:,=%TAB%!"

:: Restore protected characters
set "line=!line:@C=,!"
set "line=!line:@G=>!"
set "line=!line:@L=<!"
set "line=!line:@P=|!"
set "line=!line:@M=&!"
set "line=!line:@K=^!"
set "line=!line:@A=@!"

:: Write modified line
echo(!line!
exit /b

OTHER TIPS

There's a further complication: A field with a quote and a comma will also have the quote escaped:

Jim "Smitty" Smith, Jr.

would be represented in the CSV file as

"Jim ""Smitty"" Smith, Jr."

This is not the kind of problem that is easily solved in a batch file. However, there is preexisting functionality to deal with the CSV format that can be used from any .NET compatible language including Powershell. If that is an option, have a look at

http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader

For information on calling the .NET methods to read CSV files from Powershell, have a look at

http://blogs.msdn.com/b/mattbie/archive/2010/02/23/how-to-call-net-and-win32-methods-from-powershell-and-your-troubleshooting-packs.aspx

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top