I think you've just discovered a bug in Ruby's CSV module. From csv.rb:
1587: @re_chars = /#{%"[-][\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding)}/
This Regexp is used to escape characters that conflict with special regular expression symbols, including your "pipe" char |.
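To illustrate what that escaping step is for, here is a minimal sketch. The constant `SPECIAL_CHARS` and the method `escape_special` are names I made up for this example, and the character set is my assumption of what was intended, not a copy of csv.rb:

```ruby
# Illustrative set of regexp metacharacters. This is an assumption for the
# sketch, not the pattern from csv.rb.
SPECIAL_CHARS = /[-\]\[\\.^$?*+{}()|# \r\n\t\f\v]/

# Prefix every special character with a backslash so the result can be
# embedded in a larger Regexp as a literal.
def escape_special(str)
  str.gsub(SPECIAL_CHARS) { |c| "\\" + c }
end

quote     = escape_special("|")         # a backslash followed by the pipe
pattern   = Regexp.new("a#{quote}b")    # matches the literal text "a|b"
unescaped = Regexp.new("a|b")           # alternation: matches "a" or "b"

puts !("a|b" =~ pattern).nil?           # literal pipe is matched
puts ("a" =~ pattern).inspect           # nil, the pipe is no longer alternation
puts ("a" =~ unescaped).inspect         # 0, the unescaped pipe means "or"
```

If the pipe is not escaped before it is interpolated into the parser's regular expressions, it acts as alternation instead of a literal quote character, which would be consistent with the behaviour in your example.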
I don't see any reason for the leading [-], so if you remove it, your example starts to work (see the output further down).
Edit: inside a character set expression (surrounded by brackets []), the hyphen has to be escaped only when it is not the leading character. So I had to update the fixed Regexp:
1587: @re_chars = /#{%"(?<!\\[)-(?=.*\\])|[\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding)}/
require 'csv'
CSV.read('sample.csv', quote_char: '|')
# [["076N102 ",
# "CARD ",
# " 1", "NEW", "PCS "],
# ["07-1801 ",
# "BASE ",
# " 18", "NEW", "PCS "]]
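The hyphen rule from the edit note can be checked in isolation. This is only a small sketch with made-up variable names, unrelated to the CSV internals:

```ruby
leading = /[-az]/    # leading hyphen is a literal: matches "-", "a" or "z"
range   = /[a-z]/    # hyphen in the middle forms a range: "a" through "z"
escaped = /[a\-z]/   # escaped hyphen is a literal again: "a", "-" or "z"

puts ("-" =~ leading).inspect   # 0
puts ("m" =~ leading).inspect   # nil
puts ("m" =~ range).inspect     # 0
puts ("-" =~ range).inspect     # nil
puts ("-" =~ escaped).inspect   # 0
puts ("m" =~ escaped).inspect   # nil
```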
As most languages, Ruby included, do not support lookbehind expressions with quantifiers, I had to write it as a negative lookbehind for the left bracket. It would also match hyphens where the left bracket of a pair is missing. If you find a better solution, please leave a comment.
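A quick way to see both the intended behaviour and the limitation is to test the hyphen part of the pattern on its own. The constant name `HYPHEN` is mine, and the last check reflects what I expect from Ruby's regexp engine rather than something I verified on every version:

```ruby
# Hyphen not directly preceded by "[" and followed somewhere by "]".
HYPHEN = /(?<!\[)-(?=.*\])/

puts ("[-a]"  =~ HYPHEN).inspect   # nil: leading hyphen, left alone
puts ("[a-z]" =~ HYPHEN).inspect   # 2: hyphen inside the set is matched
puts ("a-z]"  =~ HYPHEN).inspect   # 1: left bracket missing, still matched
puts ("a-z"   =~ HYPHEN).inspect   # nil: no closing bracket at all

# A lookbehind containing a quantifier, which is what a stricter check
# ("there is a '[' somewhere before the hyphen") would need. I expect Ruby
# to reject this pattern with a RegexpError.
variable_length_ok =
  begin
    Regexp.new('(?<=\[.*)-')
    true
  rescue RegexpError
    false
  end
puts variable_length_ok
```

The third case is the false positive described above: a hyphen is matched even though there is no opening bracket before it.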
I'd be glad to hear any comments before I file a bug report at ruby-lang.org.