Question

I am processing a CSV file and want to search and replace strings as long as it is an exact match in the column. For example:

xxx,Apple,Green Apple,xxx,xxx
Apple,xxx,xxx,Apple,xxx
xxx,xxx,Fruit/Apple,xxx,Apple

I want to replace 'Apple' if it is the EXACT value in the column (if it is contained in text within another column, I do not want to replace). I cannot see how to do this with a single expression (maybe not possible?).

The desired output is:

xxx,GRAPE,Green Apple,xxx,xxx
GRAPE,xxx,xxx,GRAPE,xxx
xxx,xxx,Fruit/Apple,xxx,GRAPE

So the expression I want is: match the beginning of input OR a comma, followed by desired string, followed by a comma OR the end of input.

You cannot put ^ or $ in character classes, so I tried \A and \Z but that didn't work.

([\A,])Apple([\Z,])

This didn't work, sadly. Can I do this with one regular expression? Seems like this would be a common enough problem.


Solution

It will depend on your language, but if the one you use supports lookarounds, then you would use something like this:

(?<=,|^)Apple(?=,|$)

Replace with GRAPE.

Otherwise, the match has to include the surrounding commas, so you will have to capture them and put them back:

(^|,)Apple(,|$)

Or

(\A|,)Apple(,|\Z)

And replace with:

\1GRAPE\2

Or

$1GRAPE$2

Depending on what's supported.

The above are raw regex (and replacement) strings. Escape as necessary.
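As a rough illustration of the capture-and-put-back variant, here is a minimal sketch in Python (my choice of language, not something the question specifies), applied line by line to the sample data. The names `pattern`, `lines` and `result` are just placeholders for this example.

```python
import re

# Capture the delimiter (or line boundary) on each side so it can be put back.
pattern = re.compile(r"(^|,)Apple(,|$)")

lines = [
    "xxx,Apple,Green Apple,xxx,xxx",
    "Apple,xxx,xxx,Apple,xxx",
    "xxx,xxx,Fruit/Apple,xxx,Apple",
]

# \1 and \2 re-insert whatever the two groups matched (a comma or nothing).
result = [pattern.sub(r"\1GRAPE\2", line) for line in lines]

for line in result:
    print(line)
```

"Green Apple" and "Fruit/Apple" are left alone because neither is preceded by a comma or the start of the line.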

Note: The disadvantage of the latter solution is that it will not work on strings like:

xxx,Apple,Apple,xxx,xxx

This is because the comma after the first Apple gets consumed by the first match, so the second Apple no longer has a leading comma to match against. You'd have to run the regex replacement twice if you have such cases.
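To make the consumed-comma problem concrete, here is a small sketch, again assuming Python's `re` module; `once` and `twice` are illustrative variable names.

```python
import re

pattern = re.compile(r"(^|,)Apple(,|$)")
line = "xxx,Apple,Apple,xxx,xxx"

# First pass: the match ",Apple," swallows the comma that the
# second "Apple" would need in front of it, so that one is skipped.
once = pattern.sub(r"\1GRAPE\2", line)

# Second pass: the comma is available again, so the skipped column is replaced.
twice = pattern.sub(r"\1GRAPE\2", once)

print(once)
print(twice)
```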


Oh, and I forgot to mention: you can use some 'hybrids', since languages differ in their level of support for lookbehinds (in everything below, ^ and \A, $ and \Z, \1 and $1 are interchangeable; I'm showing only one form to keep this from getting longer than it already is):

(?:(?<=,)|(?<=^))Apple(?=,|$)

The above is for engines where lookbehinds cannot be of variable width. Replace with GRAPE.
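Python's built-in `re` module is, as far as I know, one such engine: it requires fixed-width lookbehinds, so the alternatives have to be split into separate lookbehinds as in the pattern above. A minimal sketch, with made-up sample input:

```python
import re

# Each lookbehind has a fixed width on its own (1 for the comma, 0 for ^).
pattern = re.compile(r"(?:(?<=,)|(?<=^))Apple(?=,|$)")

line = "Apple,Apple,Green Apple,Fruit/Apple,Apple"

# Nothing around "Apple" is consumed, so no backreferences are needed
# and consecutive columns are handled in a single pass.
result = pattern.sub("GRAPE", line)

print(result)
```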

(^|,)Apple(?=,|$)

And the above one is for engines where lookaheads are supported but lookbehinds are not. Replace with \1GRAPE.
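The lookahead-only variant can be sketched the same way (Python used purely for illustration here, since its `re` module does support lookbehinds). Because the trailing comma is only looked at and not consumed, consecutive columns work in one pass:

```python
import re

# Leading delimiter is captured and put back; trailing one is only asserted.
pattern = re.compile(r"(^|,)Apple(?=,|$)")

line = "xxx,Apple,Apple,xxx,xxx"
result = pattern.sub(r"\1GRAPE", line)

print(result)
```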

Other suggestions

This does as you wish:

  • Find what: (^|,)(?:Apple)(,|$)
  • Replace with: $1GRAPE$2

This works on regex101, in all flavors.

http://regex101.com/r/iP6dZ8

I wanted to share my original work-around (before the other answers), though it feels like more of a hack.

I simply prepend and append a comma on the string before doing the simpler:

s/,Apple,/,GRAPE,/g

then cut off the first and last character.

PHP looks like:

$line = substr(preg_replace('/,Apple,/', ',GRAPE,', ','.$line.','), 1, -1);

This still suffers from the problem of consecutive columns (e.g. ",Apple,Apple,").

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow