Question

I am doing this in Notepad++

Here's what my data looks like:

N|12345|JOHN|TAKÁCSI|blah|blah|
N|12466|PÉTER|VÁLI|blah|blah|
Y|45645|SÁNDAR|SÁKU|blah|blah|
N|89789|DÓRA|MERRY|blah|blah|


My regular expression: ^([N|Y]\|.*\|.*[^\x00-\x7F].*\|.*[^\x00-\x7F].*\|)

This matches only the rows that have non-ASCII (accented) characters in both the first name and the last name.
It does not match rows where only one of the two names has such a character.

How can I match those rows as well?


Solution

^[NY]\|\d{5}\|(?:[\w_]+[^\x00-\x7F]?[\w_]+\|){2}(?:[\w_]+[\x00-\x7F]?[\w_]+\|){2}$

matches:

N|12345|JOHN|TAKÁCSI|blah|blah|
N|12466|PÉTER|VÁLI|blah|blah|
Y|45645|SÁNDAR|SÁKU|blah|blah|
N|89789|DÓRA|MERRY|blah|blah|

does not match:

N|89789|DÓRA|MERRY|blah|blÓh|
N|89789|DoRA|MERRY|blaÓh|blah|
N|89789|DoRA|MERRY|blaÓh|blÓah|

Your pattern required a non-ASCII character in both names. I changed it so that a non-ASCII character is optional in each name, while the last two fields must stay ASCII-only. I also used parts of @HamZa's answer below to adapt this answer to your data set.
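To try the pattern outside Notepad++, here is a minimal Python sketch. It makes two assumptions. First, it compiles the pattern with re.ASCII so that \w covers only [A-Za-z0-9_], which is the behaviour the "does not match" rows rely on. Second, it adds a hypothetical stricter variant, EITHER_NAME, which is not part of the answer and requires at least one non-ASCII character in the first or the last name.

```python
import re

# Pattern from the answer. re.ASCII is an assumption: it limits \w to
# [A-Za-z0-9_], so accented letters are only matched by [^\x00-\x7F].
ANSWER = re.compile(
    r"^[NY]\|\d{5}\|(?:[\w_]+[^\x00-\x7F]?[\w_]+\|){2}"
    r"(?:[\w_]+[\x00-\x7F]?[\w_]+\|){2}$",
    re.ASCII,
)

# Hypothetical variant (not from the answer): at least one non-ASCII
# character must appear in the first name OR in the last name.
EITHER_NAME = re.compile(
    r"^[NY]\|\d+\|"
    r"(?:[^|]*[^\x00-\x7F][^|]*\|[^|]*"  # non-ASCII in the first name
    r"|[^|]*\|[^|]*[^\x00-\x7F][^|]*)"   # or non-ASCII in the last name
    r"\|"
)

SAMPLE_ROWS = [
    "N|12345|JOHN|TAKÁCSI|blah|blah|",
    "N|12466|PÉTER|VÁLI|blah|blah|",
    "Y|45645|SÁNDAR|SÁKU|blah|blah|",
    "N|89789|DÓRA|MERRY|blah|blah|",
]

REJECTED_ROWS = [
    "N|89789|DÓRA|MERRY|blah|blÓh|",
    "N|89789|DoRA|MERRY|blaÓh|blah|",
    "N|89789|DoRA|MERRY|blaÓh|blÓah|",
]


def matches(pattern: re.Pattern, row: str) -> bool:
    """Return True if the pattern matches the row."""
    return pattern.search(row) is not None


if __name__ == "__main__":
    for row in SAMPLE_ROWS + REJECTED_ROWS:
        print(row, matches(ANSWER, row), matches(EITHER_NAME, row))
```

Under these assumptions, a row with plain ASCII names such as N|11111|JOHN|SMITH|blah|blah| is also accepted by ANSWER, because the non-ASCII class is followed by ?. EITHER_NAME rejects that row, and it ignores non-ASCII characters that appear only in the later fields.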

Other tips

You could just use: ^[NY]\|\d+(?:\|[^\W_]+){4}\|$

Explanation:

  • ^ : match the beginning of the line
  • [NY] : match either N or Y. Do not use [N|Y], since inside a character class | is a literal pipe and would also be matched
  • \| : match a literal pipe |
  • \d+ : match one or more digits
  • (?: : start of a non-capturing group
    • \| : match a literal pipe |
    • [^\W_]+ : match one or more word characters other than _. \w alone would also match _, so the negated class [^\W_] ("neither a non-word character nor _") excludes it.
  • ){4} : end of the group, repeated 4 times
  • \| : match a literal pipe |
  • $ : match the end of the line
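
The breakdown above can be checked with a short Python sketch. It assumes Python's default Unicode-aware \w, so accented letters count as word characters; whether the editor's regex engine treats them the same way is an assumption. The helper name is_valid_row is hypothetical.

```python
import re

# Pattern from the explanation above, using Python's default Unicode-aware \w.
PATTERN = re.compile(r"^[NY]\|\d+(?:\|[^\W_]+){4}\|$")


def is_valid_row(row: str) -> bool:
    """Return True if the row is N or Y, a number, then four letter/digit fields."""
    return PATTERN.search(row) is not None


if __name__ == "__main__":
    print(is_valid_row("N|12345|JOHN|TAKÁCSI|blah|blah|"))   # accented letters allowed
    print(is_valid_row("N|12345|JO_HN|TAKÁCSI|blah|blah|"))  # underscore is excluded
    # [N|Y] is a character class containing N, | and Y, so it accepts a pipe.
    print(re.match(r"[N|Y]", "|") is not None)
    print(re.match(r"[NY]", "|") is not None)
```

Because every field is matched with the same class, this pattern validates the row layout but does not by itself distinguish rows with accented names from rows without them.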
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow