Question

I am doing this in Notepad++.

Here's what my data looks like:

N|12345|JOHN|TAKÁCSI|blah|blah|
N|12466|PÉTER|VÁLI|blah|blah|
Y|45645|SÁNDAR|SÁKU|blah|blah|
N|89789|DÓRA|MERRY|blah|blah|


My regular expression: ^([N|Y]\|.*\|.*[^\x00-\x7F].*\|.*[^\x00-\x7F].*\|)

It matches only the rows that have non-ASCII (accented) characters in both the first and the last name.
It does not match rows where only one of the names contains such a character.

How can I match those rows as well?

Solution

^[NY]\|\d{5}\|(?:[\w_]+[^\x00-\x7F]?[\w_]+\|){2}(?:[\w_]+[\x00-\x7F]?[\w_]+\|){2}$

matches:

N|12345|JOHN|TAKÁCSI|blah|blah|
N|12466|PÉTER|VÁLI|blah|blah|
Y|45645|SÁNDAR|SÁKU|blah|blah|
N|89789|DÓRA|MERRY|blah|blah|

does not match:

N|89789|DÓRA|MERRY|blah|blÓh|
N|89789|DoRA|MERRY|blaÓh|blah|
N|89789|DoRA|MERRY|blaÓh|blÓah|

Your pattern required both names to contain non-ASCII characters. I changed it so that only one of them needs to; the other is now optional. I have also used parts of @HamZa's answer below to adapt this answer to your data set and requirements.
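For anyone who wants to check the "either name" requirement outside Notepad++, here is a minimal Python sketch. It does not use the pattern from this answer; it uses a plain alternation instead, and the names `EITHER_NAME` and `has_non_ascii_name` are made up for illustration. It also assumes the first and last name are always the third and fourth pipe-separated fields, as in the sample data. Python's `re` module and Notepad++'s regex engine are not identical, so treat this as a way to reason about the logic rather than a drop-in replacement.

```python
import re

# A field that contains at least one non-ASCII character and no pipe.
NON_ASCII_FIELD = r"[^|]*[^\x00-\x7F][^|]*"
# Any field (possibly empty) that does not contain a pipe.
ANY_FIELD = r"[^|]*"

# Hypothetical pattern: flag, numeric id, then EITHER a non-ASCII first name
# followed by any last name, OR any first name followed by a non-ASCII last name.
EITHER_NAME = re.compile(
    rf"^[NY]\|\d+\|(?:{NON_ASCII_FIELD}\|{ANY_FIELD}|{ANY_FIELD}\|{NON_ASCII_FIELD})\|"
)


def has_non_ascii_name(line: str) -> bool:
    """Return True if the first or last name field has a non-ASCII character."""
    return EITHER_NAME.match(line) is not None


if __name__ == "__main__":
    rows = [
        "N|12345|JOHN|TAKÁCSI|blah|blah|",  # only the last name is accented
        "N|12466|PÉTER|VÁLI|blah|blah|",    # both names are accented
        "N|89789|DÓRA|MERRY|blah|blah|",    # only the first name is accented
        "N|11111|JOHN|SMITH|blah|blÓh|",    # accent only in a later field
    ]
    for row in rows:
        print(has_non_ascii_name(row), row)
```

Using `[^|]*` instead of `.*` keeps each part of the pattern inside a single field, so an accented character in a later column cannot satisfy the name check by accident.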

Other tips

You could just use: ^[NY]\|\d+(?:\|[^\W_]+){4}\|$

Explanation:

  • ^ : match the beginning of the line
  • [NY] : match either N or Y. You should not use [N|Y] since that will also make it match a pipe |
  • \| : match a pipe |
  • \d+ : match one digit or more
  • (?: : non capturing group
    • \| : match a pipe |
    • [^\W_]+ : We could use \w, which matches alphanumeric characters, but it also matches _. To exclude _, we use the negated class [^\W_] instead.
  • ){4} : end of the group, repeated 4 times
  • \| : match a pipe |
  • $ : match end of line
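
The breakdown above can be tried out with a short Python sketch. The names `ROW` and `is_valid_row` are made up for illustration, and this assumes Python 3's `re` module, where `\w` is Unicode-aware for `str` patterns, so accented letters count as word characters. Notepad++'s engine may treat `\w` differently depending on the file encoding, so results there should be checked separately.

```python
import re

# Same pattern as in the answer: flag, digits, four alphanumeric fields, final pipe.
ROW = re.compile(r"^[NY]\|\d+(?:\|[^\W_]+){4}\|$")


def is_valid_row(line: str) -> bool:
    """Return True if the whole line has the expected shape."""
    return ROW.match(line) is not None


if __name__ == "__main__":
    print(is_valid_row("N|12345|JOHN|TAKÁCSI|blah|blah|"))   # True
    print(is_valid_row("N|12345|JO_HN|TAKÁCSI|blah|blah|"))  # False: _ is excluded
    print(is_valid_row("X|12345|JOHN|TAKÁCSI|blah|blah|"))   # False: flag is not N or Y

    # Why [N|Y] is a mistake: inside a character class, | is a literal pipe.
    print(re.match(r"[N|Y]", "|") is not None)  # True
    print(re.match(r"[NY]", "|") is not None)   # False
```

Note that this pattern checks the overall shape of a row; it accepts rows whether or not the names contain accented characters.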
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow