Question

I am doing this in Notepad++.

Here's what my data looks like:

N|12345|JOHN|TAKÁCSI|blah|blah|
N|12466|PÉTER|VÁLI|blah|blah|
Y|45645|SÁNDAR|SÁKU|blah|blah|
N|89789|DÓRA|MERRY|blah|blah|


My regular expression: ^([N|Y]\|.*\|.*[^\x00-\x7F].*\|.*[^\x00-\x7F].*\|)

It matches only the rows that have non-ASCII characters in both the first and the last name.
It does not match rows where only one of the names has such a character.

How can I match those rows as well?


Solution

^[NY]\|\d{5}\|(?:[\w_]+[^\x00-\x7F]?[\w_]+\|){2}(?:[\w_]+[\x00-\x7F]?[\w_]+\|){2}$

matches:

N|12345|JOHN|TAKÁCSI|blah|blah|
N|12466|PÉTER|VÁLI|blah|blah|
Y|45645|SÁNDAR|SÁKU|blah|blah|
N|89789|DÓRA|MERRY|blah|blah|

does not match:

N|89789|DÓRA|MERRY|blah|blÓh|
N|89789|DoRA|MERRY|blaÓh|blah|
N|89789|DoRA|MERRY|blaÓh|blÓah|

You were checking that both names have non-ASCII characters. I changed it so that only one needs to match; the other is no longer mandatory. I have also used parts of @HamZa's answer below to adapt this answer to your data set and requirements.
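To make the "either name" requirement concrete, here is a minimal sketch in Python. It assumes Python's `re` module as a stand-in for the Notepad++ regex engine, and both the alternation pattern and the helper name `has_non_ascii_name` are illustrative, not taken from the answers on this page. The idea is to spell out the two cases explicitly: a non-ASCII character in the first name, or one in the last name.

```python
import re

# Assumption: Python's `re` is used here only to illustrate the idea;
# Notepad++ uses a different engine, so details may vary.
EITHER_NAME = re.compile(
    r"^[NY]\|\d+\|"                           # flag and numeric id
    r"(?:[^|]*[^\x00-\x7F][^|]*\|[^|]*"       # non-ASCII in the first name
    r"|[^|]*\|[^|]*[^\x00-\x7F][^|]*)"        # or non-ASCII in the last name
    r"\|"                                     # pipe that closes the last name
)


def has_non_ascii_name(row: str) -> bool:
    """Return True if the first or last name field has a non-ASCII character."""
    return EITHER_NAME.match(row) is not None


if __name__ == "__main__":
    rows = [
        "N|12345|JOHN|TAKÁCSI|blah|blah|",
        "N|89789|DÓRA|MERRY|blah|blah|",
        "N|89789|DoRA|MERRY|blaÓh|blah|",
    ]
    for row in rows:
        print(row, has_non_ascii_name(row))
```

Using `[^|]*` instead of `.*` keeps each part of the pattern inside a single field, so a non-ASCII character in a later column cannot satisfy the name check.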

Other tips

You could just use: ^[NY]\|\d+(?:\|[^\W_]+){4}\|$

Explanation:

  • ^ : match the beginning of the line
  • [NY] : match either N or Y. You should not use [N|Y], since that also matches a pipe |
  • \| : match a pipe |
  • \d+ : match one or more digits
  • (?: : start of a non-capturing group
    • \| : match a pipe |
    • [^\W_]+ : match one or more word characters other than _. We could use \w, which matches alphanumeric characters, but it also includes _. To exclude _, we negate \W and add _ to the negated class.
  • ){4} : end of the group, repeated 4 times
  • \| : match a pipe |
  • $ : match the end of the line
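
The breakdown above can be tried out with a short Python sketch. It assumes Python 3's `re` module, where `\w` is Unicode-aware for `str` patterns, so accented letters count as word characters; Notepad++ may behave differently. The helper name `is_well_formed` is hypothetical. Note that this pattern checks the overall shape of a row and does not by itself require a non-ASCII character.

```python
import re

# Assumption: Python 3 `re`, where \w is Unicode-aware for str patterns.
ROW = re.compile(r"^[NY]\|\d+(?:\|[^\W_]+){4}\|$")


def is_well_formed(row: str) -> bool:
    """Return True if the row is N or Y, a number, then four letter/digit fields."""
    return ROW.match(row) is not None


def class_matches_pipe(char_class: str) -> bool:
    """Return True if the given character class matches a literal pipe."""
    return re.match(char_class, "|") is not None


if __name__ == "__main__":
    print(is_well_formed("Y|45645|SÁNDAR|SÁKU|blah|blah|"))
    print(is_well_formed("N|12345|JO_HN|TAKÁCSI|blah|blah|"))
    print(class_matches_pipe("[N|Y]"), class_matches_pipe("[NY]"))
```

The last line shows why `[N|Y]` is a mistake: inside a character class the pipe is a literal character, not an alternation operator.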
Licensed under: CC-BY-SA with attribution