Question

I'm no expert in regex but I need to parse some input I have no control over, and make sure I filter away any strings that don't have A-z and/or 0-9.

When I run this,

Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); //fixed typo
if(!p.matcher(gottenData).matches())
       System.out.println(someData); //someData contains gottenData

certain spaces + an unknown symbol somehow slip through the filter (gottenData is the red rectangle): screenshot

In case you're wondering, it DOES also display Text, it's not all like that.

For now, I don't mind the [?] as long as it also contains some string along with it.

Please help.

[EDIT] as far as I can tell from the (very large) input, the [?]'s are either white spaces either nothing at all; maybe there's some sort of encoding issue, also perhaps something to do with #text nodes (input is xml)

Was it helpful?

Solution

The * quantifier matches "zero or more", which means it will match a string that does not contain any of the characters in your class. Try the + quantifier, which means "One or more": ^[a-zA-Z0-9]+$ will match strings made up of alphanumeric characters only. ^.*[a-zA-Z0-9]+.*$ will match any string containing one or more alphanumeric characters, although the leading .* will make it much slower. If you use Matcher.lookingAt() instead of Matcher.matches, it will not require a full string match and you can use the regex [a-zA-Z0-9]+.

OTHER TIPS

You have an error in your regex: instead of [a-zA-z0-9]* it should be [a-zA-Z0-9]*.

You don't need ^ and $ around the regex. Matcher.matches() always matches the complete string.

String gottenData = "a ";
Pattern p = Pattern.compile("[a-zA-z0-9]*");
if (!p.matcher(gottenData).matches())
    System.out.println("doesn't match.");

this prints "doesn't match."

The correct answer is a combination of the above answers. First I imagine your intended character match is [a-zA-Z0-9]. Note that A-z isn't as bad as you might think it include all characters in the ASCII range between A and z, which is the letters plus a few extra (specifically [,\,],^,_,`).

A second potential problem as Martin mentioned is you may need to put in the start and end qualifiers, if you want the string to only consists of letters and numbers.

Finally you use the * operator which means 0 or more, therefore you can match 0 characters and matches will return true, so effectively your pattern will match any input. What you need is the + quantifier. So I will submit the pattern you are most likely looking for is:

^[a-zA-Z0-9]+$

You have to change the regexp to "^[a-zA-Z0-9]*$" to ensure that you are matching the entire string

Looks like it should be "a-zA-Z0-9", not "a-zA-z0-9", try correcting that...

Did anyone consider adding space to the regex [a-zA-Z0-9 ]*. this should match any normal text with chars, number and spaces. If you want quotes and other special chars add them to the regex too.

You can quickly test your regex at http://www.regexplanet.com/simple/

You can check input value is contained string and numbers? by using regex ^[a-zA-Z0-9]*$

if your value just contained numberString than its show match i.e, riz99, riz99z else it will show not match i.e, 99z., riz99.z, riz99.9

Example code:

if(e.target.value.match('^[a-zA-Z0-9]*$')){
            console.log('match')
          }
          else{
            console.log('not match')
          }
}

online working example

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top