How to remove unknown formatted strings from string array? [closed]
14-07-2021
Question
I'm trying to remove strings that contain unrecognized characters from a string collection. What is the best way to accomplish this?
Solution
To remove strings that contain any characters you don't recognize (e.g., if you want to accept only lowercase letters, then "foo@bar" would be rejected):
- Create a regular expression that defines the set of "recognized" characters, starts with ^, ends with $, and has a quantifier after the character class so that strings of any length can match. For example, if your "recognized" characters are uppercase A through Z, it would be
^[A-Z]+$
- Reject strings that don't match
Note: In .NET, $ also matches just before a trailing newline, so a string such as "ABC\n" would still be accepted. Use \z instead of $ if you need to reject those as well, and use * instead of + if empty strings should be accepted.
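The first approach can be sketched in C# as follows. This is a minimal illustration, not a definitive implementation: the variable names and sample strings are made up, and the pattern uses the + quantifier (so strings longer than one character can match) and \z (so a trailing newline is not silently accepted).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumption: the "recognized" characters are uppercase A through Z.
// + allows one or more characters; \z anchors at the very end of the string.
var recognized = new Regex(@"^[A-Z]+\z");

string[] myArray = { "FOO", "foo@bar", "BAR\n", "" };

// Keep only the strings made up entirely of recognized characters.
myArray = myArray.Where(s => recognized.IsMatch(s)).ToArray();

Console.WriteLine(string.Join(", ", myArray)); // prints: FOO
```

With $ in place of \z, "BAR\n" would also be kept, because $ matches before a final newline in .NET.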
To remove strings that consist entirely of characters you don't recognize (e.g., if you want to accept lowercase letters, then "foo@bar" would be accepted because it contains at least one lowercase letter):
- Create a regular expression that defines the set of "recognized" characters, but with a ^ character inside the square brackets to negate the set, and that starts with ^ and ends with $. For example, if your "recognized" characters are uppercase A through Z, it would be
^[^A-Z]+$
- Reject strings that DO match
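A rough C# sketch of the second approach, assuming lowercase letters are the "recognized" set. The names and sample data are hypothetical. The negated class is followed by + so that a string of any length made up only of unrecognized characters matches.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Matches strings that contain nothing but unrecognized characters
// (assumption: anything other than lowercase a through z is unrecognized).
var onlyUnrecognized = new Regex(@"^[^a-z]+\z");

string[] candidates = { "foo@bar", "@#!", "123", "foo" };

// Reject the strings that DO match; keep everything else.
string[] kept = candidates.Where(s => !onlyUnrecognized.IsMatch(s)).ToArray();

Console.WriteLine(string.Join(", ", kept)); // prints: foo@bar, foo
```

Here "foo@bar" survives because it contains at least one lowercase letter, while "@#!" and "123" are dropped.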
Other tips
Since an array (assuming string[]) cannot be resized when items are removed, you will need to create a new one anyway. So basic LINQ filtering with ToArray() will give you a new array.
myArray = myArray.Where(s => !ContainsSpecialCharacters(s)).ToArray();
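ContainsSpecialCharacters is not defined in the answer above. One possible way to write it is shown below, under the assumption that "special" means anything other than a letter, a digit, or white space. That definition is a guess for illustration, so adjust the predicate to whatever you consider recognized.

```csharp
using System;
using System.Linq;

// Hypothetical helper: true if the string has at least one character
// that is not a letter, a digit, or white space.
bool ContainsSpecialCharacters(string s) =>
    s.Any(c => !(char.IsLetterOrDigit(c) || char.IsWhiteSpace(c)));

string[] myArray = { "one", "two`", "thr^ee", "four 4" };

// Where + ToArray builds a new array holding only the accepted strings.
myArray = myArray.Where(s => !ContainsSpecialCharacters(s)).ToArray();

Console.WriteLine(string.Join(", ", myArray)); // prints: one, four 4
```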
I would look at LINQ's Where method, along with a regular expression containing the characters you're looking for. For example, assuming regex is a Regex that matches the strings you want to remove:
return myStringCollection.Where(s => !regex.IsMatch(s));
This does what you seem to want:
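using System;
using System.Collections.Generic;
using System.Linq;

List<string> strings = new List<string>()
{
    "one",
    "two`",
    "thr^ee",
    "four"
};
// Characters that make a string invalid.
List<char> invalid_chars = new List<char>()
{
    '`', '-', '^'
};
// Remove every string that contains at least one invalid character.
strings.RemoveAll(s => s.Any(c => invalid_chars.Contains(c)));
strings.ForEach(s => Console.WriteLine(s));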
This generates the output:
one
four
This question has some similar answers to what I think you are looking for. However, I think you want to include all letters, numbers, whitespace and punctuation, but exclude everything else. Is that accurate? If so, this should do it for you:
// Keep only letters, digits, white space, and punctuation in str.
char[] arr = str.ToCharArray();
arr = Array.FindAll<char>(arr, c => char.IsLetterOrDigit(c) ||
                                    char.IsWhiteSpace(c) ||
                                    char.IsPunctuation(c));
str = new string(arr);
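The snippet above strips unwanted characters from a single string. If the goal is instead to drop whole strings from a collection, the same character test could be applied with All and RemoveAll. The sketch below assumes a List<string>; IsRecognized and the sample data are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical predicate: same character categories as the snippet above.
bool IsRecognized(char c) =>
    char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || char.IsPunctuation(c);

var strings = new List<string> { "one, two.", "thr^ee", "four!", "5 + 5" };

// Remove every string that has at least one unrecognized character.
strings.RemoveAll(s => !s.All(IsRecognized));

Console.WriteLine(string.Join(" | ", strings)); // prints: one, two. | four!
```

Note that char.IsPunctuation does not cover symbol characters such as ^ or +, which is why "thr^ee" and "5 + 5" are removed here.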