Question

I'm trying to remove strings with unrecognized characters from a string collection. What is the best way to accomplish this?


Solution

To remove strings that contain any character you don't recognize (e.g., if you accept only lowercase letters, then "foo@bar" would be rejected):

  1. Create a regular expression that defines the set of "recognized" characters, repeats it with a quantifier, and starts with ^ and ends with $. For example, if your "recognized" characters are uppercase A through Z, it would be ^[A-Z]*$ (without the *, only single-character strings would match)
  2. Reject strings that don't match

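Note: This won't work reliably for strings that contain newlines. In .NET, $ also matches just before a trailing newline, so a string such as "ABC\n" would still be accepted; anchor with \z instead of $ if you need to reject it.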
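The two steps above can be sketched in C# roughly as follows. This is a minimal sketch, not a definitive implementation: the class and method names (WhitelistFilter, KeepRecognized) are made up for illustration, the "recognized" set is assumed to be uppercase A through Z, and \A ... \z is used instead of ^ ... $ so that a trailing newline is not accepted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class WhitelistFilter
{
    // Assumption: "recognized" means uppercase A-Z, as in the example above.
    // \A and \z anchor to the absolute start and end of the string, so a
    // trailing newline is not accepted the way it can be with $.
    // The * quantifier lets the character class cover the whole string
    // (and also accepts the empty string).
    static readonly Regex OnlyRecognized = new Regex(@"\A[A-Z]*\z");

    // Keeps only the strings made up entirely of recognized characters.
    public static List<string> KeepRecognized(IEnumerable<string> items)
    {
        return items.Where(s => OnlyRecognized.IsMatch(s)).ToList();
    }

    static void Main()
    {
        var input = new[] { "FOO", "FOO@BAR", "BAR\n", "" };
        foreach (var s in KeepRecognized(input))
            Console.WriteLine("[" + s + "]"); // prints [FOO] and then []
    }
}
```

If empty strings should be rejected as well, swapping * for + in the pattern should be enough.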

To remove strings that consist entirely of characters you don't recognize (e.g., if you accept lowercase letters, then "foo@bar" would be accepted because it contains at least one lowercase letter):

  1. Create a regular expression that defines the set of "recognized" characters, but with a ^ character inside the square brackets to negate it, followed by a quantifier, and that starts with ^ and ends with $. For example, if your "recognized" characters are uppercase A through Z, it would be ^[^A-Z]+$ (without the +, only single-character strings would match)
  2. Reject strings that DO match
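As a rough C# sketch of this second variant: the names (AllUnrecognizedFilter, DropAllUnrecognized) are hypothetical, and the "recognized" set is assumed to be lowercase a through z to mirror the "foo@bar" example. The negated class is repeated with + so that it has to cover the whole string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class AllUnrecognizedFilter
{
    // Assumption: "recognized" means lowercase a-z.
    // [^a-z]+ between \A and \z matches only strings in which every
    // character is unrecognized. With + the empty string does not match,
    // so empty strings are kept.
    static readonly Regex OnlyUnrecognized = new Regex(@"\A[^a-z]+\z");

    // Drops the strings that consist entirely of unrecognized characters.
    public static List<string> DropAllUnrecognized(IEnumerable<string> items)
    {
        return items.Where(s => !OnlyUnrecognized.IsMatch(s)).ToList();
    }

    static void Main()
    {
        var input = new[] { "foo@bar", "@#!", "123", "ok" };
        // prints: foo@bar, ok
        Console.WriteLine(string.Join(", ", DropAllUnrecognized(input)));
    }
}
```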

Other suggestions

Since an array (assuming string[]) cannot be resized when items are removed, you will need to create a new one anyway. Basic LINQ filtering with ToArray() gives you that new array.

myArray = myArray.Where(s => !ContainsSpecialCharacters(s)).ToArray();
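The answer does not show ContainsSpecialCharacters, so the following is only one possible shape for it. It assumes, purely for illustration, that "special" means anything other than a letter, a digit, or whitespace; adjust the predicate to whatever your recognized set actually is.

```csharp
using System;
using System.Linq;

static class SpecialCharacterFilter
{
    // Hypothetical helper (not defined in the original answer).
    // Assumption: a character is "special" if it is not a letter,
    // a digit, or whitespace.
    public static bool ContainsSpecialCharacters(string s)
    {
        return s.Any(c => !(char.IsLetterOrDigit(c) || char.IsWhiteSpace(c)));
    }

    static void Main()
    {
        string[] myArray = { "one", "two`", "thr^ee", "four 4" };

        // Where builds a filtered sequence; ToArray copies it into a new array.
        myArray = myArray.Where(s => !ContainsSpecialCharacters(s)).ToArray();

        Console.WriteLine(string.Join(", ", myArray)); // prints: one, four 4
    }
}
```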

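I would look at LINQ's Where method, along with a regular expression that matches the characters you want to exclude. For example, assuming regex is a Regex instance (from System.Text.RegularExpressions) built from those characters:

return myStringCollection.Where(s => !regex.IsMatch(s));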

This does what you seem to want:

List<string> strings = new List<string>()
{
    "one",
    "two`",
    "thr^ee",
    "four"
};

List<char> invalid_chars = new List<char>()
{
    '`', '-', '^'
};

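// Remove every string that contains at least one invalid character
// (Any comes from System.Linq).
strings.RemoveAll(s => s.Any(c => invalid_chars.Contains(c)));
// Print the strings that remain.
strings.ForEach(s => Console.WriteLine(s));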

This generates the output:

one
four

This question has some answers similar to what I think you are looking for. However, I think you want to include all letters, numbers, whitespace and punctuation, but exclude everything else. Is that accurate? If so, this should do it for you by stripping every other character from a string str:

char[] arr = str.ToCharArray();

arr = Array.FindAll<char>(arr, (c => (char.IsLetterOrDigit(c) || 
                      char.IsWhiteSpace(c) || char.IsPunctuation(c))));
str = new string(arr);
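Since the question is about a collection rather than a single string, here is a hedged sketch of the same character filter wrapped in a method and applied to each element of a list. The names CharacterStripper and StripUnrecognized are invented for this example. Note that in .NET char.IsPunctuation does not cover symbol characters such as ^ or `, so those are stripped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CharacterStripper
{
    // Keeps letters, digits, whitespace and punctuation; drops the rest.
    // Same predicate as the snippet above, written with LINQ instead of
    // Array.FindAll.
    public static string StripUnrecognized(string str)
    {
        return new string(str.Where(c => char.IsLetterOrDigit(c)
                                      || char.IsWhiteSpace(c)
                                      || char.IsPunctuation(c)).ToArray());
    }

    static void Main()
    {
        var strings = new List<string> { "one", "two`", "thr^ee", "four!" };

        // Clean each string instead of removing it from the list.
        List<string> cleaned = strings.Select(StripUnrecognized).ToList();

        Console.WriteLine(string.Join(", ", cleaned)); // prints: one, two, three, four!
    }
}
```

This cleans the strings in place of dropping them; if you want to drop them instead, compare each string with its stripped version and filter out the ones that differ.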
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow