Question

I need a robust and simple way to remove illegal path and file characters from a simple string. I've used the below code but it doesn't seem to do anything, what am I missing?

using System;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";

            illegal = illegal.Trim(Path.GetInvalidFileNameChars());
            illegal = illegal.Trim(Path.GetInvalidPathChars());

            Console.WriteLine(illegal);
            Console.ReadLine();
        }
    }
}
Was it helpful?

Solution

Try something like this instead;

string illegal = "\"M\"\\a/ry/ h**ad:>> a\\/:*?\"| li*tt|le|| la\"mb.?";
string invalid = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());

foreach (char c in invalid)
{
    illegal = illegal.Replace(c.ToString(), ""); 
}

But I have to agree with the comments, I'd probably try to deal with the source of the illegal paths, rather than try to mangle an illegal path into a legitimate but probably unintended one.

Edit: Or a potentially 'better' solution, using Regex's.

string illegal = "\"M\"\\a/ry/ h**ad:>> a\\/:*?\"| li*tt|le|| la\"mb.?";
string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
illegal = r.Replace(illegal, "");

Still, the question begs to be asked, why you're doing this in the first place.

OTHER TIPS

public string GetSafeFilename(string filename)
{

    return string.Join("_", filename.Split(Path.GetInvalidFileNameChars()));

}

This answer was on another thread by Ceres, I really like it neat and simple.

I use Linq to clean up filenames. You can easily extend this to check for valid paths as well.

private static string CleanFileName(string fileName)
{
    return Path.GetInvalidFileNameChars().Aggregate(fileName, (current, c) => current.Replace(c.ToString(), string.Empty));
}

Update

Some comments indicate this method is not working for them so I've included a link to a DotNetFiddle snippet so you may validate the method.

https://dotnetfiddle.net/nw1SWY

You can remove illegal chars using Linq like this:

var invalidChars = Path.GetInvalidFileNameChars();

var invalidCharsRemoved = stringWithInvalidChars
.Where(x => !invalidChars.Contains(x))
.ToArray();

EDIT
This is how it looks with the required edit mentioned in the comments:

var invalidChars = Path.GetInvalidFileNameChars();

string invalidCharsRemoved = new string(stringWithInvalidChars
  .Where(x => !invalidChars.Contains(x))
  .ToArray());

These are all great solutions, but they all rely on Path.GetInvalidFileNameChars, which may not be as reliable as you'd think. Notice the following remark in the MSDN documentation on Path.GetInvalidFileNameChars:

The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. The full set of invalid characters can vary by file system. For example, on Windows-based desktop platforms, invalid path characters might include ASCII/Unicode characters 1 through 31, as well as quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0) and tab (\t).

It's not any better with Path.GetInvalidPathChars method. It contains the exact same remark.

For file names:

string cleanFileName = String.Join("", fileName.Split(Path.GetInvalidFileNameChars()));

For full paths:

string cleanPath = String.Join("", path.Split(Path.GetInvalidPathChars()));

Note that if you intend to use this as a security feature, a more robust approach would be to expand all paths and then verify that the user supplied path is indeed a child of a directory the user should have access to.

For starters, Trim only removes characters from the beginning or end of the string. Secondly, you should evaluate if you really want to remove the offensive characters, or fail fast and let the user know their filename is invalid. My choice is the latter, but my answer should at least show you how to do things the right AND wrong way:

StackOverflow question showing how to check if a given string is a valid file name. Note you can use the regex from this question to remove characters with a regular expression replacement (if you really need to do this).

I use regular expressions to achieve this. First, I dynamically build the regex.

string regex = string.Format(
                   "[{0}]",
                   Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

Then I just call removeInvalidChars.Replace to do the find and replace. This can obviously be extended to cover path chars as well.

The best way to remove illegal character from user input is to replace illegal character using Regex class, create method in code behind or also it validate at client side using RegularExpression control.

public string RemoveSpecialCharacters(string str)
{
    return Regex.Replace(str, "[^a-zA-Z0-9_]+", "_", RegexOptions.Compiled);
}

OR

<asp:RegularExpressionValidator ID="regxFolderName" 
                                runat="server" 
                                ErrorMessage="Enter folder name with  a-z A-Z0-9_" 
                                ControlToValidate="txtFolderName" 
                                Display="Dynamic" 
                                ValidationExpression="^[a-zA-Z0-9_]*$" 
                                ForeColor="Red">

I absolutely prefer the idea of Jeff Yates. It will work perfectly, if you slightly modify it:

string regex = String.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

The improvement is just to escape the automaticially generated regex.

Here's a code snippet that should help for .NET 3 and higher.

using System.IO;
using System.Text.RegularExpressions;

public static class PathValidation
{
    private static string pathValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]+$";
    private static Regex pathValidator = new Regex(pathValidatorExpression, RegexOptions.Compiled);

    private static string fileNameValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]+$";
    private static Regex fileNameValidator = new Regex(fileNameValidatorExpression, RegexOptions.Compiled);

    private static string pathCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex pathCleaner = new Regex(pathCleanerExpression, RegexOptions.Compiled);

    private static string fileNameCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex fileNameCleaner = new Regex(fileNameCleanerExpression, RegexOptions.Compiled);

    public static bool ValidatePath(string path)
    {
        return pathValidator.IsMatch(path);
    }

    public static bool ValidateFileName(string fileName)
    {
        return fileNameValidator.IsMatch(fileName);
    }

    public static string CleanPath(string path)
    {
        return pathCleaner.Replace(path, "");
    }

    public static string CleanFileName(string fileName)
    {
        return fileNameCleaner.Replace(fileName, "");
    }
}

Most solutions above combine illegal chars for both path and filename which is wrong (even when both calls currently return the same set of chars). I would first split the path+filename in path and filename, then apply the appropriate set to either if them and then combine the two again.

wvd_vegt

If you remove or replace with a single character the invalid characters, you can have collisions:

<abc -> abc
>abc -> abc

Here is a simple method to avoid this:

public static string ReplaceInvalidFileNameChars(string s)
{
    char[] invalidFileNameChars = System.IO.Path.GetInvalidFileNameChars();
    foreach (char c in invalidFileNameChars)
        s = s.Replace(c.ToString(), "[" + Array.IndexOf(invalidFileNameChars, c) + "]");
    return s;
}

The result:

 <abc -> [1]abc
 >abc -> [2]abc

Throw an exception.

if ( fileName.IndexOfAny(Path.GetInvalidFileNameChars()) > -1 )
            {
                throw new ArgumentException();
            }

I wrote this monster for fun, it lets you roundtrip:

public static class FileUtility
{
    private const char PrefixChar = '%';
    private static readonly int MaxLength;
    private static readonly Dictionary<char,char[]> Illegals;
    static FileUtility()
    {
        List<char> illegal = new List<char> { PrefixChar };
        illegal.AddRange(Path.GetInvalidFileNameChars());
        MaxLength = illegal.Select(x => ((int)x).ToString().Length).Max();
        Illegals = illegal.ToDictionary(x => x, x => ((int)x).ToString("D" + MaxLength).ToCharArray());
    }

    public static string FilenameEncode(string s)
    {
        var builder = new StringBuilder();
        char[] replacement;
        using (var reader = new StringReader(s))
        {
            while (true)
            {
                int read = reader.Read();
                if (read == -1)
                    break;
                char c = (char)read;
                if(Illegals.TryGetValue(c,out replacement))
                {
                    builder.Append(PrefixChar);
                    builder.Append(replacement);
                }
                else
                {
                    builder.Append(c);
                }
            }
        }
        return builder.ToString();
    }

    public static string FilenameDecode(string s)
    {
        var builder = new StringBuilder();
        char[] buffer = new char[MaxLength];
        using (var reader = new StringReader(s))
        {
            while (true)
            {
                int read = reader.Read();
                if (read == -1)
                    break;
                char c = (char)read;
                if (c == PrefixChar)
                {
                    reader.Read(buffer, 0, MaxLength);
                    var encoded =(char) ParseCharArray(buffer);
                    builder.Append(encoded);
                }
                else
                {
                    builder.Append(c);
                }
            }
        }
        return builder.ToString();
    }

    public static int ParseCharArray(char[] buffer)
    {
        int result = 0;
        foreach (char t in buffer)
        {
            int digit = t - '0';
            if ((digit < 0) || (digit > 9))
            {
                throw new ArgumentException("Input string was not in the correct format");
            }
            result *= 10;
            result += digit;
        }
        return result;
    }
}

I think it is much easier to validate using a regex and specifiing which characters are allowed, instead of trying to check for all bad characters. See these links: http://www.c-sharpcorner.com/UploadFile/prasad_1/RegExpPSD12062005021717AM/RegExpPSD.aspx http://www.windowsdevcenter.com/pub/a/oreilly/windows/news/csharp_0101.html

Also, do a search for "regular expression editor"s, they help a lot. There are some around which even output the code in c# for you.

This seems to be O(n) and does not spend too much memory on strings:

    private static readonly HashSet<char> invalidFileNameChars = new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string RemoveInvalidFileNameChars(string name)
    {
        if (!name.Any(c => invalidFileNameChars.Contains(c))) {
            return name;
        }

        return new string(name.Where(c => !invalidFileNameChars.Contains(c)).ToArray());
    }

Scanning over the answers here, they all** seem to involve using a char array of invalid filename characters.

Granted, this may be micro-optimising - but for the benefit of anyone who might be looking to check a large number of values for being valid filenames, it's worth noting that building a hashset of invalid chars will bring about notably better performance.

I have been very surprised (shocked) in the past just how quickly a hashset (or dictionary) outperforms iterating over a list. With strings, it's a ridiculously low number (about 5-7 items from memory). With most other simple data (object references, numbers etc) the magic crossover seems to be around 20 items.

There are 40 invalid characters in the Path.InvalidFileNameChars "list". Did a search today and there's quite a good benchmark here on StackOverflow that shows the hashset will take a little over half the time of an array/list for 40 items: https://stackoverflow.com/a/10762995/949129

Here's the helper class I use for sanitising paths. I forget now why I had the fancy replacement option in it, but it's there as a cute bonus.

Additional bonus method "IsValidLocalPath" too :)

(** those which don't use regular expressions)

public static class PathExtensions
{
    private static HashSet<char> _invalidFilenameChars;
    private static HashSet<char> InvalidFilenameChars
    {
        get { return _invalidFilenameChars ?? (_invalidFilenameChars = new HashSet<char>(Path.GetInvalidFileNameChars())); }
    }


    /// <summary>Replaces characters in <c>text</c> that are not allowed in file names with the 
    /// specified replacement character.</summary>
    /// <param name="text">Text to make into a valid filename. The same string is returned if 
    /// it is valid already.</param>
    /// <param name="replacement">Replacement character, or NULL to remove bad characters.</param>
    /// <param name="fancyReplacements">TRUE to replace quotes and slashes with the non-ASCII characters ” and ⁄.</param>
    /// <returns>A string that can be used as a filename. If the output string would otherwise be empty, "_" is returned.</returns>
    public static string ToValidFilename(this string text, char? replacement = '_', bool fancyReplacements = false)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        HashSet<char> invalids = InvalidFilenameChars;
        bool changed = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (invalids.Contains(c))
            {
                changed = true;
                char repl = replacement ?? '\0';
                if (fancyReplacements)
                {
                    if (c == '"') repl = '”'; // U+201D right double quotation mark
                    else if (c == '\'') repl = '’'; // U+2019 right single quotation mark
                    else if (c == '/') repl = '⁄'; // U+2044 fraction slash
                }
                if (repl != '\0')
                    sb.Append(repl);
            }
            else
                sb.Append(c);
        }

        if (sb.Length == 0)
            return "_";

        return changed ? sb.ToString() : text;
    }


    /// <summary>
    /// Returns TRUE if the specified path is a valid, local filesystem path.
    /// </summary>
    /// <param name="pathString"></param>
    /// <returns></returns>
    public static bool IsValidLocalPath(this string pathString)
    {
        // From solution at https://stackoverflow.com/a/11636052/949129
        Uri pathUri;
        Boolean isValidUri = Uri.TryCreate(pathString, UriKind.Absolute, out pathUri);
        return isValidUri && pathUri != null && pathUri.IsLoopback;
    }
}
public static class StringExtensions
      {
        public static string RemoveUnnecessary(this string source)
        {
            string result = string.Empty;
            string regex = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
            Regex reg = new Regex(string.Format("[{0}]", Regex.Escape(regex)));
            result = reg.Replace(source, "");
            return result;
        }
    }

You can use method clearly.

File name can not contain characters from Path.GetInvalidPathChars(), + and # symbols, and other specific names. We combined all checks into one class:

public static class FileNameExtensions
{
    private static readonly Lazy<string[]> InvalidFileNameChars =
        new Lazy<string[]>(() => Path.GetInvalidPathChars()
            .Union(Path.GetInvalidFileNameChars()
            .Union(new[] { '+', '#' })).Select(c => c.ToString(CultureInfo.InvariantCulture)).ToArray());


    private static readonly HashSet<string> ProhibitedNames = new HashSet<string>
    {
        @"aux",
        @"con",
        @"clock$",
        @"nul",
        @"prn",

        @"com1",
        @"com2",
        @"com3",
        @"com4",
        @"com5",
        @"com6",
        @"com7",
        @"com8",
        @"com9",

        @"lpt1",
        @"lpt2",
        @"lpt3",
        @"lpt4",
        @"lpt5",
        @"lpt6",
        @"lpt7",
        @"lpt8",
        @"lpt9"
    };

    public static bool IsValidFileName(string fileName)
    {
        return !string.IsNullOrWhiteSpace(fileName)
            && fileName.All(o => !IsInvalidFileNameChar(o))
            && !IsProhibitedName(fileName);
    }

    public static bool IsProhibitedName(string fileName)
    {
        return ProhibitedNames.Contains(fileName.ToLower(CultureInfo.InvariantCulture));
    }

    private static string ReplaceInvalidFileNameSymbols([CanBeNull] this string value, string replacementValue)
    {
        if (value == null)
        {
            return null;
        }

        return InvalidFileNameChars.Value.Aggregate(new StringBuilder(value),
            (sb, currentChar) => sb.Replace(currentChar, replacementValue)).ToString();
    }

    public static bool IsInvalidFileNameChar(char value)
    {
        return InvalidFileNameChars.Value.Contains(value.ToString(CultureInfo.InvariantCulture));
    }

    public static string GetValidFileName([NotNull] this string value)
    {
        return GetValidFileName(value, @"_");
    }

    public static string GetValidFileName([NotNull] this string value, string replacementValue)
    {
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new ArgumentException(@"value should be non empty", nameof(value));
        }

        if (IsProhibitedName(value))
        {
            return (string.IsNullOrWhiteSpace(replacementValue) ? @"_" : replacementValue) + value; 
        }

        return ReplaceInvalidFileNameSymbols(value, replacementValue);
    }

    public static string GetFileNameError(string fileName)
    {
        if (string.IsNullOrWhiteSpace(fileName))
        {
            return CommonResources.SelectReportNameError;
        }

        if (IsProhibitedName(fileName))
        {
            return CommonResources.FileNameIsProhibited;
        }

        var invalidChars = fileName.Where(IsInvalidFileNameChar).Distinct().ToArray();

        if(invalidChars.Length > 0)
        {
            return string.Format(CultureInfo.CurrentCulture,
                invalidChars.Length == 1 ? CommonResources.InvalidCharacter : CommonResources.InvalidCharacters,
                StringExtensions.JoinQuoted(@",", @"'", invalidChars.Select(c => c.ToString(CultureInfo.CurrentCulture))));
        }

        return string.Empty;
    }
}

Method GetValidFileName replaces all incorrect data to _.

One liner to cleanup string from any illegal chars for windows file naming:

public static string CleanIllegalName(string p_testName) => new Regex(string.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars())))).Replace(p_testName, "");
public static bool IsValidFilename(string testName)
{
    return !new Regex("[" + Regex.Escape(new String(System.IO.Path.GetInvalidFileNameChars())) + "]").IsMatch(testName);
}

This will do want you want, and avoid collisions

 static string SanitiseFilename(string key)
    {
        var invalidChars = Path.GetInvalidFileNameChars();
        var sb = new StringBuilder();
        foreach (var c in key)
        {
            var invalidCharIndex = -1;
            for (var i = 0; i < invalidChars.Length; i++)
            {
                if (c == invalidChars[i])
                {
                    invalidCharIndex = i;
                }
            }
            if (invalidCharIndex > -1)
            {
                sb.Append("_").Append(invalidCharIndex);
                continue;
            }

            if (c == '_')
            {
                sb.Append("__");
                continue;
            }

            sb.Append(c);
        }
        return sb.ToString();

    }

I think the question already not full answered... The answers only describe clean filename OR path... not both. Here is my solution:

private static string CleanPath(string path)
{
    string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
    Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
    List<string> split = path.Split('\\').ToList();
    string returnValue = split.Aggregate(string.Empty, (current, s) => current + (r.Replace(s, "") + @"\"));
    returnValue = returnValue.TrimEnd('\\');
    return returnValue;
}

I created an extension method that combines several suggestions:

  1. Holding illegal characters in a hash set
  2. Filtering out characters below ascii 127. Since Path.GetInvalidFileNameChars does not include all invalid characters possible with ascii codes from 0 to 255. See here and MSDN
  3. Possiblity to define the replacement character

Source:

public static class FileNameCorrector
{
    private static HashSet<char> invalid = new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string ToValidFileName(this string name, char replacement = '\0')
    {
        var builder = new StringBuilder();
        foreach (var cur in name)
        {
            if (cur > 31 && cur < 128 && !invalid.Contains(cur))
            {
                builder.Append(cur);
            }
            else if (replacement != '\0')
            {
                builder.Append(replacement);
            }
        }

        return builder.ToString();
    }
}

Or you can just do

[YOUR STRING].Replace('\\', ' ').Replace('/', ' ').Replace('"', ' ').Replace('*', ' ').Replace(':', ' ').Replace('?', ' ').Replace('<', ' ').Replace('>', ' ').Replace('|', ' ').Trim();
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top