How to remove illegal characters from path and file names?

https://stackoverflow.com/questions/146134

02-07-2019
Question

I need a robust and simple way to remove illegal path and file characters from a simple string. I've used the code below, but it doesn't seem to do anything. What am I missing?

using System;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";

            illegal = illegal.Trim(Path.GetInvalidFileNameChars());
            illegal = illegal.Trim(Path.GetInvalidPathChars());

            Console.WriteLine(illegal);
            Console.ReadLine();
        }
    }
}

Solution

Try something like this instead:

string illegal = "\"M\"\\a/ry/ h**ad:>> a\\/:*?\"| li*tt|le|| la\"mb.?";
string invalid = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());

foreach (char c in invalid)
{
    illegal = illegal.Replace(c.ToString(), ""); 
}

But I have to agree with the comments: I'd probably try to deal with the source of the illegal paths, rather than try to mangle an illegal path into a legitimate but probably unintended one.

Edit: or a potentially "better" solution, using Regex.

string illegal = "\"M\"\\a/ry/ h**ad:>> a\\/:*?\"| li*tt|le|| la\"mb.?";
string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
illegal = r.Replace(illegal, "");

Still, the question begs to be asked: why are you doing this in the first place?

Other suggestions

public string GetSafeFilename(string filename)
{

    return string.Join("_", filename.Split(Path.GetInvalidFileNameChars()));

}

This answer was on another thread by Ceres; I really like how neat and simple it is.

I use Linq to clean up file names. You can easily extend this to check for valid paths as well.

private static string CleanFileName(string fileName)
{
    return Path.GetInvalidFileNameChars().Aggregate(fileName, (current, c) => current.Replace(c.ToString(), string.Empty));
}

Update

Some comments indicate that this method does not work for them, so I've included a link to a DotNetFiddle snippet so you can validate the method.

https://dotnetfiddle.net/nw1SWY

You can remove illegal characters using Linq like this:

var invalidChars = Path.GetInvalidFileNameChars();

var invalidCharsRemoved = stringWithInvalidChars
.Where(x => !invalidChars.Contains(x))
.ToArray();

Edit
This is how it looks with the required edit mentioned in the comments:

var invalidChars = Path.GetInvalidFileNameChars();

string invalidCharsRemoved = new string(stringWithInvalidChars
  .Where(x => !invalidChars.Contains(x))
  .ToArray());

These are all great solutions, but they all rely on Path.GetInvalidFileNameChars, which may not be as reliable as you'd think. Note the following remark in the MSDN documentation on Path.GetInvalidPathChars:

The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. The full set of invalid characters can vary by file system. For example, on Windows-based desktop platforms, invalid path characters might include ASCII/Unicode characters 1 through 31, as well as quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0) and tab (\t).

It's no better with the companion method. Its documentation contains the exact same remark.
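To see how much the reported set varies, a minimal sketch like the following prints whatever the current runtime returns. The class and method names are made up for illustration; the only framework calls are Path.GetInvalidFileNameChars and Path.GetInvalidPathChars.

```csharp
using System;
using System.IO;
using System.Linq;

public static class InvalidCharReport
{
    // Hypothetical helper: renders each character as a U+XXXX code point,
    // since most of the reported characters are not printable.
    public static string Describe(char[] chars)
    {
        return string.Join(" ", chars.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    // Prints both sets as reported by the runtime the code is executing on.
    public static void Print()
    {
        Console.WriteLine("File name: " + Describe(Path.GetInvalidFileNameChars()));
        Console.WriteLine("Path:      " + Describe(Path.GetInvalidPathChars()));
    }
}
```

Running Print on Windows and on a Unix-like system typically gives very different lists, which is one more reason not to treat either array as a complete validity check.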

For file names:

string cleanFileName = String.Join("", fileName.Split(Path.GetInvalidFileNameChars()));

For full paths:

string cleanPath = String.Join("", path.Split(Path.GetInvalidPathChars()));

Note that if you intend to use this as a security feature, a more robust approach would be to expand all paths and then verify that the user-supplied path is indeed a child of a directory the user should have access to.
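The containment check described above could be sketched roughly as follows. PathContainment and IsInsideRoot are hypothetical names, and the sketch assumes an ordinal, case-sensitive comparison; on a case-insensitive file system you may prefer StringComparison.OrdinalIgnoreCase. It does not resolve symbolic links.

```csharp
using System;
using System.IO;

public static class PathContainment
{
    // Hypothetical helper: true when userPath, resolved against root,
    // still lies underneath root after ".." segments are expanded.
    public static bool IsInsideRoot(string root, string userPath)
    {
        string fullRoot = Path.GetFullPath(root);
        string separator = Path.DirectorySeparatorChar.ToString();

        // A trailing separator prevents "uploads-evil" from matching "uploads".
        if (!fullRoot.EndsWith(separator, StringComparison.Ordinal))
        {
            fullRoot += separator;
        }

        // Path.Combine returns userPath unchanged when it is rooted,
        // so absolute paths outside the root are rejected as well.
        string candidate = Path.GetFullPath(Path.Combine(fullRoot, userPath));
        return candidate.StartsWith(fullRoot, StringComparison.Ordinal);
    }
}
```

With a root such as a temporary "uploads" directory, "report.txt" is accepted, while "../secret.txt" resolves outside the root and is rejected.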

For starters, Trim only removes characters from the beginning or end of the string. Secondly, you should consider whether you really want to remove the offending characters, or fail fast and let the user know that their file name is invalid. My choice is the latter, but my answer should at least show you how to do things the right AND wrong way:

StackOverflow question showing how to check whether a given string is a valid file name. Note that you can use the regex from that question to remove characters with a regular expression replacement (if you really need to do this).
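To make the point about Trim concrete, here is a small sketch contrasting it with removing every occurrence. The class and method names are illustrative only. The '/' character is used in the usage note because it is reported as invalid in file names on both Windows and Unix-like systems.

```csharp
using System;
using System.IO;

public static class TrimVersusRemove
{
    // Trim strips matching characters only from the two ends of the string.
    public static string TrimOnly(string s)
    {
        return s.Trim(Path.GetInvalidFileNameChars());
    }

    // Split on every invalid character, then glue the pieces back together.
    public static string RemoveAll(string s)
    {
        return string.Concat(s.Split(Path.GetInvalidFileNameChars()));
    }
}
```

For the input "/a/b/", TrimOnly leaves the inner slash in place and returns "a/b", whereas RemoveAll returns "ab".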

I use regular expressions to achieve this. First, I dynamically build the regex.

string regex = string.Format(
                   "[{0}]",
                   Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

Then I just call removeInvalidChars.Replace to do the find and replace. This can obviously be extended to cover path characters as well.

The best way to remove illegal characters from user input is to replace them using the Regex class: create a method in the code-behind, or validate on the client side using the RegularExpression control.

public string RemoveSpecialCharacters(string str)
{
    return Regex.Replace(str, "[^a-zA-Z0-9_]+", "_", RegexOptions.Compiled);
}

<asp:RegularExpressionValidator ID="regxFolderName" 
                                runat="server" 
                                ErrorMessage="Enter folder name with  a-z A-Z0-9_" 
                                ControlToValidate="txtFolderName" 
                                Display="Dynamic" 
                                ValidationExpression="^[a-zA-Z0-9_]*$" 
                                ForeColor="Red" />

I absolutely prefer Jeff Yates's idea. It will work perfectly if you modify it slightly:

string regex = String.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

The only improvement is to escape the automatically generated regex.

Here's a code snippet that should be useful for .NET 3 and later.

using System;
using System.IO;
using System.Text.RegularExpressions;

public static class PathValidation
{
    private static string pathValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]+$";
    private static Regex pathValidator = new Regex(pathValidatorExpression, RegexOptions.Compiled);

    private static string fileNameValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]+$";
    private static Regex fileNameValidator = new Regex(fileNameValidatorExpression, RegexOptions.Compiled);

    private static string pathCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex pathCleaner = new Regex(pathCleanerExpression, RegexOptions.Compiled);

    private static string fileNameCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex fileNameCleaner = new Regex(fileNameCleanerExpression, RegexOptions.Compiled);

    public static bool ValidatePath(string path)
    {
        return pathValidator.IsMatch(path);
    }

    public static bool ValidateFileName(string fileName)
    {
        return fileNameValidator.IsMatch(fileName);
    }

    public static string CleanPath(string path)
    {
        return pathCleaner.Replace(path, "");
    }

    public static string CleanFileName(string fileName)
    {
        return fileNameCleaner.Replace(fileName, "");
    }
}

Most of the solutions above combine the invalid characters for both path and file name, which is wrong (even when both calls currently return the same set of characters). I would first split the path + file name into path and file name, then apply the appropriate set to each, and then combine the two again.

wvd_vegt
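A rough sketch of that split-clean-recombine idea is shown below. SplitAndClean and CleanFullPath are hypothetical names, and the split is done by hand at the last directory separator, on the assumption that some framework versions throw from Path.GetDirectoryName when the input contains invalid characters.

```csharp
using System;
using System.IO;

public static class SplitAndClean
{
    private static readonly char[] Separators =
        { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar };

    public static string CleanFullPath(string fullPath)
    {
        // Everything up to and including the last separator is the directory part;
        // when there is no separator, cut is 0 and the whole input is the file name.
        int cut = fullPath.LastIndexOfAny(Separators) + 1;
        string directory = fullPath.Substring(0, cut);
        string fileName = fullPath.Substring(cut);

        // Each part is cleaned with its own character set.
        string cleanDirectory = string.Concat(directory.Split(Path.GetInvalidPathChars()));
        string cleanFileName = string.Concat(fileName.Split(Path.GetInvalidFileNameChars()));
        return cleanDirectory + cleanFileName;
    }
}
```

Because the directory part is only checked against the path set, separators and a drive prefix in it survive, while characters that are invalid only in file names are stripped from the last segment alone.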

If you remove the invalid characters, or replace them all with a single character, you can get collisions:

<abc -> abc
>abc -> abc

Here's a simple method to avoid this:

public static string ReplaceInvalidFileNameChars(string s)
{
    char[] invalidFileNameChars = System.IO.Path.GetInvalidFileNameChars();
    foreach (char c in invalidFileNameChars)
        s = s.Replace(c.ToString(), "[" + Array.IndexOf(invalidFileNameChars, c) + "]");
    return s;
}

The result:

 <abc -> [1]abc
 >abc -> [2]abc

Throw an exception.

if (fileName.IndexOfAny(Path.GetInvalidFileNameChars()) > -1)
{
    throw new ArgumentException();
}
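If you go the fail-fast route, it can help to tell the caller which characters were rejected. The following is only a sketch of that idea; FileNameGuard and EnsureValid are invented names, not part of the framework.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FileNameGuard
{
    // Throws ArgumentException listing the offending characters as code points.
    public static void EnsureValid(string fileName)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        char[] found = fileName
            .Where(c => Array.IndexOf(invalid, c) >= 0)
            .Distinct()
            .ToArray();

        if (found.Length > 0)
        {
            string codes = string.Join(", ", found.Select(c => "U+" + ((int)c).ToString("X4")));
            throw new ArgumentException("File name contains invalid characters: " + codes, "fileName");
        }
    }
}
```

Code points are used in the message because many of the invalid characters are control characters that would not display.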

I wrote this monster for fun; it lets you round-trip:

public static class FileUtility
{
    private const char PrefixChar = '%';
    private static readonly int MaxLength;
    private static readonly Dictionary<char,char[]> Illegals;
    static FileUtility()
    {
        List<char> illegal = new List<char> { PrefixChar };
        illegal.AddRange(Path.GetInvalidFileNameChars());
        MaxLength = illegal.Select(x => ((int)x).ToString().Length).Max();
        Illegals = illegal.ToDictionary(x => x, x => ((int)x).ToString("D" + MaxLength).ToCharArray());
    }

    public static string FilenameEncode(string s)
    {
        var builder = new StringBuilder();
        char[] replacement;
        using (var reader = new StringReader(s))
        {
            while (true)
            {
                int read = reader.Read();
                if (read == -1)
                    break;
                char c = (char)read;
                if(Illegals.TryGetValue(c,out replacement))
                {
                    builder.Append(PrefixChar);
                    builder.Append(replacement);
                }
                else
                {
                    builder.Append(c);
                }
            }
        }
        return builder.ToString();
    }

    public static string FilenameDecode(string s)
    {
        var builder = new StringBuilder();
        char[] buffer = new char[MaxLength];
        using (var reader = new StringReader(s))
        {
            while (true)
            {
                int read = reader.Read();
                if (read == -1)
                    break;
                char c = (char)read;
                if (c == PrefixChar)
                {
                    reader.Read(buffer, 0, MaxLength);
                    var encoded =(char) ParseCharArray(buffer);
                    builder.Append(encoded);
                }
                else
                {
                    builder.Append(c);
                }
            }
        }
        return builder.ToString();
    }

    public static int ParseCharArray(char[] buffer)
    {
        int result = 0;
        foreach (char t in buffer)
        {
            int digit = t - '0';
            if ((digit < 0) || (digit > 9))
            {
                throw new ArgumentException("Input string was not in the correct format");
            }
            result *= 10;
            result += digit;
        }
        return result;
    }
}

I think it is much easier to validate using a regex that specifies which characters are allowed, instead of trying to check for all the bad characters. See these links: http://www.c-sharpcorner.com/UploadFile/prasad_1/RegExpPSD12062005021717AM/RegExpPSD.aspx http://www.windowsdevcenter.com/pub/a/oreilly/windows/news/csharp_0101.html

Also, search for "regular expression editor"s; they help a lot. Some of them even generate the C# code for you.
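As an illustration of the allow-list approach, here is a minimal sketch. The policy itself (a letter or digit first, then letters, digits, '.', '_' or '-') is an assumption chosen for the example, and AllowListValidator is a made-up name.

```csharp
using System.Text.RegularExpressions;

public static class AllowListValidator
{
    // Assumed policy: first character is a letter or digit; the rest may also
    // be '.', '_' or '-'. \A and \z anchor the whole string, so a trailing
    // newline is rejected too (which a plain $ anchor would let through).
    private static readonly Regex Allowed =
        new Regex(@"\A[A-Za-z0-9][A-Za-z0-9._-]*\z", RegexOptions.CultureInvariant);

    public static bool IsAllowed(string fileName)
    {
        return fileName != null && Allowed.IsMatch(fileName);
    }
}
```

The trade-off is that an allow-list this strict also rejects perfectly legal names, such as ones containing spaces or non-ASCII letters, so the pattern should be widened to suit the application.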

This seems to be O(n) and does not spend too much memory on strings:

    private static readonly HashSet<char> invalidFileNameChars = new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string RemoveInvalidFileNameChars(string name)
    {
        if (!name.Any(c => invalidFileNameChars.Contains(c))) {
            return name;
        }

        return new string(name.Where(c => !invalidFileNameChars.Contains(c)).ToArray());
    }

Scanning over the answers here, they all** seem to involve using a char array of invalid file name characters.

Granted, this may be micro-optimising, but for anyone who needs to check a large number of values for being valid file names, it's worth noting that building a hashset of invalid characters brings notably better performance.

I have been very surprised (shocked) in the past at just how quickly a hashset (or dictionary) outperforms iterating over a list. With strings, the crossover is at a ridiculously low number (about 5-7 items, from memory). With most other simple data (object references, numbers, etc.) the magic crossover seems to be around 20 items.

There are about 40 invalid characters in the Path.GetInvalidFileNameChars "list". I did a search today and there's quite a good benchmark here on StackOverflow showing that the hashset takes a little over half the time of an array/list for 40 items: https://stackoverflow.com/a/10762995/949129

Here's the helper class I use for sanitising paths. I forget now why I had the fancy replacement option in it, but it's there as a cute bonus.

There's an additional bonus method, "IsValidLocalPath", too :)

(** those which don't use regular expressions)

public static class PathExtensions
{
    private static HashSet<char> _invalidFilenameChars;
    private static HashSet<char> InvalidFilenameChars
    {
        get { return _invalidFilenameChars ?? (_invalidFilenameChars = new HashSet<char>(Path.GetInvalidFileNameChars())); }
    }


    /// <summary>Replaces characters in <c>text</c> that are not allowed in file names with the 
    /// specified replacement character.</summary>
    /// <param name="text">Text to make into a valid filename. The same string is returned if 
    /// it is valid already.</param>
    /// <param name="replacement">Replacement character, or NULL to remove bad characters.</param>
    /// <param name="fancyReplacements">TRUE to replace quotes and slashes with the non-ASCII characters ” and ⁄.</param>
    /// <returns>A string that can be used as a filename. If the output string would otherwise be empty, "_" is returned.</returns>
    public static string ToValidFilename(this string text, char? replacement = '_', bool fancyReplacements = false)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        HashSet<char> invalids = InvalidFilenameChars;
        bool changed = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (invalids.Contains(c))
            {
                changed = true;
                char repl = replacement ?? '\0';
                if (fancyReplacements)
                {
                    if (c == '"') repl = '”'; // U+201D right double quotation mark
                    else if (c == '\'') repl = '’'; // U+2019 right single quotation mark
                    else if (c == '/') repl = '⁄'; // U+2044 fraction slash
                }
                if (repl != '\0')
                    sb.Append(repl);
            }
            else
                sb.Append(c);
        }

        if (sb.Length == 0)
            return "_";

        return changed ? sb.ToString() : text;
    }


    /// <summary>
    /// Returns TRUE if the specified path is a valid, local filesystem path.
    /// </summary>
    /// <param name="pathString"></param>
    /// <returns></returns>
    public static bool IsValidLocalPath(this string pathString)
    {
        // From solution at https://stackoverflow.com/a/11636052/949129
        Uri pathUri;
        Boolean isValidUri = Uri.TryCreate(pathString, UriKind.Absolute, out pathUri);
        return isValidUri && pathUri != null && pathUri.IsLoopback;
    }
}

public static class StringExtensions
{
    public static string RemoveUnnecessary(this string source)
    {
        string result = string.Empty;
        string regex = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
        Regex reg = new Regex(string.Format("[{0}]", Regex.Escape(regex)));
        result = reg.Replace(source, "");
        return result;
    }
}

The method is straightforward to use.

A file name cannot contain characters from Path.GetInvalidPathChars(), the + and # symbols, or certain reserved names. We combined all the checks into one class:

public static class FileNameExtensions
{
    private static readonly Lazy<string[]> InvalidFileNameChars =
        new Lazy<string[]>(() => Path.GetInvalidPathChars()
            .Union(Path.GetInvalidFileNameChars()
            .Union(new[] { '+', '#' })).Select(c => c.ToString(CultureInfo.InvariantCulture)).ToArray());


    private static readonly HashSet<string> ProhibitedNames = new HashSet<string>
    {
        @"aux",
        @"con",
        @"clock$",
        @"nul",
        @"prn",

        @"com1",
        @"com2",
        @"com3",
        @"com4",
        @"com5",
        @"com6",
        @"com7",
        @"com8",
        @"com9",

        @"lpt1",
        @"lpt2",
        @"lpt3",
        @"lpt4",
        @"lpt5",
        @"lpt6",
        @"lpt7",
        @"lpt8",
        @"lpt9"
    };

    public static bool IsValidFileName(string fileName)
    {
        return !string.IsNullOrWhiteSpace(fileName)
            && fileName.All(o => !IsInvalidFileNameChar(o))
            && !IsProhibitedName(fileName);
    }

    public static bool IsProhibitedName(string fileName)
    {
        return ProhibitedNames.Contains(fileName.ToLower(CultureInfo.InvariantCulture));
    }

    private static string ReplaceInvalidFileNameSymbols([CanBeNull] this string value, string replacementValue)
    {
        if (value == null)
        {
            return null;
        }

        return InvalidFileNameChars.Value.Aggregate(new StringBuilder(value),
            (sb, currentChar) => sb.Replace(currentChar, replacementValue)).ToString();
    }

    public static bool IsInvalidFileNameChar(char value)
    {
        return InvalidFileNameChars.Value.Contains(value.ToString(CultureInfo.InvariantCulture));
    }

    public static string GetValidFileName([NotNull] this string value)
    {
        return GetValidFileName(value, @"_");
    }

    public static string GetValidFileName([NotNull] this string value, string replacementValue)
    {
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new ArgumentException(@"value should be non empty", nameof(value));
        }

        if (IsProhibitedName(value))
        {
            return (string.IsNullOrWhiteSpace(replacementValue) ? @"_" : replacementValue) + value; 
        }

        return ReplaceInvalidFileNameSymbols(value, replacementValue);
    }

    public static string GetFileNameError(string fileName)
    {
        if (string.IsNullOrWhiteSpace(fileName))
        {
            return CommonResources.SelectReportNameError;
        }

        if (IsProhibitedName(fileName))
        {
            return CommonResources.FileNameIsProhibited;
        }

        var invalidChars = fileName.Where(IsInvalidFileNameChar).Distinct().ToArray();

        if(invalidChars.Length > 0)
        {
            return string.Format(CultureInfo.CurrentCulture,
                invalidChars.Length == 1 ? CommonResources.InvalidCharacter : CommonResources.InvalidCharacters,
                StringExtensions.JoinQuoted(@",", @"'", invalidChars.Select(c => c.ToString(CultureInfo.CurrentCulture))));
        }

        return string.Empty;
    }
}

The GetValidFileName method replaces all invalid characters with _.
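One thing to watch: a whole-string comparison against the reserved names lets something like nul.txt through, even though Windows documentation advises against reserved device names followed by an extension. A sketch of a check that looks only at the part before the first dot might look like this; ReservedNames and IsReserved are hypothetical names, and the list simply mirrors the one in the class above.

```csharp
using System;
using System.Collections.Generic;

public static class ReservedNames
{
    private static readonly HashSet<string> Devices = BuildDevices();

    private static HashSet<string> BuildDevices()
    {
        // Case-insensitive set, so no ToLower call is needed at lookup time.
        var set = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "CON", "PRN", "AUX", "NUL", "CLOCK$"
        };
        for (int i = 1; i <= 9; i++)
        {
            set.Add("COM" + i);
            set.Add("LPT" + i);
        }
        return set;
    }

    // Compares only the stem before the first dot, ignoring trailing spaces.
    public static bool IsReserved(string fileName)
    {
        int dot = fileName.IndexOf('.');
        string stem = dot < 0 ? fileName : fileName.Substring(0, dot);
        return Devices.Contains(stem.TrimEnd(' '));
    }
}
```

With this check, "NUL.txt" and "con" are both flagged, while "console.txt" and "com10" are not.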

A one-liner to clean a string of any characters that are illegal in Windows file names:

public static string CleanIllegalName(string p_testName) => new Regex(string.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars())))).Replace(p_testName, "");

public static bool IsValidFilename(string testName)
{
    return !new Regex("[" + Regex.Escape(new String(System.IO.Path.GetInvalidFileNameChars())) + "]").IsMatch(testName);
}

This will do what you want, and avoid collisions:

static string SanitiseFilename(string key)
{
    var invalidChars = Path.GetInvalidFileNameChars();
    var sb = new StringBuilder();
    foreach (var c in key)
    {
        var invalidCharIndex = -1;
        for (var i = 0; i < invalidChars.Length; i++)
        {
            if (c == invalidChars[i])
            {
                invalidCharIndex = i;
            }
        }
        if (invalidCharIndex > -1)
        {
            sb.Append("_").Append(invalidCharIndex);
            continue;
        }

        if (c == '_')
        {
            sb.Append("__");
            continue;
        }

        sb.Append(c);
    }
    return sb.ToString();
}

I think the question has not yet been fully answered... The answers only describe cleaning a file name OR a path, not both. Here is my solution:

private static string CleanPath(string path)
{
    string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
    Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
    List<string> split = path.Split('\\').ToList();
    string returnValue = split.Aggregate(string.Empty, (current, s) => current + (r.Replace(s, "") + @"\"));
    returnValue = returnValue.TrimEnd('\\');
    return returnValue;
}

I created an extension method that combines several suggestions:

Holding the illegal characters in a hash set
Filtering out characters outside the ASCII range 32 to 127, since Path.GetInvalidFileNameChars does not include every invalid character possible with ASCII codes from 0 to 255. See here and MSDN
Possibility to define the replacement character

Source:

public static class FileNameCorrector
{
    private static HashSet<char> invalid = new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string ToValidFileName(this string name, char replacement = '\0')
    {
        var builder = new StringBuilder();
        foreach (var cur in name)
        {
            if (cur > 31 && cur < 128 && !invalid.Contains(cur))
            {
                builder.Append(cur);
            }
            else if (replacement != '\0')
            {
                builder.Append(replacement);
            }
        }

        return builder.ToString();
    }
}

Or you can just do this:

[YOUR STRING].Replace('\\', ' ').Replace('/', ' ').Replace('"', ' ').Replace('*', ' ').Replace(':', ' ').Replace('?', ' ').Replace('<', ' ').Replace('>', ' ').Replace('|', ' ').Trim();

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow