Find broken symlinks with Python

https://stackoverflow.com/questions/20794

09-06-2019

Question

If I call os.stat() on a broken symlink, Python raises an OSError exception. That makes it useful for finding them. However, there are several other reasons why os.stat() might raise a similar exception. Is there a more precise way to detect broken symlinks with Python under Linux?

Solution

A common Python saying is that it's easier to ask forgiveness than permission. While I'm not a fan of that statement in real life, it does apply in a lot of cases. Usually you want to avoid code that chains two system calls on the same file, because you never know what will happen to the file between the two calls.

A typical mistake is to write something like:

if os.path.exists(path):
    os.unlink(path)

The second call (os.unlink) may fail if something else deletes the file after the if test; it then raises an exception and stops the rest of the function from executing. (You might think this doesn't happen in real life, but just last week we fished another bug like that out of our codebase, and it was the kind of bug that had left a few programmers scratching their heads and claiming 'Heisenbug' for the last few months.)
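For comparison, here is a minimal sketch of the "ask forgiveness" form of that snippet. It assumes Python 3, where FileNotFoundError exists as a subclass of OSError; the helper name remove_if_present is hypothetical and not part of the original answer.

```python
import os


def remove_if_present(path):
    """Delete path; return True if it was removed, False if it was already gone."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        # Something else removed the file first, or it never existed.
        # Either way the caller's goal (no file at path) is already met.
        return False
    return True
```

There is only one system call here, so nothing can slip in between a check and the deletion. Other failures, such as a permission error, still propagate to the caller.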

So, in your particular case, I would probably do:

import errno
import os

try:
    os.stat(path)
except OSError as e:
    if e.errno == errno.ENOENT:
        print('path %s does not exist or is a broken symlink' % path)
    else:
        raise  # anything other than "no such file" is a real error

The annoyance here is that stat returns the same error code for a path that simply isn't there and for a broken symlink.

So I guess you have no choice but to break the atomicity and do something like:

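# readlink() may return a relative target; resolve it against the link's directory
target = os.path.join(os.path.dirname(path), os.readlink(path))
if not os.path.exists(target):
    print('path %s is a broken symlink' % path)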

Other suggestions

os.lstat() may be helpful. If lstat() succeeds and stat() fails, then it's probably a broken link.
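A minimal sketch of that lstat()/stat() comparison, assuming Python 3. The function name is_broken_symlink is hypothetical, and only a "no such file" failure is treated as a broken link; other errors (permissions, symlink loops) are deliberately left to propagate.

```python
import os
import stat


def is_broken_symlink(path):
    """Return True if path is a symlink whose target does not exist."""
    try:
        link_info = os.lstat(path)  # inspects the link itself, never follows it
    except FileNotFoundError:
        return False  # nothing at this path at all
    if not stat.S_ISLNK(link_info.st_mode):
        return False  # a regular file, directory, etc.
    try:
        os.stat(path)  # follows the link to its target
    except FileNotFoundError:
        return True  # the link exists but its target does not
    return False
```

Like the other two-call approaches in this thread, this is not atomic: the link or its target can change between the two calls.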

This is not atomic, but it works.

os.path.islink(filename) and not os.path.exists(filename)

Indeed, by RTFM (reading the fantastic manual) we see:

os.path.exists(path)

Return True if path refers to an existing path. Returns False for broken symbolic links.

It also says:

On some platforms, this function may return False if permission is not granted to execute os.stat() on the requested file, even if the path physically exists.

So if you are worried about permissions, you should add other clauses.
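One way such extra clauses might look is sketched below, assuming Python 3. The function name symlink_status and its return strings are hypothetical; the idea is simply to keep "target is missing" separate from "not allowed to look".

```python
import os


def symlink_status(path):
    """Classify path as 'not a symlink', 'ok', 'broken', or 'unknown'."""
    if not os.path.islink(path):
        return "not a symlink"
    try:
        os.stat(path)  # follows the link to its target
    except FileNotFoundError:
        return "broken"  # the target really is missing
    except PermissionError:
        # The target may well exist; we are just not permitted to stat it.
        return "unknown"
    return "ok"
```

Calling os.stat() directly, instead of os.path.exists(), is what makes the distinction possible, because exists() folds every failure into False.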

Can I mention testing for hardlinks without Python? /bin/test has the FILE1 -ef FILE2 condition, which is true when the files share an inode.

Therefore, something like find . -type f -exec test \{} -ef /path/to/file \; -print works for hard link testing against a specific file.

Which brings me to reading man test and the mentions of -L and -h, which both work on a single file and return true if that file is a symbolic link; however, that doesn't tell you whether the target is missing.

I did find that head -0 FILE1 returns an exit code of 0 if the file can be opened and 1 if it cannot, which, in the case of a symbolic link to a regular file, works as a test of whether its target can be read.
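For readers who do want to stay in Python, the two shell checks in this answer have rough counterparts in the standard library. This is a sketch only; the wrapper names same_inode and can_open are hypothetical.

```python
import os


def same_inode(path_a, path_b):
    """Rough counterpart of `test FILE1 -ef FILE2`."""
    # os.path.samefile compares device and inode numbers, following symlinks.
    return os.path.samefile(path_a, path_b)


def can_open(path):
    """Rough counterpart of the `head -0 FILE1` check."""
    try:
        with open(path, "rb"):
            return True
    except OSError:
        # Covers a missing target, missing permissions, and directories.
        return False
```

As with the head trick, can_open() on a symlink tells you whether the target can be read, not why it cannot.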

os.path

You may try using realpath() to get what the symlink points to, then try to determine whether it's a valid file using isfile().

(I'm not able to try that out at the moment, so you'll have to play around with it and see what you get.)
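A minimal sketch of that idea, assuming Python 3; the function name resolve_link is hypothetical. Note that isfile() is False for a link that points to a directory, so use os.path.exists() instead if directories should count as valid targets.

```python
import os


def resolve_link(path):
    """Return (resolved_target, target_is_file) for path."""
    # realpath() resolves every symlink along the way; for a dangling link
    # it still returns the path the link points to.
    target = os.path.realpath(path)
    return target, os.path.isfile(target)
```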

I'm not a Python guy, but it looks like os.readlink()? The logic I would use in Perl is to use readlink() to find the target and then use stat() to test whether the target exists.

Edit: I banged out some Perl to demo readlink. I believe Perl's stat and readlink and Python's os.stat() and os.readlink() are both wrappers for the system calls, so this should translate reasonably well as proof-of-concept code:

wembley 0 /home/jj33/swap > cat p
my $f = shift;

while (my $l = readlink($f)) {
  print "$f -> $l\n";
  $f = $l;
}

if (!-e $f) {
  print "$f doesn't exist\n";
}
wembley 0 /home/jj33/swap > ls -l | grep ^l
lrwxrwxrwx    1 jj33  users          17 Aug 21 14:30 link -> non-existant-file
lrwxrwxrwx    1 root     users          31 Oct 10  2007 mm -> ../systems/mm/20071009-rewrite//
lrwxrwxrwx    1 jj33  users           2 Aug 21 14:34 mmm -> mm/
wembley 0 /home/jj33/swap > perl p mm
mm -> ../systems/mm/20071009-rewrite/
wembley 0 /home/jj33/swap > perl p mmm
mmm -> mm
mm -> ../systems/mm/20071009-rewrite/
wembley 0 /home/jj33/swap > perl p link
link -> non-existant-file
non-existant-file doesn't exist
wembley 0 /home/jj33/swap >
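A rough Python rendering of the same loop might look like the sketch below, assuming Python 3. The function name follow_links and the max_hops limit are hypothetical additions. Unlike the Perl demo, which only works from the link's own directory, this version resolves relative targets against the directory that holds each link.

```python
import os


def follow_links(path, max_hops=40):
    """Follow symlinks one hop at a time.

    Returns (chain, exists): the list of paths visited, starting with
    path, and whether the final path in the chain exists.
    """
    chain = [path]
    for _ in range(max_hops):  # guard against symlink loops
        if not os.path.islink(path):
            break
        target = os.readlink(path)
        # A relative target is relative to the directory containing the link;
        # os.path.join leaves an absolute target unchanged.
        path = os.path.join(os.path.dirname(path), target)
        chain.append(path)
    return chain, os.path.exists(chain[-1])
```

For a dangling link the chain ends at the missing target and exists is False, which corresponds to the "doesn't exist" line in the Perl output.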

I had a similar problem: how to catch broken symlinks, even when they occur in some parent directory? I also wanted to log all of them (in an application dealing with a fairly large number of files), but without too many repeats.

Here is what I came up with, including unit tests.

fileutil.py:

import os
from functools import lru_cache
import logging

logger = logging.getLogger(__name__)

@lru_cache(maxsize=2000)
def check_broken_link(filename):
    """
    Check for broken symlinks, either at the file level, or in the
    hierarchy of parent dirs.
    If it finds a broken link, an ERROR message is logged.
    The function is cached, so that the same error messages are not repeated.

    Args:
        filename: file to check

    Returns:
        True if the file (or one of its parents) is a broken symlink.
        False otherwise (i.e. either it exists or not, but no element
        on its path is a broken link).

    """
    if os.path.isfile(filename) or os.path.isdir(filename):
        return False
    if os.path.islink(filename):
        # there is a symlink, but it is dead (pointing nowhere)
        link = os.readlink(filename)
        logger.error('broken symlink: {} -> {}'.format(filename, link))
        return True
    # ok, we have either:
    #   1. a filename that simply doesn't exist (but the containing dir
    #      does exist), or
    #   2. a broken link in some parent dir
    parent = os.path.dirname(filename)
    if parent == filename:
        # reached root
        return False
    return check_broken_link(parent)

Unit tests:

import logging
import shutil
import tempfile
import os

import unittest
from ..util import fileutil


class TestFile(unittest.TestCase):

    def _mkdir(self, path, create=True):
        d = os.path.join(self.test_dir, path)
        if create:
            os.makedirs(d, exist_ok=True)
        return d

    def _mkfile(self, path, create=True):
        f = os.path.join(self.test_dir, path)
        if create:
            d = os.path.dirname(f)
            os.makedirs(d, exist_ok=True)
            with open(f, mode='w') as fp:
                fp.write('hello')
        return f

    def _mklink(self, target, path):
        f = os.path.join(self.test_dir, path)
        d = os.path.dirname(f)
        os.makedirs(d, exist_ok=True)
        os.symlink(target, f)
        return f

    def setUp(self):
        # reset the lru_cache of check_broken_link
        fileutil.check_broken_link.cache_clear()

        # create a temporary directory for our tests
        self.test_dir = tempfile.mkdtemp()

        # create a small tree of dirs, files, and symlinks
        self._mkfile('a/b/c/foo.txt')
        self._mklink('b', 'a/x')
        self._mklink('b/c/foo.txt', 'a/f')
        self._mklink('../..', 'a/b/c/y')
        self._mklink('not_exist.txt', 'a/b/c/bad_link.txt')
        bad_path = self._mkfile('a/XXX/c/foo.txt', create=False)
        self._mklink(bad_path, 'a/b/c/bad_path.txt')
        self._mklink('not_a_dir', 'a/bad_dir')

    def tearDown(self):
        # Remove the directory after the test
        shutil.rmtree(self.test_dir)

    def catch_check_broken_link(self, expected_errors, expected_result, path):
        filename = self._mkfile(path, create=False)
        with self.assertLogs(level='ERROR') as cm:
            result = fileutil.check_broken_link(filename)
            logging.critical('nothing')  # trick: emit one extra message, so the with assertLogs block doesn't fail
        error_logs = [r for r in cm.records if r.levelname == 'ERROR']
        actual_errors = len(error_logs)
        self.assertEqual(expected_result, result, msg=path)
        self.assertEqual(expected_errors, actual_errors, msg=path)

    def test_check_broken_link_exists(self):
        self.catch_check_broken_link(0, False, 'a/b/c/foo.txt')
        self.catch_check_broken_link(0, False, 'a/x/c/foo.txt')
        self.catch_check_broken_link(0, False, 'a/f')
        self.catch_check_broken_link(0, False, 'a/b/c/y/b/c/y/b/c/foo.txt')

    def test_check_broken_link_notfound(self):
        self.catch_check_broken_link(0, False, 'a/b/c/not_found.txt')

    def test_check_broken_link_badlink(self):
        self.catch_check_broken_link(1, True, 'a/b/c/bad_link.txt')
        self.catch_check_broken_link(0, True, 'a/b/c/bad_link.txt')

    def test_check_broken_link_badpath(self):
        self.catch_check_broken_link(1, True, 'a/b/c/bad_path.txt')
        self.catch_check_broken_link(0, True, 'a/b/c/bad_path.txt')

    def test_check_broken_link_badparent(self):
        self.catch_check_broken_link(1, True, 'a/bad_dir/c/foo.txt')
        self.catch_check_broken_link(0, True, 'a/bad_dir/c/foo.txt')
        # bad link, but shouldn't log a new error:
        self.catch_check_broken_link(0, True, 'a/bad_dir/c')
        # bad link, but shouldn't log a new error:
        self.catch_check_broken_link(0, True, 'a/bad_dir')

if __name__ == '__main__':
    unittest.main()

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow