How do I watch a file for changes?

https://stackoverflow.com/questions/182197

05-07-2019

Question

I have a log file being written by another process which I want to watch for changes. Each time a change occurs I'd like to read the new data in to do some processing on it.

What's the best way to do this? I was hoping there'd be some sort of hook from the PyWin32 library. I've found the win32file.FindNextChangeNotification function but have no idea how to ask it to watch a specific file.

If anyone's done anything like this I'd be really grateful to hear how...

[Edit] I should have mentioned that I was after a solution that doesn't require polling.

[Edit] Curses! It seems this doesn't work over a mapped network drive. I'm guessing Windows doesn't "hear" any updates to the file the way it does on a local disk.

Solution

Have you already looked at the documentation available on http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html? If you only need it to work under Windows, the second example seems to be exactly what you want (if you exchange the path of the directory with that of the file you want to watch).

Otherwise, polling will probably be the only really platform-independent option.

Note: I haven't tried any of these solutions.

Other suggestions

Have you tried using Watchdog?

Python API library and shell utilities to monitor file system events.

Directory monitoring made easy with

A cross-platform API.

A shell tool to run commands in response to directory changes.

Get started quickly with a simple example in Quickstart...

If polling is good enough for you, I'd just watch whether the file's "modified time" stat changes. To read it:

os.stat(filename).st_mtime

(Note also that the Windows native change event solution does not work in all circumstances, e.g. on network drives.)

import os

class Monkey(object):
    def __init__(self):
        self._cached_stamp = 0
        self.filename = '/path/to/file'

    def ook(self):
        stamp = os.stat(self.filename).st_mtime
        if stamp != self._cached_stamp:
            self._cached_stamp = stamp
            # File has changed, so do something...

If you want a multiplatform solution, check out QFileSystemWatcher. Here is some example code (not sanitized):

from PyQt4 import QtCore

@QtCore.pyqtSlot(str)
def directory_changed(path):
    print('Directory Changed!!!')

@QtCore.pyqtSlot(str)
def file_changed(path):
    print('File Changed!!!')

fs_watcher = QtCore.QFileSystemWatcher(['/path/to/files_1', '/path/to/files_2', '/path/to/files_3'])

fs_watcher.connect(fs_watcher, QtCore.SIGNAL('directoryChanged(QString)'), directory_changed)
fs_watcher.connect(fs_watcher, QtCore.SIGNAL('fileChanged(QString)'), file_changed)

This shouldn't work on Windows (maybe with Cygwin?), but for Unix users, you should use the "fcntl" system call. Here is an example in Python. It's mostly the same code if you need to write it in C (same function names).

import time
import fcntl
import os
import signal

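# Note: F_NOTIFY (dnotify) is Linux-specific and expects a directory, so
# FNAME should name the directory that contains the file to watch.
FNAME = "/HOME/TOTO/FILETOWATCH"

def handler(signum, frame):
    print("File %s modified" % (FNAME,))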

signal.signal(signal.SIGIO, handler)
fd = os.open(FNAME,  os.O_RDONLY)
fcntl.fcntl(fd, fcntl.F_SETSIG, 0)
fcntl.fcntl(fd, fcntl.F_NOTIFY,
            fcntl.DN_MODIFY | fcntl.DN_CREATE | fcntl.DN_MULTISHOT)

while True:
    time.sleep(10000)

Check out pyinotify.

inotify replaces dnotify (from an earlier answer) in newer Linux kernels and allows file-level rather than directory-level monitoring.

Well, after a bit of hacking of Tim Golden's script, I have the following, which seems to work quite well:

import os

import win32file
import win32con

path_to_watch = "." # look at the current directory
file_to_watch = "test.txt" # look for changes to a file called test.txt

def ProcessNewData( newData ):
    print("Text added: %s" % newData)

# Set up the bits we'll need for output
ACTIONS = {
  1 : "Created",
  2 : "Deleted",
  3 : "Updated",
  4 : "Renamed from something",
  5 : "Renamed to something"
}
FILE_LIST_DIRECTORY = 0x0001
hDir = win32file.CreateFile (
  path_to_watch,
  FILE_LIST_DIRECTORY,
  win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE,
  None,
  win32con.OPEN_EXISTING,
  win32con.FILE_FLAG_BACKUP_SEMANTICS,
  None
)

# Open the file we're interested in
a = open(file_to_watch, "r")

# Throw away any existing log data
a.read()

# Wait for new data and call ProcessNewData for each new chunk that's written
while 1:
  # Wait for a change to occur
  results = win32file.ReadDirectoryChangesW (
    hDir,
    1024,
    False,
    win32con.FILE_NOTIFY_CHANGE_LAST_WRITE,
    None,
    None
  )

  # For each change, check to see if it's updating the file we're interested in
  for action, file in results:
    full_filename = os.path.join (path_to_watch, file)
    #print file, ACTIONS.get (action, "Unknown")
    if file == file_to_watch:
        newText = a.read()
        if newText != "":
            ProcessNewData( newText )

It could probably do with a load more error checking, but for simply watching a log file and doing some processing on it before spitting it out to the screen, this works well.

Thanks everyone for your input - great stuff!

The simplest solution for me is to use watchdog's watchmedo tool.

From https://pypi.python.org/pypi/watchdog I now have a process that looks for the sql files in a directory and executes them if necessary.

watchmedo shell-command \
--patterns="*.sql" \
--recursive \
--command='~/Desktop/load_files_into_mysql_database.sh' \
.

Check out my answer to a similar question. You could try the same loop in Python. This page suggests:

import time

# `file` is assumed to be a file object that is already open for reading
while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        time.sleep(1)
        file.seek(where)
    else:
        print(line, end='')  # already has newline

See also the question tail() a file with Python.

Well, since you are using Python, you can just open a file and keep reading lines from it.

f = open('file.log')

If the line read is not empty, you process it.

line = f.readline()
if line:
    pass  # Do what you want with the line

You may be missing that it is fine to keep calling readline at EOF. In that case it just keeps returning an empty string. And when something is appended to the log file, reading continues from where it stopped, which is what you need.

If you are looking for a solution that uses events, or a particular library, please say so in your question. Otherwise, I think this solution is fine.
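
The readline-at-EOF behaviour described above can be wrapped in a small generator. This is only a minimal sketch: the name follow and the parameters poll_interval and max_idle_polls are my own illustrative choices, not part of any library, and it assumes the writer appends whole lines (a half-written line would be yielded without its newline).

```python
import time


def follow(f, poll_interval=1.0, max_idle_polls=None):
    """Yield lines appended to the already-open file object `f`.

    `poll_interval` and `max_idle_polls` are illustrative parameters:
    with max_idle_polls=None the generator never stops on its own.
    """
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        line = f.readline()
        if line:
            idle = 0
            yield line
        else:
            # At EOF readline() returns '' and the position does not move,
            # so the next call picks up whatever was appended meanwhile.
            idle += 1
            time.sleep(poll_interval)
```

To skip the existing content first, call f.read() once before iterating, then loop with "for line in follow(f):" and process each line as it arrives.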

Here is a simplified version of Kender's code that appears to do the same trick and does not import the entire file:

# Check file for new data.

import time

f = open(r'c:\temp\test.txt', 'r')

while True:

    line = f.readline()
    if not line:
        time.sleep(1)
        print('Nothing New')
    else:
        print('Call Function: ', line)

For watching a single file with polling and minimal dependencies, here is a fully fleshed-out example, based on the answer from Deestan (above):

import os
import sys 
import time

class Watcher(object):
    running = True
    refresh_delay_secs = 1

    # Constructor
    def __init__(self, watch_file, call_func_on_change=None, *args, **kwargs):
        self._cached_stamp = 0
        self.filename = watch_file
        self.call_func_on_change = call_func_on_change
        self.args = args
        self.kwargs = kwargs

    # Look for changes
    def look(self):
        stamp = os.stat(self.filename).st_mtime
        if stamp != self._cached_stamp:
            self._cached_stamp = stamp
            # File has changed, so do something...
            print('File changed')
            if self.call_func_on_change is not None:
                self.call_func_on_change(*self.args, **self.kwargs)

    # Keep watching in a loop        
    def watch(self):
        while self.running: 
            try: 
                # Look for changes
                time.sleep(self.refresh_delay_secs) 
                self.look() 
            except KeyboardInterrupt: 
                print('\nDone') 
                break 
            except FileNotFoundError:
                # Action on file not found
                pass
            except: 
                print('Unhandled error: %s' % sys.exc_info()[0])

# Call this function each time a change happens
def custom_action(text):
    print(text)

watch_file = 'my_file.txt'

# watcher = Watcher(watch_file)  # simple
watcher = Watcher(watch_file, custom_action, text='yes, changed')  # also call custom action function
watcher.watch()  # start the watch going
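
The watcher above reports that the file changed, but not what was added to it, which is what the question asks for. One way to fill that gap, sketched here under the assumption that the file is mostly appended to, is to remember the last read offset. The class name AppendReader and its start_at_end parameter are hypothetical names of mine.

```python
import os


class AppendReader(object):
    """Return only the bytes added to a file since the previous call."""

    def __init__(self, filename, start_at_end=True):
        self.filename = filename
        # Start at the current end so that old log content is skipped.
        self._offset = os.path.getsize(filename) if start_at_end else 0

    def read_new(self):
        size = os.path.getsize(self.filename)
        if size < self._offset:
            # The file shrank (truncated or rotated): start again from the top.
            self._offset = 0
        if size == self._offset:
            return b''
        with open(self.filename, 'rb') as f:
            f.seek(self._offset)
            data = f.read()
            self._offset = f.tell()
        return data
```

It could be hooked into the Watcher by passing a callback that calls read_new() on a reader created for the same file and then handles the returned bytes. Because the Watcher's first look() always fires, that first call simply returns b''.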

As you can see in Tim Golden's article, pointed out by Horst Gutmann, WIN32 is relatively complex and watches directories, not a single file.

I'd like to suggest you look into IronPython, which is a .NET implementation of Python. With IronPython you can use all the .NET functionality, including

System.IO.FileSystemWatcher

which handles single files with a simple Event interface.

This is another modification of Tim Golden's script that runs on Linux and adds a simple watcher for file modification by using a dict (file => time).

usage: whateverName.py path_to_dir_to_watch

#!/usr/bin/env python

import os, sys, time

def files_to_timestamp(path):
    files = [os.path.join(path, f) for f in os.listdir(path)]
    return dict ([(f, os.path.getmtime(f)) for f in files])

if __name__ == "__main__":

    path_to_watch = sys.argv[1]
    print("Watching ", path_to_watch)

    before = files_to_timestamp(path_to_watch)

    while 1:
        time.sleep (2)
        after = files_to_timestamp(path_to_watch)

        added = [f for f in after.keys() if not f in before.keys()]
        removed = [f for f in before.keys() if not f in after.keys()]
        modified = []

        for f in before.keys():
            if not f in removed:
                if os.path.getmtime(f) != before.get(f):
                    modified.append(f)

        if added: print("Added: ", ", ".join(added))
        if removed: print("Removed: ", ", ".join(removed))
        if modified: print("Modified ", ", ".join(modified))

        before = after

This is an example of checking a file for changes. It may not be the best way of doing it, but it sure is a short way.

A handy tool for restarting an application when changes have been made to the source. I made this while playing with pygame so I could see effects take place immediately after saving the file.

When used in pygame, make sure the code in the 'while' loop is placed in your game loop, aka update or whatever. Otherwise your application will get stuck in an infinite loop and you will not see your game updating.

import os

file_size_stored = os.stat('neuron.py').st_size

while True:
    try:
        file_size_current = os.stat('neuron.py').st_size
        if file_size_stored != file_size_current:
            restart_program()
    except OSError:
        pass

In case you want the restart code, which I found on the web, here it is. (Not relevant to the question, though it could come in handy.)

import os
import sys

def restart_program():  # restart application
    python = sys.executable
    os.execl(python, python, *sys.argv)

Have fun making electrons do what you want them to do.
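
The advice about keeping the check inside the game loop can be sketched as a non-blocking helper that is called once per frame. ChangeChecker is a hypothetical name of mine. It compares the modification time as well as the size, since an edit that leaves the length unchanged would slip past a size-only check.

```python
import os


class ChangeChecker(object):
    """Non-blocking change check meant to be called once per frame."""

    def __init__(self, filename):
        self.filename = filename
        self._last = self._signature()

    def _signature(self):
        st = os.stat(self.filename)
        # Size alone misses edits that keep the length unchanged,
        # so the modification time is compared as well.
        return (st.st_size, st.st_mtime)

    def changed(self):
        """Return True once for each detected change, False otherwise."""
        current = self._signature()
        if current != self._last:
            self._last = current
            return True
        return False
```

Inside the game loop this becomes a single line such as "if checker.changed(): restart_program()", so drawing and event handling keep running between checks.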

import os
import threading

import win32con
import win32file

ACTIONS = {
  1 : "Created",
  2 : "Deleted",
  3 : "Updated",
  4 : "Renamed from something",
  5 : "Renamed to something"
}
FILE_LIST_DIRECTORY = 0x0001

class myThread (threading.Thread):
    def __init__(self, threadID, fileName, directory, origin):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.fileName = fileName
        self.daemon = True
        self.dir = directory
        self.originalFile = origin
    def run(self):
        startMonitor(self.fileName, self.dir, self.originalFile)

def startMonitor(fileMonitoring,dirPath,originalFile):
    hDir = win32file.CreateFile (
        dirPath,
        FILE_LIST_DIRECTORY,
        win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE,
        None,
        win32con.OPEN_EXISTING,
        win32con.FILE_FLAG_BACKUP_SEMANTICS,
        None
    )
    # Wait for new data and call ProcessNewData for each new chunk that's
    # written
    while 1:
        # Wait for a change to occur
        results = win32file.ReadDirectoryChangesW (
            hDir,
            1024,
            False,
            win32con.FILE_NOTIFY_CHANGE_LAST_WRITE,
            None,
            None
        )
        # For each change, check to see if it's updating the file we're
        # interested in
        for action, file_M in results:
            full_filename = os.path.join (dirPath, file_M)
            #print file, ACTIONS.get (action, "Unknown")
            if len(full_filename) == len(fileMonitoring) and action == 3:
                #copy to main file
                ...

Here is an example geared toward watching input files that write no more than one line per second, but usually a lot less. The goal is to append the last line (the most recent write) to the specified output file. I've copied this from one of my projects and just deleted all the irrelevant lines. You'll have to fill in or change the missing symbols.

from PyQt5.QtCore import QFileSystemWatcher, QSettings, QThread
from PyQt5.QtWidgets import QMainWindow
from ui_main_window import Ui_MainWindow   # Qt Creator gen'd

class MainWindow(QMainWindow, Ui_MainWindow):
    def __init__(self, parent=None):
        QMainWindow.__init__(self, parent)
        Ui_MainWindow.__init__(self)
        self._fileWatcher = QFileSystemWatcher()
        self._fileWatcher.fileChanged.connect(self.fileChanged)

    def fileChanged(self, filepath):
        QThread.msleep(300)    # Reqd on some machines, give chance for write to complete
        # ^^ About to test this, may need more sophisticated solution
        with open(filepath) as file:
            lastLine = list(file)[-1]
        destPath = self._filemap[filepath]['dest file']
        with open(destPath, 'a') as out_file:               # a= append
            out_file.writelines([lastLine])

Of course, the encompassing QMainWindow class is not strictly required, i.e. you can use QFileSystemWatcher alone.

The best and simplest solution is to use pygtail: https://pypi.python.org/pypi/pygtail

import sys

from pygtail import Pygtail

while True:
    for line in Pygtail("some.log"):
        sys.stdout.write(line)

You can also use a simple library called repyt, here is an example:

repyt ./app.py

It seems that no one has posted fswatch. It is a cross-platform file system watcher. Just install it, run it and follow the prompts.

I've used it with Python and Golang programs and it just works.

Related to @4Oh4's solution, a slight modification for a list of files to watch:

import os
import sys
import time

class Watcher(object):
    running = True
    refresh_delay_secs = 1

    # Constructor
    def __init__(self, watch_files, call_func_on_change=None, *args, **kwargs):
        self._cached_stamp = 0
        self._cached_stamp_files = {}
        self.filenames = watch_files
        self.call_func_on_change = call_func_on_change
        self.args = args
        self.kwargs = kwargs

    # Look for changes
    def look(self):
        for file in self.filenames:
            stamp = os.stat(file).st_mtime
            if not file in self._cached_stamp_files:
                self._cached_stamp_files[file] = 0
            if stamp != self._cached_stamp_files[file]:
                self._cached_stamp_files[file] = stamp
                # File has changed, so do something...
                with open(file, 'r') as file_to_read:
                    value = file_to_read.read()
                print("value from file", value)
                if self.call_func_on_change is not None:
                    self.call_func_on_change(*self.args, **self.kwargs)

    # Keep watching in a loop
    def watch(self):
        while self.running:
            try:
                # Look for changes
                time.sleep(self.refresh_delay_secs)
                self.look()
            except KeyboardInterrupt:
                print('\nDone')
                break
            except FileNotFoundError:
                # Action on file not found
                pass
            except Exception as e:
                print(e)
                print('Unhandled error: %s' % sys.exc_info()[0])

# Call this function each time a change happens
def custom_action(text):
    print(text)
    # pass

watch_files = ['/Users/mexekanez/my_file.txt', '/Users/mexekanez/my_file1.txt']

# watcher = Watcher(watch_file)  # simple



if __name__ == "__main__":
    watcher = Watcher(watch_files, custom_action, text='yes, changed')  # also call custom action function
    watcher.watch()  # start the watch going

I don't know of any Windows-specific function. You could try getting the MD5 hash of the file every second/minute/hour (depending on how fast you need it) and comparing it to the last hash. When it differs, you know the file has been changed and you read the newest lines.
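
A minimal sketch of the hash comparison, using only hashlib from the standard library. The helper names file_md5 and digest_changed and the chunk_size parameter are illustrative choices of mine.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def digest_changed(path, last_digest):
    """Return (changed, current_digest) for the file at `path`."""
    current = file_md5(path)
    return current != last_digest, current
```

Keep in mind that every poll re-reads the whole file, so for a large, busy log this costs noticeably more than comparing st_mtime or the file size.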

I'd try something like this.

import time

try:
    f = open(filePath)
except IOError:
    print("No such file: %s" % filePath)
    input("Press Enter to close window")
    raise
try:
    f.readlines()  # skip the content that is already in the file
    while True:
        line = f.readline()
        try:
            if not line:
                time.sleep(1)
            else:
                functionThatAnalisesTheLine(line)
        except Exception:
            # handle the exception somehow (for example, log the trace) and raise the same exception again
            input("Press Enter to close window")
            raise
finally:
    f.close()

The loop checks whether there are new lines since the last time the file was read. If there are, each one is read and passed to functionThatAnalisesTheLine. If not, the script waits 1 second and retries.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow