Question

So I have a Python program that pulls access logs from remote servers and processes them. There are separate log files for each day. The files on the servers are in this format:

access.log
access.log-20130715
access.log-20130717

The file "access.log" is the log file for the current day, and is modified throughout the day with new data. The files with the timestamp appended are archived log files, and are not modified. If any of the files in the directory are ever modified, it is either because (1) data is being added to the "access.log" file, or (2) the "access.log" file is being archived, and an empty file takes its place. Every minute or so, my program checks for the most recent modification time of any files in the directory, and if it changes it pulls down the "access.log" file and any newly archived files

All of this currently works fine. However, if a lot of data is added to the log file throughout the day, downloading the whole thing over and over just to get some of the data at the end of the file will create a lot of traffic on the network, and I would like to avoid that. Is there any way to only download a part of the file? If I have already processed, say 1 GB of the file, and another 500 bytes suddenly get added to the log file, is there a way to only download the 500 bytes at the end?

I am using Python 3.2, my local machine is running Windows, and the remote servers all run Linux. I am using Chilkat for making SSH and SFTP connections. Any help would be greatly appreciated!

Was it helpful?

Solution

Call ResumeDownloadFileByName. Here's the description of the method in the Chilkat reference documentation:

Resumes an SFTP download. The size of the localFilePath is checked and the download begins at the appropriate position in the remoteFilePath. If localFilePath is empty or non-existent, then this method is identical to DownloadFileByName. If the localFilePath is already fully downloaded, then no additional data is downloaded and the method will return True.

See http://www.chilkatsoft.com/refdoc/pythonCkSFtpRef.html

OTHER TIPS

You could do that, or you could massively reduce your complexity by splitting the latest log file down into hours, or tens of minutes.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top