Here's a new answer, based on the comments to the previous one.
We'll use a single TCP socket and send everything as one big stream: for each file, its name followed by its contents, each encoded as a netstring.
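To make the framing concrete, here's a minimal sketch of the netstring encoding. It's written in Python 3 syntax (explicit `bytes`) rather than the 2.x `str` used in the scripts below, and the helper names `netstring` and `frame_file` are made up for illustration.

```python
def netstring(payload):
    """Encode bytes as a netstring: b'<length>:<payload>,'."""
    return str(len(payload)).encode('ascii') + b':' + payload + b','


def frame_file(name, contents):
    """Frame one file as two back-to-back netstrings: name, then contents."""
    return netstring(name) + netstring(contents)


# A file named 'a.bin' containing the three bytes 00 01 02:
frame = frame_file(b'a.bin', b'\x00\x01\x02')
print(repr(frame))  # b'5:a.bin,3:\x00\x01\x02,'
```

Because the length comes first, the payload can contain any bytes at all, including `:`, `,` and NUL, without escaping.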
I'm assuming Python 2.6, that the filesystems on both sides use the same encoding, and that you don't need lots of concurrent clients (but you might occasionally need, say, two: the real one and a tester). And I'm again assuming you've got a module `filegenerator` whose `generate()` function registers with `inotify`, queues up notifications, and `yield`s them one by one.
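If you want to try the scripts below before the `inotify` part exists, a polling stand-in with the same shape is enough. This is only a sketch under my own assumptions: it is not inotify-based, the `directory` and `interval` parameters are invented here, and it also reports files that already exist when it starts.

```python
import os
import time


def generate(directory, interval=1.0):
    """Yield the path of each new file that appears in `directory`.

    Polling stand-in for the inotify-based generator: it never returns,
    and it only notices new names, not modifications to known files.
    """
    seen = set()
    while True:
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if path not in seen and os.path.isfile(path):
                seen.add(path)
                yield path
        time.sleep(interval)  # wait before rescanning the directory
```

Note that a file is reported as soon as its name shows up, which may be before the writer has finished with it; that's one of the things real `inotify` events let you handle properly.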
client.py:

    import contextlib
    import socket

    import filegenerator

    HOST = 'localhost'  # assumption: replace with the server's address

    sock = socket.socket()
    with contextlib.closing(sock):
        sock.connect((HOST, 12345))
        for filename in filegenerator.generate():
            with open(filename, 'rb') as f:
                contents = f.read()
            # Frame name and contents as two back-to-back netstrings.
            buf = '{0}:{1},{2}:{3},'.format(len(filename), filename,
                                            len(contents), contents)
            sock.sendall(buf)

server.py:

    import contextlib
    import itertools
    import socket
    import threading

    def pairs(iterable):
        # Lazily group items two at a time: (name, contents), (name, contents), ...
        # izip rather than zip: in 2.x, zip would drain the whole stream first.
        return itertools.izip(*[iter(iterable)]*2)

    def netstrings(conn):
        buf = ''
        while True:
            newbuf = conn.recv(1536*1024)
            if not newbuf:
                return
            buf += newbuf
            # Yield every complete netstring currently in the buffer.
            while True:
                colon = buf.find(':')
                if colon == -1:
                    break
                length = int(buf[:colon])
                if len(buf) < colon + length + 2:
                    break  # incomplete; go back and recv() more
                if buf[colon+length+1] != ',':
                    raise ValueError('Not a netstring')
                yield buf[colon+1:colon+length+1]
                buf = buf[colon+length+2:]

    def client(conn):
        with contextlib.closing(conn):
            for filename, contents in pairs(netstrings(conn)):
                # The name is used exactly as sent, so only accept trusted clients.
                with open(filename, 'wb') as f:
                    f.write(contents)

    sock = socket.socket()
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    with contextlib.closing(sock):
        sock.bind(('0.0.0.0', 12345))
        sock.listen(1)
        while True:
            conn, addr = sock.accept()
            t = threading.Thread(target=client, args=[conn])
            t.daemon = True
            t.start()

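The parsing loop is the fiddly part, and it can be exercised without a socket at all. Here's a sketch of the same buffering logic as a small class, in Python 3 syntax (`bytes`); `NetstringDecoder` and `feed` are names I've made up for this example.

```python
class NetstringDecoder(object):
    """Incrementally decode netstrings from arbitrarily sized chunks."""

    def __init__(self):
        self.buf = b''

    def feed(self, chunk):
        """Add received bytes; return the list of payloads now complete."""
        self.buf += chunk
        payloads = []
        while True:
            colon = self.buf.find(b':')
            if colon == -1:
                break                      # length prefix not complete yet
            length = int(self.buf[:colon])
            end = colon + 1 + length       # index of the trailing comma
            if len(self.buf) <= end:
                break                      # payload or comma still missing
            if self.buf[end:end + 1] != b',':
                raise ValueError('Not a netstring')
            payloads.append(self.buf[colon + 1:end])
            self.buf = self.buf[end + 1:]
        return payloads


decoder = NetstringDecoder()
stream = b'5:a.bin,3:\x00\x01\x02,'
received = []
# Deliver the stream in 4-byte chunks, much as recv() might split it.
for i in range(0, len(stream), 4):
    received.extend(decoder.feed(stream[i:i + 4]))
name, contents = received
```

The important detail is the second `break`: when the buffer holds a length prefix but not yet the whole payload, the decoder has to stop and wait for more data rather than keep looping.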
If you need more than about 200 clients on Windows, 100 on Linux and BSD (including Mac), or a dozen on less capable platforms, you probably want to go with an event loop design instead of a threaded design, using `epoll` on Linux, `kqueue` on BSD, and I/O completion ports on Windows. This can be painful, but fortunately, there are frameworks that wrap everything up for you. Two popular (and very different) choices are Twisted and gevent.
One nice thing about `gevent` in particular is that you can write threaded code today and, with a handful of simple changes, turn it into event-based code like magic.
On the other hand, if you're eventually going to want event-based code, it's probably better to learn and use a framework from the start, so you don't have to deal with all the fiddly bits of `accept`ing and looping around `recv` until you get a full message and shutting down cleanly and so on, and can just write the parts you care about. After all, more than half the code above is basically boilerplate for stuff that every server shares, so if you don't have to write it, why bother?
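To give a feel for what the event-loop shape looks like without a framework, here's a rough sketch using the standard-library `selectors` module. That means Python 3.4 or later, not the 2.6 assumed above. `selectors.DefaultSelector` picks `epoll` or `kqueue` where the platform has them, and falls back to `select` on Windows rather than using I/O completion ports. The names `on_accept`, `on_readable`, `run_once`, `serve` and `received` are all mine, and the sketch only accumulates bytes per connection instead of parsing netstrings.

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS
received = {}                       # per-connection buffers (illustrative)


def on_accept(listener):
    """The listening socket is readable: accept and watch the new client."""
    conn, addr = listener.accept()
    conn.setblocking(False)
    received[conn] = b''
    sel.register(conn, selectors.EVENT_READ, on_readable)


def on_readable(conn):
    """A client socket is readable: buffer its data or clean up on EOF."""
    data = conn.recv(64 * 1024)
    if data:
        received[conn] += data      # a real server would parse netstrings here
    else:                           # empty read: the peer closed the connection
        sel.unregister(conn)
        conn.close()


def run_once(timeout=1.0):
    """Dispatch every socket that is ready right now (one loop iteration)."""
    for key, events in sel.select(timeout):
        key.data(key.fileobj)       # key.data is the callback we registered


def serve(port=12345):
    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(('0.0.0.0', port))
    listener.listen(100)
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, on_accept)
    while True:
        run_once()
```

Calling `serve()` runs the loop forever in a single thread. Everything a framework would give you on top of this (timeouts, write buffering, clean shutdown) is still missing, which is rather the point of the paragraph above.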
In a comment, you said:
> Also the files are binary, so it's possible that I'll have problems if client encodings are different from server's.
Notice that I opened each file in binary mode (`'rb'` and `'wb'`), and intentionally chose a protocol (netstrings) that can handle binary strings without trying to interpret them as characters or treat embedded NUL characters as EOF or anything like that. And, while I'm using `str.format`, in Python 2.x that won't do any implicit encoding unless you feed it `unicode` strings or give it locale-based format types, neither of which I'm doing. (Note that in 3.x, you'd need to use `bytes` instead of `str`, which would change a bit of the code.)
In other words, the client and server encodings don't enter into it; you're doing a binary transfer, exactly the same as FTP's image (binary) type, `TYPE I`.
But what if you wanted the opposite, to transfer text and reencode automatically for the target system? There are three easy ways to do that:
- Send the client's encoding (either once at the top, or once per file), and on the server, decode from the client and reencode to the local file.
- Do everything in text/unicode mode, even the socket. This is silly, and in 2.x it's hard to do as well.
- Define a wire encoding, say UTF-8. The client is responsible for decoding files and encoding to UTF-8 for sending; the server is responsible for decoding UTF-8 on receipt and encoding files.
Going with the third option, assuming that the files are going to be in your default filesystem encoding, the changed client code is:

    # also needs: import io, sys
    with io.open(filename, 'r', encoding=sys.getfilesystemencoding()) as f:
        contents = f.read().encode('utf-8')

And on the server:

    # also needs: import io, sys
    with io.open(filename, 'w', encoding=sys.getfilesystemencoding()) as f:
        f.write(contents.decode('utf-8'))

The `io.open` function also, by default, uses universal newlines, so the client will translate anything into Unix-style newlines, and the server will translate to its own native newline type.
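Here's the same idea pulled out into two small functions so the round trip can be tried in isolation. It's a sketch, not part of the scripts above: the function names are invented, and the local encoding is passed in explicitly (rather than taken from `sys.getfilesystemencoding()`) so that both "sides" can be simulated on one machine.

```python
import io


def read_for_wire(path, local_encoding):
    """Client side: decode the local file, return UTF-8 bytes to send."""
    # Text mode with default newline handling turns \r\n and \r into \n.
    with io.open(path, 'r', encoding=local_encoding) as f:
        return f.read().encode('utf-8')


def write_from_wire(path, payload, local_encoding):
    """Server side: decode UTF-8 from the wire, write in the local encoding."""
    # Text mode on write turns \n into the platform's native line ending.
    with io.open(path, 'w', encoding=local_encoding) as f:
        f.write(payload.decode('utf-8'))
```

For example, a Latin-1 file with Windows line endings read through `read_for_wire(path, 'latin-1')` goes over the wire as UTF-8 with `\n` newlines, and `write_from_wire(path, payload, 'utf-16')` would store it as UTF-16 with whatever newline the server's platform uses.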
Note that FTP's text mode (the ASCII type, `TYPE A`) actually doesn't do any re-encoding; it only does newline conversion (and a more limited version of it).