There is no null terminator in Python strings. If you want to send one, you have to do it explicitly: `sock.sendall(bytes(data, "utf-8") + b'\0')`.
However, there's no good reason to add a null terminator in the first place, unless you're planning to use it as a delimiter between messages. (Note that this won't work for general Python strings, because they're allowed to include null bytes in the middle… but it will work fine for real human-readable text, of course.)
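To make that caveat concrete, here is a small sketch showing that ordinary text survives null-delimited framing intact, while a string with an embedded null is split into two messages. The helper names `frame` and `unframe` and the sample strings are made up for illustration.

```python
# Hypothetical sample payloads, chosen only to illustrate the caveat.
plain = "hello world"
tricky = "hello\0world"  # a valid Python str with an embedded null


def frame(text):
    """Encode text as UTF-8 and append a null terminator."""
    return text.encode("utf-8") + b"\0"


def unframe(stream):
    """Split a byte stream on nulls, dropping the empty tail after the last one."""
    return stream.split(b"\0")[:-1]


plain_messages = unframe(frame(plain))    # one message, as intended
tricky_messages = unframe(frame(tricky))  # the embedded null splits it in two
```

UTF-8 only ever produces a zero byte for the character U+0000 itself, so text that contains no such character is safe to delimit this way.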
Using null bytes as a delimiter is not a bad idea… but your existing code needs to actually handle that. You can't just call `recv(1024)` and assume it's a whole message; you have to keep calling `recv(1024)` in a loop and appending to a buffer until you find a null, and then save everything after that null for the next time through the loop.
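That buffering loop could be sketched roughly as follows. The generator name `iter_messages` and its `bufsize` parameter are illustrative choices, not part of any library; the demo uses `socket.socketpair()` only so that it can run without a real server, and it deliberately splits one message across two sends.

```python
import socket


def iter_messages(sock, bufsize=1024):
    """Yield each null-terminated message read from sock (hypothetical helper).

    Bytes that arrive after the last null stay in the buffer until a later
    recv() completes them. An unterminated tail at close is discarded.
    """
    buf = b""
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # empty result: the peer closed the connection
            return
        buf += chunk
        # split() leaves the incomplete tail (possibly b"") as the last item.
        *messages, buf = buf.split(b"\0")
        yield from messages


# Demo: the second message is split across two sends.
sender, receiver = socket.socketpair()
sender.sendall(b"first\0sec")
sender.sendall(b"ond\0")
sender.close()
received = list(iter_messages(receiver))
receiver.close()
```

However the bytes happen to be chunked by the network, the caller only ever sees whole messages.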
Anyway, the `sendall` method returns `None` rather than the number of bytes sent, because it always sends exactly the bytes you gave it (unless there's an error, in which case it raises an exception). So:
    buf = bytes(data, "utf-8") + b'\0'
    sock.sendall(buf)
    bytes_sent = len(buf)
And on the server side, you might want to write a NullTerminatedHandler class like this:
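    import socketserver

    class NullTerminatedHandler(socketserver.BaseRequestHandler):
        def setup(self):
            # Per-connection state; setup() runs before handle().
            self.buf = b''

        def handle(self):
            # Keep reading until the client closes the connection.
            while True:
                data = self.request.recv(1024)
                if not data:
                    break
                self.buf += data
                messages = self.buf.split(b'\0')
                for message in messages[:-1]:
                    self.handle_message(message)
                # Keep the incomplete tail for the next recv().
                self.buf = messages[-1]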
Then you can use it like this:
    class MyTCPHandler(NullTerminatedHandler):
        def handle_message(self, message):
            print(str(self.client_address[0]) + " wrote: " + str(message.decode()))
While we're at it, you've got some Unicode/string issues. From most serious to least:
- You should almost never just call `decode` with no argument. If you're sending UTF-8 data on one side, always explicitly `decode('utf-8')` on the other.
- The `decode` method is guaranteed to return a `str`, so writing `str(message.decode())` just makes your code confusing.
- There's a reason the sample code uses `format` instead of calling `str` on a bunch of objects and concatenating them: it's usually a lot easier to read.
- It's generally more readable to say `data.encode('utf-8')` than `bytes(data, 'utf-8')`.
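
Putting those four points together, the line that prints a message might be rewritten along these lines. This is only a sketch: `describe_message` is a hypothetical helper introduced for illustration, and the address and payload are made-up sample values.

```python
def describe_message(client_ip, message):
    """Build the log line from a client address and a raw UTF-8 message."""
    # Decode explicitly with the codec the sender used. decode() already
    # returns a str, so no extra str() call is needed.
    text = message.decode("utf-8")
    # format() instead of str() calls glued together with +.
    return "{} wrote: {}".format(client_ip, text)


# The sending side, using str.encode rather than bytes(data, "utf-8").
payload = "héllo".encode("utf-8")
line = describe_message("127.0.0.1", payload)
```

Inside a handler, the same helper could be called as `print(describe_message(self.client_address[0], message))`.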