A possible approach:
- Take the first N-1 bytes of the repr of the whole string, leaving one byte for the closing quote.
- Examine the last 3 bytes to see whether you broke an escape sequence, and cut off bytes if necessary.
- Append a quote, keeping in mind that it may be `'` or `"`.
- Eval the repr back to utf-8.
- Examine the last few bytes to see if you broke the string in the middle of a Unicode code point and cut out bytes if necessary. You can tell apart leading bytes and continuation bytes by examining the bit pattern.
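
The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: it targets Python 3, where the repr of a `bytes` object carries a `b` prefix; the names `truncate_for_repr` and `trim_partial_utf8` are made up for this example; and instead of inspecting the last 3 bytes by hand, it lets the parser reject a broken escape and drops one byte at a time (at most 3).

```python
import ast


def trim_partial_utf8(raw: bytes) -> bytes:
    """Drop a trailing, incomplete UTF-8 sequence (hypothetical helper)."""
    i = len(raw) - 1
    # Continuation bytes look like 10xxxxxx; step back to the leading byte.
    while i >= 0 and (raw[i] & 0xC0) == 0x80:
        i -= 1
    if i < 0:
        return b""
    lead = raw[i]
    if lead < 0x80:            # 0xxxxxxx: single-byte (ASCII)
        need = 1
    elif lead >> 5 == 0b110:   # 110xxxxx: 2-byte sequence
        need = 2
    elif lead >> 4 == 0b1110:  # 1110xxxx: 3-byte sequence
        need = 3
    else:                      # 11110xxx: 4-byte sequence
        need = 4
    have = len(raw) - i
    return raw if have >= need else raw[:i]


def truncate_for_repr(text: str, n: int) -> str:
    """Return a prefix of text whose UTF-8 repr fits in n bytes (n >= 3)."""
    if n < 3:
        raise ValueError("n must leave room for b'' at least")
    full = repr(text.encode("utf-8"))   # e.g. b'caf\xc3\xa9'
    if len(full) <= n:
        return text
    quote = full[1]                     # repr may have chosen ' or "
    body = full[: n - 1]                # leave one byte for the closing quote
    # A cut escape (\, \x, \xN) makes the literal invalid; drop up to 3 bytes.
    for _ in range(4):
        try:
            raw = ast.literal_eval(body + quote)
            break
        except (SyntaxError, ValueError):
            body = body[:-1]
    else:
        raise RuntimeError("could not repair the truncated repr")
    return trim_partial_utf8(raw).decode("utf-8")


print(truncate_for_repr("café au lait", 10))  # caf
```

In the example call, the cut lands right after `\xc3`, so the escape itself is intact but the two-byte sequence for `é` is not, and the last step removes the orphaned leading byte.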