How do I iterate through unicode symbols, not bytes in python?

Question 1

First of all, in regard to iterating over characters instead bytes, you're already doing it right - your word is an unicode object, not an encoded bytestring.

Now, for combination characters in Unicode:

For many characters containing combination characters there is a composed and decomposed form of writing it, the composed being one code point, and the decomposed a sequence of two (or more?) code points:

Composed and decomposed form of c with cedilla

See U+00E7, U+0063 and U+0327

So in Python, you could either write either form, it will get composed at display time to the same character:

>>> combining_cedilla = u'\u0327'
>>> c_with_cedilla = u'\u00e7'
>>> letter_c = u'\u0063'
>>>
>>> print c_with_cedilla
ç
>>> print letter_c + combining_cedilla
ç

In order to convert between composed and decomposed forms, you can use unicodedata.normalize():

>>> import unicodedata
>>> comp = unicodedata.normalize('NFC', letter_c + combining_cedilla)
>>> decomp = unicodedata.normalize('NFD', c_with_cedilla)
>>>
>>> print comp
ç
>>> print decomp
ç

(NFC stands for "normal form C" (composed), and NFD for "normal form D" (decomposed).

They still are different forms though - one consisting of one code point, the other of two:

>>> comp == decomp
False
>>> len(comp)
1
>>> len(decomp)
2

However, in your case, there simply does not seem to be a combined character for the lowercase и with an accent acute (there is one for и with an accent grave)

Question 2

You can produce [u'к', u'н', u'и́', u'г', u'а'] with the regex module.

Here is the word you have by each user perceived character:

>>> import regex
>>> word = u'кни́га'
>>> len(word)
6
>>> regex.findall(r'\X', word)
['к', 'н', 'и́', 'г', 'а']
>>> len(regex.findall(r'\X', word))
5

Question 3

Acutes are represented by codepoint 301, COMBINING ACUTE ACCENT, so a simple string character replacement should suffice:

>>>print u'кни́га'.replace(u'\u0301', "+")
кни+га

If you encounter accented characters that are not encoded with a combining accent, unicodedata.normalize should do the trick