Question

I need to get the first character of a text variable. I normally do this with one of the following simple methods:

string.sub(someText,1,1)

or

someText:sub(1,1)

If I do the following, I expect to get 'ñ' as the first letter. However, the result of either sub method is 'Ã':

local someText = 'ñññññññ'
print('Test whole: '..someText) 
print('first char: '..someText:sub(1,1))
print('first char with .sub: '..string.sub(someText,1,1))

Here are the results from the console:

2014-03-02 09:08:47.959 Corona Simulator[1701:507] Test whole: ñññññññ
2014-03-02 09:08:47.960 Corona Simulator[1701:507] first char: Ã
2014-03-02 09:08:47.960 Corona Simulator[1701:507] first char with .sub: Ã

It seems like the string.sub() function is encoding the returned value in UTF-8. Just for kicks I tried the utf8_decode() function provided by Corona SDK, without success: the simulator indicated that the function expected a number but got nil instead.

I also searched the web to see if anyone else had run into this issue. I found a fair amount of discussion on Lua, Corona, Unicode and UTF-8, but nothing that addressed this specific problem.

Was it helpful?

Solution

Lua strings are 8-bit clean, which means a Lua string is a stream of bytes, not of characters. In UTF-8 the character ñ is encoded as multiple bytes (two, in this case), but someText:sub(1,1) returns only the first of those bytes.

In UTF-8, all characters in the ASCII range have the same representation as in ASCII, that is, a single byte smaller than 128. All other code points are encoded as a sequence of bytes where the lead byte is in the range 194-244 and every continuation byte is in the range 128-191.
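A quick sketch of what is actually happening (the byte values shown are the standard UTF-8 encoding of ñ, U+00F1):

```lua
local s = 'ñ'                 -- two bytes in UTF-8: 0xC3 0xB1
print(#s)                     -- 2: the length operator counts bytes, not characters
print(string.byte(s, 1, -1))  -- 195  177
-- s:sub(1,1) returns only the lead byte 0xC3, which a Latin-1 console
-- displays as 'Ã' -- exactly the symptom in the question
print(s:sub(1, 1))
```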

Because of this, you can use the pattern ".[\128-\191]*" (one byte followed by any number of continuation bytes) to match a single UTF-8 code point (not grapheme):

for c in "ñññññññ":gmatch(".[\128-\191]*") do -- pretend the first string is in NFC
    print(c)
end

Output:

ñ
ñ
ñ
ñ
ñ
ñ
ñ
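Applied to the original problem, the same pattern with string.match gives you just the first code point (a minimal sketch; someText is the variable from the question):

```lua
local someText = 'ñññññññ'
-- match one byte plus any following continuation bytes,
-- i.e. the first complete UTF-8 code point
local firstChar = someText:match(".[\128-\191]*")
print('first char: ' .. firstChar)  -- first char: ñ
```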

OTHER TIPS

Regarding the used character set: Just know what requirements you bake into your own code, and make sure those are actually satisfied. There are various typical requirements:

  • ASCII compatibility (i.e. any byte < 128 represents an ASCII character, and all ASCII characters are represented as themselves)
  • Fixed-size vs. variable-width (possibly self-synchronizing) encoding
  • No embedded 0-bytes

Write your code so that it relies on as few of these requirements as possible, and document the ones it does rely on.
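As an example of relying only on one such requirement, the following sketch counts code points by assuming nothing beyond UTF-8's continuation-byte layout (bytes 128-191 never start a character); the function name utf8len is my own:

```lua
-- Count UTF-8 code points in a string.
-- Assumes only that the input is valid UTF-8: every code point has
-- exactly one byte outside the continuation range 128-191.
local function utf8len(s)
  local _, count = s:gsub("[^\128-\191]", "")
  return count
end

print(utf8len('ñññññññ'))  -- 7
print(utf8len('hello'))    -- 5
```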

Match a single UTF-8 character: Be sure what you mean by "UTF-8 character". Is it a glyph (grapheme) or a code point? As far as I know, you need full Unicode tables for grapheme matching. Do you actually need to work at that level at all?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow