Working with Perso-Arabic Text (count + joining issues)
12-02-2021
Question
I'm trying to print the Persian alphabet to the Debug window in order. Once the sequence passes the final letter, it should start over with each letter repeated once per pass through the alphabet. For example, index 1 displays alef (ا), the Persian counterpart of the letter "A"; index 33 (Persian has 32 letters) should display (اا).
The code below works fine for the Latin alphabet (e.g. "abcdefg..."), but with Persian/Arabic I have two problems.
- It gives a count of 33 instead of 32, i.e. after the letter "ه" it produces a blank character. I suspect it is this, but I don't know how to account for it.
- Characters that need to double up should display as separate letters, like "ش ش" (without the space), but they are shown joined, as "شش".
Sub Main()
    Dim t As New PersianAlphabet
    For i = 1 To 50
        Debug.WriteLine(t.NextLetter())
    Next
End Sub

Public Class PersianAlphabet
    Private charArray As String
    Private charCount As Integer
    Private CurrentNumber As Integer = 0

    Sub New()
        'Dim charArray1() = {"ا", "ب", "پ", "ت", "ث", "ج", "چ", "ح", "خ", "د", "ذ", "ر", "ز", "ژ", "س", "ش", "ص", "ض", "ط", "ظ", "ع", "غ", "ف", "ق", "ک", "گ", "ل", "م", "ن", "و", "ه", "ی"}
        'Dim joined As String = String.Join("", charArray1)
        'Me.charArray = joined
        Me.charArray = "ابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی"
        Me.charCount = charArray.ToCharArray.Count
    End Sub

    Public Function NextLetter(Optional ByVal StartAt As Integer = 1) As String
        Dim count = (Me.CurrentNumber + StartAt)
        Dim divisor = count / Me.charCount
        Dim outstring As New StringBuilder
        If divisor <= 1 Then
            outstring.Append(charArray(Int32.Parse(count - 1)))
        Else
            Dim tempAlphaCount = Int(divisor) + 1
            Dim groupRange = Int(divisor) * Me.charCount
            Dim alphaIndex = count - groupRange
            If alphaIndex = 0 Then
                tempAlphaCount = tempAlphaCount - 1
                alphaIndex = Me.charCount
            End If
            alphaIndex -= 1
            For i = 0 To tempAlphaCount - 1
                outstring.Append(charArray(Int32.Parse(alphaIndex)))
            Next
        End If
        Me.CurrentNumber += 1
        Return outstring.ToString
    End Function
End Class
Has anyone dealt with these two kinds of issues before? Any thoughts/advice?
Solution
I figured out that the string contains an invisible character with a Unicode value of 2805 (not 2804, as I expected). Removing it gives the correct count of 32. Inserting that character between two letters also keeps them from joining, which takes care of the doubled letters.
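For anyone hitting the same thing, here is a rough sketch of how the stray character could be found, removed, and reused as a separator. It rests on an assumption the answer does not confirm: that the hidden character is one of the zero-width format characters, i.e. the zero-width joiner U+200D (decimal 8205) or the zero-width non-joiner U+200C (decimal 8204). The module and helper names (InvisibleCharacterSketch, DescribeCodePoints, StripFormatCharacters, RepeatUnjoined) and the sample input are made up for illustration and are not part of the original code.

```vbnet
Imports System
Imports System.Globalization
Imports System.Linq

Module InvisibleCharacterSketch
    ' Assumption: the separator wanted between repeated letters is the
    ' zero-width non-joiner (U+200C, decimal 8204).
    Private ReadOnly Zwnj As String = ChrW(&H200C).ToString()

    ' Lists each UTF-16 code unit as "U+XXXX (decimal)" so hidden characters show up.
    Function DescribeCodePoints(text As String) As String
        Return String.Join(" ", text.Select(Function(c) String.Format("U+{0:X4} ({0})", AscW(c))))
    End Function

    ' Drops every character in the Unicode "Format" category (Cf),
    ' which includes both U+200C and U+200D.
    Function StripFormatCharacters(text As String) As String
        Dim kept = text.Where(Function(c) Char.GetUnicodeCategory(c) <> UnicodeCategory.Format)
        Return New String(kept.ToArray())
    End Function

    ' Repeats a letter with a non-joiner between the copies so they render unjoined.
    Function RepeatUnjoined(letter As Char, times As Integer) As String
        Return String.Join(Zwnj, Enumerable.Repeat(letter.ToString(), times))
    End Function

    Sub Main()
        ' Hypothetical input: two letters with a zero-width joiner hidden between them.
        Dim raw As String = "و" & ChrW(&H200D) & "ه"
        Console.WriteLine(raw.Length)                                    ' 3: the hidden character is counted
        Console.WriteLine(DescribeCodePoints(raw))

        Dim cleaned As String = StripFormatCharacters(raw)
        Console.WriteLine(cleaned.Length)                                ' 2: only the visible letters remain
        Console.WriteLine(DescribeCodePoints(RepeatUnjoined("ش"c, 2)))  ' the two letters with U+200C between them
    End Sub
End Module
```

In the question's class, the same idea would mean running the alphabet string through something like StripFormatCharacters before taking its length, and appending the separator between repeated letters rather than appending the letters back to back. Whether the separator renders as expected depends on the font and on the control displaying the text.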