Character count minus HTML characters in C#

https://stackoverflow.com/questions/3891685

28-09-2019

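Question

I'm trying to find a way to count the number of characters in a string, truncate the string, and then return it. However, I need this function NOT to count HTML tags. The problem is that if HTML tags are counted and the truncation point falls in the middle of a tag, the page will appear broken.

This is what I have so far...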

public string Truncate(string input, int characterLimit, string currID) {
    string output = input;

    // Check if the string is longer than the allowed amount
    // otherwise do nothing
    if (output.Length > characterLimit && characterLimit > 0) {

        // cut the string down to the maximum number of characters
        output = output.Substring(0, characterLimit);

        // Check if the character right after the truncate point was a space
        // if not, we are in the middle of a word and need to remove the rest of it
        if (input.Substring(output.Length, 1) != " ") {
            int LastSpace = output.LastIndexOf(" ");

            // if we found a space then, cut back to that space
            if (LastSpace != -1)
            {
                output = output.Substring(0, LastSpace);
            }
        }
        // end any anchors
        if (output.Contains("<a href")) {
            output += "</a>";
        }
        // Finally, add the "..." and end the paragraph
        output += "<br /><br />...<a href='Announcements.aspx?ID=" + currID + "'>see more</a></p>";
    }
    return output;
}

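But I'm not happy with this. Is there a better way to do it? If you could provide a new solution, or suggestions on what to add to what I have so far, that would be great.

Disclaimer: I have never worked with C#, so I'm not familiar with the concepts related to the language... I'm just doing this.

Thanks, Hristo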

Solution

Use the right tool for the problem.

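HTML is not a simple format to parse. I suggest you use a proven, existing parser rather than rolling your own. If you know you will only be parsing XHTML, you can use an XML parser instead.

These are the only reliable ways to perform operations on HTML that preserve its semantic representation.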
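
As a rough illustration of the XML-parser route, here is a minimal sketch using LINQ to XML (System.Xml.Linq). It assumes the input is a well-formed XHTML fragment; named HTML entities such as &nbsp; are not defined in plain XML and would make the parse throw. The class name XhtmlTruncator and its Truncate method are hypothetical names chosen for this example, not part of any library. Only the characters in text nodes are counted, and everything after the cut point is removed from the tree, so every tag that was opened is still closed in the output.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlTruncator
{
    // Hypothetical helper: keeps at most `limit` characters of text,
    // counting text nodes only (tags and attributes are not counted).
    public static string Truncate(string xhtml, int limit)
    {
        // Wrap the fragment in a dummy root so that several top-level nodes still parse.
        XElement root = XElement.Parse("<root>" + xhtml + "</root>",
                                       LoadOptions.PreserveWhitespace);
        int remaining = Math.Max(limit, 0);

        // Snapshot the text nodes first, because the loop body modifies the tree.
        foreach (XText text in root.DescendantNodes().OfType<XText>().ToList())
        {
            if (text.Value.Length < remaining)
            {
                // This text node fits entirely; keep it and move on.
                remaining -= text.Value.Length;
                continue;
            }

            // The cut point is inside (or at the end of) this text node.
            text.Value = text.Value.Substring(0, remaining);

            // Remove everything that follows the cut point at every nesting level.
            for (XNode node = text; node != root; node = node.Parent)
            {
                node.NodesAfterSelf().Remove();
            }
            break;
        }

        // Serialize the children of the dummy root without re-indenting them.
        return string.Concat(
            root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

For example, calling XhtmlTruncator.Truncate on <p>Hello <a href="x.html">wonderful</a> world</p> with a limit of 10 should return <p>Hello <a href="x.html">wond</a></p>. The sketch cuts mid-word; backing up to the last space and appending the "see more" link would still need to be added on top of it. For real-world HTML that is not valid XML, a dedicated HTML parser would be needed instead.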

Don't try to use regular expressions. HTML is not a regular language, and going in that direction will only cause you grief and misery.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow