I have (for now) just stripped the richtext xml element of all XML tags and unwanted embedded elements using Regex with the following expressions:
//Removes all attachmentref elements. The lazy *? stops at the first closing tag, so text between two separate elements is kept.
newString = new Regex(@"(<attachmentref(.|\n)*?</attachmentref>)").Replace(newString, "");
//Removes all formula elements
newString = new Regex(@"(<formula(.|\n)*?</formula>)").Replace(newString, "");
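//Removes all xml tags (<par>, <pardef>, <table> etc). Be aware that this also removes any content in the table: the patterns are greedy, so on a line with several tags everything from the first < to the last > is removed.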
newString = new Regex("<(.)*/>").Replace(newString, "");
newString = new Regex("<(.)*>").Replace(newString, "");
newString = new Regex("</(.)*>").Replace(newString, "");
//Tidies up the many \n, \r and whitespace characters left behind by removing the xml tags.
newString = new Regex(@"\r").Replace(newString, "\n");
newString = new Regex(@"[ \f\r\t\v]+\n").Replace(newString, "\n");
newString = new Regex(@"\n{2,}").Replace(newString, "\n");
//Makes the escaped &lt; and &gt; appear as < and > in the text.
newString = newString.Replace("&lt;", "<").Replace("&gt;", ">");
It's not pretty, but at least the text is readable and some sense of line breaks is preserved.
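For what it's worth, the same steps can be collected into one helper. This is only a sketch, not a tested DXL converter: the class name RichTextCleaner and the method Clean are made up for illustration. It matches elements lazily, strips tags with <[^>]*> so that text between tags on the same line survives (table cell text included), and uses WebUtility.HtmlDecode instead of the two manual replacements. It assumes no attribute value contains a literal > character.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class RichTextCleaner
{
    // Singleline lets '.' match '\n'; '*?' is lazy, so each element is removed on its own.
    private static readonly Regex AttachmentRef =
        new Regex(@"<attachmentref\b.*?</attachmentref>", RegexOptions.Singleline);
    private static readonly Regex Formula =
        new Regex(@"<formula\b.*?</formula>", RegexOptions.Singleline);

    // [^>]* stops at the first '>', so text between two tags is left alone.
    private static readonly Regex AnyTag = new Regex(@"<[^>]*>");

    private static readonly Regex TrailingSpace = new Regex(@"[ \f\t\v]+\n");
    private static readonly Regex BlankLines = new Regex(@"\n{2,}");

    public static string Clean(string richText)
    {
        string s = AttachmentRef.Replace(richText, "");
        s = Formula.Replace(s, "");
        s = AnyTag.Replace(s, "");

        // Normalise line endings, then drop trailing whitespace and blank lines.
        s = s.Replace("\r\n", "\n").Replace('\r', '\n');
        s = TrailingSpace.Replace(s, "\n");
        s = BlankLines.Replace(s, "\n");

        // Decode entities last, so a decoded '<' is never mistaken for a tag.
        return WebUtility.HtmlDecode(s).Trim();
    }
}
```

Calling RichTextCleaner.Clean(newString) would then replace the whole sequence of Regex calls above.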