I finally came back to this project and found a solution to my problem.
First, in this section:
from item in xdoc.Descendants(ns + "body").Descendants(ns + "div").Descendants(ns + "p")
select new TTMLElement
{
    text = item,
    startTime = TimeSpan.Parse(item.Attribute("begin").Value),
    endTime = item.Attribute("dur") != null ?
        TimeSpan.Parse(item.Attribute("begin").Value).Add(TimeSpan.Parse(item.Attribute("dur").Value)) :
        TimeSpan.Parse(item.Attribute("end").Value)
}
item is of type XElement, so an XmlReader can be created from it, which leads to this function:
private static string ReadInnerXML(XElement parent)
{
    var reader = parent.CreateReader();
    reader.MoveToContent();
    var innerText = reader.ReadInnerXml();
    return innerText;
}
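To make it clearer what this function hands back, here is a minimal standalone sketch. The class name InnerXmlDemo and the sample element in the usage note are made up for illustration and are not from my actual file. The point is that ReadInnerXml returns the content between the opening and closing tags with the child markup still in it:

```csharp
using System;
using System.Xml.Linq;

public static class InnerXmlDemo
{
    // Same logic as the function above, made public so it can be called directly.
    public static string ReadInnerXML(XElement parent)
    {
        using (var reader = parent.CreateReader())
        {
            reader.MoveToContent();        // position the reader on the element itself
            return reader.ReadInnerXml();  // inner content, child tags included
        }
    }
}
```

Calling InnerXmlDemo.ReadInnerXML(XElement.Parse("<p>Hello<br/>world</p>")) should give back the two words with the br tag still sitting between them, which is why the tags still have to be removed in a second step.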
Since I wanted to remove the HTML markup inside the node, I modified the function to look like this:
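// Requires: using System.Text.RegularExpressions;
private static string ReadInnerXML(XElement parent)
{
    var reader = parent.CreateReader();
    reader.MoveToContent();
    var innerText = reader.ReadInnerXml();
    // Replace every tag (anything between < and >) with a single space
    innerText = Regex.Replace(innerText, "<.+?>", " ");
    return innerText;
}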
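One caveat with the regex version: ReadInnerXml returns markup, so entities such as &amp; stay escaped, and text that already has a space next to a tag ends up with doubled spaces. If that matters, a possible alternative is to skip the reader and join the text nodes directly. This is only a sketch and not what I ended up using; ReadInnerText and InnerTextSketch are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class InnerTextSketch
{
    // Hypothetical helper: collects the text nodes under an element and joins
    // them with single spaces, so no tag stripping is needed afterwards.
    public static string ReadInnerText(XElement parent)
    {
        var parts = parent.DescendantNodes()
                          .OfType<XText>()              // text and CDATA nodes only
                          .Select(t => t.Value.Trim())  // Value is already unescaped
                          .Where(s => s.Length > 0);
        return string.Join(" ", parts);
    }
}
```

Note that this trims each text node, so it normalizes whitespace more aggressively than the regex does. Whether that is acceptable depends on how the captions are displayed.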
Finally, the query above ends up looking like this:
from item in xdoc.Descendants(ns + "body").Descendants(ns + "div").Descendants(ns + "p")
select new TTMLElement
{
    text = ReadInnerXML(item),
    startTime = TimeSpan.Parse(item.Attribute("begin").Value),
    endTime = item.Attribute("dur") != null ?
        TimeSpan.Parse(item.Attribute("begin").Value).Add(TimeSpan.Parse(item.Attribute("dur").Value)) :
        TimeSpan.Parse(item.Attribute("end").Value)
}
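For anyone who wants to try this end to end, here is a self-contained sketch that puts the pieces together. Several things in it are assumptions, since they are not shown in this post: the shape of TTMLElement (three public fields), the TtmlSketch class and its Parse method, and reading the namespace from the root element instead of hard-coding it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Assumed shape of TTMLElement; the real class is not shown here.
public class TTMLElement
{
    public string text;
    public TimeSpan startTime;
    public TimeSpan endTime;
}

public static class TtmlSketch
{
    private static string ReadInnerXML(XElement parent)
    {
        using (var reader = parent.CreateReader())
        {
            reader.MoveToContent();
            // Replace every tag with a single space
            return Regex.Replace(reader.ReadInnerXml(), "<.+?>", " ");
        }
    }

    public static List<TTMLElement> Parse(string ttml)
    {
        var xdoc = XDocument.Parse(ttml);
        // Assumption: the document's default namespace is the one on the root element
        XNamespace ns = xdoc.Root.Name.Namespace;

        var query =
            from item in xdoc.Descendants(ns + "body").Descendants(ns + "div").Descendants(ns + "p")
            select new TTMLElement
            {
                text = ReadInnerXML(item),
                startTime = TimeSpan.Parse(item.Attribute("begin").Value),
                // Use begin + dur when dur is present, otherwise fall back to end
                endTime = item.Attribute("dur") != null ?
                    TimeSpan.Parse(item.Attribute("begin").Value).Add(TimeSpan.Parse(item.Attribute("dur").Value)) :
                    TimeSpan.Parse(item.Attribute("end").Value)
            };
        return query.ToList();
    }
}
```

Usage would be something like var cues = TtmlSketch.Parse(File.ReadAllText(path)); where path points at the TTML file. Keep in mind that TimeSpan.Parse only handles clock-style values such as 00:00:01.500; offset values like 1.5s would need their own parsing.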