Question

I've been working on a step for FS4SP 2010 in which I have to get the HTML content of a crawled SharePoint site and get a div from it.

My question is not how to do that.

My main problem is that I do not know which crawled property the crawler uses to store the HTML content.

Could anyone help me identify this property?


Solution

The HTML content is usually in this crawled property:

Property Set: 11280615-f653-448f-8ed8-2915008789f2
Variant type: 31
Name: html

(http://msdn.microsoft.com/en-us/library/ff795815.aspx)

You can also add a "Spy" stage to examine which crawled properties are available in the pipeline and see whether you can find the one with the correct content. Alternatively, enable FFDDumper.

Also remember that crawled property names are case sensitive inside the pipeline.

Other tips

It depends on which column was used to store the HTML content in the list for which you want to get the data. With default Publishing Sites, the column name is PublishingPageContent, so I assume the crawled property is ows_PublishingPageContent.

I believe the data you get from the crawled property is an IEnumerable, so how do you convert it into a byte[] and then into a string?

I tried the following without success: an "invalid character in a Base-64" error is returned.

XDocument inputDoc = XDocument.Load(args[0]);
var data = from cp in inputDoc.Descendants("CrawledProperty")
           where new Guid(cp.Attribute("propertySet").Value).Equals(FMTID_SummaryInformation_DATA) &&
               cp.Attribute("propertyName").Value == "data" &&
               cp.Attribute("varType").Value == "31"
           select cp.Value;

byte[] html = System.Convert.FromBase64String(data);
string dataHtml = Encoding.UTF8.GetString(html);

Edit: I finally got it working with the following:

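// Select every "data" crawled property with varType 31 from the input document.
var data = from cp in inputDoc.Descendants("CrawledProperty")
           where new Guid(cp.Attribute("propertySet").Value).Equals(FMTID_SummaryInformation_DATA) &&
               cp.Attribute("propertyName").Value == "data" &&
               cp.Attribute("varType").Value == "31"
           select cp.Value;

// The query yields a sequence of strings, so join the pieces before decoding.
string datahtml_base64 = "";

foreach (string datas in data)
{
    datahtml_base64 += datas;
}

// Decode the Base64 text to bytes, then read the bytes as UTF-8 HTML.
byte[] html = System.Convert.FromBase64String(datahtml_base64);
string dataHtml = Encoding.UTF8.GetString(html);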

I solved it by taking the 'data' crawled property, decoding it from Base64 to a byte[], and then calling Encoding.UTF8.GetString(byte[]) to get the HTML.
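For reference, the steps above could be wrapped in a small helper along these lines. This is only a sketch: the class name CrawledHtml and the method Decode are made up for illustration, and the property set GUID is passed in by the caller because the value behind FMTID_SummaryInformation_DATA is not shown in this thread. It assumes the same input XML layout as the snippets above (CrawledProperty elements with propertySet, propertyName and varType attributes) and that the Base64 pieces can simply be concatenated before decoding.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper, not part of any FS4SP API.
public static class CrawledHtml
{
    // Returns the decoded HTML, or null when no matching crawled property is found.
    public static string Decode(XDocument inputDoc, Guid propertySet,
                                string propertyName = "data", string varType = "31")
    {
        var chunks = inputDoc.Descendants("CrawledProperty")
            .Where(cp =>
                // Casting a missing attribute to string yields null instead of throwing.
                Guid.TryParse((string)cp.Attribute("propertySet"), out Guid id) &&
                id == propertySet &&
                // Ordinal comparison: property names are matched case-sensitively.
                (string)cp.Attribute("propertyName") == propertyName &&
                (string)cp.Attribute("varType") == varType)
            .Select(cp => cp.Value)
            .ToList();

        if (chunks.Count == 0)
        {
            return null;
        }

        // Join the pieces first, then decode Base64 -> bytes -> UTF-8 text.
        byte[] bytes = Convert.FromBase64String(string.Concat(chunks));
        return Encoding.UTF8.GetString(bytes);
    }
}
```

It would be called as CrawledHtml.Decode(XDocument.Load(args[0]), FMTID_SummaryInformation_DATA). Returning null when nothing matches avoids decoding an empty string and makes a missing property easy to detect.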

Thanks to both answerers.

License: CC-BY-SA with attribution
Not affiliated with sharepoint.stackexchange