Question

I've been working on a step for F4SP 2010 in which I need to get the HTML content of a crawled SharePoint site and extract a div from it.

My question is not how to do that.

My main problem is that I do not know which crawled property the crawler uses to store the HTML content.

Could anyone help me identify this property?

Solution

The HTML content is usually in the following crawled property:

Property Set: 11280615-f653-448f-8ed8-2915008789f2
Variant type: 31
Name: html

(http://msdn.microsoft.com/en-us/library/ff795815.aspx)

You can also add a "Spy" stage to the pipeline to examine which crawled properties are available and find the one that holds the content you need. Alternatively, enable FFDDumper.

Also remember that crawled property names are case sensitive inside the pipeline.
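If the stage is a custom executable, one way to check which crawled properties actually reach it is to print them. The following is only a minimal sketch: it assumes the input file uses the layout that the code later in this thread reads (CrawledProperty elements with propertySet, propertyName and varType attributes), and the class name CrawledPropertyInspector is hypothetical. It will only show the properties the stage was configured to receive. The name is compared with an ordinal, case-sensitive comparison, in line with the note above.

```csharp
using System;
using System.Xml.Linq;

static class CrawledPropertyInspector // hypothetical name, for illustration only
{
    // Property set and name quoted in the answer above.
    static readonly Guid HtmlPropertySet = new Guid("11280615-f653-448f-8ed8-2915008789f2");
    const string HtmlPropertyName = "html";

    static void Main(string[] args)
    {
        // args[0]: path to the input XML file handed to the custom stage (assumed layout).
        XDocument inputDoc = XDocument.Load(args[0]);

        foreach (XElement cp in inputDoc.Descendants("CrawledProperty"))
        {
            string set = (string)cp.Attribute("propertySet");
            string name = (string)cp.Attribute("propertyName");
            string varType = (string)cp.Attribute("varType");

            // Case-sensitive (ordinal) match on the name: "HTML" would not match "html".
            bool isHtml = set != null
                && new Guid(set) == HtmlPropertySet
                && string.Equals(name, HtmlPropertyName, StringComparison.Ordinal);

            // Print one line per crawled property, flagging the html one.
            Console.WriteLine("{0}\t{1}\t{2}\t{3} chars{4}",
                set, name, varType, cp.Value.Length, isHtml ? "\t<-- html" : "");
        }
    }
}
```

If the html property does not show up in the listing, the property name or casing in the stage configuration is the first thing to check.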

Other tips

It depends on which column was used to store the HTML content in the list for which you want to get the data. With default Publishing Sites, the column name is PublishingPageContent, so I assume the crawled property is ows_PublishingPageContent.

I believe the data you get from the crawled property is an IEnumerable, so how do you convert it into a byte[] and then into a string?

I tried the following without success; it fails with the error "invalid character in a Base-64".

XDocument inputDoc = XDocument.Load(args[0]);
var data = from cp in inputDoc.Descendants("CrawledProperty")
           where new Guid(cp.Attribute("propertySet").Value).Equals(FMTID_SummaryInformation_DATA) &&
                 cp.Attribute("propertyName").Value == "data" &&
                 cp.Attribute("varType").Value == "31"
           select cp.Value;

byte[] html = System.Convert.FromBase64String(data);

string dataHtml = Encoding.UTF8.GetString(html);

Edit: Finally, I figured it out using this...

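// Select the value of every crawled property that matches the 'data' property.
var data = from cp in inputDoc.Descendants("CrawledProperty")
           where new Guid(cp.Attribute("propertySet").Value).Equals(FMTID_SummaryInformation_DATA) &&
                 cp.Attribute("propertyName").Value == "data" &&
                 cp.Attribute("varType").Value == "31"
           select cp.Value;

// The query yields a sequence of strings, so join them into a single Base64 string first.
string datahtml_base64 = "";

foreach (string datas in data)
{
    datahtml_base64 += datas;
}

// Decode the Base64 text to bytes, then read the bytes as UTF-8 text.
byte[] html = System.Convert.FromBase64String(datahtml_base64);
string dataHtml = Encoding.UTF8.GetString(html);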

I solved it by taking the 'data' crawled property, decoding it from Base64 to a byte[], and then calling Encoding.UTF8.GetString(byte[]) to get the HTML.

Thanks to both answerers.
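For anyone who wants to try the decoding step without a running pipeline, here is a self-contained sketch of the same idea. The sample document, the CrawledHtml class, and the use of Guid.Empty as the property set are all made up for illustration; the real property set GUID has to come from your own configuration. The sample splits one Base64 string across two CrawledProperty elements to show why the values are joined before decoding. Note that string.Concat over a sequence requires .NET 4 or later.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml.Linq;

static class CrawledHtml // hypothetical helper, for illustration only
{
    // Joins the values of all matching CrawledProperty elements,
    // then decodes the joined Base64 text into a UTF-8 string.
    public static string Decode(XDocument doc, Guid propertySet, string propertyName)
    {
        var chunks = doc.Descendants("CrawledProperty")
            .Where(cp => new Guid((string)cp.Attribute("propertySet")) == propertySet
                      && (string)cp.Attribute("propertyName") == propertyName)
            .Select(cp => cp.Value.Trim());

        byte[] bytes = Convert.FromBase64String(string.Concat(chunks));
        return Encoding.UTF8.GetString(bytes);
    }

    static void Main()
    {
        Guid set = Guid.Empty; // placeholder, NOT the real property set
        string b64 = Convert.ToBase64String(Encoding.UTF8.GetBytes("<div>Hello</div>"));

        // Fabricated input: one Base64 string split across two elements.
        var doc = new XDocument(
            new XElement("Document",
                new XElement("CrawledProperty",
                    new XAttribute("propertySet", set.ToString()),
                    new XAttribute("propertyName", "data"),
                    new XAttribute("varType", "31"),
                    b64.Substring(0, 8)),
                new XElement("CrawledProperty",
                    new XAttribute("propertySet", set.ToString()),
                    new XAttribute("propertyName", "data"),
                    new XAttribute("varType", "31"),
                    b64.Substring(8))));

        Console.WriteLine(Decode(doc, set, "data")); // prints the original <div>Hello</div>
    }
}
```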

Licensed under: CC-BY-SA with attribution
Not affiliated with sharepoint.stackexchange