I tried this yesterday but got no result. It's true that html shows up as a property in the Spy, but when I added it to pipelineextensibility.xml I got nothing. I'll try again tomorrow and report what I see. Thanks!
–
Amedio Jan 26 '12 at 20:36

Mikael, I tried your Logger class to log the input and output with a simple console application like static void Main(string[] args) { Logger.WriteLogFile(args[0], "-input"); Logger.WriteLogFile(args[1], "-output"); }, but there is nothing in the log files when I add <CrawledProperty propertySet="11280615-f653-448f-8ed8-2915008789f2" varType="31" propertyName="html"/>
–
Amedio Jan 27 '12 at 9:42

Then the html is in some other crawled property. It is at least in the "data" one, which is base64 encoded, but perhaps in another as well. If you look through the spy log, see if you can find something which resembles the html you are looking for.
–
Mikael Svenson Jan 27 '12 at 12:27

Maybe my solution is to get the byte array of the data property and save it to disk, so I can see whether it contains the html?
–
Amedio Jan 27 '12 at 13:08

That was the clue. Maybe I was using the wrong encoding to convert the bytes decoded from base64 into a string. Thanks :)
–
Amedio Jan 27 '12 at 13:34
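
For anyone landing here with the same problem, here is a minimal sketch of the decoding step discussed above. It assumes the base64 text of the "data" crawled property has already been read into a string. The class name DataPropertyDecoder and the sample HTML are hypothetical, and the right encoding depends on how the crawled page was actually served.

```csharp
using System;
using System.Text;

static class DataPropertyDecoder
{
    // Turns the base64 text of the "data" crawled property back into a string.
    // The encoding must match the one the page was served in (an assumption
    // you have to verify for your own content).
    public static string Decode(string base64, Encoding encoding)
    {
        byte[] raw = Convert.FromBase64String(base64);
        return encoding.GetString(raw);
    }

    static void Main()
    {
        // Hypothetical stand-in for the value FAST passes in the input file.
        string html = "<div class=\"breadcrumb\">Inicio &gt; P\u00E1gina</div>";
        string base64 = Convert.ToBase64String(Encoding.UTF8.GetBytes(html));

        // Matching encoding: the markup round-trips unchanged.
        Console.WriteLine(Decode(base64, Encoding.UTF8));

        // Mismatched encoding: the accented character comes out garbled.
        Console.WriteLine(Decode(base64, Encoding.GetEncoding("ISO-8859-1")));
    }
}
```

If the decoded text looks almost right but accented characters are broken, the base64 step worked and only the Encoding argument needs changing.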

It depends on which column was used to store the HTML content in the list for which you want to get the data. With default Publishing Sites, the column name is PublishingPageContent, so I assume the crawled property is ows_PublishingPageContent.

I tried with that property, but had no data returned.
–
Amedio Jan 25 '12 at 9:24

Is the property configured to include content in the index, and do you have pages that use that field with content in them?
–
James Love Jan 25 '12 at 9:26

I only have the initial configuration. I'm very new to FAST, and I was told to create a stage in pipelineextensibility to get the breadcrumb in HTML format from the page being crawled, so assume I'm a beginner and may be missing a step I need to do. So far I have tried the special crawled property 'data' in FAST, which stores the page's binary content as a base64 string, but that's the problem: it is binary. The 'body' property only stores the parsed content, not the complete page.
–
Amedio Jan 25 '12 at 9:28

The data property is binary, and the html property holds the HTML, if present. The binary is base64 encoded, so it can be decoded easily.
–
Mikael Svenson Jan 27 '12 at 7:40