Is full-text search of PDFs in SQL Server 2008 restricted to embedded text only?

Question

Flat out impossible to answer without any further information on your search tool and the actual PDFs.

"Happy apple" will be found if the text is 1. not compressed, 2. not encrypted, 3. not weirdly constructed, 4. not re-encoded, or 5. re-encoded but the translation table to Unicode is present and correct.

ad 1: Data streams in a PDF are usually compressed, using one or more algorithms from the standard set (typically LZW or Flate).
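To illustrate why compression defeats a byte-level search, here is a minimal sketch. It assumes a Flate-compressed stream (which is zlib/deflate data) and uses a made-up content-stream fragment; the names `content`, `stored` and `raw_search` are mine, not from any real PDF tool.

```python
import zlib

# Hypothetical content-stream fragment that draws the phrase with the Tj operator.
content = b"BT /F1 12 Tf 72 720 Td (Happy apple) Tj ET"

# Roughly what a PDF writer stores when it applies the FlateDecode filter.
stored = zlib.compress(content)

def raw_search(data: bytes, phrase: bytes) -> bool:
    """Byte-level search, the way grep looks at a file."""
    return phrase in data

# Search the bytes as stored, then search again after inflating them.
found_raw = raw_search(stored, b"Happy apple")
found_inflated = raw_search(zlib.decompress(stored), b"Happy apple")
print(found_raw, found_inflated)
```

A real PDF adds more hurdles (locating the stream objects, chained filters, predictors), so treat this as the idea only.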

ad 2: PDFs may be encrypted with a password, preventing casual inspection. Levels of security range from moderately difficult to theoretically uncrackable with current technology.

ad 3: Single characters may appear on your page in any order. The software used to create the PDF may, at its whim, split a text string into separate parts, or even draw each individual character at an arbitrary position, and omit all spaces. Only strict sorting on the absolute x and y coordinates of each text fragment may reveal the original text.
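The coordinate sorting can be sketched roughly as follows. This assumes you already have each fragment's position and width from some PDF decoder; the `(x, y, width, text)` tuple layout, the sample values and the `space_gap` threshold are all invented for illustration.

```python
from itertools import groupby

# Hypothetical fragments as (x, y, width, text), in the order a writer emitted them.
fragments = [
    (112.0, 700.0, 35.0, "apple"),
    (72.0, 700.0, 21.0, "Hap"),
    (72.0, 680.0, 42.0, "second"),
    (93.0, 700.0, 14.0, "py"),
    (119.0, 680.0, 28.0, "line"),
]

def rebuild_lines(frags, space_gap=3.0):
    """Sort fragments into reading order and guess where the spaces were."""
    # PDF y grows upward, so the top line has the largest y; x runs left to right.
    ordered = sorted(frags, key=lambda f: (-f[1], f[0]))
    lines = []
    # Assumes fragments on one line share exactly the same baseline y.
    for _, group in groupby(ordered, key=lambda f: f[1]):
        text, prev_end = "", None
        for x, _y, width, chunk in group:
            # A horizontal gap wider than the threshold is treated as a space.
            if prev_end is not None and x - prev_end > space_gap:
                text += " "
            text += chunk
            prev_end = x + width
        lines.append(text)
    return lines

print(rebuild_lines(fragments))
```

Real pages need tolerance on the baseline, rotated text handling and per-font space widths, which is where the good tools earn their keep.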

ad 4: If a font gets subsetted, a PDF composer may decide to store 'h' as 0, 'a' as 1 and 'p' as 2 (and so on). The correct glyphs are still associated with these codes, but the text "happy apple" may now appear as "0 1 2 2 3 4 1 2 2 5 6" in the text stream. Also, even if it does not subset the font, a PDF composer is free to reassign character codes anyway.
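A minimal sketch of that renumbering, assuming the composer simply hands out codes in order of first appearance (real composers may use any scheme they like); `subset_encode` is a hypothetical helper name.

```python
def subset_encode(text):
    """Assign each distinct character the next free code, in order of first use."""
    table = {}   # character -> code, as a subsetting composer might build it
    codes = []
    for ch in text:
        if ch not in table:
            table[ch] = len(table)
        codes.append(table[ch])
    return codes, table

codes, table = subset_encode("happy apple")
print(" ".join(str(c) for c in codes))  # prints "0 1 2 2 3 4 1 2 2 5 6"
```

Searching the stream for the bytes of "happy apple" finds nothing, because those codes no longer have anything to do with ASCII or Unicode.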

ad 5: To revert this re-encoding, software may include a ToUnicode table, which maps character codes back to the original Unicode values; there is one table per re-encoded font. If the table is missing, there usually is no straightforward way to create it.
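In code, the lookup amounts to something like the sketch below. The plain dict stands in for a parsed ToUnicode table (the real thing is a CMap stream that has to be parsed first), and `decode_codes` is a hypothetical helper.

```python
# Stand-in for one font's parsed ToUnicode table: character code -> Unicode text.
to_unicode = {0: "h", 1: "a", 2: "p", 3: "y", 4: " ", 5: "l", 6: "e"}

def decode_codes(codes, table):
    """Map character codes back to text; return None when no table is available."""
    if table is None:
        # Without the table, the codes are just numbers with no recoverable meaning.
        return None
    # Codes missing from the table become U+FFFD, the Unicode replacement character.
    return "".join(table.get(code, "\ufffd") for code in codes)

codes = [0, 1, 2, 2, 3, 4, 1, 2, 2, 5, 6]
print(decode_codes(codes, to_unicode))  # prints "happy apple"
print(decode_codes(codes, None))        # prints "None"
```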

There is even an ad 6 I did not think of: text may be outlined or appear in bitmaps only.

Only the very simplest PDFs can be searched with a general tool such as command-line grep. For anything else, you need a good PDF decoding tool -- and the better it is, the more points of this list you can tick off. The exceptions are a missing ToUnicode table (#5) and outlined or bitmap-only text (#6), which no decoder can fix.

(Later edit) Oh wait. You obfuscated your actual question enough to entirely throw me off the target, which (I think!) was "does sql-server-2008 search for entire phrases or for individual words?"

Good thing, then, the above still holds. If you cannot search inside your PDFs anyway, the actual question is moot.