But at the same time I need to highlight a match on the PDF page with a rectangle, the way Safari does, for example.
Any suggestions on how to implement this?
Are there any solutions that don't require such an immense amount of work?

1 Answer

Detecting the "bytes" encoded in a TJ operator does not mean that you already have "text", or even that you can convert them to text at all.

In PDF, text is drawn with an "active" font (set by Tf). The font has an encoding. There are many different encodings around, and some are not "invertible", in the sense that you cannot derive a Unicode value from them.

If you have an "invertible" encoding, that's fine. It is still a lot of work to implement the reverse lookup (especially for the multi-byte encodings), but one fine day you're done.

If your encoding is not so smart, you may still have an additional /ToUnicode map that lets you compute a Unicode value. That is additional effort, but now you're fine.

...apart from the many existing documents around that support neither of these mappings to Unicode...

...and after all: PDF does not contain "text" in that sense; it draws characters. So in theory you must place the characters on a virtual page before you can sort them into any readable order...
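To make the "virtual page" idea concrete, here is a minimal sketch in Python. It assumes you have already decoded each glyph and computed its position from the text matrix; the `Glyph` type, the `reading_order` function and the `line_tolerance` value are illustrative names, not part of any PDF library. It only handles simple horizontal, single-column text.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    text: str   # already-decoded Unicode for this glyph
    x: float    # horizontal position in PDF user space
    y: float    # baseline position; in PDF user space y grows upward


def reading_order(glyphs, line_tolerance=2.0):
    """Group glyphs into lines by baseline, then sort each line left to right."""
    lines = []
    # Visit glyphs from the top of the page to the bottom.
    for g in sorted(glyphs, key=lambda g: -g.y):
        # A glyph joins the current line if its baseline is close enough
        # to the baseline of the first glyph on that line.
        if lines and abs(lines[-1][0].y - g.y) <= line_tolerance:
            lines[-1].append(g)
        else:
            lines.append([g])
    return ["".join(g.text for g in sorted(line, key=lambda g: g.x))
            for line in lines]


# Glyphs arrive in drawing order, which need not be reading order.
page = [Glyph("b", 20, 700), Glyph("c", 10, 680), Glyph("a", 10, 700.5)]
print(reading_order(page))
```

Real pages need more than this (word spacing, rotated text, multiple columns), but keeping the position of every glyph is also what lets you draw a highlight rectangle around a match later.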

Thanks mtraut. I have managed to get the text using the TJ/Tj operators, but when the font encoding is Identity-H, extracting the text becomes a problem. I don't know how to get the text using the ToUnicode mapping?
–
Ravi Chokshi Feb 5 '11 at 8:31

It would take a lot of comments to describe the mechanics of text extraction. Instead I'd like to recommend either the PDF spec (the complete reference) or an existing implementation like jPod. Although it is written in Java, it should give you a good idea of the character lookup.
–
mtraut Feb 5 '11 at 8:57

In short: if you have a ToUnicode map, you will not need the encoding. Take the (possibly multi-byte!) input for the next character and map it via the ToUnicode map. This is a sophisticated multi-index mapping tool, and you will have to implement it. The result should be your Unicode character.
–
mtraut Feb 5 '11 at 8:58
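As a rough illustration of the ToUnicode lookup described in the comment above, here is a minimal Python sketch. It assumes the ToUnicode stream has already been decompressed into text, and that hex strings have an even number of digits. `parse_to_unicode` and `decode` are hypothetical helper names, and the longest-match loop in `decode` is a simplification of proper codespace-range matching.

```python
import re

HEX = r"<([0-9A-Fa-f]+)>"


def _utf16(hex_str):
    # Destination values in a ToUnicode CMap are UTF-16BE code units.
    return bytes.fromhex(hex_str).decode("utf-16-be")


def parse_to_unicode(cmap_text):
    """Return (code_lengths, mapping) where mapping is bytes -> str."""
    code_lengths, mapping = set(), {}
    for block in re.findall(r"begincodespacerange(.*?)endcodespacerange",
                            cmap_text, re.S):
        for lo, _hi in re.findall(HEX + r"\s*" + HEX, block):
            code_lengths.add(len(lo) // 2)      # bytes per character code
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, block):
            mapping[bytes.fromhex(src)] = _utf16(dst)
    entry = HEX + r"\s*" + HEX + r"\s*(\[[^\]]*\]|<[0-9A-Fa-f]+>)"
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst in re.findall(entry, block):
            n = len(lo) // 2
            codes = range(int(lo, 16), int(hi, 16) + 1)
            if dst.startswith("["):
                # Array form: one destination string per code in the range.
                targets = [_utf16(h) for h in re.findall(HEX, dst)]
            else:
                # Single destination: counts upward once per code.
                base = dst.strip("<>")
                width = len(base) // 2
                targets = [(int(base, 16) + k).to_bytes(width, "big")
                           .decode("utf-16-be") for k in range(len(codes))]
            for code, text in zip(codes, targets):
                mapping[code.to_bytes(n, "big")] = text
    return code_lengths, mapping


def decode(data, code_lengths, mapping):
    """Map the raw bytes of a Tj/TJ string to Unicode text."""
    lengths = sorted(code_lengths, reverse=True) or [1]
    out, i = [], 0
    while i < len(data):
        for n in lengths:                       # try the longest code first
            if data[i:i + n] in mapping:
                out.append(mapping[data[i:i + n]])
                i += n
                break
        else:
            out.append("\ufffd")                # no mapping for this code
            i += lengths[-1]
    return "".join(out)
```

With Identity-H the character codes are two bytes each, so a string such as `<00240025>` is looked up as the two codes `0024` and `0025`, not as four single bytes.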

Hello Ravi Chokshi, I am also working on PDF search functionality, and I have succeeded in highlighting the search results. But I want to list all the search occurrences in the PDF, with surrounding text and page number, as most PDF reader applications do. Can you please help me, or give some demo code for better understanding?
–
MinuMaster Feb 13 '12 at 7:02

Can you let us know how to highlight search results, please?
–
Mobihunterz Jul 20 '12 at 6:29