pdfbox-users mailing list archives

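I don't know of any examples, but look at overriding endPage() in a PDFTextStripper subclass and doing something like:
for (List<TextPosition> aCharactersByArticle : charactersByArticle) {
    for (TextPosition t : aCharactersByArticle) {
        // t.getX() and t.getY() give the position of this character
    }
}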
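For the bracketing itself, here is a rough sketch of what could be done with the values collected in that loop. It does not touch PDFBox at all: Glyph is a hypothetical stand-in for whatever you copy out of each TextPosition (presumably getX(), getY() and getFontSizeInPt(), plus the character text), and SuperscriptMarker and mark() are names made up for this example. It assumes y is measured from the top of the page, which is how I believe getY() reports it, so a raised character has a smaller y; flip the comparison if your coordinates run the other way. It also assumes the characters for one line are already in reading order.

```java
import java.util.ArrayList;
import java.util.List;

class SuperscriptMarker {

    /** Hypothetical stand-in for the values read from one TextPosition. */
    static final class Glyph {
        final String text;
        final float x;        // horizontal position
        final float y;        // ASSUMPTION: distance from the top of the page
        final float fontSize; // font size in points

        Glyph(String text, float x, float y, float fontSize) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.fontSize = fontSize;
        }
    }

    /**
     * Wraps runs of smaller, raised glyphs in brackets.
     * Heuristic only: the largest font size on the line is treated as body text.
     */
    static String mark(List<Glyph> line) {
        // Find the body font size (largest size on the line).
        float bodySize = 0f;
        for (Glyph g : line) {
            bodySize = Math.max(bodySize, g.fontSize);
        }
        // Find the body baseline: the lowest position (largest y) among body-size glyphs.
        float bodyY = -Float.MAX_VALUE;
        for (Glyph g : line) {
            if (g.fontSize == bodySize) {
                bodyY = Math.max(bodyY, g.y);
            }
        }

        StringBuilder out = new StringBuilder();
        boolean inSuper = false;
        for (Glyph g : line) {
            // Raised means smaller than body text AND sitting above the body baseline.
            boolean raised = g.fontSize < bodySize && g.y < bodyY;
            if (raised && !inSuper) {
                out.append('[');
            }
            if (!raised && inSuper) {
                out.append(']');
            }
            out.append(g.text);
            inSuper = raised;
        }
        if (inSuper) {
            out.append(']');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up coordinates for "10" followed by a raised, smaller "4".
        List<Glyph> line = new ArrayList<>();
        line.add(new Glyph("1", 100f, 500f, 12f));
        line.add(new Glyph("0", 106f, 500f, 12f));
        line.add(new Glyph("4", 112f, 496f, 7f));
        System.out.println(mark(line)); // prints 10[4]
    }
}
```

The size-plus-position test is deliberately crude. A PDF that draws superscripts at the same font size, or subscripts in a smaller font, would need a different rule, so treat the thresholds as something to tune against your own files.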
On May 19, 2012, at 3:54 AM, Hawkins, Thomas A. - Student wrote:
> Any idea where I might find some examples of the TextPosition class? I've searched the docs and found nothing. Looking over the old threads, I've only found people having issues with TextPosition. This sounds perfect for what I need; I just need to figure out how to use it (i.e. get the x,y values and iterate through them).
>
> Thank you.
> ________________________________________
> From: Ian Holsman [kryton@gmail.com]
> Sent: Friday, May 18, 2012 3:46 AM
> To: users@pdfbox.apache.org
> Cc: users@pdfbox.apache.org
> Subject: Re: PDFBox and superscript format .NET
>
> You might want to look at the processOperator function and watch for the Tj and Ts operators. Ts is the text rise operator, which is used for super/subscripts and might give you the information you need. If you track the TextPosition class, it should give you the x,y position of the lettering.
> Sadly it's harder than it sounds :(
> (I'm a newbie so I might be completely off base)
>
> Sent from my iPhone
>
> On 18/05/2012, at 3:37 PM, "Hawkins, Thomas A. - Student" <thawkins@midway.edu> wrote:
>
>> As an addendum, something I didn't realize when I sent this out: the numbers are a combination of regular and superscript digits. Since email won't support superscript, I'll use mathematical operators instead. The numbers should be:
>> 8^5 (INSTEAD OF 85)
>> 9^6 (INSTEAD OF 96)
>> 4^7 (INSTEAD OF 47)
>> 10^4 (INSTEAD OF 104)
>> ________________________________________
>> From: Hawkins, Thomas A. - Student [thawkins@midway.edu]
>> Sent: Friday, May 18, 2012 1:21 AM
>> To: users@pdfbox.apache.org
>> Subject: PDFBox and superscript format .NET
>>
>> I am using the .NET version of PDFBox and I have a PDF that contains data such as this:
>>
>> Name Location
>> Jim Daviees 85
>> Herschel Walker 96
>> Vince Gogh 47
>> Andrew Lincoln 104
>>
>> I need both the name value and the location value. When I use the following code:
>>
>> Dim p As PDDocument = PDDocument.load(fi.FullName)
>> Dim r As PDFTextStripper = New PDFTextStripper
>>
>> Dim stringVal As String = r.getText(p)
>> Dim bytes As Byte() = System.Text.Encoding.ASCII.GetBytes(stringVal)
>>
>> I get the following in the .txt file (also in html when I've converted it to that)
>> Jim Daviees
>> Herschel Walker
>> Vince Gogh
>> Andrew Lincoln
>> 85
>> 96
>> 47
>> 104
>>
>> I'm okay with the layout, as I've got a workaround for that. My problem is that it destroys any mention of the superscript exponents. Is there a way that I can locate these superscript parts and encapsulate them in brackets or something, so that the returned value is more like this:
>> Jim Daviees
>> Herschel Walker
>> Vince Gogh
>> Andrew Lincoln
>> 8[5]
>> 9[6]
>> 4[7]
>> 10[4]
>>
>> So, nutshell time: can I use PDFBox (.NET version) to locate the instances of superscript in a PDF file (like locating <sup></sup> in HTML) and swap them for an easily recognized symbol in the output to my destination file? I picked brackets because I have no brackets in my source file whatsoever, and they would be very easy for me to code around. Thanks in advance.