You can do it yourself faster: just add one more loop that reads each character and its flags. The first character of each line has the TCF_LineBegin flag set.
Also, please note that text in a PDF file can contain arbitrary character codes, such as null terminators, carriage returns and so on, anywhere in a line, so you need to filter those out too.
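
To illustrate the idea, here is a minimal sketch in Python rather than in the SDK's own API. It assumes you have already collected the characters of a page as (character, flags) pairs in content order. `LINE_BEGIN` is a placeholder standing in for TCF_LineBegin (its real numeric value has to come from the SDK headers), and `build_lines` and `is_printable` are hypothetical helper names.

```python
import unicodedata

# Placeholder bit standing in for the SDK's TCF_LineBegin flag.
# The real numeric value must be taken from the SDK headers.
LINE_BEGIN = 0x1


def is_printable(ch):
    # Reject control (Cc) and format (Cf) characters such as NUL, CR and LF.
    return unicodedata.category(ch) not in ("Cc", "Cf")


def build_lines(chars):
    """Turn (character, flags) pairs into a list of cleaned text lines."""
    lines = []
    current = []
    for ch, flags in chars:
        # A character flagged as line start closes the line collected so far.
        if flags & LINE_BEGIN and current:
            lines.append("".join(current))
            current = []
        if ch == "\t":
            ch = " "  # keep tabs as plain spaces instead of dropping them
        if is_printable(ch):
            current.append(ch)
    if current:
        lines.append("".join(current))
    return lines
```

For example, `build_lines([("H", LINE_BEGIN), ("i", 0), ("\x00", 0), ("Y", LINE_BEGIN), ("o", 0)])` gives two lines, "Hi" and "Yo", with the embedded null character removed.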

How can it be so complex just to get the text of a PDF?

This is because of the nature of PDF: it does not contain text lines as you might expect. You can check the specification yourself.
HTH.

Victor
Tracker Software
Project manager


We provide the bounding boxes of paragraphs, lines and characters. My previous post describes how to get them. Judging by the files you are using, you will have to take the provided coordinates and sort them manually. That way you can get the result you require for any file you come across.
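
As a rough sketch of what sorting the coordinates manually could look like, the Python below groups character boxes into lines by vertical position and then orders each line left to right. It is not SDK code: the `Glyph` type, `group_into_lines` and the `tolerance` parameter are hypothetical, and it assumes PDF-style coordinates where y grows upward and text runs horizontally, left to right.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    text: str
    left: float
    bottom: float
    right: float
    top: float


def group_into_lines(glyphs, tolerance=0.5):
    """Group glyph boxes into text lines, top of page first.

    tolerance is the fraction of the glyph height by which two vertical
    centers may differ and still count as the same line.
    """
    # y grows upward, so the highest center comes first in reading order.
    ordered = sorted(glyphs, key=lambda g: -(g.bottom + g.top) / 2)
    lines = []
    for g in ordered:
        center = (g.bottom + g.top) / 2
        height = g.top - g.bottom
        last = lines[-1] if lines else None
        if last and abs(last["center"] - center) <= tolerance * max(height, last["height"]):
            last["glyphs"].append(g)
        else:
            # The first glyph seen on a line fixes that line's reference center.
            lines.append({"center": center, "height": height, "glyphs": [g]})
    # Within each line, order the glyphs from left to right.
    return [
        "".join(g.text for g in sorted(line["glyphs"], key=lambda g: g.left))
        for line in lines
    ]
```

Real files may need more than this, for example inserting spaces where the horizontal gap between boxes is large, or handling rotated text and multiple columns.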

Cheers,
Alex
