Created attachment 44271
PDF before import
Simple text is imported incorrectly: blank spaces are added inside words and removed between words.
Example 1 after import:
"any informati on or t echnical data that is sensi tive material, includi ng"
Example 2 after import:
"authorizedrepresentativesofallparties.ThisAgreementandperformancethereundershallbe"
Both examples are from the same document, but from different pages.
The before and after documents are attached.

Dear bug submitter!
Because there are a lot of NEEDINFO bugs with no answer within the last six months, we are closing all of these bugs.
To keep this message short, more information is available at https://wiki.documentfoundation.org/QA/NeedinfoClosure#Statement
Thanks for understanding, and we hope you will update your bug so that everything is prepared for developers to fix your problem.
Yours!
Florian

The duplicate bug contains another PDF example with a similar problem (the enwikibooks example; the other one seems to be fixed in LO 4.0.0).
I tested the enwikibooks example and this one again on my LO 4.0.0 installation on Windows 7 64-bit. The Mac is in use, but I strongly doubt that this is a platform problem.
Interesting side note: the fixed file uses the PDF 1.4 standard (at least that is what my PDF viewer says), while the broken ones use PDF 1.4 and PDF 1.5, so this doesn't seem to depend on the PDF version (?)

Created attachment 92423
patch v1 by Vort
Hello!
The problematic algorithm is located in:
Module: sdext
File: pdfimport\tree\pdfiprocessor.cxx
Function: PDFIProcessor::processGlyphLine
I tried to figure out how it works, but failed.
As we can see, though, it does not work in practice.
Because of that, I have reimplemented it.
My version was tested in particular with the files
'Autani - Non-Disclosure Agreement (Mutual with Business) (3)'
'Cascading Style Sheets_Print version - Wikibooks, open books for an open world.pdf'
and it works better than the previous version.
Here is the patch; please test it.
If you find regressions, let me know: I will look at the problematic PDF file and try to fix the algorithm.
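The thread does not say how processGlyphLine or the patch actually decides where word breaks go. As a rough illustration only, here is a minimal C++ sketch of one common heuristic, assuming a hypothetical `Glyph` struct and a hypothetical `joinGlyphLine` function (neither is LibreOffice code nor Vort's patch): a space is inserted when the horizontal gap between two neighbouring glyphs exceeds a fraction of the font size (`spaceRatio`, an assumed value of 0.25). With a threshold that is too small, ordinary letter spacing produces breaks like "informati on"; with one that is too large, real word gaps are lost, as in "authorizedrepresentatives".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified glyph record; NOT the structure used in
// sdext's pdfimport. Positions and widths share the units of fontSize.
struct Glyph {
    std::string text;  // character(s) drawn by this glyph
    double x;          // left edge of the glyph on the baseline
    double width;      // advance width of the glyph
    double fontSize;   // font size, used to scale the gap threshold
};

// Hypothetical helper: joins the glyphs of one text line into a string,
// inserting a space wherever the horizontal gap between neighbouring
// glyphs is wider than spaceRatio * fontSize of the previous glyph.
std::string joinGlyphLine(const std::vector<Glyph>& glyphs,
                          double spaceRatio = 0.25)
{
    std::string result;
    for (std::size_t i = 0; i < glyphs.size(); ++i) {
        const Glyph& cur = glyphs[i];
        if (i > 0 && !cur.text.empty()) {
            const Glyph& prev = glyphs[i - 1];
            // Gap between the end of the previous glyph and this one.
            const double gap = cur.x - (prev.x + prev.width);
            // Do not add a second space if the PDF already drew one.
            const bool alreadySpaced =
                (!result.empty() && result.back() == ' ') ||
                cur.text.front() == ' ';
            if (gap > spaceRatio * prev.fontSize && !alreadySpaced)
                result += ' ';
        }
        result += cur.text;
    }
    return result;
}
```

For example, glyphs "a" and "b" at x = 0 and x = 5 followed by "c" at x = 14 (all with width 5 and font size 10) yield "ab c", because the 4-unit gap exceeds the 2.5-unit threshold while the zero gap between "a" and "b" does not.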

Hi Vort,
thanks for working on this. It looks good already; however, there are still some issues.
In the file above ("Example PDF with Spaces removed"), spaces are still removed,
e.g. in the first sentence:
- "Anreise:Gern" instead of "Anreise: Gern"
- "Hamburgeine" instead of "Hamburg eine"

Thank you for working on this bug, which I submitted. I do not know how to apply your patch to test it. If you can point me to information on how to apply it, I will try to test it.
Thank you for your efforts.

I checked it with a few PDF files and it looks good to me, so thanks for this.
Can you submit the patch to gerrit.libreoffice.org for code review?
See https://wiki.documentfoundation.org/Development/gerrit for more information.
@Matt Reischner: You can use "patch -i PATCH_FILE.patch" to apply the patch to your LO working copy.

(In reply to comment #18)
> Oh then you can use "git apply patch_file". If you use a graphic interface,
> there might be also an option to apply a patch.
I think that for me to test it, LibreOffice would have to be recompiled into a Windows installer. The only way I could make changes is from the Windows Control Panel (Add/Remove Programs) via the "Uninstall/Change" option, but that uses the installer from the last LibreOffice installation, which wouldn't know about the patch.

This is not a duplicate.
But...
When I was fixing this bug, I didn't know that PDFs can also be opened with Writer.
That option is well hidden, and I thought the Writer-related code in the importer was dead code.
I will think about what to do with this discovery.
For now, I have found that you can simply open the PDF with Draw and copy and paste the page contents into Writer.

Thanks, vvort, for looking into that other related PDF-import-in-Writer bug.
Yes, the problem still occurs if you import in Writer by choosing File > Open and selecting "PDF - Portable Document Format (Writer)" as the file format filter.