PDF Functions

Remarks about Deprecated PDFlib Functions

Starting with PHP 4.0.5, the PHP extension for PDFlib is
officially supported by PDFlib GmbH. This means that all the
functions described in the PDFlib Reference Manual are
supported by PHP 4 with exactly the same meaning and the same
parameters. However, with PDFlib Version 5.0.4 or higher all parameters
have to be specified. For compatibility reasons, this binding for PDFlib
still supports most of the deprecated functions, but they
should be replaced by their new versions. PDFlib GmbH will not
support any problems arising from the use of these deprecated
functions. The documentation in this section indicates old functions as
"Deprecated" and gives the replacement function to be used instead.

User Contributed Notes 23 notes

Those looking for a free replacement for PDFlib may consider pslib at http://pslib.sourceforge.net, which produces PostScript that can easily be turned into PDF by Acrobat Distiller or Ghostscript. The API is very similar, and even hypertext functions are supported. There is also a PHP extension for pslib in PECL, called ps.

TCPDF is an Open Source PHP class for generating PDF files on-the-fly without requiring external extensions. This class has already been adopted by a large number of PHP projects such as phpMyAdmin, Drupal, Joomla, Xoops, TCExam, etc.

Starting with version 2.1, TCPDF supports UTF-8 Unicode and bidirectional languages such as Arabic and Hebrew.

Here is a function to test whether a file is a PDF without using any external library.
<?php
define('PDF_MAGIC', "\x25\x50\x44\x46\x2D"); // the five bytes "%PDF-"
function is_pdf($filename) {
    // Read only the first strlen(PDF_MAGIC) bytes and compare them to the header.
    return file_get_contents($filename, false, null, 0, strlen(PDF_MAGIC)) === PDF_MAGIC;
}
?>
It's not checking if the whole file is valid, just if the correct header is present at the beginning of the file.
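
A variant of the same header check, as a minimal sketch: it uses fopen()/fread() so that a missing or unreadable file simply gives false. The name has_pdf_header is made up for this example and is not part of the note above.

```php
<?php
// Hypothetical helper (name invented for this sketch): checks only the
// "%PDF-" header, not whether the rest of the file is a valid PDF.
function has_pdf_header(string $filename): bool
{
    $magic = "%PDF-";                       // same bytes as \x25\x50\x44\x46\x2D
    $handle = @fopen($filename, 'rb');      // binary mode; no warning if the file is absent
    if ($handle === false) {
        return false;
    }
    $head = fread($handle, strlen($magic)); // read just the first five bytes
    fclose($handle);
    return $head === $magic;
}
```

Calling has_pdf_header('upload.bin') before handing a file to a PDF tool is one way this could be used; the file name is only an illustration.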

From the online help: The following characters are preceded by a backslash: #&;`|*?~<>^()[]{}$\, \x0A and \xFF. ' and " are escaped only if they are not paired. In Windows, all these characters plus % are replaced by a space instead.

So you are probably passing duff paths to pdf2text.exe

Removing escapeshellcmd worked for me. Just make darned sure you are in control of what is being passed through to your system call.
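
If you do drop escapeshellcmd, one way to stay in control is to quote each argument separately with escapeshellarg(). This is only a sketch: build_command is a made-up name, and which arguments pdf2text.exe actually expects is not stated in this note, so the argument list is left to the caller.

```php
<?php
// Hypothetical helper (name invented for this sketch): quotes the program
// name and every argument individually, so a path with spaces stays one
// argument and shell metacharacters inside it are not interpreted.
function build_command(string $exe, array $args): string
{
    $parts = [escapeshellarg($exe)];
    foreach ($args as $arg) {
        $parts[] = escapeshellarg((string) $arg);
    }
    return implode(' ', $parts);
}
```

The resulting string could then be passed to exec() or system(); for example build_command('pdf2text.exe', ['my report.pdf']), where the file name is just an illustration.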

It seems to be a working combination, because it does NOT give you:
1) an error message in Apache's error_log:
Module compiled with module API=20020429, debug=0, thread-safety=0
PHP compiled with module API=20020429, debug=0, thread-safety=1

// fonts to embed, they are in the folder of this file:
pdf_set_parameter($pdf, 'FontAFM', 'TradeGothic=Tg______.afm');
pdf_set_parameter($pdf, 'FontOutline', 'TradeGothic=Tg______.pfb');
pdf_set_parameter($pdf, 'FontPFM', 'TradeGothic=Tg______.pfm');

Yet another addition to the PDF text extraction code last posted by jorromer. The code only seemed to work for PDF 1.2 (Acrobat 3.x) or below. This pdfExtractText function uses regular expressions to cover cases I have found in PDF 1.3 and 1.4 documents. The code also handles closing brackets in the text stream, which were ignored by the previous version. My regular expression skills are somewhat lacking, so improvements may be possible by a more skilled programmer. I'm sure there are still cases that this function will not handle, but I haven't come across any yet...

// Handle brackets in the text stream that could be mistaken for
// the end of a text field. I'm sure you can do this as part of the
// regular expression, but my skills aren't good enough yet.
$psData = str_replace('\)', '##ENDBRACKET##', $psData);
$psData = str_replace('\]', '##ENDSBRACKET##', $psData);
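
For what it's worth, here is a rough sketch of how the escaped parentheses could be handled inside the regular expression itself instead of with placeholders. It is not the pdfExtractText code from this note: extract_literal_strings is a made-up name, it only looks at ( ... ) strings (not the square-bracket case), and it does not handle nested unescaped parentheses.

```php
<?php
// Hypothetical helper (name invented for this sketch): returns the text of
// every "( ... )" literal string, treating "\(" and "\)" as part of the text.
function extract_literal_strings(string $psData): array
{
    // Between the parentheses accept either a backslash followed by any
    // character (an escape sequence), or any character that is neither a
    // backslash nor a parenthesis.
    $pattern = '/\(((?:\\\\.|[^\\\\()])*)\)/s';
    preg_match_all($pattern, $psData, $matches);

    $texts = [];
    foreach ($matches[1] as $raw) {
        // Drop the backslash in front of escaped parentheses and backslashes.
        $texts[] = preg_replace('/\\\\([()\\\\])/', '$1', $raw);
    }
    return $texts;
}
```

Other escape sequences such as \n or octal codes are left untouched by this sketch.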

To extend alex's example earlier, you can use a couple of switches inside the pdf doc to give you the total number of pages, without using any ext. I would have added the whole code, however the site keeps on saying "line is too long... yadayada".

Open the doc using fopen("$file", "rb"); (for reading)

Test roughly the first 1000 bytes against the following regex:
<?php
if (preg_match("/\/N\s+([0-9]+)/", $contents, $found)) {
    return $found[1];
}
?>

If that doesn't return anything, you have to read the rest of the file:
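
The rest of that code is not reproduced in this note, so the following is only a guess at how the two steps might fit together. The first step uses the /N regex from above (in "web-optimized" PDFs the /N entry near the start of the file holds the page count). The fallback, counting /Type /Page objects in the rest of the file, is an assumption of this sketch and will miss pages stored in compressed object streams. guess_pdf_page_count is a made-up name.

```php
<?php
// Hypothetical helper (name invented for this sketch). Returns the page
// count as an int, or null if neither step finds anything.
function guess_pdf_page_count(string $file): ?int
{
    $handle = @fopen($file, 'rb');          // open for reading, binary mode
    if ($handle === false) {
        return null;
    }

    // Step 1: look for "/N <digits>" in roughly the first 1000 bytes.
    $contents = (string) fread($handle, 1000);
    if (preg_match("/\/N\s+([0-9]+)/", $contents, $found)) {
        fclose($handle);
        return (int) $found[1];
    }

    // Step 2 (assumption, not from the note): read the rest of the file and
    // count "/Type /Page" entries, skipping "/Type /Pages" tree nodes.
    $contents .= (string) stream_get_contents($handle);
    fclose($handle);
    $count = preg_match_all('/\/Type\s*\/Page(?![a-zA-Z])/', $contents);

    return $count > 0 ? $count : null;
}
```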

I found this info about pdflib scope on a Chinese (I think) site and translated it. I was trying to do pdf_setfont and kept getting the wrong scope error. Turns out it has to be in the Page scope. So pdf_setfont will only work when called between pdf_begin_page and pdf_end_page.

When a PDFlib API function is called, an error such as "Can't ... in 'document' scope" may occur. PDFlib has a concept of "scope": every PDFlib API function may only be called within certain predefined scopes (*1). This error occurs when a function is called outside the scope assigned to it. Please refer to the chart below to verify the position of the API call.

After a whole day spent understanding how pdflib works, I came to the conclusion that it is quite hard to draw with it: just drawing a line may take something like four lines of code. So I wrote my own functions to make life easier and the drawing code more understandable and easier to modify. I also made a function that draws a rectangle with rounded corners, with the possibility of filling it ;)

I am trying to extract the text from PDF files and use it to feed a search engine (Intranet tool). I tried several of the "PDF2TXT" functions posted below, but they do not produce the expected result. At the least, all words need to be separated by spaces (they are then used as keywords), and the "junk" codes removed (for example: binary data, pictures...). I started by modifying the interesting function posted by Swen, and here is my current version, which is starting to work quite well (with PDF version 1.2). Sorry for having a quite different style of programming. Luc

// look at each chunk and decide how to decode it - by looking at the contents of the filter
// (explode() replaces the original split(), which was removed in PHP 7)
$a_filter = explode("/", $chunk["filter"]);

if ($chunk["data"] != "") {
    // look at the filter to find out which encoding has been used
    // (strpos() is needed here; substr() cannot search for a string)
    if (strpos($chunk["filter"], "FlateDecode") !== false) {
        $data = @gzuncompress($chunk["data"]);
        if (trim($data) != "") {
            // CHANGED HERE, before: $result_data .= ps2txt($data);
            $result_data .= PS2Text_New($data);
        } else {
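
The decoding decision above can be isolated into a small function, shown here only as a sketch under a few assumptions: decode_chunk is a made-up name, an empty filter is taken to mean uncompressed data, and any filter other than FlateDecode is simply reported as unsupported. It needs the zlib extension for gzuncompress().

```php
<?php
// Hypothetical helper (name invented for this sketch). Returns the decoded
// stream data, or null when the chunk is empty, cannot be inflated, or uses
// a filter this sketch does not handle.
function decode_chunk(string $filter, string $data): ?string
{
    if ($data === '') {
        return null;
    }
    if ($filter === '') {
        return $data;                       // assumption: no filter means plain data
    }
    if (strpos($filter, 'FlateDecode') !== false) {
        $decoded = @gzuncompress($data);    // returns false on corrupt input
        return $decoded === false ? null : $decoded;
    }
    return null;                            // other filters are not handled here
}
```

The decoded string could then be passed on to the text extraction step, e.g. PS2Text_New($decoded) in the code above.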

I was having trouble streaming inline PDFs using PHP 5.0.2 and Apache 2.0.54.

This is my code:

<?php
header("Pragma: public");
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: must-revalidate");
header("Content-type: application/pdf");
header("Content-Length: " . filesize($file));
header("Content-disposition: inline; filename=$file");
// Accept-Ranges takes a range unit such as "bytes", not a file size
header("Accept-Ranges: bytes");
readfile($file);
exit();
?>
It would work fine in Mozilla Firefox (1.0.7), but IE (6.0.2800.1106) would not bring up the Adobe Reader plugin and instead asked me to save it or open it as a PHP file.

Oddly enough, I turned off ZLib.compression and it started working. I guess the compression is confusing IE. I tried leaving out the content-length header thinking maybe it was unmatched filesize (uncompressed number vs actual received compressed size), but then without it it screws up Firefox too.

What I ended up doing was disabling Zlib compression for the PDF output pages using ini_set:

<?php ini_set('zlib.output_compression', 'Off'); ?>
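
Putting the two pieces together might look roughly like this. It is a sketch, not the exact code I run: inline_pdf_headers and send_inline_pdf are made-up names, and quoting the basename in the filename parameter is my own choice here.

```php
<?php
// Hypothetical helper (name invented for this sketch): builds the header
// lines for an inline PDF so they can be inspected before being sent.
function inline_pdf_headers(string $filename, int $size): array
{
    return [
        'Content-Type: application/pdf',
        'Content-Length: ' . $size,
        'Content-Disposition: inline; filename="' . basename($filename) . '"',
    ];
}

// Hypothetical helper: turns zlib output compression off for this request,
// then sends the headers and the file itself.
function send_inline_pdf(string $file): void
{
    ini_set('zlib.output_compression', 'Off'); // must happen before any output
    foreach (inline_pdf_headers($file, filesize($file)) as $line) {
        header($line);
    }
    readfile($file);
}
```

With compression off, the Content-Length header matches the number of bytes actually sent, which seems to be what IE needed.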

Maybe this will help someone. Will post over in the PDF section as well.

I was searching for a low-cost/open-source option for combining static HTML files [as templates] with dynamic output from Perl or PHP routines, etc. Sooner or later I found out that this was the most stable, speediest and most customizable way to produce usable PDFs with nice formatting:

One might ask why use different scripts:
- the Perl/PHP combination is great: in my experience Perl is faster at some things, such as conversion to PS files
- PS to PDF is quicker than direct PHP to PDF [in my experience!]
- I have total control over every file; whenever I change the HTML files used as templates, I need only editors or other applications for it [online or offline].

p.s. I had to make an open-source solution for creating simple report analyses based on things like:
- a first page [name / title / # / date]
- some static info [like introduction, copyrights, etc.]
- some dynamic info [output from PHP->database queries] combined with HTML tags/images, etc.

And all of this mixed [so separated into files for transparency]. Also, the three-step approach (data->html, html->ps, ps->pdf) is easier and quicker to program or adjust at every step.