Saqib Ali asks: "I am looking for some open-source PDF writers/creators. I found one, here. It can create PDFs from common software like OfficeSuite, Visio, Project or any other Windows application that uses the Windows printers. I know OpenOffice can also export to PDF. I am working on a project (fat client) where I need to dynamically create PDF reports from data stored in a MySQL DB. I know I can use PHP to create PDFs, and also Apache's Cocoon (you can find an example document, here). Of course, I would like to investigate other open-source PDF writers as well. Do you know of any other PDF writers that I can utilize or learn from by looking at the source code?"

You can also set up a fake printer under Samba. Send it PostScript, and it will write out a PS file; then run ps2pdf on it. Works great. Use any PS printer driver (a color one, of course, if you want color PDFs) and install it. When you print, it can output the resulting file into a share somewhere the user can access. Really creative scripting can get it to be either emailed to the user who printed it...

See, we develop large maintenance manuals, and our current process produces PS files around 10 GB in size. We have to convert those files into PDF files. Once they are converted, we can't delete the PS until we have verified that the conversion process didn't introduce glitches (which it sometimes does).

Plus PS files are much bigger than PDF files. If you ever encounter a PS file bigger than 2048MB you are in deep dodo. Can't produce PDF files from it without having to split

Small ones are, but not necessarily large ones - depends on what you're doing.

Sounds like the work you do is a lot of B&W where the bulk of data is text positioning.

My experience comes from the glossy magazine world (managed transfer to all-digital production at a magazine on almost every newsstand in the USA). Worked with documents of similar size to yours, but fewer pages I'm sure. Most of the data was color images, which don't necessarily get much

For a Unix/BSD lpd system, you can actually use Ghostscript as a printer filter (if=) in printcap(5), and use that printer to print from Samba. No need to manually throw it to ps2pdf for that. I just can't find the posting now where I first read about it.

wow! I'm gonna check that out. Also, to the original poster I thought I'd mention that PHP has built-in functions for creating PDFs out of raw elements (curves, text blocks, etc), if you're so inclined. Just check to make sure that your PHP administrator has installed the necessary libraries.

I'd like to second that. I'm pulling data from Oracle and dynamically generating letters using FPDF, backing them up, and then sending them to a printer. FPDF gave me the pixel-level precision required to copy the customer's layout. It also handles graphics very well - I'm placing the client's logo on each page.

OpenOffice can not only write PDFs, it can also read data from a MySQL (or other ODBC/JDBC-compliant) database. I don't see any reason not to use it out of the box for such a purpose... or am I missing something?

OpenOffice.org doesn't keep the hyperlinks or other metadata in the final PDF. I use Acrobat at work for publishing company docs. Cross-linking is absolutely necessary to make the finished docs usable for end users. Are there any non-Adobe OSS PDF writers that keep the metadata too?

Add iText [lowagie.com] to the mix. It is a Java library capable of doing almost anything. The only downside is that it is slower than the native C libraries out there. If speed is a real issue, you could compile your iText Java classes using GCJ and convert them into native code. I'm thinking doing so will speed up your application. I haven't tried converting it to native code; has anyone?

Beware gcj native code generation for exceptions. Be very very ware... it's about 100,000 times slower than java, for reasons that are totally beyond my comprehension. Milliseconds to handle a single thrown exception.

Richard Stallman please note: this is a genuinely GNU project, so I'm calling it "GNU a2ps" with pleasure and satisfaction, but the Linux I use is either "Mandrake Linux 9.2" after the distributors who do so much work to get it all packaged and integrated right, or "GNU/SGI/BSD/KDE/Apache/Sun/IBM/{blah,blah,cows come home}/OSF/Linux", or just plain "Linux".

I have been looking (for years) for a PDF generator that will handle complex tables, with requirements that include:
- automatic column sizing (like HTML)
- clean break between pages
- repeated header and footer on each page

None of the open-source generators, none of the commercial ones (including Adobe's expensive solution), and not even LaTeX with the longtable package will do this.

I guess I'm going to have to try some postscript generators like OpenOffice. PDF generation is evidently still in its infancy.

Back in the day I needed to turn some XML files into HTML files by applying an XSL transformation. I also found out that the same process can be done for making PDF files using something called FO (or was it XO?) from the Apache people (not the Indians)

I made XSL files with PDF-generating tags and then ran 'em through this Java library. Since our backend was made in Java anyway, it was a perfect fit.

You *may* be wrong. I've been able to run a toolchain of doxygen|latex|ps2pdf and end up with a hyperlinked PDF with a bookmarked index. But I have no comprehension of how this works. I don't know why I didn't use pdflatex, but I very clearly remember this working.

You didn't specify OS, though I reckon it's probably an open source one. However, I'll post this anyway, in case it can help anyone:

Under Windows, you can add the driver for the "Apple Color LW 12/660 PS" printer, pointed at the FILE: port (i.e., it prints to a file). The resulting files are PostScript. You can then install GhostScript [wisc.edu] (either on its own or as part of Cygwin [cygwin.com]) and use the ps2pdf utility to convert it to PDF. It's not very featureful (e.g., it can't generate document indices or anything),

PDF995 [pdf995.com] is a (non-open-source, ad-supported) application that sits between your Windows printer driver and ps2pdf, and streamlines the process. I love it... it's made out of open-source parts, but it's not open-source itself though. sorry. ^_^

Whoops. didn't see the part where the questioner asked about being able to see the source code. I guess I just wanted to mention PDF995 to the world-at-large... actually, I knew that my answer wasn't gonna answer the guy's question... that's why I said "sorry" to him in my post.

Mod me OT if you like, but I was aware I was slightly OT in the first place.

I don't know of too many PDF creators for Win32 besides Adobe Acrobat. I work with text-based documents a lot and wonder if there's something cheaper (and hopefully open-source!) that can do the following...

Searching Google for "PDF printer" yields a few promising results. One is Expert PDF 2 [visagesoft.com]. The standard edition appears to include the integrated-font and graphic-compression features you want. I've never heard of it before.

On Windows 2000, I use RedMon [wisc.edu] in conjunction with Ghostscript [wisc.edu]. RedMon is a generic port redirector, but it includes instructions on piping Postscript output from any Windows program into Ghostscript, which can export PDFs.

I use redmon + ghostscript under XP. It's a workable solution but far from perfect. When redmon pops up a save dialog for the pdf it is always below the application from which you print. Sometimes it does not appear at all. If this happens it means you have to reboot (or at least log out) to be able to print your document to it (or anything else). Probably there's a better solution but this is the easiest way to solve the problem. If it works you get a nice pdf but without features such as a table of conten

Hmm, let's try a Word doc in both CutePDF and PDFCreator (the two best free apps, bar none, and both based on Ghostscript - I expect the only differences to be in UI, seeing as they ARE both GS-based):

My current favorite for PDF generation is to build an XML [w3.org] document programmatically. This document has no layout information, so I use Saxon [sourceforge.net] and an XSLT [w3.org] stylesheet to translate it to XSL Formatting Objects [w3.org]. From there, I use FOP [apache.org] to translate to PDF.

The best part is that the XML document contains the content, while the XSLT stylesheet describes how to make a document out of it. If I need a screen version all I have to do is write another stylesheet to translate to HTML.
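For anyone curious what the first step of that pipeline might look like, here is a minimal sketch in Python of building a content-only XML document from report rows. The element names (`report`, `title`, `item`) and the function name `build_report_xml` are made up for illustration, and the XSLT and FOP steps are deliberately left out; the stylesheet would be what maps these elements to XSL-FO (or to HTML).

```python
import xml.etree.ElementTree as ET


def build_report_xml(title, rows):
    """Return a layout-free XML document describing a report.

    `rows` is a list of dicts; each key becomes a child element name,
    so keys must be valid XML names (an assumption of this sketch).
    """
    report = ET.Element("report")
    ET.SubElement(report, "title").text = title
    for row in rows:
        item = ET.SubElement(report, "item")
        for field, value in row.items():
            # ElementTree escapes &, < and > in text when serializing.
            ET.SubElement(item, field).text = str(value)
    return ET.tostring(report, encoding="unicode")


if __name__ == "__main__":
    print(build_report_xml("Sales", [{"name": "Widget", "qty": 3}]))
```

Because the document carries only content, the same output could be handed to one stylesheet for PDF and to another for the screen version.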

I second the parent's suggestion. Been there, done that, and it rocked, even when FOP was at the 0.17 release. It worked pretty darned well, and you just had to make another XSLT sheet to turn the document into HTML.

Yes, it's a big task and not the "quick and dirty" method, but it works really well and gives you exactly the results that you want if you want to put the time into it. The XML+XSLT -> processor model is definitely the way to go for things that you expect to last a while into the future.

It looks like a nice solution in theory. But XSL-FO is a whole language by itself; add to this the complexity of the XSLT language, and you are looking at two new XML dialects to learn just to generate PDF... ugh!

What I would recommend really depends on what you're looking to do. I definitely don't think you'll be quickly hacking together some solution by looking at others' source code. The actual PDF spec is over 500 pages. There are several COM components that will let you draw to a PDF canvas, but for anything useful that's pretty basic. In the Python world there is the wonderful ReportLab. It's built on top of another library (can't remember what right now) and is very full-featured. One project I was on h

Using Mozilla you can print a page to a postscript file and then use the command line program ps2pdf to convert it to a pdf. It isn't exactly a generic PDF library like was asked for, but it is pretty kewl. It works great for creating a quick mirror of a page including images.

ps2pdf is a little shell script which calls ghostscript to convert postscript to pdf. You can also set up an lpd server to use ghostscript as a print filter. I wrote a simple little CGI script to make the generated PDFs available via apache so users didn't need access to the box - they just print to the PDF spool and then download their pdf from the web page. Jobs are deleted after 3 days via cron. 'Course I set all this up before CUPS was available... there's probably an easier way to do this with CUPS-PDF
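Since ps2pdf is just a wrapper around Ghostscript, the same conversion can be driven from a script. A minimal sketch in Python, assuming a `gs` executable is on the PATH; the function names are made up for illustration, and the flag set is a common one for the pdfwrite device rather than an exact copy of what ps2pdf passes:

```python
import shutil
import subprocess
from pathlib import Path


def ghostscript_pdf_command(ps_path, pdf_path=None, gs="gs"):
    """Build a Ghostscript command line that converts PostScript to PDF."""
    ps_path = Path(ps_path)
    # Default output name: same name with a .pdf suffix.
    pdf_path = Path(pdf_path) if pdf_path else ps_path.with_suffix(".pdf")
    return [
        gs,
        "-q",                 # suppress startup banner
        "-dBATCH",            # exit after the last file
        "-dNOPAUSE",          # don't prompt between pages
        "-dSAFER",            # restrict file access from the PS program
        "-sDEVICE=pdfwrite",  # PDF output device
        f"-sOutputFile={pdf_path}",
        str(ps_path),
    ]


def convert(ps_path, pdf_path=None):
    """Run the conversion; raises if gs is missing or exits non-zero."""
    cmd = ghostscript_pdf_command(ps_path, pdf_path)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("Ghostscript (gs) not found on PATH")
    subprocess.run(cmd, check=True)
```

A print filter or CGI script like the one described could call `convert("job.ps")` for each spooled job.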

PDFlib Lite [pdflib.com]: This is the open-source version. It requires you to use an OSI-approved license on your app. PHP uses a version of PDFlib. We use the commercial version of PDFlib to produce reports like this sample report [panoramicfeedback.com].

These libraries should give you total control over your output. I'm not sure if you want that degree of power, considering you have to do a lot of work yourself. Note also the total lack of support for importing vector images in both (this is

LaTeX is the best thing to make PDFs. We use it at my university for creating PDFs that have large amounts of special characters and odd formatting. I'm not too sure where to get it, but I do know it's cross-platform compatible and really sw33t in a *nix environment.

You're outputting some kind of report that needs exact formatting, right?
Generate a template in your WYSIWYG editor of choice and export it to PDF. Then edit the PDF with a text editor and insert @@@VAR_1@@@ type strings as appropriate. Then use something as simple as sed to replace them all.

Hmm.. Maybe not. Looking at a few PDFs that I happen to have lying around, the important part is encoded somehow.. Fuckers. OK, do the same: generate the template as PS, do the substitution on the PS template, and then ps2pdf....
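The substitution step could look something like this in Python instead of sed. It is only a sketch: it assumes the @@@NAME@@@ placeholders survive intact inside PostScript string literals in the exported template (an exporter may split strings for kerning, which would break this), and the function names are made up for illustration. The one thing sed won't do for you is escape parentheses and backslashes, which are special inside a PostScript string.

```python
import re

# Placeholders look like @@@VAR_1@@@ in the template.
PLACEHOLDER = re.compile(r"@@@([A-Za-z0-9_]+)@@@")


def ps_escape(value):
    """Escape backslashes and parentheses for a PostScript string literal."""
    return re.sub(r"([\\()])", r"\\\1", str(value))


def fill_template(template, values):
    """Replace every @@@NAME@@@ in `template` with the escaped value."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name}")
        return ps_escape(values[name])

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    print(fill_template("(Dear @@@VAR_1@@@) show", {"VAR_1": "Smith (Jr.)"}))
```

The filled-in PostScript can then go through ps2pdf as described.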

You can use just about anything. Now.. most programs that allow you to print can also print to a file, and you get a PostScript file. As part of Ghostscript, there is the ps2pdf tool. So, e.g., making a PDF of say.. www.slashdot.org is a no-brainer.

Other "creators" include OpenOffice.org 1.1 and later, and LyX. And you can of course write DocBook documents or TeX documents and easily transform them to PDF.

On Windows, there is an excellent ad-supported/cheap converter called PDF995 [pdf995.com]. You can get rid of the ads for $10 per person (less in volume). They also have an app called PDFEdit995, which allows you to do lots of modifications and offers lots of utilities, and Signature995, which allows for encryption and digital signatures.
I have found the quality to be better than Ghostscript.

It's pretty readable and the basic text output and font metrics all work. It's very easy to produce output with from Perl, so you can very rapidly prototype your reports and see what the resulting PDF contains.

You can also tweak it a bit and disable the PDF stream compression feature so you can really see what's going on.
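To give a feel for what an uncompressed PDF looks like inside, here is a minimal sketch of a one-page PDF writer in plain Python. It is not the library discussed above, just an illustration; the function name is made up, and it assumes US Letter page size, the built-in Helvetica font, and text that fits in Latin-1. The content stream is written without any filter, so the drawing operators stay readable in a text editor.

```python
def build_minimal_pdf(text):
    """Return the bytes of a one-page PDF with an uncompressed content stream."""
    # Backslash and parentheses are special inside a PDF literal string.
    escaped = text.replace("\\", r"\\").replace("(", r"\(").replace(")", r"\)")
    # BT/ET bracket a text object; Tf sets the font, Td the position, Tj draws.
    content = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


if __name__ == "__main__":
    with open("hello.pdf", "wb") as handle:
        handle.write(build_minimal_pdf("Hello (PDF)"))
```

Opening the resulting file in a text editor shows the `BT ... ET` text operators directly, which is the sort of thing compression normally hides.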

It may not be what you're looking for because it's an API more than a PDF-writing application, but ReportLab [sourceforge.net] is a great high-level PDF-writing API for Python. It's quite easy to write scripts to query DBs and generate good reports. It's also great for charting/graphics. It includes great documentation and lots of example code.

Seriously, if you don't know Python, this is definitely a reason to learn. I've written dozens of tiny systems that pull data from PostgreSQL (MySQL is just as easy) to create special reports for clients. I've also done two fairly large and flexible formats.

The nicest thing about ReportLab is that it gives you primitives like tables, paragraphs, pages, and the like rather than just a drawing library. You also get various chart primitives