Posted by timothy on Monday July 04, 2011 @10:10AM
from the daddy-what-were-operating-systems? dept.

theweatherelectric writes "The pdf.js project aims to implement a PDF viewer using standards-compliant Web technologies. The project has reached its first milestone: it renders the sample PDF (a paper on Mozilla's Tracemonkey JavaScript engine) perfectly. However, that perfection currently comes with some caveats: 'pdf.js produces different results on pretty much every element in the browser×OS matrix. We said above that pdf.js renders the Tracemonkey paper "perfectly" if you're running a Firefox nightly. On a Windows 7 machine where Firefox can use Direct2D and DirectWrite. If you ignore what appears to be a bug in DirectWrite's font hinting. The paper is rendered less well on other platforms and in older Firefoxen, and even worse in other browsers. But such is life on the bleeding edge of the web platform.'"

Even from reading the summary, it is clear that this is very, very early development work. This is their *first* milestone; of course it's going to be severely lacking in almost every way. Of course it's not cross-browser and doesn't allow selectable text... but eventually it will. I, for one, think this is a great idea, and can't wait to see it done!

What's with this trend recently to build everything on fundamentally sucky technologies?

I think it's becoming increasingly obvious that browsers need something that allows native client functionality without the burden of shoehorning everything through JavaScript's loosely typed, garbage-collected, non-addressable world. LLVM is gaining a lot of steam, so perhaps it should be that, with each app seeing a limited API that maps onto the DOM. Perhaps that could even be created from JS, e.g. a vmEval(url, canvas) function that loads bitcode from some URL, turns it into an invokable object wh

No. Java is higher level, garbage collected, has its own system libraries which run independently of the browser, and is a very large environment in its own right.

I'm suggesting that there should be provision for LLVM bitcode to be compiled and executed natively in the browser. Its only interaction with the outside world would be via exposed DOM APIs, which are already security hardened. A canvas would be its "display", its sockets would map to WebSockets, its file I/O to web storage, and so on. The display could

It will be slower than native x86/ARM code by far, and won't integrate well with the desktop environment.

Does your PDF reader integrate well with the browser environment?

One of the major benefits of rendering PDFs in the browser, aside from the fact that users don't have to download, trust, and run a separate PDF viewer, is that you reduce the security vulnerability surface area. PDFs (well, Adobe Reader) are a major vector for attacks, but that goes away when you sandbox the rendering in the browser.

If it's a vulnerability thing, then what you really need to do is go over to Adobe and bitch smack the moron there that decided that it was a good idea to include scripting and linking abilities in a document format. And if you choose the Seattle branch you're just a short way from MS, so you can bitch slap the hell out of them for doing the same sort of bullshit with .DOC.

Documents are for reading, if you want people to be able to fill in a form, then they should have to use a separate program. It's just

Oddly enough, PDF (a descendant of PostScript) has always been more program than data. Like PostScript, a PDF document is a program in a Forth-like language that draws the document on a canvas. Adobe's mistake is in letting it out of the sandbox with bolted-on extras.

A number of others have managed to implement PDF sandboxes (often without the bolt-ons) without all the holes.
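To make the "program that draws on a canvas" point concrete, here is a toy sketch in Python. It is not how any real viewer is written. PDF page content is written as operands followed by an operator, and the sketch models only three real path operators (`m` for moveto, `l` for lineto, `S` for stroke). The `run` function, its return format, and the way it resets the path are all assumptions made for illustration.

```python
# Toy interpreter for a PDF-like postfix drawing language.
# Only three real content-stream operators are modelled: m, l and S.
# The function name and the returned data structure are hypothetical.

def run(stream):
    """Execute a whitespace-separated operator stream; return stroked paths."""
    stack = []      # operand stack: numbers wait here until an operator uses them
    path = []       # points of the path currently being built
    stroked = []    # finished paths, i.e. what a canvas would have drawn

    for token in stream.split():
        try:
            stack.append(float(token))   # numbers are operands
            continue
        except ValueError:
            pass                         # not a number, so treat it as an operator

        if token == "m":                 # x y m : start a path at (x, y)
            y, x = stack.pop(), stack.pop()
            path = [(x, y)]              # simplification: earlier subpaths are dropped
        elif token == "l":               # x y l : add a line segment to (x, y)
            y, x = stack.pop(), stack.pop()
            path.append((x, y))
        elif token == "S":               # S : stroke the current path
            stroked.append(path)
            path = []
        else:
            raise ValueError(f"unsupported operator: {token}")
    return stroked


print(run("10 10 m 100 10 l 100 50 l S"))
```

A sandboxed renderer in this style only ever turns operators into drawing calls, which is why the bolted-on extras (scripting, launching files) are where the holes tend to come from.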

But it seems amazingly inferior to a platform native PDF reader, on any platform imaginable. It will be slower than native x86/ARM code by far, and won't integrate well with the desktop environment.

What's with this trend recently to build everything on fundamentally sucky technologies?

You're absolutely right. A platform native PDF reader is technically superior. But opening up a new window for each PDF you display really sucks as a user experience. To eliminate this sucky UI experience, browsers would need to support PDF natively (I'm not sure why this hasn't happened) and not rely on Adobe Reader or some other helper application. Even if all the major browsers supported that TODAY, it would be literally years before a broad enough spectrum of people had upgraded for inline PDFs to be usable in a design.

What implementing a PDF reader in JavaScript accomplishes is across-the-board inline PDFs today. No upgrades required. I think that's worth some sucky technology and inefficient code.

I'm saying none of that. I'm saying that sometimes it would be very useful to display a PDF inline in the same page, and not have it displayed in another window or tab. Another poster pointed out this is already possible. I'm not familiar with how well this works, or with the limitations of this method. I will say that being able to treat a PDF like any other object and manipulate it programmatically would be a huge advantage for some people.

It would be nice if text-based PDFs interacted with my browser the same way text-based HTML does, so that find, save as and other functions would be browser native rather than the kooky half-breed we have today (with a PDF reader stuffed into a browser tab for some docs and, mysteriously, a new pop-up window with a native PDF reader for others).

But opening up a new window for each PDF you display really sucks as a user experience.

Having "defected" from Win XP to Mac OS X back when Vista was released, it's been many years since I used Windows or Linux for long periods of time, rather than temporarily in VMs for work purposes. Now and then, stories like this, or even entire pieces of technology like the renderer in question, remind me just how awful things still are on other platforms.

Another poster asked if the PDF renderer was integrated into the browser, rather than the OS. What a bizarre question. My PDF renderer is integrated

It's not a kludge, it's not a bodged add-on, it's an extensible, intelligent, well integrated piece of technology that's part of a wider architecture that makes more sense than any other OS architecture I've seen above kernel level.

Indeed. I came from a long line of AmigaOS based systems, after which I finally "gave up" and got an XP based laptop after being forced through Win2k at work. That drove me mad for a while, and after significant playing with desktop Linux (it just never feels "right" to me... very happy with it on my servers, but not my desktop), I've now got mostly Macs in the house.

The system you're referring to here is indeed very simple and elegant. It reminds me a lot of what AmigaOS did with the "datatype" system - Appl

I actually didn't know you could already embed a PDF in a page. I'd guess you can't create your own controls to move to a different page, or manipulate the PDF in other ways, and are reliant upon the helper application.

The point being, viewing a PDF with programmatic controls would allow for a much richer environment than relying on helper applications.

But it seems amazingly inferior to a platform native PDF reader, on any platform imaginable. It will be slower than native x86/ARM code by far, and won't integrate well with the desktop environment.

Regarding speed, two things: First, this will spend most of its time in calls to the browser's Canvas API, which all browsers implement in C++. So it isn't clear that it should be significantly slower than a native implementation. Second, even if this were in 100% JavaScript, that is just around 5X slower than C++ these days. Rendering PDFs might be plenty fast enough at that speed, since you typically render once then show it for a long time. In other words, this isn't something like a game engine that nee

The real question is, if you know you have to display your PDFs in a web browser, why not just convert them to something more web friendly on the server side and then display that to the client? It isn't like you'd be using pdf.js as your PDF viewer for any PDF. It has to be embedded in a website for specific PDFs on that site.

Generally if layout is that important, that means you're printing it. In which case, you probably want to download and print the document from a real PDF viewer. PDF isn't suitable for mobile devices anyway. But then, being on a Mac and using an iPhone, maybe I just take good PDF support for granted? PDF.js sounds like a neat exercise, but I'll keep using Preview. I don't really want your half-assed, slow PDF viewer embedded in my browser, thanks.

The point is that once you can draw a PDF on the screen, you can draw anything. It means you can implement Photoshop in JavaScript. More importantly, it means you can draw something on the screen, and get it to render exactly the way you want it to, on any system. Right now this isn't possible in a browser, and it sucks.

It is the PDF language that matters, which is basically a successor to PostScript, not the bloated document reader.

I currently have PDFs set to be downloaded and opened in an external application, because PDF rendering in a browser tab (using Adobe's PDF plugin) fucks up important shortcuts: Cmd-W no longer closes the tab but throws up an annoying dialog. That alone would be reason enough to switch.

Small note to webmasters everywhere (if you think about what the parent said): what I hate is websites that force PDF files to be downloads instead of letting my browser handle them. On Mac OS X, viewing a PDF is basically the same as viewing a JPEG. No Adobe reader required, it just works.

That's because OS X's underlying display API is... display PDF! Similar to ye olde Solaris Display PostScript. As a side effect, display and generation of PDFs is trivial - you're outputting to a file rather than to the rasterizer.

It's also the reason why PDFs are trivially displayed in iOS as well - again, being based on OS X means it also inherits display PDF.

what I hate is websites that force PDF files to be downloads instead of letting my browser handle them.

The problem is that the web site incorrectly specifies the file's MIME type, e.g. "Content-Type: text/html" instead of "Content-Type: application/pdf". While in theory the ".pdf" extension or content inspection could be used to guess it, Firefox (for example) does not use MIME type guessing since it is a security issue: What should Firefox do with this file? [mozillazine.org].
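For webmasters wondering what "specifying the type correctly" amounts to, here is a minimal Python sketch. The helper name `response_headers` and its `force_download` flag are made up for illustration, and no particular web server or framework is assumed. The header names and values themselves (`Content-Type: application/pdf`, and `Content-Disposition` with `inline` or `attachment`) are standard HTTP.

```python
# Hypothetical helper: build the response headers for a file that a site
# wants the browser to display itself rather than save to disk.

PDF_TYPE = "application/pdf"

def response_headers(filename, force_download=False):
    """Return HTTP response headers (as a dict) for serving `filename`."""
    headers = {}
    if filename.lower().endswith(".pdf"):
        headers["Content-Type"] = PDF_TYPE
    else:
        # Generic fallback; browsers normally offer to save this type.
        headers["Content-Type"] = "application/octet-stream"
    # "inline" asks the browser to display the file if it can;
    # "attachment" asks it to save the file instead.
    disposition = "attachment" if force_download else "inline"
    headers["Content-Disposition"] = f'{disposition}; filename="{filename}"'
    return headers


print(response_headers("paper.pdf"))
```

Sites that "force" a download are typically sending `attachment` or a generic type, so the browser never gets the chance to hand the file to its own viewer.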

Adobe's PDF sucks. I use either:
a) Google Quick View (my favourite),
b) Chrome's built-in PDF viewer (which is fast, and doesn't crash often, and doesn't hang everything while the PDF is being downloaded.), or
c) Foxit's plugin (very rarely),
depending on the browser and OS being used. But I tried it out, and though the rendering was horrible (Chromium daily on Natty), it didn't seem to hang or ask anything on being closed. The slide-out sidebar was neat, but the open file button didn't do anything.

I was hoping somebody around here might explain the point of opening PDFs embedded in the browser. Instead, your post just confirms my own prejudices. The PDF plugins that I've seen trade off screen space for another toolbar, restrict the functionality over standalone PDF viewer, and break the browser's UI. Chrome's handling of PDF was the single reason I ditched it after a few weeks last year when I tried to switch to it from Firefox (even set to open the PDF viewer was broken as it didn't seem to pass

On a similarly Anglo-Saxon note, almost anything ending in ‘x’ may form plurals in ‘-xen’ (see VAXen and boxen in the main text). Even words ending in phonetic /k/ alone are sometimes treated this way; e.g., ‘soxen’ for a bunch of socks. Other funny plurals are the Hebrew-style ‘frobbotzim’ for the plural of ‘frobbozz’ (see frobnitz) and ‘Unices’ and ‘Twenices’ (rather than ‘Unixes’ and ‘Twenexes’; see Unix, TWENEX in main text). But note that ‘Twenexen’ was never used, and ‘Unixen’ was seldom sighted in the wild until the year 2000, thirty years after it might logically have come into use; it has been suggested that this is because ‘-ix’ and ‘-ex’ are Latin singular endings that attract a Latinate plural.

Are you sure you want to do that? I can understand typesetting math in the browser, but typesetting entire TeX documents? There's already an AMS-endorsed, JavaScript-based way of typesetting TeX math called MathJax (http://www.mathjax.org/), and it works pretty well (well enough for sites like http://mathoverflow.net./ [mathoverflow.net.]).

I, for one, find this "we do all things in the browser now" attitude highly suspect. I already have a perfectly fine PDF viewer, called Okular. Why not just give me a link to the PDF file, so that I can download it, use my favorite PDF viewer and print it out if I like?

I would really appreciate their efforts, but I just know this is not being done for my convenience, but for corporate interests. The only reason this is being developed is so that they can put some ads inside the PDF file, prevent me from downloading it or pr

I'm not in a position to try it myself because the PC on which I'm typing this has integrated graphics, which isn't enough to run KDE according to some idiot who doesn't know what he's talking about.

Fixed that for you. KDE 4 works perfectly with integrated graphics, you just have to turn desktop effects off. It's perfectly usable without desktop effects enabled, all applications detect it and degrade gracefully, and all the controls etc. work pretty much the same. I have a laptop with integrated graphics that doesn't support desktop effects, and I don't notice the difference apart from once a week or so when I suddenly wonder why my terminal emulator doesn't have a transparent background.

Is this a troll, or did you just spectacularly miss the point? That forum post is fairly obviously about the system requirements for the KDE equivalent of Aero Glass or whatever it's called these days...

The end game is that shifting focus from desktop applications to cloud applications makes the desktop operating system much less important. Envisage a day when you don't need to run just so you can run that one specific app.

this might sound over the top - but i am sure that given time we will be able to play the new "Crysis" (whatever that might be) in the browser on any operating system. (of course there will still likely be some beefy hardware requirements and a juicy broadband). Although im fairly con

pdf is old vector graphics news. If they want to help a parky [google.com] out they can get TinySVG support built in to Firefox so I can finish rebuilding all of my XUL UIs in SVG....that don't work now unless the user knows how to re-enable support then ends up getting owned instead of a warning like getting a self signed cert... Cough. Sorry. Oh while I'm dreaming, getURL, putURL, and parseXML functions so I don't have to "if (typeof parseXML == 'undefined')" override them every time would be nice too:) Oh and t

I find it quite hilarious that people speak seriously about coding artificial intelligence as if it will happen in this decade, when at the same time we can't even achieve a consistent rendering of the same elements in different browsers.

These problems are generally disjoint. Identical cross browser rendering depends on everyone playing the standards game or everyone playing the let's add fixes for every browser game. The AI problem depends on solving the problem of genuine artificial intelligence without the need to pay attention to cross platform compatibility. That said, I doubt we'll see real AI this decade either:)

You're missing the goal: we need strong AI so that all web documents can be sentient. That way, they'll make a conscious effort to be usable on any kind of browser.
See? It's not because AI is cool, it's our only hope!

This is just silly. While I can appreciate it from a point of curiosity and it is probably a fun project, this is really overloading the browser.

I would submit that things like this are actively breaking the browser paradigm. Every PDF viewer allows you to save a local copy of the PDF after it has read it from the temp directory or the download directory. To implement this thing correctly, it would require that JS have direct access to the file system, which, as I understand it, aint fucking supposed to happen, since that would create untold numbers of security problems in a system already plagued by security problems.

While there may be arguments that this would be ok, they would all be moronic.

The entire notion of the browser needs to be forked out to an application shell with hard as nails security and a presentation shell and never the twain shall meet.

You're presumably voting for #1 and #3, but web designers are voting with their fingers for #2, so a browser's options at that point are limited to either not supporting those and reaping the web-developer hate consequences (cf. IE6) or dropping either #1 or #3. Which one do you think they're

Thank you for saying that. It seems to me that displaying an HTML file isn't much different than displaying a PDF file. At a high level, the program reads the description (e.g. this font size, put the text here, etc.) and hands it off to some renderer.

But WHY? Why spend precious cycles that eat battery life and heat up your PC innards doing the same thing through twenty layers of twisted human logic, when a piece of native runtime plugin code can do it just as well? 'Plugin' is just a word, it doesn't need to be insecure, alien, buggy or . And even if they are that, the problem lies at another level.

If anything, Pdf.js will be suitable where and when energy and resource conservation isn't a factor.

Please enlighten me, a software developer of many years, what is this gold that is Pdf.js? I mean, apart from proof-of-concept being gold in itself.

Gold is just a bit of an overstatement. More like a valuable, but not precious metal like copper.

The value is that you can display a PDF inline with the website, rather than bringing up a clunky external application like Adobe Reader. "But you could do that with a plugin!" you say? Correct, but what you can't do with a plugin is actually get most people to in

This is a great point. With the current scarcity of precious computer cycles left in the world why would we waste them on this? The latest government report on general availability of cycles estimates that we will be running out in the next 50 years. Just think of that - a world with no computer cycles remaining! In fact we'll feel the crunch far before that as the scarcity drives up computer cycle prices. Every cycle needs to be preserved and only used for productive purposes!

pdf.js has now reached the point where a significant portion of its issues are actually browser-rendering-engine bugs, or missing features. Finding these gaps and filling some of them has been one of the biggest returns on our investment in pdf.js so far.

The problem isn't what they've written so much as the browsers not being able to support the latest and greatest HTML5/JS functionality.

I'm guessing you're modded Funny because - sadly - this hasn't been true of PDF in a long time now.

Even between desktop PDF readers there's now too much of a difference to even remotely be able to 'rely' on it. Even bitmaps are getting less and less reliable with applications choosing to either respect or ignore gamma tags, let alone color profile information.

As it is, I used to use Foxit, but that started to get bloated and to include oddball toolbars. So I switched to PDF-XChange. I'm about to switch aga

Actually, Sumatra is pretty nice. It is fast and light on resources. I haven't done any side-by-side comparison, but so far, looking at the ton of PDFs that I have, they look the same as they did in Foxit, so I assume they are rendering correctly.

So if you want to give Sumatra a spin, the easiest way is to use Ninite [ninite.com], which turns "clicky clicky next next next" into click and run. They also have tons of nice software, from CCleaner to Glary Utilities, and all of it TOOLBAR FREE without having to worry about checkboxes hidden on page 5.

Back when I still used Windows, Acrobat Reader was an absurdly large app and by far the slowest PDF reader I knew over all platforms. It always struck me as absurd that Apple and Linux users had built-in, capable, lightweight PDF viewers while most Windows users used that bloated POS. Maybe acroread is better these days, but I kind of doubt it.

We said above that pdf.js renders the Tracemonkey paper "perfectly" if you're running a Firefox nightly on a Windows 7 machine where Firefox can use Direct2D and DirectWrite, if you ignore what appears to be a bug in DirectWrite's font hinting.

The big downside is that it's all images and you can't do all those fancy things you can do with text, like select, copy & search.

I'm working on it. To get text out of pdf.js as is, you just implement a TextGraphics object (like their existing CanvasGraphics one) and just implement the text and coordinate transform commands. There are lots of ways of getting that into a copy/pasteable form afterwards, but it's early days and I'm just coding up the OCR-ish algorithms needed to infer reading order from non-tagged PDF (the most common case).
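A rough illustration of the idea, in Python rather than JavaScript and not based on pdf.js's actual classes: a recorder collects each text fragment with its position instead of painting it, and a naive heuristic then rebuilds the reading order. The names `TextRecorder`, `show_text` and `reading_order` are hypothetical, and the heuristic assumes a single column with PDF-style coordinates (y grows upward). Real documents need much more than this.

```python
# Hypothetical stand-in for the "TextGraphics" idea: record each text
# fragment with its page position instead of painting it on a canvas.

class TextRecorder:
    def __init__(self):
        self.fragments = []          # (x, y, text) tuples in drawing order

    def show_text(self, x, y, text):
        self.fragments.append((x, y, text))


def reading_order(fragments, line_tolerance=2.0):
    """Group fragments into lines by y, then read top-to-bottom, left-to-right.

    Assumes a single column and coordinates where y grows upward.
    """
    lines = []                       # each entry: [line_y, [(x, text), ...]]
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if lines and abs(lines[-1][0] - y) <= line_tolerance:
            lines[-1][1].append((x, text))   # close enough in y: same line
        else:
            lines.append([y, [(x, text)]])   # otherwise start a new line
    return [" ".join(t for _, t in sorted(parts)) for _, parts in lines]


rec = TextRecorder()
rec.show_text(150, 700, "world")     # drawn first, but sits to the right
rec.show_text(72, 700.5, "Hello")
rec.show_text(72, 680, "Second line")
print(reading_order(rec.fragments))
```

Drawing order and reading order differ even in this tiny case, which is why some inference step is needed for untagged files.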

I'm not associated with the project, but this is on their todo list too, and someone else might get i