An anonymous reader writes "Researchers at the University of Massachusetts have created a tool for automatically searching handwritten historical documents, such as the 140,000 pages that make up George Washington's personal papers in the Library of Congress. The most interesting part is that the papers are scanned versions of the originals and the search tool actually recognizes the handwritten text from these images."

I'd hate to have to type in my equations; there's a feel to working things out with pen and paper. Besides, the tactile sensation of writing on paper is simply wonderful. No amount of typing can replace that.

Then argue that, not the "tactile sensation of writing on paper". No technology feels like the last, and almost every technology has people who appreciate its particular sensations. That doesn't stop them from getting replaced; the only thing that does that is real arguments.

Maybe if people had taken their handwriting classes in 2nd and 3rd grade seriously, they would not be confusing 5 and S.

I wasn't arguing for handwriting, I was putting across my opinion. Didn't realize one had to justify one's opinions. I was giving my reasons for preferring pen and paper, and the tactile sensation is one of the most important factors. I like the feel of writing, and it aids my problem solving capabilities.

And I wasn't blaming the user, but the user has as much responsibility as the language. The alternative is to change the language, which is fairly hard. Besides, there are obvious advantages to writing th

You have to change the orthography, which several Turkic languages have done three times in the last hundred years.

Even if you have a Tablet PC, you're still doing the same thing.

If the Tablet PC converts what you write to character data (as opposed to images), then there are crucial differences. You can output in an easy-to-read form that's easy to check for errors and easy for other people to decipher. Your input method is less importan

You have to change the orthography, which several Turkic languages have done three times in the last hundred years.

True, but English has been adopted far more widely and has a lot more speakers across the world than Turkish. It would be next to impossible to undertake a mammoth task such as that.

If the Tablet PC converts what you write to character data (as opposed to images), then there are crucial differences. You can output in an easy-to-read form that's easy to check for errors and easy for other pe

Comparing a language to an operating system is quite ridiculous. You write, read and communicate in English practically every waking minute of your life, starting in childhood. People *think* in English.

An operating system is hardly as ubiquitous.

Language skills are learnt and neural pathways formed when you are quite young; it would take a lot to change that in people.

The last time I checked, people didn't think in Windows API. And the last time I checked, people didn't write their grocery lists in Visual C++. Nor did kids play with blocks of Linux API when they were 3 years old; they were playing with alphabet blocks.

In fact, the last time I checked, people had no clue about either of those until they were well versed in a spoken and written language called English.

TeX and LaTeX are neat for writing papers, but not for doing your lab notes or solving a research problem. Writing also helps you think while you are at it, because of the time it takes to get your idea on paper. Not to mention the ease of switching modes: I can write, draw and do everything without having to switch between programs. Thought to action, the easiest possible way.

Some of us who have been typing/keyboarding since we were wee lads can't even remember how to write in cursive.

I think 3rd or 4th grade is maybe the last time you have to use cursive. I do, however, highly recommend giving your kids touch-typing classes, so that they, too, can keyboard with fluidity (and rapidly lose their writing skills too).

For me, it is a speed issue: I can type MUCH faster than I can write, so when I have a lot to do, typing on a computer is the way to go (plus, I can't live wi

I write out my checks in cursive. The other day I was admiring how pretty my cursive looked and how well it had developed from when I was in second grade and told to "TRY HARDER WEAKLING OR YOU WILL NEVER GET A JOB!". Then I realized just how ghey it was that I was enjoying the sight of it and hurriedly gave it to the cashier... who was a guy... who (ick) winked at me.

In fact, I have two sets of handwriting: all my equations and math stuff is written straight up, and the rest of the stuff goes cursive. It makes it a lot easier for me (and those reading it) to decipher what I've written.

Cursive also made me write a whole lot faster - the flow that you get from cursive is something that makes one enjoy writing.

I'm 23 and I write in perfect cursive. In fact, I prefer it to typing. Maybe I like it because I suffered a serious injury to my hand when I was 12 that forced me to learn to use it again from scratch... I dunno. I just like to write; it relaxes me.

Yes, and I use it to record notes in the lab book I keep at work. I record all sorts of things I discover there. Some entries are several pages long, with charts and graphs and tables and diagrams. Try doing that in a few minutes in Word or OpenOffice.

The best part is I don't have to worry about backing up my lab books. The only real threat is fire, and fire is no more dangerous to paper than it is to CDs or hard drives.

While the cursive handwriting of the 1700s and early 1800s may seem curious to us (notably the tall 's' that looks like an 'f'), it is a very easy style that is neat, legible, and painless. Notice how few back strokes there are.

For those who are wondering, cursive is what you use when you get sick of trying to write in print legibly and quickly without getting carpal tunnel. Every culture has it. It's unfortunate it isn't common knowledge anymore in the US. Handwriting is a wonderful skill. It used to be that people would judge others based on their handwriting skills in addition to their oratory.

There's also water. If I spill a Coke on my keyboard, all my data's safe; if I spill one on my notebook, it's all gone.

Of course, you use a ballpoint pen for lab notebooks, not fountain pens or other pens based on water-soluble inks. Of course, this won't help you if you spill vodka.:-)

Anyway, in lab situations you might not have a place nearby to put a laptop and you might be running between different laboratories so a laptop is often not very convenient. I was taught that you should write observation

When I was a lad I served a term
As office boy to an attorney's firm
I cleaned the windows and I swept the floor
And I polished up the handle of the big front door
I polished up that handle so carefully
That now I am the Ruler of the Queen's Navy

As office boy I made such a mark
That they gave me the post of a junior clerk
I served the writs with a smile so bland
And I copied all the letters in a big round hand
I copied all the letters in a hand so free
That now I am the Ruler of the Queen's Navy

I had a friend in high school who always wrote in cursive, and this was...a year ago, so I'm pretty sure he's still under 30. I think that he was the only one in the whole school who still did, though.

I do. And I do it well, and I'm proud of it. Of course, I'm in the dying breed that considers the ability to write legibly by hand a part of fluency in one's language. Maybe I should just give in and go back to third grade where I belong.

Am I the only one who read this and actually thought, "Damn... I can write in cursive... I think... I should give that a try"? And then shivered at the thought of the nuns who taught it to me, ruler in hand, ready to smack my knuckles with it if I screwed up.

Anyway, watching Full Metal Jacket reminds me of Catholic school. To continue my ramble, how many people who actually went to Catholic school aren't currently atheists? I'm guessing not too many.

Does anybody else who is under 30 still write in cursive, other than when they made you do it in elementary school?

I'm 23 and I use both print and cursive. I use print for anything that someone else will have to read (very rare) or for things people make me write that I don't really care about (taking notes in class). Cursive is used for things I want to write. For example, all the first drafts of my London Journal [colingregorypalmer.net] are done in cursive in a notebook I always keep on me.

Interesting you should ask, as I was recently discussing this with some friends. (Probably all over 30, though in my case only just.)

They all still use joined-up (cursive) writing, as do most other people I know. I, on the other hand, haven't used it since I was at uni and found I had trouble reading my writing: I investigated various writing styles and types, and concluded that I could print (i.e. write mostly not joined-up) pretty much as fast as I could write joined-up, and that the result was vastly

And college students during exam season. (Can't speak for the Koreans.)

Blue-stained hands up, all those who remember those glorious essay exams from the mandatory humanities courses, where your grade ceases to be based on the merits of your ideas (and/or your ability to parrot your professor's ideas), but is solely a function of how well-developed the muscles in your right hand are, in order to keep scribbling for the entire three hours what would

Popular handwriting recognition software doesn't work like that: it gains much of its information from the "pen" strokes used to create the letters.
There's less information in a "finished" printed page than you'd get by tracking the movements a pen made to write it.
For an example of this different approach, see this paper describing handwriting recognition using pen-mounted accelerometers [grenoble-soc.com].
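To illustrate why a finished page carries less information than the pen's movements, here is a minimal Python sketch. The stroke representation and the `rasterize` helper are assumptions made for illustration, not any recognizer's actual API: two pen trajectories that draw the same plus sign in a different order and direction leave exactly the same set of inked pixels, so software working only from the scan cannot tell them apart.

```python
# Minimal sketch: "online" data is a list of strokes (ordered pen positions);
# "offline" data is just the set of inked pixels left on the page.
Point = tuple[int, int]
Stroke = list[Point]


def rasterize(strokes: list[Stroke]) -> frozenset[Point]:
    """Return the inked pixels, discarding stroke order and drawing direction."""
    pixels: set[Point] = set()
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for i in range(steps + 1):
                # Linear interpolation between consecutive pen samples.
                pixels.add((x0 + round(i * (x1 - x0) / steps),
                            y0 + round(i * (y1 - y0) / steps)))
    return frozenset(pixels)


# Two hypothetical ways to draw a plus sign on a 5x5 grid.
plus_a = [[(0, 2), (4, 2)], [(2, 0), (2, 4)]]  # horizontal bar first, left to right
plus_b = [[(2, 4), (2, 0)], [(4, 2), (0, 2)]]  # vertical bar first, reversed directions

print(rasterize(plus_a) == rasterize(plus_b))  # True: the scans are identical
print(plus_a == plus_b)                        # False: the pen data differs
```

An online recognizer gets to see the stroke order and direction that the set of pixels throws away.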

Somebody invented a way for computers to recognize handwriting. Like, so 10 years ago.

I worked on an OCR system about 20 years ago. There were no predefined bitmaps of text; you trained the system on the font to be recognized. After a few hours you could turn it loose, and it did fairly well. While goofing off, we tried handwritten text. With good penmanship, it worked to a degree.
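"Training the system on the font" can be pictured with one of the simplest trainable classifiers, a nearest-mean template matcher. This is a hypothetical sketch, not the system described above: `train` averages sample bitmaps for each character into a template, and `classify` picks the template closest to a new glyph. The 3x3 glyphs are toy data.

```python
Bitmap = tuple[int, ...]  # flattened black/white glyph, 1 = ink


def train(samples: dict[str, list[Bitmap]]) -> dict[str, list[float]]:
    """Average each character's training bitmaps into a single template."""
    return {
        char: [sum(pixel_values) / len(bitmaps) for pixel_values in zip(*bitmaps)]
        for char, bitmaps in samples.items()
    }


def classify(templates: dict[str, list[float]], glyph: Bitmap) -> str:
    """Pick the character whose template is closest in squared Euclidean distance."""
    def distance(template: list[float]) -> float:
        return sum((t - g) ** 2 for t, g in zip(template, glyph))
    return min(templates, key=lambda char: distance(templates[char]))


# Hypothetical 3x3 training samples, two per character, with a little noise.
samples = {
    "I": [(0, 1, 0, 0, 1, 0, 0, 1, 0), (0, 1, 0, 0, 1, 0, 0, 1, 1)],
    "-": [(0, 0, 0, 1, 1, 1, 0, 0, 0), (0, 0, 0, 1, 1, 1, 0, 0, 1)],
}
templates = train(samples)
print(classify(templates, (0, 1, 0, 0, 1, 0, 1, 1, 0)))  # I
print(classify(templates, (0, 0, 0, 1, 1, 1, 0, 0, 0)))  # -
```

Handwriting breaks this kind of matcher quickly because each writer's glyphs drift far from any averaged template, which fits the "worked to a degree" experience.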

Yes, they are. They are not using an off-the-shelf OCR package. The OCR functionality is embedded into their software, it is highly specialized, but it is OCR. For those who are fixated on the letter 'C', recognizing multiple characters as a single unit is nothing new.
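One well-known way to treat a whole word as a single unit, instead of segmenting it into characters, is to reduce each word image to a sequence of per-column features and compare sequences with dynamic time warping (DTW), which tolerates the same word being written wider or narrower. The sketch below assumes toy word images drawn as text rows and a single ink-count feature per column; `ink_profile` is hypothetical, and this is not claimed to be what the UMass tool does.

```python
def ink_profile(rows: list[str]) -> list[int]:
    """Count ink pixels ('#') in each column of a word image drawn as text rows."""
    return [sum(row[c] == "#" for row in rows) for c in range(len(rows[0]))]


def dtw_distance(a: list[int], b: list[int]) -> float:
    """Dynamic time warping with absolute-difference cost, O(len(a) * len(b))."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


# Hypothetical toy "word images": the same shape written narrow and wide,
# plus a different shape.
narrow = [".##.", "#..#", "#..#"]
wide = [".###.", "#...#", "#...#"]
other = ["####", "....", "####"]

print(ink_profile(narrow))                                    # [2, 1, 1, 2]
print(dtw_distance(ink_profile(narrow), ink_profile(wide)))   # 0.0
print(dtw_distance(ink_profile(narrow), ink_profile(other)))  # 2.0
```

A small distance means "probably the same word", so a searcher can cluster or retrieve matching word images without ever deciding which letters they contain.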

I agree; my grandmother was heavily into genealogy. She had hundreds of pages of neatly handwritten, non-cursive documents. I tried to scan them with many different OCR programs, but none came even close to deciphering the text without skewing it badly. I tried ABBYY, OmniPage Pro 14, and a few others. Has anyone had any success with this kind of thing?

These documents are old and handwritten. Why waste the processing power deciphering the text on every search when you can decipher it once with a similar algorithm and search an index built that way? It's not like the information is ever going to change (unless we do rewrite history).

Um, that's almost certainly what they did. Running OCR over 140,000 pages every time you do a search is nearly impossible. I only say "nearly" because, in theory, you can do it, but then searches would take a few days to complete for zero net gain.
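The "recognize once, search many times" idea boils down to an inverted index. A minimal Python sketch, assuming the expensive recognition step has already produced a text string per page (the `pages` dict below is hypothetical sample output, not real data from the Washington papers): `build_index` runs once, and each query is then only a few set lookups and intersections.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'"


def build_index(pages: dict[int, str]) -> dict[str, set[int]]:
    """Run once, offline: map each recognized word to the pages containing it."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for page_no, text in pages.items():
        for token in text.lower().split():
            word = token.strip(PUNCTUATION)
            if word:
                index[word].add(page_no)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Run per query: intersect the page sets of all query words (AND search)."""
    words = [w.strip(PUNCTUATION) for w in query.lower().split()]
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical recognizer output, one string per scanned page.
pages = {
    1: "Dear Governor, the army marches at dawn.",
    2: "Orders for the army.",
    3: "Letter to the Governor.",
}
index = build_index(pages)
print(sorted(search(index, "army")))           # [1, 2]
print(sorted(search(index, "Governor army")))  # [1]
print(search(index, "navy"))                   # set()
```

The cost of recognition is paid once per page at indexing time; query time grows with the number of matching pages, not with the size of the whole collection.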

These documents are old and handwritten. Why waste the processing power deciphering the text on every search when you can decipher it once with a similar algorithm and search an index built that way? It's not like the information is ever going to change (unless we do rewrite history).
Context, context, context!
If there's one thing I've learned in all of my schooling (and there is a lot), it is that how the information is portrayed is just as important as the information itself. Think about hearing vs

I took a lot of notes in college. I took a lot more notes in graduate school. I've even taken notes on books I've read for fun. If I could run all of these through my scanner and search them from an application on my desktop, I could be really obnoxious in an argument.

It's an interesting approach that should be extended to languages other than English. Most of the world's history is not about the US, and it has certainly not been written down in English. What I would really like to have is a similar tool that can search, say, Greek or Latin (or whatever) handwritten text. Imagine being able to query Ovid for an item of interest without having to consult everything he's written. I can imagine that this might encourage people to study the classics (a pet peeve of mine is that many people lack historical sense...), and it would certainly facilitate research in this area.
If you can put the queries in English, with the search engine taking care of translation, it would be even better. Then, extended historical study comes within everyone's reach and the classical studies (or humaniora) might be transformed.

Their handwriting recognition system doesn't work for shit. It couldn't even correctly retrieve results from words that I know are in its scanned letters. The word "governor" appears as a result from one of their suggested queries (*cough* hard coded results *cough*), but if you do a separate search for governor it returns stuff that doesn't even contain the word.

Er.... Am I seriously missing something here, or was some mod just fooled by a troll?

Let's examine your definitions:
OCR: document -> RGB (via light) -> pixels -> pattern recognition
PTC: document -> pixels (via light) -> RGB -> pattern recognition
Of course, you forget that there are no RGB values here, because it's black and white, so there is only a brightness value per pixel left. So what is the difference?
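The point that a black-and-white scan leaves only one brightness value per pixel can be sketched in a few lines of Python. This assumes the common ITU-R BT.601 luma weights and a fixed threshold of 128, both choices made for illustration; real document-imaging pipelines often use adaptive thresholds (Otsu's method, for example) instead.

```python
def luminance(r: int, g: int, b: int) -> float:
    """Collapse an RGB pixel to one brightness value (0-255) using ITU-R BT.601 weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def binarize(pixels: list[tuple[int, int, int]], threshold: float = 128.0) -> list[int]:
    """Mark each pixel 1 (ink) if darker than the threshold, else 0 (paper)."""
    return [1 if luminance(*p) < threshold else 0 for p in pixels]


# Hypothetical scanned pixels: white paper, black ink, faded brown ink, yellowed paper.
scan = [(255, 255, 255), (0, 0, 0), (120, 100, 80), (200, 190, 170)]
print(binarize(scan))  # [0, 1, 1, 0]
```

Whichever order the two pipelines list their steps in, both end up feeding the recognizer the same kind of per-pixel ink/paper map.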

Holy shnikes! Optical Character Recognition! Bah... I'm part of a research team at the Center for Cybermedia Research that is working on new algorithms for OCR with $4 million from Homeland Security. It's to be used on a ginormous database containing scanned images of documents relating to Yucca Mountain.

On top of that, OCR has been around for years. Yes, it isn't the best, but it's functional. Doesn't the Census Bureau use OCR for its census forms?

Faster to hand-type?! Do you have any idea whatsoever of the number of handwritten documents the Library of Congress contains? I can't give you a number, but rest assured that it is more than any army of typists would care to copy. But computers don't care about work volume, and they are a hell of a lot quicker than any human typist could ever hope to be.

Great. Now people are just going to spoof documents and put pr0n or enlargement spam in the PDFs when I search for anything academic. I'm glad I don't have that problem yet when finding PDF papers via Google.

Although it is hard to OCR text and very hard to OCR cursive text written in historical documents, performing searches on those documents does not require a complete comprehension of the text and is therefore much easier to do.

For instance, the software may be unable to distinguish the word "bug" from "dog" in one person's handwriting, but it can still tag the word with probabilities for each possible reading.

If a person later searches for "bug" or "dog" along with other terms, the likelihood of a match can be calculated, and the searcher can make his or her own judgement about the meaning of the text.
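Here is a minimal Python sketch of that probabilistic matching idea. It assumes a hypothetical recognizer that stores, for each word image, a probability distribution over candidate readings, and it assumes word images and query terms are independent, which is a simplification. The scoring rule is one plausible choice, not the method the researchers actually use.

```python
from math import prod

# Hypothetical recognizer output: each word image on a page is stored as a
# distribution over candidate readings instead of a single guess.
Page = list[dict[str, float]]


def term_probability(page: Page, term: str) -> float:
    """P(term appears at least once on the page), assuming word images are independent."""
    return 1.0 - prod(1.0 - word.get(term, 0.0) for word in page)


def score(page: Page, query: list[str]) -> float:
    """Score a page for a multi-term query as the product of per-term probabilities."""
    return prod(term_probability(page, term) for term in query)


def rank(pages: dict[str, Page], query: list[str]) -> list[tuple[str, float]]:
    """Return (page id, score) pairs with the most likely matches first."""
    scored = [(page_id, score(page, query)) for page_id, page in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


pages = {
    "letter_a": [{"bug": 0.6, "dog": 0.4}, {"farm": 0.9, "form": 0.1}],
    "letter_b": [{"dog": 0.9, "dig": 0.1}, {"form": 0.8, "farm": 0.2}],
}
ranking = rank(pages, ["dog", "farm"])
print([(page_id, round(s, 2)) for page_id, s in ranking])
# [('letter_a', 0.36), ('letter_b', 0.18)]
```

Neither letter is ever definitively transcribed; the searcher just gets the pages ordered by how likely they are to contain the query and judges the ambiguous words by eye.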

In the legal field, finding context in a search is typically as important as (or more important than) finding a single word... Products like Summation (Summation.com) and Adobe's industrial-strength Acrobat Capture (? - may have a new name... Server-based - uses "hot folders" that are monitored, batches, etc.) have OCR capabilities that are pretty flexible, reading from text, PDF, MS Word, JPEG, BMP, GIF, or TIFF... Of course, these can be expensive...
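The "hot folder" pattern is simple enough to sketch. Below is a hypothetical single polling pass in Python, not Acrobat Capture's or Summation's actual interface: it picks up files with supported extensions that haven't been seen yet and groups them into batches for an OCR job. The `scan_hot_folder` name, the extension set, and the batch size are all assumptions.

```python
import tempfile
from pathlib import Path

# Extensions matching the formats listed above; the exact set is an assumption.
SUPPORTED = {".txt", ".pdf", ".doc", ".docx", ".jpg", ".jpeg", ".bmp", ".gif", ".tif", ".tiff"}


def scan_hot_folder(folder: Path, seen: set[Path], batch_size: int = 10) -> list[list[Path]]:
    """One polling pass: pick up supported files not seen before and group them into batches."""
    new_files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED and p not in seen
    )
    seen.update(new_files)
    return [new_files[i:i + batch_size] for i in range(0, len(new_files), batch_size)]


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ["a.tif", "b.pdf", "c.jpg", "notes.xyz"]:
        (folder / name).write_text("placeholder")
    seen: set[Path] = set()
    first = scan_hot_folder(folder, seen, batch_size=2)
    second = scan_hot_folder(folder, seen, batch_size=2)  # nothing new on the second pass
    print([[p.name for p in batch] for batch in first])   # [['a.tif', 'b.pdf'], ['c.jpg']]
    print(second)                                          # []
```

A server product would run a pass like this on a timer (or use OS file-change notifications) and hand each batch to the OCR engine.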

"Do you think that OCR is actually the wrong way to think about this problem? After all, we don't really care about characters, but rather about what words and ideas have been written. Do you have a strong background in pattern recognition, machine learning, image processing and computer graphics? Google currently "reads" almost every web page in the world. Come help us read all the printed material as well!"

"Right now, searching a scanned handwritten document is very hard to do. Scanned historical documents are basically images, or pictures, and currently can only be searched if someone manually transcribes the documents or creates an index of their contents. This is time-consuming and expensive to do. Given the cost, most handwritten documents are never transcribed or indexed," Manmatha says. "But there is an enormous amount of handwritten, historical material.