ocr

With his latest project, [Roni Bandini] has simultaneously given the world a new type of audiobook and music. Traditional audiobooks are basically the adult equivalent of having somebody read you a bedtime story, but BookSound actually turns the written word into electronic music. You won’t be able to boast to your friends that as a matter of fact, you have read that popular new novel, but at least you might be able to dance to it.

[Roni] says he’s still working on perfecting the word-to-music mapping, so the results shown in the video after the break are still a bit rough. But even in these early stages there’s no denying this is a unique project, and we’re excited to see where it goes from here.

Inside the classy-looking 3D printed enclosure is a Raspberry Pi, an OLED display, and the button and switch that make up the extent of the device’s controls. At the end of the arm is a standard Raspberry Pi Camera module, which gives the BookSound a bird’s eye view of the book to be songified.

To turn your favorite book into electronic beats, simply open it up, put it under the gaze of BookSound, and press the button on the front. Because the Raspberry Pi isn’t exactly a powerhouse, it takes about two minutes for it to scan the page, perform optical character recognition (OCR), and compose the track before you start to hear anything.

If you’re wondering what the secret sauce is to turn words into music, [Roni] isn’t ready to share his source code just yet. But he was able to give us a few high-level explanations of what’s going on inside BookSound. For example, to generate the song’s BPM, the software will count how many words per paragraph are on the page: so a book with shorter paragraphs will consequently have a faster tempo to match the speed at which the author is moving through ideas. Similarly, drum kicks are generated based on the number of syllables in each paragraph. In the future, he’s looking at adding “lyrics” by running commonly used words on the page through a text to speech engine and inserting them into the beat.
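Since the real source isn't public, here is only a rough Python sketch of the kind of mapping described above. Everything in it is an assumption for illustration: the function names, the 70 to 160 BPM range, the 10 and 150 word anchors, the vowel-group syllable estimate, and the rule that turns syllables into kicks. It is not [Roni]'s code.

```python
import re

def count_syllables(word: str) -> int:
    """Rough estimate: one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def page_to_track(page_text: str, min_bpm: int = 70, max_bpm: int = 160) -> dict:
    """Map OCR'd page text to a tempo and a kick count per paragraph."""
    # Paragraphs are assumed to be separated by blank lines in the OCR output.
    paragraphs = [p.split() for p in re.split(r"\n\s*\n", page_text) if p.strip()]
    if not paragraphs:
        return {"bpm": min_bpm, "kicks": []}

    avg_words = sum(len(p) for p in paragraphs) / len(paragraphs)
    # Hypothetical scale: 10 words or fewer per paragraph gives max_bpm,
    # 150 or more gives min_bpm, with a linear ramp in between.
    t = min(1.0, max(0.0, (avg_words - 10) / 140))
    bpm = round(max_bpm - t * (max_bpm - min_bpm))

    # Hypothetical rule: syllables per paragraph, folded into 1..8 kicks per bar.
    kicks = [1 + sum(count_syllables(w) for w in p) % 8 for p in paragraphs]
    return {"bpm": bpm, "kicks": kicks}
```

Short, punchy paragraphs land at the fast end of the range and dense ones at the slow end, which matches the behavior described, even if the actual numbers inside BookSound are surely different.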

Java isn’t everyone’s cup of tea. With all its boilerplate and overhead, you’re almost always better off with a proper IDE that handles everything under the hood for you. However, if you’re learning a new language, you don’t really want to be bothered setting up a clunky and complex IDE. If only you could use a simple, standard Windows program that you are most likely already familiar with. This wish led [RubbaBoy] to create the MSPaintIDE, a Java development environment that lets you write your code in, yes, MS Paint.

If you’re thinking now that you will end up writing your program with MS Paint’s text tool and creating a regular image file from it, then you are right. Once set up, MSPaintIDE will compile all your PNG source files into a regular Java JAR file. And yes, it has syntax highlighting and a dark theme. [RubbaBoy] uses a custom-made OCR to transform the image content into text files and wraps it all into a few-button-click environment, including git integration. You can see a demonstration of it in the video after the break, and find the source code on GitHub.

One of the big problems with doing PCB layout is finding a suitable footprint for the components you want to use. Most tools have some library, although of course some are better than others. You can often get by with using some generic footprint, too. That’s not handy for schematic layout, though, because you’ll have to remember what pin goes where. But if you can’t find what you are looking for, SnapEDA is an interesting source of components available for many different layout tools. What really caught our eye though was a relatively new service they have that uses computer vision and OCR to generate schematic symbols directly from a datasheet. You can see it work in the video below.

The service seems to be tied to parts that the database already knows about and that have a known footprint available. As you’ll see in the video, it will dig up the datasheet and let you select the pin table inside. The system does OCR on that part of the datasheet, then lets you modify the result and add anything that it missed.

People with dementia have trouble with some of the things we take for granted, including dressing themselves. It can be a remarkably difficult task involving skills like balance, pattern recognition inside of other patterns, ordering, gross motor skill, and dexterity to name a few. Just because something is common, doesn’t mean it is easy. The good folks at NYU Rory Meyers College of Nursing, Arizona State University, and MGH Institute of Health Professions talked with a caregiver focus group to find a way for patients to regain their privacy and replace frustration with independence.

Although this is in the context of medical assistance, it represents one of the ways we can offload cognition or judgment to computers. The system works by detecting movement when someone approaches the five-drawer dresser. Spoken directions play and green lights on the top drawer light up when it is time to open the drawer and don the clothing inside. Once the system detects the article is being worn appropriately, the next drawer’s light comes on. A camera seeks a matrix code on each piece of clothing, and if it times out, a caregiver is notified. There is no need for an internet connection, nor should one be given.
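The drawer-by-drawer flow described above is essentially a small state machine. Here is a minimal Python sketch of that idea, not the researchers' software: the `Dresser` class, the garment code strings, the 120 second timeout, and the action strings are all made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Dresser:
    drawers: list             # expected garment codes, top drawer first
    timeout_s: float = 120.0  # hypothetical wait before alerting a caregiver
    current: int = 0          # index of the drawer currently lit
    alerts: list = field(default_factory=list)

    def step(self, seen_code, elapsed_s: float) -> str:
        """Process one camera observation and return the next action."""
        if self.current >= len(self.drawers):
            return "done"
        if seen_code == self.drawers[self.current]:
            # Garment detected as worn: move on to the next drawer.
            self.current += 1
            if self.current == len(self.drawers):
                return "done"
            return f"light drawer {self.current + 1}"
        if elapsed_s > self.timeout_s:
            # Expected code never showed up in time: flag it for the caregiver.
            self.alerts.append(self.drawers[self.current])
            return "notify caregiver"
        return "wait"
```

Here `seen_code` stands in for whatever the camera decodes from the matrix code, or `None` when nothing is recognized. Everything runs locally, in keeping with the no-internet design.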

Currently, the system has a good track record with identifying the clothing, but it is not proficient at detecting when it is worn correctly, which could lead to frustrating false alarms. Matrix codes seemed like a logical choice since they could adhere to any article of clothing and get washed repeatedly but there has to be a more reliable way. Perhaps IR reflective threads could be sewn into clothing with varying stitch lengths, so the inside and outside patterns are inverted to detect when clothing is inside-out. Perhaps a combination of IR reflective and absorbing material could make large codes without being visible to the human eye. How would you make a machine-washable, machine-readable visual code?

We can almost count on our eyesight to fail with age, maybe even past the point of correction. It’s a pretty big flaw if you ask us. So, how can a person with aging eyes hope to continue reading the printed word?

There are plenty of commercial document readers available that convert text to speech, but they’re expensive. Most require a smart phone and/or an internet connection. That might not be as big of an issue for future generations of failing eyes, but we’re not there yet. In the meantime, we have small, cheap computers and plenty of open source software to turn them into document readers.

[rgrokett] built a RaspPi text reader to help an aging parent maintain their independence. In the process, he made a good soup-to-nuts guide to building one. It couldn’t be easier to use—just place the document under the camera and push the button. A Python script makes the Pi take a picture of the text. Then it uses Tesseract OCR to convert the image to plain text, and runs the text through a speech synthesis engine which reads it aloud. The reader is on as long as it’s plugged in, so it’s ready to work at the push of a button. We can probably all appreciate such a low-hassle design. Be sure to check out the demo after the break.
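For a feel of how little glue such a reader needs, here is a hedged Python sketch of the OCR-then-speak step. It is not [rgrokett]'s script. It assumes the photo has already been saved to disk, that the `tesseract` command-line tool is on the PATH, and that `espeak` is the speech engine, which is our stand-in since the engine isn't named above. The function names are ours too.

```python
import re
import subprocess

def clean_ocr_text(raw: str) -> str:
    """Tidy OCR output so it reads smoothly when spoken."""
    # Rejoin words split across lines with a hyphen. This will also merge a
    # genuine hyphenated compound that happens to break at a line end.
    text = re.sub(r"-\s*\n\s*", "", raw)
    # Collapse line breaks and runs of spaces into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

def read_document(image_path: str) -> str:
    """OCR an already-captured image and speak the result."""
    # "stdout" as the output base makes tesseract print the recognized text.
    raw = subprocess.run(
        ["tesseract", image_path, "stdout"],
        capture_output=True, text=True, check=True,
    ).stdout
    text = clean_ocr_text(raw)
    # Assumed speech engine: espeak, fed through stdin.
    subprocess.run(["espeak", "--stdin"], input=text, text=True, check=True)
    return text
```

The cleanup step matters more than it looks: without it, the synthesizer pauses at every line break of the printed page.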

We often read about the minicomputers of the 1960s, and see examples of their use in university research laboratories or medium-sized companies where they might have managed the accounts. It’s tempting though to believe that much of the world in those last decades of the analogue era remained untouched by computing, only succumbing in the decade of the microcomputer, or of the widespread use of the Internet.

What could be more synonymous with the pre-computing age than the mail system? Hundreds of years of processing hand-written letters, sorted by hand, transported by horses, boats, railroads and then motor transport, then delivered to your mailbox by your friendly local postman. How did minicomputer technology find its way into that environment?

Thus we come to today’s film, a 1970 US Postal Service short entitled “Reading And Sorting Mail Automatically”. In it we see the latest high-speed OCR systems processing thousands of letters an hour and sorting them by destination, and are treated to a description of the scanning technology.

If a Hackaday reader in 2017 was tasked with scanning and OCR-ing addresses, they would have high-resolution cameras and formidable computing power at their disposal. It wouldn’t be a trivial task to get it right, but it would be one that given suitable open-source OCR software could be achieved by most of us. By contrast the Philco engineers who manufactured the Postal Service’s scanners would have had to create them from scratch.

This they performed in a curiously analogue manner, with a raster scan generated by a CRT. First a coarse scan to identify the address and its individual lines, then a fine scan to pick out the line they needed. An optical sensor could then pick up the reflected light and feed the information back to the computer for processing.

The description of the OCR process is a seemingly straightforward one of recognizing the individual components of letters, which probably required some impressive coding to achieve within the limited resources of a 1960s minicomputer. The system couldn’t process handwriting; instead it was reserved for OCR-compatible business mail.

Finally, the address lines are compared with a database of known US cities and states, and each letter is routed to the appropriate hopper. We are shown a magnetic drum data store, the precursor of our modern hard drives, and told that it holds an impressive 10 megabytes of data. For 1970, that was evidently a lot.
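That last lookup step is easy to picture in modern terms. The Python sketch below is purely illustrative and has nothing to do with how the Philco machine was actually programmed: the `HOPPERS` table, the hopper numbers, the reject hopper, and the assumption that the final address line reads "CITY, ST ..." are all ours.

```python
# Hypothetical routing table: (city, state) -> hopper number.
HOPPERS = {
    ("PHILADELPHIA", "PA"): 1,
    ("CHICAGO", "IL"): 2,
    ("NEW YORK", "NY"): 3,
}
REJECT_HOPPER = 0  # anything unrecognized goes here for manual sorting

def route(address_lines: list) -> int:
    """Pick a hopper from the last OCR'd address line, e.g. 'Chicago, IL 60601'."""
    if not address_lines:
        return REJECT_HOPPER
    # Split at the last comma: city on the left, state (and the rest) on the right.
    city, sep, rest = address_lines[-1].rpartition(",")
    tokens = rest.split()
    if not sep or not tokens:
        return REJECT_HOPPER
    # Normalize case and spacing before looking the pair up.
    key = (" ".join(city.upper().split()), tokens[0].upper())
    return HOPPERS.get(key, REJECT_HOPPER)
```

A dictionary lookup is instant on anything we'd use today. The 1970 system had to do the equivalent against a spinning magnetic drum.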

It’s quaint to see what seems to be such basic computing technology presented as the last word in sophistication, but the truth is that to achieve this level of functionality and performance with the technology of that era was an extremely impressive achievement. Sit back and enjoy the film; we’ve placed it below the break.

Every once in a while a project comes along with that magical power to consume your time and attention for many months. When you finally complete it, you feel sorry that you don’t have to do anything more.

What is so special about this Bingo ball reader? It may seem like an ordinary OCR project at first glance; a camera captures the image and OCR software recognizes the number. Simple as that. And it works without problems, like every simple gadget should.

But then again, maybe it’s not that simple. Numbers are scattered all over the ball, so they have to be located first, and the best candidate for reading must be selected. Then, numbers are painted onto a sphere rather than a flat surface, sometimes making them deformed to the point where their shape has to be recovered first. Also, the angle of reading is not fixed but somewhere on a 360° scale. And then we have the glare problem to boot, as Bingo balls are so shiny that every light source reflects as a saturated bright spot.

So, is that all of it? Well, almost. The task is supposed to be performed by an embedded microcontroller, with limited speed and memory, yet the recognition process for one ball has to be fast — 500 ms at worst. But that’s just one part of the process. The project includes the pipelined mechanism which accepts the ball, transports it to be scanned by the OCR and then shot by the public broadcast camera before it gets dumped. And finally, if the reading was not reliable enough, the ball has to be subtly rotated so that the numbers would be repositioned for another reading attempt.
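The read, check, rotate, retry idea can be sketched in a few lines. The Python version below is for illustration only, since the real system runs on a microcontroller. The `scan` and `rotate` callables, the 0.9 confidence threshold, and the four-attempt limit are all hypothetical.

```python
def read_ball(scan, rotate, min_confidence: float = 0.9, max_attempts: int = 4):
    """Return the ball's number, or None if no reading is trustworthy.

    scan()   -> list of (number, confidence) candidates found on the ball
    rotate() -> nudges the ball so the numbers move into a better position
    """
    for attempt in range(max_attempts):
        candidates = scan()
        if candidates:
            # Several prints of the number may be visible; keep the best one.
            number, confidence = max(candidates, key=lambda c: c[1])
            if confidence >= min_confidence:
                return number
        # Reading was missing or unreliable: rotate before the next attempt,
        # but not after the final one.
        if attempt < max_attempts - 1:
            rotate()
    return None
```

Passing the scanner and the rotation mechanism in as functions keeps the retry logic testable without any hardware attached.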

Despite these challenges I did manage to build this system. It’s fast and reliable, and I discovered some very interesting tricks along the way. Take a look at the quick demo video below to get a feel for the speed, and what the system “sees”. Then join me after the break to dive into the details of this interesting embedded build.