Search for words in your images in Office 365

12-13-2017 02:30 PM

Unlock content inside of images easily with this new search capability in Office 365.

Earlier this year, we rolled out automatic detection of images that are uploaded to SharePoint and OneDrive. This intelligence identifies whether an image is a whiteboard, a receipt, outdoors, a business card, an X-ray and many other types. You can then search for ‘whiteboard’ and you’ll see all the whiteboard photos you’ve captured and uploaded.

Now, as we announced at Ignite, any printed words in an image are automatically detected, extracted and made searchable. When you upload an image, computer vision technology automatically identifies and extracts its text, along with the photograph's location data if available (such as Oslo, Norway), and makes both searchable. You can search in SharePoint, OneDrive or Office.com to find your captures.

Use visual content intelligence to simplify your work life

Many people complete expense reports for travel. While at a restaurant, snap a photo of the receipt. You can do this directly from the OneDrive mobile app, Office Lens mobile app, or just upload a photo you’ve taken with your device. Later on, when you go to file your expenses, you don’t have to remember where you stored it, but instead can search for something that you remember about the expense, for example ‘sushi’ or a location.

We’re excited to bring you this new capability and would love to hear how you use it and what ideas you have to make the service better. Let us know in the comments, or submit new ideas to onedrive.uservoice.com.

Text extracted from an image is in the language captured from the image and is searchable in that language.

Detection of the image type is currently available in English only, for example: receipt, business card, whiteboard. In the future, we’ll automatically look at the language set on the SharePoint site that the image was uploaded to and translate the type into that language. In the case of OneDrive, we’ll translate it to the language you have set in your preferences.

What other features do you have planned?

We really want to connect your captures to workflows. The goal is to look at what the object is and take action based on it, via Flow or PowerApps, so we can help you move your work forward. We also will learn from patterns you have with types of objects – personalized learning, as part of the Microsoft Graph, to suggest actions and perform them automatically for you after the pattern is established.

Sounds good! So will it become possible to upload images automatically from the OneDrive mobile app to OneDrive for Business? Will the Windows Photos app support images stored in OneDrive for Business? And so on.

"we’ll automatically look at the language set on the SharePoint site" That may not be a useful indicator of the desired language of the image type. The current language of the user is closer, like in OneDrive. Why do the types need to be translated? Why not let the MUI do it?

Having the text itself extracted using language detection is good, but your receipts might be in a variety of languages, so you'll have to remember to also search for kaffe or kahvi.
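One way to cope with this is to search for several translations at once. The following is a minimal sketch, assuming the search box accepts Keyword Query Language style queries with an uppercase OR between terms, as SharePoint search does; whether every Office 365 search entry point treats the query the same way is not confirmed here. The helper name `build_or_query` is hypothetical, and the terms are the ones mentioned in the comment above.

```python
def build_or_query(terms):
    """Join search terms into one Boolean OR query string.

    Multi-word terms are wrapped in double quotes so they are
    treated as phrases. Empty entries are skipped.
    """
    parts = []
    for term in terms:
        term = term.strip()
        if not term:
            continue  # ignore blank entries
        if " " in term:
            # Drop any embedded quotes, then quote the whole phrase.
            term = '"' + term.replace('"', "") + '"'
        parts.append(term)
    return " OR ".join(parts)


# Assumed example: the same word as it might appear on receipts
# from different countries.
print(build_or_query(["coffee", "kaffe", "kahvi"]))
```

The resulting string can be pasted into the search box, so a single search covers receipts regardless of the language they were printed in.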

In the past, if a PDF was uploaded as an image (scanned document), SharePoint did not OCR the PDF document and the text was not searchable. With this implementation, will PDFs be searchable if they were scanned as images?

You mention 21 different file formats but unfortunately not PDF. Is this going to be implemented as well, or are PDFs excluded from this search feature altogether?

Many of our users do not understand the difference between searchable OCR PDFs and non-OCR PDFs, so they are mostly disappointed if search does not show all expected files.

In a global organization where you have different scanners with and without OCR, it is a nightmare to explain to people that the document, not the search, is the reason they won’t find non-OCR PDFs in the search results.
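To make the difference between the two kinds of PDF concrete, here is a crude sketch of a check that could flag likely image-only scans before upload. It is only a heuristic and an assumption on my part, not how SharePoint decides anything: a PDF with a text layer normally references font resources, so the code looks for the `/Font` name in the raw bytes. It can miss fonts hidden inside compressed object streams, and a file can mention fonts without having useful text. The function name `may_have_text_layer` and the byte fragments are hypothetical illustrations, not complete PDF files.

```python
def may_have_text_layer(pdf_bytes: bytes) -> bool:
    """Rough guess: True if the raw PDF bytes mention a font resource.

    Heuristic only. Image-only scans usually carry no /Font entry,
    while PDFs with real or OCR-generated text usually do.
    """
    return b"/Font" in pdf_bytes


# Hand-made fragments for illustration (not valid, complete PDFs).
scanned_fragment = b"%PDF-1.4 << /Type /Page /Resources << /XObject << /Im0 4 0 R >> >> >>"
text_fragment = b"%PDF-1.4 << /Type /Page /Resources << /Font << /F1 5 0 R >> >> >>"

print(may_have_text_layer(scanned_fragment))
print(may_have_text_layer(text_fragment))
```

A dedicated PDF library that actually extracts page text would give a more reliable answer; this sketch only shows why two files with the same extension can behave so differently in search.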

I am not able to use this functionality. Is any pre-configuration required to use it? I uploaded my business card to an asset library in SharePoint, but I cannot find it by searching for any word on the card. I also uploaded the business card to OneDrive, and the same thing happens there.

This photo intelligence feature doesn't work with my OneDrive for Business account. Is this already implemented? I took a JPEG photo containing very clear text and uploaded it to my OneDrive folder, but afterwards the search function in OneDrive couldn't find my photo.

It would be useful to know how this will work with an organization's compliance rules. For example, if a user uploads a bunch of photos that turn out to have PII (Social Security numbers, credit card numbers, etc.) in them, will that get flagged in some way? Will the user be notified? Will an admin be notified?

Hello everyone,
I've been trying to gather some answers to your questions.
This feature was completely rolled out at the end of last year.
PDF files are generated by many different applications, which affects how those documents can be made searchable. Even though a PDF appears to an end user to be one format, how the PDF is created makes a big difference in how to make it searchable. SharePoint already has a search function that makes many types of PDFs searchable. There are currently no plans to extend the work of the image recognition team to PDFs. Engineering is aware that this is a concern, but there are many nuances to covering every situation.
The data extracted is processed and lives wherever the data is stored, which includes geo support for data sovereignty.
Hope this helps.

If what you want to do is take a graphic file (a photo, for example) that contains text, such as a sign, a screenshot of text or any other graphic, and extract the text from it, this works very well and we use it all the time.

The instructions are a little vague however.

You MUST use OneNote for this function, however. At the moment it does not work in any other Office Program.

Once you realize that you have to use OneNote, the rest is easy.

Copy the graphic from anywhere -- any application -- with Ctrl+C.

Go to a page in OneNote.

Paste the graphic into that page using Ctrl+V.

Right-click the graphic you have just pasted.

Click the command "Copy Text from Picture". (This extracts the text from the photo and places it on the clipboard.)

Go to any application you wish and press Ctrl+V to paste the text from the picture (now on the clipboard) where you want it.

I think I have seen this work in SharePoint Online in a limited fashion in the last 2 months - but have not had a chance to test thoroughly, which is why I say 'limited'. I have been using SharePoint Wiki pages to document a simple process and inserting PNG files as screenshots of the application. When I perform a search, the PNG files I inserted, which are stored in the Site Assets library are being returned as results. I'm fairly certain it is not returning the Wiki Page and that the PNG has no metadata that would facilitate it being found as a match. I have yet to see this work with non-OCR'd PDFs, but I am also exploring the use of Muhimbi's PDF converter for use with Flow that I think could be configured to do the trick. Of course, there is a price for that, but in theory, I should be able to use Flow and have it monitor a specific SharePoint Library (or several) and when a file is added, use Muhimbi to OCR it and put it back in the same spot (or email it, or move it somewhere else).
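The monitor, OCR and put-back loop described above can be sketched in outline. This is a hedged illustration only: it uses a local folder as a stand-in for a SharePoint library and does not call Flow, Muhimbi or any SharePoint API. The function `ocr_new_pdfs` and the `needs_ocr` and `run_ocr` callables are hypothetical placeholders for whatever check and OCR service you actually use.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Set


def ocr_new_pdfs(
    folder: Path,
    needs_ocr: Callable[[bytes], bool],
    run_ocr: Callable[[bytes], bytes],
    already_seen: Set[str],
) -> List[str]:
    """Process PDFs not seen before; return names of files that were OCR'd."""
    processed = []
    for pdf in sorted(folder.glob("*.pdf")):
        if pdf.name in already_seen:
            continue  # handled on an earlier pass
        already_seen.add(pdf.name)
        data = pdf.read_bytes()
        if needs_ocr(data):
            # Put the OCR'd version back in the same spot.
            pdf.write_bytes(run_ocr(data))
            processed.append(pdf.name)
    return processed


# Demo with stand-ins: a temporary folder plays the library and
# fake callables play the OCR check and the OCR service.
with tempfile.TemporaryDirectory() as tmp:
    library = Path(tmp)
    (library / "scan.pdf").write_bytes(b"image-only")
    (library / "report.pdf").write_bytes(b"has-text")
    seen: Set[str] = set()
    fake_needs_ocr = lambda data: data == b"image-only"
    fake_run_ocr = lambda data: b"has-text"
    first_pass = ocr_new_pdfs(library, fake_needs_ocr, fake_run_ocr, seen)
    second_pass = ocr_new_pdfs(library, fake_needs_ocr, fake_run_ocr, seen)
    result_bytes = (library / "scan.pdf").read_bytes()

print(first_pass, second_pass)
```

Tracking already-seen file names keeps a second pass from reprocessing the file that the first pass just wrote back, which is the kind of loop a "when a file is added" trigger can otherwise fall into.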

I checked the roadmap (you can visit the site here: https://office.com/roadmap) and filtered for SharePoint items that are in development or rolling out, but I was not able to find any for this. We are very interested to know what features you would like for all of our products, and you can post your ideas here: https://office365.uservoice.com/. Office 365 User Voice is where we ask for feedback from our customers. Other customers who want the same feature can vote up your idea, and developers can then include the most wanted features in future updates.

I manually generated a searchable PDF from a scan and it works great once uploaded. The only other option I've run across is paying for a separate service and building the plumbing to SharePoint.

Jason, thanks for the quick response. I was afraid that was the case. Our scanning solution does have OCR capability, so I guess we'll be doing some testing to see how well that works. I was hoping not to need any third-party functionality, though.

We have recently moved to OneDrive, but are now regretting the move. Nothing is searchable, whether it is a scan from a phone (Android or iPhone) or an upload. In all scenarios the files are PDFs, but we checked PNG and JPG as well. Nothing.

Well, I shouldn't say nothing. We have one scanned document containing 4 words, and that document occasionally shows up in search results for words it does not contain. Out of 20+ word searches, only two returned a result; both returned that one 4-word PDF, and neither of the two "matched" words was actually in the document.

Microsoft simply replied with "there are many issues we are facing from the OneDrive update in summer." I suppose I wanted to confirm whether anyone else is having a 100% success rate.