image understanding

Information about AI from the News, Publications, and Conferences

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Let's first write a simple image recognition model using Inception V3 and Keras. The goal of the inception module is to act as a "multi-level feature extractor" by computing 1×1, 3×3, and 5×5 convolutions within the same module of the network; the outputs of these filters are then stacked along the channel dimension before being fed into the next layer in the network. The original incarnation of this architecture was called GoogLeNet, but subsequent manifestations have simply been called Inception vN, where N refers to the version number put out by Google. What are we going to detect? What does this image say to a computer?
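To make the "multi-level feature extractor" idea concrete, here is a minimal pure-Python sketch of parallel 1×1, 3×3, and 5×5 branches whose outputs are stacked as channels. It is an illustration under simplifying assumptions, not the Inception V3 implementation: the input has a single channel, the averaging kernels from the hypothetical `box_kernel` helper stand in for learned weights, and the 1×1 dimension-reduction and pooling branches of the real module are omitted.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(image: Matrix, kernel: Matrix) -> Matrix:
    """Naive single-channel 2D convolution with zero padding ("same" output size)."""
    k = len(kernel)  # kernel is k x k, with k odd
    pad = k // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    # Positions outside the image count as zeros.
                    if 0 <= y < height and 0 <= x < width:
                        total += image[y][x] * kernel[di][dj]
            out[i][j] = total
    return out


def box_kernel(k: int) -> Matrix:
    """Hypothetical k x k averaging kernel standing in for learned weights."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def inception_like_module(image: Matrix) -> List[Matrix]:
    """Run 1x1, 3x3 and 5x5 branches on the same input and stack them as channels."""
    return [conv2d_same(image, box_kernel(k)) for k in (1, 3, 5)]


image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
channels = inception_like_module(image)
# One output channel per branch, each with the input's spatial size.
print(len(channels), len(channels[0]), len(channels[0][0]))
```

Because every branch preserves the spatial size, the three results can be stacked directly; each channel describes the same pixel at a different receptive-field scale.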

Artificial intelligence has opened up many possibilities that extend human understanding. Today AI has become the foundation of many of the trending technologies in the market. When it comes to processing visual information, AI helps identify specific objects and categorize images based on their content. AI can also perform image recognition, using computer vision, to communicate with humans. This kind of communication includes understanding human gestures and reacting accordingly.

The singular example of AI's progress in the last several years is how well computers can recognize something in a picture. Still, even simple tests can show how brittle such abilities really are. The latest trick to game the system comes courtesy of researchers at Auburn University in Auburn, Ala., and media titan Adobe Systems. In a paper released this week, they showed that top image-recognition neural networks easily fail if objects are moved or rotated even by slight amounts. A fire truck, for example, seen from head on, could be correctly recognized.

Understanding clothes and broad fashion products from such an image would have huge commercial and cultural impacts on modern societies. Deploying such a technology would empower not only fashion buyers to find what they want, but also small and large sellers to make quicker sales with less hassle. This technology requires excellence in several computer vision tasks: what the product in the image is (image classification), where it is (object detection, semantic image segmentation, instance segmentation), visual similarity, how to describe the product and its image (image captioning), etc. Recent work on convolutional neural networks (CNNs) has significantly improved the state-of-the-art performance on those tasks. In image classification, the ResNeXt-101 method has achieved 85.4% top-1 accuracy on ImageNet-1K; in object detection, the best method has achieved 52.5% mAP on the COCO 2017 benchmark for generic object detection; in semantic image segmentation, the top-performing method has reached 89% mIoU on the PASCAL VOC leaderboard for generic object segmentation.
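For readers unfamiliar with the metrics quoted above, the following sketch shows how top-1 accuracy and mean intersection-over-union (mIoU) are typically computed. The function names and toy inputs are illustrative assumptions; the official ImageNet and PASCAL VOC evaluation scripts differ in details, and COCO-style mAP is more involved and not covered here.

```python
from typing import List, Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the true label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


def mean_iou(pred: Sequence[Sequence[int]],
             truth: Sequence[Sequence[int]],
             num_classes: int) -> float:
    """Average over classes of |prediction AND truth| / |prediction OR truth|."""
    ious: List[float] = []
    for cls in range(num_classes):
        inter = 0
        union = 0
        for pred_row, truth_row in zip(pred, truth):
            for p, t in zip(pred_row, truth_row):
                if p == cls and t == cls:
                    inter += 1
                if p == cls or t == cls:
                    union += 1
        if union:  # skip classes that appear in neither mask
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy classification scores: four samples, three classes.
scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]
labels = [1, 0, 2, 2]
accuracy = top1_accuracy(scores, labels)

# Toy 2x2 segmentation masks with class ids 0 and 1.
pred_mask = [[0, 0], [1, 1]]
true_mask = [[0, 1], [1, 1]]
miou = mean_iou(pred_mask, true_mask, num_classes=2)
print(accuracy, miou)
```

In the toy data, three of the four samples are classified correctly, and the two classes have IoU values of 1/2 and 2/3, which are then averaged.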

Want to build an ML model but don't have enough training data? In this post I'll show you how I built an ML pipeline that gathers labeled, crowdsourced training data, uploads it to an AutoML dataset, and then trains a model. I'll be showing an image classification model using AutoML Vision in this example but the same pipeline could easily be adapted to AutoML Natural Language. Here's an overview of how it works: Want to jump to the code? The full example is available on GitHub.

At GraphAware, one of our graph-based solutions is the Knowledge Platform, an Intelligent Insight Engine built atop Neo4j. To give our customers the ability to unlock hidden insights from new forms of data, we decided to start an R&D phase for video analysis. For this blog post we will analyse the Neo4j YouTube channel video transcripts, extract some insights and show what type of business value such analysis can bring. YouTube offers the ability to download the transcription of the videos, when available. Fetching this data can be done in multiple ways, like connecting to the Google APIs with your preferred client.

Last week I published a blog post about how easy it is to train image classification models with Keras. What I did not show in that post was how to use the model for making predictions. This, I will do here. But predictions alone are boring, so I'm adding explanations for the predictions using the lime package. I have already written a few blog posts (here, here and here) about LIME and have given talks (here and here) about it, too.

Zero-shot learning (ZSL) is concerned with the recognition of previously unseen classes. It relies on additional semantic knowledge for which a mapping can be learned with training examples of seen classes. While classical ZSL considers the recognition performance on unseen classes only, generalized zero-shot learning (GZSL) aims at maximizing performance on both seen and unseen classes. In this paper, we propose a new process for training and evaluation in the GZSL setting; this process addresses the gap in performance between samples from unseen and seen classes by penalizing the latter, and makes it possible to select hyper-parameters well suited to the GZSL task. It can be applied to any existing ZSL approach and leads to a significant performance boost: the experimental evaluation shows that GZSL performance, averaged over eight state-of-the-art methods, is improved from 28.5 to 42.2 on CUB and from 28.2 to 57.1 on AwA2.
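The idea of penalizing seen classes can be sketched as follows. This is an illustration only, not the paper's procedure, which the abstract does not spell out: it assumes the penalty is a constant `gamma` subtracted from every seen-class score before taking the argmax, and that GZSL performance is summarized by the harmonic mean of seen and unseen accuracy, a common convention in the GZSL literature. All names and numbers below are hypothetical.

```python
from typing import Dict, Set


def predict_with_seen_penalty(scores: Dict[str, float],
                              seen: Set[str],
                              gamma: float) -> str:
    """Return the top class after subtracting gamma from every seen-class score."""
    adjusted = {
        cls: (score - gamma) if cls in seen else score
        for cls, score in scores.items()
    }
    return max(adjusted, key=adjusted.get)


def harmonic_mean(acc_seen: float, acc_unseen: float) -> float:
    """Harmonic mean of seen and unseen accuracy; low if either one is low."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)


# Hypothetical compatibility scores for one test image of an unseen class.
scores = {"horse": 0.55, "zebra": 0.45}
seen_classes = {"horse"}

without_penalty = predict_with_seen_penalty(scores, seen_classes, gamma=0.0)
with_penalty = predict_with_seen_penalty(scores, seen_classes, gamma=0.2)
print(without_penalty, with_penalty)
print(harmonic_mean(0.8, 0.2), harmonic_mean(0.5, 0.5))
```

Without a penalty the model favors the seen class; a modest penalty lets the unseen class win. The harmonic mean rewards balanced accuracy, which is why a penalty that trades some seen-class accuracy for unseen-class accuracy can raise the overall score.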

Google has introduced some new experimental features for developers working with the Lenovo Mirage Solo, the standalone Daydream headset released earlier this year. First up is see-through mode, a setting that lets the user see the real space around them through the VR headset. Google says this mode plus the Mirage Solo's tracking technology will allow developers to build AR prototypes. It demonstrated an application of this feature through an experimental app that lets Mirage Solo wearers position virtual furniture in a real-world surrounding. Secondly, the company says it's adding APIs that support position controller tracking with six degrees of freedom, which will enable more real-world, natural movement in VR.

Every company today is a tech company, a maxim that was proven out today when one of the world's oldest and biggest art auction houses acquired an AI startup. Sotheby's has bought Thread Genius, which has built a set of algorithms that can both instantly identify objects and then recommend images of similar objects to the viewer. Sotheby's said it is not disclosing the value of the deal but said it was non-material to the company. Thread Genius was a relatively young company, founded in 2015 and making a debut last year as part of TechStars New York's Winter 2017 cohort. Co-founders Andrew Shum and Ahmad Qamar, who were also Thread Genius's only two employees, were both engineering alums from Spotify.