Attention Mechanisms in Neural Networks are (very) loosely based on the visual attention mechanism found in humans. Human visual attention is well-studied and while there exist different models, all of them essentially come down to being able to focus on a certain region of an image with “high resolution” while perceiving the surrounding image in “low resolution”, and then adjusting the focal point over time.

Attention in Neural Networks has a long history, particularly in image recognition. […] But only recently have attention mechanisms made their way into recurrent neural network architectures that are typically used in NLP (and increasingly also in vision). That’s what we’ll focus on in this post.

Ronald Bergan ponders why American films are so reluctant to depict the Hiroshima bombing.

In the opening dialogue of Alain Resnais’s masterful Hiroshima Mon Amour (1959), the reference to which all other films on the subject must incline, a French actor, in Hiroshima for a film, tells her Japanese lover that she has seen everything in Hiroshima – the exhibits in the museum, the news footage of the injured and dying. However, he keeps insisting, “You saw nothing in Hiroshima. Nothing.”

The NIPS consistency experiment was an amazing, courageous move by the organizers this year to quantify the randomness in the review process. They split the program committee down the middle, effectively forming two independent program committees. Most submitted papers were assigned to a single side, but 10% of submissions (166) were reviewed by both halves of the committee. This let them observe how consistent the two committees were on which papers to accept. For fairness, they ultimately accepted any paper that was accepted by either committee.

The results were revealed this week: of the 166 papers, the two committees disagreed on the fates of 25.9% of them: 43. But this “25%” number is misleading, and most people I’ve talked to have misunderstood it: it actually means that the two committees disagreed more than they agreed on which papers to accept.
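To see why the headline number understates the disagreement, here is a rough back-of-the-envelope sketch. The 166 papers and 43 disagreements come from the text above; the acceptance rate is NOT stated in this passage, so `accept_rate=0.225` is an assumed, illustrative value, and the even split of disagreements between the two committees is likewise an assumption. The function name `disagreement_breakdown` is made up for this example.

```python
def disagreement_breakdown(n_papers, n_disagreements, accept_rate):
    """Re-express the overall disagreement count from one committee's view.

    accept_rate is a hypothetical input (not given in the passage above).
    Assumes each committee accepts accept_rate * n_papers papers and that
    the disagreements split evenly between the two committees.
    """
    # Headline figure: fraction of all papers whose fates differed.
    overall_rate = n_disagreements / n_papers
    # Expected number of papers each committee accepts (may be fractional).
    accepted_per_committee = accept_rate * n_papers
    # Papers one committee accepted that the other rejected.
    disputed_accepts = n_disagreements / 2
    # Papers a committee accepted that the other also accepted.
    agreed_accepts = accepted_per_committee - disputed_accepts
    # Share of a committee's accepts that the other committee rejected.
    disputed_share = disputed_accepts / accepted_per_committee
    return overall_rate, agreed_accepts, disputed_accepts, disputed_share


overall, agreed, disputed, share = disagreement_breakdown(166, 43, accept_rate=0.225)
print(f"overall disagreement: {overall:.1%}")
print(f"accepted by both (per committee): {agreed:.1f}")
print(f"accepted by one only (per committee): {disputed:.1f}")
print(f"share of a committee's accepts the other rejected: {share:.1%}")
```

Under these assumptions, more than half of the papers a committee accepted were rejected by the other one, which is the sense in which the committees "disagreed more than they agreed". A different assumed acceptance rate would change the exact share, so treat the output as illustrative only.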

Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU score improvements on Flickr30k, from 55 to 66, and on SBU, from 19 to 27.

Seeking to replace conventional medical X-ray equipment, experts in Mexico have created a digital X-ray imaging system that replaces the traditional plate with a solid detector and delivers results in five seconds. Analog equipment takes six minutes to develop the traditional film.