Posts from 2017

Friday, December 15, 2017

It’s been an incredible (and incredibly busy!) three weeks for the 25 mentor organizations participating in Google Code-in (GCI) 2017, our seven week global contest designed to introduce teens to open source software development. Participants complete bite sized “tasks” in topics that include coding, documentation, UI/UX, quality assurance and more. Volunteer mentors from each open source project help participants along the way.

Total registered students has already surpassed 2016 numbers and we are less than halfway to the finish! We’re thrilled that high school students are embracing GCI like never before.

Check out some of the statistics below (current as of Thursday, December 14):

Total registered students: 6,146

Number of students who have completed at least one task: 1,573 (51% of those students have completed more than 3 tasks, earning them a GCI t-shirt)

Total number of tasks completed: 5,499

Most tasks completed by one student: 39

Top 5 Countries by Tasks Completed

Countries Represented by Mentors and Students

Of course, GCI wouldn’t be possible without the effort of the more than 725 mentors and organization administrators. Based in 65 countries, mentors answer questions, review submissions, and approve tasks for students at all hours of the day -- and sometimes night! They work tirelessly to help encourage and guide the next generation of open source contributors.

Every year we express our gratitude to the mentors and organization administrators. We are particularly grateful for them given how many more students are participating in GCI this year. Thank you all, and hang in there!

Tuesday, December 12, 2017

Training a neural network usually involves defining a loss function, which tells the network how close or far it is from its objective. For example, image classification networks are often given a loss function that penalizes them for giving wrong classifications; a network that mislabels a dog picture as a cat will get a high loss. However, not all problems have easily-defined loss functions, especially if they involve human perception, such as image compression or text-to-speech systems. Generative Adversarial Networks (GANs), a machine learning technique that has led to improvements in a wide range of applications including generating images from text, superresolution, and helping robots learn to grasp, offer a solution. However, GANs introduce new theoretical and software engineering challenges, and it can be difficult to keep up with the rapid pace of GAN research.

A video of a generator improving over time. It begins by producing random noise, and eventually learns to generate MNIST digits.

In order to make GANs easier to experiment with, we’ve open sourced TFGAN, a lightweight library designed to make it easy to train and evaluate GANs. It provides the infrastructure to easily train a GAN, provides well-tested loss and evaluation metrics, and gives easy-to-use examples that highlight the expressiveness and flexibility of TFGAN. We’ve also released a tutorial that includes a high-level API to quickly get a model trained on your data.

This demonstrates the effect of an adversarial loss on image compression. The top row shows image patches from the ImageNet dataset. The middle row shows the results of compressing and uncompressing an image through an image compression neural network trained on a traditional loss. The bottom row shows the results from a network trained with a traditional loss and an adversarial loss. The GAN-loss images are sharper and more detailed, even if they are less like the original.

TFGAN supports experiments in a few important ways. It provides simple function calls that cover the majority of GAN use-cases so you can get a model running on your data in just a few lines of code, but is built in a modular way to cover more exotic GAN designs as well. You can just use the modules you want -- loss, evaluation, features, training, etc. are all independent.. TFGAN’s lightweight design also means you can use it alongside other frameworks, or with native TensorFlow code. GAN models written using TFGAN will easily benefit from future infrastructure improvements, and you can select from a large number of already-implemented losses and features without having to rewrite your own. Lastly, the code is well-tested, so you don’t have to worry about numerical or statistical mistakes that are easily made with GAN libraries.

Most neural text-to-speech (TTS) systems produce over-smoothed spectrograms. When applied to the Tacotron TTS system, a GAN can recreate some of the realistic-texture, which reduces artifacts in the resulting audio.

When you use TFGAN, you’ll be using the same infrastructure that many Google researchers use, and you’ll have access to the cutting-edge improvements that we develop with the library. Anyone can contribute to the github repositories, which we hope will facilitate code-sharing among ML researchers and users.

Tuesday, December 5, 2017

Google has always embraced new approaches to organizing all the world's information, and this includes all the world's geography. Today we are announcing the open source release of Google's S2 library, the core geometric library on which Google's global geographic database is built.

A unique feature of the S2 library is that unlike traditional geographic information systems, which represent data as flat two-dimensional projections (similar to an atlas), the S2 library represents all data on a three-dimensional sphere (similar to a globe). This makes it possible to build a worldwide geographic database with no seams or singularities, using a single coordinate system, and with low distortion everywhere compared to the true shape of the Earth. While the Earth is not quite spherical, it is much closer to being a sphere than it is to being flat!

Notable features of the library include:

Flexible support for spatial indexing, including the ability to approximate arbitrary regions as collections of discrete S2 cells. This feature makes it easy to build large distributed spatial indexes. (The image above illustrates the S2 space-filling curve, an important tool used for spatial indexing.)

The reference implementation of the S2 library is written in C++, and subsets have been ported to Go, Java, and Python. An early version of the code was released in 2011, but today's announcement represents a major update along with a commitment to maintain the library going forward. The code is under active development and new features will be released regularly. (The Java port is based on the 2011 code and does not have the same robustness, performance, or features as the current C++ version.)

To learn more, start by reading the overview and quick start documents, then explore the documentation site. The library also has extensive documentation in the header files, which is where the most authoritative information can be found. More introductions and tutorials will be added over time - contributions are welcome!

Monday, December 4, 2017

Across many scientific disciplines, but in particular in the field of genomics, major breakthroughs have often resulted from new technologies. From Sanger sequencing, which made it possible to sequence the human genome, to the microarray technologies that enabled the first large-scale genome-wide experiments, new instruments and tools have allowed us to look ever more deeply into the genome and apply the results broadly to health, agriculture and ecology.

One of the most transformative new technologies in genomics was high-throughput sequencing (HTS), which first became commercially available in the early 2000s. HTS allowed scientists and clinicians to produce sequencing data quickly, cheaply, and at scale. However, the output of HTS instruments is not the genome sequence for the individual being analyzed — for humans this is 3 billion paired bases (guanine, cytosine, adenine and thymine) organized into 23 pairs of chromosomes. Instead, these instruments generate ~1 billion short sequences, known as reads. Each read represents just 100 of the 3 billion bases, and per-base error rates range from 0.1-10%. Processing the HTS output into a single, accurate and complete genome sequence is a major outstanding challenge. The importance of this problem, for biomedical applications in particular, has motivated efforts such as the Genome in a Bottle Consortium (GIAB), which produces high confidence human reference genomes that can be used for validation and benchmarking, as well as the precisionFDA community challenges, which are designed to foster innovation that will improve the quality and accuracy of HTS-based genomic tests.

CAPTION: For any given location in the genome, there are multiple reads among the ~1 billion that include a base at that position. Each read is aligned to a reference, and then each of the bases in the read is compared to the base of the reference at that location. When a read includes a base that differs from the reference, it may indicate a variant (a difference in the true sequence), or it may be an error.

Today, we announce the open source release of DeepVariant, a deep learning technology to reconstruct the true genome sequence from HTS sequencer data with significantly greater accuracy than previous classical methods. This work is the product of more than two years of research by the Google Brain team, in collaboration with Verily Life Sciences. DeepVariant transforms the task of variant calling, as this reconstruction problem is known in genomics, into an image classification problem well-suited to Google's existing technology and expertise.

CAPTION: Each of the four images above is a visualization of actual sequencer reads aligned to a reference genome. A key question is how to use the reads to determine whether there is a variant on both chromosomes, on just one chromosome, or on neither chromosome. There is more than one type of variant, with SNPs and insertions/deletions being the most common. A: a true SNP on one chromosome pair, B: a deletion on one chromosome, C: a deletion on both chromosomes, D: a false variant caused by errors. It's easy to see that these look quite distinct when visualized in this manner.

We started with GIAB reference genomes, for which there is high-quality ground truth (or the closest approximation currently possible). Using multiple replicates of these genomes, we produced tens of millions of training examples in the form of multi-channel tensors encoding the HTS instrument data, and then trained a TensorFlow-based image classification model to identify the true genome sequence from the experimental data produced by the instruments. Although the resulting deep learning model, DeepVariant, had no specialized knowledge about genomics or HTS, within a year it had won the the highest SNP accuracy award at the precisionFDA Truth Challenge, outperforming state-of-the-art methods. Since then, we've further reduced the error rate by more than 50%.

DeepVariant is being released as open source software to encourage collaboration and to accelerate the use of this technology to solve real world problems. To further this goal, we partnered with Google Cloud Platform (GCP) to deploy DeepVariant workflows on GCP, available today, in configurations optimized for low-cost and fast turnarounds using scalable GCP technologies like the Pipelines API. This paired set of releases provides a smooth ramp for users to explore and evaluate the capabilities of DeepVariant in their current compute environment while providing a scalable, cloud-based solution to satisfy the needs of even the largest genomics datasets.

DeepVariant is the first of what we hope will be many contributions that leverage Google's computing infrastructure and ML expertise to both better understand the genome and to provide deep learning-based genomics tools to the community. This is all part of a broader goal to apply Google technologies to healthcare and other scientific applications, and to make the results of these efforts broadly accessible.

Thursday, November 30, 2017

This year Google brought over 320 mentors from all over the world (33 countries!) to Google's offices in Sunnyvale, California for the 2017 Google Summer of Code Mentor Summit. This year 149 organizations were represented, which provided the perfect opportunity to meet like-minded open source enthusiasts and discuss ways to make open source better and more sustainable.

The Mentor Summit is run as an unconference in which attendees create and join sessions based on their interests. “I liked the unconference sessions, that they were casual and discussion based and I got a lot out of them. It was the place I connected with the most people,” said Cassie Tarakajian, attending on behalf of the Processing Foundation.

Attendees quickly filled the schedule boards with interesting sessions. One theme in this year’s session schedule was the challenging topic of failing students. Derk Ruitenbeek, part of the phpBB contingent, had this to say:

“This year our organisation had a high failure rate of 3 out of 5 accepted students. During the Mentor Summit I attended multiple sessions about failing students and rating proposals and got a lot [of] useful tips. Talking with other mentors about this really helped me find ways to improve student selection for our organisation next time.”

This year was the largest Mentor Summit ever – with the exception of our 10 Year Reunion in 2014 – and had the best gender diversity yet. Katarina Behrens, a mentor who worked with LibreOffice, observed:

“I was pleased to see many more women at the summit than last time I participated. I'm also beyond happy that now not only women themselves, but also men engage in increasing (not only gender) diversity of their projects and teams.”

We've held the Mentor Summit for the past 10+ years as a way to meet some of the thousands of mentors whose generous work for the students makes the program successful, and to give some of them and the projects they represent a chance to meet. This year was their first Mentor Summit for 52% of the attendees, giving us a lot of fresh perspectives to learn from!

We love hosting the Mentor Summit and attendees enjoy it, as well, especially the opportunity to meet each other. In fact, some attendees met in person for the first time at the Mentor Summit after years of collaborating remotely! According to Aveek Basu, who mentored for The Linux Foundation, the event was an excellent opportunity for “networking with like minded people from different communities. Also it was nice to know about people working in different fields from bioinformatics to robotics, and not only hard core computer science.”

You can browse the event website and read through some of the session notes that attendees took to learn a bit more about this year’s Mentor Summit.

Now that Google Summer of Code 2017 and the Mentor Summit have come to a close, our team is busy gearing up for the 2018 program. We hope to see you then!

Tuesday, November 28, 2017

Today marks the start of the 8th consecutive year of Google Code-in (GCI). It’s the biggest contest ever and we hope you’ll come along for the ride!

The Basics

What is Google Code-in?

Our global, online contest introducing students to open source development. The contest runs for 7 weeks until January 17, 2018.

Who can register?

Pre-university students ages 13-17 that have their parent or guardian’s permission to register for the contest.

How do students register?

Students can register for the contest beginning today at g.co/gci. Once students have registered and the parental consent form has been submitted, students can choose which task they want to work on first. Students choose the task they find interesting from a list of hundreds of available tasks created by 25 participating open source organizations. Tasks take an average of 3-5 hours to complete. The task categories are:

Coding

Documentation/Training

Outreach/Research

Quality Assurance

User Interface

Why should students participate?

Students not only have the opportunity to work on a real open source software project, thus gaining invaluable experience, but they also have the opportunity to be a part of the open source community. Mentors are readily available to help answer their questions while they work through the tasks.

Google Code-in is a contest so there are prizes! Complete one task and receive a digital certificate. Three completed tasks and you’ll also get a fun Google t-shirt. Finalists get a hoodie. Grand Prize winners receive an all expense paid trip to Google headquarters in California!

Details

Over the last 7 years, more than 4,500 students from 99 countries have successfully completed over 23,000 tasks in GCI. Intrigued? Learn more about GCI by checking out our rules and FAQs. And please visit our contest site and read the Getting Started Guide.

Teachers, if you are interested in getting your students involved in Google Code-in we have resources available to help you get started.

Monday, November 27, 2017

Today Google joins Red Hat, Facebook, and IBM alongside the Linux Kernel Community in increasing the predictability of open source license compliance and enforcement.

We are taking an approach to compliance enforcement that is consistent with the Principles of Community-Oriented GPL Enforcement. We hope that this will encourage greater collaboration on open source projects, and foster discussion on how we can all continue to work closely together.

Thursday, November 16, 2017

The Google Container Tools team originally built container-diff, a new project to help uncover differences between container images, to aid our own development with containers. We think it can be useful for anyone building containerized software, so we’re excited to release it as open source to the development community.

Containers and the Dockerfile format help make customization of an application’s runtime environment more approachable and easier to understand. While this is a great advantage of using containers in software development, a major drawback is that it can be hard to visualize what changes in a container image will result from a change in the respective Dockerfile. This can lead to bloated images and make tracking down issues difficult.

Imagine a scenario where a developer is working on an application, built on a runtime image maintained by a third-party. During development someone releases a new version of that base image with updated system packages. The developer rebuilds their application and picks up the latest version of the base image, and suddenly their application stops working; it depended on a previous version of one of the installed system packages, but which one? What version was it on before? With no currently existing tool to easily determine what changed between the two base image versions, this totally stalls development until the developer can track down the package version incompatibility.

Introducing container-diff

container-diff helps users investigate image changes by computing semantic diffs between images. What this means is that container-diff figures out on a low-level what data changed, and then combines this with an understanding of package manager information to output this information in a format that’s actually readable to users. The tool can find differences in system packages, language-level packages, and files in a container image.

Users can specify images in several formats - from local Docker daemon (using the prefix `daemon://` on the image path), a remote registry (using the prefix `remote://`), or a file in the .tar in the format exported by "docker save" command. You can also combine these formats to compute the diff between a local version of an image and a remote version. This can be useful when experimenting with new builds of an image that you might not be quite ready to push yet. container-diff supports image tarballs and the registry protocol natively, enabling it to run in environments without a Docker daemon.

Examples and Use Cases

Here is a basic Dockerfile that installs Python inside our Debian base image. Running container-diff on the base image and the new one with Python, users can see all the apt packages that were installed as dependencies of Python.

And below is a Dockerfile that inherits from our Python base runtime image, and then installs the mock and six packages inside of it. Running container-diff with the pip differ, users can see all the Python packages that have either been installed or changed as a result of this:

This can be especially useful when it’s unclear which packages might have been installed or changed incidentally as a result of dependency management of Python modules.

These are just a few examples. The tool currently has support for Python and Node.js packages installed via pip and npm, respectively, as well as comparison of image filesystems and Docker history. In the future, we’d like to see support added for additional runtime and language differs, including Java, Go, and Ruby. External contributions are welcome! For more information on contributing to container-diff, see this how-to guide.

Now that we’ve seen container-diff compare two images in action, it’s easy to imagine how the tool may be integrated into larger workflows to aid in development:

Changelog generation: Given container-diff’s capacity to facilitate investigation of filesystem and package modifications, it can do most of the heavy lifting in discerning changes for automatic changelog generation for new releases of an image.

Continuous integration: As part of a CI system, users can leverage container-diff to catch potentially breaking filesystem changes resulting from a Dockerfile change in their builds.

container-diff’s default output mode is “human-readable,” but also supports output to JSON, allowing for easy automated parsing and processing by users.

Single Image Analysis

In addition to comparing two images, container-diff has the ability to analyze a single image on its own. This can enable users to get a quick glance at information about an image, such as its system and language-level package installations and filesystem contents.

Let’s take a look at our Debian base image again. We can use the tool to easily view a list of all packages installed in the image, along with each one’s installed version and size:

We could use this to verify compatibility with an application we’re building, or maybe sort the packages by size in another one of our images and see which ones are taking up the most space.

For more information about this tool as well as a breakdown with examples, uses, and inner workings of the tool, please take a look at documentation on our GitHub page. Happy diffing!

Special thanks to Colette Torres and Abby Tisdale, our software engineering interns who helped build the tool from the ground up.

Wednesday, November 8, 2017

Tangent is a new, free, and open source Python library for automatic differentiation. In contrast to existing machine learning libraries, Tangent is a source-to-source system, consuming a Python function f and emitting a new Python function that computes the gradient of f. This allows much better user visibility into gradient computations, as well as easy user-level editing and debugging of gradients. Tangent comes with many more features for debugging and designing machine learning models.

This post gives an overview of the Tangent API. It covers how to use Tangent to generate gradient code in Python that is easy to interpret, debug and modify.

Neural networks (NNs) have led to great advances in machine learning models for images, video, audio, and text. The fundamental abstraction that lets us train NNs to perform well at these tasks is a 30-year-old idea called reverse-mode automatic differentiation (also known as backpropagation), which comprises two passes through the NN. First, we run a “forward pass” to calculate the output value of each node. Then we run a “backward pass” to calculate a series of derivatives to determine how to update the weights to increase the model’s accuracy.

Training NNs, and doing research on novel architectures, requires us to compute these derivatives correctly, efficiently, and easily. We also need to be able to debug these derivatives when our model isn’t training well, or when we’re trying to build something new that we do not yet understand. Automatic differentiation, or just “autodiff,” is a technique to calculate the derivatives of computer programs that denote some mathematical function, and nearly every machine learning library implements it.

Existing libraries implement automatic differentiation by tracing a program’s execution (at runtime, like TF Eager, PyTorch and Autograd) or by building a dynamic data-flow graph and then differentiating the graph (ahead-of-time, like TensorFlow). In contrast, Tangent performs ahead-of-time autodiff on the Python source code itself, and produces Python source code as its output.

As a result, you can finally read your automatic derivative code just like the rest of your program. Tangent is useful to researchers and students who not only want to write their models in Python, but also read and debug automatically-generated derivative code without sacrificing speed and flexibility.

You can easily inspect and debug your models written in Tangent, without special tools or indirection. Tangent works on a large and growing subset of Python, provides extra autodiff features other Python ML libraries don’t have, is high-performance, and is compatible with TensorFlow and NumPy.

Automatic differentiation of Python code

How do we automatically generate derivatives of plain Python code? Math functions like tf.exp or tf.log have derivatives, which we can compose to build the backward pass. Similarly, pieces of syntax, such as subroutines, conditionals, and loops, also have backward-pass versions. Tangent contains recipes for generating derivative code for each piece of Python syntax, along with many NumPy and TensorFlow function calls.

Tangent has a one-function API:

import tangent
df = tangent.grad(f)

Here’s an animated graphic of what happens when we call tangent.grad on a Python function:

If you want to print out your derivatives, you can run

import tangent
df = tangent.grad(f, verbose=1)

Under the hood, tangent.grad first grabs the source code of the Python function you pass it. Tangent has a large library of recipes for the derivatives of Python syntax, as well as TensorFlow Eager functions. The function tangent.grad then walks your code in reverse order, looks up the matching backward-pass recipe, and adds it to the end of the derivative function. This reverse-order processing gives the technique its name: reverse-mode automatic differentiation.

The function df above only works for scalar (non-array) inputs. Tangent also supports

We are working to add support in Tangent for more aspects of the Python language (e.g., closures, inline function definitions, classes, more NumPy and TensorFlow functions). We also hope to add more advanced automatic differentiation and compiler functionality in the future, such as automatic trade-off between memory and compute (Griewank and Walther 2000; Gruslys et al., 2016), more aggressive optimizations, and lambda lifting.

We intend to develop Tangent together as a community. We welcome pull requests with fixes and features. Happy deriving!

By Alex Wiltschko, Research Scientist, Google Brain Team

Acknowledgments

Bart van Merriënboer contributed immensely to all aspects of Tangent during his internship, and Dan Moldovan led TF Eager integration, infrastructure and benchmarking. Also, thanks to the Google Brain team for their support of this post and special thanks to Sanders Kleinfeld and Aleks Haecky for their valuable contribution for the technical aspects of the post.

Thursday, October 26, 2017

We’re thrilled to introduce 25 open source organizations that are participating in Google Code-in 2017. The contest, now in its eighth year, offers 13-17 year old pre-university students an opportunity to learn and practice their skills while contributing to open source projects.

Google Code-in officially starts for students on November 28. Students are encouraged to learn about the participating organizations ahead of time and can get started by clicking on the links below:

XWiki: a web platform for developing collaborative applications using the wiki paradigm

Zulip: powerful, threaded open source group chat with apps for every major platform

These mentor organizations are hard at work creating thousands of tasks for students to work on, including code, documentation, user interface, quality assurance, outreach, research and training tasks. The contest officially starts for students on Tuesday, November 28th at 9:00am PST.

Wednesday, October 25, 2017

“The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble.”
-Paul Dirac, Quantum Mechanics of Many-Electron Systems (1929)

In this passage, physicist Paul Dirac laments that while quantum mechanics accurately models all of chemistry, exactly simulating the associated equations appears intractably complicated. Not until 1982 would Richard Feynman suggest that instead of surrendering to the complexity of quantum mechanics, we might harness it as a computational resource. Hence, the original motivation for quantum computing: by operating a computer according to the laws of quantum mechanics, one could efficiently unravel exact simulations of nature. Such simulations could lead to breakthroughs in areas such as photovoltaics, batteries, new materials, pharmaceuticals and superconductivity. And while we do not yet have a quantum computer large enough to solve classically intractable problems in these areas, rapid progress is being made. Last year, Google published this paper detailing the first quantum computation of a molecule using a superconducting qubit quantum computer. Building on that work, the quantum computing group at IBM scaled the experiment to larger molecules, which made the cover of Nature last month.

Today, we announce the release of OpenFermion, the first open source platform for translating problems in chemistry and materials science into quantum circuits that can be executed on existing platforms. OpenFermion is a library for simulating the systems of interacting electrons (fermions) which give rise to the properties of matter. Prior to OpenFermion, quantum algorithm developers would need to learn a significant amount of chemistry and write a large amount of code hacking apart other codes to put together even the most basic quantum simulations. While the project began at Google, collaborators at ETH Zurich, Lawrence Berkeley National Labs, University of Michigan, Harvard University, Oxford University, Dartmouth College, Rigetti Computing and NASA all contributed to alpha releases. You can learn more details about this release in our paper, OpenFermion: The Electronic Structure Package for Quantum Computers.

One way to think of OpenFermion is as a tool for generating and compiling physics equations which describe chemical and material systems into representations which can be interpreted by a quantum computer1. The most effective quantum algorithms for these problems build upon and extend the power of classical quantum chemistry packages used and developed by research chemists across government, industry and academia. Accordingly, we are also releasing OpenFermion-Psi4 and OpenFermion-PySCF which are plugins for using OpenFermion in conjunction with the classical electronic structure packages Psi4 and PySCF.

The core OpenFermion library is designed in a quantum programming framework agnostic way to ensure compatibility with various platforms being developed by the community. This allows OpenFermion to support external packages which compile quantum assembly language specifications for diverse hardware platforms. We hope this decision will help establish OpenFermion as a community standard for putting quantum chemistry on quantum computers. To see how OpenFermion is used with diverse quantum programming frameworks, take a look at OpenFermion-ProjectQ and Forest-OpenFermion - plugins which link OpenFermion to the externally developed circuit simulation and compilation platforms known as ProjectQ and Forest.

The following workflow describes how a quantum chemist might use OpenFermion in order to simulate the energy surface of a molecule (for instance, by preparing the sort of quantum computation we described in our past blog post):

The researcher initializes an OpenFermion calculation with specification of:

An input file specifying the coordinates of the nuclei in the molecule.

The basis set (e.g. cc-pVTZ) that should be used to discretize the molecule.

The researcher uses the OpenFermion-Psi4 plugin or the OpenFermion-PySCF plugin to perform scalable classical computations which are used to optimally stage the quantum computation. For instance, one might perform a classical Hartree-Fock calculation to choose a good initial state for the quantum simulation.

The researcher then specifies which electrons are most interesting to study on a quantum computer (known as an active space) and asks OpenFermion to map the equations for those electrons to a representation suitable for quantum bits, using one of the available procedures in OpenFermion, e.g. the Bravyi-Kitaev transformation.

The researcher selects a quantum algorithm to solve for the properties of interest and uses a quantum compilation framework such as OpenFermion-ProjectQ to output the quantum circuit in assembly language which can be run on a quantum computer. If the researcher has access to a quantum computer, they then execute the experiment.

A few examples of what one might do with OpenFermion are demonstrated in ipython notebooks here, here and here. While quantum simulation is widely recognized as one of the most important applications of quantum computing in the near term, very few quantum computer scientists know quantum chemistry and even fewer chemists know quantum computing. Our hope is that OpenFermion will help to close the gap between these communities and bring the power of quantum computing to chemists and material scientists. If you’re interested, please checkout our GitHub repository - pull requests welcome!
By Ryan Babbush and Jarrod McClean, Quantum Software Engineers, Quantum AI Team

1 If we may be allowed one sentence for the experts: the primary function of OpenFermion is to encode the electronic structure problem in second quantization defined by various basis sets and active spaces and then to transform those operators into spin Hamiltonians using various isomorphisms between qubit and fermion algebras.↩

Wednesday, October 11, 2017

Crossposted on the Google Research Blog
Machine learning has made huge advances in many applications including natural language processing, computer vision and recommendation systems by capturing complex input/output relationships using highly flexible models. However, a remaining challenge is problems with semantically meaningful inputs that obey known global relationships, like “the estimated time to drive a road goes up if traffic is heavier, and all else is the same.” Flexible models like DNNs and random forests may not learn these relationships, and then may fail to generalize well to examples drawn from a different sampling distribution than the examples the model was trained on.

Today we present TensorFlow Lattice, a set of prebuilt TensorFlow Estimators that are easy to use, and TensorFlow operators to build your own lattice models. Lattices are multi-dimensional interpolated look-up tables (for more details, see [1--5]), similar to the look-up tables in the back of a geometry textbook that approximate a sine function. We take advantage of the look-up table’s structure, which can be keyed by multiple inputs to approximate an arbitrarily flexible relationship, to satisfy monotonic relationships that you specify in order to generalize better. That is, the look-up table values are trained to minimize the loss on the training examples, but in addition, adjacent values in the look-up table are constrained to increase along given directions of the input space, which makes the model outputs increase in those directions. Importantly, because they interpolate between the look-up table values, the lattice models are smooth and the predictions are bounded, which helps to avoid spurious large or small predictions in the testing time.

How Lattice Models Help You

Suppose you are designing a system to recommend nearby coffee shops to a user. You would like the model to learn, “if two cafes are the same, prefer the closer one.” Below we show a flexible model (pink) that accurately fits some training data for users in Tokyo (purple), where there are many coffee shops nearby. The pink flexible model overfits the noisy training examples, and misses the overall trend that a closer cafe is better. If you used this pink model to rank test examples from Texas (blue), where businesses are spread farther out, you would find it acted strangely, sometimes preferring farther cafes!

Slice through a model’s feature space where all the other inputs stay the same and only distance changes. A flexible function (pink) that is accurate on training examples from Tokyo (purple) predicts that a cafe 10km-away is better than the same cafe if it was 5km-away. This problem becomes more evident at test-time if the data distribution has shifted, as shown here with blue examples from Texas where cafes are spread out more.

A monotonic flexible function (green) is both accurate on training examples and can generalize for Texas examples compared to non-monotonic flexible function (pink) from the previous figure.

In contrast, a lattice model, trained over the same example from Tokyo, can be constrained to satisfy such a monotonic relationship and result in a monotonic flexible function (green). The green line also accurately fits the Tokyo training examples, but also generalizes well to Texas, never preferring farther cafes.

In general, you might have many inputs about each cafe, e.g., coffee quality, price, etc. Flexible models have a hard time capturing global relationships of the form, “if all other inputs are equal, nearer is better, ” especially in parts of the feature space where your training data is sparse and noisy. Machine learning models that capture prior knowledge (e.g. how inputs should impact the prediction) work better in practice, and are easier to debug and more interpretable.

Pre-built Estimators

We provide a range of lattice model architectures as TensorFlow Estimators. The simplest estimator we provide is the calibrated linear model, which learns the best 1-d transformation of each feature (using 1-d lattices), and then combines all the calibrated features linearly. This works well if the training dataset is very small, or there are no complex nonlinear input interactions. Another estimator is a calibrated lattice model. This model combines the calibrated features nonlinearly using a two-layer single lattice model, which can represent complex nonlinear interactions in your dataset. The calibrated lattice model is usually a good choice if you have 2-10 features, but for 10 or more features, we expect you will get the best results with an ensemble of calibrated lattices, which you can train using the pre-built ensemble architectures. Monotonic lattice ensembles can achieve 0.3% -- 0.5% accuracy gain compared to Random Forests [4], and these new TensorFlow lattice estimators can achieve 0.1 -- 0.4% accuracy gain compared to the prior state-of-the-art in learning models with monotonicity [5].

Build Your Own

You may want to experiment with deeper lattice networks or research using partial monotonic functions as part of a deep neural network or other TensorFlow architecture. We provide the building blocks: TensorFlow operators for calibrators, lattice interpolation, and monotonicity projections. For example, the figure below shows a 9-layer deep lattice network [5].

Example of a 9-layer deep lattice network architecture [5], alternating layers of linear embeddings and ensembles of lattices with calibrators layers (which act like a sum of ReLU’s in Neural Networks). The blue lines correspond to monotonic inputs, which is preserved layer-by-layer, and hence for the entire model. This and other arbitrary architectures can be constructed with TensorFlow Lattice because each layer is differentiable.

In addition to the choice of model flexibility and standard L1 and L2 regularization, we offer new regularizers with TensorFlow Lattice:

Monotonicity constraints [3] on your choice of inputs as described above.

Laplacian regularization [3] on the lattices to make the learned function flatter.

We hope TensorFlow Lattice will be useful to the larger community working with meaningful semantic inputs. This is part of a larger research effort on interpretability and controlling machine learning models to satisfy policy goals, and enable practitioners to take advantage of their prior knowledge. We’re excited to share this with all of you. To get started, please check out our GitHub repository and our tutorials, and let us know what you think!

Monday, October 9, 2017

We are now accepting applications for open source organizations who want to participate in Google Code-in 2017. Google Code-in, a global online contest for pre-university students ages 13-17, invites students to learn by contributing to open source software.

Working with young students is a special responsibility and each year we hear inspiring stories from mentors who participate. To ensure these new, young contributors have a great support system, we select organizations that have gained experience in mentoring students by previously taking part in Google Summer of Code.

Organizations must apply before Tuesday, October 24 at 16:00 UTC.

17 organizations were accepted last year, and over the last 7 years, 4,553 students from 99 different countries have completed more than 23,651 tasks for participating open source projects. Tasks fall into 5 categories:

Tuesday, October 3, 2017

We’re excited to announce 2017’s second round of Open Source Peer Bonus winners. Google Open Source established this program six years ago to encourage Googlers to recognize and celebrate external contributors to the open source ecosystem Google depends on.

The Open Source Peer Bonus program works like this: Googlers nominate open source contributors outside of the company who deserve recognition for their contributions to open source projects, including those used by Google. Nominees are reviewed by a volunteer team of engineers and the winners receive our heartfelt thanks and a small token of our appreciation.

To date, we’ve recognized nearly 600 open source contributors from dozens of countries who have contributed their time and talent to more than 400 open source projects. You can find past winners in recent blog posts: Fall 2016, Spring 2017.

Without further ado, we’d like to recognize the latest round of winners and the projects they worked on. Here are the individuals who gave us permission to thank them publicly:

Monday, October 2, 2017

We’re headed to Vancouver this week, with about 25 Googlers who are incredibly excited to attend Node.js Interactive. With a mix of folks working on Cloud, Chrome, and V8, we’re going to be giving demos and answering questions at the Google booth.

A few of us are also going to be giving talks. Here’s a list of the talks Googlers will be giving at the conference, ranging from serverless Slack bots to JavaScript performance tuning.

Wednesday, October 4th

Franzi is located in Munich, Germany where she works at Google on Chrome V8. Franzi, like James and Anna, is a member of the Node.js Core Technical Committee. She speaks across the globe on the topic of JavaScript virtual machines. She has a PhD in mathematics, but left academia to follow her true passion: writing code.

Franzi will discuss her perspective on Chrome V8 in Node.js, and what the Chrome V8 team is doing to continue to support Node.js. Want to know what the future of browser development looks like? This is a must-attend keynote.

If you were given a magic wand that would remove all implementation flaws from your web application, would it be free of security problems? If it took you more five seconds to say “No!” (or if, worse, you said “Yes!”), then you’re the target audience for this talk. If you’re in the target audience, don’t fret, much of the security community is there with you. After this talk, attendees will understand why the answer to the aforementioned question is an emphatic “No!” and they will learn an approach to decrease their chance of failing to consider an important vector of attack for their current and future web applications.

This year, V8 launched Ignition and Turbofan, the new compiler pipeline that handles all JavaScript code generation. Previously, achieving high performance in Node.js meant catering to the oddities of our now-deprecated Crankshaft compiler. This talk covers our new code generation architecture - what makes it special, a bit about how it works, and how to write high performance code for the new V8 pipeline.

Thursday, October 5th

Ever since v8-inspector was moved to V8's repository, we have been working on a number of new features for DevTools, usable for both Chrome and Node.js. The talk will demonstrate code coverage, type profiling, and give a deep dive into how evaluating a code snippet in DevTools console works in V8.

Memory leaks are hard. This talk with introduce developers to what memory leaks are, how they can exist in a garbage collected language, the available tooling that can help them understand and isolate memory leaks in their code. Specifically it will talk about heap snapshots, the new sampling heap profiler in V8, and other various other tools available in the ecosystem.

This talk will show you how to build both voice and chat bots using serverless technologies. Amir Shevat, Head of Developer Relations at Slack, has overseen 17K+ bots deployed on the platform. He will present a maturing model, as best practices, for enterprise bots covering all sorts of use cases ranging for devops, HR, and marketing. Alan Ho from Google Cloud will then show you how to use various serverless technologies to build these bots. He’ll give you a demo of Slack and Google Assistant bots incorporating Google’s latest serverless technology including Edge (API Management), CloudFunctions (Serverless Compute), Cloud Datastore, and API.ai.

ES Modules and Common JS go together like old bay seasoning and vanilla ice cream. This talk will dig into the inconsistencies of the two patterns, and how the Node.js project is dealing with reconciling the problem. The talk will look at the history of modules in the JavaScript ecosystem and the subtle difference between them. It will also skim over how ECMA-262 is standardized by the TC39, and how ES Modules were developed.

Node.js has had a transformational effect on the way we build software. However, convincing your organization to take a bet on Node.js can be difficult. My personal journey with Node.js has included convincing a few teams to take a bet on this technology, and this community. Let’s take a look at the case for Node.js we made at Google, and how you can make the case to bring it to your organization.

Tuesday, September 26, 2017

Today we are open sourcing Abseil, a collection of libraries drawn from the most fundamental pieces of Google’s internal codebase. These libraries are the nuts-and-bolts that underpin almost everything that Google runs. Bits and pieces of these APIs are embedded in most of our open source projects, and now we have brought them together into one comprehensive project. Abseil encompasses the most basic building blocks of Google’s codebase: code that is production tested and will be fully maintained for years to come.

By adopting these new Apache-licensed libraries, you can reap the benefit of years (over a decade in many cases) of our design and optimization work in this space. Our past experience is baked in.

Just as interesting, we’ve also prepared for the future: several types in Abseil’s C++ libraries are “pre-adoption” versions of C++17 types like string_view and optional - implemented in C++11 to the greatest extent possible. We look forward to moving more and more of our code to match the current standard, and using these new vocabulary types helps us make that transition. Importantly, in C++17 mode these types are merely aliases to the standard, ensuring that you only ever have one type for optional or string_view in a project at a time. Put another way: Abseil is focused on the engineering task of providing APIs that remain stable over time.

Consisting of the foundational C++ and Python code at Google, Abseil includes libraries that will grow to underpin other Google-backed open source projects like gRPC, Protobuf and TensorFlow. We love those projects, and we love the users of those projects - we want to ensure smooth usage for these things over time. In the next few months we’ll introduce new distribution methods to incorporate these projects as a collection into your project.

Continuing with the “over time” theme, Abseil aims for compatibility with major compilers, platforms and standard libraries for approximately 5 years. Our 5-year target also applies to language version: we assume everyone builds with C++11 at this point. (In 2019 we’ll start talking about requiring C++14 as our base language version.) This 5-year horizon is part of our balance between “support everything” and “provide modern implementations and APIs.”

Highlights of the initial release include:

Zero configuration: most platforms (OS, compiler, architecture) should just work.

Our primary synchronization type, absl::Mutex, has an elegant interface and has been extensively optimized.

Efficient support for handling time: absl::Time and absl::Duration are conceptually similar to std::chrono types, but are concrete (not class templates) and have defined behavior in all cases. Additionally, our clock-sampling API absl::Now() is more heavily optimized than most standard library calls for std::chrono::system_clock::now().

String handling routines: among internal users, we’ve been told that releasing absl::StrCat(), absl::StrJoin(), and absl::StrSplit() would itself be a big improvement for the open source C++ world.

The project has support for C++ and some Python. Over time we’ll tie those two projects together more closely with shared logging and command-line flag infrastructure. To start contributing, please see our contribution guidelines and fork us on GitHub. Check out our documentation and community page for information on how to contact us, ask questions or contribute to Abseil.

Monday, September 25, 2017

Google Open Source is proud to announce the 14th year of Google Summer of Code (GSoC)! Yes, GSoC is officially well into its teenage years - hopefully without that painful awkward stage - and we are excited to introduce more new student developers to the world of open source software development.

Over the last 13 years GSoC has provided over 13,000 university students from around the world with an opportunity to hone their skills by contributing to open source projects during their summer break. Participants gain invaluable experience working directly with mentors on open source projects, and earn a stipend upon successful completion of their project.

We’re excited to keep the tradition going! Applications for interested open source organizations open on January 4, 2018 and student applications open in March*.

Are you an open source project interesting in learning more? Visit the program site to learn about what it means to be a mentor organization and how to submit a good application. We welcome all types of organizations - both large and small - and each year about 20% of the organizations we accept are completely new to GSoC.

Students, it’s never too early to start thinking about your proposal. You can check out the organizations that participated in Google Summer of Code 2017 as well as the projects students worked on. We also encourage you to explore other resources like the student and mentor guides and frequently asked questions.

You can always learn more on the program website. Please stay tuned for more details!

Tuesday, September 19, 2017

Applications often require access to small pieces of sensitive data at build or run time, referred to as secrets. Secrets are generally more sensitive than other environment variables or parts of your repository as they may grant access to additional data, such as user information.

HashiCorp Vault is a popular open source tool for secret management, which allows a developer to store, manage and control access to tokens, passwords, certificates, API keys and other secrets. Vault has authentication backends, which allow developers to use many kinds of identities to access Vault, including tokens, or usernames and passwords.