TensorFlow Technical Director: How the TensorFlow Team Manages an Open Source Project

In this article, a TensorFlow technical director shares his team's experience of managing an open source project.

Open source is more than publishing code and hoping someone uses it. I knew this in principle, but only after joining the Google TensorFlow team did I realize how many factors go into building a community around a piece of software.

Community service

When a new project is released, the only experts on it are the people who wrote it. They are the only ones who can write documentation and answer questions, and they are the most effective at improving the software. As a result, those of us on the core TensorFlow team became a bottleneck to the project's growth: after all, we cannot do everything at once. We know how to make time for writing code and documentation, because those tasks are part of our day-to-day work. Answering a large volume of questions from community developers, on the other hand, is not part of our normal routine, even though we know it is critical to the project's success.

To spread this work, we use a rotation: typically, each engineer is responsible for a specific area for a week, in round-robin fashion. The engineer on rotation is much less productive at their normal work that week, but at least each person is interrupted only once every few months.
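As a rough illustration of such a weekly round-robin, the sketch below assigns one engineer to each support area and shifts everyone by one position each week. It is not the team's actual tooling: the function name `on_duty`, the engineer names, and the list of areas are all hypothetical.

```python
def on_duty(engineers, areas, week):
    """Return {area: engineer} for the given week number.

    Each area is offset by its position in the list, so different areas
    are covered by different engineers in the same week (as long as there
    are at least as many engineers as areas).
    """
    if not engineers:
        raise ValueError("need at least one engineer")
    n = len(engineers)
    return {area: engineers[(week + i) % n] for i, area in enumerate(areas)}


# Hypothetical team and areas, for illustration only.
team = ["alice", "bob", "carol"]
areas = ["pull requests", "issues"]

for week in range(4):
    print(week, on_duty(team, areas, week))
```

With three engineers, each person returns to the same area every third week; a larger team stretches that interval accordingly.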

Pull requests

We open sourced TensorFlow in the hope that community contributions would improve it. So far, more than 400 external contributors have added code, ranging from small documentation fixes to large additions such as OS X GPU support, an OpenCL implementation, or InfiniBand RDMA. First, the core engineer on rotation reviews each contribution to decide whether it is valuable. If the contribution passes this initial review, it triggers a set of Jenkins tests to make sure it does not break anything. If those also pass, the engineer on duty may want other engineers who know the area better to take a look, so the request is forwarded to an expert for review.

GitHub's new, more detailed code review tools have been a great help in this process; before we had them, handling all the individual comments was painful. Large PRs often stay open for a while as the core engineers work with one or more external contributors. Once everyone is satisfied, the PR is merged on GitHub and then merged into our internal code base at the next sync run.

Contributor License Agreement

As part of our automated pull request process, we match the contributor's GitHub account against our records at cla.developers.google.com to ensure that every external contributor has signed a Contributor License Agreement (CLA). Our goal is to ensure that the entire code base can be distributed without encumbrance under the Apache 2.0 license. The engineer handling pull requests has to sort out any problems that come up, and things can get complicated when a pull request is associated with a different email address or when the contributor needs to sign as a company.

GitHub issues

More than 5,000 issues have now been filed against TensorFlow, which some people find a bit discouraging. But this is my favorite metric: it shows that users are really using the project! To make sure every submitted issue gets a response, the engineer on duty watches incoming messages and tries to classify them using labels. If an issue is a feature request that we are unlikely to implement in the short term, we mark it "Contributions Welcome". For bugs, we try to assign a priority. Over time, as external users have themselves become experts in particular areas, we have seen more and more issues resolved without our help, especially on platforms like Windows that we do not use every day.

If an issue cannot be answered or resolved by the community and it is high priority, the engineer on duty assigns it to an engineer who knows more about the area. Everyone on the TensorFlow team has a GitHub account, so we can use the regular GitHub issue tracker to assign issues. We considered copying user-submitted bugs into our internal system, but the cost of keeping two copies of the same information in sync was too high. So, in addition to watching the internal tracker, our engineers also need to turn on GitHub's email notifications for bugs so that they see issues assigned to them promptly.

Stack Overflow

Derek Murray heads the Stack Overflow rotation. I am in awe of his ability to answer questions; according to his profile page, his posts have been viewed by more than 1.3 million people. He also built an automated spreadsheet driven by RSS feeds. At first one person handled the rotation each week, but the number of questions grew too large for a single person to manage. So we later moved from a pure rotation to automatically assigning incoming questions.

I am currently in this group, so every morning after going through my own email, I look through the spreadsheet to see which questions have been assigned to me. Unfortunately, we cannot answer every question ourselves, but we do review every incoming question. If a question is relatively simple, we answer it.
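The article does not say how questions are allocated, so the following is only a minimal sketch of one plausible rule: give each new question to whichever group member currently has the fewest open questions. The function name `assign_questions`, the member names, and the integer question IDs (standing in for RSS feed entries) are all assumptions made for illustration.

```python
import heapq


def assign_questions(question_ids, open_counts):
    """Assign each question to the member with the fewest open questions.

    open_counts maps member name -> number of questions already assigned.
    Ties are broken by the member's position in open_counts.
    """
    if not open_counts:
        raise ValueError("need at least one group member")
    # Heap entries are (open_count, position, name).
    heap = [(count, pos, name)
            for pos, (name, count) in enumerate(open_counts.items())]
    heapq.heapify(heap)
    assignments = {name: [] for name in open_counts}
    for qid in question_ids:
        count, pos, name = heapq.heappop(heap)
        assignments[name].append(qid)
        heapq.heappush(heap, (count + 1, pos, name))
    return assignments


# Hypothetical members and question IDs.
print(assign_questions([1, 2, 3], {"engineer_a": 2, "engineer_b": 0}))
```

A rule like this keeps any one person from being overloaded, which is the problem the weekly rotation ran into.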

The engineer on duty is a "frontline" role, but sometimes answering a question requires more time or expertise. If a question looks answerable but no one in the community has answered it, we look at the code (usually with "git blame") to see who on the team might have some insight into it. The engineer on duty then emails the internal experts we have identified to ask whether they can help.

Mailing Lists

We set up a mailing list, but at first we did not know what to do with it. It soon became clear that it was a very poor way to track issues or answer general questions.

Later, we tried using it as a discussion forum. In practice, though, we found that GitHub issues were a better fit even for architectural questions.

So now we use the mailing list for announcements and for sharing news, which makes it well worth subscribing to.

Code synchronization

Many people I talk to are surprised that the code base we use inside Google is almost exactly the same as the one we publish on GitHub. There are some differences: for example, support for Google-specific infrastructure is kept separate, and the paths are different. But the synchronization process is entirely mechanical. We push our internal changes at least once a week, and we are constantly pulling changes from GitHub.

The tricky part is that the synchronization has to work in both directions. Many changes are happening at the same time in the public GitHub project and in our internal version, and we need to merge them all repeatedly. Since no ready-made infrastructure existed for this, we wrote a set of Python scripts to handle it. The scripts pull the GitHub changes into our internal repository, convert all the header paths and make other minor changes, merge the result with the latest internal code, and create an internal copy. We can then synchronize in the other direction: we convert all the internal code into the external format and use the same scripts to merge the result into the latest GitHub version.
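To make the header path conversion concrete, here is a minimal sketch of what such a step might look like. It is not the team's script: the article does not give the internal directory layout, so `INTERNAL_PREFIX` is a made-up placeholder, and the function names are hypothetical. Only quoted `#include` lines are rewritten; system includes in angle brackets are left alone.

```python
import re

# Hypothetical prefixes; the real internal layout is not described.
EXTERNAL_PREFIX = "tensorflow/"
INTERNAL_PREFIX = "internal_root/tensorflow/"

# Matches lines such as:  #include "some/path.h"
_INCLUDE = re.compile(r'^([ \t]*#include[ \t]+")([^"]+)(")', re.MULTILINE)


def _rewrite_includes(source, old_prefix, new_prefix):
    """Swap old_prefix for new_prefix in every quoted #include path."""
    def swap(match):
        path = match.group(2)
        if path.startswith(old_prefix):
            path = new_prefix + path[len(old_prefix):]
        return match.group(1) + path + match.group(3)
    return _INCLUDE.sub(swap, source)


def to_internal(source):
    """Convert external (GitHub-style) include paths to the internal form."""
    return _rewrite_includes(source, EXTERNAL_PREFIX, INTERNAL_PREFIX)


def to_external(source):
    """Convert internal include paths back to the external form."""
    return _rewrite_includes(source, INTERNAL_PREFIX, EXTERNAL_PREFIX)


example = '#include <vector>\n#include "tensorflow/core/framework/op.h"\n'
print(to_internal(example))
```

Because the two functions are exact inverses on paths that carry the expected prefix, code can move in either direction without accumulating edits.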

For internal changes, we also try to make sure that each check-in appears as a single git commit that includes the author's GitHub account and a comment describing the change. We have a dedicated "tensorflow-gardener" account on GitHub to manage this process, and the commits it pushes show what an internal commit looks like once it has been migrated to GitHub.

Keeping the conversion process working as the code changes is very challenging. To verify it, every internal change is run through the scripts to convert it to the external version and back, and the result must show no difference from the original. This test is required for any internal change that touches the TensorFlow code base, and changes that fail it are rejected. We sometimes ask people who send pull requests to make changes that may seem strange. Often the reason is that we must make sure their code works with this synchronization infrastructure.
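One way to picture this check is as a round trip: convert the internal text to the external format, convert it back, and reject the change if anything differs. The sketch below assumes that reading of the test. The converter functions are passed in as parameters, and the two `demo_` converters with their `internal_root/` prefix are made-up stand-ins, not the real scripts.

```python
import difflib


def round_trip_diff(internal_source, to_external, to_internal):
    """Return unified-diff lines between the source and its round trip."""
    restored = to_internal(to_external(internal_source))
    return list(difflib.unified_diff(
        internal_source.splitlines(keepends=True),
        restored.splitlines(keepends=True),
        fromfile="original",
        tofile="round_trip",
    ))


def accept_change(internal_source, to_external, to_internal):
    """Accept a change only if the round trip reproduces it exactly."""
    return not round_trip_diff(internal_source, to_external, to_internal)


# Hypothetical converters based on a made-up internal path prefix.
def demo_to_external(text):
    return text.replace("internal_root/tensorflow/", "tensorflow/")


def demo_to_internal(text):
    return text.replace("tensorflow/", "internal_root/tensorflow/")


good = '#include "internal_root/tensorflow/core/x.h"\n'
bad = '#include "tensorflow/core/x.h"\n'  # external-style path in internal code
print(accept_change(good, demo_to_external, demo_to_internal))
print(accept_change(bad, demo_to_external, demo_to_internal))
```

The second input fails because the conversion cannot tell which form the path originally had, which is the kind of ambiguity that can lead to odd-looking requests on a pull request.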

Testing

Because we support so many platforms, we need a broad testing infrastructure. TensorFlow runs on Linux, Windows, and OS X desktops, as well as on iOS, Android, Android Things, and Raspberry Pi. We also have different code paths for GPUs, including CUDA and OpenCL support, along with Bazel, cmake, and plain makefile build processes.

It is impossible for every developer to test all of these manually after each change, so we have an automated test system that runs on most of the supported platforms, all controlled by the Jenkins automation system. Keeping it working takes a lot of time and effort, because there are always operating system updates, hardware problems, and other issues unrelated to TensorFlow. We have a team of engineers dedicated to keeping the whole test system running. That team has saved us many times, so the investment has been worth it.

Developer relations

We are not alone in doing open source work at Google. We have learned a lot from projects like Kubernetes and from the Open Source Program Office. We also have a very hard-working team of developer relations experts helping us, and they handle much of the heavy lifting around documentation, code examples, and other developer experience issues. Our long-term goal is to spread critical expertise beyond the core developers so that more people inside and outside Google can contribute to the community.

One big advantage of having core engineers do customer service "part-time" is that we learn directly about the problems users run into. Taking part in support also motivates us to fix common errors and add documentation, because we see a direct payoff in reduced support effort.

Looking ahead, we hope to spread this work more widely as more people become familiar with the details of the framework, as the documentation improves, and as we create more "guides" for handling common tasks (such as bug triage). Until then, I feel lucky to have the chance to interact with so many external developers, and I hope to have a positive impact by helping them use machine learning to create new and amazing applications.

About the Author:

Pete Warden is the technical director of the TensorFlow Mobile team and was formerly chief technology officer of Jetpac, which Google acquired in 2014. His main job is to optimize deep learning technology for mobile and embedded devices. He previously worked at Apple on GPU optimization for image processing, and has written books on data processing for O'Reilly.