Abstract

Integrated optics has recently gained interest as a hardware platform for implementing machine learning algorithms. Artificial neural networks (ANNs) are of particular interest, since the matrix-vector multiplications on which they rely heavily can be performed efficiently in photonic circuits. Training an ANN is a crucial step in its application, yet no efficient training protocol currently exists for networks implemented on the integrated photonics platform. In this work, we introduce a method that enables highly efficient, in situ training of a photonic neural network. We use adjoint variable methods to derive the photonic analogue of the backpropagation algorithm, the standard method for computing gradients of conventional neural networks, and we show how these gradients may be obtained exactly by performing intensity measurements within the device. As an application, we demonstrate the training of a numerically simulated photonic ANN. Beyond the training of photonic machine learning implementations, our method may also be of broad interest for experimental sensitivity analysis of photonic systems and the optimization of reconfigurable optics platforms.

Figures (4)

Fig. 1. (a) Schematic of the ANN architecture demonstrated in Ref. [11]. The boxed regions correspond to OIUs that perform a linear operation represented by the matrix W^l. Integrated phase shifters (blue) are used to control the OIU and train the network. The red regions correspond to nonlinear activations f^l(·). (b) Illustration of operation and gradient computation in an ANN. The top and bottom rows correspond to the forward and backward propagation steps, respectively. Propagation through a square cell corresponds to matrix multiplication. Propagation through a rounded region corresponds to activation. ⊙ denotes element-wise vector multiplication.
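The forward and backward steps sketched in Fig. 1(b) follow the standard backpropagation pattern: each layer applies a matrix multiplication and an element-wise activation, and the backward pass propagates error vectors δ_l through the transposed matrices with element-wise (⊙) multiplication by the activation derivative. A minimal numerical sketch, assuming conventional real-valued layers with tanh activations in place of the paper's OIUs and f^l(·):

```python
import numpy as np

def forward(x, weights):
    """Forward pass: square cell = matrix multiply, rounded cell = activation."""
    zs, activations = [], [x]
    for W in weights:
        z = W @ activations[-1]         # matrix-vector multiplication
        zs.append(z)
        activations.append(np.tanh(z))  # element-wise nonlinear activation
    return zs, activations

def backward(zs, activations, weights, target):
    """Backward pass: propagate delta vectors; * is the element-wise (⊙) product."""
    # Mean-squared-error loss L = 0.5 * ||x_L - target||^2
    delta = (activations[-1] - target) * (1 - np.tanh(zs[-1])**2)
    grads = []
    for l in range(len(weights) - 1, -1, -1):
        grads.insert(0, np.outer(delta, activations[l]))  # dL/dW_l
        if l > 0:
            delta = (weights[l].T @ delta) * (1 - np.tanh(zs[l - 1])**2)
    return grads

rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 3)) for _ in range(2)]
x, target = rng.normal(size=3), np.zeros(3)
zs, acts = forward(x, weights)
grads = backward(zs, acts, weights, target)
```

The paper's contribution is obtaining the equivalent of `grads` for the photonic hardware itself, via the intensity measurements illustrated in Fig. 2, rather than by a software backward pass.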

Fig. 2. Schematic illustration of our proposed method for experimental measurement of gradient information. The boxed region represents the OIU. The colored ovals represent tunable phase shifters; we illustrate computing the gradient with respect to the red and yellow phase shifters, labeled 1 and 2, respectively. (a) We send the original set of amplitudes X_{l-1} through the device and measure the constant intensity terms at each phase shifter. (b) We send the adjoint mode amplitudes, given by δ_l, through the output side of the device, recording X_TR* from the opposite side as well as |e_aj|² at each phase shifter. (c) We send in X_{l-1} + X_TR, interfering e_og and e_aj* inside the device and recovering the gradient information for all phase shifters simultaneously.
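The measurement in step (c) works because interfering the original field e_og with the time-reversed adjoint field e_aj* produces an intensity whose cross term is 2 Re(e_og · e_aj); subtracting the constant intensity terms recorded in steps (a) and (b) isolates exactly this gradient-carrying term. A hedged numerical sketch with scalar complex amplitudes standing in for the fields at a single phase shifter (the values here are arbitrary, only the names e_og and e_aj follow the caption):

```python
import numpy as np

rng = np.random.default_rng(1)
e_og = rng.normal() + 1j * rng.normal()  # field from the original input, step (a)
e_aj = rng.normal() + 1j * rng.normal()  # field from the adjoint input, step (b)

# Steps (a) and (b): record the constant intensity terms separately.
I_og = abs(e_og)**2
I_aj = abs(e_aj)**2

# Step (c): interfere e_og with the time-reversed adjoint field e_aj*.
I_total = abs(e_og + np.conj(e_aj))**2

# Subtracting the constant terms isolates the cross term 2*Re(e_og * e_aj),
# which carries the gradient information at this phase shifter.
cross = I_total - I_og - I_aj
```

This is the identity |a + b*|² = |a|² + |b|² + 2 Re(a·b), which is why three intensity measurements suffice to recover the gradient at every phase shifter in parallel.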

Fig. 3. Numerical demonstration of the time-reversal procedure of Section 4. (a) Relative permittivity distribution for three MZIs arranged to perform a 3×3 linear operation. Blue boxes indicate where phase shifters would be placed in this system. As an example, we compute the gradient information for a layer with X_{l-1} = [0 0 1]^T and δ_l = [0 1 0]^T, corresponding to the bottom left and middle right ports, respectively. (b) Real part of the simulated electric field E_z for injection from the bottom left port. (c) Real part of the adjoint field E_z for injection from the middle right port. (d) Time-reversed adjoint field as constructed by our method, fed in through all three ports on the left. (e) Gradient information dL/dε_l(x, y) as obtained directly by the adjoint method, normalized by its maximum absolute value. (f) Gradient information as obtained by the method introduced in this work, normalized by its maximum absolute value: the field pattern from (b) is interfered with the time-reversed adjoint field of (d), and the constant intensity terms are subtracted from the resulting intensity pattern. Panels (e) and (f) match with high precision.

Fig. 4. Numerical demonstration of a photonic ANN implementing an XOR gate using the backpropagation algorithm and adjoint method described in this work. (a) Architecture of the ANN: two layers of 3×3 OIUs with z² activations. (b) Mean-squared error (MSE) between the predictions and targets as a function of training iterations. (c) Absolute value of the network predictions (blue circles) and targets (black crosses) before training. (d) Absolute value of the network predictions after training, showing that the network has successfully learned the XOR function.
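For readers unfamiliar with the XOR benchmark of Fig. 4, a minimal sketch of what "training a network on XOR via backpropagation" means, using a conventional real-valued network as a stand-in: the paper's actual network uses two 3×3 OIU layers with z² activations and obtains its gradients in situ, whereas here we use tanh hidden units, a sigmoid output, and an ordinary software backward pass purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# The four XOR input/target pairs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Two-layer network: 2 inputs -> 8 tanh hidden units -> 1 sigmoid output.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
lr = 0.5

for _ in range(5000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    # Backward pass (cross-entropy gradient at the sigmoid pre-activation).
    d_out = out - y
    d_h = (d_out @ W2.T) * (1.0 - h**2)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

mse = np.mean((out - y)**2)  # training error, as tracked in Fig. 4(b)
```

As in Fig. 4(b)-(d), the training error drops over the iterations and the four predictions converge toward their targets; in the photonic version, each gradient step would instead be computed from the intensity measurements of Fig. 2.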