IBM’s nanophotonic tech integrates optical data links right into chips

Ready for prime time, tech can push terabits per second over long distances.

IBM's integrated silicon nanophotonics transceiver on a chip; optical waveguides are highlighted here in blue, and the copper conductors of the electronic components in yellow.

IBM Research

IBM has developed a technology that combines optical communications and electronics in silicon, allowing optical interconnects to be built directly into integrated circuits on a chip. That technology, called silicon nanophotonics, is now moving out of the labs and is ready to become a product. It could potentially revolutionize how processors, memory, and storage in supercomputers and data centers interconnect.

Silicon nanophotonics were first demonstrated by IBM in 2010 as part of IBM Research's efforts to build Blue Waters, the NCSA supercomputer project that the company withdrew from in 2011. But IBM Research continued to develop the technology, and today announced that it was ready for mass production. For the first time, the technology "has been verified and manufactured in a 90-nanometer CMOS commercial foundry," Dr. Solomon Assefa, Nanophotonics Scientist for IBM Research, told Ars.

A single CMOS-based nanophotonic transceiver can convert data between electrical and optical signals with virtually no latency, handling a data connection of more than 25 gigabits per second. Depending on the application, hundreds of transceivers could be integrated into a single CMOS chip, pushing terabits of data over fiber-optic connections between processors, memory, and storage systems at distances ranging from two centimeters to two kilometers.
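
The aggregate figures above scale simply; here is a quick back-of-the-envelope check (the transceiver count is illustrative, not a figure from IBM):

```python
# Back-of-the-envelope check of the aggregate bandwidth figures in the
# article. The per-transceiver rate comes from the article; the count
# of 100 is an assumed example of the "hundreds" mentioned.
PER_TRANSCEIVER_GBPS = 25      # >25 Gb/s per nanophotonic transceiver
NUM_TRANSCEIVERS = 100         # assumed: "hundreds" per CMOS chip

aggregate_gbps = PER_TRANSCEIVER_GBPS * NUM_TRANSCEIVERS
aggregate_tbps = aggregate_gbps / 1000

print(f"{NUM_TRANSCEIVERS} transceivers -> {aggregate_tbps:.1f} Tb/s")
```

Even a conservative count lands in terabit-per-second territory, which is the article's point.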

Enlarge/ A representation of a silicon nanophotonic transceiver, with photodetector (the red rectangle on the left side of the cube, with the incoming beam) and optical modulator (the blue box with beam on the right), integrated with electronic transistors via nine layers of conductor embedded in the silicon

IBM Research

Cheap and scalable, the technology could have a huge impact on how data centers for public cloud computing and other "exascale" applications—and the computers in them—are built. Eventually, it could find its way into processors themselves, though the low cost of 90-nanometer chips makes them "the most cost-effective solution looking into the next decade," Assefa said.

The computer as network

Many of the current "petascale" supercomputers use optical interconnects to transfer data between computing nodes. Cloud computing data centers, social networking data centers, and other "exascale" computing environments are all potential beneficiaries of silicon nanophotonics. Applications that run in data centers like those of Facebook and Google all face data bottlenecks. "These services are relying on increasingly interconnected networks," Assefa said.

For many distributed applications—such as Hadoop-based data analysis, video, and image processing—the greatest limiting factor is often the network, as information flows between processors. Network connections are also expensive, as they rely on additional hardware to manage them.

But direct optical connections could change the architecture of the systems built for these applications. They could be integrated into data center racks, allowing the components of servers to be set up in different physical units without a loss of performance. They could also allow for much more efficient and redundant storage networking. "Once you have this sort of universal technology," Assefa said, "it provides a lot of different kinds of bandwidth capability."

For now, IBM is not making any announcements about actual products based on CMOS silicon nanophotonics, and Assefa would not say whether IBM intends to license the technology. But the first natural targets for it are IBM's own supercomputers and CMOS-based mainframe systems.

Did they state any reason why they verified it in a 90nm technology instead of a more recent technology? Will they be verifying it in a newer technology next or is it ready to go?

They said that it's ready to go with newer technology, though they weren't talking about putting it into an SoC or anything like that. The optics are a size limiter, so it makes a lot less economic sense to do it on newer processes.

If this does get into smaller chips, we could well see advancements in quantum computing with optical throughput within those connections. Think about it: an advanced connection, but without all the equipment we use today. It could make computers even thinner than they are now, and compact enough to house extra peripherals such as SSDs and multiple wireless connections, with the power, performance, and longevity of batteries that use this type of connection. Not necessarily a reality per se right now, but with enough brainstorming it could become one in the next decade or two... Just a thought.

Fascinating. I wonder if it could run an IPv6 stack? I wouldn't think that it would be useful to do so within a single computer, but the article mentions applications for linking components at a datacenter scale, which makes me wonder about the ways that you could connect all the devices. It will be interesting to see how this tech develops!

edit: should read "could run an IPv6 stack" instead of "runs an IPv6 stack".

Possibly a silly question, but how is such high speed data input (e.g. the >25Gbps mentioned here) processed by the electronic portion of the chip which is presumably capped at ~5GHz?

I think 5GHz refers to the number of discrete operations the processor can do per second. But each operation can move more than one bit of data at a time.

I understand that CPUs can process more than one instruction per cycle etc. But in this case, there's >25Gbps coming in over (presumably?) one link, and a ~5GHz component on the other end trying to process that stream; how does that work?

A single CMOS-based nanophotonic transceiver can convert data between electrical and optical signals with virtually no latency, handling a data connection of more than 25 gigabits per second.

I may be wrong, but I think the key phrase here is "virtually no latency". It is already possible to get upwards of 25 gigabits of connectivity; the problem is that the components need to be near each other for low latency. This technology seems to imply that data center implementations are freed from this constraint, making building and programming supercomputers with hundreds of processors cheaper and simpler.

Possibly a silly question, but how is such high speed data input (e.g. the >25Gbps mentioned here) processed by the electronic portion of the chip which is presumably capped at ~5GHz?

I think 5GHz refers to the number of discrete operations the processor can do per second. But each operation can move more than one bit of data at a time.

I understand that CPUs can process more than one instruction per cycle etc. But in this case, there's >25Gbps coming in over (presumably?) one link, and a ~5GHz component on the other end trying to process that stream; how does that work?

Let's just say the data was being transferred using 32-bit registers. With a 5GHz processor, that's already 160Gbps. And processors could presumably use larger 128- or 256-bit registers to transfer (or receive) data. I don't see how that's a problem for today's CPUs to handle. Maybe I'm not understanding your question correctly.

Edit: and in my example, I'm assuming the worst case, that each 32-bit chunk is an instruction.
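
The arithmetic in the comment above checks out; a sketch of it, using the commenter's assumed 5GHz clock and 32-bit word:

```python
# Rough throughput estimate from the comment above: a core clocked at
# 5 GHz moving one 32-bit word per cycle already far exceeds the
# 25 Gb/s optical link. Both figures are the commenter's assumptions.
CLOCK_HZ = 5e9          # assumed ~5 GHz core clock
WORD_BITS = 32          # one 32-bit word moved per cycle (worst case)

throughput_gbps = CLOCK_HZ * WORD_BITS / 1e9
print(f"{throughput_gbps:.0f} Gb/s")   # 160 Gb/s vs. a 25 Gb/s link
```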

A processor's internal clock speed is not directly related to its external data throughput. In x86 systems, the front-side bus is where it counts.

This could potentially be a more efficient, simpler replacement for PCIe, especially if they can build an optical switchboard to control the data lanes.

Grander uses would be to have banks of CPUs connect to banks of RAM and storage, all working on the same huge dataset. The fact that this can go up to two kilometers means that the storage can be on the other side of the data center instead of being directly attached. The potential for large clustered work is huge here.

Thanks, guys, I do understand how processors work. But this is an opto-electronic interface, not a processor with registers and buses.

This device has data coming into the optical side at 25Gbps, on what appears to be a single optical link. Unless that stream is somehow split (which this article makes no mention of), the electronic device on the other side of the optical element must somehow process the stream while (presumably) running at a speed far less than 25GHz.

I would assume that this is more complex than just some magical transistor that can send/receive electrical signals on one side and optical pulses on the other. Without the paper, or some schematics, I don't think any of us can really comment on exactly how they did it. Personally, I'm guessing it's the equivalent of whatever laser module was used before in fiber-optic cards, just with everything fabricated into a single chip. Essentially, it works the same way 10Gbps Ethernet cards work without having to run the entire processor chip at 10GHz, except with light coming out the back instead of electricity.

This device has data coming into the optical side at 25Gbps, on what appears to be a single optical link. Unless that stream is somehow split (which this article makes no mention of), the electronic device on the other side of the optical element must somehow process the stream while (presumably) running at a speed far less than 25GHz.

It has to do with the analog nature of light compared to the digital nature of bits and processors: a single wave (of light, sound, an electromagnetic field, etc.) can carry a very large number of frequencies, each of them carrying a stream of information. The photonic part of the chip must somehow split the incoming light into different beams, each carrying part of the information (demultiplexing); then the digital part of the chip extracts data from those beams at a pace of 5GHz. If there are five beams, you get a digital 25Gbps data transfer rate.
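
A toy model of the demultiplexing idea in the comment above; the five-channel count and the round-robin splitting are illustrative assumptions, not details from IBM's design:

```python
# Toy model of splitting one fast stream across several slower channels,
# as the comment describes for wavelength-division demultiplexing:
# a 25 Gb/s link divided over five wavelengths, each handled
# electronically at 5 Gb/s. All figures here are the commenter's.
LINK_GBPS = 25
NUM_LAMBDAS = 5

def demux(bits, n_channels):
    """Round-robin the incoming bit stream onto n parallel channels."""
    channels = [[] for _ in range(n_channels)]
    for i, b in enumerate(bits):
        channels[i % n_channels].append(b)
    return channels

stream = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
chans = demux(stream, NUM_LAMBDAS)
per_channel_gbps = LINK_GBPS / NUM_LAMBDAS   # 5.0 Gb/s per channel
```

Each channel sees only a fifth of the bits, so 5GHz electronics can keep up with a 25Gbps link.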

Thanks, guys, I do understand how processors work. But this is an opto-electronic interface, not a processor with registers and buses.

This device has data coming into the optical side at 25Gbps, on what appears to be a single optical link. Unless that stream is somehow split (which this article makes no mention of), the electronic device on the other side of the optical element must somehow process the stream while (presumably) running at a speed far less than 25GHz.

But in this case, there's >25Gbps coming in over (presumably?) one link, and a ~5GHz component on the other end trying to process that stream; how does that work?

Transistors in current CPUs can run at 50+ GHz. Cascaded operations (imagine a multiplier where 32 adders each need the result of the previous adder) simply force the design to a much lower clock speed. Around 10 logic operations in series proves to be the minimum length for useful work (any more, and you should pipeline).

So a 25GHz signal can be transmitted and received without much effort; it's just that doing computation on the data will be tricky (unless the signal is just serialized data, which gets muxed into 32/64-bit words).

Lightweight, press-release-style article. Nice work, Ars. :-/ This article does a rather bad job of defining the problems this technology could solve. You can tell from the responses: people are talking about quantum computing and USB 4. Meh. Interesting tech marred by an article that really says nothing.

Possibly a silly question, but how is such high speed data input (e.g. the >25Gbps mentioned here) processed by the electronic portion of the chip which is presumably capped at ~5GHz?

Odds are they are using HEMTs or MESFETs to build the serial-to-parallel converter instead of the MOSFETs you would normally see in a CMOS setup. These can operate at a much higher clock (the fastest HEMTs can reach 1THz) at the expense of static power consumption. Then, once the data is converted to parallel, regular CMOS components can handle it, since it arrives at a slower clock but over a wider bus.
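
A minimal software sketch of the serial-to-parallel (deserializer) step described above; the 32-bit word width is an assumed figure for illustration:

```python
# Sketch of a deserializer: fast front-end logic packs the serial bit
# stream into wide words that slower CMOS logic can consume. After
# packing, the parallel bus only needs to run at 1/width of the serial
# line rate (e.g. 25 Gb/s serial -> ~0.78 GHz for 32-bit words).
def deserialize(bits, width=32):
    """Pack a serial bit stream (MSB first) into `width`-bit words."""
    words = []
    for i in range(0, len(bits) - width + 1, width):
        word = 0
        for b in bits[i:i + width]:
            word = (word << 1) | b
        words.append(word)
    return words

# 32 serial bits become a single 32-bit word.
bits = [0] * 31 + [1]
print(deserialize(bits))   # [1]
```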

From the paper, analog power consumption is 28mW for the receiver and 36mW for the emitter at 25Gbps. Does someone know how this compares to standard copper links?

That's actually the problem with building silicon-only lasers: you'd need a truck battery to power your device.

I skimmed the 2010 presentation; I see a lot of waveguides, WDM muxes, etc., and an impressive-looking thing they call a 5GHz optical switch.

None of this looks like stuff you'd have thousands of on a chip. I think the idea is that chip designers spend a few square millimeters of die area and many mW of power to get a 25Gb/s, super-low-latency optical bus.

A processor's internal clock speed is not directly related to its external data throughput. In x86 systems, the front-side bus is where it counts.

This could potentially be a more efficient, simpler replacement for PCIe, especially if they can build an optical switchboard to control the data lanes.

Grander uses would be to have banks of CPUs connect to banks of RAM and storage, all working on the same huge dataset. The fact that this can go up to two kilometers means that the storage can be on the other side of the data center instead of being directly attached. The potential for large clustered work is huge here.

Well, motherboards could be made cheaper, with greater performance, and GPUs could have access to greater storage.

Sean Gallagher / Sean is Ars Technica's IT Editor. A former Navy officer, systems administrator, and network systems integrator with 20 years of IT journalism experience, he lives and works in Baltimore, Maryland.