War starts when people fail to communicate with each other. The current U.S.-China dispute is so complex and far-reaching that any rational discussion online can devolve into a flame war. There are so many topics in play that the multi-variable optimization becomes difficult. Overlay all this with the gloomy long-term implications of technology, and it is far easier to just pick a side and root for the red or blue team.

The Gloomy Long-Term Implications of Technology

It is far easier for Bay Area people to think of themselves as a force for good. But the technology we developed over the past few years has greatly expanded the reach of central governments. It is too easy to track down a person and collect all their communication records for profiling and categorization. Alternative technologies that could counter these implications, such as end-to-end encryption, can be easily outlawed at a government's will. It is pleasantly surprising to see that the United States has resisted for so long. As Republicans have given up their ideology completely for the totalitarian fantasy, the expansion of executive branch power will finally result in, not necessarily a president for life (although likely), but at the very least a one-party state. Whether it is Republican or Democratic is beside the point. Populists, on either the far right or the far left, come dangerously close in ideological terms. After all, the United States right now has a Republican president running an unprecedented fiscal deficit and issuing orders to anyone in the name of national security.

The Chinese have been playing the one-party state game for too long. The art of ruling lies in appeasing the many, allowing a few to vent, and exterminating anomalies. Digital technologies allow them to scale this up. With such surveillance power, the crime rate will fall, and so will freedom.

The Gear Up to a New War

When a new war begins to break out, both sides first stop talking with each other. The media on both sides seem to have agendas. In China, the media appeals to nationalistic honor and tries to remind the average Chinese of the past under western imperialism, with the Opium War and the Korean War. In the United States, the media paints China as an axis of evil and tries to gain the moral high ground for the U.S. position. The sheer number of fanatics on both sides makes civil discussion impossible. It seems that the media are well-positioned to set up the war between the two powers.

What the United States Wants

The current trade war is difficult partly because the United States' demands are fairly opaque. It is a grab bag of things, ranging from the purely economic to the purely political. That is understandable, because the Trump administration is not known for making crisp, clear demands. There are feelings, numbers, and ideologies, all bagged together in the trade deal.

The Feelings: the United States feels that it has been in a one-sided relationship, one that over the past two decades benefited the Chinese more. This can be seen in the stagnation of U.S. growth and the stellar growth of China. More specifically, the feeling stems from the broad ban on U.S. internet companies in China and the joint-venture requirements for any U.S. venture into the Chinese domestic market. A great many made-in-China products means fewer made-in-America ones. That, again, ties back to the stagnation of ordinary Americans over the past decade.

The Numbers: the United States sees the cold, hard trade imbalance as proof that the relationship is truly one-sided. If the Americans make less than the Chinese from this relationship, isn't that enough to prove the United States lost?

The Ideologies: to many Americans, Communist China is evil by its very prefix. Its behavior in Tibet, Xinjiang and the South China Sea is proof of how far the communists will go to suppress opposition. Many years of propaganda in the United States attributed the end of the Cold War to the superiority of capitalism over communism (rather than, for example, of open government over authoritarian government).

This makes it unlikely that the U.S. demands are simply economic. If the U.S. wanted balanced trade, the problem should already have been solved last year. The Chinese want to buy anything the U.S. is willing to sell. Agricultural products rose from 0% to almost 20% of total U.S. exports to China in a little over a decade. There is a long list of things that the Chinese want to import but that are banned for national security reasons.

Beyond the economic demands, the U.S. wants to fix the open-market problem. The Chinese were quick to extend an olive branch on that front with the 100% Tesla-owned factory in China, even with some Chinese investment.

The sticking points lie in the alleged IP theft, cyber warfare and humanitarian concerns. The Chinese were quick to promise. But the United States wants more than a promise.

What's China's Red Line

One misunderstanding in U.S. media and discussions is how serious the Chinese are about sovereignty. There is plenty of debate within China about how the slow progress in opening its market hurts mutual trust within the WTO. During his interview on May 21st, Ren Zhengfei mentioned this as well. The humanitarian record of the current regime is another topic that resonates with many people within China. However, imposing a U.S.-based oversight body on the Chinese governing system is difficult for the Chinese to swallow. The sovereignty issue has been a big part of Chinese education for the past half century. The extraterritorial rights granted to westerners since the Opium War are something the Chinese will not forget.

The Endgame

With the United States being the world's only superpower, it has the full range of options to play out the endgame. Given the unpredictability of the Trump administration, the trade war could end tomorrow with only lip service to appeal to the electoral base. It always comes back to how the United States views China in the long term. If the United States sees its role as containing China and sees China as the axis of evil that endangers the U.S.-dominated world order, the United States should escalate fearlessly to a war with China while it can, and do what it is most familiar with (toppling the regime). The consequence of that is a far weaker, poorer China, with 1.5 billion people that cannot feed themselves. I wish to appeal to my many American friends: this is an undesirable humanitarian dilemma.

Alternatively, the United States could fool itself into the sanctions game. Even without coordinated efforts with Europe and Japan, sanctions from the United States will greatly damage the Chinese, with limited negative impact on U.S. corporations. However, it is unlikely the United States will see a friendlier China there. With the us-versus-them mentality, it is hard to imagine a pro-American regime being born that way. An inward-looking China will ultimately pose a greater threat than an outward-looking one.

The United States has to recognize that, short of a hot war, it needs to work with China. The shared-sovereignty request is not acceptable to either the regime or the people. On the other hand, if the United States wants a friendlier China, its demand should be a rule-based mechanism that enforces IP protection and the participation of foreign capital. The right to participate in Made in China 2025 would also be a far more interesting play for the United States than forcing China to abandon it.

When programming with CUDA, there are several ways to exploit concurrency for CUDA kernel launches. As explained in some of these slides, you can either:

1. Create a thread for each execution flow, execute serially on one stream per thread, and coordinate with either cudaEventSynchronize or cudaStreamSynchronize;

2. Carefully set up CUDA events and streams such that the correct execution flow will follow.

Route 2 seems more appealing to untrained eyes (you don't have to deal with threads!) but in practice it is often error-prone. One major issue is that the cudaEventRecord / cudaStreamWaitEvent pair doesn't capture all synchronization needs. Compared with the primitives Grand Central Dispatch provides (dispatch_group_enter / dispatch_group_leave / dispatch_group_notify), the under-specified part is where a cudaEventEnter would happen. This leads to the surprising fact that when you cudaStreamWaitEvent on an event not yet recorded on another stream (with cudaEventRecord), the current stream will treat the event as if it has already happened and won't wait at all.
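To make the missing "enter" step concrete, here is a minimal single-threaded model in plain C. It is only a sketch: group_t, group_enter, group_leave, group_notify and ran_early are hypothetical names made up for illustration, not GCD or CUDA APIs, and the model ignores threads entirely. It shows that a waiter who arrives before the producer has announced itself sees nothing pending and proceeds at once, which mirrors waiting on an event that was never recorded.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*notify_fn)(void *ctx);

// Hypothetical model of an enter / leave / notify group (not a real API).
typedef struct {
    int outstanding;   // enters not yet matched by a leave
    notify_fn notify;  // continuation parked until outstanding drops to 0
    void *ctx;
} group_t;

void group_enter(group_t *g) { g->outstanding++; }

void group_leave(group_t *g) {
    assert(g->outstanding > 0);
    if (--g->outstanding == 0 && g->notify) {
        notify_fn fn = g->notify;
        g->notify = NULL;
        fn(g->ctx);  // last leave releases the parked continuation
    }
}

void group_notify(group_t *g, notify_fn fn, void *ctx) {
    if (g->outstanding == 0) {  // nothing announced: run at once
        fn(ctx);
        return;
    }
    g->notify = fn;
    g->ctx = ctx;
}

static void bump(void *ctx) { ++*(int *)ctx; }

// Returns 1 if the continuation ran before the producer finished.
int ran_early(int producer_entered) {
    group_t g = {0, NULL, NULL};
    int fired = 0;
    if (producer_entered) group_enter(&g);
    group_notify(&g, bump, &fired);
    int early = fired;  // snapshot taken before the producer completes
    if (producer_entered) group_leave(&g);
    return early;
}
```

Calling ran_early(0) models the waiter arriving with no enter: the continuation runs immediately. Calling ran_early(1) models the explicit enter: the continuation is held until the matching leave.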

This is OK if your execution flow is static, that is, if which kernels execute on which streams is fully specified upfront. Does it require some careful arrangement? Yes, but it is doable. However, it all breaks down if some coordination needs to happen after some kernel computations are done: for example, deciding whether to decrease the learning rate based on newly computed losses. Generally speaking, for any computation graph that supports control structures, such coordination is necessary.

The obvious way to solve this is to go route 1. However, that introduces other problems, especially given that pthread's handling of spawn / join leaves much to be desired.

For the few brave souls wanting to go route 2 to solve this, how?

Since CUDA 5.x, a new method, cudaStreamAddCallback, has been provided. The method itself carries some major flaws (before Kepler, cudaStreamAddCallback could cause unintended kernel launch serialization; the callback itself runs on the driver thread; and you cannot call any CUDA API inside the callback). But if we gloss over these fundamental flaws and imagine for a moment, here is how I could make use of it with an imaginary cudaEventEnter / cudaEventLeave pair.

At the point where I need to branch to determine whether to decrease the learning rate, before cudaStreamAddCallback, I call cudaEventEnter to say that an event needs to happen before a certain stream can continue. Inside the callback, I get the loss from the GPU, make the decision, and call cudaEventLeave on the right event to continue the stream I want to branch into.

In the real world, the above just cannot happen. We lack the cudaEventEnter / cudaEventLeave primitives, and you cannot make any CUDA API call inside such a callback. Moreover, the code would be complicated by these callbacks anyway (these are old-fashioned callbacks, not even lambda functions or dispatch blocks!).

What if I could write code as if it were all synchronous, while under the hood it all happens on one thread, so I don't have to worry about thread spawn / join when just scheduling work from the CPU?

In the past few days, I've been experimenting with how to make coroutines work alongside cudaStreamAddCallback, and it all seems to work! Making this actually useful in NNC will probably take more time, but I just cannot wait to share this first :P

First, we need a functional coroutine implementation. There are a lot of stackful C coroutine implementations online, and my implementation borrows heavily from these sources. This particular coroutine implementation just uses makecontext / swapcontext / getcontext.

Set up basic data structures:

Set up a main run loop that can schedule coroutines:

Now, create a new task:

The usual utilities for coroutines (the ability to yield, launch a new coroutine, and wait for an existing coroutine to finish):

With the above utilities, you can already experiment with coroutines:

Unsurprisingly, you should be able to see printouts in the order of:

Coroutine f executes first and launches coroutine g. When g gives up control (taskyield), coroutine f continues to execute until it finishes. After that, the scheduler resumes coroutine g, and it finishes as well.

You can also try taskwait(task, gtask) in coroutine f, to see that f will finish only after coroutine g is scheduled again and runs to completion.
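For readers who want something to compile, the f / g experiment can be approximated with the following sketch. It is not the implementation from this post: the structure layout, the fixed-size queue, and the helper names (task_entry, context_of, mark, run_demo) are assumptions made for illustration. Only the names tasknew, taskyield, taskresume and the use of getcontext / makecontext / swapcontext follow the text. It assumes Linux with glibc, and it leaves out taskwait and any CUDA integration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)
#define MAX_TASKS 16

typedef struct task_s task_t;
typedef void (*task_fn)(void);

struct task_s {
    ucontext_t context;  // saved registers and stack pointer of this coroutine
    char *stack;
    task_fn fn;
    task_t *caller;      // who resumed us; NULL means the run loop
    int done;
};

static ucontext_t main_context;   // context of the run loop
static task_t *queue[MAX_TASKS];  // FIFO ring of coroutines that yielded
static int head, count;
static task_t *current;           // coroutine running now, NULL in the run loop
static char trace[128];           // records execution order for the demo

static ucontext_t *context_of(task_t *t) {
    return t ? &t->context : &main_context;
}

// First frame of every coroutine: run the body, then hand control back.
static void task_entry(void) {
    task_t *self = current;
    self->fn();
    self->done = 1;
    setcontext(context_of(self->caller));  // does not return
}

static task_t *tasknew(task_fn fn) {
    task_t *t = calloc(1, sizeof(*t));
    assert(t != NULL);
    t->stack = malloc(STACK_SIZE);
    assert(t->stack != NULL);
    t->fn = fn;
    getcontext(&t->context);
    t->context.uc_stack.ss_sp = t->stack;
    t->context.uc_stack.ss_size = STACK_SIZE;
    t->context.uc_link = NULL;  // task_entry always switches away explicitly
    makecontext(&t->context, task_entry, 0);
    return t;
}

// Switch into t; returns when t yields or finishes.
static void taskresume(task_t *t) {
    task_t *caller = current;
    t->caller = caller;
    current = t;
    swapcontext(context_of(caller), &t->context);
    current = caller;
    if (t->done) {
        free(t->stack);
        free(t);
    }
}

// Park the current coroutine on the queue and return to whoever resumed it.
static void taskyield(void) {
    task_t *self = current;
    queue[(head + count++) % MAX_TASKS] = self;
    swapcontext(&self->context, context_of(self->caller));
}

// Main run loop: resume queued coroutines until none are left.
static void schedule(void) {
    while (count > 0) {
        task_t *t = queue[head];
        head = (head + 1) % MAX_TASKS;
        count--;
        taskresume(t);
    }
}

static void mark(const char *s) { strcat(trace, s); }

static void g(void) { mark("g1 "); taskyield(); mark("g2 "); }
static void f(void) { mark("f1 "); taskresume(tasknew(g)); mark("f2 "); }

// Runs the f / g experiment and returns the recorded order.
const char *run_demo(void) {
    trace[0] = '\0';
    head = count = 0;
    taskresume(tasknew(f));
    schedule();
    return trace;
}
```

Calling run_demo() from main and printing the result should give "f1 g1 f2 g2 ": f starts, g runs until its yield, f finishes, and only then does the run loop pick g up again. Resuming g directly from f, rather than queueing it, is the design choice that produces this order.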

So far, we have a functional coroutine implementation in C. Some of this code doesn't seem to make sense: for example, why do we need a mutex and a condition variable? Because a secret function that enables us to wait on a stream is not included above:

taskcudawait will put the current coroutine on hold until the said stream finishes. Afterwards, you can branch, knowing comfortably that the kernels in the stream above are all done. The condition variable and the mutex are necessary because the callback happens on the driver thread.

It seems the above utilities would cover all my use cases (taskwait and taskresume are important to me because I don't want too much hard-to-control asynchrony when launching sub-coroutines). I will report back if some of this doesn't hold and I fail to implement a fully asynchronous computation graph with control structure support using these cute little coroutines.
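The cross-thread handoff can be sketched without a GPU. In the sketch below, a plain pthread stands in for the CUDA driver thread, and the names ready_queue_t, stream_done, wait_for_ready and demo_stream_wait are hypothetical. In the real setting, a function playing the role of stream_done would be registered through cudaStreamAddCallback, and the coroutine switch would happen around wait_for_ready. This is an assumption about how such a wait could be structured, not the taskcudawait from this post.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_READY 8

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int ready[MAX_READY];  // ids of coroutines whose stream has finished
    int count;
} ready_queue_t;

static ready_queue_t rq = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, {0}, 0
};

// Runs on the "driver" thread: mark one coroutine as runnable again.
static void *stream_done(void *arg) {
    int id = *(int *)arg;
    pthread_mutex_lock(&rq.mutex);
    assert(rq.count < MAX_READY);
    rq.ready[rq.count++] = id;
    pthread_cond_signal(&rq.cond);
    pthread_mutex_unlock(&rq.mutex);
    return NULL;
}

// Runs on the scheduler thread: sleep until some coroutine can be resumed.
static int wait_for_ready(void) {
    pthread_mutex_lock(&rq.mutex);
    while (rq.count == 0)  // loop guards against spurious wakeups
        pthread_cond_wait(&rq.cond, &rq.mutex);
    int id = rq.ready[--rq.count];
    pthread_mutex_unlock(&rq.mutex);
    return id;
}

// Simulates one wait round trip, with a pthread playing the driver thread.
int demo_stream_wait(int coroutine_id) {
    pthread_t driver;
    int rc = pthread_create(&driver, NULL, stream_done, &coroutine_id);
    assert(rc == 0);
    (void)rc;
    int resumed = wait_for_ready();
    pthread_join(driver, NULL);
    return resumed;
}
```

The callback only touches the shared queue under the mutex and signals; all coroutine switching stays on the scheduler thread, which is why the run loop can stay single-threaded. On older toolchains this needs to be built with -pthread.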

NNC is a tiny deep learning framework I have been working on for the past three years. Before you close the page on yet another deep learning framework, let me quickly summarize why: starting from scratch enables me to toy with some new ideas on the implementation, and some of these ideas, once implemented, have some interesting properties.

After three years, and given the fresh new takes on both the APIs and the implementation, I am increasingly convinced this will also be a good foundation for implementing high-level deep learning APIs in any host language (Ruby, Python, Java, Kotlin, Swift, etc.).

What are these fresh new takes? Well, before we jump into that, let's start with some not-so-new ideas inside NNC: like every other deep learning framework, NNC operates on dataflow graphs. Data dependencies on the graph are explicitly specified. NNC also keeps symbolic dataflow graphs separate from concrete dataflow graphs. Again, like every other deep learning framework, NNC supports dynamic execution, which is called dynamic graph in NNC.

With all that out of the way, the interesting bits:

NNC supports control flows, with a very specific while loop construct and multi-way branch construct;

NNC implements a sophisticated tensor allocation algorithm that treats tensors as regions of memory, which enables partial reuse of tensors;

The above allocation algorithm handles control flows, eliminates data transfers for while loop, and minimizes data transfers for branching;

Dynamic execution in NNC is implemented on top of its static graph counterpart, thus, all optimization passes available for static graphs can be applied when doing automatic differentiation in the dynamic execution mode;

Tensors used during dynamic execution can be reclaimed; there is no explicit tape session or requires_grad flag.

You can read more about it at http://libnnc.org/. Over the next few months, I will write more about this. There is still a tremendous amount of work ahead before I get to a release. But getting ahead of myself and putting some pressure on is not a bad thing either :P The code lives in the unstable branch of libccv: ccv_nnc.h.

Ten years ago, I began to post predictions looking 4 years into the future. The principle behind these predictions is simple: they are a combination of things we chatted about, things I read, and a stash of reasonable imagination. Later, to make this a bit more fun and educational, I would also map out the potential market and political environment before the prediction. With this setup, everything now looks more systematic and professional. But to be honest, everything that is going to happen in the next few years has already been set in motion today. It won't be that entertaining to predict that Apple will design the 14th generation of iPhone in 2022.

That being said, what will 2022 look like, now that 2018 is starting to unfold?

First, the elimination of poverty in China. In 2021, the 100th anniversary of Communist Party of China, the leadership in China will announce that they have finished building the moderately prosperous society in all respects. For China, the moderately prosperous society in all respects is a measurable goal, and the end result is the elimination of poverty. In 2022, China's GDP per capita will reach 10,000 USD. If China cannot reach that goal, everything else is not very meaningful to predict due to the global instability.

The main theme of the 21st century is the decline of American power. But in these 4 years, we will only see occasional hints of it; this nation is and will continue to be the main player in the global economy, and the major powerhouse for technology development.

There will be at least a 5x improvement in raw computation power from the heterogeneous computing paradigm. A single chip can reach 0.5 petaflops (full 32-bit floating point) by the end of 2022. On-device memory per GPU card can reach up to 48GiB. The view for mobile is not as rosy, however. In the next 4 years, the price of mobile systems-on-chip (SoC) will continue to go down, but speed on traditional workloads will not improve much, 2x at most. More work will go into function-specific optimizations and feature integration.

Now that the grand scene has been set, what will happen next?

Cars. In 2022, most production electric cars and luxury vehicles will have level-3 autonomous driving capability. The middle class will now drive more electric cars, while families with annual income below $30,000 will continue to drive cars with internal-combustion engines. Although most electric cars on sale have level-3 autonomous driving capability, there is no viable after-market component for level-3 autonomous driving.

Level-3: No human attention needed in most highway and local environments. The system will alert the driver under certain conditions.

To many people's surprise, traditional car manufacturers who started early in the electric vehicle market don't have much first-mover advantage. Specifically, BMW's i-series sales numbers will plunge. Ford and Toyota, who were once considered latecomers to the market, now both have successful battery electric vehicles (total unit sales exceeding 100,000). Even so, globally, there will be two or three new but established all-electric car manufacturers. Tesla, which had some false starts in autonomous driving technology, finally gets to level-3 in 2020. Its Model 3 is either a huge success (200,000 unit sales per year) or a moderate one (80,000 unit sales per year). The most popular battery electric car? We probably haven't seen it yet, and it is likely to be a crossover between a minivan and a compact SUV.

High-speed railway. We've been talking about the high-speed railway from Mumbai to Hyderabad since 2013. At the end of 2022, it will be finished. The high-speed railway between San Francisco and Los Angeles? It probably won't even have broken ground.

With stable oil prices and the mass use of new-generation airplanes, there will be more ultra-long-distance non-stop flights (more than 16 hours). There will be no regular supersonic commercial flights by 2022, though.

The HIV vaccine will hit the market in the next 4 years. This will probably be the single best-known medical breakthrough of those 4 years. A lot of important breakthroughs have minuscule starts. Cheap brushless motors, SoCs, and cheap sensors all exist thanks to the ubiquity of smartphones. These gadgets become the important ingredients that make information agriculture work. Especially in Asia, you will find modern lean-production farms with high yield and high-quality produce. Equipped with information technology, these farms have yields more than 2x those of their industrial-farm counterparts, closer to the yield of small labor-intensive farming.

2017 was called the origin year of AR. However, even after 4 years, there will be no mass-market successful AR hardware (more than 5 million total unit sales).

And it finally happens: Amazon, just before the end of 2022, starts deliveries with drones in certain areas of North America. These delivery drones, along with autonomous trucks on the interstate highways (one driver, many trucks), symbolize the beginning of the elimination of low-paying jobs.

Luckily for all of us, after an economic downturn, Bitcoin will stop being an investment vehicle.

Decades have passed and we have yet to see high-quality consumer software. It is now taken for granted that software is supposed to be crashy, laggy and barely functional. Why and how did we get here? When the question is asked, many people feel nostalgia for a time when software was simpler and people crafted their cathedrals. They often overlook the fact that the software we build today is many orders of magnitude more complex than the software we had in the 1980s. Even today's software with the simplest tableau operations, with its graphical user interface combining complex animations and multi-touch interactions, would require many months of developer time if built from scratch.

As far as I can remember, the concept of quality was popularized in the 1970s from Japan. In the 1970s, through the quest for quality, the Japanese auto industry reached a level of low cost that its American competitors could only dream of.

It has been 4 years since the last prediction for the year 2016. My original plan was to draft a prediction every 2 years, scoped to the next 4 years. Gates once said that we always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten. A decade ago, having computing devices as small as a palm with Pentium 4 computational power was unimaginable. Even 8 years ago, it was a difficult feat for us to build an all-in-one TV with high-end PC capability.

Review the Prediction of the Past 4 Years

The predictions of the past 4 years have been accurate. The biggest promise, economic stability, has been kept thanks to all the unusual fiscal policies; otherwise such predictions would hardly be believable, if at all. Reviewing the predictions I made 4 years ago: Internet connection speed, the unfortunate market share of 3D TV, television on demand, computational power, driving assistance (self-driving), and photography technology have matched reality pretty well. However, the predictions on wireless power sources, the merging of pads and ultrabooks, commercial supersonic flight, the unemployment rate, and artificial intelligence have been off quite a bit. There were no predictions on unmanned aerial vehicles. Overall, some of these predictions were too optimistic, and some were simply ignorant.

The Economic / Social Outlook for the Next 4 Years

However, it is harder to predict the next 4 years on the same promise of social / economic stability. Globally, a slowdown in economic growth will be a given. By contrast, the United States will be the least affected, due to the dominance of the dollar in the global economy. In Europe, it is unlikely the economic situation in Spain, Greece and other Mediterranean countries will get any better. As slowly as politics goes, the possibility of one or several countries exiting the eurozone becomes ever more real. However, in this gloomy environment, Japan's outlook improves marginally after several scheduled tax hikes. The tricky bit is China. China will likely take one of two paths:

1). Its GDP will land at around 4.5% to 5.5% yoy growth in the next 4 years. This comes after a controlled, turbulent landing, with a finessed mix of fiscal / monetary stimulus. Overall, the fiscal sheet is more balanced, and as the world's manufacturer, China integrates more efficiency into its system, so it becomes harder to compete with it on the efficiency front, even with much lower labor costs. This is China as a newly minted developed nation, sitting comfortably among the rest of the developed nations with GDP per capita between $9,000 and $10,000.

2). Its GDP growth will land at 4.5% or even below in the next 2 years, which will be considered fatal. Fiscal and monetary tools seem ineffective due to large amounts of capital outflow, as well as the loosened control over capital in general after 2008. Social uprising turns out to come much more easily than expected. Regional governments will find it hard to contain the unrest, and the central government will likely have several rounds of negotiations with opposition leaders. It becomes impossible to predict what would happen afterwards.

For the sake of making any progress on this prediction, I will pick China option 1 as the background for the next 4 years. If option 2 turns out to be closer to reality, it nullifies all the predictions I am going to make below.

As for India, lacking systematic knowledge of that area, I find it hard to predict India's impact on the global technology and economic outlook. For Russia and the Middle Eastern oil-producing countries, the assumption is that oil will float between $40 and $100 per barrel, and Russia's economy will struggle nevertheless due to greater volatility in the oil price.

The Basis of Any Predictions

The success of any prediction, if there is any, comes from looking at past patterns. For the past 100 years or so, that has meant capturing and interpreting exponential growth. The fascination of exponential growth has been emphasized in an enormous number of books and talks. However, by applying exponential growth without an underlying understanding of the technological principles, we risk running into some fundamental law of physics and making no progress at all (and on the other hand, a premature prejudice of "understanding" the fundamental limits of physics can be fatal too).

Exponential growth is made possible only by two key terms: standardization and economies of scale. The modern marvel of this kind is the iPhone. Without the scale of the iPhone, a modern high-resolution screen with capacitive touch would cost thousands of dollars per square inch to manufacture. But now, everyone gets a modern high-resolution touchscreen for a few bucks.

These two key terms will manifest themselves in many forms, and will continue to work wonders in the next 4 years.

The Prediction

Smart hardware has been around for more than 10 years. But what makes sense as "smart hardware"?

It makes the basic functionalities we assume about that hardware a no-brainer: smooth, one-touch, perfect and care-free integration;

It extends beyond the basic functionalities, but operates under well-defined principles (good example: a router that caches cloud content and makes access instantaneous; bad example: a refrigerator that orders food for you);

It is unlikely to be something completely new.

Then, there is the un-PC era. In the next 4 years, homes will rarely own any desktop computers, even though the aggregate processing power in a single-family house can easily reach more than 10 Tflops. There is a change of interface too. People now interact with these devices by either touching or talking. Graphical interfaces now have a meaningful conversational retouch.

Despite potential conflicts and regional instability, transportation will be more cost effective. In terms of land transportation, self-driving or smarter driving assistance will be a standard add-on in newly shipped vehicles. However, it is far from becoming a mandatory standard. The Abu Dhabi PRT was a failure in the Middle East, but similar transportation services will run commercially in some cities. The next generation of long-distance land transportation is still in the experimental phase in the United States. Not only that, some of the longest commercial flights are cancelled due to cost. Commercial transportation is going to be more expensive, and slower.

The entertainment industry gets a big boost in times of recession. People still spend a disproportionate amount of time on the big television, but the "cutting-the-cord" movement will happen much faster than expected. Cable viewership among 15-to-35-year-olds in the United States will drop at a rate of 10% to 20% year over year, and accelerating. Today's top TV show numbers (5m viewers at the premiere) will hold steady. But shows with 2m to 3m premiere viewership will see a drop to 1m or less. In the United States, online streaming players will ink deals with major sports and have exclusive rights to stream online. People will spend more than 3 hours a day on streaming services, either on television or on their mobile devices.

The sharing economy is not going the way you would expect. At its core, the sharing economy moves assets off the balance sheets of companies such as Airbnb or Uber and bumps up their profitability. In boom times, asset-light companies can move fast and quickly get rid of less profitable businesses painlessly. In down times, these companies will try to own more assets, as asset prices are all cheap. However, the most popular way for them to do so will not be outright purchase. Instead, they will launch financing programs to help their sharing-economy workers own these assets, and leave the risk of asset depreciation to them.

The mobile messaging market will consolidate. Respectable players in messaging will reach 300m daily active users, and have at least 2b messages sent per day. Any player that cannot reach that benchmark will be dead. There will be only 3 to 4 major players in that space, if not fewer. All messaging services will have the ability to make audio and video calls, which will continue to marginalize the phone call business of traditional phone service providers. In the United States at least, more than one online-based business will enter the ISP business. The speed of the Internet will continue to improve. Home Internet speed globally will average 100Mbps. Global mobile Internet speed will average 10Mbps. Specifically, mobile Internet service in Central / Southern Africa will reach an average of 500Kbps. In other words, as long as you can pay, with your cellphone you can have a semi-stable Internet connection and will be able to make video calls anywhere in the world except Antarctica.

Cost-effectiveness is penetrating medical equipment. With the lower cost of processing power and the general application of machine learning techniques in signal processing, popular and essential medical equipment will reach a point where it is cheap and versatile enough to be delivered even to the most remote areas on Earth. The profound impact will be a global lift in life expectancy.

Virtual reality gear will gain traction in many more homes. It is still struggling to find its killer applications. But on average, shipped units per year will be around 30m globally at the end of 2019. Industrial robots will replace more human labor, which is a good thing for China. Privatization of space technology continues. One or more private companies will accomplish at least one low-orbit manned mission.

Birthdays are often joyless for me. I've yet to find them an excuse to celebrate. But the clock is ticking, and every time, it kicks me hard in the back on this day of the year. It has always been a thing for me to accomplish something, to set a goal and work towards it. But looking back, I've done nothing tangible, not to mention any worthy causes. When I die, I die.

Silicon Valley always talks about making the world a better place. When I was a little younger, that made my blood boil with excitement too. But now, I only want to touch lives. So be it one, or two, or many. But having grown up this far, no one besides my parents would care whether I am alive or dead. And in the past three or four years, I haven't accomplished anything either. Life is not about a house, a car or a pack of children. What I want is to make beautiful objects and put them into people's hands.