2018-12-19 · www.heyuhang.com/blog/2018/12/19/tensorflow-function-curring-in-nets-factory

Slim is widely used in TensorFlow. All networks, including ResNet, Inception and MobileNet, are wrapped by a net factory: nets_factory.py. A typical network call looks like:

```python
import functools

import tensorflow as tf

slim = tf.contrib.slim

# networks_map and arg_scopes_map are module-level dicts defined elsewhere in
# nets_factory.py, mapping a network name to its entry function / arg scope.

def get_network_fn(name, num_classes, weight_decay=0.0, is_training=False):
  """Returns a network_fn such as `logits, end_points = network_fn(images)`.

  Args:
    name: The name of the network.
    num_classes: The number of classes to use for classification. If 0 or
      None, the logits layer is omitted and its input features are returned
      instead.
    weight_decay: The l2 coefficient for the model weights.
    is_training: `True` if the model is being used for training and `False`
      otherwise.

  Returns:
    network_fn: A function that applies the model to a batch of images. It has
      the following signature: net, end_points = network_fn(images)
      The `images` input is a tensor of shape [batch_size, height, width, 3]
      with height = width = network_fn.default_image_size. (The permissibility
      and treatment of other sizes depends on the network_fn.)
      The returned `end_points` are a dictionary of intermediate activations.
      The returned `net` is the topmost layer, depending on `num_classes`:
      If `num_classes` was a non-zero integer, `net` is a logits tensor of
      shape [batch_size, num_classes]. If `num_classes` was 0 or `None`,
      `net` is a tensor with the input to the logits layer of shape
      [batch_size, 1, 1, num_features] or [batch_size, num_features]. Dropout
      has not been applied to this (even if the network's original
      classification does); it remains for the caller to do this or not.

  Raises:
    ValueError: If network `name` is not recognized.
  """
  if name not in networks_map:
    raise ValueError('Name of network unknown %s' % name)
  func = networks_map[name]

  @functools.wraps(func)
  def network_fn(images, **kwargs):
    arg_scope = arg_scopes_map[name](weight_decay=weight_decay)
    with slim.arg_scope(arg_scope):
      return func(images, num_classes, is_training=is_training, **kwargs)

  if hasattr(func, 'default_image_size'):
    network_fn.default_image_size = func.default_image_size

  return network_fn
```

The entry function get_network_fn(name, num_classes, weight_decay=0.0, is_training=False) receives merely four args. However, many networks accept more than four (more precisely, more than three, since name merely indicates which network to build). For example, resnet_v2 accepts many more args:
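From memory, the resnet_v2 entry point in the slim model zoo looks roughly like the following sketch; the argument list is recalled, not copied verbatim, and may differ slightly across versions:

```python
# Sketch of the slim resnet_v2 signature (recalled, may differ per version).
def resnet_v2(inputs,
              blocks,
              num_classes=None,
              is_training=True,
              global_pool=True,
              output_stride=None,
              include_root_block=True,
              spatial_squeeze=True,
              reuse=None,
              scope=None):
    ...
```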

A question naturally arises: what happens here? How does the factory manage to call the right network function with the right arguments?

The mystery lies in function currying. Currying is the technique of breaking down the evaluation of a function that takes multiple arguments into evaluating a sequence of functions, each taking a subset of the arguments. The Slim network factory exploits exactly this: it splits the long argument list into prerequisite arguments shared by all network entry functions and accessory arguments that belong to each individual network function.
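To make the idea concrete, here is a minimal, self-contained sketch of currying with functools.partial; the volume/area names are made up for illustration:

```python
import functools

def volume(length, width, height):
    return length * width * height

# Fix one argument now, supply the rest later -- the essence of currying.
area_fn = functools.partial(volume, height=1.0)
print(area_fn(2.0, 3.0))  # 6.0
```

get_network_fn() does the same thing with a closure: it fixes the shared arguments (num_classes, weight_decay, is_training) once, and the returned network_fn only needs the per-call arguments (images, **kwargs).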

2018-11-26 · www.heyuhang.com/blog/2018/11/26/tensorflow-weird-train-loss-curve

I decided to train the ILSVRC2012 classification task from scratch. The neural network architecture used here is MobileNet V1. However, I got a strange loss curve when reducing the learning rate. The training configuration is simple: SGD optimizer, initial learning rate 0.1, decayed every 0.5 epoch with decay rate 0.1.

Problem Description

It seems a maxim that a reduced learning rate leads to a smaller, or at least non-increasing, loss value. But my experiment contradicts this belief: the reduced learning rate inversely produces a larger loss value, as is shown in the following loss curve figure.

Bug Traceback

This bug seemed to derive from the training process at first glance, which is why I spent a bunch of days sinking into my training code. However, it finally turned out to be a data input pipeline problem. Following the TensorFlow recommended data input pipeline, I adopted tf.train.string_input_producer() to feed the training data:
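A minimal sketch of how it is used; the shard file names below are hypothetical placeholders for my 100 tfrecord shards:

```python
import tensorflow as tf

# 100 tfrecord shards (hypothetical file names).
tfrecord_files = ['train-%03d-of-100.tfrecord' % i for i in range(100)]
filename_queue = tf.train.string_input_producer(
    tfrecord_files,    # string_tensor: a list of tfrecord file names
    shuffle=True,      # shuffles only the FILE NAMES, not the records inside
    num_epochs=None)
```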

Its function is to output strings to a queue for an input pipeline, with optional parameters to control its behaviour. string_tensor can be either a single tfrecord file name or a list of tfrecord file names. Please note that shuffle only randomly shuffles the file names, not the instances stored within each tfrecord file. In my case, I split the 1.28 million images into 100 shards, each storing about 12,800 images. As discussed above, the order of the 12,800 images within each shard cannot be changed once the shard is created. Thus, it is highly possible that several consecutive training steps are conducted within a single shard. If the image instances within a shard are not shuffled well enough, your training process may suffer from large label bias. For example, all the labels within the current shard are 0, but are 1 within the next shard.

This is where the bug lies: I did not shuffle train.txt when generating the tfrecord files. Thus, it looks as if I train with one bunch of labels at a fixed learning rate, but with another bunch of labels when the learning rate decays, inevitably making the loss curve increase. Another partial reason is that when training a neural network from scratch, a smaller learning rate easily leads to gradient explosion.

Solutions

Two solutions are recommended here (a small sketch of the first follows the list):

pre-shuffle train.txt if possible

create as many shards as possible when producing tfrecords.
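A minimal sketch of the first solution, assuming a plain-text train.txt with one image/label pair per line:

```python
import random

with open('train.txt') as f:
    lines = f.readlines()
random.shuffle(lines)  # mix the labels before sharding
with open('train_shuffled.txt', 'w') as f:
    f.writelines(lines)
```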

2018-11-22 · www.heyuhang.com/blog/2018/11/22/tensorflow-weird-test-result

I recently encountered a weird bug with TensorFlow. Here I record what the problem is and the way to solve it.

Problem Description

I use the TensorFlow implementation of MobileNet V1 as an example to elaborate the problem. It is natural to set is_training = True in train mode but is_training = False in test mode:
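A minimal sketch based on the standard slim scripts (nets_factory as in the first post; images is assumed to be a batch tensor):

```python
# Train graph: BatchNorm uses the per-minibatch statistics.
network_fn = nets_factory.get_network_fn(
    'mobilenet_v1', num_classes=1000, is_training=True)
logits, end_points = network_fn(images)

# Eval graph: BatchNorm uses the accumulated moving statistics instead.
eval_network_fn = nets_factory.get_network_fn(
    'mobilenet_v1', num_classes=1000, is_training=False)
```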

The test results are pretty weird! Similar results have been encountered by multiple TensorFlowers. When I dug into the source code, I gradually found that the root cause lies in the BatchNorm layer. In my own experience, BatchNorm is often the curse behind many deep learning bugs.

Bug Traceback

In a nutshell, for the input and output pair $(x, y)$, the BatchNorm layer normalizes the input with a mean $\mu$ and a variance $\sigma^2$, plus a learnable scale parameter $\gamma$ and a shift parameter $\beta$:

$$ y = \gamma \cdot \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta $$

During training, $\mu$ and $\sigma$ are calculated within each minibatch, while in test mode the accumulated $\mu$ and $\sigma$ are utilized to normalize the input. That is, train mode and test mode use different $\mu$ and $\sigma$ values. Actually, TensorFlow exploits an exponential moving average to calculate the test-mode $\mu$ and $\sigma$:

$$ \mu_{test} \leftarrow decay \cdot \mu_{test} + (1 - decay) \cdot \mu_{batch}, \quad \sigma_{test} \leftarrow decay \cdot \sigma_{test} + (1 - decay) \cdot \sigma_{batch} $$

The decay rate $decay$ is recommended to be near 1, such as 0.999, 0.997 or 0.9. This means the test-mode $\mu$ and $\sigma$ are quite different from the minibatch $\mu$ and $\sigma$ of train mode early in training, but the difference gradually fades as training proceeds. Please note that $\mu$ and $\sigma$ are initialized as $0$ and $1$ respectively, which easily leads to relatively fixed $\mu$ and $\sigma$ values during the early training stage. This is why early checkpoints predict all images as the same class.

Solution

Two solutions are recommended here (a configuration sketch follows the list):

Set the decay rate to zero. This means $\mu$ and $\sigma$ in test mode use the values temporarily calculated within the minibatch. Although this avoids the aforementioned bug, it is not recommended.

Set the decay rate to a relatively smaller value and train the whole model for more steps, as more training steps reduce the gap between the train-mode and test-mode statistics.
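A minimal configuration sketch for the second solution using slim; the decay value 0.9 and the conv layer are illustrative only (slim.batch_norm defaults to decay=0.999):

```python
import tensorflow as tf

slim = tf.contrib.slim

images = tf.placeholder(tf.float32, [None, 224, 224, 3])
is_training = True

batch_norm_params = {
    'decay': 0.9,               # smaller decay -> test statistics catch up faster
    'is_training': is_training,
}
with slim.arg_scope([slim.conv2d],
                    normalizer_fn=slim.batch_norm,
                    normalizer_params=batch_norm_params):
    net = slim.conv2d(images, 32, [3, 3])
```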

2018-07-14 · www.heyuhang.com/blog/2018/07/14/tensorflow-get-shape-vs-tf-dot-shape

It is worthwhile to delve into the nuts and bolts of get_shape() and tf.shape(): both can be used to get the shape of a tensor, but careless usage easily leads to errors. Since both deal with Tensors, it is preferable to first gain a brief understanding of TensorFlow's Tensor.

What is a Tensor in TensorFlow?

You can think of it as "TensorFlow deals with two things: Tensors and Operators". In a nutshell, a Tensor can be treated as an N-dimensional array, storing data of various types, like tf.int32, tf.float32, etc. A tensor has three attributes:

name, the name of a Tensor is used as an index for the tensor.

shape, describing the dimension information of the tensor.

type, showing what kind of data is stored in the Tensor.

Actually, a TensorFlow Tensor carries two kinds of shape: a static shape and a dynamic shape. The TensorFlow FAQ says: "In TensorFlow, a tensor has both a static (inferred) shape and a dynamic (true) shape. The static shape can be read using the tf.Tensor.get_shape() method; this shape is inferred from the operations that were used to create the tensor, and may be partially complete. If the static shape is not fully defined, the dynamic shape of a Tensor can be determined by evaluating tf.shape(t)." Three things can be obtained from the FAQ:

tf.Tensor.get_shape() is a member function of TensorFlow Tensor. The shape information inferred by it may be incomplete.

tf.shape() is a TensorFlow operator, returning the dynamic shape of a tensor. Of course, the returned shape is explicitly well-defined.

If a Tensor's shape is well-determined, tf.Tensor.get_shape() and tf.shape() return the same shape value.

Note that we don't need a tf.Session() to get the static shape. My understanding is that since all Tensors are created during graph construction and tf.Tensor.get_shape() is an inherent member function of Tensor, no session is needed to run it. Let's continue with another piece of code:
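A minimal sketch of what such code demonstrates (the placeholder shape is illustrative):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])
print(x.get_shape())   # (?, 224, 224, 3) -- batch dimension unknown, no session needed

y = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(y.get_shape())   # (2, 2) -- fully defined, still no session needed
```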

We can observe that when the shape of a tensor cannot be fully inferred during graph construction, the corresponding static shape is set as None or ?. This is why tf.Tensor.get_shape() cannot always be used to get a tensor's shape and do something further with it. Once the static shape can be fully determined, however, tf.Tensor.get_shape() can be successfully exploited to build the subsequent graph without explicitly running a session. (One obvious application is the classification task, as the batch size is known in advance.)

tf.shape()

First of all, keep in mind that tf.shape() is a TensorFlow operator, which receives a tensor as input and outputs another tensor. That is, you need to run a session in order to get the output tensor's value. You have to use tf.shape() when a tensor's shape cannot be completely inferred with tf.Tensor.get_shape().
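A minimal sketch of the dynamic-shape workflow (the placeholder and feed values are illustrative):

```python
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])
dynamic_shape = tf.shape(x)   # an operator returning a 1-D int32 tensor

with tf.Session() as sess:
    print(sess.run(dynamic_shape,
                   feed_dict={x: np.zeros((8, 224, 224, 3))}))  # [  8 224 224   3]
```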

2018-07-14 · www.heyuhang.com/blog/2018/07/14/tensorflow-reisizing-image-by-keep-aspect-ratio

It is very common in the vision community to resize an image while keeping its aspect ratio. The relevant Python code is intuitive and simple, and looks like:
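My guess at the shape of the offending code is the following direct, Python-style translation; img_h and img_w are tensors here, so the plain if tries to evaluate a tf.Tensor as a bool:

```python
import tensorflow as tf

img_tensor = tf.placeholder(tf.float32, shape=[None, None, 3])
img_h = tf.shape(img_tensor)[0]
img_w = tf.shape(img_tensor)[1]
max_len = 1024

if img_h > max_len and img_h > img_w:   # raises the TypeError quoted below
    new_w = (max_len * img_w) / img_h
    img_tensor = tf.image.resize_images(img_tensor, (max_len, new_w))
```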

But before it runs as you expect, the following error is thrown:

TypeError: Using a tf.Tensor as a Python bool is not allowed. Use if t is not None: instead of if t: to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.

It seems TensorFlow does not support Python-like if comparisons on tensors. Instead, we must turn to the tf.cond function to solve this problem. First, let's take a look at part of the official tf.cond doc:

```python
tf.cond(
    pred,      # a scalar determining whether to return the result of true_fn or false_fn
    true_fn,   # the callable to be executed if pred is true
    false_fn,  # the callable to be executed if pred is false
)
```

With this in mind, tf.cond leaves us three things to do:

pred, a boolean predicate determining which side of the image to resize. Usually tf.greater() and tf.less() are the choices. If combining several unary predicates is necessary, we might need tf.logical_and().

true_fn, please note that the parameter true_fn is a callable function, not the return value of a function.

false_fn, the same as true_fn

We naturally want true_fn() to return the resized image tensor while false_fn() returns the original image. Thus, the following conversion is just what we want:

```python
max_len = 1024

# original Python/OpenCV code
if max(img_h, img_w) > max_len and img_h > img_w:
    new_h = max_len
    new_w = int((new_h * img_w) / float(img_h))
    img_tmp = cv2.resize(img_tmp, (new_w, new_h))  # cv2.resize takes (width, height)

# convert to the following code
def resize_img_tensor_accord_h(img_tensor):
    new_h = tf.constant(1024, dtype=tf.int32)
    new_w = tf.cast(tf.div(tf.multiply(new_h, img_w), img_h), dtype=tf.int32)
    img_tensor = tf.image.resize_images(img_tensor, (new_h, new_w))
    return lambda: img_tensor

img_tensor = tf.cond(pred=tf.logical_and(tf.greater(img_h, max_len),
                                         tf.greater(img_h, img_w)),
                     true_fn=resize_img_tensor_accord_h(img_tensor),
                     false_fn=lambda: img_tensor)
```

Similarly, the other if branch can be rewritten as:

```python
elif max(img_h, img_w) > max_len and img_w > img_h:
    new_w = max_len
    new_h = int((new_w * img_h) / float(img_w))
    img_tmp = cv2.resize(img_tmp, (new_w, new_h))

# converted to the following code
def resize_img_tensor_accord_w(img_tensor):
    new_w = tf.constant(1024, dtype=tf.int32)
    new_h = tf.cast(tf.div(tf.multiply(new_w, img_h), img_w), dtype=tf.int32)
    img_tensor = tf.image.resize_images(img_tensor, (new_h, new_w))
    return lambda: img_tensor

img_tensor = tf.cond(pred=tf.logical_and(tf.greater(img_w, max_len),
                                         tf.greater(img_w, img_h)),
                     true_fn=resize_img_tensor_accord_w(img_tensor),
                     false_fn=lambda: img_tensor)
```

Please pay careful attention to the lambda, which guarantees that the return value of resize_img_tensor_accord_x() is a function. Otherwise extra errors will come up.

Farewell Word

It seems TensorFlow differs greatly from the Python language. Spending more time reading the official docs and sample code becomes necessary!

2018-02-13 · www.heyuhang.com/blog/2018/02/13/the-hidden-compartmentalization

People naturally get compartmentalized when they are newly involved in an institution or organization, either consciously or unconsciously. Compartmentalization happens every day without being obviously noticed. We can hardly criticize or blame it, although we can easily turn to psychology or sociology for a perfect analysis and reasoning.
2018-01-31 · www.heyuhang.com/blog/2018/01/31/the-secret-of-tf-dot-print-in-tensorflow

Does anyone get frustrated by the Python print() function when you want to print out some tensor value during TensorFlow graph construction? Often we want to inspect an intermediate value for debugging purposes, but Python print() merely prints out side information about the tensor, such as its name and shape (so long as it can be accurately inferred from the constructed graph), which is not the information we want.

Neither TensorFlow nor print() is to be blamed. The reason for this awkwardness is simple: TensorFlow splits graph construction and computation apart, and Python print() merely shows a node within a graph with its basic information, such as its shape (if available), name and data type. Let's dive deeper with one simple example: $e = (a + b) \times d$. The basic script should look like:
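A minimal reconstruction of that basic script:

```python
import tensorflow as tf

a = tf.constant(1.0, name='a')
b = tf.constant(2.0, name='b')
d = tf.constant(3.0, name='d')
c = tf.add(a, b)
print(c)   # Tensor("Add:0", shape=(), dtype=float32) -- metadata only, no value
e = tf.multiply(c, d)
```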

It works the way we expected: no tensor value is printed! Fortunately, TensorFlow already takes care of this issue with tf.Print(). First, let's take a look at its signature:

```python
tf.Print(
    input_,
    data,
    message=None,
    first_n=None,
    summarize=None,
    name=None)
```

Concisely, it passes through the input_ tensor and prints the data tensors prefixed with the message. Before directly entering the final correct snippet, I would like to highlight two rules TensorFlow always obeys:

tf.Print() is capable of printing out all the tensors it has access to. Here "all the tensors" means the tensors the data flows through up to the current tf.Print() operator. That is, in this example, tensor a and tensor b can be accessed by tf.Print().

To calculate the defined loss value, TensorFlow chooses only the most directly related operators to run in a session. That is, any "dangling" or "irrelevant" operator will not be executed.

Given these two rules, two things gradually become clear. First, tf.Print() can print out all tensors the data flow has covered by the time the tf.Print() operator is reached. Second, instead of leaving tf.Print() as a "dangling" or "irrelevant" operator, we have to add tf.Print() at a place in the graph where the data has to flow through.

With the aforementioned discussion in mind, here is the correct code snippet:

```python
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    a = tf.constant(value=1.0, dtype=tf.float32, name='a')
    b = tf.constant(value=2.0, dtype=tf.float32, name='b')
    d = tf.constant(value=3.0, dtype=tf.float32, name='d')
    c = tf.add(a, b)
    # print(c)
    c_output = tf.Print(c, [c, b, a], "begin to print out c, b, a respectively")
    e = tf.multiply(c_output, d)

with tf.Session(graph=g) as sess:
    init_op = tf.group(tf.global_variables_initializer(),
                       tf.local_variables_initializer())
    sess.run(init_op)
    e_val = sess.run(e)
    print(e_val)
```

The final output of this snippet is:

```
begin to print out c, b, a respectively[3][2][1]
9.0
```

The following picture illustrates the graph without and with a "dangling" tf.Print() operator.

Believe it or not, if the graph is constructed with the tf.Print() operator "dangling", tf.Print() won't print out anything! Thus, we should be careful when utilizing tf.Print() for either debugging or visualization purposes.

2017-12-20 · www.heyuhang.com/blog/2017/12/20/tensorflow-data-reader-queue

TensorFlow "sucks"! There are so many pitfalls along the way as you try to learn it. TensorFlow provides multiple thoughtful and fundamental wrappers to boost your development efficiency. One representative is the data reader mechanism.

To better digest the TensorFlow data reading mechanism, I recommend reading this blog. As recommended there, the first test code you are eager to experience looks like:

```python
import glob

import tensorflow as tf

input_img_list = glob.glob('dir/*.jpg')
file_name_queue = tf.train.string_input_producer(input_img_list, shuffle=False,
                                                 num_epochs=10)
reader = tf.WholeFileReader()
key_tmp, val_tmp = reader.read(file_name_queue)  # read() reads one instance per call

with tf.Session() as sess:
    init_op = tf.group(tf.global_variables_initializer(),
                       tf.local_variables_initializer())
    sess.run(init_op)
    output_dir = 'my_output_dir'
    threads = tf.train.start_queue_runners(sess=sess)
    indx = 0
    while True:
        img_tmp = sess.run(val_tmp)
        with open(output_dir + '%d.jpg' % indx, 'wb') as f:
            f.write(img_tmp)
        indx += 1
```

Believe it or not, you will definitely encounter a bug report that looks like:

FIFOQueue '_1_input_producer' is closed and has insufficient elements

What is strange is that if you set num_epochs = None (the default value), it succeeds. Various online debugging suggestions overwhelmingly recommend running the TensorFlow local or global variable initializers. None of them help.

The true reason is that queue runners always work together with a Coordinator. The Coordinator provides a robust thread manager: it either stops bad threads or throws exceptions when a program needs to be stopped. As a consequence, the right code snippet works as:
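A sketch of the standard Coordinator pattern, reusing sess, val_tmp and output_dir from the earlier snippet:

```python
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

indx = 0
try:
    while not coord.should_stop():
        img_tmp = sess.run(val_tmp)
        with open(output_dir + '%d.jpg' % indx, 'wb') as f:
            f.write(img_tmp)
        indx += 1
except tf.errors.OutOfRangeError:
    # raised when all num_epochs epochs have been consumed
    print('Done: queue is empty.')
finally:
    coord.request_stop()
coord.join(threads)
```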

2017-10-01 · www.heyuhang.com/blog/2017/10/01/materialism-or-anti-materialism

It is a safe bet that nobody wants to be criticized as being too "materialistic", as materialism is regarded as plainly bad in most cases. Putting aside those superficial controversies, which inevitably involve personal preference and gregariousness, when we dig deeper into materialism and anti-materialism, things turn out to be not that easy. The truth is that most of us are stuck somewhere between these two positions, which is pretty uncomfortable. In our daily life, we are forced to deal with materialism to increase our possessions; at the same time, we are also encouraged to feel rather bad or even guilty about it, which is internally disturbing.

Materialism doesn’t fly in the face of spiritual things

An appropriate example to illustrate this comes from an unusual route, religion, because we all see it focus exclusively on spiritual things. It is unbelievably surprising when we realize how deeply religion correlates with materialistic stuff. The clergy spend a lot of time thinking about how to decorate temples, shrines, monasteries and various ceremonies. They care about these things for one simple reason: they want material things to serve their highest and noblest purpose, the development of our souls and the cultivation of religious ideas. They have historically recognized that we are incarnate, sensory, bodily beings, and that the way to our souls has to be at least in part through our corporeal bodies, rather than merely through our imagination or intellect. In Christianity, Jesus is the combination of the highest spiritual being and a flesh-and-blood person on earth. Actually, numerous material possessions involve a kind of transubstantiation, where they are both practical and physical and yet embody an allusion to a positive personality or spirit.

We are stuck between them

“The things you own end up owning you. It's only after you lose everything that you're free to do anything.” ― Chuck Palahniuk

Exactly the same concept discussed above applies to many areas outside religion. Let's take a look at the luxury domain. Consider a watch designed by Dieter Rams: to the outside world it is just an ordinary timepiece. On the psychological side, it steps further and serves as a kind of transubstantiation: it tells the time, but it also hints psychologically at the ideals of simplicity, purity and harmony floating around the watch, nudging us towards being a certain sort of person. Thus buying and using such materialized stuff can also match our inner evolution and gives us a chance to get closer to our better selves.

The indispensable intertwining of materialism and anti-materialism does not necessarily mean that all consumerism turns out to be great. In most cases, it depends on what kind of materialized object it is and who uses it. A materialized object can be transubstantiated to the very worst side of human nature, greed, callousness etc. And just as surely, it can embody the very best side of human nature. What matters is that we should be careful not to decry or celebrate materialism too easily. We have to make the very best side of materialism shine and hide its dark side.

2017-08-28 · www.heyuhang.com/blog/2017/08/28/does-creative-work-equals-to-create-sth-new

My interest in this topic was immediately ignited while reading a GRE verbal article. It is nearly unanimously accepted that creative work has to create something new, or at least transcend or challenge the established. Few of us really notice that this conclusion applies to some particular domains, especially science and technology, but is suspicious when applied to aesthetic domains, like the arts. Above all, I can't help presenting this article here verbatim; it is so, so ... beautiful.

Extraordinary creative activity has been characterized as revolutionary, flying in the face of what is established and producing not what is acceptable but what will become accepted. According to this formulation, highly creative activity transcends the limits of an existing form and establishes a new principle of organization. However, the idea that extraordinary creativity transcends established limits is misleading when it is applied to the arts, even though it might be valid for the sciences.

Differences between highly creative art and highly creative science arise in part from a difference in their goals. For the sciences, a new theory is the goal and end result of the creative act. Innovative science produces new propositions in terms of which diverse phenomena can be related to one another in more coherent ways. Such phenomena as a brilliant diamond or a nesting bird are relegated to the role of data, serving as the means for formulating or testing a new theory. The goal of highly creative art is very different: the phenomenon itself becomes the direct product of the creative act.

Furthermore, the article illustrates the difference between art and science by pointing out that their goals differ: while the goal of science is to produce a new proposition that never existed before, the goal of art is sometimes the artwork itself.

Shakespeare's Hamlet is not a tract about the behavior of indecisive princes or the uses of political power; nor is Picasso's painting Guernica primarily a propositional statement about the Spanish Civil War or the evils of fascism. What highly creative artistic activity produces is not a new generalization that transcends established limits, but rather an aesthetic particular. Aesthetic particulars produced by the highly creative artist extend or exploit, in an innovative way, the limits of an existing form, rather than transcend that form.

More examples are provided to show that artistic works aim at extending or exploiting, rather than transcending, any existing form.

This is not to deny that a highly creative artist sometimes establishes a new principle of organization in the history of an art field; the composer Monteverdi, who created music of the highest aesthetic value, comes to mind. More generally, however, whether or not a composition establishes a new principle in the history of music has little bearing on its aesthetic worth. Because they embody a new principle of organization, some musical works, such as the operas of the Florentine Camerata, are of signal historical importance, but few listeners or musicologists would include these among the great works of music. On the other hand, Mozart's The Marriage of Figaro is surely among the masterpieces of music even though its modest innovations are confined to extending existing means. It has been said of Beethoven that he toppled the rules and freed music from the stifling confines of convention. But a close study of his compositions reveals that Beethoven overturned no fundamental rules. Rather, he was an incomparable strategist who exploited limits - the rules, forms and conventions that he inherited from predecessors such as Haydn and Mozart, Handel and Bach - in strikingly original ways.

I wasn’t able to fully understand the true difference between science and art until this article cleared the air.

2017-08-27 · www.heyuhang.com/blog/2017/08/27/what-is-amazon

When talking about Amazon, what is the first term jumping into your mind? While most of you would say the worldwide e-commerce platform, today's Amazon has long since reshaped itself into a cauldron of various high techs.

It is amazingly interesting that in its early days Amazon just sold books, far from today's omni-product e-commerce platform. Reexamining Amazon today, it is a giant spanning e-commerce, logistics, intelligent hardware, data storage/processing and even robotics. Obviously, it is inadequate to define it as merely an e-commerce medium, although we can hardly give it an accurate and comprehensive name.

Flywheel: philosophy behind the whole Amazon business

Amazon CEO Jeff Bezos often cites the "flywheel" as the operating philosophy beneath the Amazon system; the term was originally formulated by Amazon business consultant Jim Collins back in the early days of Amazon. It operates like this: to get the flywheel circling, the company cuts prices to attract more customers and consequently increase overall sales. With the huge attracted customer base reciprocating, the company benefits from economies of scale such that it can cut prices again, spinning the flywheel anew.

Traditional bricks-and-mortar storefronts suffer most from the price war launched by Amazon (of course, Amazon should not be held responsible for that), as they are less resilient to the negative impacts of low prices. Amazon itself got bogged down paying for the price war too. Tracing back Amazon's history: after its IPO in May 1997, Amazon bled money for the next six years, followed by another decade with barely any profit eked out. It is hard to imagine how Jeff Bezos managed to convince Wall Street and Amazon shareholders to disregard Amazon's bad financial performance over the years and ignore the lackluster profit earnings. Fortunately, all the earlier expenses eventually began to pay back. At last count, Amazon's membership program Prime had an estimated 85 million subscribers, nearly equivalent to two-thirds of American households: an astonishing number that marks Amazon's great success and its unprecedented market occupancy.

Amazon Diversifies: Don't Look Down on Amazon

Customer first and exceptional customer service are the two criteria Jeff Bezos uses to make a decision or persuade dismayed investors and shareholders. But achieving this goal is not easy. As the soul pillar of Amazon, you have to constantly flatter your investors and shareholders spiritually; internally you need to present yourself as the desirable and energetic leader commanding your workers' religious reverence and willingness to work for you. Externally, you need to create various services or facilities, either practical or formal, to stop your customers' querulous mouths and to satisfy their nearly perverted needs. What Amazon chooses to do to cope with these sophisticated situations is simple: diversify.

Amazon chooses to branch out from its inherent business. Any tiny and novel business that relates to Amazon's original business could be settled and launched by Amazon, regardless of its potential financial payback and even the risks of being an outsider. For example, given the keen need for high efficiency in item delivery as well as production, Amazon stepped into the robotics industry, bringing a fleet of robots to mechanically sort items within or between warehouses, drastically escalating overall productivity while minimizing human-prone errors. Also, to reduce the gap between the customer ordering an item and the item's delivery, Amazon invests in drones to deliver items through the "air traffic". Although relevant regulations have banned Amazon's drone delivery experiments in the US, Amazon moved the bulk of its experiments to the UK as an alternative, and the first drone delivery to an actual customer, in the Cambridgeshire countryside, was achieved in December 2016, a real milestone in Amazon's automation process.

Amazon's massive success relies even more on its complex logistics empire. A recent survey shows that Amazon's delivery infrastructure network contains more than 180 warehouses, 28 sorting centers and 59 local package delivery stations, covering about 44% of the US population, who are guaranteed to live within 20 miles of at least one Amazon warehouse or delivery station. Just imagine how this logistics empire empowers the whole of Amazon compared with any of its existing or potential competitors.

The diversification process is not always tied directly to Amazon's core business; it also allows reinventing new businesses. The Amazon cloud service, known as Amazon Web Services (AWS), was incubated within Amazon in the early days by a small group of engineers who keenly sensed the huge market need for standardized, basic-infrastructure-ready cloud services. Today, AWS has conquered more than 34% of the worldwide cloud computing service market, compared with a combined 24% share for Google, IBM and Microsoft. The biggest advantage of AWS is that it releases startups from heavy, repetitive basic infrastructure work so that they can focus more on the products and core technology that truly make them stand out above the bruising competition pool.

Amazon also seems interested in creating the unknown. Who would have imagined that the e-book reader Kindle, which tries its best to simulate the real paper reading experience, would be brought into reality by Amazon; after all, Amazon per se was assumed to be just an e-commerce platform. The recently released Amazon Echo reaffirms Amazon's ambition to become an indispensable player in building the world of the forthcoming AI era.

What is Amazon?

It seems unlikely that anyone will turn back Amazon's ambitious march to be diversified and more sophisticated. The opportunities as well as the risks awaiting Amazon are unpredictable. Rather than directly answering what Amazon is, we prefer to discuss Amazon with topics on the ground. Amazon is ..., of course, it tries to cover and conquer every aspect of our life; it attacks and retaliates against its competitors without any hesitation; it currently hides its monopoly suspicion by ingeniously and wisely keeping customers happy. Don't forget what Jeff Bezos said: if we take care of customers, the stock will take care of itself in the long term.

Disclaimer: the above picture was taken by Ted S. Warren. This post references an online blog; it just reflects my own thoughts, and some numbers might be incorrect.

2017-08-10 · www.heyuhang.com/blog/2017/08/10/genchi-genbutsu-toyota-production-system

I recently came across an article introducing the twelve pillars of the Toyota production system that keep the whole system operating efficiently and innovatively. One pillar, dubbed Genchi Genbutsu, left a deep impression on me.

Genchi Genbutsu is a long-standing practice of the Japanese Toyota production system. A more intuitive rendering of it in English is "go and see for yourself". The basic idea beneath Genchi Genbutsu is to drag all leaders from their high perches down to the production floor, so that they can actively engage with front-line workers, the assembly line and the production workflow. Whenever a potential problem arises, it can be solved shortly, or at least alleviated ahead of time before becoming worse. Genchi Genbutsu creates a broader mix of minds bringing their talents, innovation and creativity together to speed up the company's development, ignoring the cliche of the company's organizational structure.

However, the nature of this phrase is less about the physical act of an onsite visit and more about a personal understanding of the full implications of any action within an environment as a whole (from the UK Toyota website).

2017-06-18 · www.heyuhang.com/blog/2017/06/18/delving-deeper-into-inception-module

One contribution Google has made to the deep learning community, as I see it, is the Inception network architecture family, including GoogLeNet, Inception-v3 and Inception-v4. The core component of the Inception family is, of course, the inception module. You cannot avoid figuring out the theoretical side of the inception module, and further the difference between the GoogLeNet inception module and the Inception-v3 inception module.

GoogLeNet Inception Module

2017-06-18 · www.heyuhang.com/blog/2017/06/18/delving-deeper-into-convolution-operation

Convolutional Neural Networks (CNNs) have been serving as the hallmark of various vision tasks. But do we really understand the nuts and bolts of the image convolution operation? Let's ignore the mathematical analysis of convolution, which has been fully covered by a variety of online blogs. Instead, the question haunting my brain is the way convolution works, especially when it encounters input with multiple channels and output with multiple, but different, channels:

For example, if the input Blob is $N \times C_1 \times H_1 \times W_1$, the output Blob after convolution is $N \times C_2 \times H_2 \times W_2$. While the Blob height and width are easy to understand, how does the convolution operation change $C_1$ to $C_2$?

I turned to Google for help but found nothing useful. Finally, I decided to read the Caffe source code, the most authoritative guide for figuring out any problem, to dig out the answer.

How Does Caffe Address It Step by Step?

The core code snippet for the Caffe convolution operation is in conv_layer.cpp and base_conv_layer.cpp (we just consider the weight $W$ and ignore the bias $b$ for conciseness):

From the above two code snippets, we can learn the convolution operation pipeline: within a mini-batch, in each iteration we feed one bottom instance, which is multiplied by a weight matrix to get the corresponding top instance. That is, the bottom blob is mapped to the top blob via a weight matrix $W$. $W$ has conv_out_channels_ rows, which are responsible for mapping the bottom blob to the top blob's channels.
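A minimal NumPy sketch of this pipeline, using a 1x1 kernel so the im2col step is a plain reshape (for a $k \times k$ kernel, the weight matrix would have $C_1 \cdot k \cdot k$ columns instead):

```python
import numpy as np

N, C1, H, W = 2, 3, 4, 4
C2 = 8
bottom = np.random.randn(N, C1, H, W)
weight = np.random.randn(C2, C1)               # conv_out_channels_ rows

top = np.empty((N, C2, H, W))
for n in range(N):                             # one bottom instance per iteration
    col = bottom[n].reshape(C1, H * W)         # the "im2col" buffer
    top[n] = (weight @ col).reshape(C2, H, W)  # rows of W produce the C2 channels
print(top.shape)                               # (2, 8, 4, 4)
```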

Graphical Anatomy

Reading source code gives us a machine-biased and rigorous analysis; a graphical illustration, on the other hand, provides an intuitive and direct understanding. From my perspective, the best way to absorb the convolution operation is through graphical visualization.

The overall graphical illustration is given below:

2017-05-30 · www.heyuhang.com/blog/2017/05/30/how-to-deterimin-label

Caffe stores all input data within a mini-batch in the data_ and label_ blobs, both of which are declared in the Batch class:

```cpp
template <typename Dtype>
class Batch {
 public:
  Blob<Dtype> data_, label_;
};
```

However, the label_ blob is not always a prerequisite. That is, not every input datum corresponds to a label (or several labels). Under this circumstance, a question arises: how do we instantiate or ignore the label_ blob?

Good question. Let's dive into the source code. Actually, the BaseDataLayer class holds a variable bool output_labels_, controlling whether the data layer should output label_. Furthermore, the value of output_labels_ is automatically determined by the data layer's top blob count via the following code:

```cpp
if (top.size() == 1) {
  output_labels_ = false;
} else {
  output_labels_ = true;
}
```

That is, you have to explicitly pinpoint how many top blobs you want to extract from the bottom datum. A typical data layer usually looks like:
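A typical Data layer prototxt with two tops looks roughly like this sketch; the source path and batch size are placeholders:

```
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"    # two tops => output_labels_ is set to true
  data_param {
    source: "path/to/train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
```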

When you instantiate two top blobs, data and label, the Caffe system automatically assumes you need the label_ blob and thus computes it. OK, problem solved: if you instantiate two tops in your data layer, Caffe outputs label_; otherwise Caffe ignores it.

A Little Bit More

A natural extension of the aforementioned question is: what if we have to output more than two blobs from our data layer, i.e. 3, 4 or more? Great question. But remember, you have to write a lot of code to achieve it. First, take a glance at the Batch class definition. It currently supports only two blobs. Once you want to involve more blobs, you have to modify it to hold more. One example might look like the following:
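A hypothetical extension holding one extra label blob might look like this (every place that touches Batch, such as the prefetch queue and load_batch(), must be updated accordingly):

```cpp
// Hypothetical: Batch extended with a second label blob.
template <typename Dtype>
class Batch {
 public:
  Blob<Dtype> data_, label_, label2_;
};
```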

With the newly defined Batch class, you enjoy much more flexibility in arranging your dataset.

Hope you enjoy it!

2017-05-03 · www.heyuhang.com/blog/2017/05/03/how-to-add-data-layer-in-caffe

Caffe was initially designed for classification; that is, an image corresponds to an integer label. In many real-scenario applications, however, we want the deep neural network to accept multiple images as well as multiple float/integer labels. To this end, we have to write a new data layer, which is actually complex, as you have to rewrite or change the source code in many places. Here I give a step-by-step hands-on guide to achieve it. (Although many alternatives exist, such as the Python layer, the image data layer and other temporary schemes, they are not perfect solutions, especially when we deal with large amounts of images and have to convert them into an LMDB/LevelDB dataset to speed up training.)

How the Caffe Data Layer Works

Before directly writing code, we'd better figure out the details of how the Caffe data layer works. The following figure illustrates the data layer's dependency on other relevant layers.
Note that the latest Caffe abandoned the DataReader layer (I still don't know why), so this guide might not work with your Caffe. Anyway, here I try to open the black box of how Caffe converts image input (in most cases) into a Caffe-acceptable data format. First, let's look at the anatomy of these layers.

DataReader Layer

The DataReader layer is responsible for reading the new data format you defined in caffe.proto and further converting it into Caffe Batch datasets. Before directly entering into how DataReader handles your defined data format, let me first explain how Caffe uniformly stores and manages these data. Actually, Caffe purposely defines a class Batch to handle this (in base_data_layer.hpp):

```cpp
template <typename Dtype>
class Batch {
 public:
  Blob<Dtype> data_, label_;
};
```

Yes, Batch stores whatever data format you defined into two Blobs: data_ and label_. Keeping this in mind helps you jump out of the redundancy as well as the uncertainty of the Caffe code. All you have to do is convert whatever data you defined into the data_ and label_ blobs respectively (if label_ is necessary). The DataReader layer reads one of your defined data at a time via full().peek() and full().pop(), or drops it via free(). DataReader builds on the InternalThread and BlockingQueue classes, both of which are somewhat difficult to fully understand, as they guarantee reading the dataset sequentially and correctly without thread blocking.

BaseData Layer

DataReader just reads your defined data; it doesn't know how to transform it into the data_ and label_ blobs appropriately. For example, how do we determine the four dimensions $(n, c, h, w)$? How do we split your defined data into data_ and label_ respectively? The BaseData layer is responsible for this task. Specifically, the BaseData layer involves a BasePrefetchingDataLayer, which holds the class variable transformed_data_, the class function load_batch() and the member data_transformer_. data_transformer_ is used to infer the data_ blob shape as well as to transform the original input data. transformed_data_ is an intermediate datum that stores the temporarily transformed datum. load_batch() is responsible for loading your defined data one by one until reaching the batch size, forming the final data_ and label_ blobs.

Take a deep breath: it's hard to fully understand the working mechanism without reading the code line by line. So do not expect to skip the irritating code chewing when you truly want to add a new data layer! In sum, there are five steps to add a new data layer:

Summary

Actually, I do not expect you to grasp the ability to add a new data layer just by reading this blog. What I hope this blog provides is an intuitive understanding, a clear guide that helps you quickly figure out what you should do, step by step, to achieve the final goal.

Hope you enjoy it and don't find it that obscure.

2017-05-03 · www.heyuhang.com/blog/2017/05/03/shell-string-truncation

Ever had to truncate a string in a shell scripting environment? Don't panic or turn to Python. Here are roughly eight ways of truncating strings in shell scripting.

% truncation: delete the sub-string on the right side, retaining the sub-string on the left side.

```sh
var=Hand_bag-114_cluth_0.jpg
echo ${var%.*}
```

%.* means: starting from the right side, delete all characters until the first . is met. Note that the first . is also deleted. The final result is: Hand_bag-114_cluth_0

# truncation: delete the sub-string on the left side, retaining the sub-string on the right side.

```sh
var=Hand_bag-114_cluth_0.jpg
echo ${var#*_}
```

#*_ means: starting from the left side, delete all characters until the first _ is met. Similarly, the first _ is also deleted. The final result is: bag-114_cluth_0.jpg

## and %% truncation: unlike the two aforementioned commands, these two scan until the last (yes, not the first) pattern is met.

```sh
var=Hand_bag-114_cluth_0.jpg
echo ${var%%_*}   # Hand
echo ${var##*_}   # 0.jpg
```

:n1:n2 truncation: starting from the left side at position n1, keep n2 characters and delete all the others.

```sh
var=www.google.com
echo ${var:0:3}   # www
```

:n1 truncation: starting from the left side at position n1, keep all characters until the string ends. Note that the character at position n1 is kept.

```sh
var=www.google.com
echo ${var:4}   # google.com
```

:0-n1:n2 truncation: what if we need to keep characters counting from the right side? There is a simple trick: just add 0- to the start point. Note that, since we count from the right side, we start from 1. After finding the start point, we keep characters from left to right.

```sh
var=www.google.com
echo ${var:0-10:6}   # google
```

:0-n1 truncation: similar to the previous one, but as n2 is omitted, we keep all characters until the string ends.

```sh
var=www.google.com
echo ${var:0-10}   # google.com
```

One more thing: how do we assign the truncated string to a new variable? Like this: var2=`echo ${var:0-10}`

2017-03-19 · www.heyuhang.com/blog/2017/03/19/clothing-pattern-1-plaid

Recently, I have been working on fashion pattern tagging, which made it necessary to familiarize myself with common fashion patterns. The first pattern emerging in my mind is "plaid" (because I am a programmer? HAHA). Here let's talk about some basic info on it.

Plaid: Definition

Plaid, as it is often called in North America, is synonymous with tartan in Scotland, where tartan is often used as a kilt accessory or a blanket on the bed. Visually, plaid is a fabric woven of alternating bands of color in a crisscross (or simply horizontal and vertical) manner. In general, a plaid pattern is woven with warp and weft at right angles. Where two different colors meet, a mixture of the two colors is created. Where the same color meets, a solid color is generated.

Traditionally, plaid was once banned by the government and inaccessible to the lower classes. Nowadays, it is instead adopted by Scotland as symbolic national dress.

Plaid: Construction Rule

Suppose two threads, one in the warp and the other in the weft, meet at right angles. If they carry different colors, a mixture color is generated; otherwise (if they carry the same color), a solid color is generated. Thus, a base of two colors creates three different colors, including one mixture color. The total number of colors increases quadratically with the number of base colors. That is, if the base color number is $n$, a total of $\frac{1}{2}n(n+1)$ colors is created. (For $n = 2$: two solids plus one mixture, i.e. $\frac{1}{2} \cdot 2 \cdot 3 = 3$.)

Plaid: Fashion

Traditionally, plaid has been catalogued as a fashion statement, especially in the Victorian and Edwardian eras, when plaid became an important part of women's fashion. Plaid has also traditionally been used to express rebellion and discontent with the ruling class, especially through unorthodox usage, so plaid has been treated as an anti-establishment symbol. Besides, as plaid is intimately associated with the British aristocracy and military, it has also developed an air of dignity and exclusivity.

2017-01-14 · www.heyuhang.com/blog/2017/01/14/revisit-sigmoid-and-softmax

2017-01-03 · www.heyuhang.com/blog/2017/01/03/calcu-two-sum-with-n-complexity

In LeetCode practice, one simple problem is: given an array of integers, return the indices of two numbers such that they add up to a specific target. For example, for a[] = {2, 3, 6, 9} and target = 9, we should return 1, 2 (the indices of 3 and 6). An intuitive algorithm is to write two for loops, like:
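A minimal sketch of the intuitive $O(N^2)$ solution:

```python
def two_sum_brute(a, target):
    # Check every pair: O(N^2) time.
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            if a[i] + a[j] == target:
                return i, j
    return None

print(two_sum_brute([2, 3, 6, 9], 9))  # (1, 2)
```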

However, it is of $O(N^2)$ computational complexity, which is unbearable in practice. A much faster algorithm is desirable and essential. So let's step further and think: what if the array a[] is pre-sorted, so that the second for loop becomes unnecessary and the scan collapses into $O(N)$? Sorting an array usually requires $O(N \log N)$ complexity (like quicksort). Therefore, pre-sorting reduces the overall complexity to $O(N \log N)$.
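A minimal sketch of the pre-sort idea: sort (value, index) pairs so the original indices survive, then close in with two pointers:

```python
def two_sum_sorted(a, target):
    pairs = sorted((v, i) for i, v in enumerate(a))  # O(N log N)
    lo, hi = 0, len(pairs) - 1
    while lo < hi:                                   # O(N) two-pointer scan
        s = pairs[lo][0] + pairs[hi][0]
        if s == target:
            return pairs[lo][1], pairs[hi][1]
        elif s < target:
            lo += 1
        else:
            hi -= 1
    return None

print(two_sum_sorted([2, 3, 6, 9], 9))  # (1, 2)
```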

No, that is not enough. Can the sorting's $O(N \log N)$ complexity be further eliminated? Remember that the retrieval complexity of a hash table is $O(1)$; why not first store those numbers into a hash table, so that the overall complexity is reduced to $O(N)$?
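A minimal sketch of the hash-table solution:

```python
def two_sum_hash(a, target):
    seen = {}                      # value -> index
    for i, v in enumerate(a):
        if target - v in seen:     # O(1) expected lookup
            return seen[target - v], i
        seen[v] = i                # record after the lookup
    return None

print(two_sum_hash([2, 3, 6, 9], 9))  # (1, 2)
```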