Here we record some basic data structures that come up often in coding.

PriorityQueue
Sometimes we want to sort a HashMap by its values. We can put the entries into a PriorityQueue and override its compare function to reach that goal. (Later, I will add more compare functions to fill out this PriorityQueue section.)
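As a minimal sketch of the idea above, the entries of a map can be pushed into a PriorityQueue whose comparator looks at the value instead of the key. The class name SortByValue and the method keysByValueDesc are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class SortByValue {
    // Returns the keys of `counts`, ordered from the highest value to the lowest.
    static List<String> keysByValueDesc(Map<String, Integer> counts) {
        // The comparator replaces the default ordering: larger values are polled first.
        PriorityQueue<Map.Entry<String, Integer>> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        queue.addAll(counts.entrySet());

        List<String> ordered = new ArrayList<>();
        while (!queue.isEmpty()) {
            ordered.add(queue.poll().getKey());
        }
        return ordered;
    }
}
```

Entries with equal values come out in no particular order, so add a tie-breaker to the comparator if that matters.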

For a recursive method, we might need to copy the previous result into the current result. For example, f(i-1) returns left (a List<List<Integer>>), and f(i) merges left with the current value.
Wrong version:

The problem with the wrong version is that it changes the content of left. For example, if left=[[], [1]] and nums[1]=2, the wrong version outputs [[2],[1,2],[2],[1,2]]. The right version creates a new ArrayList, which physically builds a new list, so a later add call won't change the original content. The right output is [[], [1], [2], [1,2]].
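The original listings are not available here, so the following is only a sketch of what the two versions might look like, assuming the recursion builds subsets. The names SubsetsCopy, extendWrong and extendRight are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SubsetsCopy {
    // Wrong: adds `value` to the very same list objects that `left` holds.
    static List<List<Integer>> extendWrong(List<List<Integer>> left, int value) {
        List<List<Integer>> result = new ArrayList<>(left);
        for (List<Integer> subset : left) {
            subset.add(value);      // mutates the shared inner list
            result.add(subset);
        }
        return result;
    }

    // Right: copies each inner list before adding, so `left` stays untouched.
    static List<List<Integer>> extendRight(List<List<Integer>> left, int value) {
        List<List<Integer>> result = new ArrayList<>(left);
        for (List<Integer> subset : left) {
            List<Integer> copy = new ArrayList<>(subset);
            copy.add(value);
            result.add(copy);
        }
        return result;
    }
}
```

With left=[[], [1]] and value 2, extendWrong returns [[2], [1, 2], [2], [1, 2]] and extendRight returns [[], [1], [2], [1, 2]].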

Stack

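Stack<String> s = new Stack<>();   // diamond operator avoids the raw-type warning
s.push("zara");                    // put an element on top
s.peek();                          // read the top element without removing it
s.pop();                           // remove and return the top element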

String

int integer = 1;
String t = String.valueOf(integer);

Strings are very common, so the only special point is to convert to char[] (with toCharArray()) when you need to work on individual characters.
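A small sketch of the char[] conversion, using string reversal as the example task. The class name StringChars and the method reverse are hypothetical.

```java
class StringChars {
    // Reverses a string by swapping characters in a char[] copy.
    static String reverse(String text) {
        char[] chars = text.toCharArray();   // String is immutable, the array is not
        for (int i = 0, j = chars.length - 1; i < j; i++, j--) {
            char tmp = chars[i];
            chars[i] = chars[j];
            chars[j] = tmp;
        }
        return new String(chars);            // convert back to a String
    }
}
```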

The more LeetCode questions we practice, the more patterns we find. We can split the questions into several categories, such as Stack, Map/HashMap, Dynamic Programming, etc. Using these existing data structures often improves a solution, but sometimes we still need to build our own data structure to fit a question's special scenario.

Question:

Given a binary tree, find the maximum path sum. For this problem, a path is defined as any sequence of nodes from some starting node to any node in the tree along the parent-child connections. The path must contain at least one node and does not need to go through the root.

Analysis:

For any node there are two cases: a single-side path (not a circle), which continues down only one child, and a go-through path (the circle case), which joins the left and right sides through the node. We need to record both cases for each node and pass them up to the parent node to evaluate. With that, the problem is quite clear to solve. You could code it with an int[] of 2 elements, but we all know that is not a good choice and is hard to read. The better solution is to build our own data structure that describes the two cases clearly.

Here we condense the two cases into two fields: singlePath records the maximum path sum for the single-side (not circle) case, and maxPath records the maximum value for the current node, covering both the circle case and the non-circle case. The whole solution is built on these two values.

Meanwhile, we compare singlePath with zero to decide whether to include it or not.

Another clever point is to return Integer.MIN_VALUE, not zero, for the empty node. This handles cases such as a tree with only one node that has a negative value: that negative value is still the maximum for this node, and a zero coming from the empty children would wrongly beat it.
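The analysis above can be sketched as follows. This is a reconstruction under assumptions, not the original listing: the class names TreeNode, MaxPathSum and Result are hypothetical, while singlePath and maxPath follow the description in the text.

```java
class TreeNode {
    int val;
    TreeNode left, right;
    TreeNode(int val) { this.val = val; }
}

class MaxPathSum {
    // Our own data structure: clearer than an int[2].
    static class Result {
        final int singlePath; // best one-sided path starting at this node, never below 0
        final int maxPath;    // best path anywhere in this subtree (circle or not)
        Result(int singlePath, int maxPath) {
            this.singlePath = singlePath;
            this.maxPath = maxPath;
        }
    }

    static int maxPathSum(TreeNode root) {
        return helper(root).maxPath;
    }

    private static Result helper(TreeNode node) {
        if (node == null) {
            // MIN_VALUE, not 0, so a single negative node still wins.
            return new Result(0, Integer.MIN_VALUE);
        }
        Result l = helper(node.left);
        Result r = helper(node.right);

        // Compare with zero: a negative one-sided path is simply not used.
        int single = Math.max(0, Math.max(l.singlePath, r.singlePath) + node.val);

        // Either the best path lies in a child, or it goes through this node.
        int max = Math.max(l.maxPath, r.maxPath);
        max = Math.max(max, l.singlePath + node.val + r.singlePath);
        return new Result(single, max);
    }
}
```

For a tree with the single node -3, the children contribute singlePath 0 and maxPath Integer.MIN_VALUE, so the answer is -3 rather than 0.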

Today we talk about overflow. This is a very subtle problem in interviews, and it is also a very good way to tell a good programmer from a so-so one.

When I interviewed at a big company (I forget its name, maybe eBay), the interviewer asked me to write a sum function. So easy?! In fact, no. You have to consider overflow in your solution.

Concept of Overflow and Underflow

First, let's understand what overflow and underflow are. Both are tied to a data type, and every data type has its own range. For example, for int in Java, Integer.MIN_VALUE and Integer.MAX_VALUE give the range. Java's arithmetic operators do not report overflow or underflow and never throw an exception for them. They simply swallow the problem.

int operators:
When the result of an operation needs more than 32 bits, only the low 32 bits are kept and the high-order bits are discarded. When the most significant bit (MSB) of what remains is 1, the value is treated as negative.

floating point operators:
Overflow results in Infinity and underflow results in 0.0.
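A short sketch of both behaviours. The class OverflowDemo and its methods are hypothetical helpers, and the values in the usage note follow from the rules above.

```java
class OverflowDemo {
    // int arithmetic keeps only the low 32 bits of the result.
    static int add(int a, int b) {
        return a + b;
    }

    // double arithmetic saturates to Infinity or collapses to 0.0.
    static double times(double a, double b) {
        return a * b;
    }
}
```

OverflowDemo.add(Integer.MAX_VALUE, 1) returns Integer.MIN_VALUE, OverflowDemo.times(Double.MAX_VALUE, 2) returns Infinity, and OverflowDemo.times(Double.MIN_VALUE, 0.5) returns 0.0. No exception is thrown in any of these cases.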

Solutions to avoid overflow or underflow

Second, there are several ways to avoid it.

using long to replace int

using an unsigned type (uint) to replace int, in languages that have one (Java has no uint primitive)

using double for intermediate variables

using the mod method. This is a very common solution in LeetCode: the question itself usually reminds you to return the answer mod 1000000007 when it can be very large.
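Two of the options above can be sketched for the sum function from the interview story: a long accumulator and a running modulus. The class SafeSum and its method names are hypothetical, and sumMod assumes non-negative inputs.

```java
class SafeSum {
    static final int MOD = 1_000_000_007;

    // Option 1: accumulate in a long so int inputs cannot overflow the total.
    static long sum(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Option 2: keep the running total below MOD at every step.
    // Assumes every value is non-negative.
    static int sumMod(int[] values) {
        long total = 0;
        for (int v : values) {
            total = (total + v) % MOD;
        }
        return (int) total;
    }
}
```

Summing Integer.MAX_VALUE twice gives 4294967294 with sum, where a plain int accumulator would have wrapped to -2.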

Examples

Finally, LeetCode-576 (Out of Boundary Paths) is a good example of how to avoid the overflow issue. (I don't repeat the question here; you can read it on LeetCode's website.) It is also a good way to see how a three-dimensional array can solve a problem with dynamic programming.
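The original solution is not reproduced here, so the following is only one possible sketch of a three-dimensional DP for LeetCode-576, taking the remainder at every step. The class name OutOfBoundaryPaths and the parameter names are assumptions.

```java
class OutOfBoundaryPaths {
    private static final int MOD = 1_000_000_007;
    private static final int[][] DIRS = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};

    static int findPaths(int rows, int cols, int maxMoves, int startRow, int startCol) {
        // dp[k][r][c]: ways to leave the grid from (r, c) using at most k moves.
        int[][][] dp = new int[maxMoves + 1][rows][cols];
        for (int k = 1; k <= maxMoves; k++) {
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < cols; c++) {
                    long ways = 0; // long: four values near MOD do not fit in an int
                    for (int[] d : DIRS) {
                        int nr = r + d[0];
                        int nc = c + d[1];
                        if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) {
                            ways += 1;                 // this move crosses the boundary
                        } else {
                            ways += dp[k - 1][nr][nc]; // continue from the neighbour
                        }
                    }
                    dp[k][r][c] = (int) (ways % MOD);
                }
            }
        }
        return dp[maxMoves][startRow][startCol];
    }
}
```

The intermediate sum is held in a long because adding four values just below the modulus would overflow an int before the remainder is taken.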

In the following posts, I will write down some coding tips. They are all summaries from my practice on LeetCode. I know there are many LeetCode answers online, but they follow a plain question-to-answer model. If we just read a question and then answer it, we never grow quickly. We need to know why someone else's algorithm is faster than ours and what the differences are, and how we can apply their good ideas to our own code in the future. So, let's start!

Example-1:

We all know that Map (key-value pairs) is a good data structure for speeding up an algorithm, because a Map's get is O(1) with a well-designed hash function. But if we don't use it well, it can still lose its advantage.

Better Version:

For a small volume of data, this advantage might not be obvious, but for big data the second version is much better than the first one.

The improvement is easy to understand. The first version uses addAll to insert a whole list into another one, which physically means a lot of work, such as allocating space and copying elements. ArrayList.addAll is already optimized: it allocates enough space for all the new elements in one step rather than reallocating multiple times when adding many elements. Even so, the second version avoids this work entirely by starting from an empty list (new ArrayList<>()), so there is nothing to copy.
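The two original versions are not shown here, so the sketch below is only a guess at their shape, assuming the task is to group words into a Map of lists. GroupByKey, groupWithAddAll and groupInPlace are hypothetical names, and the words are assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupByKey {
    // First version (assumed): rebuilds the bucket with addAll on every insert.
    static Map<Character, List<String>> groupWithAddAll(List<String> words) {
        Map<Character, List<String>> groups = new HashMap<>();
        for (String word : words) {
            char key = word.charAt(0);
            List<String> bucket = new ArrayList<>();
            if (groups.containsKey(key)) {
                bucket.addAll(groups.get(key)); // copies every existing element
            }
            bucket.add(word);
            groups.put(key, bucket);
        }
        return groups;
    }

    // Better version (assumed): stores an empty list once, then appends in place.
    static Map<Character, List<String>> groupInPlace(List<String> words) {
        Map<Character, List<String>> groups = new HashMap<>();
        for (String word : words) {
            char key = word.charAt(0);
            if (!groups.containsKey(key)) {
                groups.put(key, new ArrayList<>());
            }
            groups.get(key).add(word);
        }
        return groups;
    }
}
```

Both methods return the same map; the difference is that the first one copies a bucket each time it grows.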

Example-2:

The question asks us to design a set data structure that supports all operations in average O(1) time.

The big difference between the two versions is that we introduce a Map to store each value's index. For the remove function, we swap the target with the last element of the list and then delete the last one.

The reason for this improvement is to reach O(1). ArrayList's indexOf and remove methods work against this goal.
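A minimal sketch of the improved version, assuming the required operations are insert, remove and getRandom. The class name RandomizedSet and the field names are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class RandomizedSet {
    private final List<Integer> values = new ArrayList<>();
    private final Map<Integer, Integer> indexOf = new HashMap<>(); // value -> position in `values`
    private final Random random = new Random();

    boolean insert(int value) {
        if (indexOf.containsKey(value)) {
            return false;
        }
        indexOf.put(value, values.size());
        values.add(value);               // append at the end: O(1) amortized
        return true;
    }

    boolean remove(int value) {
        Integer index = indexOf.get(value);
        if (index == null) {
            return false;
        }
        int lastIndex = values.size() - 1;
        int last = values.get(lastIndex);
        values.set(index, last);         // move the last element into the hole
        indexOf.put(last, index);
        values.remove(lastIndex);        // remove(int) at the end: no shifting
        indexOf.remove(value);
        return true;
    }

    // Assumes the set is not empty.
    int getRandom() {
        return values.get(random.nextInt(values.size()));
    }
}
```

The order of the last two map updates matters: when the removed value is itself the last element, its entry is first rewritten and then deleted, which leaves the map consistent.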

remove(int)
This method removes a single element at the given position. After that, all elements to the right are shifted left via a System.arraycopy call, so this method has O(n) complexity. Removing the last element does not invoke System.arraycopy at all, so that call is O(1).

remove(Object)
This method removes the first occurrence of a given element from the list. It iterates over the list elements, so it has O(n) complexity. It touches every array element in any case: it either reads them while looking for the requested element or moves them one position to the left with a System.arraycopy call after the element is found.
Never call remove(Object) when you know at which position you can find your element.

contains(Object), indexOf(Object)
The first method checks whether a given object is present in the list (it is defined as indexOf(elem)>=0). The second method tries to find the position of the given element in the list. Both methods have O(n) complexity because they scan the internal array from the beginning to find the element.

Summary of List:

add elements to the end of the list

remove elements from the end

avoid contains, indexOf and remove(Object) methods

avoid removeAll and retainAll methods even more

use subList(int, int).clear() idiom to quickly clean a part of the list
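The last tip in the summary can be sketched like this. The class ListTips and the method dropRange are hypothetical; subList(int, int).clear() itself is standard java.util.List behaviour.

```java
import java.util.ArrayList;
import java.util.List;

class ListTips {
    // Returns a copy of `source` without the elements at indices [from, to).
    static List<Integer> dropRange(List<Integer> source, int from, int to) {
        List<Integer> copy = new ArrayList<>(source);
        copy.subList(from, to).clear(); // one range removal instead of many remove(int) calls
        return copy;
    }
}
```

Clearing the sublist view removes the whole range from the backing list in a single operation, so the remaining elements are shifted only once.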

Each SSL certificate has its own validity period. When your server's certificate expires, visitors can no longer reach your website normally, so you need to renew the expired certificate. It is quite simple and easy, but I write the steps down here as a record.

Step1: check its expiry date

openssl x509 -in domain.crt -noout -enddate

Step2: copy the new certificate files to your server

This step depends on which SSL service you use. I get mine from Godaddy, so I need to go to Godaddy to get the new crt files. There are two crt files to download.

Then use the scp command to copy these files to your server. If you forget the target location, open your Nginx conf file and check the ssl_certificate parameter; it tells you where to copy the files.

par
If we have N tasks that are independent of each other (no ordering, no shared variables), we can consider using a parallel collection to speed up the computation.
The parallel collection runs with the maximum concurrency the number of cores allows, which can greatly improve a function's efficiency.

Note:
Sometimes the sequential implementation performs better than the parallel one. That's because a parallel collection has some overhead for distributing (fork) and gathering (join) the data between cores. So parallel collections bring a big performance improvement mainly when the computation is heavy.
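The same idea can be sketched in Java, the language used for the other examples in these notes, where a parallel stream plays a role similar to Scala's par. This is an analogue, not the Scala API itself, and the class ParallelSum is a hypothetical name.

```java
import java.util.stream.LongStream;

class ParallelSum {
    // Sequential version: one thread walks the whole range.
    static long sumOfSquares(long n) {
        return LongStream.rangeClosed(1, n).map(x -> x * x).sum();
    }

    // Parallel version: the range is split (fork) and partial sums are combined (join).
    // Safe because the tasks are independent and share no mutable state.
    static long sumOfSquaresParallel(long n) {
        return LongStream.rangeClosed(1, n).parallel().map(x -> x * x).sum();
    }
}
```

Both methods return the same value. For a cheap operation like squaring, the parallel one may well be slower because of the fork and join overhead described in the note.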

Future
Future serves the same purpose as par: it also lets independent tasks run concurrently.

Avoid unnecessary loop
Even though we have parallel methods to speed things up, avoiding unnecessary loops is still needed. In my code, I use a random method to handle the token selection problem and make sure the resources are shared fairly among customers. This benefits from probability theory.

Print out the thread name to debug parallel running status

println(Thread.currentThread().getName)

Separate ExecutionContext
If you don't want to affect the default ExecutionContext, you can create an additional one to keep the work separate.
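A rough Java analogue of the last three tips: a future, a thread-name check, and a pool that is separate from the default one. It uses CompletableFuture with its own ExecutorService rather than Scala's Future and ExecutionContext, and the names DedicatedPool and dedicated-worker are made up for this sketch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DedicatedPool {
    // Runs one task on a separate pool and reports which thread executed it.
    static String runOnDedicatedPool() {
        // A named thread factory makes the pool easy to spot when debugging.
        ExecutorService pool = Executors.newFixedThreadPool(
                2, runnable -> new Thread(runnable, "dedicated-worker"));
        try {
            CompletableFuture<String> future = CompletableFuture.supplyAsync(
                    () -> Thread.currentThread().getName(), pool);
            return future.join(); // wait for the result
        } finally {
            pool.shutdown();      // the default pool is never touched
        }
    }
}
```

Passing the pool as the second argument keeps the task off the common pool, which is the point of creating a separate execution context.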

newCachedThreadPool vs newFixedThreadPool
newFixedThreadPool:
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available. If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks. The threads in the pool will exist until it is explicitly shutdown.
newCachedThreadPool:
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks. Calls to execute will reuse previously constructed threads if available. If no existing thread is available, a new thread will be created and added to the pool. Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources. Note that pools with similar properties but different details (for example, timeout parameters) may be created using ThreadPoolExecutor constructors.
If you have a huge number of long-running tasks, I would suggest the FixedThreadPool. Otherwise, choose the CachedThreadPool.
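The difference can be observed with a small sketch that counts how many distinct threads a pool uses. The class PoolDemo and the method distinctThreads are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

class PoolDemo {
    // Submits `tasks` small jobs and returns how many distinct threads ran them.
    static int distinctThreads(ExecutorService pool, int tasks) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            pending.add(CompletableFuture.runAsync(
                    () -> names.add(Thread.currentThread().getName()), pool));
        }
        pending.forEach(CompletableFuture::join); // wait for all jobs
        pool.shutdown();
        return names.size();
    }
}
```

With Executors.newFixedThreadPool(2) the result can never exceed 2, however many tasks are submitted. With Executors.newCachedThreadPool() the count depends on how many tasks overlap in time, so it can grow with the load.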

As we all know, there was a huge DDoS attack recently that affected lots of websites. At first, our server wasn't affected. But last week I allowed SSH login by password. (In the past, we only allowed users to SSH into the server with a public key. Last week we wanted to let a user log in simply and quickly, so we opened password login temporarily and forgot to close it.) This week, the server was completely compromised. The end result was that our server was blocked by an ever-growing number of useless threads and the bandwidth was used up. Here I list how I found these problems and how we tried to fix them.

Step1: check network status

By checking the network status, you will find that bandwidth usage is too high, which keeps normal customers from using this server's resources.

sudo apt-get install nethogs
sudo nethogs
sudo nethogs eth0 eth1

Step2: find exact threads

Besides the bandwidth status, you also need to know the thread status. By viewing real-time thread status, you will see many malicious threads taking too many resources. In my case, I found 300 malicious threads being created every 2 minutes. It is not hard to understand that the server eventually becomes too slow to cope with these ever-increasing malicious threads.

sudo apt-get install htop
htop

Step3: kill useless threads

After identifying these malicious threads, we need to kill them. In fact, killing them cannot solve the problem at its root, because for now I only know the effect, not the root cause. Luckily, we found a bash script that was creating them, so I killed that script as well. At that point it looked like we were finished, but we were not. After 8 hours of stable network, the server was attacked again and I couldn't SSH into it. So the final solution was to shut down and rebuild the server. For now, I still don't know the root cause.

sudo kill -9 $(pgrep <useless_threads_main_name>)

Step4: add your own public key to ssh

To keep the attack from happening again, I brought public-key login back, because I'm sure this attack happened while password login was open.

Step5: disable password login for ssh

Set PasswordAuthentication no in /etc/ssh/sshd_config and restart the SSH service.

Conclusion:

A high-security configuration is needed to avoid this kind of attack. Once such a malicious attack happens, the best approach is to back up your data as soon as possible and rebuild your server. (During the investigation, I also installed several kinds of maldetect tools and tried to use them to scan for the malicious scripts, code, or software. Unfortunately, they all failed.)

I still need more knowledge to help me understand and find out the root cause. Keep learning.