Skillset

In the last two days, I described on this page my efforts at cracking the Facebook Hacker Cup 2013 challenge – three separate puzzles. Today we will examine the final one: Find the Min. The actual problem is described below:

After sending smileys, John decided to play with arrays. Did you know that hackers enjoy playing with arrays? John has a zero-based index array, m, which contains n non-negative integers. However, only the first k values of the array are known to him, and he wants to figure out the rest.
John knows the following: for each index i, where k <= i < n, m[i] is the minimum non-negative integer which is *not* contained in the previous *k* values of m.
For example, if k = 3, n = 4 and the known values of m are [2, 3, 0], he can figure out that m[3] = 1.
John is very busy making the world more open and connected, as such, he doesn't have time to figure out the rest of the array. It is your task to help him.
Given the first k values of m, calculate the nth value of this array. (i.e. m[n - 1]).
Because the values of n and k can be very large, we use a pseudo-random number generator to calculate the first k values of m. Given positive integers a, b, c and r, the known values of m can be calculated as follows:
m[0] = a
m[i] = (b * m[i - 1] + c) % r, 0 < i < k
Input
The first line contains an integer T (T <= 20), the number of test cases.
This is followed by T test cases, consisting of 2 lines each.
The first line of each test case contains 2 space separated integers, n, k (1 <= k <= 105, k < n <= 109).
The second line of each test case contains 4 space separated integers a, b, c, r (0 <= a, b, c <= 109, 1 <= r <= 109).
Output
For each test case, output a single line containing the case number and the nth element of m.

The description of the challenge is not very good. I can tell because I solved the challenge and had a difficulty of understanding some of the things in the text that we’ll see later on. Basically, we need to read the input file two lines at a time; the first line constitutes the variables n and k, while the second line constitutes the variables a,b,c and r. We need to read all those numbers in the program and calculate some basic mathematics over it to get an array of numbers. At the end, we should display the last number in the array, which presents the actual result.

Solving the Challenge

I guess this challenge was the easiest among the three to solve; the main catch was understanding what the challenge was asking of us, but once we figure it out, writing the code is trivial. The first thing we need to figure out is the fact that the elements in the array are being calculated based on the value k, like this:

m[0] = a ; k=0

m[i] = (b * m[i – 1] + c) % r ; 0 < i < k

m[i] = min(m[i-k]…m[i-1]) ; k<=i<n

This is also the trick part to figure out because the challenge is not very clear about that. We can see that the first element of the array is always the number a, as it was read from the input file. The elements from the first element till the k’th element are calculated based on the following formula: m[i] = (b * m[i – 1] + c) % r. This is presented well in the text of the challenge. The tricky part is the understanding of the array elements, which have the index greater than k. The text of the challenge says the following, which describes how those elements should be calculated:

John knows the following: for each index i, where k <= i < n, m[i] is the minimum non-negative integer which is *not* contained in the previous *k* values of m.

This really isn’t a very detailed description of how the elements k<=i<n should be calculated. From the description above, it’s immediately evident that all those values should be calculated as follows:

min(m[k]…m[i-1])

But actually, this is wrong! We shouldn’t take whole range from m[k]…m[n-1] into consideration when calculating the next integer to append the array, but only the previous k characters! This is the important part that is left for the reader to figure out. I’m actually not very sure why this isn’t better presented; maybe it’s because the Facebook writers intentionally left some ambiguous sentences in there just to mess with the hackers solving the challenges. The actual formula should have been written like this:

min(m[i-k]…m[i-1])

There’s not a big difference, I know, but this is what decides whether we’ll be able to successfully solve the problem or not.

We’re reading the variables n,k,a,b,c,r into the program and using them to calculate the value we need. First we’re using the given formula to calculate the first k numbers and afterwards we’re also finding the minimum value of the last k elements in the array. The program works and I’ve tested it against the given input file and the output was correct. But here’s the catch: what if the program uses really high values of n,k,a,b,c,r? Well, the most important variables to be worrying about are n and k, which greatly add the cpu and memory usage of a program. Did I say this was the easiest of the three problems? Well, now I can bite myself in the tongue. It is very easy to write the program that can operate with small numbers, but if the n and k variables are really high, the program posted above will fail miserably. So in order to shorten the time period in which it calculates the right number m[n-1], we must optimize the program.

We’ll be operating on just one example. For clarity, we’ll be using the input file given below (this is the first case of the original input file):

1
97 39
34 37 656 97

If we print the first k values of the array m, we’ll get the following numbers:

There are exactly 39 elements in the array, which is exactly the number k (from the input file). From what we can understand now, this can’t really be optimized; the numbers must be calculated one way or the other, so it’s better to move on to the second part of the problem. We should now calculate the rest of the elements in the array, rather than wasting time, we’ll come back here and optimize further if necessary, but for now, let’s leave it as it is. Then let’s print every number returned by the calc_higher function. In our example, we get the following numbers:

The last number is 8, which is also the correct result of the first sample case, but let’s closely look at the elements returned. We can see that only the k+1 values are different, after that the values start to repeat themselves. So the number 0 it the first number and the number 39 is the last number of the k window. This looks like a good way to start optimizing our code. We can rewrite the program to only calculate the k+1 values and then just return the right number.

We’ve changed only the calculating of k+1 numbers after the initial k numbers, which can be seen in the code below:

Basically, we changed that only the range till k…k*2+1 should be calculated, instead using the range from k…n, which was used before. By using this, we successfully solved the problem if the variable n is very big, because we traversed the problem from n to k. Thus, we’ve successfully solved the problem for big n numbers. But what about big k numbers? Those are still a problem, but we’re one step closer to the solution.

The next thing we can do is optimize is the calc_higher and calc_lower functions. We don’t really need the calc_lower function since we only have one line of code to execute, plus using the function adds an overhead we just can’t afford right now. Additionally, we also want to get rid of the calc_higher function, but we’ll do it somehow differently. Instead of moving the code, we’ll implement it with heap sorting algorithm. In Python we can use the heapq module, which can be found here: http://docs.python.org/3/library/heapq.html. By using the heap sort, the elements pushed to the queue are automatically sorted, so the minimum element is in the front of the queue and the maximum element is at the back of the queue. Since we need the numbers that are not contained in the previous k elements, we need to insert the opposite numbers in it. Thus, if one of the previous k elements in the array is 12, then we shouldn’t store the element 12 in our heap queue. Let’s take a look at the whole program now, which correctly solves the problem:

We can see that we don’t have any function right now and we’re using heap sort queue to keep the right element at the beginning of the queue, so we don’t need to search for it every time. The code is properly commented, so I won’t explain it in detail, but I’ll just say that at the beginning, I’m iterating over the elements and appending the elements not contained in the previously calculated k elements into the queue, which is exactly what I need later on. Then I’m calculating the rest of the elements by appending the minimum element from the heap sort queue to our array and pushing the element that falls just out of the k window to the heap queue (if it isn’t already contained in there).

The algorithm works as expected, but it can also be greatly optimized. Let’s present the results of the input file that was appended to the challenge, we can see that on the output below:

We can see that the challenge is way harder than the previous two challenges, even though at the beginning it doesn’t seem to be so. I’ve had the most fun with this challenge, even though I didn’t completely optimize it as much as I could have.

Dejan Lukan is a security researcher for InfoSec Institute and penetration tester from Slovenia. He is very interested in finding new bugs in real world software products with source code analysis, fuzzing and reverse engineering. He also has a great passion for developing his own simple scripts for security related problems and learning about new hacking techniques. He knows a great deal about programming languages, as he can write in couple of dozen of them. His passion is also Antivirus bypassing techniques, malware research and operating systems, mainly Linux, Windows and BSD. He also has his own blog available here: http://www.proteansec.com/.

Your email address will not be published. Required fields are marked *

Comment

Name *

Email *

Website

Save my name, email, and website in this browser for the next time I comment.

− 5 =

About InfoSec

At Infosec, we believe knowledge is the most powerful tool in the fight against cybercrime. We provide the best certification and skills development training for IT and security professionals, as well as employee security awareness training and phishing simulations. Learn more at infosecinstitute.com.

Connect with us

Join our newsletter

File download

First Name

Last Name

Work Phone Number

Work Email Address

Job Title

Why Take This Training?

How will you fund your training?

What is your training budget?

InfoSec institute respects your privacy and will never use your personal information for anything other than to notify you of your requested course pricing. We will never sell your information to third parties. You will not be spammed.

Comments

What is Skillset?

Skillset

Practice tests & assessments.

Practice for certification success with the Skillset library of over 100,000 practice test questions. We analyze your responses and can determine when you are ready to sit for the test. Along your journey to exam readiness, we will:

1. Determine which required skills your knowledge is sufficient
2. Which required skills you need to work on
3. Recommend specific skills to practice on next
4. Track your progress towards a certification exam