The code below works fine for the sample inputs given with the problem, but for input sizes as large as \$10^9\$ it takes hours to return the solution.

Problem statement:

After sending smileys, John decided to play with arrays. Did you know
that hackers enjoy playing with arrays? John has a zero-based index
array, m, which contains n non-negative integers. However, only
the first k values of the array are known to him, and he wants to
figure out the rest.

John knows the following: for each index i, where k <= i < n,
m[i] is the minimum non-negative integer which is not contained in
the previous *k* values of m.

For example, if k = 3, n = 4 and the known values of m are [2, 3, 0], he can figure out that m[3] = 1.

John is very busy making the world more open and connected, as such,
he doesn't have time to figure out the rest of the array. It is your
task to help him.

Given the first k values of m, calculate the nth value of this
array. (i.e. m[n - 1]).

Because the values of n and k can be very large, we use a
pseudo-random number generator to calculate the first k values of
m. Given positive integers a, b, c and r, the known values
of m can be calculated as follows:

m[0] = a
m[i] = (b * m[i - 1] + c) % r, 0 < i < k

Input

The first line contains an integer T (T <= 20), the number of test cases.

2 Answers

1. Improving your code

If you used if __name__ == '__main__': to guard the code that should be executed when your program is run as a script, then you'd be able to work on the code (for example, run timings) from the interactive interpreter.

The name func is not very informative. And there's no docstring or doctests. What does this function do? What arguments does it take? What does it return?

func has many different tasks: it reads input, it generates pseudo-random numbers, and it computes the sequence in the problem. The code would be easier to read, test and maintain if you split these tasks into separate functions.
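For instance, the pseudo-random number generation could live in a generator of its own. This is only a sketch, not the code from the question: the name `pseudo_random` is made up for illustration, and the recurrence is the one given in the problem statement.

```python
from itertools import islice


def pseudo_random(a, b, c, r):
    """Generate m[0], m[1], ... where m[0] = a and
    m[i] = (b * m[i - 1] + c) % r.

    The generator is infinite; the caller decides how many values to take.
    """
    m = a
    while True:
        yield m
        m = (b * m + c) % r


# Take the first k = 5 values for a = 1, b = 2, c = 3, r = 7.
first_values = list(islice(pseudo_random(1, 2, 3, 7), 5))
print(first_values)  # [1, 5, 6, 1, 5]
```

A function like this can be tested on its own from the interactive interpreter, without touching any input file.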

Reading all the lines of a file into memory by calling the readlines method is usually not the best way to read a file in Python. It's better to read the lines one at a time by iterating over the file or using next, if possible.
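A minimal sketch of reading line by line might look like the following. The only thing taken from the problem statement is that the first line holds the number of test cases; the layout of one line of integers per test case is a hypothetical stand-in chosen for illustration, and `read_cases` is an invented name.

```python
import io


def read_cases(lines):
    """Yield one list of integers per test case from an iterator of lines.

    The first line is the number of test cases. The "one line per case"
    layout is an assumption made for this example only.
    """
    lines = iter(lines)
    count = int(next(lines))  # first line: number of test cases
    for _ in range(count):
        # Only one line is held in memory at a time.
        yield [int(field) for field in next(lines).split()]


# io.StringIO stands in for an open file here.
sample = io.StringIO("2\n4 3\n10 5\n")
cases = list(read_cases(sample))
print(cases)  # [[4, 3], [10, 5]]
```

Because `read_cases` accepts any iterator of lines, it works equally with an open file, `sys.stdin`, or a list of strings in a test.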

When computing the result, you keep the whole sequence (all \$n\$ values) in memory. But this is not necessary: you only need to keep the last \$k\$ values (the ones that you use to compute the minimum excluded number, or mex). It will be convenient to use a collections.deque for this.

Similarly, it's not necessary to build the set of the last \$k\$ values every time you want to compute the next minimum excluded number. If you kept this set around, then at each stage, all you'd have to do is to add one new value to the set and remove up to one old value. It will be convenient to use a collections.Counter for this.
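The last two suggestions could be combined roughly as follows. This is an illustrative sketch rather than a definitive implementation: `sliding_mex` is a name invented here, and it still finds each mex by a linear scan, so it only addresses memory use and the repeated set building.

```python
from collections import Counter, deque
from itertools import islice


def sliding_mex(known):
    """Generate the values that follow `known`, where each new value is
    the mex (minimum excluded number) of the previous len(known) values.
    """
    window = deque(known)     # the last k values, oldest first
    counts = Counter(window)  # how often each value occurs in the window
    while True:
        mex = 0
        while counts[mex] > 0:  # linear scan for the smallest absent value
            mex += 1
        yield mex
        # Slide the window: add the new value, drop the oldest one.
        window.append(mex)
        counts[mex] += 1
        oldest = window.popleft()
        counts[oldest] -= 1


# The example from the problem statement: m = [2, 3, 0] gives m[3] = 1.
print(list(islice(sliding_mex([2, 3, 0]), 5)))  # [1, 2, 3, 0, 1]
```

A `Counter` is used instead of a plain set because the known values may contain duplicates, so a value must stay "present" until its last copy leaves the window.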

The runtime of the program is approximately proportional to both \$n\$ and \$k\$, so extrapolating from the above timings, I expect that in the worst case, when \$n = 10^9\$ and \$k = 10^5\$, the computation will take about five months. So we've got quite a lot of improvement to make!

3. Speeding up the mex-finding

In the implementation above it takes \$Θ(k)\$ time to find the mex of the last \$k\$ numbers, which means that the whole runtime is \$Θ(nk)\$. We can improve the mex finding to \$O(\log k)\$ as follows.

First, note that the mex of the last \$k\$ numbers is always a number between \$0\$ and \$k\$ inclusive. So if we keep track of the set of excluded numbers in this range (that is, the numbers between \$0\$ and \$k\$ inclusive that do not appear in the last \$k\$ elements of the sequence), then the mex is the smallest number in this set. And if we keep this set of excluded numbers in a heap then we can find the smallest in \$O(\log k)\$.
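One way this idea could be sketched with the standard `heapq` module is shown below. The function name `heap_mex` is hypothetical, and the code is an illustration of the technique rather than a tuned implementation.

```python
import heapq
from collections import Counter, deque
from itertools import islice


def heap_mex(known):
    """Generate the values that follow `known`, finding each mex by
    popping the smallest element of a heap of excluded numbers.
    """
    k = len(known)
    window = deque(known)     # the last k values, oldest first
    counts = Counter(window)  # occurrences of each value in the window
    # Numbers in 0..k absent from the window. A sorted list already
    # satisfies the heap invariant, so no heapify call is needed.
    excluded = [i for i in range(k + 1) if counts[i] == 0]
    while True:
        mex = heapq.heappop(excluded)  # smallest excluded number
        yield mex
        window.append(mex)
        counts[mex] += 1
        oldest = window.popleft()
        counts[oldest] -= 1
        # A value whose last copy has left the window becomes excluded
        # again. Values above k can never be the mex, so they are ignored.
        if counts[oldest] == 0 and oldest <= k:
            heapq.heappush(excluded, oldest)


print(list(islice(heap_mex([5, 5, 1]), 5)))  # [0, 2, 3, 1, 0]
```

The heap can never be empty when a value is popped, because the range \$0\$ to \$k\$ holds \$k + 1\$ numbers and the window holds only \$k\$ values.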

However, we still expect that when \$n = 10^9\$ and \$k = 10^5\$, the computation will take around an hour, which is still far too long.

4. A better algorithm

Let's have a look at the actual numbers in the sequence generated by iter_mex and see if there's a clue as to a better way to calculate them. Here are the first hundred values from a sequence with \$k = 13\$ and \$r = 23\$:

Let's prove that the sequence always settles into a repeating pattern with period \$k + 1\$. First, as noted above, the mex of \$k\$ numbers is always a number between \$0\$ and \$k\$ inclusive, so all numbers in the sequence after the first \$k\$ lie in this range. Each number in the sequence is different from the previous \$k\$, so each group of \$k + 1\$ numbers after the first \$k\$ must be a permutation of the numbers from \$0\$ to \$k\$ inclusive. So if the (\$k + 1\$)th last number is \$j\$, then the last \$k\$ numbers will be a permutation of the numbers from \$0\$ to \$k\$ inclusive, except for \$j\$. So \$j\$ is their mex, and that will be the next number. Hence the pattern repeats.

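So now it's clear how to solve the problem. Counting elements from 1 (so that element number \$n\$ is \$m[n - 1]\$), if \$n > k\$ then element number \$n\$ in the sequence is the same as element number \$k + 1 + n \bmod (k + 1)\$. That element lies among the first \$2k + 1\$ elements, so at most \$k + 1\$ values need to be computed, however large \$n\$ is.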
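Putting the pieces together, a solution along these lines might be sketched as below. The name `nth_value` is invented for this example, and the heap-based mex search is just one possible choice for generating the first block of values.

```python
import heapq
from collections import Counter, deque


def nth_value(known, n):
    """Return m[n - 1], given the first k = len(known) values of m.

    For n > k the answer equals m[k + n % (k + 1)], so at most k + 1
    new values are generated regardless of how large n is.
    """
    k = len(known)
    if n <= k:
        return known[n - 1]
    offset = n % (k + 1)  # position within the repeating block
    window = deque(known)
    counts = Counter(window)
    # Numbers in 0..k absent from the window (sorted, hence a valid heap).
    excluded = [i for i in range(k + 1) if counts[i] == 0]
    for _ in range(offset + 1):  # generate m[k], ..., m[k + offset]
        mex = heapq.heappop(excluded)
        window.append(mex)
        counts[mex] += 1
        oldest = window.popleft()
        counts[oldest] -= 1
        if counts[oldest] == 0 and oldest <= k:
            heapq.heappush(excluded, oldest)
    return mex


# The example from the problem statement: k = 3, n = 4, m = [2, 3, 0].
print(nth_value([2, 3, 0], 4))        # 1
print(nth_value([2, 3, 0], 10 ** 9))  # 1
```

With this approach the work depends only on \$k\$, roughly \$O(k \log k)\$ per test case, and not on \$n\$ at all.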

If making the counter takes a long time because k is very large, you could consider making it in 'chunks', reading say the smallest 100 values from m initially then reading another 100 only when i gets larger than the smallest 100.

If that's still not fast enough you could try using deque also from collections, and manually updating the set in each cycle of the loop. You can also take advantage of the fact that the number removed from the top of the list on each cycle gives you a clue as to what the next number has to be (higher or lower); and can use a slightly simplified algorithm when dealing with numbers that the programme has added to the list (in which any sequence of k values can contain no duplicates), as opposed to the original pseudo-random list (which may). (Again, the following is untested)