Interview question

I was asked to code this during an interview for a medior java developer post (with main tasks to code front end logic and web services around some c backend engine). I think you will like it.

You are given a laptop with java 7 and eclipse juno, and a clean workspace. Give yourself 5 minutes to read the text, and then start your stopwatch.

The interviewer came back after 20 minutes (obviously was in a hurry), and i was stopped, and just got the ds and the isRepeated() method. While you are presumably not terrified normally, you should be able to get quite far in 20 minutes.
Can you get it done in time? Anyways, it is a nice indicator if you can handle the details.
I am curious to your solutions.

[START OF TEXT]

The context is that external applications log every query request of a user, many possible duplicates, and we want to extract only the relevant minimal information from those log data. Assume user and query are uniquely identified by their names, so case sensitive and no trimming needed. The company likes unit tests.

1. write a function

that returns true iff user and query not yet in memory.

2. write a function
that removes any query that is older than 15 minutes ago.

3. allow for users and queries to be added concurrently, such that the above two functions run thread safe.

Keep the solution simple, using an in-memory data structure. Code this in 30 minutes. Only the working code counts, writing everything half that does not work is not counted. If you are proud of your code, that is a bonus.

[END OF TEXT]

(dont read the solution below just yet!)

Later, at home, i wrote the main program in 1 hour and simulated 10 users firing queries out of a set of 50 queries and sysout-ed a top 5 of most ranked queries every minute, in a total of about 6 hours, but that is just to see the methods come alive and not part of the scope whatsoever.

Solution:
---------
Here is one of possible solutions.

After reading the above, you realize the following:
A) isRepeated(user, query):
For 1 this means that
- called for the first time, it should return false, for a given user1 and query1,
- but asked for the second time it should return true.
- Called with user1 and query2, it should return false.
- Called with user2 and query1, it should return false.
- so the isRepeated(·) should modify some data structure (ds), mapping the users and his queries...

B) Derive the starting point of the ds.
* there is a continuous stream of users and queries, do we need a ds for that,
too? The text suggests that some external environment fires those queries, and can be considered a given (so it is not of concern here).

* there are n users, each user queries m requests, so given a user we want to keep track per query if it has been asked before,
example:
the ds would be something like
which is ugly and should be written using a query class like this, later:

For 3 this means you should make the ds , and think about its constructor values later, such as load factor and #shards

C) For 2, it means however that we also need the time the query was asked by the user, this suggests a second ds

And 2 also means that we need to compare dates and remove elements from the ds,
so we will need a removeQueries(user, date), and a compare like tooOld(date, 15 minutes), where 15 minutes is 15 * 60 seconds = 15*60*1000 milliseconds
essentially a compare of
and return true if time1<time2, but since Date(long) doesn't really cut it it is best to use long
as the underlying metric. HashMap<..., long> does not work, so we use a Long.

D) Now the heart of the matter becomes more clear:
- code the ds initially as

Ideally we would for this ds
* create a class Log, default constructor, with methods
hasQuery(user, query),
either static or as a Singleton, but we don't have time for that much code.

- isRepeated() algorithm adds an entry in log_asked + log_time, and if it already was present,
updated the Boolean and the Long.
1. if (log_asked.is null or is empty) add user, query, set boolean to false, set time to now and exit.
2. if( log_asked not contain user) add user, query, set boolean to false, set time to now and exit.
3. if(log_asked for user not contain query) add query in ds, set boolean to false, set time to now and exit.
4. //last case: user and query in ds:
get query, set boolean to true, set time to now, and exit, returning true.
Given the above cases, the code would be most elegant in the order 4, 3, else (=1+2).

Note that if the user had 1 query and it is >15 min. old, the question is ambiguous if the user has to be removed or not.

E) Concurrency: use volatile Concurrent for all hashmaps, and note that prune() is not sensitive then to multiple threads but isRepeated() is. You could synchronize the last case. There is a chance that two threads concurrently enter the method with the same new user and the same query, so both will exit the method where the isRepeated() returns FALSE where it should return true. Given the nature of the
problem it does not seem such a problem.
There is a chance that when prune() is called, isRepeaded() is also running, at the same time removing and adding a user and his query, but case was not specified anyway in the question.

So: what ds and implementation did you choose?

All of the world's problems can be solved in a garden - Geoff Lawton. Tiny ad: