Question

Assume you are given a string, write a function that can remove consecutive repeated words in the string and return it. For example,

"Beginning with the first first first first manned Gemini mission in March 1965, commemorative commemorative medallions were prepared for the astronauts at their request. It is unclear who prepared these early medallions only only, only that each individual box containing a medallion bore the word Fliteline."

to

"Beginning with the first manned Gemini mission in March 1965, commemorative medallions were prepared for the astronauts at their request. It is unclear who prepared these early medallions only, only that each individual box containing a medallion bore the word Fliteline."

Make sure you preserving punctuations in the right positions.

Solution

I have written following Java code with various methods that solve variations of above question -

Note

Couple of points to note. Prefer removeConsecutiveDuplicatesBetter approach as it does not keep all data in memory. So lets say you have a huge file you need to do this operation on you read it line by line and then tokenize it to further process it.

Also note HashSet has add method which returns false if data is already present in the set.

Stack method is more if you want to retain the duplicate consecutive words which are farthest in a line. Irrespective of your approach you need to take care of following -

Don't keep all the data in the memory (data can be very huge). Using Stack may led to overflow.

Use minimum space and more efficient lookup. Hashset internally used HashMap to keep the data hence put/get is ~O(1).

Avoid String concatenation. Use StringBuilder instead.

As you must have already noticed above approaches do not take care of punctuations. You will need separate processing for that.