6 Answers

We use the substitution command on the whole file to change pattern into string:

:%s/pattern/string/

Here pattern is ^\(.*\)\(\n\1\)\+$ and string is \1.

pattern can be broken down like this:

^\(subpattern1\)\(subpattern2\)\+$

^ and $ match the beginning and the end of a line, respectively.

\( and \) are used to enclose subpattern1 so that we can refer to it later by the special number \1.
They are also used to enclose subpattern2 so that we can repeat it 1 or more times with the quantifier \+.

subpattern1 is .* where . is a metacharacter matching any character except a newline, and * is a quantifier that matches the preceding atom 0 or more times.
So .* matches any text containing no newline.

subpattern2 is \n\1 where \n matches a newline and \1 matches the same text that was matched inside the first \( \) pair, which here is subpattern1.

So pattern can be read like this: a beginning of line (^), followed by any text containing no newline (.*), followed by a newline (\n) and then the same text (\1), the latter two being repeated one or more times (\+), and finally an end of line ($).

Wherever pattern is matched (a block of identical lines), the substitution command replaces it with string which here is \1 (the first line of the block).
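To try this without touching a real file, here is a small sketch that runs the command in a scratch buffer. The sample lines are made up for illustration; source the snippet, or type each line at the : prompt.

```vim
" Open a scratch window and fill it with made-up sample lines.
new
call setline(1, ['Hold', 'Hold', 'Hold', 'Release', 'Hold'])

" Collapse each block of identical adjacent lines into its first line.
%s/^\(.*\)\(\n\1\)\+$/\1/

" Show what is left in the buffer.
echo getline(1, '$')
```

The three adjacent Hold lines should collapse into one, leaving Hold, Release, Hold. The final Hold stays because it is not adjacent to the others. Close the scratch window with :q! when done.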

If you want to see which blocks of lines will be affected without changing anything in your file, you can enable the hlsearch option and add the n substitution flag at the end of the command:

:%s/^\(.*\)\(\n\1\)\+$/\1/n

For more granular control, you can also ask for confirmation before changing each block of lines by adding the c substitution flag instead:

:%s/^\(.*\)\(\n\1\)\+$/\1/c

For more information on the substitution command read :help :s,
for the substitution flags :help s_flags,
for the various metacharacters and quantifiers read :help pattern-atoms,
and for regular expressions in vim read this.

Edit: Wildcard fixed a problem in the command by adding a $ at the end of pattern.
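The effect of that trailing $ can be checked in a scratch buffer. This is only a sketch with made-up sample lines; the e flag is there just to silence the "pattern not found" error when nothing matches.

```vim
" Scratch buffer: the second line starts with the first line's text
" but has extra trailing characters.
new
call setline(1, ['Hold', 'Holder'])

" With the trailing $ nothing matches, so the buffer is left alone.
%s/^\(.*\)\(\n\1\)\+$/\1/e
echo getline(1, '$')

" Without the $ the match is 'Hold', newline, 'Hold', and replacing it
" with \1 merges both lines into the single line 'Holder'.
%s/^\(.*\)\(\n\1\)\+/\1/e
echo getline(1, '$')
```

The first echo should still show both lines, while the second should show only Holder: the line Hold has been swallowed even though it was not a duplicate.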

Also BloodGain has a shorter and more readable version of the same command.

Nice; your command needs a $ in it, though. Otherwise it will do unexpected things with a line that starts with identical text to the previous line, but has some other trailing characters. Also note that the basic command you gave is functionally equivalent to my answer of :%!uniq, but the highlight and confirmation flags are nice.
– Wildcard, Nov 5 '15 at 21:12

You're right, I've just checked and if one of the duplicate lines contains a different trailing character, the command doesn't behave as expected. I don't know how to fix it; the atom \n matches an end of line and should prevent this, but it doesn't. I tried adding a $ just after .* with no success. I'm going to try and fix it, but if I can't, maybe I will delete my answer or add a warning at the end. Thank you for pointing out this problem.
– saginaw, Nov 5 '15 at 21:37

You should consider that $ matches end of string, not end of line. This is technically not true—but when you put characters after it other than a few exceptions, it matches a literal $ instead of anything special. So using \n is better for multi-line matches. (See :help /$)
– Wildcard, Nov 5 '15 at 22:05

I think you are right in that \n can be used anywhere inside the regex whereas $ should probably be used only at the end. Just to make a difference between the two, I've edited the answer by writing that \n matches a newline (which instinctively makes you think that there is still some text after) whereas $ matches an end of line (which makes you think that there is nothing left).
– saginaw, Nov 5 '15 at 22:24

As with saginaw's answer, this uses Vim's :substitute command. However, it takes advantage of a couple of extra features to improve readability:

Vim lets us use any non-alphanumeric ASCII character except backslash (\), double-quote ("), or pipe (|) to divide our match/replace/flags text. Here, I selected semicolon (;), but you can choose another.

Vim provides "magic" settings for regular expressions, so that characters are interpreted for their special meanings instead of requiring a backslash escape. This is helpful to reduce verbosity, and because it is more consistent than the default "magic" setting. Starting the pattern with \v means "very magic": all characters except alphanumerics (a-z, A-Z, 0-9) and underscore (_) have special meaning.

The meanings of the components are:

%          for the whole file

s          substitute

;          begin substitute string

\v         "very magic"

^          beginning of line

(.*)       0 or more of any character (group 1)

(\n\1)+    newline followed by (group 1 match text), 1 or more times (group 2)

$          end of line (or in this case, think next character must be a newline)
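Assembled from the components listed above, the command would look roughly like the line below. The breakdown does not cover the replacement part, so the ;\1; ending is an assumption carried over from saginaw's command (replace the match with group 1).

```vim
" Very-magic form with ; as the delimiter. The replacement \1 is assumed
" from saginaw's answer; it is not part of the breakdown above.
:%s;\v^(.*)(\n\1)+$;\1;
```

Compared with the default-magic version, the parentheses and the + no longer need backslashes, and the ; delimiter is simply a matter of taste.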

I really like your answer, because it's more readable but also because it made me better understand the difference between \n and $. \n adds something to the pattern: the newline character, which tells vim that the following text is on a new line. Whereas $ doesn't add anything to the pattern; it simply forbids a match if the next character outside of the pattern isn't a newline. At least, that's what I've understood by reading your answer and :help zero-width.
– saginaw, Nov 6 '15 at 14:35

And the same must be true for ^: it doesn't add anything to the pattern, it just prevents a match if the previous character outside of the pattern is not a newline...
– saginaw, Nov 6 '15 at 14:37

@saginaw You have it exactly right, and that's a good explanation. In regular expressions, some characters can be thought of as control characters. For example, + means "repeat the preceding expression (character or group) 1 or more times," but does not match anything itself. The ^ means "cannot start in the middle of the string" and $ means "cannot end in the middle of the string." Notice I didn't say "line," but "string" there. Vim treats each line as a string by default -- and that's where \n comes in. It tells Vim to consume a newline to try to make this match.
– Bloodgain, Nov 6 '15 at 18:52

If you want to remove ALL adjacent identical lines, not just Hold, you can do it extremely easily with an external filter from within vim:

:%!uniq (in a Unix environment).

If you want to do it directly in vim, it's actually very tricky. I think there is a way, but I haven't worked out all the bugs for the general case yet.

However, for this specific case, since you can visually see that the next line that is non-duplicate doesn't start with the same character, you can use:

:+,./^[^H]/-d

The + means the line after the current line. The . refers to the current line. The /^[^H]/- means the line before (-) the next line that doesn't start with H.
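As a sketch of how that range resolves, the snippet below runs the command in a scratch buffer with made-up lines, with the cursor placed on the first Hold line.

```vim
" Scratch buffer with made-up sample lines.
new
call setline(1, ['Hold', 'Hold', 'Hold', 'Release'])

" Put the cursor on line 1, the first line of the duplicate block.
call cursor(1, 1)

" + resolves to line 2; ./^[^H]/- resolves to line 3, the line just
" before 'Release'. So lines 2 and 3 are deleted.
+,./^[^H]/-d

echo getline(1, '$')
```

Only the first Hold and the Release line should remain. The command depends on the cursor position, so it has to be run from the first line of the block.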

While the substitute and global Vim commands are good exercises, calling uniq (either from within vim or using the shell) is how I would solve this. For one thing, I'm pretty sure uniq will handle lines that are blank/all spaces as equivalent (didn't test it), but that would be much harder to capture with a regex. It also means not "reinventing the wheel" while I'm trying to get work done.
– Bloodgain, Nov 6 '15 at 19:06

The ability to feed text through external tools is why I usually recommend Vim and Cygwin on Windows. Vim and shell simply belong together.
– DevSolar, Dec 3 '15 at 10:33