File Portability

Fortunately for Subversion users who routinely find
themselves on different computers with different operating
systems, Subversion's command-line program behaves almost
identically on all those systems. If you know how to wield
svn on one platform, you know how to wield it
everywhere.

However, the same is not always true of other general classes
of software or of the actual files you keep in Subversion. For
example, on a Windows machine, the definition of a “text
file” would be similar to that used on a Linux box, but
with a key difference—the character sequences used to mark
the ends of the lines of those files. There are other
differences, too. Unix platforms have (and Subversion supports)
symbolic links; Windows does not. Unix platforms use filesystem
permissions to determine executability; Windows uses filename
extensions.

Because Subversion is in no position to unite the whole
world in common definitions and implementations of all of these
things, the best it can do is to try to help make your life
simpler when you need to work with your versioned files and
directories on multiple computers and operating systems. This
section describes some of the ways Subversion does this.

File Content Type

Subversion joins the ranks of the many applications that
recognize and make use of Multipurpose Internet Mail
Extensions (MIME) content types. Besides being a
general-purpose storage location for a file's content type,
the value of the svn:mime-type file
property determines some behavioral characteristics of
Subversion itself.

Identifying File Types

Various programs on most modern operating systems make
assumptions about the type and format of the contents of a
file by the file's name, specifically its file extension.
For example, files whose names end in
.txt are generally assumed to be
human-readable; that is, able to be understood by simple perusal
rather than requiring complex processing to decipher. Files
whose names end in .png, on the other
hand, are assumed to be of the Portable Network Graphics
type—not human-readable at all, and sensible only when
interpreted by software that understands the PNG format and
can render the information in that format as a raster
image.

Unfortunately, some of those extensions have changed
their meanings over time. When personal computers first appeared,
a file named README.DOC would have
almost certainly been a plain-text file, just like today's
.txt files. But by the mid-1990s, you
could almost bet that a file of that name would not be a
plain-text file at all, but instead a Microsoft Word document
in a proprietary, non-human-readable format. But this
change didn't occur overnight—there was certainly a
period of confusion for computer users over what exactly
they had in hand when they saw a .DOC
file.
[10]

The popularity of computer networking cast still more
doubt on the mapping between a file's name and its content.
With information being served across networks and generated
dynamically by server-side scripts, there was often no real
file per se, and therefore no filename. Web
servers, for example, needed some other way to tell browsers
what they were downloading so that the browser could do something
intelligent with that information, whether that was to
display the data using a program registered to handle that
datatype or to prompt the user for where on the client
machine to store the downloaded data.

Eventually, a standard emerged for, among other things,
describing the contents of a data stream. In 1996, RFC 2045
was published. It was the first of five RFCs describing
MIME. It describes the concept of media types and subtypes
and recommends a syntax for the representation of those
types. Today, MIME media types—or “MIME
types”—are used almost universally across
email applications, web servers, and other software as the
de facto mechanism for clearing up the file content
confusion.

For example, one of the benefits that Subversion typically
provides is contextual, line-based merging of changes received
from the server during an update into your working file. But
for files containing nontextual data, there is often no
concept of a “line.” So, for versioned files
whose svn:mime-type property is set to a
nontextual MIME type (generally, something that doesn't begin
with text/, though there are exceptions),
Subversion does not attempt to perform contextual merges
during updates. Instead, any time you have locally modified a
binary working copy file that is also being updated, your file
is left untouched and Subversion creates two new files. One
file has a .oldrev extension and contains
the BASE revision of the file. The other file has a
.newrev extension and contains the
contents of the updated revision of the file. This behavior
is really for the protection of the user against failed
attempts at performing contextual merges on files that simply
cannot be contextually merged.
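The rule of thumb above can be sketched in Python. This is only an illustration of the text/ prefix test described in this section, not Subversion's actual implementation. The function name looks_textual is hypothetical, and the exceptions the text alludes to are deliberately not modeled.

```python
def looks_textual(mime_type: str) -> bool:
    """Return True if a MIME type looks textual under the simple 'text/' rule.

    Assumption: only the 'text/' prefix rule is modeled; any parameters
    after ';' (such as a charset) are ignored.
    """
    # Keep only the media type itself, e.g. "text/plain" from
    # "text/plain; charset=UTF-8".
    media_type = mime_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("text/")


for value in ("text/plain; charset=UTF-8", "image/png", "application/octet-stream"):
    kind = "contextual merge possible" if looks_textual(value) else "treated as binary"
    print(f"{value}: {kind}")
```

Under this simplified rule, only the first of the three sample values would be a candidate for line-based merging.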

Warning

The svn:mime-type property, when set
to a value that does not indicate textual file contents, can
cause some unexpected behaviors with respect to other
properties. For example, since the idea of line endings
(and therefore, line-ending conversion) makes no sense when
applied to nontextual files, Subversion will prevent you
from setting the svn:eol-style property
on such files. This is obvious when attempted on a single
file target—svn propset will error
out. But it might not be as clear if you perform a
recursive property set, where Subversion will silently skip
over files that it deems unsuitable for a given
property.

Beginning in Subversion 1.5, users can configure a new
mime-types-file runtime configuration
parameter, which identifies the location of a MIME types
mapping file. Subversion will consult this mapping file to
determine the MIME type of newly added and imported
files.
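To make the idea of a mapping file concrete, here is a minimal Python sketch of an extension lookup. It assumes the mapping file follows the common mime.types layout (a MIME type followed by whitespace-separated extensions, with # starting a comment). The sample data and the names parse_mime_types and guess_mime_type are hypothetical, and the sketch does not reproduce Subversion's own detection logic.

```python
import os

# Hypothetical sample in the assumed mime.types layout.
SAMPLE_MAPPING = """\
# MIME type        extensions
text/plain         txt text
image/png          png
application/pdf    pdf
"""


def parse_mime_types(text: str) -> dict:
    """Build an extension -> MIME type dictionary from mime.types-style text."""
    mapping = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lower()] = mime_type
    return mapping


def guess_mime_type(filename: str, mapping: dict):
    """Look up a filename's extension; return None when it is unknown."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return mapping.get(ext)


mapping = parse_mime_types(SAMPLE_MAPPING)
print(guess_mime_type("logo.PNG", mapping))
print(guess_mime_type("Makefile", mapping))
```

The lookup is case-insensitive on the extension, and a file with no extension or an unlisted one simply yields no MIME type.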

Also, if the svn:mime-type property is
set, then the Subversion Apache module will use its value to
populate the Content-type: HTTP header when
responding to GET requests. This gives your web browser a
crucial clue about how to display a file when you use it to
peruse your Subversion repository's contents.

File Executability

On many operating systems, the ability to execute a file
as a command is governed by the presence of an execute
permission bit. This bit usually defaults to being disabled,
and must be explicitly enabled by the user for each file that
needs it. But it would be a monumental hassle to have to
remember exactly which files in a freshly checked-out working
copy were supposed to have their executable bits toggled on,
and then to have to do that toggling. So, Subversion provides
the svn:executable property as a way to
specify that the executable bit for the file on which that
property is set should be enabled, and Subversion honors that
request when populating working copies with such files.

This property has no effect on filesystems that have no
concept of an executable permission bit, such as FAT32 and
NTFS.
[11]
Also, although it has no defined values, Subversion will force
its value to * when setting this property.
Finally, this property is valid only on files, not on
directories.
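For readers unfamiliar with the permission bit itself, the following Python sketch shows what enabling it on a single file amounts to on a POSIX system. It is an illustration only and makes no claim about how Subversion performs this step internally. The helper names set_executable and is_executable are hypothetical, and the behavior on filesystems without execute bits is not covered.

```python
import os
import stat
import tempfile


def set_executable(path: str) -> None:
    """Turn on the execute permission bits for user, group, and other."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def is_executable(path: str) -> bool:
    """Report whether any execute permission bit is set on the file."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


# Create a throwaway file; on POSIX, mkstemp creates it without execute bits.
fd, script = tempfile.mkstemp(suffix=".sh")
os.close(fd)
print("before:", is_executable(script))
set_executable(script)
print("after:", is_executable(script))
os.remove(script)
```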

End-of-Line Character Sequences

By default, Subversion doesn't pay any
attention to the type of end-of-line (EOL)
markers used in your files. Unfortunately,
different operating systems have different conventions about
which character sequences represent the end of a line of text
in a file. For example, the usual line-ending token used by
software on the Windows platform is a pair of ASCII control
characters—a carriage return (CR)
followed by a line feed (LF). Unix
software, however, just uses the LF
character to denote the end of a line.

Not all of the various tools on these operating systems
understand files that contain line endings in a format that
differs from the native line-ending
style of the operating system on which they are
running. So, typically, Unix programs treat the
CR character present in Windows files as a
regular character (usually rendered as ^M),
and Windows programs combine all of the lines of a Unix file
into one giant line because no carriage return-linefeed (or
CRLF) character combination was found to
denote the ends of the lines.
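One way to see which convention a file actually uses is to count each kind of marker in its raw bytes. The Python sketch below does that; the function name count_line_endings and the sample data are hypothetical, and this is a diagnostic aid rather than anything Subversion itself provides.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone LF, and lone CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one CR and one LF, so subtract the pairs
    # to get the markers that stand alone.
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,
        "CR": data.count(b"\r") - crlf,
    }


sample = b"first\r\nsecond\nthird\r"
print(count_line_endings(sample))
```

A file that reports more than one nonzero count has mixed line endings, which is a common sign that it has been edited on several platforms.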

This sensitivity to foreign EOL markers can be
frustrating for folks who share a file across different
operating systems. For example, consider a source code
file, and developers that edit this file on both Windows and
Unix systems. If all the developers always use tools that
preserve the line-ending style of the file, no problems
occur.

But in practice, many common tools either fail to
properly read a file with foreign EOL markers, or
convert the file's line endings to the native style when the
file is saved. If the former is true for a developer, he
has to use an external conversion utility (such as
dos2unix or its companion,
unix2dos) to prepare the file for
editing. The latter case requires no extra preparation.
But both cases result in a file that differs from the
original quite literally on every line! Prior to committing
his changes, the user has two choices. Either he can use a
conversion utility to restore the modified file to the same
line-ending style that it was in before his edits were made,
or he can simply commit the file—new EOL markers and
all.

The results of scenarios like these include wasted time
and unnecessary modifications to committed files. Wasted
time is painful enough. But when commits change every line
in a file, this complicates the job of determining which of
those lines were changed in a nontrivial way. Where was
that bug really fixed? On what line was a syntax error
introduced?

The solution to this problem is the
svn:eol-style property. When this
property is set to a valid value, Subversion uses it to
determine what special processing to perform on the file so
that the file's line-ending style isn't flip-flopping with
every commit that comes from a different operating
system. The valid values are:

native

This causes the file to contain the EOL markers
that are native to the operating system on which
Subversion was run. In other words, if a user on a
Windows machine checks out a working copy that
contains a file with an
svn:eol-style property set to
native, that file will contain
CRLF EOL markers. A Unix user
checking out a working copy that contains the same
file will see LF EOL markers in his
copy of the file.

Note that Subversion will actually store the file
in the repository using normalized
LF EOL markers regardless of the
operating system. This is basically transparent to
the user, though.

CRLF

This causes the file to contain
CRLF sequences for EOL markers,
regardless of the operating system in use.

LF

This causes the file to contain
LF characters for EOL markers,
regardless of the operating system in use.

CR

This causes the file to contain
CR characters for EOL markers,
regardless of the operating system in use. This
line-ending style is not very common.
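The effect of the four values can be modeled with a short Python sketch: normalize every line ending to LF, then rewrite it in the requested style. This is a simplified model under stated assumptions, not Subversion's translation code. The names EOL_BYTES, normalize_to_lf, and translate_eol are hypothetical, and the sketch silently accepts files with mixed line endings, which is a simplification.

```python
import os

# Byte sequence written for each svn:eol-style value; "native" follows
# the platform on which this script runs.
EOL_BYTES = {
    "native": os.linesep.encode("ascii"),
    "CRLF": b"\r\n",
    "LF": b"\n",
    "CR": b"\r",
}


def normalize_to_lf(data: bytes) -> bytes:
    """Rewrite every CRLF or lone CR as a single LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def translate_eol(data: bytes, style: str) -> bytes:
    """Return data with all line endings rewritten to the given style."""
    if style not in EOL_BYTES:
        raise ValueError(f"unknown eol style: {style!r}")
    return normalize_to_lf(data).replace(b"\n", EOL_BYTES[style])


mixed = b"one\r\ntwo\nthree\r"
for style in ("LF", "CRLF", "CR", "native"):
    print(style, translate_eol(mixed, style))
```

Normalizing to LF first keeps the second step simple: once only one kind of marker remains, a single replacement produces the target style.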

[10] You think that was rough? During that same era,
WordPerfect also used .DOC for their
proprietary file format's preferred extension!

You are reading Version Control with Subversion (for Subversion 1.6), by Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato.
This work is licensed under the Creative Commons Attribution License v2.0.
To submit comments, corrections, or other contributions to the text, please visit http://www.svnbook.com/.