(Edit: Please make sure that you check out this link and also this one and this. The gist of these discussions is that letting Rust automatically close a
file descriptor without checking the return value of the close system call is not a good idea from the
perspective of building reliable systems - I had not thought about it while writing this post).

Garbage Collection is one of the big ideas in the field of programming languages. Languages which employ some
form of GC (Java, Python, Ruby, almost all modern languages) free the programmer from the
tedious and error-prone task of manually allocating and deallocating memory (the kind of stuff
you do using malloc/free in C). But memory is not the only resource which a program has to
manage - open file descriptors, network sockets, temporary files, etc. are also resources which
have to be managed properly to prevent leaks.
Unfortunately, GC is not a solution to this problem; it is easy to leak resources in a GC’d
language. We will see that the compile time strategies adopted by Rust provide a more elegant
solution to the general problem of acquiring and releasing all kinds of resources.

A Python program which leaks file descriptors

# save this as leak.py
def use_file():
    f = open('data.txt')

while True:
    use_file()

The open function in Python returns a high-level file object (or a so-called file handle) - this object has
a file descriptor embedded in it. A file
descriptor is simply an integer value which system calls like read, write, etc. use to
access the file. The operating system (Linux, in this case) places a limit on the number
of open file descriptors which a process can have at any given point in time (this limit can
be seen and modified using the ulimit command - in my case, running ulimit -n shows me that
my program can only have a maximum of 1024 open file descriptors).
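Since the rest of this post moves to Rust, here is a minimal Rust sketch of the "a file descriptor is just an integer" idea. It assumes a Unix-like system (it uses the Unix-only AsRawFd trait); the helper name descriptor_of and the scratch file fd_demo.txt are made up for illustration.

```rust
use std::fs::{self, File};
use std::io;
use std::os::unix::io::{AsRawFd, RawFd};
use std::path::Path;

/// Opens `path` and returns the integer descriptor stored inside the `File`.
/// The `File` is dropped when this function returns, so the number it
/// reports is only meaningful as an illustration.
fn descriptor_of(path: &Path) -> io::Result<RawFd> {
    let f = File::open(path)?;
    Ok(f.as_raw_fd())
}

fn main() -> io::Result<()> {
    // Hypothetical scratch file, created only for this demo.
    let path = std::env::temp_dir().join("fd_demo.txt");
    fs::write(&path, b"hello")?;

    println!("descriptor number: {}", descriptor_of(&path)?);

    fs::remove_file(&path)?;
    Ok(())
}
```

The exact number printed depends on which descriptors the process already has open, so it is not shown here.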

In the case of the above program, each invocation of the open function in Python will result
in a new file object (and an associated new file descriptor) getting created. If you call
the close method of a file object:

f.close()

the associated file descriptor will get closed and the operating system can reuse that
descriptor number for a later open.

But we are NOT calling f.close() and our program is still working properly! This seems to be
strange!

The answer lies in something which is purely an implementation detail of the CPython (the most commonly used implementation of Python) virtual machine.
CPython performs garbage collection
using a strategy called reference counting. The idea
behind refcounting is simple - all Python objects have a count associated with them which gets incremented
when a new reference to the object is created and gets decremented when that reference is gone.

a = [1, 2, 3]  # refcount = 1, only one reference "a"
b = a          # refcount = 2, both "a" and "b" point to the same list
c = a          # refcount = 3
c = 0          # refcount is now 2
b = 0          # refcount is 1
a = 0          # refcount is 0, the list [1,2,3] now gets deallocated
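The same counting can be observed directly in Rust, where reference counting is opt-in through the standard library's Rc type. This is only an analogy to what CPython does internally, not a description of its implementation; the function name refcount_trace is made up for the sketch.

```rust
use std::rc::Rc;

/// Records the strong reference count of a shared vector after each step.
fn refcount_trace() -> Vec<usize> {
    let mut trace = Vec::new();

    let a = Rc::new(vec![1, 2, 3]);
    trace.push(Rc::strong_count(&a)); // one reference: `a`

    let b = Rc::clone(&a);
    trace.push(Rc::strong_count(&a)); // `a` and `b`

    let c = Rc::clone(&a);
    trace.push(Rc::strong_count(&a)); // `a`, `b` and `c`

    drop(c);
    trace.push(Rc::strong_count(&a)); // back to `a` and `b`

    drop(b);
    trace.push(Rc::strong_count(&a)); // only `a` is left

    trace
    // `a` is dropped here: the count reaches zero and the vector is freed.
}

fn main() {
    println!("{:?}", refcount_trace()); // prints [1, 2, 3, 2, 1]
}
```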

In the case of our program which keeps on opening a file in an infinite loop,
the only reference to the high-level file object, the variable f, disappears
when the function use_file terminates - this will result in the memory
associated with the file object getting deallocated. CPython will also close the
associated file descriptor at the same time.

The key idea here is that this is an implementation detail of the CPython virtual
machine. The Python language doesn’t guarantee that the file descriptor will get
closed deterministically unless you call f.close().

The correct implementation of use_file should look like this:

def use_file():
    f = open('data.txt')
    # do something with "f"
    f.close()

Or, even better:

def use_file():
    with open('data.txt') as f:
        # do something with f
        pass

Python guarantees that f.close() is always called when the with block
terminates.

Experimenting with PyPy

PyPy is an alternative implementation of Python focused
on speed. Let us run leak.py using PyPy:

PyPy does not use reference counting - it uses a tracing garbage
collector, which is not guaranteed to free up an object
as soon as no references are pointing to it. The program basically uses
up all available descriptors before the garbage collector steps in.

Rewriting the code in Rust

The file handle f is a Rust structure and it holds a file descriptor. The ownership rules of Rust guarantee that the structure is dropped at
the point where the variable f goes out of scope. A destructor function gets called automatically and this function takes care
of closing the file descriptor associated with the file handle.
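A minimal sketch of what a Rust counterpart to leak.py might look like, under a few assumptions: the loop is bounded (5000 rounds, well above the 1024-descriptor limit mentioned earlier) instead of infinite so that the program terminates, and the helper open_repeatedly and the scratch file leak_demo.txt are made up for illustration.

```rust
use std::fs::{self, File};
use std::io;
use std::path::Path;

/// Opens the file and returns without closing it explicitly.
/// `_f` goes out of scope at the closing brace, which drops the `File`
/// and closes its descriptor.
fn use_file(path: &Path) -> io::Result<()> {
    let _f = File::open(path)?;
    // do something with the file here
    Ok(())
}

/// Calls `use_file` `rounds` times and stops at the first error.
fn open_repeatedly(path: &Path, rounds: usize) -> io::Result<()> {
    for _ in 0..rounds {
        use_file(path)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Hypothetical scratch file, created only for this demo.
    let path = std::env::temp_dir().join("leak_demo.txt");
    fs::write(&path, b"some data")?;

    open_repeatedly(&path, 5000)?;
    println!("opened the file 5000 times");

    fs::remove_file(&path)?;
    Ok(())
}
```

If the descriptors were leaking, File::open would start returning an error once the per-process limit was reached; here each descriptor is released before the next open.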

The clean up happens deterministically. The code required to perform the clean up is inserted by the compiler
into the executable precisely at the point where the variable goes out of scope - and this is completely determined
at compile time.
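To make the timing visible, here is a small sketch that replaces the file handle with a made-up Handle type whose destructor writes to a log. Handle, run and the log messages are all hypothetical; the only point is the order of the recorded events.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a file handle: it records when it is dropped.
struct Handle<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Handle<'_> {
    fn drop(&mut self) {
        // A real file handle would close its descriptor at this point.
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Returns the events in the order in which they happened.
fn run() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _h = Handle { name: "f", log: &log };
        log.borrow_mut().push("inside scope".to_string());
    } // `_h` goes out of scope here, so its destructor runs here
    log.borrow_mut().push("after scope".to_string());
    log.into_inner()
}

fn main() {
    // prints ["inside scope", "drop f", "after scope"]
    println!("{:?}", run());
}
```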

Moving a file handle

use std::fs::File;

fn use_file(f: File) {
    // do something with f
}

fn main() {
    let f = File::open("data.txt").unwrap();
    use_file(f);
    println!("good bye!");
}

Rust's move semantics guarantee that f is no longer usable in main - the ownership of the file handle has
effectively been transferred to the parameter f of the use_file function. At the point where this function ends, f
goes out of scope and the associated file descriptor is closed.

Running strace on the executable generated by the above program shows that this is indeed the
case:

Cloning a file handle

What if you wish to share a file handle between two functions in such
a way that both handles refer to exactly the same offset in exactly the same
file?

A solution is to have one function borrow the file handle from the other one.
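A minimal sketch of the borrowing approach, assuming a file that holds at least 8 bytes. The helper names read_some and read_in_two_steps and the scratch file borrow_demo.txt are made up for illustration.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Borrows the handle, reads `n` bytes through it and hands it back.
/// Ownership stays with the caller, so nothing is closed here.
fn read_some(f: &mut File, n: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; n];
    f.read_exact(&mut buf)?;
    Ok(buf)
}

/// Owns the handle and lends it out twice.
fn read_in_two_steps(path: &Path) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut f = File::open(path)?;
    let first = read_some(&mut f, 5)?; // borrowed, not moved
    let second = read_some(&mut f, 3)?; // `f` is still usable here
    Ok((first, second))
    // `f` is dropped here, exactly once
}

fn main() -> io::Result<()> {
    // Hypothetical scratch file, created only for this demo.
    let path = std::env::temp_dir().join("borrow_demo.txt");
    fs::write(&path, b"abcdefgh")?;

    let (first, second) = read_in_two_steps(&path)?;
    // prints: abcde fgh
    println!(
        "{} {}",
        String::from_utf8_lossy(&first),
        String::from_utf8_lossy(&second)
    );

    fs::remove_file(&path)?;
    Ok(())
}
```

There is only one handle and one descriptor in this version, so the question of who closes it never comes up: the owner does, when it goes out of scope.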

Here is another way to do this:

use std::fs::File;

fn use_file(f: File) {
    // do something with f
}

fn main() {
    let f = File::open("data.txt").unwrap();
    use_file(f.try_clone().unwrap());
    println!("good bye!");
}

The try_clone function creates a clone of the original file handle. So
we now have two independent file handles: one in the main function and the
other one in use_file.

Both the file handles will refer to exactly the same location in the file
data.txt. If you read 5 bytes in use_file and then try to read 3 bytes in main,
you are sure to get the next 3 bytes from the file.
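The shared offset can be checked with a short sketch. To keep it small, both handles live in the same function here instead of in main and use_file; the name read_through_both and the scratch file clone_offset_demo.txt are made up for illustration.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Reads 5 bytes through a cloned handle, then 3 bytes through the original.
fn read_through_both(path: &Path) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut original = File::open(path)?;
    let mut clone = original.try_clone()?;

    let mut first = vec![0u8; 5];
    clone.read_exact(&mut first)?; // advances the shared offset to 5

    let mut next = vec![0u8; 3];
    original.read_exact(&mut next)?; // continues from offset 5

    Ok((first, next))
}

fn main() -> io::Result<()> {
    // Hypothetical scratch file, created only for this demo.
    let path = std::env::temp_dir().join("clone_offset_demo.txt");
    fs::write(&path, b"abcdefgh")?;

    let (first, next) = read_through_both(&path)?;
    // prints: abcde fgh
    println!(
        "{} {}",
        String::from_utf8_lossy(&first),
        String::from_utf8_lossy(&next)
    );

    fs::remove_file(&path)?;
    Ok(())
}
```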

At first glance, this suggests that the file descriptors embedded in both file handles should be the same.

But there is a problem. When use_file terminates, f gets dropped and the file
descriptor is closed.

That means you will not be able to read from the file in main.

Also, when main terminates, the f in main gets dropped; this will result in an attempt
to close an already closed descriptor.

How does Rust solve the problem? We have strace to our rescue. Let’s run the executable
under strace and examine the output:

The file descriptor 3 gets duplicated and we now have another file descriptor, 4,
which refers to exactly the same offset of the file referred to by descriptor 3.

File descriptors 3 and 4 are two independent descriptors which share a single
offset to the same file - both can be used independently
and also closed independently. Closing 4 doesn’t affect 3. You can still access the
file using 3 and close it once you are finished.

That is exactly what happens here. The cloned file handle has embedded in it a
duped file descriptor!
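The same thing can be observed from inside the program, without strace. This is a sketch that assumes a Unix-like system (it uses the Unix-only AsRawFd trait); the function name clone_then_drop and the scratch file clone_fd_demo.txt are made up, and the actual descriptor numbers depend on what the process already has open.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::os::unix::io::{AsRawFd, RawFd};
use std::path::Path;

/// Clones a handle, notes both descriptor numbers, drops the clone and
/// then reads 3 bytes through the original handle.
fn clone_then_drop(path: &Path) -> io::Result<(RawFd, RawFd, Vec<u8>)> {
    let mut original = File::open(path)?;
    let clone = original.try_clone()?;

    let original_fd = original.as_raw_fd();
    let clone_fd = clone.as_raw_fd();

    drop(clone); // closes only the duplicated descriptor

    let mut buf = vec![0u8; 3];
    original.read_exact(&mut buf)?; // the original descriptor still works

    Ok((original_fd, clone_fd, buf))
}

fn main() -> io::Result<()> {
    // Hypothetical scratch file, created only for this demo.
    let path = std::env::temp_dir().join("clone_fd_demo.txt");
    fs::write(&path, b"abcdefgh")?;

    let (original_fd, clone_fd, buf) = clone_then_drop(&path)?;
    println!("original fd = {}, cloned fd = {}", original_fd, clone_fd);
    println!("read after dropping the clone: {}", String::from_utf8_lossy(&buf));

    fs::remove_file(&path)?;
    Ok(())
}
```

The two numbers differ, and dropping the clone leaves the original handle fully usable, which is why neither of the problems described above actually occurs.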