os.scandir() opens a file descriptor and closes it only in its destructor. This can cause file descriptor leaks in Python implementations without reference counting, or if the scandir iterator becomes part of a reference cycle or of a long-living object. Since the number of open file descriptors is limited, this can lead to problems.
We need to add the close() method to the scandir iterator (as in files and generators). It would also be useful to make it a context manager.
In 3.5 we have to add a warning about this behavior.
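For illustration, here is a minimal sketch of the problematic pattern (the path and the early stop are made up for the example): if iteration stops before exhaustion, the descriptor stays open until the iterator object is finalized.

import os

it = os.scandir('/tmp')   # opens a file descriptor (or a Windows handle)
entry = next(it)          # stop after the first entry
# If 'it' stays reachable (reference cycle, long-lived attribute, or an
# implementation without reference counting), the descriptor is not
# released until the destructor eventually runs.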

Hi,
If I recall correctly, this issue was discussed in the long review of os.scandir(): issue #22524.
"os.scandir() opens a file descriptor and closes it only in its destructor."
Fortunately, that's incorrect: the descriptor is closed when the iterator is exhausted. See how ScandirIterator_close() is used.
"This can cause file descriptor leaks in Python implementations without reference counting"
Destructors are part of the Python language. Are you aware of a Python implementation that *never* calls destructors? I know that PyPy can call destructors "later", not necessarily when the last reference to the object is removed. Do you think that we may reach the file descriptor limit because of that?
"We need to add the close() method to the scandir iterator (as in files and generators)."
Is it part of the iterator protocol? Or do you mean to explicitly call close()?
"It would be useful also to make it a context manager."
If we decide to add a close() method, it like the idea of also supporting the context manager protocol.
"In 3.5 we have to add a warning about this behavior."
Yeah, maybe we can elaborate the doc to explain how the file descriptor / Windows handle is used.

I'm not sure this is actually a leak, because (looking at the code) ScandirIterator_close() is called not just in the destructor, but also at the end of iterating, just before raising StopIteration. Is the issue that if an exception is raised or you stop iterating before the end, then it is less deterministic when the destructor/close is called?
I think we could fairly easily add an explicit close method to the iterator and make it a context manager (add __enter__ and __exit__ methods to the iterator). Is this what you're thinking of:
# manual close
it = scandir.scandir(path)
first_entry = next(it)
it.close()

with scandir.scandir(path) as it:
    first_entry = next(it)

Yes, that's what I mean. Add the methods close(), __enter__ and __exit__ to the iterator. The scandir iterator is not just an iterator, it is like a file object.
And as with a file object, we perhaps have to emit a ResourceWarning in the destructor if close() or __exit__() was not called.

Note there's also a nasty corner case related to generators and GC. If a generator contains a with-block or finally-clause, and the generator is not run until its end because the caller hit an exception on one of the items returned, and the generator object is somehow kept alive (either because it's stored in some longer-living state or because the Python implementation doesn't use reference counting) then the close() call in the finally-clause or in the __exit__ method doesn't run until the generator object is garbage-collected.
IOW even making it a context manager doesn't completely solve this issue (though it can certainly help).

Guido are you saying in the following code, the “finally” message is not guaranteed to be printed out? Or just that you cannot limit a ResourceWarning to garbage collection?
def g():
    try:
        yield "item"
    finally:
        # Run at exhaustion, close(), and garbage collection
        print("finally")

gi = g()
try:
    item = next(gi)
    print(item / 2)  # Oops, TypeError
finally:
    # Should be run as the exception passes through. GeneratorExit is raised
    # inside the generator, causing the other "finally" block to execute,
    # all before the original exception is caught and a traceback is printed.
    gi.close()
But as I understand it, os.scandir() is not a generator, so none of these problems would apply.

It was a bit more subtle. I think it was like this:
def f():
    with some_lock:
        yield 0
        yield 1

def g():
    with another_lock:
        it = f()
        for i in it:
            raise
We determined that another_lock was freed *before* some_lock. This is because the local variable 'it' keeps the generator object returned by f() alive until after the with-block in g() has released another_lock.
I think this does not strictly require f to be a generator -- it's any kind of closable resource that only gets closed by its destructor.
The solution is to add a try/finally that calls it.close() before leaving the with-block in g().
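For example, a minimal sketch of that fix, reusing the hypothetical f(), some_lock and another_lock from the snippet above:

def g():
    with another_lock:
        it = f()
        try:
            for i in it:
                raise
        finally:
            # Closing the generator raises GeneratorExit inside f(), which
            # unwinds its with-block and releases some_lock before we leave
            # this with-block and release another_lock.
            it.close()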
Hope this helps.

I definitely support adding a close() method, and a ResourceWarning if the iterator is not exhausted when it is deallocated. Adding context manager support is probably worthwhile as well, though there is one disadvantage: it would be hard to implement scandir() as a plain generator, because generators don’t support the context manager protocol. But CPython’s implementation is in C, and even in pure Python you could implement it as a custom iterator class.
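As a quick illustration (plain_scandir is a made-up stand-in, not a real implementation): a plain generator does have close(), but no __enter__/__exit__, so it cannot be used directly in a with statement.

def plain_scandir(path):
    # hypothetical generator-based implementation
    yield from ()

g = plain_scandir('.')
g.close()                      # generators support close() ...

try:
    with plain_scandir('.'):   # ... but not the context manager protocol
        pass
except (AttributeError, TypeError):
    print("with statement rejected")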
I suggest using this issue for adding the new API(s), presumably to 3.6 only, and Issue 26111 for adding a warning to the existing 3.5 documentation.

Since the close() method can only be added in 3.6, but we have leaks in 3.5 too, I suggest closing the file descriptor if an error happens during iteration (issue26117). This will allow writing safe code even in 3.5.

contextlib.closing() cannot be used in general, because it doesn’t inherit the iterator protocol from the wrapped generator. So I think you really need a class that implements __exit__(), __iter__(), etc. at the same time.
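For instance, a rough pure-Python sketch of such a class, only to illustrate the proposed interface (this is not CPython's C implementation, and _open_dir/_read_next are made-up helpers):

import warnings

class ScandirIterator:
    def __init__(self, path):
        self._handle = _open_dir(path)    # hypothetical: opens the fd/handle
        self._closed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._closed:
            raise StopIteration
        entry = _read_next(self._handle)  # hypothetical: returns None at the end
        if entry is None:
            self.close()                  # close as soon as we are exhausted
            raise StopIteration
        return entry

    def close(self):
        if not self._closed:
            self._closed = True
            self._handle.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

    def __del__(self):
        if not self._closed:
            warnings.warn("unclosed scandir iterator", ResourceWarning)
            self.close()

With a class like this, both the explicit close() call and the with-statement form shown earlier work, and iterating to exhaustion still closes the descriptor on its own.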

Context manager protocol, close() method: it looks more and more like a file. In this case, I suggest emitting a ResourceWarning in the destructor if it's not closed explicitly. It means that scandir() must always be used with "with", or at least that close() is always called.
We probably need to update the code to be more explicit about that.

Adding a ResourceWarning even if the generator is run to completion? That seems... dev hostile. I mean, yes, it's probably best to document using the with statement as best practice, but something simple like `files = sorted(os.scandir('.'), key=lambda x: x.stat().st_mtime)` to get files ordered by modification time (which cleanly runs the generator to exhaustion immediately) should not raise a ResourceWarning in 3.6 when it didn't in 3.5, and has minimal risk of leaking in any event.

I would be in favour of adding a ResourceWarning in 3.6 if the iterator is garbage collected without being exhausted. But as Josh says, it might be overkill emitting a warning when we already know the iterator has been finished and cleaned up.

Josh Rosenberg added the comment:
> Adding a ResourceWarning even if the generator is run to completion?
I'm ok to only emit the warning if the generator is not exhausted.
The warning would be emitted in the destructor if the generator is not closed. In practice, the generator is *already* closed on the last file, so the warning would not be emitted if the generator is exhausted.
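A small sketch of the behaviour being agreed on here, assuming the warning is emitted exactly as described above (ResourceWarning is ignored by default, so it is enabled explicitly):

import os
import warnings

warnings.simplefilter("always", ResourceWarning)

it = os.scandir('.')
next(it)        # not exhausted and never closed ...
del it          # ... so a ResourceWarning would be expected here

it = os.scandir('.')
list(it)        # exhausted: the descriptor is already closed
del it          # no warning expected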

About testing that list(iterator) is empty after the iterator is closed: IMO this is an implementation detail. It would be equally (or more) sensible to raise an exception, as proposed for “async def” coroutines in Issue 25887. I suppose the main purpose of those tests is to ensure there is no leakage or warning, so maybe add some more filterwarnings() checks in those tests instead?
I left some other review comments.