Conceptually, coverage.py is pretty simple. First, using the
sys.settrace facility in Python, record every line that is executed.
Then, after the program is done, report on those lines, and especially
on lines that could have been executed but were not.

Of course, the reality is more difficult. During execution, to record
the line, we have to find the file name, which we get from the stack
frame. Later, we look for that file by name to create the report.
Sometimes, the file isn't a Python file!
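
As a rough illustration of the recording step (a minimal sketch, not
coverage.py's actual tracer), a trace function can read the file name
from the frame's code object and the line number from the frame itself.
The names `executed`, `tracer`, and `demo` are made up for this example.

```python
import sys
from collections import defaultdict

# Hypothetical store: file name -> set of executed line numbers.
executed = defaultdict(set)

def tracer(frame, event, arg):
    # The file name comes from the code object attached to the stack frame.
    if event == "line":
        executed[frame.f_code.co_filename].add(frame.f_lineno)
    return tracer  # keep receiving events for this frame

def demo(x):
    if x > 0:
        return "positive"
    return "non-positive"

sys.settrace(tracer)
try:
    demo(1)
finally:
    sys.settrace(None)  # always switch tracing off again

for filename, lines in executed.items():
    print(filename, sorted(lines))
```

The recorded file name is whatever the code object claims, which is
exactly why a misleading name causes trouble at report time.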

One reason this can happen is if the file was actually created by a
tool, and the tool provides the original source file as the reported
name. For example, Jinja compiles .html files to Python code, and when
the code is running, it claims to be "mytemplate.html". When
coverage.py tries to report on the file, it can't parse it as Python,
and things go wrong.
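
The effect is easy to reproduce without Jinja. This sketch assumes a
made-up generated function and a made-up template body; it only shows
that the name passed to compile() is the name the running code reports,
and that the text stored under that name need not be Python.

```python
import ast

# Stand-in for the Python a template engine might generate (hypothetical).
generated_source = (
    "def render(name):\n"
    "    return '<p>Hello, ' + name + '</p>'\n"
)

# The second argument to compile() is the file name the code will report.
code = compile(generated_source, "mytemplate.html", "exec")
namespace = {}
exec(code, namespace)
render = namespace["render"]
print(render.__code__.co_filename)

# What lives on disk under that name is template markup, not Python.
template_text = "<p>Hello, {{ name }}</p>\n"
try:
    ast.parse(template_text, filename="mytemplate.html")
    parsed_as_python = True
except SyntaxError:
    parsed_as_python = False
print(parsed_as_python)
```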

Originally, this error would be reported to the user. There's a -i
switch that shuts off all errors like this, but it seemed dumb for
coverage.py to get confused by something like this. So I changed it
to not trace files named "*.html".

Of course, the world is more varied than that, so I got a report of
someone with Jinja2 files named "*.jinja2" which now trip the error.
So I need a more general solution.

I figure there are a couple of possibilities:

1. Don't measure files at all if they have an extension that isn't
   ".py". This will let us measure extension-less files, and .py
   files, and will ignore all the rest, on the theory that any other
   extension implies that we won't be able to parse it later anyway.

2. Measure all files, but during reporting, if a file can't be parsed,
   ignore the error if it has an extension that isn't ".py".

3. (Shudder) Make a configuration option about what extensions to
   measure, or which to ignore.

4. Some people want "ignore errors" to be the default, but if a file
   is missing for some reason, it's important to know, because it will
   throw off the reporting, and that shouldn't happen silently.
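
To make the difference between the first two possibilities concrete,
here is a small sketch. The function names and return values are
invented for illustration and are not part of coverage.py; the only
assumption is that "extension" means what os.path.splitext reports.

```python
import ast
import os

def should_measure(filename):
    """First possibility: measure only ".py" and extension-less files."""
    ext = os.path.splitext(filename)[1]
    return ext in ("", ".py")

def report_status(filename, source_text):
    """Second possibility: measure everything, then forgive parse errors
    only for files that have some extension other than ".py"."""
    try:
        ast.parse(source_text, filename=filename)
        return "ok"
    except SyntaxError:
        ext = os.path.splitext(filename)[1]
        if ext not in ("", ".py"):
            return "skipped"   # e.g. a template that merely claimed the name
        return "error"         # a real problem the user should hear about

print(should_measure("page.jinja2"))
print(report_status("page.jinja2", "<p>{{ x }}</p>"))
print(report_status("broken.py", "def (:"))
```

The first approach decides before any data is collected; the second
collects everything and only decides at report time, so a file with an
odd extension that does contain Python still gets reported.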

Do people ever name their Python source files something other than
"*.py"? Are there weird ecosystems like this that I'll only hear about
if I make one of these changes?

Comments

Option (2) seems like much the best: it makes a best effort to produce useful output, doesn't bother the user with pointless error messages, and doesn't require a configuration option.

There are a couple of good reasons to give your Python file a non-standard extension: (a) because of an extension-based policy, for example on a web server where only files with the .cgi extension get executed as CGI scripts; (b) for command-line tools where the user prefers to type "foo" rather than "foo.py".

How about option (5) — measure all files; try to parse them as Python; if that fails, report naïve (line-based rather than code-based) coverage metrics for them. This might give useful results even for Jinja's .html templates.

Tornado's templates generate code similarly to Jinja's, but we set the fake filename to "mytemplate.generated.py" (after several iterations) because this gives the best stack traces on errors. However, since these files never exist on disk, we have to turn on ignore_errors in our coverage reports (this is different from the Jinja issue, where the file exists but is not Python). A narrower version of ignore_errors might be nice, either a filename filter (as in #3), or the option to ignore files that don't exist without ignoring other errors.

Another more ambitious option would be to grab the generated source at runtime: Tornado templates support the PEP 302 loader protocol so linecache works on them.
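
A sketch of that idea, using a stand-in loader rather than Tornado's real one: when linecache cannot find the file on disk, it falls back to the `get_source()` method of the loader found in the module globals it is given. The class `TemplateLoader`, the module name "mytemplate", and the generated source are all hypothetical.

```python
import linecache

class TemplateLoader:
    """Hypothetical loader that exposes generated source via get_source()."""

    def __init__(self, sources):
        self._sources = sources  # module name -> generated Python source

    def get_source(self, fullname):
        return self._sources[fullname]

generated = "def render():\n    return 'hi'\n"
fake_name = "mytemplate.generated.py"  # never written to disk

# Globals as a generated module might carry them (assumed shape).
module_globals = {
    "__name__": "mytemplate",
    "__file__": fake_name,
    "__loader__": TemplateLoader({"mytemplate": generated}),
}

# No such file exists, so linecache asks the loader for the source.
lines = linecache.getlines(fake_name, module_globals)
print(lines)
```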

James Thiele 1:50 PM on 28 Mar 2012

I use/distribute executable Python files without the *.py extension by using the "#!/usr/bin/env python" idiom. So please have any solution take into account that a file may be Python code without any extension.

What about projects that use config files that are just Python source files rather than yaml or xml or whatever? I know that's fairly common. Usually they use the .py extension rather than another one, but not always.

All: When I said, "if it has an extension that isn't *.py", I wasn't including "no extension" in that. Extension-less files are safe!

@Artem: "hooks" are an interesting idea, but I don't know if tool makers would be able to perform the back-mapping.

Anyone else have specific cases of files with unusual extensions?

Adam Collard 12:44 AM on 29 Mar 2012

BuildBot has master.cfg, which is Python, though I'm not sure it's useful to measure coverage in it. There are classes of "configuration" files which are actually Python in disguise.

Lennart Regebro 2:42 AM on 29 Mar 2012

To me it seems that if it can't parse a file, it should output an error message saying "File could not be parsed, it seems that it does not contain valid Python". If you have a lot of these, you could shut them up with -i.

Possibly you could treat files without an extension, or with an extension starting with ".py", differently, but I'm not sure there is a need.

Lorenzo Gatti 3:40 AM on 30 Mar 2012

File extensions fall into three classes:
1) Must be Python code, exactly like ".py" files, and it should be reported as an error if it cannot be parsed satisfactorily (unless silenced by general error suppression). For example: ".py2" & ".py3" or similar conventions, fancy extensions for application scripts or CGI-like setups.
2) Could be Python code or not; there's no way to tell in advance, and no error should be reported if it isn't. For example: files with no extension, which on POSIX systems might be executable Python scripts or executable scripts for some other interpreter or something completely different.
3) It isn't expected to be Python code, never try to parse it as it would be a waste of time. For example, the mentioned extended-HTML templates.

I suggest a safe default (".py" in class 1, extension-less in class 2, anything else in class 2 or class 3) and two or three optional command-line options to override the default (maybe "-pythonextension", "-maybepythonextension", "-nonpythonextension").

This policy about the "deluxe" treatment of Python sources could be combined with option 4 (reporting missing files), as checking that a file exists and contains the lines referenced in Python bytecode doesn't require parsing it. Another command-line option would be needed to reverse the default.
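
The three classes could be sketched roughly as follows. The names `Kind` and `classify` are invented here, and sending unknown extensions to class 3 is an assumption, since the suggestion leaves open whether they belong in class 2 or class 3.

```python
import os
from enum import Enum

class Kind(Enum):
    MUST_BE_PYTHON = 1   # parse failures are errors
    MAYBE_PYTHON = 2     # try to parse, stay quiet on failure
    NOT_PYTHON = 3       # never try to parse

def classify(filename, must=(".py",), maybe=("",), default=Kind.NOT_PYTHON):
    """Map a file name to one of the three classes by its extension.

    `must` and `maybe` play the role of the proposed override options;
    `default` covers every extension not listed (assumed to be class 3).
    """
    ext = os.path.splitext(filename)[1]
    if ext in must:
        return Kind.MUST_BE_PYTHON
    if ext in maybe:
        return Kind.MAYBE_PYTHON
    return default

print(classify("app.py"))
print(classify("script"))
print(classify("mytemplate.html"))
print(classify("handler.cgi", must=(".py", ".cgi")))
```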