Why do you use Perl for unit tests?

The "functional tests" (to keep them distinct from the "unit tests" which use NUnit) use Perl for a few reasons. Mostly historical, but also practical.

The main driver script, RunAll.pl, has roots going back to the '90s, and some form of it has been used for testing C#, VB, F#, and other languages at Microsoft for many years. In fact, it was borrowed from the C# team when F# first started making a home in the
Visual Studio organization around the VS 2010 timeframe - that's what the C# and VB teams used, so it made sense for F# to use the same thing. With Roslyn, C# and VB made a clean break and rebuilt their entire testbed on xUnit, which is what you see today in their
open source repo.

Of course, it might seem odd to use Perl today to test a managed language. We are well aware of various actual or perceived shortcomings:

It isn't "fashionable"

Mostly just a perception issue, but when fewer developers have experience with the language, you get real issues such as...

It's difficult to maintain

The intersection of experienced F# and Perl developers isn't especially large, so fixing or updating our test infra isn't easy. Not to mention RunAll.pl itself is in general a house of horrors...

It's slow

Almost totally a perception issue, at least for our purposes. The overall design of the tests is what makes execution slow, not the Perl driver.

It doesn't natively integrate with the technology under test

It would make things much simpler if we could use F#/.NET components directly in the test infrastructure.

Using Perl does afford us some legitimate upsides:

It's cross-platform

With a few environment tweaks, the majority of the FSharpQA test suite can be (and has been) run on Linux/Mono using the same Perl script drivers.

It allows for fast iteration

Using a script-based system lets us edit and re-execute tests much more quickly than if all tests were compiled into an NUnit or xUnit test assembly. The "core unit tests" today take roughly 10 seconds to compile on my (pretty powerful) laptop.
The set of functional tests is 3-4 times larger in terms of number of test cases, but those test cases are generally much more involved. Compiling them all into an NUnit-style test assembly would take at least 2-3 minutes, probably more.

One reason I asked was the additional dependency. Of course this alone is not really a problem, but this project already involves quite a lot of additional setup steps. From your comment I gather this might not be the easiest part to improve ;-)

It's a priority to make our repo usable for developers, and that includes an effective test system. The current system is usable, but not easy to get started with. Migrating to a better (and, hopefully, standard) system is definitely a possibility (and
something we have been looking at internally for a while), but it will take time.

For now, to address your individual comments:

The current "unit tests" are focused on validating the runtime functionality of what's exposed by FSharp.Core. Runtime functionality is only one slice of what's captured by the phrase "basic tests". Validating syntax, compiler
diagnostics, generated IL, execution via FSI vs. compiled, etc. are probably still "basic tests", but they don't really fit in the current unit tests. That said, it's definitely useful to fill in gaps in runtime functionality testing, as your recent PRs have
been doing, and that indeed belongs in the unit tests.

Eventually even "complicated tests" could be handled by some kind of unit test framework, even if that means shelling out to start external scripts. But for now, the current system does ok.

Using a standard test driver (e.g., NUnit or xUnit) would be a useful eventual goal, due to widespread familiarity, support, potential integration into IDEs, etc. I don't personally see much to be gained from replacing Perl + custom Perl scripts with a
new custom .NET driver with 100% duplicate behavior. I'd rather see us put that effort into migrating to a standard framework.

The test runner drops detailed failure logs to disk (FSharp_Failures.log, FSharpQA_Failures.log), but only outputs basic pass/fail info to the terminal. That's in line with most test frameworks, in my experience. Did you have a different scenario in mind?

The basic idea is to write a "dynamic" NUnit test assembly which does not actually include any of the code you want to test. Instead, you implement a TestCaseSource backed by code which discovers the tests at run-time (i.e., when the test assembly
is loaded by NUnit) and creates a TestCase for each one. Once that's done, you implement a test or tests which use this TestCaseSource and invoke each test case as a separate process, checking the process's exit code (and its stdout/stderr streams, if applicable)
to determine whether the test passes or fails.
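A minimal F# sketch of that pattern, using NUnit-style attributes (the class name CompilerTestCases, the TestCases folder, and invoking fsc.exe directly are all hypothetical placeholders, not the actual repo layout):

```fsharp
open System.Diagnostics
open System.IO
open NUnit.Framework

/// Hypothetical source: discovers test inputs at run-time and yields a
/// TestCaseData for each one. NUnit instantiates this when the assembly loads.
type CompilerTestCases () =
    static member Cases : seq<TestCaseData> =
        Directory.EnumerateFiles ("TestCases", "*.fs", SearchOption.AllDirectories)
        |> Seq.map (fun path -> TestCaseData(path).SetName ("Compile " + path))

[<TestFixture>]
type FunctionalTests () =
    // NUnit runs this method once per TestCaseData returned by 'Cases'.
    [<Test; TestCaseSource(typeof<CompilerTestCases>, "Cases")>]
    member __.CompileFile (sourcePath : string) =
        // Launch the compiler as a separate process and check its exit code.
        use proc = Process.Start (ProcessStartInfo ("fsc.exe", sourcePath, UseShellExecute = false))
        proc.WaitForExit ()
        Assert.AreEqual (0, proc.ExitCode)
```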

This approach would be very compatible with the current one, since the current system already launches tests in a similar way via the command line and checks the exit code to determine pass/fail. The benefits of my proposed approach over the existing one are:

The test setup should be much less fragile than the existing setup.

It doesn't require Perl to be installed on the machine in order to run the tests. The only non-built-in tool required for the tests is the NUnit runner, which can even be fetched via NuGet.

Running the tests via NUnit means we'll be able to take advantage of the plethora of tools which provide NUnit integration (e.g., TeamCity). NUnit also outputs all pass/fail information into a single XML file, so it'll be easy to perform any additional
processing of the results if there's a need to do so.

Jack - This is very interesting. I was not aware of native support for dynamic generation of test cases in NUnit - do I understand correctly that your test DLL is loaded, the "Source" elements are constructed up front and executed (sniffing around
and returning some collection of test cases, which NUnit keeps track of and exposes via UI), then you can execute any of those individual tests on demand? If so, that's pretty slick :-)

That could be a really nice approach, thanks for bringing it up. The main thing keeping me wary of moving to full NUnit was the compilation cost, but this works around that issue nicely. I'm not sure I agree with all your advantages, though. I don't see why
it would be particularly more or less fragile than the current situation - it operates in the same basic way. And it just replaces the requirement to install one third-party tool with another. Though I grant you that NUnit is likely an easier pill to swallow
than Perl.

For our case, we would not want to execute test cases in separate processes. Requiring spinup and teardown of fsc.exe thousands of times was one of the big perf drags in the current system, which is now mostly mitigated by our hosted compiler infrastructure.
We'd want to keep everything in-proc as much as possible, and be able to support parallel execution.

The big downside, of course, is that almost all of the Perl driver/parsing/execution code would need to be reimplemented in F#, which would take time and lead to instability for some period. But any overhaul will likely have a similar cost...

Lincon - Yes, that's correct. When your test assembly is loaded by NUnit, it'll see that you have a test method (marked with [<Test>]) which is also annotated with [<TestCaseSource(...)>]. The test runner creates an instance of the type you
specify as the argument to [<TestCaseSource(...)>], loads all of the test cases from it, then executes the test method once for each TestCase returned by the TestCaseSource. This means you can run arbitrary code to dynamically build up the list of test
cases you want to run against a test method. In the current test setup for Facio, I recursively traverse the directory structure of the TestCases folder in the Facio repository to find all of the *.fsl and *.fsy files, then return each one as a TestCase. The
test method constructs the command-line arguments for the tool from the data in the TestCase, then runs the tool against each file in a separate process and determines whether the test passes or fails based on the process's exit code. This approach makes
it easy to build a test infrastructure where you can just drop repro cases for bugs or stress tests for the compiler into a folder, and they'll automatically be included in the next test run without any additional work.
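Sketched in F#, the discovery step for a setup like that might look something like this (the file patterns come from the Facio description above; the exact tool names and folder are assumptions):

```fsharp
open System.IO
open NUnit.Framework

/// Hypothetical sketch of the discovery step: walk the TestCases folder and
/// turn each fslex/fsyacc input file into a TestCaseData carrying the file
/// path and the tool that should process it. The test method receives both
/// values as its arguments.
let discoverTestCases (root : string) : seq<TestCaseData> =
    seq {
        for pattern, tool in [ "*.fsl", "fslex.exe"; "*.fsy", "fsyacc.exe" ] do
            for path in Directory.EnumerateFiles (root, pattern, SearchOption.AllDirectories) do
                yield TestCaseData(tool, path).SetName (sprintf "%s %s" tool path)
    }
```

Because discovery runs when the assembly is loaded, dropping a new *.fsl or *.fsy file into the folder is all it takes for the case to appear in the next run.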

I made a mistake when I said the current setup for the fsharpqa tests was fragile. I confused them with the tests in the 'tests/fsharp' folder, which are constructed as a series of batch (*.bat) files; when these tests were merged in, I had a heck of a time
trying to get them to run correctly on my machine (they couldn't locate certain tool paths, for example), and even once they did, it wasn't straightforward to comprehend the results. The upside of the approach I described above (using NUnit) is that it would
make it straightforward to have a single, robust test setup that incorporates any kind of tests you want to run -- whether they be standard unit tests compiled into the test assembly, snippets you want to run through a specific part of the compiler infrastructure,
or bug repro cases you want to invoke the full compiler on from the command line.

You don't have to execute the test cases in a separate process if you don't want to. The approach I'm proposing makes the test methods generic (in the general sense of the word); they're essentially lightweight test runners in their own right. You could
implement two versions of the test method, one which runs all of the tests in-proc and another which runs each test in a separate process, then apply the [<Category(...)>] attribute to them (e.g., [<Category("InProcess")>], [<Category("OutOfProcess")>])
so you can choose which tests to run (or not) at run-time. As for running tests in parallel -- I don't think NUnit supports this yet, but xUnit and MbUnit have some support for it (though their support for dynamically discovering test cases doesn't seem to
be as complete as NUnit's).
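A rough sketch of that split: everything here is hypothetical (the CompilerTestCases source is stubbed out, and the compileInProc/compileOutOfProc helpers stand in for "call the hosted compiler" and "shell out to fsc.exe", both returning an exit code):

```fsharp
open NUnit.Framework

/// Hypothetical helpers: one compiles via the hosted compiler (in-proc),
/// the other launches the compiler as a separate process.
module TestHelpers =
    let compileInProc (path : string) : int = failwith "call hosted compiler here"
    let compileOutOfProc (path : string) : int = failwith "launch fsc.exe here"

/// Hypothetical run-time source of test cases (e.g. files found on disk).
type CompilerTestCases () =
    static member Cases : seq<TestCaseData> = Seq.empty

[<TestFixture>]
type CompilerTests () =
    // Same cases, two execution modes; Category lets the runner pick one.
    [<Test; Category("InProcess");
      TestCaseSource(typeof<CompilerTestCases>, "Cases")>]
    member __.CompileHosted (sourcePath : string) =
        Assert.AreEqual (0, TestHelpers.compileInProc sourcePath)

    [<Test; Category("OutOfProcess");
      TestCaseSource(typeof<CompilerTestCases>, "Cases")>]
    member __.CompileExternal (sourcePath : string) =
        Assert.AreEqual (0, TestHelpers.compileOutOfProc sourcePath)
```

You'd then select a mode at run-time, e.g. `nunit-console Tests.dll /include:InProcess`.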

I agree that the downside of all this is that it'd take a non-trivial amount of time and effort to implement such a setup. However, it would provide an excellent opportunity for the F# community to contribute to the compiler / core libraries, especially
since it doesn't require contributors to have a working knowledge of compiler implementation. IMO it's worth it overall, because in the end we'd have a much more streamlined approach to testing the compiler and libraries, which means it'll be easier to integrate
contributed repro cases into the test suite and more likely that everyone will run the full test suite (as they should) when contributing changes to the compiler.