XSum

XSum - SUM with error compensation
The accuracy of the sum of floating point numbers is limited by truncation errors: e.g. SUM([1e16, 1, -1e16]) returns 0 instead of 1, and the error of SUM(RANDN(N, 1)) is about EPS*(N / 10).
Kahan, Knuth, Dekker, Ogita and Rump (among others) have derived methods to reduce the influence of rounding errors, which are implemented here as a fast C-Mex: XSum(RANDN(N, 1), 'Knuth') is exact to all 15 digits.

OUTPUT:
Y: Double array, equivalent to SUM, but with compensated error depending
on the Method. The high-precision result is rounded to double precision.

METHODS: (speed and accuracy compared to SUM)
- Double: A single threaded implementation of Matlab's SUM. At least in Matlab 2008a to 2009b the results of the multi-threaded SUM can differ slightly from call to call. Equivalent accuracy. 1.1 to 2 times slower than SUM.
- Long: Accumulated in an 80 bit long double, if the compiler supports this (e.g. LCC v3.8). 3.5 more valid digits, 2 times slower.
- Kahan: The local error is subtracted from the next element. 1 to 3 more valid digits, 2 to 9 times slower.
- Knuth: As if the sum is accumulated in a 128 bit float: about 15 more valid digits. 1.4 to 4 times slower. This is suitable for most real-world problems.
- Knuth2: About 30 more valid digits, as if accumulated in a 196 bit float. 2 to 8 times slower.
- KnuthLong: As Knuth, but using long doubles to get about 21 more valid digits, if supported by the compiler. 2.5 times slower.
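For reference, the 'Kahan' method listed above, where the local error is subtracted from the next element, can be sketched like this (illustrative only, not the MEX source):

```c
/* Kahan compensated summation: the rounding error of each addition is
   carried in c and subtracted from the next element, so small terms
   that would otherwise be rounded away are not lost. */
double kahan_sum(const double *x, int n) {
   double s = 0.0, c = 0.0;
   for (int i = 0; i < n; i++) {
      double y = x[i] - c;   /* feed the previous local error back in */
      double t = s + y;
      c = (t - s) - y;       /* local rounding error of this addition */
      s = t;
   }
   return s;
}
```

For example, summing 1.0 followed by 10000 terms of 1e-16 naively returns exactly 1.0 (each tiny term is rounded away), while kahan_sum returns approximately 1.000000000001. Note that Kahan's correction can still be lost under heavy cancellation (e.g. [1e16, 1, -1e16]), which is why the Knuth variants gain more digits.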

I should add that without all these string comparisons, which evidently work slightly differently in the Mac C compiler, this runs beautifully (if I just decide to stick with the default Knuth method and comment out the part determining the method).

Thanks for the response, and for putting together a useful utility in the first place. I don't have your email address so let me try one more comment here. When I compile the new version, I get a single error message this time (I have the standard C compiler that comes with X-code, I can also install the Intel compiler if needed).

This looks potentially extremely useful, but does not compile for me (Matlab 2014a on a Mac). The errors are "implicit declaration of function 'strncmpi' is invalid", "implicit declaration of function '_control87' is invalid", and the use of undeclared identifiers such as 'MCW_PC', 'PC_24' etc. Any help would be appreciated, though it is understandable if this works only for older versions of Matlab. Apologies also if these are simple issues, my C is rusty (non-existent, really).

This is just awesome. I cannot comment in detail as I am not an expert of the underlying methodology. However for me it just worked fine in returning finite results for sums of huge numbers. Thanks very much!

@Ahmed: Perhaps strncmpi will work instead of strnicmp. As far as I understood from a discussion about another C-mex file, Linux always drives the processor with 80-bit precision. In addition, some Windows compilers do not benefit from the higher precision, e.g. MSVC. So I suggest disabling the SetPrecision() subfunction completely under Linux.
I'm going to include this in a new submission soon.
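Until the new submission is out, a portable workaround could look like the following sketch. The function names are illustrative; note that the precision-control identifiers are MSVC-specific and spelled with a leading underscore there (_control87, _PC_53, _MCW_PC), which is why gcc and clang report them as undeclared.

```c
#include <ctype.h>
#include <stddef.h>
#include <float.h>

/* Fallback for the non-standard strnicmp/strncmpi: case-insensitive
   comparison of at most n characters, using only standard C. */
static int str_ncmpi(const char *a, const char *b, size_t n) {
   for (; n > 0; a++, b++, n--) {
      int ca = tolower((unsigned char)*a);
      int cb = tolower((unsigned char)*b);
      if (ca != cb) return ca - cb;
      if (ca == '\0') return 0;   /* both strings ended together */
   }
   return 0;
}

/* x87 precision control exists only under MSVC on 32 bit x86; on
   Linux/macOS and on x64 (where SSE2 arithmetic already uses 53 bit
   doubles) the call is simply compiled out. */
static void set_precision_53(void) {
#if defined(_MSC_VER) && defined(_M_IX86)
   _control87(_PC_53, _MCW_PC);
#endif
}
```

With these two replacements the MEX file should compile with gcc/clang without touching the rest of the method-selection code.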

Hints on compiling in Linux would be appreciated, from Jan or others. After redefining strnicmp with strcmp, gcc 4.4.1 fails because of undeclared symbols, PC_24, PC_64, PC_53, _control87, and MCW_PC. Thanks.

Dear Bruno! Thanks for the comment!
You've observed that your 32 bit system has a higher accuracy than your 64 bit system for "Double" and "Long". (For the MSVC++ 2008 compiler long doubles are doubles, so both methods return the same result.)
It seems that your 32 bit system accumulates the sum (e.g. in DoubleDim1()) in an 80 bit register, while your 64 bit system applies standard 53 bit rounding. Please check your mexopts.bat files for the compiler flags /arch:SSE2 and /fp:[precise | fast | strict]. Perhaps a "#pragma fenv_access (off)" in a header?!

You hit two important points:
1. The results of the trivial "for (i = 0; i < N; Sum += X[i++])" depend critically on the compiler directives. I've tested XSum with LCC2.4, LCC3.8, BCC5.5, OpenWatcom 1.8 and MSVC 2008 SP1 and found 5 different behaviours for "Double", "Long" and "KnuthLong". The accuracy of "Knuth" and "Knuth2" was not significantly affected by the compilers in my tests.
2. If the compiler has such an influence, it is hard to assess the quality of the results automatically. I have not found a way to let the test function draw user-friendly conclusions about the accuracy.

Summation is numerically unstable, and even the trivial algorithms of XSum(Double) and XSum(Long) are fragile if different compilers and floating point environments are applied (I have to stress this more explicitly in the help. Thanks!). Therefore I use XSum(Knuth) and XSum(Knuth2) to improve the stability.