Fractured YEARFRAC and Discounted DISC

There are a few things that a spreadsheet can get wrong. You can be hard to use. You can be ugly. You can be slow. You can even crash occasionally. These are all annoyances. With a monopoly, annoyances are a way of life. However, there is one place where a spreadsheet can never be wrong, and that is in the area of calculations, especially financial calculations. A business uses spreadsheets to model and track its business. It makes decisions based on these spreadsheets. If the spreadsheet gives the wrong answer, then the business makes the wrong decision. And a business that makes the wrong decisions does not stay in business for long.

So it was with much lament last year that I reported that OOXML, then under ballot in ISO/IEC JTC, had many egregious errors in its spreadsheet formula definitions. In addition to enumerating these errors in my blog, I submitted them for consideration by INCITS V1, the US SC34 mirror committee, and these became part of the bundle of comments with which the US accompanied its ballot.

Although the ballot wasn’t due until September 2nd, the extravagant “sit and do nothing” provisions in INCITS lead to our technical review being cut off in July. Because of the lack of review time, I was not able to do an in-depth review of all the spreadsheet functions, but only a cursory review. But the existence of such errors as I did identify, in the already-approved Ecma Standard, was disquieting. It should have lead Ecma TC45 to conduct a more through review of the spreadsheet functions. But this did not appear to happen.

When we received the response from Ecma to the NB comments, in January, INCITS V1 was asked to go through all the responses (over 1,000 of them) to determine whether they were acceptable. This review period was again insufficient.

As for the BRM in February, this was a travesty, as I and others have noted.

I have a theory concerning committees. A committee may have different states, like water has gas, liquid or solid phases, depending on temperate and pressure. The same committee, depending on external circumstances of time and pressure will enter well-defined states that determine its effectiveness. If a committee works in a deliberate mode, where issues are freely discussed, objections heard, and consensus is sought, then the committee will make slow progress, but the decisions of the committee will collectively be smarter than its smartest member. However, if a committee refuses to deliberate and instead merely votes on things without discussion, then it will be as dumb as its dumbest members. Voting dulls the edge of expertise. But discussion among experts socializes that expertise. This should be obvious. If you put a bunch of smart people in a room and don’t let them think or talk, then don’t expect smart things to happen as if the mere exhalation of their breath brings forth improvements to the standard.

So the the BRM ended a different committee than it started, and the mode of operation it was lead into caused it to act like a very stupid committee indeed. I don’t say this to be accusatory. I’m just making an observation about crowd behavior. When a committee of experts ceases to be a deliberate committee, then you will achieve subpar results.

One of the ways the BRM was stupid is that it approved changes to OOXML that have totally broken SpreadsheetML’s financial calculations, rendering the resulting calculations both mathematically incorrect as well as inconsistent with what Excel actually calculates. More about this later.

Here be dragons.

One of the persistent problems with OOXML has been in the area of “day count conventions” as used in SpreadsheetML.

Why should day counting be complicated? Just count how many days between the two dates and you are done, right? Indeed, everywhere but in finance it is simple. Some of the complications are for historical reasons, to try to make hand calculations easier in the pre-computer era. Also, these conventions made the calendar more regular, so financial instruments were less distorted by calendar irregularities like leap years or variable length months. If you assume a year is exactly 360 days in 12 months with 30 days each, then some things in life are simpler. Of course, this makes other things in life more complicated, including defining spreadsheet functions.

The devil is in the details. You must get these count conventions right. Being wrong by just one day in a year may only be 0.3%, but in a million dollar transaction, that is $3000. Not many people can afford to routinely ignore a $3000 error.

Different financial organizations have developed their own different day count conventions. We have the Banker’s Rule, the LIBOR rule, the NASD rule, the ISDA rule, ICMA rule, SIFMA rule, etc.

In Excel, these conventions are defined by a function parameter called the “basis” and this basis is used by many financial spreadsheet functions. In fact, none of the following functions are adequately defined unless the day count basis values are adequately defined:

ACCRINT()

ACCRINTM()

AMORDEGRC

AMORLINC

COUPDAYBS()

COUPDAYS()

COUPDAYSNC()

COUPNCD()

COUPNUM()

COUPPCD()

DISC()

DURATION()

INTRATE

MDURATION()

ODDFPRICE()

ODDFYIELD()

ODDLPRICE()

ODDLYIELD()

PRICE()

PRICEDISC()

PRICEMAT()

RECEIVED()

YEARFRAC()

YIELD()

YIELDDISC()

YIELDMAT()

There are five basis conventions defined in OOXML, with values 0-4:

Basis 0 = US 30/360

Basis 1 = Actual/Actual

Basis 2 = Actual/360

Basic 3 = Actual/365

Basic 4 = European 30/360

In this convention X/Y indicates that X is the assumption for month length, and Y is the assumption for year length. Note that there is no external authority given as a reference for any of these conventions. So it is not clear, whether the Actual/Actual convention is following, for example ICMA rules or ISDA rules. In fact, it may be following neither rule. So lacking any external authority, all we have to go on are the definitions provided in the specification itself. OOXML is its own authority for day count conventions.

As I reported last July, the definitions provided by OOXML did not sufficiently define the behavior of the conventions that lie behind most of the financial functions in Excel. Unfortunately, Microsoft/Ecma have failed to fix this problem in their proposed Resolutions, and in fact have made it worse. Further, the DIS 29500 BRM, in its negligent bulk approval of Ecma’s responses merely advanced these serious errors into the text which was then approved as an International Standard. So essentially, my work in pointing out errors in the spreadsheet language was for naught. Microsoft just shoved it all though anyways.

This puts me in a delicate situation. On the one hand, the ODF TC really would like to finish its work on ODF 1.2, and part of that is completing the OpenFormula work. A key remaining part of OpenFormula is to ensure that our financial functions synch up with how Excel works. What OOXML says is irrelevant, except to the extent that it accurately tells how Excel defines these functions.

However, inquiries to Ecma on these day count conventions, inqueries made months ago, have received no response. Also, the final DIS text of OOXML has not been made available, not even to SC34 members and liaisons. And there is not mechanism in place yet in SC34 for collecting defect reports on OOXML. So we’re stuck.

Or maybe not.

David Wheeler, Chair of the OpenFormula subcomittee of the OASIS ODF TC, has been trying to nail down the behavior of Excel’s spreadsheet functions for over a year now. One of the last remaining pieces is to nail down the day count conventions. After waiting and waiting for this to be clarified in OOXML, David took matters into his own hands and decided to solve the problem by brute force, enumerating millions of test cases, indeed a comprehensive set of date pairs over a 6 year period, to try to determine exactly how the date bases in Excel work. You can read David’s conclusions on his blog.

What strikes me in David’s report is that not only are the OOXML definitions incomplete and inconsistent, but they do not accurately reflect what Excel actually calculates. So either Excel is wrong, or the OOXML standard is wrong when calculating almost every financial spreadsheet function. This is quite an embarrassment for an ISO standard, and an unnecessary one, since I have been talking about how poorly defined these functions are for almost a year now.

Let’s work through some of David Wheeler’s test cases by hand, to get a better feel for how OOXML is broken.

Let’s take the YEARFRAC() function as the simplest example. YEARFRAC() takes two dates and a basis value as inputs, and returns the fraction of a year that is between those two dates. So logically, the calculation is like:

That is the logical definition. The only complication is the 5 date bases, and what exactly the year length is. This last point is something that stuck out in my mind when I first reviewed the draft of OOXML last summer. You might think the year length is 365 days. But what about leap years? And what about date ranges that straddle normal years and leap years? This is the key fact that this function requires in is definition. This is the problem I reported on my blog last July. This was the problem INCITS V1 submitted with our ballot comments last September. This is the problem that Ecma responded to in a severely flawed way in January. This is the problem that the BRM refused to discuss and merely agreed with Ecma’s flawed changes. And this is the problem that is now in the final DIS version of OOXML.

Take a look for yourself in this brief extract from the final DIS version [pdf] of OOXML, provided for purposes of critical review and commentary.

(Yes, I now have a complete copy of the final DIS version of OOXML. If you think that this is unfair — and I would agree with you on that — then maybe you should ask ITTF why I was able to get a copy of the final DIS, but no one else in SC34 was.)

Let’s take a look at the ISO OOXML definitions and try some test calculations to reproduce some of David’s findings.

First, let’s take the date basis 1 first, Actual/Actual, since that is the easiest. But we immediately run into a problem. The standard says two different things. In the “description” table it says:

Actual/actual. The actual number of days between the two dates are counted. If the date range includes the date 29 February, the year is 366 days; otherwise it is 365 days.

However, later, when defining the return value, the standard says this:

If the Actual/actual basis is used, the year length used is the average length of the years that the range crosses, regardless of where start-date and end-date fall in their respective years.

There is absolutely no way in which these can both be correct. This would have been easily fixable at the BRM, if the BRM had been allowed to do its job. But I wasn’t even allowed to open my mouth and point out this problem. So now this fatal ambiguity sits in the text of OOXML, as authorized by the BRM experts and approved by JTC1 NB’s. Gotta love it.

But let’s forge on and assume that we really have two algorithms, the first (from the description) which we will call basis 1, and the second basis (from the return value section) which we will call 1′ (“one-prime”). We’ll calculate it both ways.

Let calculate YEARFRAC(DATE(2000,1,1),DATE(2001,1,1),1)

With basis 1, this is simple. 2000 was a leap year (since it was a century divisible by 400) so the interval between the two dates is 366 days. Similarly, since the date range includes a February 29th, the length of the year used in the calculation is 366. So the returned value returned by YEARFRAC should be 366/366 or 1.0.

With basis 1′, this is also simple. The date interval is still 366. But the length of the year is now the average of the year lengths crossed by the date range. So the average of 366 (for 2000) and 365 (for 2001), or an average of 365.5. So using basis 1′, YEARFRAC should return 366/365.5 or 1.00137.

Let’s try another, with basis 1. What is YEARFRAC(DATE(2000,1,1),DATE(2002,1,1),1) ?

In basis 1, the interval is 731 days (366 days in 2000 plus 365 days in 2001). The year length is 366, since the interval includes a February 29th. So YEARFRAC should return 731/366 = 1.997.

With basis 1′, the interval is also 731 days, but the year length is the average of 366 (for 2000), 365 (for 2001) and 365 (for 20002). So YEARFRAC should return 2.0009

Just to make sure we have this down, let’s try another example:

What is YEARFRAC(DATE(2000,1,1),DATE(2000,1,2),1) ?

A one day interval? Yes, please humor me.

OK. With basis 1, the interval is one day. Since it does not cross February 29th, the year length is 365 days. So it will return 1/365 or 0.0028.

With basis 1′ the interval is also one day, and the year length is 366, So it will return 1/366 or 0.0027

OK. That was too easy. One more example, to make sure that you have it down.

What is YEARFRAC(DATE(2000,1,1),DATE(2004,1,31),1) ?

Hmm…. this one will require more thought. I might have to take off my shoes and count using my toes as well.

With basis 1, the interval is 1491 days = 366 + 365 + 365 +365 +30.

The year lengths are:

2000=366 (since the range cross February 29th)
2001=365
2002=365
2003=365
2004=365 (since the date range does not cross February 29th)

So we have 4 full years plus 30 days of a 365 day year. YEARFRAC should return 4 + 30/365 = 4.0822.

With basis 1′ we treat the 2004 as having 366 days and average the years in the interval, so average year length = (366+365+365+365+366)/5 = 365.4. So YEARFRAC should return 4 + 30/365.4 = 4.0821.

Now that we’re done with the examples, we can throw them into a table, and compare them to what Excel 2007 calculates for these same parameters:

start-date

end-date

basis

ISO Value

Excel’s Value

Excel Correct?

2000-01-01

2001-01-01

1

1.0000

1.0000

Yes

2000-01-01

2001-01-01

1′

1.00137

1.0000

No

2000-01-01

2002-01-01

1

1.9972

2.0009

No

2000-01-01

2002-01-01

1′

2.0009

2.0009

Yes

2000-01-01

2000-01-02

1

0.0028

0.0027

No

2000-01-01

2000-01-02

1′

0.0027

0.0027

Yes

2000-01-01

2004-01-31

1

4.0822

4.0805

No

2000-01-01

2004-01-31

1′

4.0821

4.0805

No

Let your mind linger on these results for a bit. Let it sink in. Look at this table until you recognize its significance and cringe in disgust. In some cases, Excel seems to be using the first definition of date basis 1. In other cases it is using the second definition of date basis 1. And in one case it is using neither definition. In other words, this is a lot more screwed up than at first it appears. This is not just a simple ambiguity. The OOXML definition of date basis 1 is totally wrong.

That is just the Actual/Actual date basis. The other 4 conventions are more complicated. I encourage you to read David’s write up in full to see how 3 of 5 basis conventions defined in OOXML differ from what Excel actually calculates. David also shows how he believes Excel really calculates these day count conventions, based on his extensive tests.

Now if this was just a matter of one function in Excel, just YEARFRAC(), then this would not be a big deal. But this is flaw is inherited into most financial functions in OOXML. Let’s take an example at random, DISC(), which calculates the discount rate for a security, given settlement and maturity dates, as well as par and redemption values. You can read the definition of DISC() from the final DIS text here [pdf].

You don’t need to be an Wall Street quantitative analyst to see some obvious problems here. First, in the formula given, the 2nd term is divided by “SM”. What is SM? There is no “SM” defined here. There is a “DSM” defined, however. Is that what is meant? Let’s assume so.

We can try a test calculation, using the example given in the text of OOXML:

What is the value of: DISC(DATE(2007,1,25),DATE(2007,6,15),97.975,100,1) ?

B = 365, since the date range does not include February 29th. (Note that DISC has a single definition for year length in basis 1, not the two conflicting definitions we saw in YEARFRAC)

DSM = 141 days = 6 in Jan+28 in Feb+31 in March+30 in April+31 in May+15 in June

par = 97.975, which was our input parameter

redemption = 100, which is another one of our input parameters

You can do the calculation given in the text, and find the return value of DISC should be 5.35037%. Plug the example into Excel and you will get the answer 5.24202%.

So Excel is off by 2% or so. Do we really care. It’s just money, right?

The problem is that the function in OOXML is defined incorrectly, from the financial perspective. The discount rate is the discount from the redemption value, not the discount from the purchase price. So the first term in the formula should be (redemption-par)/redemption, not (redemption-par)/par. If you make this change, then the calculated value matches the value Excel gives.

Does anything strike you as odd in the above? Do you have a chill running down your spine? Do you have renewed sense of dread? You should, because I just illustrated another grave problem with the OOXML standard: The spreadsheet examples in OOXML are a fraud.

You might have mistakenly been reassured by these numerous examples in the spreadsheet formula, that these actually had some relationship to the standard, that they were examples of how the calculations should be done, that they were evidence of some sort of quality assurance, that they may even be of assistance to an implementor to see whether they implemented the function correctly.

But they aren’t.

What would be normal practice would be to take the definitions, as given in the OOXML text, and to calculate the values according to the definition provided in the text, and then to compare the resulting values with what Excel returns. That would provide a useful double-check of the definitions in the text. But OOXML doesn’t do that. The examples here are mere fluff.

The discrepancy here also indicates that no one has actually reviewed these formulas for accuracy. Errors like this are immediately evident, but only if you look. The fact that things like this have escaped the notice of Microsoft, Ecma TC45, their Project Editors, 80 NB reviews, the BRM experts, and the eagle eyes of ITTF, should make one have considerable concerns over the the sufficiency of the Fast Track review and approval process.

Let’s try one more example before we wrap this up.

What is the value of: DISC(DATE(2000,1,1),DATE(2004,1,31),97.975,100,1) ?

B = 366, since the date range includes February 29th

DSM = 1491 days = 366 + 365 + 365 +365 +30.

par = 97.975, which was our input parameter

redemption = 100, which is another one of our input parameters

So, using the stated formula in the OOXML text, the answer would be 0.50736%. Using the “correct” formula, in the financial sense, the answer would be 0.49708%. What does Excel say the value is? Well, 0.49627%, of course.

So with the DISC() function we found:

The given text provided a formula that referred to a non-existant “SM” variable. This appears to be a cut & paste error.

After accounting for that, we found that the formula was incorrect according to recognized financial standards. Securities are discounted from their redemption values, not from their purchase prices.

Even correcting for the formula errors, we find that the given definition of DISC() does not match what Excel returns, due to errors and ambiguities in the day count conventions, errors that David Wheeler delves into more deeply in his report.

The examples given in the standard are bogus. They are not actually examples of the defined function.

My understanding is that legally, Microsoft Excel is rather like one of the Harry Potter books; a work of fiction, they wrote it, and they claim the exclusive right to sell copies of it.

On that scale, the question of how closely it represents reality, and whether it can be improved to represent reality more closely, are very much secondary considerations.

If you want something you can chew on and improve, you go for ISO26300 ODF XML and OpenOffice.org . But you’re unlikely to get rich selling copies of OpenOffice, because (thanks to the public licence) no-one has the exclusive right to sell it.

Fractured YEARFRAC ? Does that mean if you’re preparing consolidated financial statements, you’re better off using pencil and paper ?

We’ve pretty much finshed with the era of Lotus 1-2-3 and other commercial spreadsheet software. Go get a copy of IBM Lotus Symphony and, as it says on the tin, “Be Free. Work Smart.” I’m told quarter of a million flew off the shelves in the first week, and (thanks to the very special price of $0, and probably helped by the Superbrand name) they are growing and spreading like dandelions wherever there is fertile ground.

Commercial spreadsheet software has gone the way of Card Punches, Typewriters, Magnetic Core, and VHS Videotapes. All the rage in their time, growth markets with good profits, but all of them fallen to commoditisation, market saturation, and the onward march of technology.

Ah, we cross in the blogosphere. My feed reader hasn’t picked this up, but a commenter on my limited report did.

I just touched the surface on this. Thanks for the deeper analysis and the great explanations. The bar on this stuff has been too low, whatever the origins of it.

I agree this is a terrible situation. undermining reliable document interchange and interoperability among products.

I think there should be no match of these Excel functions in OpenFormula. Something else is needed, but not those and the names shouldn’t be borrowed either. It is time to raise the bar. Ideally, Microsoft would come and play at having a successful resolution.

Actually, it’s probably more likely that that will need to be com.microsoft.office.excel.2007.YEARFRAC

Anyone have any other versions of Excel lying around to figure out whether if what they do is any better? It seems odd that all versions could have been so wrong for 20 years and no large financial institutions ever picked up on it and “politely requested” MS to fix it so it works before they’d consider buying a substantial number of licenses.

Just looking at this and David Wheeler’s paper, something does occur to me.

Wheeler claims that the Basis 1 yearfrac for 2000-01-01 -> 2001-01-01 should produce 1.001… due to the date range being 366 days long and the end date is in a different year to the start date.

But, the date range is only 366 days long if the date 2001-01-01 is not a member of the range. If 2001-01-01 is a member of the range, then the range has 367 members.

The general idiom in CS is to specify ranges with inclusive start and an exclusive end, so that the number of elements in a range can be calculated from the difference between the end markers. Consistently excluding the end marker from calculations helps prevent off-by-one errors, which is a possible explanation of what Wheeler has done to find that error.

Of course, the question now becomes, does DIS 29500 specify whether ranges should be start-inclusive end-exclusive, does it specify start-inclusive end-inclusive, does it specify something else, or does it fail to specify range end behaviour at all?

The OOXML text merely says the function calculates the “fractional number of years represented by the number of whole days between two dates, start-date and end-date., according to basis”.

So it is ambiguous for multi-year periods, especially when it says “If the date range includes the date 29 February, the year is 366 days; otherwise it is 365 days.”

I calculated the year length on a year-by-year basis for multi-year intervals. But the text really doesn’t say to do that, does it? I could just as well said that the interval included 29 February, so the year length is 366, and left it at that.

The traditional method, before computers, would be to factor out the number of whole years and then to consult a table of day numbers (January 1st=1, etc.) and subtract the day numbers to find the date interval for the fractional part. That is what I did in that last example.

To Karallen’s point about whether the interval is open, closed, half open, etc., nothing is stated in the text. So absent that information, implementors may will different, incompatible assumptions.

So although I called out basis 1 and basis 1′ as two possible interpretations, based on the contradictory statements in the YEARFRAC text, there are likely also several additional possibilities, 1”, 1”’, etc. based on other ambiguities.

“I was able to get a copy of the final DIS, but no one else in SC34 was”

Now, I don’t know squat about it, but I thought I read here that disemination of the final DIS text to SC34 members was a requirement?

I also read that one of the objections that can be raised in these first two months, is that the final DIS text doesn’t match the amendments discussed in the BRM.

Surely the SC34 members should now start objecting on this basis, after all the text categorically doesn’t match – the initial draft consisted of 6000 pages, the final version as distributed to SC34 would appear to consist of zero pages!

Tim, the intent of the table is to summarize the examples I worked out above. They were not intended to be additional examples. But due to the typos I introduced while transcribing into the table, it might look that way. Let me know now whether you disagree with any of the calculations. (Btw, in case it is not clear, the dates are given in YYYY-MM-DD format.)

Alan, the point of this excercise was to help us craft accurate and unambiguous formula definitions for ODF 1.2. We were hoping we could copy OOXML’s definitions for the day count conventions, and in fact we’ve been waiting for the release of the edited/approved OOXML for that purpose. But it appears that this wait was in vain.

So the debate we are currently having on the ODF TC, is how do we want to define date counting to work in ODF:

1) Copy how OOXML defines it?

2) Copy what Excel does, based on our reverse engineering, regardless of what OOXML says?

3) Go back to the ultimate authorities, ISDA, NASD, etc., and copy their definitions, even if they result in different answers than what Excel (or OOXML) gives?

4) Do it all. There is no limit to how many date basis values we have. We could have 0-4 match what Excel actually does, and have 5-9 do what OOXML says, and additional ones match financial authorities.

I think what would help us the most right now, would be some clear indication from Microsoft, who controls both Excel and OOXML, on whether they intend to address the discrepancies by changing Excel to match the OOXML standard? Or changing the ISO standard to match Excel?

Thanks for changing the table. Yes, what you’ve got now agrees with what I thought you meant.

But I still think some of the ISO numbers are wrong. (But I realise now that you’ve used a different assumption to me – I’d be interested to see how you think the spec says we should handle the case where both dates are in the middle of a year – would you use a different year length for the fractional parts at each end?)

Row 1 and 2 – 2000-01-01 to 2001-01-01 – I agree with your calcs.

Row 3 and 4 – 2000-01-01 to 2002-01-02 – Again I agree.

Row 5 and 6 – 2000-01-01 to 2001-01-02 – Disagree with basis1.

I get 1 day between the dates, 365 days for basis1 year and 366 days for basis1′ year. So I get 0.002739726 for basis1 and 0.00273224 for basis1′. Basis1 is greater than basis1′ but not by enough to round to your result.

Row 7 and 8 – 2000-01-01 to 2004-01-31 – Disagree with both.

I get 1491 days between the dates, 366 days for basis1 year and 365.4 days for basis1′ year (366+365+365+365+366)/5. So I get 1491/366=4.073770492 for basis1 and 1491/365.4=4.08045977 for basis1′

With the exception of the very first example, my numbers appear to agree with Excel using basis1′ based on the numbers in your table.

While I can see how you’ve done your calculation for the four year example, if you’re going to remove whole years first then I think you should do the same for the one and two year examples too. So the first range should be 1 (+ 0/something) and the second range should be 2 (+ 0/something).

I’d start with option #3. Go to the financial authorities (ISDA, NSDA, etc.) and then, if the need arises, add some definitions that match OOXML and/or Excel. After all, if you’re doing finanical calculations, I figure it’s more important that you get the *right* answer than that you get the same answer as someone who’s doing it *wrong*.

“Do it all. There is no limit to how many date basis values we have. We could have 0-4 match what Excel actually does, and have 5-9 do what OOXML says, and additional ones match financial authorities.”

As much of a headache as that probably is for implementers, this seems to me to be hands-down the best option. Particularly if your definition allowed for a “Default Basis” so that one could change the basis for an entire spreadsheet in one swoop. I have to imagine that some people would like to go through legacy sheets with decidedly incorrect calculations for discount rates, and correct them by switching the basis.

Alan: There are no spreadsheet formulas defined in IS 26300. David Wheeler is out to resolve that with the OpenFormula addition for ODF 1.2 and this is why he is pulling hair out around IS 29500 and the Excel disparities. OO.o is on its own at the moment, although I presume that they have made a serious effort to be able to convert reliably to/from Excel for serious practical reasons.

Rob: To further explore how these ambiguities about the nature of the interval hurt us, I did an experiment in OO.o 2.3 with the following values (basis 0 .. 4):

2007-12-31 to 2007-12-31: 0 as expected

2007-12-31-to 2008-12-31: same as Excel 2003 to 2007 (I know that Excel silently changed some of this around 2000/2002, but I haven’t checked with my copy of Office 2000.)

2007-12-31 to 2008-01-01 Same as Excel and the reciprocal of the value is 365 for basis 1 and 3

2008-12-31 to 2009-01-01 Excel is same as for 2007-12-31 to 2008-01-01. OO.o 2.3 has a smaller number for basis 1 and its reciprocal is 366. OO.o 2.3 Novell edition is identical to OO.o 2.3 Sun distribution, as expected. (All comparisons are on x86 Windows versions.)

There is another deviation in OO.o 2.3. For basis 0 and 3, the reciprocals of the 1-day year fractions are presented as 360.000000000001. For basis 2, the 360 is exact to 12 places and that is the case for all of the Excel 360 results. I don’t know if this is an output conversion edge case in OO.o, but it would seem that there are small differences in the internal value that presents as the same 0.0002777777778 in all of the OO.o and Excel sheets I made.

I’d definitely vote for “Do it right” over matching the Excel/DIS29500 calculations. For a long time, Excel wasn’t particularly accurate when it came to Normal distribution tails – doing statistical analysis with Excel 97/ 2000/ XP was always a close approximation, rather than the expected value. No one expects converters from Excel XP file format to use the same inaccurate representations for the Chi-squared tests or Normal distributions.

If you must provide a complete compatibility layer, then appropriately named functions which clearly indicate that they are consistent with the Microsoft product, rather than with the rest of the world, lets everyone know precisely where they stand.

How about keeping the function names the same (since existing spreadsheets will complain if they don’t exist) but allow selection of Excel, OOXML, or financially correct behavior via a preferences option?

This is really creepy. I hope Controlling at my last employer knows about these … small discrepancies of their “precious” workhorse.

I’d say, do it the right way and go by the books and don’t stay compatible for the sake of it. Either Microsoft will change the function because of any reason so you’re off by 0,001 again or do it right and anyone with pen & paper can verify ODF’s numbers.Break it, before it breaks you!

With regards to how Open Formula should implement the calculations, I would suggest just offering options to use definitions from financial authorities, and from what Excel 2007 actually does.

There is no point in implementing the OOXML definitions, because nobody will care what the OOXML standard says. You should just provide a choice of the “right” answer, or the MS “legacy compatibility” (Office 2007) answer.

As to whether this should be done via an overall document “switch”, or by changing the “basis” parameter is perhaps a question that should best be answered by people who actually use these features. Whatever means is chosen should leave open the ability to add more basis definitions, to account for different standards used in different countries.

This is seems to be an area that that needs its own international standard that concentrates just on the mathematics of the calculations. The Open Formula standard could then concentrate on how to implement them in spreadsheets. That is a something for the longer term though.

I’m mostly agreeing with what Toby Haynes says in regards to doing it right, but am not sure about the need to add DIS 29500 or Excel compatibility to the ODF standard.

My suggestion would be to add an informative note of some kind to the ODF standard pointing out that DIS 29500 does things differently (with a reference to the relevant section(s) of that document, and possibly a reference to this blog post?) and that some legacy spreadsheet applications will also do things differently, with possible references to other documents/resources that make best-efforts to document legacy spreadsheet behaviour. The note might then also suggest that implementations may use QoI for DIS 29500 and legacy application support as a product differentiator/marketing bonus, but that such support is in no way part of ODF or required for ODF conformance/compatibility.

Rob, you have a calculation error or typo in one example, and I think you are misreading the specs in another.

Before I continue, I want to say that David Wheeler’s paper that you link to does not have these errors and provide a more complete explanation. People shouldn’t try to refute your entire argument because of a couple of errors that do not appear in the source material you refer to.

First, you say “So it will return 1/365 or 0.0028” and you also include that number in the table. 1/365 is 0.0027397, which rounds to 0.0027.

Next, in the calculation for YEARFRAC(DATE(2000,1,1),DATE(2004,1,31),1) you take the four whole years as a given and use the rules to compute the remaining fraction. But the DIS text for YEARFRAC says nothing about doing that. It only allows for taking the total number of days and dividing them by an average year length. The difference between Basis 1 and 1′ is in how that average year length is calculated, but you still would have to divide the resulting number into 1491 days. That would make Basis 1′ match Excel 2007 in this case.

Excel 2007 does seem to use the interpretation that you call Basis 1′ in most of the examples. The real contradiction, other than that in the language of the DIS text, is as David Wheeler points out that Excel 2007 seems to use a slightly different algorithm when the interval spans a year or less. That shows up in the example of the first two rows of your table.

[quote] This would have been easily fixable at the BRM, if the BRM had been allowed to do its job. But I wasn’t even allowed to open my mouth and point out this problem.[/quote]

There was plenty of time before the BRM to give feedback to Ecma which organised special on line meeetings which all BRM members could attend.Did you even attend one of them or did any of the BRM members working for IBM ?

Actually the solution is simple if you’ll think about WHY the standards are even exist. They exist to make user’s life better. They don’t always achieve this result, but that’s the goal.

Now what is the best scenario FOR USER when he/she deal with legacy documents? Is it implementation of the same old wrong function? Or correct one (agreed by financial authorities)? The answer is: “it depends”. If the document is opened for AUDIT – it’s VERY important to see the same numbers. Or else you can go to jail for fraud. But if you want to CHANGE document then you need to use CORRECT formulas.

So the solution is simple: either implement TWO sets (Excel and financial authorities ones, noone cares about OOXML) or just a single ONE (financial authorities one) PLUS some kind of flag to mark legacy documents. Then if you open some old archived document you’ll see nice message “this document uses incorrect version of functions ABC and XYZ, they don’t conform to requirements of FOO and BAR and so results can be different from what legacy application showed”. This is IMPORTANT: either user should see that calculations are changed and know WHY they are changed or user should see exact same numbers.

Why even bother with legacy definitions? Because they are not just legacy of MS Office. Today MS Office 2007 does things wrong, tomorrow OOo will do things wrong – if you’ll insist on keeping all kind of calculations around 100 years from now you’ll have 10’000 basises and applications of gigabyte sizes. But if you don’t flag roken documents – then how can will you distinguish broken and unbroken ones?

From what I understand regarding this article is that any business migrating to Excel 2007 stands the risk of failing a SOx (Sarbanes-Oxley act) audit. The SOx act was enacted into law to ensure that businesses, especially large businesses, report their financials correctly. A CEO that signs off on her financials, if the financials were produced by Excel 2007, stands the risk of hefty fines and even jail time. If Excel is going to produce faulty numbers then I think that anyone implementing it in a business environment is negligent. Although of course they can be excused for assuming that Excel actually works correctly. I don’t think people would be able to be blamed for assuming Excel is correct but it definitely introduces a financial reporting risk into their business and on this basis anyone implementing this piece of software should be alerted to this.

I wonder how much liability Microsoft will carry for this if the results of Excel 2007 adoption is the investigation and indictment of a large number of businesses and CEOs for incorrectly reporting or “falsifying” their financials?

There is an infinite amount of ambiguity regarding the endpoints of the range. Suppose it is meant that the time of day of each day is noon, for instance. Then the interval from 2003-12-31 to 2004-01-01 has half a day in a normal year and half a day in a leap year, which makes the YEARFRAC value 0,5 / 365 + 0,5 / 366…Similarly this influences whether a range “crosses” 29 february or not.

Upon further reflection as to what should be defined in OpenFormula, I’d like to suggest that you get away from even the ambiguity of 0-4. There’s no reason at all why the Basis couldn’t be something quite clear such as “NASD” or “ICMA.” That would give you the flexibility to optionally support Excel or OOXML’s rules as something like “Excel-0” or “IS29500-3.”

Again, I would urge you to allow in the specification a generalized Basis setting for the document as a whole, so that it would be easy to change in a single move.

As far as whether applications need to allow for an “Excel-compatibility” standard, I see that as absolutely vital. Unless someone is an avid reader of “An Antic Disposition,” they are unlikely to understand ambiguities and errors in Excel’s calculation. And if they open a .XLS in OpenOffice.org or Symphony, and see different result calculations than in Excel, the assumption is going to be that the new program is wrong.

I know that sucks, but it’s how the perception will work — guaranteed.

The best thing to do it highlight it in the application. “This spreadsheet uses an ambiguous standard for fractional year calculations, and therefore may misrepresent discount rates. Click here to select from available techniques for calculating fractional years.”

There are many sides to the Excel compatibility issue. Why on earth someone would want an incorrect calculation? I can think of two reasons.

One is archiving. The file that was historically used had incorrect calculations in it. This has to be preserved as is for the sake of fidelity to history. If ODF is used for archiving of current Excel files, then bug preservation becomes a requirement.

This raises the question of which Excel version was used in creating the file. We don’t know yet if the behavior of Excel was the same from version to version. This is not something I would take for granted. Tests are required.

The other is perception. People use Excel and just trust it to be accurate. If the numbers from another software disagree, then a natural reaction might be to assume Excel must be right because this is what people use and have been using for over a decade. Then the incorrect behavior becomes the truth because nobody cares about the actual truth.

Martin raises the possibility of failing SOX audits. But if the auditor uses Excel and believes it to be correct, then using the correct formula is what could land you in trouble. You are safe from a SOX perspective as long as everybody uses Excel.

This means the public relation aspect is essential. There is a need to teach the financial community about the bugs and their consequences. There is a need to teach auditors, CEOs and CFOs that if Excel is part of the financial process, then there is no SOX compliance.

This is not a small point. I know programmers that download in Excel database extracts, process them in VBA and the upload the result in the database. This kind of processing is the major reason for many users to want millions of rows in their spreadsheet. Otherwise they have to break their database extracts in small parts and stitch the result of partial computations afterwards.

martin and others: Although Wheeler singles out Excel 2007 (presumably because it implements ECMA-376 in its .xlsx files), the Excel 2007 implementation of the calculations appears to have been unchanged since Excel 2000.

I’m going to double check this with some of the good examples and edge-case tests that we now have, but I am pretty confident that this is not a problem to be avoided by sticking with pre-2007 versions of Excel. Also, it is not clear whether the non-OpenFormula versions of ODF implementations provide any salvation (say, up through OO.o 2.4 and other products using the same spreadsheet code base).

Rob, I have been nosing around some more and I can report on the examples now.

All but the last example for YEARFRAC in the IS29500 you have are identical to the ones in ECMA-376. They have not changed.

The final example, illustrating a case with year divisor of 365.33333, is new in IS29500.

All of the examples are reporting what Excel produces. I verified them with Excel 2003 and expect no difference in Excel 2007 and probably not Excel 2000.

My tentative conclusion is that the additional explanation of the parameters and basis values in IS 29500 are simply bogus (except for the easy cases of basis 2 and 3).

Another tentative conclusion is that there is tremendous confusion about which dates from the beginning to the end date are actually *in* the range. This is leading to a lot of mischief, but I’m not sure on who’s part. I notice a couple of places where Wheeler seems to have stumbled into that pitfall, and I’ll clean up my examples and discuss that with him. This bears on only a small part of his analysis in any case and I suspect it has tripped-up others even more.

opensourcerer: The press release is a bit more clear than that reading of Mahugh’s comments. The claim is that current Office 2007 is pretty much already IS29500 *transitional* and I don’t know that they’ll tweak it much with SP2 next year. My reading is that the next version of office is where we’ll see movement to IS29500 *strict*.

The exact quote is that “IS29500 … is already substantially supported in Office 2007, and the company plans to update that support in the next major version release of the Microsoft Office system, code-named ‘Office 14.'”

@Orcmid: I agree with your understanding about OOXML *Transitional” being substantially supported in Office 2007, and OOXML *Strict* being introduced with Office 14.

What strikes me is that the ambiguity about OOXML Transitional and OOXML Strict is introduced by Doug and Jason themselves. They are probably the most knowledgeable people at Microsoft about this nomenclature, but still Doug simply said “Microsoft would not implement the *final* ISO version of OOXML until Office 14 ships”.

Is this simply a negligence, or is it meant to open the door for a more drastic move by Microsoft at a later date ?In any case, I read this sentence as meaning that Microsoft will not tweak Office 2007 at all in SP2 to support the changes introduced by the BRM in OOXML Transitional.

Those words leave no place for approximating 2(a+b)/(c+d) as a/c + b/d.

Note that (c+d)/2 is the average year length and we are assuming an interval beginning in one year and ending in the next. The three-year case would involve

3(a+e+b)/(c+e+d) ≅ a/c + e/e + b/d,

with (c+e+d)/3 the average year length, and the N-year case would be

N(a+e+f+…+b)/(c+e+f+…+b) ≅ a/c + e/e + f/f + … + b/d.

These approximations might have been used in the old days, but they can’t be described as “actual.”

I gather that you have some influence over what goes into these standards? In that case, I would suggest that the standard ought to use definitions such as that given below for YEARFRAC, instead of words such as

“If the Actual/actual basis is used, the year length used is the average length of the years that the range crosses, regardless of where start-date and end-date fall in their respective years.”

Instead, it should say

“YEARFRAC is the number of days in the interval times the number of years which the interval includes, divided by the total number of days in those years. The earlier of the two dates specifies the first day of the interval, and the later of the two dates specifies the first day not included in the interval.”

Then, it should elaborate:

Let A be the earlier date and B the later date.

Let B−A be the (certainly non-negative) number of days from A to B, zero if A==B.

(As an aside, A is the first day of the interval, and B−1 is the last day (inclusive). The interval begins at the beginning of the first day and ends at the end of the last day. A zero length interval ends the instant before it begins, and a one day interval begins at the beginning and ends at the end of the its single day. How could an interval possibly mean anything else?

Well, we could say the interval ends at the beginning of the first day after the end, i.e., at the beginning of day B, so that a zero length interval ends at the exact instant it begins.

In any case, no reasonable definition would have the interval end at the beginning of the last day. Thus B is obviously the first day not belonging to the interval and no reasonable person could say otherwise. End of aside about interval definitions.)

Let x.y be the year of date x, x.m be the month, and x.d be the day. Let date(x.y,x.m,x.d) be identical to x for all dates x.

Let F be the first day of the first year of the interval, and let L be the last day of the last year included in the interval:

F = date( A.y, 1, 1 ), and

L = date( (B−1).y, 12, 31 ).

Let N be the number of years covered by the interval:

N = L.y + 1 − F.y.

Let D be the total number of days in the years covered by the interval.

D = L + 1 − F.

Obviously, the “average length ofthe years the range crosses” is

D / N.

Then clearly the description quoted above (from the “return value” section) cannot mean anything other than

YEARFRAC = N * (B − A) / D.

—-

The first description (from the “description” section) requires us to discard an obviously incorrect interpretation (that all years of the interval should be regarded as having length 366 if the interval crosses February 29). Also, it needs an additional primitive function, F29(X,Y), which is zero if the interval does not include February 29, and one if it does. This very need makes it extremely suspect, and I think the only reasonable interpretation of the standard it to regard that description as a holdover from an early draft which was corrected in one place but not another. However, if you exclude the obviously wrong interpretation, it does describe a well-defined calculation, as follows:

When does a year start? Is this based upon local time for the computer running a spreadsheet, or something like UCT? This could make a difference, especially if you are looking at time differences between events at different locations.

re your discussion of start and end – I’d like to point out that number theory implies the reverse of your recommendation – the last day should count – the first should be considered day 0 – not day 1. Here’s why.

1. The definition of ‘last day’ implies that it is part of the range and it is traditional practice to specify COB (Close Of Business) as the time on the last date.

2. Number theory starts counting from 0. This means that the difference between 3 and 5 being 2 (resulting from simple subtraction may seem ambiguous, but consider the border case of the difference between 0 and 3 (3 days delta) – again simple subtraction. By your logic I’d have to subtract 1 from 3 (because 3 is not part of the range), resulting in a difference of 2, then add one because subtraction inherently does not count the ‘first’ day. Finally, the difference between 0 and 0 (same day) is still 0. (Remember – it was the Phoenicians’ invention of 0 that finally made mathematics into a viable tool for counting and accounting work.)

Finally, consider payments on a loan. This is one use of the difference between two dates. Counting the first day as part of the range makes sense as does that last day and everything hangs together (we’re talking delta-days here – not absolute days) provided that you start with day 0 instead of day 1. Paying off a loan on the first day (as day 1) results in unnecesary complications to your equations which also results in more potential for error. Treating the first day as day 0 cleans up a lot of issues and simplifies your logic.

Agreed. Exclude the first day and include the last, especially if this is the standard practice. I was arguing against the possibility of having both the first and last day in the range, which as I recall some comments suggested was possible.

On the one hand, the ODF TC really would like to finish its work on ODF 1.2, and part of that is completing the OpenFormula work. A key remaining part of OpenFormula is to ensure that our financial functions synch up with how Excel works.

I can appreciate how you strive for accuracy. You really want Excel spreadsheets converted into ODF to return the same numbers which Excel returns. But Excel often gives wrong answers.

If I was writing a new standard, I would want that standard to produce correct answers. It’s ridiculous that leap year bugs for example should be preserved forever.

So why not do this: there should not be a single YEARFRAC function. Rather, there should be, for example, EXCEL98_YEARFRAC, EXCEL2007_YEARFRAC, DIS29500_YEARFRAC and so on. Each function would return exactly the values returned by the YEARFRAC function on the specified platform, bugs and all. You’d also write a ISO26300_YEARFRAC function which fixes all known bugs. Of course, if some bugs are subsequently found in your own function you can add a new ISO26300v2_YEARFRAC function which fixes the previous.

When converting a document from Excel 2007 or any other spreadsheet, function references to YEARFRAC are modified to refer to the platform-specific function name. That will ensure that imported spreadsheet calculations still return the same values.

The user (spreadsheet editor) will be able to make a conscious decision about whether to retain the platform-specific function names (and their bugs) or change to different functions which aren’t buggy.

Definitions of the EXCEL2007_* functions and so on seem to be hard to nail down. These could be documented as extensions to ODF, with the proviso that a document is ODF compliant only if it uses only published extensions.

Publication of the extensions can be via written specification and/or by a reference implementation. A reference implementation would contain code in some well-defined language which could be used as a plugin to any ODF compatible tool. The extension could be named by a URL and the tool would download the URL and compile (or interpret) the code directly.

Yes, it makes a much bigger function list. But the details can be published separately. And you need to be able to preserve the different bugs of Excel 98, Excel 2007 and so on, without crippling people who _don’t_ want to use functions containing those bugs.

I’ve worked on date codes at a financial software company and as a quant at a bank. You’d be mad to trust Excel to get date arithmetic correct. And there are way more day count basis types than the few that Excel supports. I can think of four different 30/360 types.

Try googling “On the accuracy of statistical procedures in Microsoft Excel” for another area that’s been broken for a long time.

Speaking as a front-office (dealing room) software developer of (blimey) about 20 years now, I can attest that I have never seen any of the functions mentioned used in anger in an investment banking context. Which appears to be a Good Thing.

As mentioned briefly above, the functions are inadequate in their coverage of day count bases and many other “features” of the real-world financial environment. Because some cases can’t be handled at all, there’s little interest in checking Excel’s accuracy at all in this area. There’s ample money at stake for the banks to invest in making sure they can handle everything, and to be able to extend the libraries as innovation occurs within the markets.

I’d suggest that this is a large part of why the problems have yet to make it into a feature list at MS – the users whose voice would be loudest in complaining aren’t interested.

Frankly, these calculations are far too important to be left to Microsoft, which is a good thing for the Redmond folk, I’d suggest.

Fact = Factor to multiply the annual coupon rate (CR) by to get the accrued interest (AI). It is often N / Den.

N = Number of “days” between Y1.M1.D1 and Y2.M2.D2. We say “days” because this number can vary depending on the day count convention (DCC). This is often referred to as the number of days in the accrual period.

Den = The denominator for conventions where the Fact is N / (some number of days). It is often referred to as “number of days in the year”, though this is a bit of a misnomer.

Nnl = Number of days between Date1 and Date2 that are not in a leap year.

Nly = Number of days between Date1 and Date2 that are in a leap year.

Y1.M1.D1
Starting date of the interval, aka Date1. For a security with periodic payments, this is the coupon payment date immediately preceding Y2.M2.D2.

Y2.M2.D2 = Ending date of the interval, aka Date2. For accrued interest on a trade, this will be settlement date. For end-of-day income/expense accounting, it is the day through which valuation is determined (often the business date following the valuation date).