I just wanted to say hello to the forum! As I am sure you realise, Paul and I have been working hard on the 64-bit version of FTN95 - but I confess I have stayed away from forum discussions so far! I am extremely grateful for Paul's efforts here.

I started work on the 64-bit version of the product back in the days of Salford Software. Back then, if you remember, there were two competing 64-bit architectures - Intel's Itanium (a horribly complex architecture IMHO), and AMD's x86-64 (or whatever they called it back then). This complicated our plans, and for a while we thought the answer was .NET - which would have targeted both architectures for free.

Eventually we did start on native compilation for the AMD architecture.

However, as many of you know, FTN95 was transferred to Silverfrost, and I moved on to consultancy. In mid 2014 Paul and I made the decision to finish the project in retirement - better late than never!

Hopefully we are close now - 64-bit CHECKMATE is almost ready for you all to test, and the optimiser will hopefully be available before too long!

Thanks for the update, and thanks to both you and Paul for the work you have put into FTN95 /64. I have found the change from /32 to /64 to be very easy, and surprisingly reliable even at this early stage.

I am especially interested in the capabilities that /OPT will provide. I assume that features of the /32 /OPT version will be replicated where appropriate. Has there been a review of the optimisation features provided in FTN95 versus those in other compilers? I am hoping that there may be some other optimisation features, better suited to /64, that could be included.

I reviewed a number of the Polyhedron benchmark tests to see what was causing FTN95 to perform poorly.

In some cases it was just poor coding, especially in the use of array sections; in others, temporary array copies were being generated more often by FTN95. (I must admit that iFort's use of strides to avoid temporary arrays for array sections scares me a lot, as it changes the old Fortran convention that a subroutine argument provides the memory address of a contiguous array.)
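
A minimal sketch of the kind of call I mean (names are illustrative): passing a non-contiguous array section to an explicit-shape dummy forces the compiler to copy the section into a contiguous temporary, pass its address, and copy it back afterwards.

```fortran
! Copy-in/copy-out for a non-contiguous array section (illustrative).
program section_copy
   implicit none
   real :: a(100,100)
   a = 1.0
   ! Row section: elements are 100 reals apart in memory, so a
   ! contiguous temporary copy must be made for the call below.
   call scale_row(a(5,1:100), 100)
contains
   subroutine scale_row(x, n)
      integer, intent(in)  :: n
      real, intent(inout)  :: x(n)   ! explicit-shape dummy: expects contiguous memory
      x = 2.0 * x
   end subroutine scale_row
end program section_copy
```

An assumed-shape dummy, `x(:)`, is what allows a compiler to pass the stride instead of making the copy.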

Another common one was X**real, eg X**2.0, which can easily be fixed.
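
To illustrate the difference (a sketch, not FTN95-specific): an integer exponent can be strength-reduced to multiplications, while a real exponent generally goes through the general power routine.

```fortran
program power_demo
   implicit none
   real :: x, y
   x = 3.0
   y = x**2.0   ! real exponent: generally evaluated via the exp/log route - slow
   y = x**2     ! integer exponent: can be strength-reduced to x*x - fast
   print *, y
end program power_demo
```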

Identifying repeated groups of calculations was another.
I think an area where FTN95 does not compare well is long calculations that require many registers, eg repeated lines in chemistry calculations. This may come down to identifying repeated sub-expressions involving unchanged values, although I learnt to avoid this coding approach many years ago. The alternative approach, writing code that documents the formula and letting a smart compiler optimise it, is good for auditing the code, as long as the compiler gets it right.
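
A hand-worked illustration of the trade-off (variable names invented): the "documenting" style repeats sub-expressions and relies on the compiler's common-subexpression elimination, while the hand-optimised style hoists them explicitly.

```fortran
! Style 1: documents the formula; repeats (a+b) and sqrt(c),
! relying on the compiler to eliminate the common sub-expressions.
e1 = (a+b)*sqrt(c) + d/sqrt(c)
e2 = (a+b)/sqrt(c) - d*sqrt(c)

! Style 2: hand-hoisted sub-expressions - the coding approach
! I learnt to use many years ago.
s  = a + b
rc = sqrt(c)
e1 = s*rc + d/rc
e2 = s/rc - d*rc
```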

Another clean-up I identified: large local or automatic arrays should not be placed on the stack. They should be handled via a hidden ALLOCATE. Stack overflows should not occur because of local arrays.
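
What I mean, sketched (names illustrative): an automatic array lives on the stack and can overflow it for large N, whereas an allocatable goes to the heap.

```fortran
subroutine work(n)
   implicit none
   integer, intent(in) :: n
   real :: tmp_stack(n)             ! automatic array: placed on the stack,
                                    ! so a large n can cause a stack overflow
   real, allocatable :: tmp_heap(:)
   allocate (tmp_heap(n))           ! heap allocation: limited only by memory
   tmp_stack = 0.0
   tmp_heap  = 0.0
   deallocate (tmp_heap)            ! (allocatables are also freed automatically)
end subroutine work
```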

I finally gave up on the review: about a third of the problems could be cleaned up with better array structures, another third by identifying repeated or unnecessary calculations, but a significant proportion were simply complex code that other smart compilers could pull apart. I was left thinking that re-writing this type of legacy code is a very bad approach, and that optimising compilers have a definite place for this style of code.

Vector instructions via /SSE and /AVX are a very good example of where significant performance improvement can be achieved on modern hardware. Other smart compilers make this easy with array syntax, or by identifying suitable inner-loop calculations. FTN95 should develop this capability where possible.
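
The sort of inner loop I have in mind (an illustrative sketch): both forms below can map onto SSE/AVX packed operations, processing 4 (SSE) or 8 (AVX) single-precision elements per instruction.

```fortran
! Array syntax: the compiler can see the whole operation at once.
c(1:n) = a(1:n) + s*b(1:n)

! Equivalent DO loop: a smart compiler recognises the same pattern
! and emits packed add/multiply instructions.
do i = 1, n
   c(i) = a(i) + s*b(i)
end do
```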

Unfortunately, optimisation is an area that generates a lot of reported compiler bugs, be it the fault of the compiler or of non-conforming code that used to work.

I would be interested if there could be more discussion of the FTN95 /64 /OPT features and if there are possibilities of other enhancements that the /64 instruction set may readily provide.

John

PS: could an option /32 be provided? While it is the default, it would be a good form of documentation in the compile statement. Also, /net (or .net) could be another option.

Let me make a few general points about optimisation in the 64-bit environment.

We never introduced SSE instructions into 32-bit optimised compilations, because of a concern that the precision of the answers might change slightly from those obtained with the old coprocessor register stack. At 64 bits we have dropped support for REAL*10, which required the coprocessor register stack, and have focused on the SSE instructions for all FP operations (except a few intrinsics). In /opt mode we certainly use parallel operations for certain loops, such as dot-product-like situations.
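
The dot-product-like situation referred to is the classic reduction loop (sketched below): partial sums can be kept in SIMD lanes and combined with a final horizontal sum.

```fortran
! Reduction loop of the kind that can be executed with packed SSE
! instructions: several products are accumulated per instruction,
! with the partial sums combined at the end.
s = 0.0
do i = 1, n
   s = s + a(i)*b(i)
end do
! Equivalently: s = dot_product(a(1:n), b(1:n))
```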

Some of your comments, such as the one about x**2, seem to relate to a fairly early version of FTN95/64.

In general the old tree-level optimisations have been carried through to 64-bits but there is a new process that re-arranges the code within inner loops so as to keep as many scalars in registers as possible, and eliminate a lot of instructions.

My aim is to get to the point where /opt goes through all the code that we use for the main test suite. However, as you imply, optimisation is very sensitive to particular combinations of features, so it can be hard to remove all glitches!

As you may know, the 64-bit instruction set is huge, and grows over time! Although we have not yet encoded every instruction, the 64-bit CODE/EDOC in-line assembler feature is already pretty powerful.

BTW, I hope you understand that I'll probably only post occasionally on this forum, because I work best without too many distractions!

Glad to see you here and working at full speed on this great compiler, which started years ago amid the explosion of the Fortran business and has remained the best in many, many ways. I remember how, after getting FTN77 around 1990, I installed each new Fortran compiler from other companies and thought they had all been made by Neanderthals. LOL.

I still remember how in your DBOS, which stands for David Bailey Operating System, it was possible to parallelise tasks ... and that was in the DOS era! That was shocking. Of course it was not real parallelisation, but it was a move in the right direction. I also remember how I tried to switch to Cray supercomputers, but found their compiler so dull and primitive that with FTN77 I achieved more. Then I moved to the heart of supercomputer use, to sites which often had the top 5 machines, all the latest and greatest, and I still kept using Salford.

The only negative was its 2-3x price tag, which made it an elitist compiler. There were also no attempts to penetrate the US market, which likes everything "el cheapo" and "for 10% less than the major brands". That in the end translated into a small user base and way too many bugs in each newly introduced feature, specifically in the move on from Fortran 77. My impression was that the design approach adopted was, like Microsoft's, "shoot first, ask questions later" and "release fast, fix bugs later".

Yes, Paul has made great efforts during these years. I think he has earned the right to be called "Mr. Fortran", taking the title from Intel's Steve Lionel, who held it for the previous decades.

If you guys finish the 64-bit version, optimise it, and make it parallel, this compiler will be used for many decades. It would also be great if you started looking for younger programmers, to select the best successors who will carry the great genes of this compiler into the next generations!