User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
This is the test case:
http://seanwhalen.home.comcast.net/xslt_test/
That page basically just loads the file "Source20.xml" and transforms it with
"RevealBombs.xsl". The XML is 225 elements with 6 attributes each. In Internet
Explorer the transform takes 1.9 seconds; in Firefox it takes about 9. The
JavaScript in the index.html page launches alerts that show the elapsed milliseconds.
The XSL uses a 9-part equation with "OR" conjunctions, and I think the
slowness is in how that is processed. I tried using separate variables and
union operators, and those didn't help; the times for Firefox and IE stayed
the same.
This is the full graphical implementation of the data:
http://seanwhalen.home.comcast.net/sweeperscript/
which is, more or less obviously, an XSLT version of Minesweeper.
Reproducible: Always
Steps to Reproduce:
1. Load http://seanwhalen.home.comcast.net/xslt_test/
2. Wait for the alert boxes to announce the XML and the elapsed time.
3. Compare with IE.
Actual Results:
As described above: the transform takes about 9 seconds in Firefox versus 1.9
seconds in Internet Explorer.
Expected Results:
I'd like the times to be better, because the game just isn't playable in
Firefox right now.

So on trunk, the test url spins the CPU for a good long time... I profiled it;
the results are:
Total hit count: 287165
Count %Total Function Name
18700 6.5 _PR_x86_AtomicIncrement
18533 6.5 nsGenericElement::GetAttrNameAt(unsigned, int*, nsIAtom**,
nsIAtom**) const
14208 4.9 nsAttrAndChildArray::GetSafeAttrNameAt(unsigned) const
13682 4.8 PR_AtomicIncrement
12244 4.3 _PR_x86_AtomicDecrement
9816 3.4 nsCOMPtr_base::~nsCOMPtr_base()
9754 3.4 __i686.get_pc_thunk.bx
9618 3.3 nsAttrAndChildArray::IndexOfChild(nsIContent*) const
8767 3.1 txXPathTreeWalker::moveToValidAttribute(unsigned)
8265 2.9 txNameTest::matches(txXPathNode const&, txIMatchContext*)
7673 2.7 AtomImpl::Release()
7508 2.6 AtomImpl::AddRef()
7179 2.5 nsCOMPtr_base::begin_assignment()
6343 2.2 nsAttrAndChildArray::MappedAttrCount() const
6072 2.1 txXPathNodeUtils::getLocalName(txXPathNode const&)
5363 1.9 PR_AtomicDecrement
etc.
More interestingly:
Total hits: 287165
Hits under AtomImpl::AddRef: 31302
Hits under AtomImpl::Release: 23173
Almost all the addrefs are from nsGenericElement::GetAttrNameAt (and the
releases are when the COMPtr we got the atom into goes away).
In fact, total hits under GetAttrNameAt: 71493
These break down as:
18533 in nsGenericElement::GetAttrNameAt
27586 AtomImpl::AddRef()
12652 nsAttrAndChildArray::GetSafeAttrNameAt(unsigned) const
8939 PR_AtomicIncrement
1155 nsAttrAndChildArray::MappedAttrCount() const
1049 __i686.get_pc_thunk.bx
815 _init
764 __i686.get_pc_thunk.bx
For a more top-down view, of the 287165 total hits we have 188281 under
FilterExpr::evaluate and 98804 under PathExpr::evaluate. These both end up
going through RelationalExpr::evaluate (286539 hits under here). Then I get
lost in the maze of XSLT code.
I'll attach a profile, but maybe we can start by deCOMtaminating getAttrNameAt?
Maybe it can return a struct that holds two nsIAtom pointers? Or return a weak
nsIAtom and out another weak one, with some clear documentation? Or even return
an nsAttrName* (and null for bogus incoming index)?

Some of this is bug 208172. But then we may want a good non-COM method
to get the local name of an atom. I did an LXR search on call sites of
GetAttrNameAt, and far from all of them use the prefix, so having a
non-addrefing method that returns only the localname and namespace ID might be worth it.
One thing we may be able to optimize right away inside the tx code is to get a
qname struct in txNameTest::matches, because we currently end up calling
GetAttrNameAt twice: once for the namespace, once for the localname.
So for each attr we end up with 2 (4) addref/release cycles, instead of:
- none with a special entry point
- 1 (2) with a qname struct.
The numbers in parentheses are for namespaced attrs, which we don't have in this
case.
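To make the cost concrete, here is a minimal, self-contained sketch (toy stand-ins, not the real nsIAtom/nsGenericElement classes) contrasting the COM-style out-parameter accessor with the proposed struct-of-weak-pointers return:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for nsIAtom that counts refcount traffic, so the
// difference between the two API styles is visible.
struct Atom {
    int refs = 1;
    int addrefCalls = 0;
    void AddRef()  { ++refs; ++addrefCalls; }
    void Release() { --refs; }
};

// Weak-pointer pair, in the spirit of the qname-struct idea.
struct AttrName {
    Atom* localName;
    Atom* prefix;   // null for non-namespaced attributes
};

struct Element {
    std::vector<AttrName> attrs;

    // COM-style accessor: addrefs every out atom on every call; the
    // caller's smart pointer releases it again moments later.
    bool GetAttrNameAtCOM(size_t i, Atom** local, Atom** prefix) {
        if (i >= attrs.size()) return false;
        *local = attrs[i].localName;
        (*local)->AddRef();
        *prefix = attrs[i].prefix;
        if (*prefix) (*prefix)->AddRef();
        return true;
    }

    // Struct-return accessor: hands out weak pointers, no refcounting.
    const AttrName* GetAttrNameAt(size_t i) const {
        return i < attrs.size() ? &attrs[i] : nullptr;
    }
};
```

Calling the COM-style version twice per attribute (once for the namespace, once for the localname, as txNameTest::matches effectively does) costs two addref/release cycles per non-namespaced attribute; the struct-return version costs none.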

Actually, the GetAttrNameAt problem is worsened by the fact that
txXPathTreeWalker::moveToNextAttribute calls GetAttrNameAt to check that the
attribute isn't an xmlns one.
That would be solved by txNamedAttributeStep, but an alternative solution is to
move the xmlns checks to a more optimized place, making it the responsibility of
the caller of moveToNextAttribute to filter them out.

Not really.
http://seanwhalen.home.comcast.net/xslt_test/RevealBombs.xsl has expressions
like
"//click[last()]/@h"
"//square[@h=$hClick and @v=$vClick]"
"//square[ (@isRevealed = 0 and not (@sqID = $alreadyRevealed/@sqID )) ]"
so txNamedAttributeStep is just part of it. It may help a few parts, but the
real buggers are inside the // part and the simple predicates that don't need
the whole context.

In this nice little thing:
<xsl:variable name = "revealing" select = "$field[
@isBomb != -1 and @isRevealed = 0 and
(
(concat(@h -1 ,'/', @v ) = $zeros/@sqID)
or (concat(@h +1 ,'/', @v ) = $zeros/@sqID)
or (concat(@h -1 ,'/', @v -1) = $zeros/@sqID)
or (concat(@h -1 ,'/', @v +1) = $zeros/@sqID)
or (concat(@h ,'/', @v ) = $zeros/@sqID)
or (concat(@h +1 ,'/', @v +1) = $zeros/@sqID)
or (concat(@h +1 ,'/', @v -1) = $zeros/@sqID)
or (concat(@h ,'/', @v +1) = $zeros/@sqID)
or (concat(@h ,'/', @v -1) = $zeros/@sqID)
)] " />
I bet we would be immensely helped by txNamedAttributeStep, since all the
attribute steps inside the concat would then just end up calling getAttr rather
than moveToNamedAttr etc.
The @sqID call is trickier to optimize, though, since it could potentially
return a nodeset with many nodes, in which case we actually do need to walk to
the attribute node (note how comparisons with multi-node nodesets work).
What we could do is detect the case where the nodeset only contains one
node. But that would be very tricky here and would have to be done at
runtime, i.e. we'd have to check that the variable only contains one node and
that that node does in fact have an sqID attribute. In most cases this would
just be a waste of cycles compared to actually getting the nodeset.
Hmm.. one possibility is to have a function that returns either a nodeset, if
the result contains multiple nodes, or a string, if the result will only
contain one node. Something like Expr::evaluateToMultiNodeSetOrString. Again,
that might be adding a lot of code for a small gain.
A better approach is probably to make the @sqID step evaluate as quickly as possible.
One thing to do is to add an early bail to moveToNamedAttr that calls hasAttr
and bails if that returns false. Another would be to add
nsIContent::IndexOfAttr(PRInt32, nsIAtom*); that would help enormously but is
fairly XSLT-specific.
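A rough sketch of the early-bail idea (simplified, hypothetical names; the real walker keeps far more state): check for the attribute's existence with a cheap scan before doing any of the heavier positioning work:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for the real tree-walker machinery.
struct Attr { int nsid; std::string localName; };

struct Elem {
    std::vector<Attr> attrs;

    // Cheap existence check: one pass over the array, no node setup.
    bool HasAttr(int nsid, const std::string& name) const {
        for (const Attr& a : attrs)
            if (a.nsid == nsid && a.localName == name) return true;
        return false;
    }
};

struct Walker {
    const Elem* elem = nullptr;
    int attrIndex = -1;   // >= 0 when positioned on an attribute

    // Early bail: if the attribute isn't there, return false before
    // doing any of the (in the real code, more expensive) positioning.
    bool moveToNamedAttribute(int nsid, const std::string& name) {
        if (!elem || !elem->HasAttr(nsid, name))
            return false;                    // the cheap bail-out
        // Only reached on a hit: find the index and position on it.
        for (size_t i = 0; i < elem->attrs.size(); ++i) {
            if (elem->attrs[i].nsid == nsid &&
                elem->attrs[i].localName == name) {
                attrIndex = static_cast<int>(i);
                return true;
            }
        }
        return false;  // unreachable given the HasAttr check
    }
};
```

The win comes from the (common) miss case: a square with no matching attribute costs only the cheap scan, never the positioning work.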
We could also, of course, try to detect that there are several identical
"$zeros/@sqID" calls and break that out so it's evaluated only once. This might
be hard to do without spending way too much time comparing expressions, though.
And you could argue that in most cases the author will do that himself if he
suspects the expression is slow to evaluate, so we'd rarely actually find
duplicated expressions worth bothering about.

This chunk actually calls for a different optimisation: evaluating
$zeros/@sqID, @h and @v once in advance and using pseudo variables inside
the predicate. I'd defer thoughts like this, though. Let's focus on making the
easy parts fast first.
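The hoisting idea can be illustrated outside the engine. Instead of re-evaluating $zeros/@sqID for each of the nine concat() comparisons per square, evaluate it once into a set and test the nine neighbour ids against it (a sketch with a hypothetical data layout, not the real tx classes):

```cpp
#include <cassert>
#include <set>
#include <string>

// One <square> element's relevant attributes (toy model).
struct Square { int h; int v; };

// Build the "h/v" id string the stylesheet builds with concat().
static std::string sqId(int h, int v) {
    return std::to_string(h) + "/" + std::to_string(v);
}

// "$zeros/@sqID" is evaluated once into zeroIds by the caller; then the
// stylesheet's nine comparisons (the square itself plus its eight
// neighbours) become cheap set lookups instead of nine re-evaluations
// of the nodeset expression.
static bool touchesZero(const Square& s, const std::set<std::string>& zeroIds) {
    for (int dh = -1; dh <= 1; ++dh)
        for (int dv = -1; dv <= 1; ++dv)
            if (zeroIds.count(sqId(s.h + dh, s.v + dv)))
                return true;
    return false;
}
```

This is exactly the "evaluate once in advance" pseudo-variable idea: the cost of building the set is paid once per transform, not once per comparison.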

(In reply to comment #3)
> Some of this is bug 208172. But then we may want to have a good nonCOM method
> to get the local name of an atom. I did an lxr search on call sites to
> GetAttrNameAt, and far from all use the prefix, so having a non-addref
> only-localname and namespace ID method might be worth it.
How about we make GetAttrNameAt() return a struct by value, where the struct
holds weak nsIAtom pointers? No refcounting needed at all then, and no separate
API for the case where the caller wants one or both of the atoms.

That idea is as simple as it is brilliant :)
We even have such a struct: nsAttrName. We could just change the signature to be:
const nsAttrName* GetAttrNameAt(PRUint32 aIndex);
That should be almost as fast as IndexOfAttr and will benefit all callers of
GetAttrNameAt.

Lol, sorry, I missed that. There's too much going on these days...
So that along with bug 208172 will probably make a big enough difference here
that we should just do that and get a new profile and reevaluate once that is done.

Hmm.. wanted to get one more idea in here just so I don't forget. We could
change txXPathNode to hold an atom+nsid for the attribute instead of an index.
That way a @foo step would simply call HasAttr and set the atom+nsid. The
downside is that it'll make the class bigger and cause more refcounting when
we're standing on an attribute. And we'd have to have some other way of getting
all the attributes of an element for @* type steps.
Actually, the more I think about it the less I like it. If it turns out that
@foo steps are still perfcritical after all else is done we should just add
nsIContent::IndexOfAttr.
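For reference, the proposed nsIContent::IndexOfAttr could look roughly like this (a sketch with toy types; the signature mentioned above takes a PRInt32 namespace id and an nsIAtom*):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy attribute storage; in Gecko this would be nsAttrAndChildArray.
struct TAttr { int nsid; std::string localName; };

struct Content {
    std::vector<TAttr> attrs;

    // A single call answers both "is the attribute there?" and
    // "where?", so a @foo step needs no separate HasAttr plus
    // attribute-by-attribute walk.
    int IndexOfAttr(int nsid, const std::string& localName) const {
        for (size_t i = 0; i < attrs.size(); ++i)
            if (attrs[i].nsid == nsid && attrs[i].localName == localName)
                return static_cast<int>(i);
        return -1;   // not found
    }
};
```

The point is the API shape, not the loop: one pass over the attribute array instead of the GetAttrNameAt-per-index pattern the profile shows.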

I don't know if my commenting on and voting for this bug will help, but I hope it does. I too have been bitten by Firefox taking a long time to process XSLT.
http://blogs.ebusiness-apps.com/dave/?p=45 shows a benchmark comparison between Firefox and IE for Ajax related data transformations.
And the following suggests that the XSL engine is in a neglected state:
"It is because, the internal XSL engine (transformMix) didn't got any
improvement since Keith from Mitre gave it to the Mozilla project."
- http://www.biglist.com/lists/xsl-list/archives/200603/msg00086.html
And I have personally tried Opera 9.0 Beta, which provides XSLT support, and it performs significantly better than Firefox. It is perhaps on par with IE, if not better, but I haven't gotten around to quantifying that.
I hope that something can be done before Firefox 2.0 gets out. Even a partial or a quick fix would be something to be grateful for.

Someone claiming that the XSLT engine hasn't gotten any improvements since Keith left the project does not know what they are talking about at all. Such a claim is just dumb.
Performance is steadily increasing by a lot. We still have the occasional patch that doubles the performance of certain common stylesheets (like the docbook one). So I hope you're testing with a recent build when you complain about performance.
Frankly, in most cases performance is good enough that the XSLT is no longer a bottleneck. The benchmark you are pointing at doesn't seem to be pure XSLT vs. pure XSLT, but adds scripting into the mix, no?
If you have problems with a stylesheet I strongly encourage you to file a new bug and attach your stylesheet. There is no lack of ideas for how to improve performance, only a lack of time and a matter of prioritizing. The question is not what we can improve, but what is worth improving, and that is much easier to answer if we get good stylesheets to profile.

(In reply to comment #16)
Sorry, my mistake.
I didn't check when the transformation completed versus when the new document fragment was appended to the document. The transformation itself was fast; appending the document fragment to the document was where performance took a hit, at least for my application.

(In reply to comment #19)
> So what are we timing here? The time to the first alert? Or something else?
>
Technically, the bug report is about the time between the 2nd and 3rd alert boxes. The 2.0 RC version of Firefox posted last week has a duration of about 8.5 seconds, while IE 6.0.29 has a duration of less than 3 seconds.

> the time between the 2nd and 3rd alert boxes
Ah, ok. I didn't realize the third box had this time in it, because its top was always cut off over here.... ;)
So when I test Firefox 2.0RC2, I get times in the 20.7 second (plus/minus 0.2 seconds) range on my computer (P3-733).
When I test today's trunk Firefox nightly (think Firefox 3 pre-alpha), I see times in the 5.5 second (plus/minus 0.3 seconds) range. So if things scale linearly to your machine, you should be seeing times comparable to IE on trunk.
I'll attach a trunk profile of the time between the second and third alert.

Created attachment 242358
Profile with current trunk
Highlights:
34239 hits under txMozillaXSLTProcessor::TransformToFragment (that's about 4-some seconds).
This seems to break down (digging down a bit; all hit counts are "under", not "in") as:
9023 RelationalExpr::compareResults
8711 txNamedAttributeStep::evaluate
6979 txNodeSet::addAndTransfer
2198 txAExprResult::Release
2384 txSingleNodeContext::recycler
(the Release() and recycler() calls spend about 50% of their time "in", btw).
compareResults spends about half its time under txXPathNodeUtils::appendNodeValue and the rest in things like nsSubstring::SetLength, etc.
txNamedAttributeStep::evaluate spends almost a third of the time under txXPathTreeWalker::moveToNamedAttribute (lots of calls to GetAttrNameAt, looks like). txResultRecycler::getNodeSet is a good bit too.
txNodeSet::addAndTransfer seems to be mostly under txNodeSet::add, which is back to txXPathNodeUtils::comparePosition. Lots of auto void array construction, IndexOf calls, etc.

It definitely looks like optimizing '//foo' type expressions will help a lot. That should decrease the last three numbers in the previous comment, since we'll do a lot less node shuffling.
One piece of good news is that the recycler seems to be working, because we spend next to no time in malloc.
Other than that, nothing immediately jumps out at me. Possibly switching the recycler and some other code to nsTArray, since it's more inlined and some of the array operations might be faster.