During the 8.x cycle we've introduced several known performance regressions compared to Drupal 7, which we need to resolve before release so that Drupal 8 isn't slower than Drupal 7.

This doesn't mean every single regression needs to be individually optimized away - in some cases it might not be possible to do that, or trying to micro-optimize it won't be worth the extra effort.

However there are places where we've introduced something with the intention that it will allow us to make performance or scalability enhancements elsewhere (like blocks as sub-requests for example), and introducing the performance regression without getting the nice performance feature at the end of it puts us in a not very happy place.

Opening this as a meta-issue to try to track those regressions as they get committed, along with the issues that are attempting to resolve them. Making this a critical task since I'm not prepared to release Drupal 8 with obvious performance regressions compared to Drupal 7. It was bad enough from 6 to 7 and we shouldn't do that again.

Criteria for critical performance issues

A performance issue is critical by itself if some of the following are true:

There is a concrete performance issue identified by profiling (MySQL, PHP or browser equivalent) and a viable plan to resolve it

It can't be committed to a patch-level version (8.0.0 => 8.0.1)

Over ~100ms or more savings with cold caches and would have to be deferred to a minor version

Over ~10ms savings with warm caches and would have to be deferred to 9.x

Over ~1ms or more with the internal page cache and would have to be deferred to 9.x

Gets measurably worse with lots of contrib modules or large data sets (e.g. non-indexed queries) and would have to be deferred to a minor version

Other specific issues at branch maintainer discretion
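
The criteria above can be read as a small decision rule. Here is a rough sketch of one possible reading, not an official triage tool. The `PerfIssue` fields and the `is_critical` function are hypothetical names. The sketch assumes that the first two criteria are prerequisites, that at least one threshold must then be met, and that "deferred to a minor version" also covers anything deferred to 9.x. Branch maintainer discretion is not modeled.

```python
from dataclasses import dataclass


@dataclass
class PerfIssue:
    """Hypothetical description of a performance issue (illustrative fields)."""
    profiled_with_plan: bool   # found by profiling, with a viable plan to fix it
    deferred_to: str           # earliest release that could take the fix:
                               # "patch" (8.0.1), "minor" (8.1.0) or "major" (9.x)
    cold_cache_savings_ms: float = 0.0
    warm_cache_savings_ms: float = 0.0
    page_cache_savings_ms: float = 0.0
    scales_badly: bool = False  # worsens with many modules or large data sets


def is_critical(issue):
    """Apply the thresholds from the criteria list (one assumed interpretation)."""
    # Prerequisites: a concrete, profiled issue that cannot ship in a patch release.
    if not issue.profiled_with_plan or issue.deferred_to == "patch":
        return False
    minor_or_later = issue.deferred_to in ("minor", "major")
    major_only = issue.deferred_to == "major"
    return (
        (issue.cold_cache_savings_ms >= 100 and minor_or_later)
        or (issue.warm_cache_savings_ms >= 10 and major_only)
        or (issue.page_cache_savings_ms >= 1 and major_only)
        or (issue.scales_badly and minor_or_later)
    )


# A 150ms cold-cache saving that needs a minor release qualifies.
print(is_critical(PerfIssue(True, "minor", cold_cache_savings_ms=150)))
# A 20ms warm-cache saving only qualifies if it would slip to 9.x.
print(is_critical(PerfIssue(True, "minor", warm_cache_savings_ms=20)))
```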

Working spreadsheet

High priority issues

Several new core APIs have lost optimizations from Drupal 7 and earlier where multiple objects could be loaded with a single request from the database/cache (e.g. compare CMI to variable_init()). The following issues attempt to add some kind of multiple load/pre-loading/CacheCollector support to those systems:

All are related to performance. Is it possible we can reduce one or more to a "major" task (only 83 of those atm) and hinge them off this critical, rather than taking up 4 slots in the critical task queue?

#10 was viewing the front page with 5 node teasers. I just ran some numbers again (ab -c1 -n100) with no front page content at all (just hitting the front page immediately after a Standard profile install).

7.x HEAD: 61ms
8.x HEAD: 222ms (+264%)

If someone gets a substantially different ratio on a different machine, please share.

- I tried it again today, and got 230ms. Not sure if the extra 8ms is due to HEAD changes since #11, or random factors on my computer.
- If in _node_add_access(), I hard code a return FALSE at the top, that drops it down to 185ms. That isolates the effect of #1979094: Separate create access operation entity access controllers to avoid costly EntityNG instantiation, and lets us look for other causes of regression.
- If in my settings.php, I uncomment the $settings['class_loader'] = 'apc'; line, that drops it down to 166ms. Yay for there being an easy way to remove autoloader inefficiency!
- 382 PHP files are loaded to show an anonymous home page with no content. And that's even with the early return in _node_add_access() mentioned above. Yowza! I thought that simply the require on that many files was a huge factor. But it turns out not to be. Changing my apc.stat configuration to 0 was able to shave off 5ms. And timing a script that just does a require on those files turned out to only be another ~10-20ms. I need to rewrite and rerun the script to get a more precise number, and will post that when I do, but the good news is that simply loading all that extra Symfony code and OOP Drupal code isn't where our biggest problems are.
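
For anyone comparing their own runs, the percentages can be recomputed from the raw timings. This is just a small sketch of the arithmetic, assuming all figures were taken on the same machine and are compared against the 61ms 7.x number. The labels in the dictionary are mine, not official names.

```python
BASELINE_7X_MS = 61.0  # 7.x HEAD figure quoted in the thread

# 8.x timings quoted in the thread (labels are informal).
MEASUREMENTS_8X_MS = {
    "8.x HEAD": 222.0,
    "8.x HEAD, rerun": 230.0,
    "early return in _node_add_access()": 185.0,
    "APC class loader enabled": 166.0,
}


def percent_slower(baseline_ms, current_ms):
    """Relative slowdown of current_ms against baseline_ms, in percent."""
    return (current_ms - baseline_ms) / baseline_ms * 100.0


for label, ms in MEASUREMENTS_8X_MS.items():
    print(f"{label}: {ms:.0f}ms (+{percent_slower(BASELINE_7X_MS, ms):.0f}%)")
```

The first line of output reproduces the +264% quoted for 8.x HEAD.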

@effulgentsia: I still can't reproduce your numbers, not even remotely. Can you provide some more information about your setup?

With uid 1 and no nodes on the frontpage, I get "Executed 137 queries in 10.14 ms. Queries exceeding 5 ms are highlighted. Page execution time was 134.94 ms. Memory used at: devel_boot()=4.61 MB, devel_shutdown()=13.81 MB, PHP peak=14 MB." That varies a bit, but not a lot. ab on frontpage is 96.918ms, a 404 page is 54ms.

- Is this a laptop, with/without power plugged in? (I have huge differences with and without power, @dawehner for example didn't)
- Is xhprof/xdebug enabled?
- How many queries, how long do they take? I do have a somewhat optimized mysql configuration and my queries are quite fast, given the number of them.
- When you test the front page, that means we still have to load and execute the view, and that's a considerably higher overhead than the old node_default_page() which was just a single query. A lot of that is one time overhead and is less and less relevant as you display more views/content. Might make more sense to compare a page that hasn't changed that much, e.g. 404.
- I'm also not seeing a big difference when I add the return FALSE to node_access(), possibly that's due to the entity field definitions cache that was committed today.
- Can you check how #1786490: Add caching to the state system and #1971158-15: Follow-up: Add loadMultiple() and listAll() caching to (cached) config storage affect those numbers? The second one only gets interesting with a lot of config files and configurations so you will probably not see a big difference with that but it's huge with real, large sites.

We could also compare Drupal 6 + views with an empty node view, to Drupal 7 front page, which would allow us to quantify more of the non views related changes (of course views is changed also in D7, but I think the performance profile is probably still pretty similar).

I have also used the login form as a comparative benchmark in the past - it does a bit more work than a 404.

But if we don't fear the results of a realistic comparison, we really should define a number of representative configurations (e.g., first impression, basic site, typical site, feature-rich site, data-intensive site) plus a few targets plus two or three concurrency levels, and then start profiling all of these both continuously and automatically.

If we don't define fair profiling configurations ourselves, slightly simplistic comparisons that don't take D7 contrib into account, like the ones by Yannis or fgm, or by completely inexperienced people, will make performance parity an impossible goal, and might finally hurt our reputation regarding performance.
Even more now that Symfony2 hit the headlines for being an exceptionally slow framework. Would be nice to demonstrate that we're selectively leveraging "the best" from different PHP frameworks and are not bound to be even slower.

In the end, I'd really like to see a graph that nicely displays how performance improves from week to week, and in a few cases the D8 configuration would outperform the D7 one, in others it would stay behind, but altogether it would remain comparable. That should be our goal.

I'm currently working on automating Drupal 8 builds via Chef and Vagrant, and I should be done with that today or tomorrow. At that point I'll be building out a basket of representative D7 vs. D8 performance test sites over the coming weeks. My current targets include:

@Eronarn:
That really sounds awesome!
Out of the blue, I can't exactly say which configurations would be the most relevant and correct, but we should have at least one multilingual configuration that extensively uses Entity translation, i18n and all the additional stuff we don't need anymore in D8.

Generally, we should leverage some of the more popular contrib modules that have been included into D8 core or which aren't necessary anymore. These come to mind immediately:
WYSIWYG + CKEditor, Date module, Entity API, Entity reference, Entity Translation, Views, Profile2, Context, Administration Menu, Diff, RESTful API... what else?

Thanks for the reminder about translation. That definitely wasn't on my radar, but is an important consideration. I will include a frontend performance monitoring component of this, so it should also be feasible to monitor node editing performance, including WYSIWYG.

Has anyone heard of Drush Make being ported to D8? Drush itself is fine, but the latter doesn't seem very functional right now. I could just tarball an entire site, but it'd be nice to have something more easily versioned that I can point people to.

It would be great to have a benchmark target (or targets) that could be used for different benchmarking and instrumentation activities. I think the "Dynamic site" is probably the biggest win, since it is the kind of site that causes most scaling challenges (in addition to just page load performance). Brochure type sites rarely have scaling challenges in my experience (although .

Given the rate of D8 development, I was assuming a script to configure the content structure and populate dummy content is pretty much a requirement - I doubt a database snapshot will last for long before schema changes break it. Not sure I understand using drush make with D8 yet - are there sufficient stable & API chasing contrib modules to make this worthwhile?

I'd prefer scripting using Drush Make plus some post-processing setup script leveraging Drush because that means the build is more standardized and easier to contribute to. It's not a requirement by any means, just intended to make it easier for people other than me to contribute to the build (pretty annoying to do if it's a huge git tree with all of core in it). If anyone has other suggestions, I'm totally open.

I don't think there are many ported contrib D8 modules yet (it's a pretty miserable process - I did this for Tracelytics/TraceView for DrupalCon and it already needs extensive rewrites). However, I want this to be something that will be run over the course of several months (probably will start off with the alpha releases but maybe move to nightlies if enough people are interested in setting up nodes), and I'm hopeful that we'll see more D8 contrib alphas and betas by then.

EDIT: Note that drush already works with cron, devel generate, etc. in D8. So that part of the scripting will be trivial.

Awesome - totally agree that a scripted setup is what we need - probably devel generate will need some love. I think the make file will pretty much just be 2 lines that point at core (for D8 anyway, at least to start with), but it will do the job just fine :)

I wonder if it would be best to split performance test targets out as a separate issue (if there isn't an existing one), since this is supposed to be meta.

Like the memory usage?:)
(wasn't able to disable xdebug - had some nice segfaults once I did :P buggy 2.4.4 still, so this should affect stuff)

I just did this for fun, mostly to check PHP 5.5 and the Zend opcode cache. I found it interesting, so I posted it. I know that we can't actually compare vanilla D8 and D7, and also that tests should be run with some content.
I am only posting it because I found the memory usage interesting (which means that the OOP and autoloading stuff seems to work).

With Apache 2.4.4, I'm guessing that the req/s you're seeing here isn't Drupal delivering the page, it's the built-in cache of Apache 2.4 (similar to nginx or Varnish). I know different servers will get different results, but there's no way Drupal/PHP/MySQL is going to deliver 7800 req/s, even with Drupal's page cache.

Instead of, or in addition to, APC, perhaps Zend Opcache should be tested, as the PHP team included opcache instead of APC in PHP core (meaning no one will use APC after 5.5 :D) https://blogs.oracle.com/opal/entry/using_php_5_5_s
I already installed php-opcache easily with yum for PHP 5.3, for example.

ISTR seeing on php-internals discussion about how the APC allocator was specifically optimized for its opcode caching tasks, and was not a good fit for user caching because of the difference in cache access patterns. The currently existing commits do not appear to have changed the logic, focusing on the removal of opcode-related features and general cleanup.

@Damien Tournoud: the stream wrapper idea is nice in theory, but implementing one means lots of low-level methods to implement, although most of them are individually rather simple. But the very fact that they are needed suggests lots of inter-method (hence user-space) calls.

This is a much more involved interface than a class loader, and I do not see what other parts of our code base could make use of this particular name space to justify the extra code involved. Of course, as always, this would need to be benchmarked against the alternatives.

In the interest of keeping this issue focussed so we can work from it, I went ahead and removed some issues from the summary that were either no longer relevant or not directly related to a performance regression in d8.

FYI, a small D8 Performance team has started meeting weekly to make progress on these issues. We use this Google Doc to help us prioritize, assign, etc. You can see issues that we recently fixed in that doc. If anyone wants to join the meeting, please contact me.

A more recent performance comparison using Drupal 8 alpha10. This includes a D7 standard install that is running "breakpoints, ctools, ckeditor, contact, edit, libraries, entity, views, views_ui" modules for a more accurate comparison.

- Those numbers claim that non-cached pages as anonymous user are slower than they are for an admin, that seems weird? If that is really the case then that sounds like there might be a bug.
- Did you actually replace the frontpage with a view in 7.x? Just enabling views isn't enough, if you want to do a fair comparison, you will also need to have a view on the frontpage instead of the much faster 7.x node listing.
- Did you have any content on that site? I think that with the render cache now enabled for 8.x this might give 8.x a chance to catch up, especially when #2099131: Use #pre_render pattern for entity render caching is done. Will only help when the page cache is disabled of course.
- For the page cache case, I started two issues in Szeged that allow returning from the page cache without having to load configuration: #2228215: Remove module check in DrupalKernel and #1576322: page_cache_without_database doesn't return cached pages without the database.

Also, testing with concurrency is problematic, because it depends heavily on how many CPUs you have and how many requests you can do in parallel: you're testing your system, not Drupal. I suggest you don't do parallel requests, as that will give you a much better idea of how long it takes to return one request.
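
To illustrate the point about concurrency: ab derives its mean "Time per request" from throughput and concurrency, so a higher req/s figure at -c 10 does not mean each request got faster. A minimal sketch of that relation follows. The function name and the sample throughput numbers are made up for illustration and are not measurements from this thread.

```python
def mean_latency_ms(requests_per_second, concurrency):
    """Mean time a single request takes, given throughput and concurrency.

    Uses the relation: time per request = concurrency / throughput.
    """
    return concurrency * 1000.0 / requests_per_second


# Hypothetical results from two runs against the same page.
runs = [
    {"concurrency": 1, "rps": 10.0},
    {"concurrency": 10, "rps": 40.0},
]

for run in runs:
    latency = mean_latency_ms(run["rps"], run["concurrency"])
    print(f"-c {run['concurrency']}: {run['rps']:.0f} req/s, "
          f"{latency:.0f}ms per request")
```

With these made-up numbers the -c 10 run has four times the throughput, yet each request takes longer. That reflects the machine's capacity for parallel work, which is why -c 1 is the better indicator of how long Drupal takes to build one page.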

Those numbers claim that non-cached pages as anonymous user are slower than they are for an admin, that seems weird? If that is really the case then that sounds like there might be a bug.

It does not surprise me. That's already what happens in Drupal 7 on a lot of sites I had to profile, because admin users don't trigger a lot of the access checks that anonymous or normal users would.

You're very unlikely to get a locking/concurrency issue with a read only test though. Needs authenticated users, posting of forms etc. to create the situations that can result in actual scaling issues. Also sometimes larger data sets etc. Can't use ab for that.

I've started bumping some individual performance issues to critical. I think we need some kind of criteria to define 'critical performance issue'.

Not differentiating between front end and PHP here for now, and as with any issue we need to balance impact vs. disruption.

Unless something is atomic-level in that it prevents actual usage of the site due to slowness or memory requirements, it wouldn't normally be critical in itself, however Drupal 8 performance is currently significantly slower than Drupal 7, and this is due to lots of major issues that combined are covered by this meta.

We know from various sources that 100ms affects user perception and we also know from experience that Drupal core rarely has individual issues that result in 100ms of saving just by themselves. Therefore we need to ensure that enough 'major' issues are fixed prior to 8.0.0 and/or can be fixed soon afterwards, otherwise this meta stays open indefinitely with no definite end point.

I think we should consider promoting a performance issue to critical in its own right only if the following is true:

There is a concrete performance issue identified by profiling (MySQL, PHP or browser equivalent) and a viable plan to resolve it

It can't be committed to a patch-level version

Over ~100ms or more savings with cold caches and would have to be deferred to a minor version

Over ~10ms savings with warm caches and would have to be deferred to 9.x

Over ~1ms or more with the internal page cache and would have to be deferred to 9.x

This ensures we don't end up with known, unresolvable, high-impact performance issues for the entire 8.x cycle, and focuses attention on those vs. ones that could be fixed easily in an early patch release.

Closing this issue should probably still be based on some kind of comparison against 7.x. Should encompass most or all of rebuilds/module install, cold caches, warm caches, authenticated, anonymous, light and heavy pages and the internal page cache. We don't need to be equivalent everywhere but we should know what the status is otherwise it's impossible to know which critical performance issues might be lurking.

Yes that's missing from #65, we should definitely add "gets measurably worse with lots of contrib modules or large data sets and would have to be deferred to a minor version" as an extra bullet point. Any non-indexed query (except under /admin perhaps) would fall into that for example.

A big performance leap forward over D7 would be examples and tutorials explaining best practices. D5, 6, and 7 had some excellent modules developed with best practices at the start but the word did not spread and many D7 modules, even recent ones, present real problems.

The documentation would not have to be big, just point to examples in some of the optional core modules. The example code could contain references that the doc system uses to generate links into the relevant part of the code. Can the comments have a #performance type tag? @performance?

I am happy to put in the odd day here and there writing docs but the maintenance would be a real pain without something to tie the documentation and example together.

You should check the opcache.max_file_size option. It sets the maximum file size to cache, so big files can be skipped by the opcode cache. However, it defaults to 0, meaning all files will be cached.

The next option to check is opcache.max_accelerated_files. For big projects with Twig and annotations the default value of 2000 is not enough. Consider increasing it.

And the last one is opcache.memory_consumption. I noticed that after reaching this limit, opcache won't add new items to the cache. So, increase it to 256 MB or 512 MB.
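
As a rough way to sanity-check those three settings, here is a small sketch that reads a php.ini fragment and flags values that look too low. The helper names, the 256 MB floor and the `php_file_count` parameter are assumptions for illustration. The fallback values of 2000 and 0 are the defaults mentioned above and may differ between PHP versions. The sketch assumes opcache.memory_consumption is given as a plain number of megabytes.

```python
import configparser


def read_opcache_settings(ini_text):
    """Collect opcache.* directives from php.ini-style text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    # Prepend a dummy section so fragments without a header still parse.
    parser.read_string("[php]\n" + ini_text)
    settings = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            if key.startswith("opcache."):
                settings[key] = value.strip()
    return settings


def opcache_warnings(settings, php_file_count):
    """Return human-readable warnings (hypothetical thresholds)."""
    warnings = []
    max_files = int(settings.get("opcache.max_accelerated_files", 2000))
    if max_files < php_file_count:
        warnings.append(
            f"max_accelerated_files={max_files} is below the "
            f"{php_file_count} PHP files in the project"
        )
    if "opcache.memory_consumption" in settings:
        memory_mb = int(settings["opcache.memory_consumption"])
        if memory_mb < 256:
            warnings.append(f"memory_consumption={memory_mb} MB is below 256 MB")
    max_file_size = int(settings.get("opcache.max_file_size", 0))
    if max_file_size != 0:
        warnings.append(
            f"max_file_size={max_file_size} means larger files are not cached"
        )
    return warnings


SAMPLE_INI = """
[opcache]
opcache.max_accelerated_files=2000
opcache.memory_consumption=64
opcache.max_file_size=0
"""

for warning in opcache_warnings(read_opcache_settings(SAMPLE_INI), 8000):
    print(warning)
```

The file count passed in (8000 here) is a placeholder. In practice it would come from counting the PHP files in the code base, including compiled Twig templates.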

One test not mentioned here is of the cache tagging system. What happens when the cache system searches the cache tag strings for a tag? Would it be better to have a separate tag index? A code sprint day could also teach performance measurement.

So... given we're tracking critical issues related to cacheability, performance regressions, etc. already, I'm a bit confused on whether we still need this issue, and if so, what is the path to get it to fixed?

That being said, yes, this issue is definitely *very* meta, because we don't have concrete performance targets. I.e. we don't have a target of e.g. 50 ms for the front page as an authenticated user, 150 ms for all admin pages, etc. If we had such concrete targets, this would be less "meta". But we've never set such targets, so I'm not sure if we'd want to start doing that now.

This issue comes down to what is acceptable performance for 8.0.x to ship with, and that hasn't really been defined.

Obviously anything the same or better than 7.x (or 6.x) we'd ship with.

In practice more or less everything is going to be slower.

Then it comes down to:

1. How much slower?

2. Are there any mitigating factors?

For example if internal page caching was 30% slower, but had a 10000% higher hit rate in a realistic load test scenario due to cache tags and cache_clear_all() nukage then that trade-off is pretty good. Or if 8.0.x is faster on PHP7 than 7.x is on PHP 5.4 on the same hardware, that also helps.
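
The page cache trade-off can be put into numbers with a simple expected-value calculation. This is only a sketch. Every timing and hit rate below is a hypothetical placeholder, chosen so that the hit is 30% slower and the hit rate is 10000% higher (101 times the old rate). The miss cost is also assumed to be worse on 8.x.

```python
def expected_ms(hit_rate, hit_ms, miss_ms):
    """Average response time for a given page cache hit rate."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


# Hypothetical 7.x site: fast hits, but frequent cache clears keep the hit rate low.
old_hit_rate = 0.005
old = expected_ms(old_hit_rate, hit_ms=2.0, miss_ms=200.0)

# Hypothetical 8.x site: hits are 30% slower and misses cost more,
# but cache tags give a 10000% higher hit rate.
new_hit_rate = old_hit_rate * 101
new = expected_ms(new_hit_rate, hit_ms=2.0 * 1.3, miss_ms=300.0)

print(f"7.x average: {old:.1f}ms at {old_hit_rate:.1%} hit rate")
print(f"8.x average: {new:.1f}ms at {new_hit_rate:.1%} hit rate")
```

Under these assumptions the slower-per-hit system still wins on average, because far fewer requests pay the full miss cost. Real hit rates would have to come from a realistic load test.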

The outstanding issues in #87 (and X number of majors), once resolved, get us to an 8.x baseline for known improvements relative to current (and past) 8.0.x.

geerlingguy (as a volunteer) commented 16 May 2015 at 03:45

@Wim Leers - Will do; I'll rerun the tests in a bit, and update the comment above when that's done.

I've re-run all the tests using only concurrency=1, and the spread seems to be about the same as using -c 10. In general, it seems the numbers are fairly consistent (just more req/s) up to -c 40 or so... at least on this particular VM/configuration with Apache.

So of all the criticals, this one seems the most vague and least clear on what to do next and when we are done.

webchick summed it up pretty well there. I'd like to share my point of view, having led a Performance team for the past 1.5 years (#55).

IMO, we should demote this issue to Major. This issue achieved its main goal, which was to decide on criteria for prioritizing a Perf improvement as Critical. Those criteria are in the IS. Based roughly on those criteria, we currently have 7 critical issues tagged with Performance. Those stand on their own merit as Criticals, and I see little benefit to additionally keeping this Meta as Critical.

There are occasional calls to keep this open as a general check for D7 versus D8 performance regression. IMO that's a misguided ideal:

This issue was never about D7 => D8 comparison. It was about prioritization rules for potential speed improving issues.

Any comparison between the two platforms is arbitrary and subject to a tiresome bikeshed.

Every time I profile a page, I still find actionable critical performance issues, such as #2494987: [meta-6] Reduce cold cache memory requirements (at least some of the sub-issues should probably be independently critical too). So while this issue in itself is not that useful as a meta, we still have critical-but-undiagnosed performance issues in core.

I could see putting this somehow into the pre-RC checklist - since we won't have a final idea what things look like until we're there. But performance is so bad in many places that we risk getting stuck with unresolvable issues due to API changes necessary to fix them.

webchick (at Acquia) commented 29 May 2015 at 04:26

We talked about this on the core committer call today. I might get some details wrong, but here's what I remember:

1) We still do need a critical task (whether it's this one or #2470679: [meta] Identify necessary performance optimizations for common profiling scenarios or whatever) to do a "deep-dive" profiling and figure out where D8 is slow, especially areas that would necessitate a BC break to fix. We (well, everyone but catch) approved a D8 Accelerate grant for catch to do this work, hopefully next week. This doesn't negate the findings of the DevDays sprint, but we've learned some things since then, and also BigPipe/SmartCache is further along.

3) Once both of those are done, we should be able to file/elevate issues individually as critical where it makes sense, and close this meta out. Goal is to do that on or before June 17 (our next core committer call).

That being said, yes, this issue is definitely *very* meta, because we don't have concrete performance targets. I.e. we don't have a target of e.g. 50 ms for the front page as an authenticated user, 150 ms for all admin pages, etc. If we had such concrete targets, this would be less "meta". But we've never set such targets, so I'm not sure if we'd want to start doing that now.

Shall I still merge that other issue with this one, and migrate all child issues from there, to make this issue more actionable?