With this one, the system enters suspend mode, but upon trying to resume, the system is unresponsive: the backlight doesn't come on, there is no display, the keyboard is unresponsive, and the system doesn't respond to pings on the network.

This failure is due to the lack of a sufficiently large virtual address range in the vmalloc area to satisfy what the driver requests at suspend to store GPU objects. A quick Google search turns up many reports of nouveau being a vmalloc hog even during normal operation, never mind suspend, so it may be doing itself in here. Or there could be other drivers contributing to heavy vmalloc usage. I don't know whether there are any tools to analyze how the kernel vmalloc space is being used; I'll look to see if I can find any.

Realistically, the only options are probably to either reduce vmalloc usage or increase the vmalloc size. The simple solution is to follow the kernel's advice and pass vmalloc=<size> on the kernel command line (maybe start with 128M and go from there). Another (much more complicated) potential solution would be to see whether it's possible to unmap the driver's MMIO space at suspend and remap it at resume.
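To confirm which vmalloc= value a boot actually used, a small Python sketch like the following could read it back from /proc/cmdline. The helper names are hypothetical, and the sketch assumes the value is a plain number with an optional K, M or G suffix, as in vmalloc=128M; the kernel accepts a few other forms that this sketch does not handle.

```python
from typing import Optional

# Assumed suffix set: K, M and G only (the kernel accepts more).
_SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(value: str) -> int:
    """Convert a size such as '128M' or '262144K' into bytes."""
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in _SUFFIXES:
        return int(value[:-1]) * _SUFFIXES[suffix]
    return int(value)  # no suffix: plain byte count


def vmalloc_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the vmalloc= size in bytes, or None if it is not present."""
    size = None
    for token in cmdline.split():
        if token.startswith("vmalloc="):
            # If the parameter is repeated, this sketch keeps the last one.
            size = parse_size(token.split("=", 1)[1])
    return size


def running_kernel_vmalloc(path: str = "/proc/cmdline") -> Optional[int]:
    """Read the running kernel's command line and extract vmalloc=."""
    with open(path) as f:
        return vmalloc_from_cmdline(f.read())
```

On a boot with vmalloc=128M, running_kernel_vmalloc() should return 134217728 (128 * 1024 * 1024); None means the parameter wasn't passed and the default size is in effect.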

1- I tried increasing vmalloc (going as high as 256M). The system then freezes when I run pm-suspend: it doesn't go into suspend, but the keyboard and mouse stop responding, and the only remaining option is to reboot.

2- Same behavior when using the 64-bit kernel (in fact, a whole new 64-bit installation): it freezes, becomes unresponsive, and I have to reboot.

With these proprietary drivers, the system successfully suspends and comes back from resume with some garbling on the screen. Maximizing the terminal (F11) basically "sweeps" the display and makes it usable, although the background itself turns white. So it's better and possibly usable, though it might need some work; more importantly, it confirms your diagnosis that nouveau is keeping the system from suspending successfully.

/proc/vmallocinfo shows all the vmalloc mappings. We could take a look at that to get an idea of what's consuming the address space. The VmallocTotal, VmallocUsed, and VmallocChunk fields in /proc/meminfo would also be useful to look at.
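For a quick per-driver picture, something like the following Python sketch could sum the size column of /proc/vmallocinfo by calling function. It assumes the usual layout (address range, size in bytes, then the caller as symbol+offset/length, often followed by a [module] tag) and that it runs as root, since the file is normally readable only by root; the function names are made up for illustration. The caller shown is the function that created the mapping, which may be a shared helper (for example a TTM or ioremap wrapper) rather than the driver itself.

```python
from collections import Counter


def totals_by_caller(text: str) -> Counter:
    """Sum the size column of vmallocinfo-style text per caller."""
    totals: Counter = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Expect "start-end size caller ..."; skip anything else.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        caller = "unknown"
        if len(fields) > 2 and "=" not in fields[2]:
            caller = fields[2].split("+", 1)[0]  # drop +offset/length
            # Symbols from modules are usually followed by "[module]".
            if len(fields) > 3 and fields[3].startswith("["):
                caller += " " + fields[3]
        totals[caller] += int(fields[1])
    return totals


def print_top(path: str = "/proc/vmallocinfo", n: int = 15) -> None:
    """Print the n largest consumers in MiB (reading the file needs root)."""
    with open(path) as f:
        totals = totals_by_caller(f.read())
    for caller, size in totals.most_common(n):
        print(f"{size / (1 << 20):8.1f} MiB  {caller}")
```

Running print_top() as root lists the biggest consumers first, which is usually enough to see whether nouveau or something else dominates.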

Do you get any kind of panic message or anything else when the system freezes after you increase vmalloc? Try running pm-suspend from within vt1 to see if anything shows up on the screen. You might also try using magic sysrq when it's frozen, try alt-sysrq-p to get a dump of the current task's state.
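Alt-SysRq-P only does something if the kernel was built with magic SysRq support and the runtime setting allows it. The sketch below checks a kernel.sysrq value against the bitmask documented for /proc/sys/kernel/sysrq: 0 disables SysRq, 1 enables every function, and otherwise bit 8 enables the debugging dumps that 'p' belongs to. The function names are hypothetical.

```python
# Bit from the kernel's SysRq documentation: 8 enables debugging dumps
# of processes, which is the group the 'p' (show registers) key is in.
SYSRQ_ENABLE_DUMP = 8


def dump_allowed(value: int) -> bool:
    """Return True if a kernel.sysrq value permits Alt-SysRq-P."""
    if value == 1:  # 1 enables every SysRq function
        return True
    return bool(value & SYSRQ_ENABLE_DUMP)  # 0 disables everything


def current_sysrq(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the running kernel's sysrq setting."""
    with open(path) as f:
        return int(f.read().strip())
```

If dump_allowed(current_sysrq()) is False, running `echo 1 | sudo tee /proc/sys/kernel/sysrq` before the suspend attempt enables all SysRq functions until the next reboot.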

That's over 80 MB from nouveau (and I omitted several others of a couple of pages each). Some other areas are using sizable amounts as well, the most notable of which is audio. meminfo shows the amount of vmalloc used as 138284 kB, and if it's anywhere near that on a 32-bit install it's not hard to see why it might start having problems.
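As a rough worked check of that last point, assume the 32-bit x86 vmalloc area is about 128 MiB (131072 kB; the usable range reported as VmallocTotal is usually somewhat smaller). Then 138284 kB of vmalloc use overshoots it by 138284 - 131072 = 7212 kB, roughly 7 MiB, before fragmentation is even considered. The sketch below pulls the Vmalloc* fields out of /proc/meminfo text and does that comparison; the 128 MiB budget is an assumption, not a value read from the kernel, and the names are hypothetical.

```python
# Assumption: a 32-bit x86 kernel reserves about 128 MiB for vmalloc by
# default; the usable range shown as VmallocTotal is usually a bit smaller.
ASSUMED_32BIT_VMALLOC_KB = 128 * 1024


def vmalloc_fields(meminfo_text: str) -> dict:
    """Return the Vmalloc* fields of /proc/meminfo text, in kB."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.startswith("Vmalloc") and rest.split():
            fields[name] = int(rest.split()[0])
    return fields


def overshoot_kb(used_kb: int, budget_kb: int = ASSUMED_32BIT_VMALLOC_KB) -> int:
    """How far used_kb exceeds the budget (negative means headroom left)."""
    return used_kb - budget_kb
```

For example, reading /proc/meminfo into vmalloc_fields() on the 64-bit install and passing its VmallocUsed value to overshoot_kb() gives a quick sense of whether the same workload could fit in a 32-bit vmalloc area.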

OK, so I reinstalled the 32-bit kernel. However, I'm unable to switch to a VT to see whether a panic message appears while suspending. When I press, say, Ctrl+Alt+F1, the graphical cursor disappears but the rest of the desktop stays visible; it looks like the display isn't being reset to text mode, so I can't see what I'm typing (and certainly no debugging messages). If I press Alt+F7 I "go back" to graphical mode, the cursor reappears, and the screen is responsive again.

The only way I found to get to a console was to use xforcevesa nomodeset, though of course that probably changes things in other ways (like not using the nouveau driver). The system enters suspend and, of course, upon resuming the screen is blank (with no backlight). Other than that, however, the system appears to have recovered: I was able to ssh in and retrieve the dmesg output I'm attaching.

That's really strange that you can't switch vts. We may be looking at more than one bug here. Does the same thing happen if you run 'sudo chvt 1' in a console?

Are the meminfo and vmalloc dumps from a boot with xforcevesa nomodeset? Because it really isn't using much of the vmalloc area, so I'd be surprised to see the vmap failures from the original "won't suspend" problem in that situation.

It might be best to try to attack the various issues one at a time. Starting with the vmalloc problem, I guess the first thing is for you to tell me whether the meminfo/vmalloc information you just supplied is from a "xforcevesa nomodeset" or not. If it is, I'd be interested to see the same information with the default kernel command-line.

The next step is probably to test the 2.6.38.3 mainline build since that's closest to natty's kernel and will tell us which direction we need to start looking to track down the regression. You can grab this build at:

I will test the mainline build you suggested and report back as soon as I have something.

Finally, for the VT switching problem: I ran the command you suggested and got the same behavior (i.e. I can't switch to the VT). However, I've seen this on one other system (a Dell Vostro 3400 with dual graphics, which uses the intel driver), so I'll do some more testing on that and report it as a separate bug.

I tested the 2.6.38-3-natty mainline kernel. The system boots, is usable, and enters suspend mode, but upon attempting to resume it is unresponsive: the backlight doesn't come up, there's no display, the keyboard is unresponsive, and the system doesn't respond to pings on the network. At one point I saw Caps Lock flashing, but it stopped after a bit.

I also tested v2.6.35.12-maverick, with this kernel the system boots but when trying to enter graphical mode becomes unresponsive, no keyboard, network ping or display (screen is black). Maybe some Ubuntu-specific modifications are what enabled the actual shipped Maverick kernel to work?

Okay, that vmalloc information is more like what I expected. Space is pretty tight.

It's interesting that the 2.6.38.3 mainline build doesn't have the same problems with suspend. So to start we can try to figure out what's different there that's making suspend fail in natty. Can you grab the vmalloc and meminfo dumps with the mainline kernel so we can check if anything there is significantly different? And I'll scan our patches on top of mainline to see if anything jumps out as potentially related.
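Once dumps from both kernels are available, the comparison can be reduced to per-caller deltas. A minimal sketch, assuming each input is a dict mapping caller to total bytes for one boot (for instance built by summing the size column of /proc/vmallocinfo per caller on each kernel); the function name and the 1 MiB noise threshold are illustrative choices, not anything from the thread.

```python
def diff_totals(before: dict, after: dict, min_bytes: int = 1 << 20) -> list:
    """Return (caller, delta_bytes) pairs whose change is at least min_bytes.

    Inputs map caller -> total bytes for one boot; the result is sorted by
    the size of the change, largest first.
    """
    callers = set(before) | set(after)
    deltas = [(c, after.get(c, 0) - before.get(c, 0)) for c in callers]
    big = [d for d in deltas if abs(d[1]) >= min_bytes]
    return sorted(big, key=lambda d: abs(d[1]), reverse=True)
```

Passing the mainline totals as `before` and the natty totals as `after` makes positive deltas mean "natty uses more", which points straight at candidate patches.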

The vmap differences can be almost completely attributed to a patch we're carrying to the snd-hda-intel driver to increase the audio buffer size to "improve the audio experience." I'll look into it but I'd guess there's a good reason for the patch.

Moving on to the next problem: I'd like to see what's going on with the suspend hang when you increase the vmalloc size, but working VTs would help. Did you file a bug for the VT problem?

Since you don't have VTs, you could try booting in recovery mode (with no_console_suspend) and running pm-suspend to see whether it works there and whether you get any interesting output. You can also try some of the steps in the following wiki pages to see if they yield anything useful.

I also tried no_console_suspend in combination with vmalloc=256M. When I issue pm-suspend from a terminal (I still can't switch to a VT), the system "freezes" (same behavior as in #6). I see no useful messages :(

Thanks for testing. If that PCI id is accurate then it seems we're still looking for some kind of problem with the nouveau driver. When you see the hang, is your caps lock led blinking?

I think my wording for one of my suggested test cases was confusing. I think it would be useful if you could boot into recovery mode by holding left-shift when booting to get the grub menu and selecting the "recovery mode" boot option. Also modify the kernel command-line when you boot into recovery mode to include "vmalloc=256M no_console_suspend". This should boot you to a text-mode terminal where you can run pm-suspend without the graphical UI in the way. Chances are that you'll either get working suspend-resume or won't see any useful output, but it's worth a try.
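The command-line change itself is simple string editing. This hypothetical helper shows the intended result (replacing any existing vmalloc= rather than adding a second one), plus a check that no_console_suspend took effect after booting, assuming the kernel exposes the printk console_suspend parameter at /sys/module/printk/parameters/console_suspend. The actual edit is still done by hand at the GRUB menu (select the recovery entry and press 'e').

```python
def with_debug_params(cmdline: str, vmalloc: str = "256M") -> str:
    """Return cmdline with vmalloc=<size> and no_console_suspend added.

    Any existing vmalloc= setting is replaced rather than duplicated.
    """
    tokens = [t for t in cmdline.split() if not t.startswith("vmalloc=")]
    tokens.append(f"vmalloc={vmalloc}")
    if "no_console_suspend" not in tokens:
        tokens.append("no_console_suspend")
    return " ".join(tokens)


def console_suspend_enabled(
        path: str = "/sys/module/printk/parameters/console_suspend") -> bool:
    """Read the printk parameter; 'N' means no_console_suspend is active."""
    with open(path) as f:
        return f.read().strip() == "Y"
```

If console_suspend_enabled() still returns True after booting, the parameter didn't make it onto the command line and the console will be silenced during suspend as usual.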

I repeated the suspend to make the system hang again; there's no flashing Caps Lock :(

Also, I tried your suggestion to boot into single-user text mode. There seems to be a problem initializing the display (I'm attaching a picture of what I see), and nothing I type shows up. What's odd is that the system is "alive": again, I can type blind and even issue commands to reboot the system, bring up X (which comes up just fine), or suspend it in that state. Of course, this isn't useful, as I still can't see any text on the console :( I tried the nosplash kernel parameter, but the display behaved the same.

Boy, that machine just has all sorts of problems with graphics. Kind of makes it hard to decide what to pound on first.

I checked, and we aren't carrying any patches to nouveau in the natty kernel. So I'm a bit puzzled why you see different behavior with natty (with vmalloc size increased) versus mainline unless it's due to something external to nouveau. Looking closer at the pm_trace it seems that it really only traces device resume and not suspend, so it's a bit interesting that pm_trace showed anything at all. Makes me wonder if something went wrong elsewhere in suspend and then the machine hung in the nouveau code while trying to back out of the failed suspend.
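For reference, the pm_trace workflow is roughly: write 1 to /sys/power/pm_trace, suspend, and after the hang reboot soon, because the trace hash is stored in the RTC (so the clock will be wrong afterwards). The next boot's dmesg then reports what matched. The sketch below filters dmesg output for those lines; the marker strings are an assumption based on typical output and may differ between kernel versions, and the function name is hypothetical.

```python
# Assumed message fragments; exact wording may vary by kernel version.
PM_TRACE_MARKERS = ("Magic number", "hash matches")


def pm_trace_lines(dmesg_text: str) -> list:
    """Return the dmesg lines that carry pm_trace results."""
    return [line.strip() for line in dmesg_text.splitlines()
            if any(marker in line for marker in PM_TRACE_MARKERS)]
```

Feed it the output of `dmesg` from the first boot after the hang; an empty result suggests the trace was lost or never written.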

Probably the most effective thing to do at this point is to open upstream bug reports against the issues you see when running a mainline build. If you don't mind filing the bugs yourself it might be easier since you actually have the hardware, otherwise I can do it. The appropriate location for the upstream bug reports is:

No updates. My suggestion in comment #24 was to start working with upstream on the issue. I offered to help with filing the bug upstream if needed but haven't seen any response from Daniel. If possible I think it's more efficient for him to interface directly since I don't have the hardware, but I'm willing to be an intermediary or at least file the initial bug report.

Expected result:
- The system enters sleep mode and upon pressing the power switch, resumes successfully

Actual result:
I tried this with the following kernels (all mainline):

2.6.39-999.201104080911 - The system enters suspend mode, but upon trying to resume, the system is unresponsive: backlight doesn't come on, there is no display, keyboard is unresponsive, and system doesn't respond to pings on the network.

2.6.38-3-natty mainline kernel. The system boots and is usable, enters suspend mode, but upon attempting to resume is unresponsive, the backlight doesn't come up, no display, keyboard is unresponsive and system doesn't respond to pings on the network. At some point I saw capslock flashing but it stopped after a bit.

v2.6.35.12-maverick mainline kernel, with this kernel the system boots but when trying to enter graphical mode becomes unresponsive, no keyboard, network ping or display (screen is black).

Here is some relevant information on the system, please let me know if any more tests are needed.

With this kernel I'm still seeing faulty resume behavior as described for 2.6.38-3 in comment #15 (it enters suspend but fails to resume). However, I am able to switch to a VT when the system is working (i.e. before suspend). So it's probably not worth filing an upstream bug for the VT issue as I'll likely be told that it's been solved.

I could file an Ubuntu bug for the VT problem, as it's still present on the latest proposed Natty kernel (2.6.38-9) but NOT, as I mentioned, on mainline.

Let me know if it makes sense for me to do this to also keep track of the VT switching problem.

I'd suggest testing the latest .38 stable mainline build (2.6.38.7) to see if it's been fixed there. If it has it will filter into natty eventually. If it's not fixed go ahead and file an Ubuntu bug and we'll try to get the fix into natty. Thanks!

I installed Oneiric from the 2011-07-15 image, and retried the sleep test on this machine.

It now enters sleep and the power light starts blinking. After a while I press the power button to wake it up, but the system is then unresponsive: no backlight, no display, no response to the keyboard (Caps Lock doesn't react), and no active network connection.

So behavior has changed but is still bad :(

I could try installing the proprietary nvidia drivers, though I'm not sure they work with Oneiric yet.

I can also try the power management debug tools from cking, though as of two days ago, they were known to not work on Oneiric.

I have the same graphics card in a Dell Studio XPS 1330 and get exactly the same freeze after waking up from suspend-to-RAM (suspend-to-disk works fine).
I tried several older kernels (Debian sid) and hit the same bug every time.

/var/log/pm-suspend.log does not contain any lines corresponding to the wake-up.

If I run just pm-suspend, without systemtap debugging, the system behaves as originally reported, apparently entering suspend mode, but when I press the power button it fails to resume and just becomes unresponsive.

I also tried the latest mainline, 3.2.0-rc2 kernel from Ubuntu's mainline repository. Behavior is the same as originally reported. So this bug is still an issue.

I'm attaching dmesg output. It was a bit difficult to obtain: the system is still unresponsive after a resume, so I'm unable to fetch the file from that state.

However, if I run the suspend through systemtap, it manages to detect the failure to suspend and doesn't die. This dmesg was obtained after running the systemtap s3test script; hopefully it contains something useful.

I also updated the freedesktop bug with our latest tests on kernel 3.2.

Daniel: Sorry, I should have been more specific. I was intending to ask for the trace from the systemtap case, since I was looking at the messages from the systemtap logs.

It does look like the same or an extremely similar failure: the vmalloc area doesn't have a sufficiently large address range to satisfy the allocations nouveau makes during suspend.

But I don't think what you see with systemtap is the same bug as what you're seeing without it. The bug you see with systemtap is recoverable (you've seen it recover in fact) and happens during suspend. The bug without it is on the resume side and isn't recovering. As a result I don't think systemtap is adding any useful information for this bug.

Have you tried a serial console to see if you can get more data during resume? I assume you don't have a serial port on the machine, but sometimes you can still get data with console on a USB serial adapter. It may or may not work, but if you have access to an adapter it might be worth trying.

Marcin: I take it you are referring to failure to suspend due to vmalloc failures in the nouveau driver? Indeed, Daniel has verified that this problem can be avoided by using a 64-bit kernel or by passing vmalloc=128M to the kernel (see the reference bug in Launchpad for details).

When that issue is avoided the system does appear to suspend successfully, but then hangs when resuming.