Here's the combined suspend-to-{RAM,disk} patch for 2.5.17. Suspend-to-disk
is pretty stable and was tested in 2.4-ac. Suspend-to-RAM is a little more
experimental, but works for me, and is certainly better than the disk-eating
version currently in the kernel. Major parts are: process stopper, S3-specific
code, and S4-specific code. What can I do to get this applied?

Pavel
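
For readers skimming the driver-model.txt changes in the patch below, here is
a minimal user-space sketch of the staged suspend sequence that document
describes: each stage is run across all devices, every callback returns 0 on
success, and a failure makes the caller bring everything back. This is only an
illustration under stated assumptions, not code from the patch. The names
demo_device, demo_suspend, demo_resume and demo_suspend_all are hypothetical,
the SUSPEND_NOTIFY name and the enum values are assumed, and -EBUSY is an
arbitrary choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stage names follow driver-model.txt; SUSPEND_NOTIFY and the numeric
 * values are assumptions made for this sketch. */
enum suspend_stage {
	SUSPEND_NOTIFY,
	SUSPEND_DISABLE,
	SUSPEND_SAVE_STATE,
	SUSPEND_POWER_DOWN,
	SUSPEND_NR_STAGES
};

/* Hypothetical device used only for illustration. */
struct demo_device {
	const char *name;
	int fail_at;     /* stage at which suspend fails, or -1 for never */
	int stages_done; /* number of suspend stages completed so far */
};

/* Perform one step of the suspend process; returns 0 on success. */
static int demo_suspend(struct demo_device *dev, enum suspend_stage stage)
{
	if ((int)stage == dev->fail_at)
		return -EBUSY;
	dev->stages_done++;
	return 0;
}

/* Undo whatever suspend stages the device has completed. */
static void demo_resume(struct demo_device *dev)
{
	dev->stages_done = 0;
}

/* Run each stage across all devices. On the first failure, resume every
 * device and return the error; return 0 if all stages succeeded. */
static int demo_suspend_all(struct demo_device *devs, size_t n)
{
	assert(devs != NULL || n == 0);

	for (int stage = SUSPEND_NOTIFY; stage < SUSPEND_NR_STAGES; stage++) {
		for (size_t i = 0; i < n; i++) {
			int err = demo_suspend(&devs[i], (enum suspend_stage)stage);
			if (err) {
				for (size_t j = 0; j < n; j++)
					demo_resume(&devs[j]);
				return err;
			}
		}
	}
	return 0;
}
```

The rollback branch is the part the bracketed comments in the patch argue
about: with a critically low battery there may be no useful way to recover
from a failed suspend, so a real implementation might refuse to fail after the
notify stage.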

--- clean/Documentation/driver-model.txt Sun Mar 10 20:06:28 2002
+++ linux-swsusp/Documentation/driver-model.txt Fri May 3 00:08:35 2002
@@ -52,7 +52,8 @@
 Each bus layer should implement the callbacks for these drivers. It then
 forwards the calls on to the device-specific callbacks. This means that
 device-specific drivers must still implement callbacks for each operation.
-But, they are not called from the top level driver layer.
+But, they are not called from the top level driver layer. [So for example
+PCI devices will not call device_register but pci_device_register.]

 This does add another layer of indirection for calling one of these functions,
 but there are benefits that are believed to outweigh this slowdown.
@@ -60,7 +61,7 @@
 First, it prevents device-specific drivers from having to know about the global
 device layer. This speeds up integration time incredibly. It also allows drivers
 to be more portable across kernel versions. Note that the
-former was intentional, the latter is an added bonus.
+former was intentional, the latter is an added bonus.

 Second, this added indirection allows the bus to perform any additional logic
 necessary for its child devices. A bus layer may add additional information to
@@ -225,7 +226,6 @@
 It also allows the platform driver (e.g. ACPI) to a driver without the driver
 having to have explicit knowledge of (atrocities like) ACPI.

- current_state: Current power state of the device. For PCI and other modern devices, this is 0-3, though it's not necessarily limited to those values.
@@ -251,18 +251,24 @@
 }

 probe:
- Check for device existence and associate driver with it.
+ Check for device existence and associate driver with it. In case of device
+ insertion, *all* drivers are called. Struct device has parent and bus_id
+ valid at this point. probe() may only be called from process context. Returns
+ 0 if it handles that device, -ESRCH if this driver does not know how to handle
+ this device, valid error otherwise.

 remove:
  Dissociate driver with device. Releases device so that it could be used by
  another driver. Also, if it is a hotplug device (hotplug PCI, Cardbus), an
- ejection event could take place here.
+ ejection event could take place here. remove() can be called from interrupt
+ context. [Fixme: Is that good?] Returns 0 on success. [Can we recover from
+ failed remove or should I define that remove() never fails?]

 suspend:
- Perform one step of the device suspend process.
+ Perform one step of the device suspend process. Returns 0 on success.

 resume:
- Perform one step of the device resume process.
+ Perform one step of the device resume process. Returns 0 on success.

 The probe() and remove() callbacks are intended to be much simpler than the
 current PCI correspondents.
@@ -275,7 +281,7 @@

 Some device initialisation was done in probe(). This should not be the case
 anymore. All initialisation should take place in the open() call for the
-device.
+device. [FIXME: How do you "open" uhci?]

 Breaking initialisation code out must also be done for the resume() callback,
 as most devices will have to be completely reinitialised when coming back from
@@ -324,6 +330,7 @@

@@ -352,9 +360,9 @@
 Instead, the walking of the device tree has been moved to userspace. When a
 user requests the system to suspend, it will walk the device tree, as exported
 via driverfs, and tell each device to go to sleep. It will do this multiple
-times based on what the system policy is.
-
-[ FIXME: URL pointer to the corresponding utility is missing here! ]
+times based on what the system policy is. [Not possible. Take ACPI enabled
+system, with battery critically low. In such state, you want to suspend-to-disk,
+*fast*. User maybe is not even running powerd (think system startup)!]

Device resume should happen in the same manner when the system awakens.

@@ -366,22 +374,25 @@
 cannot resume the hardware from the requested level, or it feels that it is too
 important to be put to sleep, it should return an error from this function.

-It does not have to stop I/O requests or actually save state at this point.
+It does not have to stop I/O requests or actually save state at this point. Called
+from process context.

SUSPEND_DISABLE:

 The driver should stop taking I/O requests at this stage. Because the save
 state stage happens afterwards, the driver may not want to physically disable
-the device; only mark itself unavailable if possible.
+the device; only mark itself unavailable if possible. Called from process
+context.

SUSPEND_SAVE_STATE:

 The driver should allocate memory and save any device state that is relevant
-for the state it is going to enter.
+for the state it is going to enter. Called from process context.

SUSPEND_POWER_DOWN:

-The driver should place the device in the power state requested.
+The driver should place the device in the power state requested. May be called
+from interrupt context.

 For resume, the stages are defined as follows:
@@ -389,25 +400,27 @@
 RESUME_POWER_ON:

 Devices should be powered on and reinitialised to some known working state.
+Called from process context.

RESUME_RESTORE_STATE:

 The driver should restore device state to its pre-suspend state and free any
-memory allocated for its saved state.
+memory allocated for its saved state. Called from process context.

 Each driver does not have to implement each stage. But, it if it does
-implemente a stage, it should do what is described above. It should not assume
+implement a stage, it should do what is described above. It should not assume
 that it performed any stage previously, or that it will perform any stage
-later.
+later. [Really? It makes sense to support SAVE_STATE only after DISABLE].

 It is quite possible that a driver can fail during the suspend process, for
 whatever reason. In this event, the calling process must gracefully recover
-and restore everything to their states before the suspend transition began.
+and restore everything to their states before the suspend transition began.
+[Suspend may not fail, think battery low.]

 If a driver knows that it cannot suspend or resume properly, it should fail
 during the notify stage. Properly implemented power management schemes should
--- clean/Documentation/swsusp.txt Sat Jan 5 21:39:47 2002
+++ linux-swsusp/Documentation/swsusp.txt Fri May 3 00:08:35 2002
@@ -0,0 +1,159 @@
+From kernel/suspend.c:
+
+ * BIG FAT WARNING *********************************************************
+ *
+ * If you have unsupported (*) devices using DMA...
+ * ...say goodbye to your data.
+ *
+ * If you touch anything on disk between suspend and resume...
+ * ...kiss your data goodbye.
+ *
+ * If your disk driver does not support suspend... (IDE does)
+ * ...you'd better find out how to get along
+ * without your data.
+ *
+ * (*) pm interface support is needed to make it safe.
+
+You need to append resume=/dev/your_swap_partition to kernel command
+line. Then you suspend by echo 4 > /proc/acpi/sleep.
+
+[Notice. Rest docs is pretty outdated (see date!) It should be safe to
+use swsusp on ext3/reiserfs these days.]
+
+
+Article about goals and implementation of Software Suspend for Linux
+Author: Gábor Kuti
+Last revised: 2002-04-08
+
+Idea and goals to achieve
+
+Nowadays it is common in several laptops that they have a suspend button. It
+saves the state of the machine to a filesystem or to a partition and switches
+to standby mode. Later resuming the machine the saved state is loaded back to
+ram and the machine can continue its work. It has two real benefits. First we
+save ourselves the time machine goes down and later boots up, energy costs
+real high when running from batteries. The other gain is that we don't have to
+interrupt our programs so processes that are calculating something for a long
+time shouldn't need to be written interruptible.
+
+On desk machines the power saving function isn't as important as it is in
+laptops but we really may benefit from the second one. Nowadays the number of
+desk machines supporting suspend function in their APM is going up but there
+are (and there will still be for a long time) machines that don't even support
+APM of any kind. On the other hand it is reported that using APM's suspend
+some irqs (e.g. ATA disk irq) is lost and it is annoying for the user until
+the Linux kernel resets the device.
+
+So I started thinking about implementing Software Suspend which doesn't need
+any APM support and - since it uses pretty near only high-level routines - is
+supposed to be architecture independent code.
+
+Using the code
+
+The code is experimental right now - testers, extra eyes are welcome. To
+compile this support into the kernel, you need CONFIG_EXPERIMENTAL,
+and then CONFIG_SOFTWARE_SUSPEND in menu General Setup to be enabled. It
+cannot be used as a module and I don't think it will ever be needed.
+
+You have two ways to use this code. The first one is if you've compiled in
+sysrq support then you may press Sysrq-D to request suspend. The other way
+is with a patched SysVinit (my patch is against 2.76 and available at my
+home page). You might call 'swsusp' or 'shutdown -z <time>'. Next way is to
+echo 4 > /proc/acpi/sleep.
+
+Either way it saves the state of the machine into active swaps and then
+reboots. You must explicitly specify the swap partition to resume from with ``resume=''
+kernel option. If signature is found it loads and restores saved state. If the
+option ``noresume'' is specified as a boot parameter, it skips the resuming.
+Warning! Look at section ``Things to implement'' to see what isn't yet
+implemented. Also I strongly suggest you to list all active swaps in
+/etc/fstab. Firstly because you don't have to specify anything to resume and
+secondly if you have more than one swap area you can't decide which one has the
+'root' signature.
+
+In the meantime while the system is suspended you should not touch any of the
+hardware!
+
+About the code
+Goals reached
+
+The code can be downloaded from
+http://falcon.sch.bme.hu/~seasons/linux/. It mainly works but there are still
+some of XXXs, TODOs, FIXMEs in the code which seem not to be too important. It
+should work all right except for the problems listed in ``Things to
+implement''. Notes about the code are really welcome.
+
+How the code works
+
+When suspending is triggered it immediately wakes up process bdflush. Bdflush
+checks whether we have anything in our run queue tq_bdflush. Since we queued up
+function do_software_suspend, it is called. Here we shrink everything including
+dcache, inodes, buffers and memory (here mainly processes are swapped out). We
+count how many pages we need to duplicate (we have to be atomical!) then we
+create an appropiate sized page directory. It will point to the original and
+the new (copied) address of the page. We get the free pages by
+__get_free_pages() but since it changes state we have to be able to track it
+later so it also flips in a bit in page's flags (a new Nosave flag). We
+duplicate pages and then mark them as used (so atomicity is ensured). After
+this we write out the image to swaps, do another sync and the machine may
+reboot. We also save registers to stack.
+
+By resuming an ``inverse'' method is executed. The image if exists is loaded,
+loadling is either triggered by ``resume='' kernel option. We
+change our task to bdflush (it is needed because if we don't do this init does
+an oops when it is waken up later) and then pages are copied back to their
+original location. We restore registers, free previously allocated memory,
+activate memory context and task information. Here we should restore hardware
+state but even without this the machine is restored and processes are continued
+to work. I think hardware state should be restored by some list (using
+notify_chain) and probably by some userland program (run-parts?) for users'
+pleasure. Check out my patch at the same location for the sysvinit patch.
+
+WARNINGS!
+- It does not like pcmcia cards. And this is logical: pcmcia cards need cardmgr to be
+ initialized. they are not initialized during singleuser boot, but "resumed" kernel does
+ expect them to be initialized. That leads to armagedon. You should eject any pcmcia cards
+ before suspending.
+
+Things to implement
+- SMP support. I've done an SMP support but since I don't have access to a kind
+ of this one I cannot test it. Please SMP people test it. .. Tested it,
+ doesn't work. Had no time to figure out why. There is some mess with
+ interrupts AFAIK..
+- We should only make a copy of data related to kernel segment, since any
+ process data won't be changed.
+- By copying pages back to their original position, copy_page caused General
+ Protection Fault. Why?
+- Hardware state restoring. Now there's support for notifying via the notify
+ chain, event handlers are welcome. Some devices may have microcodes loaded
+ into them. We should have event handlers for them aswell.
+- We should support other architectures (There are really only some arch
+ related functions..)
+- We should also restore original state of swaps if the ``noresume'' kernel
+ option is specified.. Or do we need such a feature to save state for some
+ other time? Do we need some kind of ``several saved states''? (Linux-HA
+ people?). There's been some discussion about checkpointing on linux-future.
+- Should make more sanity checks. Or are these enough?
+
+Not so important ideas for implementing
+
+- If a real time process is running then don't suspend the machine.
+- Is there any sense in compressing the outwritten pages?
+- Support for power.conf file as in Solaris, autoshutdown, special
+ devicetypes support, maybe in sysctl.
+- Introduce timeout for SMP locking. But first locking ought to work :O
+- Pre-detect if we don't have enough swap space or free it instead of
+ calling panic.
+- Support for adding/removing hardware while suspended?
+- We should not free pages at the beginning so aggressively, most of them
+ go there anyway..
+- If X is active while suspending then by resuming calling svgatextmode
+ corrupts the virtual console of X.. (Maybe this has been fixed AFAIK).
+
+Any other idea you might have tell me!
+
+Contacting the author
+If you have any question or any patch that solves the above or detected
+problems please contact me at seasons@falcon.sch.bme.hu. I might delay
+answering, sorry about that.
+
--- clean/MAINTAINERS Tue May 21 23:21:35 2002
+++ linux-swsusp/MAINTAINERS Tue May 21 23:33:30 2002
@@ -1446,6 +1446,14 @@
 L: linux-raid@vger.kernel.org
 S: Maintained

 #define PREFIX "ACPI: "
@@ -621,6 +623,34 @@
 {
 	acpi_wakeup_address = (unsigned long)alloc_bootmem_low(PAGE_SIZE);
 	printk(KERN_DEBUG "ACPI: have wakeup address 0x%8.8lx\n", acpi_wakeup_address);
+}
+
+/*
+ * (KG): Since we affect stack here, we make this function as flat and easy
+ * as possible in order to not provoke gcc to use local variables on the stack.
+ * Note that on resume, all (expect nosave) variables will have the state from
+ * the time of writing (suspend_save_image) and the registers (including the
+ * stack pointer, but excluding the instruction pointer) will be loaded with
+ * the values saved at save_processor_context() time.
+ */
+void do_suspend_magic(int resume)
+{
+	/* DANGER WILL ROBINSON!
+	 *
+	 * If this function is too difficult for gcc to optimize, it will crash and burn!
+	 * see above.
+	 *
+	 * DO NOT TOUCH.
+	 */
+	if (!resume) {
+		save_processor_context();
+		acpi_save_register_state((unsigned long)&&acpi_sleep_done);
+		acpi_enter_sleep_state(3);
+		return;
+	}
+acpi_sleep_done:
+	restore_processor_context();
+	printk("CPU context restored...\n");
 }
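
The control flow of do_suspend_magic() above (save the context, go to sleep,
then continue at a known point with the saved context once the machine wakes)
can be loosely illustrated in user space with setjmp()/longjmp(). This is only
an analogy and not how the patch works: the kernel code saves the registers
itself and records the address of the acpi_sleep_done label with GCC's
&&label extension, whereas this sketch relies on the standard C library. The
names demo_sleep_and_wake, demo_suspend_magic and resumed_count are
hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf saved_context; /* stands in for the saved processor context */
static int resumed_count;     /* how many times the "resume" path has run */

/* Stand-in for entering the sleep state and the later wakeup: instead of
 * powering down, jump straight back to the saved context. */
static void demo_sleep_and_wake(void)
{
	longjmp(saved_context, 1);
}

/* Returns 0 after the "resume" path has run. */
static int demo_suspend_magic(void)
{
	if (setjmp(saved_context) == 0) {
		/* First pass: context is saved, now "sleep". */
		demo_sleep_and_wake();
		return -1; /* not reached: the wakeup jumps back above */
	}

	/* Second pass: execution continues from the saved context. */
	resumed_count++;
	return 0;
}
```

As in the kernel function, the state that must survive the jump is kept out of
local variables; resumed_count is a static object for that reason.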

 	__set_current_state(TASK_RUNNING);
 	remove_wait_queue(&kswapd_wait, &wait);
-- 
(about SSSCA) "I don't say this lightly. However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa