Description of problem:
Complete system freeze requiring a hard power off/on cycle and reboot. This is on the latest Fedora 27 distribution running on a Ryzen 2500U with the latest amdgpu from Fedora 27. (Nothing custom here.)
Version-Release number of selected component (if applicable):
How reproducible:
Occurs during regular browsing with Firefox, email reading with Thunderbird, and use of other applications. No clear pattern yet. It sometimes happens while entering data, for example in Bugzilla or when composing a new email. It does not seem related to video playback such as on YouTube.

Created attachment 1415608
Output from DMESG for basic info on system
DMESG output attached. Also, this is not hardware failure as system exhibits no issues whatsoever on Windows. This unit does have relatively new wireless chip and GPU. To install F27 I had to use a respin to get started with kernel 4.15.
I would be happy to test later versions of drivers etc if I can get a little guidance. I suppose I could go to rawhide if this would be helpful.

Interestingly, on this run, just before the freeze I had System Monitor running behind the Firefox window, partially visible, showing the CPU utilization graphs moving. I also had a terminal window shelled into a remote machine running top so I could watch it update.
When everything froze, the terminal was frozen, Firefox was frozen, but System Monitor still appeared to be in motion. Mouse and keyboard were dead so I could do nothing but power off. This is also using Wayland.
I am now going to try without Wayland and see what it does.
pcieport 0000:00:01.7: is suspicious, and I think the kernel was still running, though I could not access it.

Hello Jerry,
Have you tried booting with parameter "rcu_nocbs=0-7"? (As your processor has 8 threads)
I think it is related to this kernel bug: https://bugzilla.kernel.org/show_bug.cgi?id=196683
There is also some advice regarding some BIOS settings related to the power supply.
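
The range given to rcu_nocbs is zero-based, which is why an 8-thread CPU uses 0-7. A minimal Python sketch for deriving the value from the logical CPU count follows; the helper name `rcu_nocbs_param` is made up for illustration and is not part of any tool mentioned here.

```python
import os


def rcu_nocbs_param(cpu_count=None):
    """Return an rcu_nocbs= boot parameter covering every logical CPU."""
    if cpu_count is None:
        # os.cpu_count() reports logical CPUs (threads); it may return None.
        cpu_count = os.cpu_count() or 1
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    # CPU numbering starts at 0, so the last index is cpu_count - 1.
    return f"rcu_nocbs=0-{cpu_count - 1}"


if __name__ == "__main__":
    print(rcu_nocbs_param())
```

Calling `rcu_nocbs_param(8)` gives the string suggested above for the 2500U.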

This may be connected to bug #1478219, which was also an amdgpu-related total system lockup under active use (in my case with a Radeon RX 550 card). However that issue appears to now be gone in current Fedora package versions and I think that those versions predate this bug report. However it may be worth testing (and verifying). If the kernel parameter 'amdgpu.dpm=0' avoids this issue, it's almost certainly related to that bug.

I have tried Fedora 28 Beta Live Image boot from USB and have not seen the problem yet. However, I have only been able to run for a short time. As soon as Fedora 28 releases I will install it fully, exercise it and report back.
Also note that on the HP laptop I have updated the BIOS to version F16b. So check for BIOS updates as we track this problem.

(In reply to Jerry from comment #13)
> I have tried Fedora 28 Beta Live Image boot from USB and have not seen the
> problem yet. However, I have only been able to run for a short time. As soon
> as Fedora 28 releases I will install it fully, exercise it and report back.
>
> Also note on the HP Laptop, Bios Update to version F16b here. So check bios
> updates as we track this problem.
Hi Jerry, I had BIOS F16a and just updated to F17a. Still the problem is present, with the kernel 4.16.3 or the hand-compiled amd-staging-drm-next. I'm reproducing it with glxgears/hangouts in a few minutes.

I updated to latest BIOS and confirm it is still present on the F28 Beta Live. I have not tried applying updates to an actual F28 installation to further test yet. (I am running off the USB at the moment)

I installed F28 Beta and installed all available updates. So far with several hours of using the machine, I have had no hangs. The previous combination of glxgears and hangouts has run without issue. I would say I have a few hours of use in now.

This happens on Ryzen 2500u in the HP 15m-bq121dx (ENVY x360) laptop. I encountered the freeze within a few minutes. So far, I've tested kernels 4.16.1, 4.16.2, 4.16.3, 4.16.4, and vanilla 4.17 rc2. I have yet to try 4.16.5.

My wife has an Acer SF315-41 (Swift 3) with a Ryzen 2700U and has been experiencing about 3 freezes a day. I know that some of them are actually just the touch pad freezing but that is a different issue and there have definitely been complete system freezes too.
We're currently using OpenSUSE 42.3 with very recent kernel (4.17.0-rc2) and Mesa (20180413 git) builds as well as the latest BIOS (1.04).
I would use netconsole to get more information but there's no ethernet port on this machine and I'm guessing it doesn't work over wireless.
I also have a desktop Ryzen 1600X and I'm familiar with the freezing issues encountered there. There are no advanced options in the Acer BIOS but I have disabled the C6 Package state with zenstates.py to achieve the same effect and rule that out. Sadly it seems to be something else this time.

I forgot to add that it sometimes freezes while booting up but I haven't yet seen that with the full kernel log output visible. I also found that booting Fedora from a USB stick would sometimes fail as it would suddenly start reporting errors reading from the device as if it had just been pulled out. May not be related but thought it was worth mentioning.

(In reply to Braulio Oliveira from comment #21)
> Good to know that it affects Fedora/OpenSUSE/ArchLinux, multiple kernels and
> BIOSes and multiple hardwares (HP/Acer/Desktop).
To be clear, I doubt that the desktop issue is related. I ruled it out by applying the same fix on the laptop and it didn't help.
Since I am unable to use netconsole, I tried to use kdump instead. I'm not familiar with it and I haven't been able to get it to work, despite jumping through a few hoops. Even if I trigger a crash manually with /proc/sysrq-trigger, kdump doesn't seem to kick in. It just sits there instead of rebooting and I then find /var/crash is still empty. Perhaps you guys could try it?

(In reply to Jerry from comment #24)
> It has been reported that the bug is fixed with mesa-18.0.2. I have not had
> time to install this and test yet.
Where did you hear that? I doubt it as I've been running a very recent Mesa git and have still had freezes every day.

I heard it from a Mesa developer. However, another developer suggested we need to get more data for them. I found that when the freeze occurs I can still ssh into the machine from another one, which confirms the CPU and the wifi are still running. Doing this, one can then examine logs and capture live information. I have been short on time, but I plan to get set up to redo this and get additional data to the developers.
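
For anyone who can still reach the frozen machine over ssh, the capture step can be scripted from the second machine. The following is only a sketch under stated assumptions: the command list, the output file names and the `collect` helper are illustrative choices, and dmesg may need root on the remote side depending on its configuration.

```python
import subprocess
from pathlib import Path

# Illustrative set of commands to run on the frozen machine; adjust to taste.
REMOTE_COMMANDS = {
    "dmesg.txt": "dmesg",
    "journal-kernel.txt": "journalctl -k -b --no-pager",
    "top.txt": "top -b -n 1",
}


def collect(host, outdir, run=subprocess.run):
    """Run each command on `host` over ssh and save its output in `outdir`."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    saved = []
    for filename, command in REMOTE_COMMANDS.items():
        # A timeout keeps the script from hanging if the remote side stalls.
        result = run(
            ["ssh", host, command],
            capture_output=True, text=True, timeout=60,
        )
        target = outdir / filename
        target.write_text(result.stdout + result.stderr)
        saved.append(target)
    return saved
```

A call such as `collect("user@laptop", "freeze-logs")` (hypothetical host name) would leave three text files that can be attached to the bug.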

Perhaps you/they meant Mesa 18.2 (unreleased) rather than 18.0.2? Phoronix just reported it has helped a lot with Vega.
I'm not able to SSH but maybe I need to try harder as even when it hasn't frozen, it takes a minute or two of pinging before inbound connections start to work and that's even after disabling power management on the wifi interface. Never seen that before.

Running Mesa 18.2 devel does not improve the issue for me - Running games still results in a random system lockup.
I am running kubuntu 18.04 with 4.16.9 mainline kernel, and mesa 18.2.0-devel from the padoka ppa.

Also I should add the behavior is the same as well. Very low usage, a few terminals and firefox. Crash happens randomly, maybe an hour, maybe six. Freezes completely, mouse still moves around the screen and no input from keyboard/mouse recognized. Cannot switch virtual terminals and stops responding to icmp.

(In reply to jon from comment #30)
>
> Ryzen 1700X
> MSI Tomahawk B350
> XFX Radeon RX 560 (RX-560P4SFG5)
Jon, you don't have a mobile Ryzen or Vega graphics. I doubt your issue is related and you should look at this report instead, specifically ensure your BIOS is up to date and look for the Power Supply Idle Control setting.
https://bugzilla.kernel.org/show_bug.cgi?id=196683

Just had another crash. After rebooting I did confirm that I'm running the latest MSI Tomahawk Arctic BIOS. 7A34vHD from 2018-05-04.
Same error:
Jul 15 10:41:04 pc003 kernel: [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:956
This just started happening maybe one to two weeks ago after updating all packages via dnf. Prior to that I had weeks of uptime at a time. Now it locks up after a couple of hours, never even close to a full day.
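
To compare logs across machines, a saved dmesg or journal file can be scanned for the messages quoted in this thread. This is a minimal sketch: the two patterns are taken from wording used here (the exact kernel text for the soft lockup is not quoted in this report, so that pattern is deliberately loose), and `scan_log` is a made-up helper name.

```python
import re

# Signatures based on messages mentioned in this thread; extend as needed.
SIGNATURES = {
    "reg_wait_timeout": re.compile(r"REG_WAIT timeout"),
    "soft_lockup": re.compile(r"soft lockup", re.IGNORECASE),
}


def scan_log(lines):
    """Return (signature_name, line) pairs for every matching log line."""
    hits = []
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((name, line.rstrip("\n")))
    return hits
```

Usage would be along the lines of `with open("dmesg.txt") as f: print(scan_log(f))`.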

(In reply to jon from comment #38)
> As an update I want to be clear that this is a regression. This system has
> been running without issue for months.
Try setting the kernel parameter 'idle=nomwait'. I found this on some other websites and it is working for me. The mwait CPU instruction can hang a thread. This is documented in the AMD errata for certain Ryzen chips.

Thanks for the suggestion but I'm already using that option and I've had several freezes since then.
Are you using displayport by any chance? I'm using three monitors: one each via dvi, hdmi and displayport.

(In reply to jon from comment #40)
> Thanks for the suggestion but I'm already using that option and I've had
> several freezes since then.
>
> Are you using displayport by any chance? I'm using three monitors: one
> each via dvi, hdmi and displayport.
No I have only a laptop here with Ryzen 2500U GPU. So your issue is different.

(In reply to James Le Cuirot from comment #35)
> (In reply to jon from comment #30)
> >
> > Ryzen 1700X
> > MSI Tomahawk B350
> > XFX Radeon RX 560 (RX-560P4SFG5)
>
> Jon, you don't have a mobile Ryzen or Vega graphics. I doubt your issue is
> related and you should look at this report instead, specifically ensure your
> BIOS is up to date and look for the Power Supply Idle Control setting.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=196683
I'm using the rcu_nocbs=0-15 which I think would fix the soft lockup issue. I think this issue is different, and it just started a few weeks ago. I've been using this cpu and gpu for months without issue using rcu_nocbs.
I think Jerry is right, I think my issue is different. I have filed a kernel bug on it.

(In reply to Hexawolf from comment #43)
> Confirming this issue, Ryzen 2500U on HP Notebook 15-db0229ur.
> - I am pretty sure this is related to Vega graphics
> - Disabling C-State C6 partially solves the problem
>
> Is Ryzen 5 so uncommon that this issue gets such little attention? This
> makes thousands of systems totally useless.
How did you disable C-State C6? What OS/kernel are you running?
My current boot parameters:
Kernel command line: BOOT_IMAGE=/vmlinuz-4.18.12-200.fc28.x86_64 root=/dev/mapper/fedora_localhost--live-root ro resume=/dev/mapper/fedora_localhost--live-swap rd.lvm.lv=fedora_localhost-live/root rd.lvm.lv=fedora_localhost-live/swap rhgb quiet LANG=en_US.UTF-8 idle=nomwait processor.max_cstate=5
I have not tried dialing back the cstate further and my system still has issues with some suspend attempts, for example if I am on battery and close the lid, it will not come back.
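
When comparing setups it helps to check which workarounds are actually active in a given command line. The sketch below splits a command line like the one above into parameters; the function names are hypothetical, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (such as 'quiet') map to None. If a name is repeated
    (for example rd.lvm.lv), only the last value is kept in this sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_workarounds(cmdline, wanted):
    """Return the entries of `wanted` that are absent from `cmdline`."""
    tokens = set(cmdline.split())
    return [w for w in wanted if w not in tokens]
```

On a running system the input can be read from /proc/cmdline, e.g. `parse_cmdline(open("/proc/cmdline").read())`.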

(In reply to Jerry from comment #44)
> How did you disable C-State C6? What OS/kernel are you running?
By editing /dev/cpu/[0-9]*/msr, here's a good utility I believe:
https://github.com/r4m0n/ZenStates-Linux
Was using 4.19-rc7.
> I have not tried dialing back the cstate further and my system still has
> issues with some suspend attempts, for example if I am on battery and close
> the lid, it will not come back.
I had no problems suspending laptop and bringing it back but random freezes are even worse. This is certainly a critical issue as it *already* caused a massive data loss. It is also interesting because I have noticed that sometimes sound keeps playing and probably other system services like SSH are alive, though this must be checked.
By the way, having latest BIOS from HP - F.11 Rev.A in my case
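
For reference, this is roughly what the MSR approach amounts to. The register addresses and bit positions below are an assumption recalled from the ZenStates-Linux script and should be verified against that project before use. The sketch only reads MSRs and computes the values with the C6 bits cleared; it does not write anything. Reading needs root and the msr kernel module loaded.

```python
import os
import struct

# ASSUMPTION: addresses and bits as used by ZenStates-Linux; verify first.
MSR_PKG_C6 = 0xC0010292    # bit 32 assumed to enable package C6
MSR_CORE_C6 = 0xC0010296   # bits 22, 14 and 6 assumed to enable core C6
PKG_C6_MASK = 1 << 32
CORE_C6_MASK = (1 << 22) | (1 << 14) | (1 << 6)


def read_msr(cpu, register):
    """Read one 64-bit MSR through /dev/cpu/<cpu>/msr."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The file offset selects the register; each MSR is 8 bytes.
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)


def c6_disabled_values(pkg_value, core_value):
    """Return the two register values with the assumed C6 bits cleared."""
    return pkg_value & ~PKG_C6_MASK, core_value & ~CORE_C6_MASK


def c6_enabled(pkg_value, core_value):
    """True if any of the assumed C6 enable bits is still set."""
    return bool(pkg_value & PKG_C6_MASK or core_value & CORE_C6_MASK)
```

Checking `c6_enabled(read_msr(0, MSR_PKG_C6), read_msr(0, MSR_CORE_C6))` after running the utility would show whether the change took effect on CPU 0.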

I have found a very surprising way to reproduce this bug.
So far it triggers the soft lockup 100% of the time, but I don't know why.
I booted from a Live USB to scroll through logs and easily edit kernel parameters. While doing so I wanted to change the keyboard layout to German (because I have a German keyboard). When I clicked on the graphical icon for the keyboard layout the system froze, and after checking the logs I saw that it is the same error message Jerry posted.
drm:construct amdgpu...
I don't know why it does this or what it means for the bug, but I wanted to let you know that there is a strange way to replicate it, and maybe you have better chances of using this information than I have.

I encounter this issue almost daily on my Huawei Matebook D with the 2500U on Fedora Silverblue. The bios is currently up to date at version 1.18. I can seemingly trigger it easily when using YouTube in Firefox but it has happened independent of Firefox even being open.
I can only replicate this issue on gnome-shell but it's happened on Fedora 29 Workstation, Silverblue, and Ubuntu.
For me these are hard lockups with the sound repeating the last second or two of whatever was playing at the time. On my Desktop system with an AMD GPU I've seen lockups/hangs where the audio/network continues working but I haven't seen that on ryzen/vega mobile.

I bought an Acer A315-41 laptop with Ryzen 3 2200U APU inside.
I have tried to install Fedora 29 using the netinst installer (version 1.2) but booting the installer reliably triggers the CPU soft lockup error in UDEV while still on the console, although in KMS mode.
I tried adding these options as suggested elsewhere:
idle=nomwait
processor.max_cstate=1
rcu_nocbs=0-3 (2 cores, 4 threads on this CPU)
pcie_aspm=off
The BIOS was hopelessly outdated with version 1.02, version 1.11 was available from Acer. I had to install Windows 10 to run the BIOS upgrade program.
But nothing of the above helped, the Fedora 29 installer cannot start its GUI as the soft lockup is still triggered.
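
When trying several of these options in turn it is easy to end up with duplicate or conflicting entries on the command line. A small sketch for merging a set of workaround options into an existing argument string, replacing any parameter of the same name, could look like this (the function name is made up for illustration):

```python
def merge_kernel_args(current, extra):
    """Append `extra` parameters to `current`, replacing same-named ones."""
    def name_of(token):
        # "processor.max_cstate=1" -> "processor.max_cstate"; "quiet" -> "quiet"
        return token.partition("=")[0]

    extra_tokens = extra.split()
    replaced = {name_of(t) for t in extra_tokens}
    kept = [t for t in current.split() if name_of(t) not in replaced]
    return " ".join(kept + extra_tokens)
```

The resulting string is what would then be given to the boot loader, for example through grubby's --args option on Fedora.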

I have progressed a little. pci=biosirq helped me to install Fedora 29 in text mode, I still could not get it to start the GUI.
I used the netinst version of the installer and kernel 4.20.3-200.fc29 was installed.
This Acer A315-41 laptop needs "pcie_aspm=off noapic" (nothing more) for this kernel version to boot properly into graphics but the previous installation somehow did finish correctly. Booting into single mode does not accept the root password, saying that the root user is disabled.
I tried the same set of options with the installer but they did not help. Now, if only the netinst installer was using kernel 4.20...

I have a Lenovo Ideapad 330 15-ARR (16G 2700u) running Fedora 28.
Random Freeze, usually when I hit the "window" key or Activities menu (strike upper left corner).
I have enabled openSSH server so I can log in from another machine, kill gnome-shell and all is good.
dmesg = dmesg_20190128_100000.txt
I can install / run whatever you need ... if Fedora 29 fixed this, I will upgrade and "go away".
B

kernel-4.20.11-200.fc29.x86_64 way more stable
boots up good (no kernel parms)
using zenStates to disable c6
after several days a bit of screen glitching (glitch, redraw) but it "fixes" itself ...
MUCH BETTER

(In reply to Jerry from comment #54)
> With kernel 4.20.11 all appears OK. I do get some apic errors.
>
This must have been a fluke or some other dependency. None of the 4.20 kernels work: I get a scrambled line of color at the bottom of a blank screen. I have reverted back to:
kernel=/boot/vmlinuz-4.19.15-300.fc29.x86_64
args="ro resume=/dev/mapper/fedora-swap rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet LANG=en_US.UTF-8 idle=nomwait iommu=pt"
This appears to boot well. I have tried closing the lid to suspend once and that worked, but I cannot confirm stability with any suspend. From all my reading, there is an ongoing driver load problem and I don't know if it's fixed in the kernel 5 series yet.

As an update:
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.0.3-200.fc29.x86_64 root=/dev/mapper/fedora-root ro resume=/dev/mapper/fedora-swap rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet LANG=en_US.UTF-8 idle=nomwait amd_iommu=pt
$ cat /proc/cpuinfo
... snip ...
model : 17
model name : AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx
stepping : 0
microcode : 0x8101007
... snip ...
This boots OK now, and I tried without idle=nomwait and it also boots fine.
Attempting to suspend the computer used to hang it, requiring a power cycle. Now it does not hang (for which I am very grateful) but it also fails to suspend, possibly due to the dracut bug here:
https://bugzilla.redhat.com/show_bug.cgi?id=1676357
Which I have not tried the fix for yet. In summary dracut faulted out during install of 5.0.3. There are many different suspend errors from different components, so I will keep an eye on it.

Jerry, does kernel 5.0.4 boot for you?
I have the same setup as Jerry (AMD Ryzen 5 2500U) with Firmware version F.20 and it still isn't booting for me, even after installing the dracut update and reinstalling Kernel 5.0.4.
I get a black screen with a horizontal line near the bottom of scrambled orange pixels.
The last kernel that's working is 4.19.15-300.fc29.x86_64.
# grubby --info /boot/vmlinuz-5.0.4-200.fc29.x86_64
index=0
kernel=/boot/vmlinuz-5.0.4-200.fc29.x86_64
args="ro rhgb quiet LANG=en_US.UTF-8"
root=UUID=f7548a91-a3d2-4ec2-8573-c7f313417cda
initrd=/boot/initramfs-5.0.4-200.fc29.x86_64.img
title=Fedora (5.0.4-200.fc29.x86_64) 29 (Twenty Nine)
I added the kernel parameter amd_iommu=pt and it still didn't boot.
FYI, my suspend/resume is already working fine so I didn't add the parameter idle=nomwait.

> Jerry, does kernel 5.0.4 boot for you?
Yes, but to get any of the kernels above 4.19.15 to boot, I had to delete /lib/firmware/amdgpu/raven_dmcu.bin .
See: https://bugs.freedesktop.org/show_bug.cgi?id=109206
I do have to see if I can still boot 4.19.15 with and without this file.

(In reply to Jerry from comment #5)
> Interestingly on this run just before the freeze I had System Monitor
> running behind firefox window partially viewable show the CPU utilization
> graphs moving. I had another terminal window shelled into a remote machine
> running top so I could see it update.
>
> When everything froze, the terminal was frozen, firefox was frozen, but the
> system monitor appeared to be in motion. Mouse and keyboard were dead so I
> could do nothing but power off. This is also using Wayland.
>
> I am now going to try without Wayland and see what it does.
>
I had the same thing happen (monitor ran, all else stopped), I could log in via SSH to shut the system down.

The only odd issues 5.0.5 (and previous):
Notifications of "component crash" (says tainted kernel, but VirtualBox was removed ... wish there was a tool to list things that taint the kernel)
ABRT crashes on occasion
The screen (no activity, just idle) will FLASH, more noticed when firefox is up (gmail) ... (screen dim / lock disabled).
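
On the wish for a tool that explains kernel taint: the current taint state is exposed as a bitmask in /proc/sys/kernel/tainted, and it can be decoded by hand. The sketch below uses the bit meanings from the kernel's tainted-kernels documentation; the descriptions are paraphrased and the helper name is made up.

```python
# Index in this list == bit number in /proc/sys/kernel/tainted.
TAINT_FLAGS = [
    ("P", "proprietary module was loaded"),
    ("F", "module was force loaded"),
    ("S", "kernel running on an out-of-specification system"),
    ("R", "module was force unloaded"),
    ("M", "machine check exception occurred"),
    ("B", "bad page referenced"),
    ("U", "taint requested by userspace"),
    ("D", "kernel died recently (OOPS or BUG)"),
    ("A", "ACPI table overridden by user"),
    ("W", "kernel issued a warning"),
    ("C", "staging driver was loaded"),
    ("I", "workaround for a platform firmware bug applied"),
    ("O", "out-of-tree module was loaded"),
    ("E", "unsigned module was loaded"),
    ("L", "soft lockup occurred"),
    ("K", "kernel has been live patched"),
    ("X", "auxiliary taint, defined by the distribution"),
    ("T", "kernel built with the struct randomization plugin"),
]


def decode_taint(value):
    """Return (bit, letter, description) for every taint bit set in value."""
    return [
        (bit, letter, text)
        for bit, (letter, text) in enumerate(TAINT_FLAGS)
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    with open("/proc/sys/kernel/tainted") as f:
        for bit, letter, text in decode_taint(int(f.read())):
            print(f"bit {bit:2d} ({letter}): {text}")
```

A taint that persists after removing an out-of-tree module is expected, since the flags are only cleared by a reboot.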

(In reply to Jerry from comment #59)
> > Jerry, does kernel 5.0.4 boot for you?
>
> Yes, but to get any of the kernels above 4.19.15 to boot, I had to delete
> /lib/firmware/amdgpu/raven_dmcu.bin .
>
> See: https://bugs.freedesktop.org/show_bug.cgi?id=109206
>
> I do have to see if I can still boot 4.19.15 with and without this file.
Thanks. I'll follow up on that specific issue on that bug report.
I am running Fedora 29 KDE, and I would also experience random freezes. It is a bit less common now. However, the logs wouldn't indicate any root cause most of the time.

Hey everyone, I just recently bought a Dell Inspiron 7375 running a Ryzen 5 2500U with vega graphics, and have had a hell of a time getting any Linux distro to run. I've encountered almost every Ryzen/Linux issue that is documented, but I think I've finally found a solution.
I've run various boot options from Grub with very little success until booting Linux Mint 19.1 "Tessa" via Cinnamon in compatibility mode. After installing to the internal drive I changed the boot options, and everything still works. Wifi, BT, Touchscreen!!, touchpad, speakers...everything.
Boot options:
replace " quiet splash " with
" noapic noacpi nosplash irqpoll -- "
Not sure if all of that is necessary, I literally just got it working after five days of blank screens and processor error loops, so more testing will be required.
Hope this helps someone end a nightmare.

(In reply to freqyxin from comment #66)
> Hey everyone, I just recently bought a Dell Inspiron 7375 running a Ryzen 5
> 2500U with vega graphics, and have had a hell of a time getting any Linux
> distro to run. I've encountered almost every Ryzen/Linux issue that is
> documented, but I think I've finally found a solution.
>
> I've ran various boot options from Grub with very little success until
> booting Linux Mint 19.1 "Tessa" via Cinnamon in compatibility mode. After
> installing to the internal drive I changed the boot options, and everything
> still works. Wifi, BT, Touchscreen!!, touchpad, speakers...everything.
>
>
>
> Boot options:
>
> replace " quiet splash " with
>
>
> " noapic noacpi nosplash irqpoll -- "
>
>
> Not sure if all of that is necessary, I literally just got it working after
> five days of blank screens and processor error loops, so more testing will
> be required.
>
> Hope this helps someone end a nightmare.

This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.
Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 28 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.
Thank you for reporting this bug and we are sorry it could not be fixed.

I am currently on Fedora 30 and all is working great as long as one removes raven_dmcu.bin from /lib/firmware/amdgpu/ before installing any kernel or kernel updates. This is a driver loading issue for the Vega graphics. Once the boot image is created without the influence of this file, things are very stable.

(In reply to Jerry from comment #72)
> I am currently on Fedora 30 and all is working great as long as one removes
> raven_dmcu.bin from /lib/firmware/amdgpu/ before installing any kernel or
> kernel updates. This is a driver loading issue for the Vega graphics. Once
> the boot image is created without the influence of this file, things are
> very stable.
I tried this. It did not work for me. Still soft-locks. Vega 2500U.

(In reply to Jerry from comment #72)
> I am currently on Fedora 30 and all is working great as long as one removes
> raven_dmcu.bin from /lib/firmware/amdgpu/ before installing any kernel or
> kernel updates. This is a driver loading issue for the Vega graphics. Once
> the boot image is created without the influence of this file, things are
> very stable.
Didn't work for me either. I'm using fedora 30 with gnome-shell. My cpu: AMD Ryzen 5 PRO 2500U w/ Radeon Vega Mobile Gfx (8)

(In reply to kadu from comment #75)
> (In reply to Jerry from comment #72)
> > I am currently on Fedora 30 and all is working great as long as one removes
> > raven_dmcu.bin from /lib/firmware/amdgpu/ before installing any kernel or
> > kernel updates. This is a driver loading issue for the Vega graphics. Once
> > the boot image is created without the influence of this file, things are
> > very stable.
>
> Didn't work for me either. I'm using fedora 30 with gnome-shell. My cpu: AMD
> Ryzen 5 PRO 2500U w/ Radeon Vega Mobile Gfx (8)
If you previously installed the current kernel that does not boot you need to remove that image after you delete the raven_dmcu.bin file. There are four kernel packages involved:
kernel
kernel-core
kernel-modules
kernel-modules-extra
This is one approach to rebuilding the kernel without the driver issue, but you have to have a booting kernel that works to do this.
The other approach is to rebuild the kernel boot image. See https://bugs.freedesktop.org/show_bug.cgi?id=109206 for the steps to do this after you have removed the raven_dmcu.bin file.
Also note: the troublesome bin file gets re-installed every time the "firmware" package is updated, which has required me to redo the removal procedure more than once.
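
Because the file comes back with firmware updates, a quick check before installing a kernel update can save a rebuild. A minimal sketch follows; the function name is hypothetical, and the trailing wildcard is an assumption meant to also catch compressed copies of the firmware file if the distribution ships them that way.

```python
from pathlib import Path


def find_dmcu_firmware(firmware_dir="/lib/firmware/amdgpu"):
    """Return any raven_dmcu.bin files found in firmware_dir."""
    # The trailing * also matches names like raven_dmcu.bin.xz.
    return sorted(Path(firmware_dir).glob("raven_dmcu.bin*"))


if __name__ == "__main__":
    found = find_dmcu_firmware()
    if found:
        print("raven_dmcu firmware is present again:")
        for path in found:
            print(f"  {path}")
    else:
        print("raven_dmcu firmware not found")
```

The check only reports; removing the file and rebuilding the boot image remain manual steps as described above.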

I have noticed that when the machine wakes (open laptop lid) the fan will occasionally not run (normally it is very active varying speed and such).
When the fan stops, heavy loads will freeze the graphics (can still log in via ssh).
B

HP x360, 2500u, BIOS F.20: Linux 5.3 from Fedora Rawhide freezes within minutes and shows garbled graphics, even when deleting the raven_dmcu.bin file and adding "idle=halt" to the kernel command line.
I wasn't able to find any logs of the error on my system.
Kernel 5.0.9 works fine with just the raven_dmcu deletion, and requires no command line parameters.
