Description of problem:
I have a Thinkpad T480s laptop and it has a 1Gb/s ethernet port. If I boot the laptop with ethernet cable attached, it works with 1Gb/s speeds. But once I detach and re-attach the cable, it suddenly works with just 10Mb/s. This is nicely visible in gnome control center, the initial "Connected - 1000Mb/s" gets replaced by "Connected - 10Mb/s". This only affects the download speed, I can still upload with 1Gb/s speeds. The same problem happens when I boot the laptop with the cable unplugged, and plug it in after boot.
I have debugged this problem for a long time and finally traced it down to NetworkManager (or kernel, but NM is definitely affecting it somehow). Here's a summary:
* This is not a single hardware failure, because it occurs on a colleague's T480s as well (so it probably affects this whole laptop family).
* This is not a hardware/firmware fault of this laptop family, because this problem doesn't happen on Windows 10 (always runs at 1Gb/s).
* This problem occurs in all kernel versions available in Fedora 29.
* This problem occurs when I boot the Fedora 28(!) Workstation Live, so it's not brand new.
* This problem occurs even with selinux in permissive mode, so it's probably not selinux related.
* This problem doesn't occur when NetworkManager service is stopped and I use ifup/ifdown scripts to manage my ethernet port connection. This made me report this against NM.
* This problem can be fixed while NM is running when I execute "ethtool -r device".
So it seems that NM somehow breaks speed negotiation. If the connection is made outside of NM (during boot or using ifup), the negotiation works fine.
My network card is:
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (4) I219-LM [8086:15d7] (rev 21)
I also have a network card on a thunderbolt dock, and everything works fine there. The problem affects just that internal laptop port displayed above.
When I connect the cable, I see these messages in the journal:
Sep 11 15:34:58 phoenix NetworkManager[9872]: <info> [1536672898.9113] device (enp0s31f6): carrier: link connected
Sep 11 15:34:58 phoenix kernel: e1000e: enp0s31f6 NIC Link is Up 10 Mbps Full Duplex, Flow Control: Rx/Tx
Sep 11 15:34:58 phoenix kernel: e1000e 0000:00:1f.6 enp0s31f6: 10/100 speed: disabling TSO
Sep 11 15:34:58 phoenix NetworkManager[9872]: <info> [1536672898.9115] device (enp0s31f6): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Sep 11 15:34:58 phoenix NetworkManager[9872]: <info> [1536672898.9124] policy: auto-activating connection 'enp0s31f6' (803b0104-53b1-3c32-9688-a1d51ac083b8)
...
And ethtool says:
$ sudo ethtool enp0s31f6 | grep Speed
Speed: 10Mb/s
When I run "ethtool -r enp0s31f6", I see this in the journal:
Sep 11 15:35:31 phoenix NetworkManager[9872]: <info> [1536672931.8034] device (enp0s31f6): carrier: link connected
Sep 11 15:35:31 phoenix kernel: e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Sep 11 15:35:31 phoenix NetworkManager[9872]: <info> [1536672931.8037] device (enp0s31f6): DHCPv4 lease renewal requested
Sep 11 15:35:31 phoenix NetworkManager[9872]: <info> [1536672931.8200] dhcp4 (enp0s31f6): canceled DHCP transaction, DHCP client pid 12266
Sep 11 15:35:31 phoenix NetworkManager[9872]: <info> [1536672931.8201] dhcp4 (enp0s31f6): state changed bound -> done
Sep 11 15:35:31 phoenix NetworkManager[9872]: <info> [1536672931.8204] dhcp4 (enp0s31f6): activation: beginning transaction (timeout in 45 seconds)
...
And ethtool says:
$ sudo ethtool enp0s31f6 | grep Speed
Speed: 1000Mb/s
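The same check can be scripted without parsing ethtool output. This is only a minimal sketch: it assumes the kernel exposes the negotiated speed (in Mb/s) in /sys/class/net/<device>/speed, and the function name and the sysfs_root parameter are made up for illustration.

```python
from pathlib import Path
from typing import Optional


def link_speed_mbps(device: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the negotiated link speed in Mb/s, or None if it is unknown.

    Assumption: the speed is readable from <sysfs_root>/<device>/speed.
    Reading that file can fail (OSError) or yield a non-positive value
    when the link is down, so both cases are mapped to None.
    """
    speed_file = Path(sysfs_root) / device / "speed"
    try:
        raw = speed_file.read_text().strip()
    except OSError:
        return None
    try:
        speed = int(raw)
    except ValueError:
        return None
    return speed if speed > 0 else None
```

For example, link_speed_mbps("enp0s31f6") would be expected to return 10 in the broken state and 1000 after "ethtool -r enp0s31f6".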
Do you need any more information to help you fix this? Thanks.
Version-Release number of selected component (if applicable):
kernel-4.18.5-300.fc29.x86_64
NetworkManager-1.12.2-2.fc29.x86_64
network-scripts-10.01-1.fc29.x86_64
ethtool-4.17-2.fc29.x86_64
How reproducible:
always
Steps to Reproduce:
1. get a T480s
2. plug in a cable (and disable wifi), reboot
3. see that your download and upload speed is 1Gb/s (use iperf3 with a different machine, or just speedtest.net)
4. disconnect the cable, wait 10 seconds (or check "ip a" until IP addresses for your ethernet device are gone)
5. reconnect the cable
6. see that your download speed is 10Mb/s, but your upload speed still 1Gb/s
7. run "sudo ethtool -r device", wait some seconds
8. see that your download and upload speed is again 1Gb/s
- and/or alternatively -
9. repeat steps 2-6
10. stop NetworkManager.service
11. run "sudo ifdown device"
12. disconnect the cable, wait 10 seconds
13. reconnect the cable
14. run "sudo ifup device", wait some seconds
15. see that your download and upload speed is again 1Gb/s
Actual results:
I get a 10Mb/s download speed unless I apply one of the workarounds above
Expected results:
I get 1Gb/s download speed

(In reply to Dan Williams from comment #4)
> FWIW, I cannot reproduce this on F27 with 4.17.12-100.fc27.x86_64 and
> NetworkManager-1.8.8-1.fc27.x86_64. My T480s always negotiates 1000Mb/s on
> cable plug/unplug while NM is active.
Confirmed. This works well when I boot F27 Workstation Live (containing kernel-4.13.9-300.fc27.x86_64 and NetworkManager-1.8.4-2.fc27.x86_64). At least we have a reference point for when it worked.
I'll work on getting the logs.

NetworkManager supports the settings ethernet.auto-negotiate, ethernet.speed, and ethernet.duplex.
It seems you didn't specify these parameters in the connection profile, hence, NetworkManager would be expected to not touch these settings.
NetworkManager configures this via the ethtool ioctl "ETHTOOL_SSET", and nothing in the log from comment 7 indicates that NetworkManager issued it, as one would expect.
In the logfile there is
<info> [1536940306.3752] device (enp0s31f6): carrier: link connected
<trace> [1536940306.3753] ethtool[2]: ETHTOOL_GSET, enp0s31f6: success
<debug> [1536940306.3754] device[0x559d05be2520] (enp0s31f6): speed is now 10 Mb/s
This is not something done by NetworkManager. It just reports the set speed.
Before that, NetworkManager does very little with the device:
- it sets ipv6 addrgen mode none
- it sets some sysctl relevant for IPv6
- it deletes a fq_codel qdisc
- it sets the device IFF_UP
none of this seems relevant to me.
With
$ grep -r -i ethtool /etc/sysconfig/
does that find anything relevant? (ignoring the matches in "network-functions" and "ifup-eth" files)

(In reply to Thomas Haller from comment #8)
> It seems you didn't specify these parameters in the connection profile,
I'd like to point out that I'm running a clean F29 installation and I haven't configured anything manually, I'm running with defaults. I also see the same behavior when I test this using F28 Workstation Live image (guaranteed to be using NM defaults).
> With
> $ grep -r -i ethtool /etc/sysconfig/
> does that find anything relevant? (ignoring the matches in
> "network-functions" and "ifup-eth" files)
No, the only matches are in ifup-eth and network-functions.

As Thomas said it would be strange if this was related to
NetworkManager, as it seems more a physical link negotiation issue.
Have you tried if the issue also happens when connected to a different
switch? Also, could you please try to disable eee with the following
command:
ethtool --set-eee enp0s31f6 eee off
and see if it makes any difference? Thanks.

(In reply to Beniamino Galvani from comment #11)
> As Thomas said it would be strange if this was related to
> NetworkManager, as it seems more a physical link negotiation issue.
Yeah, however stopping NM service fixes the issue.
> Have you tried if the issue also happens when connected to a different
> switch?
I tested at home and at work, behaves the same.
> Also, could you please try to disable eee with the following
> command:
>
> ethtool --set-eee enp0s31f6 eee off
>
> and see if it makes any difference? Thanks.
Before running the command, the status was:
$ sudo ethtool --show-eee enp0s31f6
EEE Settings for enp0s31f6:
EEE status: enabled - inactive
Tx LPI: 17 (us)
Supported EEE link modes: 100baseT/Full
1000baseT/Full
Advertised EEE link modes: 100baseT/Full
1000baseT/Full
Link partner advertised EEE link modes: Not reported
After running it:
EEE status: disabled
And it did not fix the problem.
An additional interesting fact: If I remove the cable and then plug it back again fast enough (under a second or two, before GNOME/NM has time to react and update the GUI), the connection is "uninterrupted" and the speed stays 1Gb/s. Only when I wait a few more seconds, so that GNOME/NM shows "cable disconnected" and re-connect it again, then I get 10Mb/s.

On my F29 system I downgraded to NetworkManager-1:1.8.4-2.fc27 (the F27 Live version that worked OK for me), and it did not fix the problem. I also downgraded to kernel-4.13.9-300.fc27 (again, F27 Live version), and kept NM in the latest fc29 version, and it fixed the problem!
So, this is definitely somehow related to kernel, because downgrading the kernel fixes the issue. However, stopping the NM service fixes the issue as well. How to explain this?

(In reply to Kamil Páral from comment #13)
> On my F29 system I downgraded to NetworkManager-1:1.8.4-2.fc27 (the F27 Live
> version that worked OK for me), and it did not fix the problem. I also
> downgraded to kernel-4.13.9-300.fc27 (again, F27 Live version), and kept NM
> in the latest fc29 version, and it fixed the problem!
>
> So, this is definitely somehow related to kernel, because downgrading the
> kernel fixes the issue. However, stopping the NM service fixes the issue as
> well. How to explain this?
Perhaps the bug is triggered by NM when it applies some settings to the interface, like bringing it up at the wrong moment. Do you have any chance to do a bisection or try an intermediate kernel between 4.13.9 and 4.18.5?

It seems to me that those two kernels have the same e1000e driver:
$ diff -Naur kernel-4.15.fc28/linux-4.15.0-1.fc28.x86_64/drivers/net/ethernet/intel/e1000e/ kernel-4.15.fc27/linux-4.15.3-300.fc27.x86_64/drivers/net/ethernet/intel/e1000e/
$
so I am clueless. I'm reassigning the bug to kernel for investigation.

(In reply to Laura Abbott from comment #17)
> 4.15 is old, please test on the latest kernel
Laura, it seems in comment 0 a later kernel (kernel-4.18.5-300.fc29.x86_64) was already found to have the same issue.
Kamil merely narrowed the issue down to the 4.15 kernels mentioned above.

I am using Fedora 28 KDE on a Dell Precision 7530 laptop using a I219-LM rev10.
Kernel: 4.18.16-200.fc28.x86_64
I have created a workaround to prevent eno1 from using 10Mbps by modifying
/etc/sysconfig/network-scripts/ifcfg-eno1
I changed the following line in ifcfg-eno1 from:
ETHTOOL_OPTS="autoneg on"
to
ETHTOOL_OPTS="speed 1000 duplex full"
I tested the change by removing the eno1 Network cable for roughly 10 seconds and then reconnected the Network cable.
dmesg | grep e1000e
[33274.105579] e1000e: eno1 NIC Link is Down
[33303.610993] e1000e: eno1 NIC Link is Up 10 Mbps Full Duplex, Flow Control: Rx/Tx
[33303.610997] e1000e 0000:00:1f.6 eno1: 10/100 speed: disabling TSO
[33308.883575] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Interestingly, eno1 comes back up at 10Mbps and then, 5 seconds later, is switched to 1000Mbps by the ifcfg-eno1 setting (this happens automatically).
So I am guessing there is some issue with the auto negotiation procedure which causes 10Mbps to be selected. This issue is repeatable for me.
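The 10 Mbps to 1000 Mbps sequence above can be extracted from the kernel log automatically. This is a rough sketch that assumes the e1000e "NIC Link is Up" message format shown in the dmesg output above; the function name is hypothetical.

```python
import re
from typing import List, Tuple

# Matches lines such as:
#   e1000e: eno1 NIC Link is Up 10 Mbps Full Duplex, Flow Control: Rx/Tx
LINK_UP = re.compile(r"e1000e: (\S+) NIC Link is Up (\d+) Mbps")


def link_up_events(log_text: str) -> List[Tuple[str, int]]:
    """Return (interface, speed in Mbps) for every link-up message, in order."""
    return [(m.group(1), int(m.group(2))) for m in LINK_UP.finditer(log_text)]
```

Feeding it the output of "dmesg | grep e1000e" gives one entry per negotiation, so a 10 followed by a 1000 for the same interface indicates that a renegotiation took place.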

I have a very similar problem with a card which is inside Thunderbolt Dock.
Laptop: Dell Latitude E7480
Dock: Dell TB16
OS: Fedora 29
Kernels: both 4.18.18 and 4.19.2
The speed is always set to 10Mb/s, regardless of when I plug in the cable or connect the docking station.
I tested the same cable with the built-in network card - it's always set to 1000Mb/s.
The trick of manually setting the speed with `ethtool` works.
I can collect more logs if needed.

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.
Fedora 29 has now been rebased to 4.19.5-300.fc29. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
If you experience different issues, please open a new bug report for those.

I tried to compile the vanilla 4.15 kernel on my F29 using the instructions (jumping through several hoops in the process), in order to start bisecting, but I can't compile it:
CC /home/kparal/devel/linux-stable/tools/objtool/pager.o
pager.c: In function ‘pager_preexec’:
pager.c:36:12: error: passing argument 2 to restrict-qualified parameter aliases with argument 4 [-Werror=restrict]
select(1, &in, NULL, &in, NULL);
^~~ ~~~
cc1: all warnings being treated as errors
I tried to work around it (no familiarity with C build process), but failed.

Commit ad343a98e74e ("tools/lib/subcmd/pager.c: do not alias select() params") should fix that warning and you'll need commit b3348dab4e79 ("objtool, perf: Fix GCC 8 -Wrestrict error"). It looks like there are a couple of build issues with F29 toolchains, though, so it's probably easiest to use the F27 toolchain.

I tried F27 toolchain as well (in a mock), but that exited immediately failing to find stdarg.h (or similar). I had gcc installed and the file was present, I didn't know what to do with that. I also tried F28 toolchain, and that behaved the same as F29. I might try F29 again with the commits you mention. I tried to disable "warnings as errors" gcc behavior, but couldn't figure out how.

So unfortunately commit ad343a98e74e arrived too late, around the time of the 4.15.3 kernel, and I need to bisect between 4.15 and 4.15.3. The commit is also not referenced in any 4.15.x branch, just the master branch. Commit b3348dab4e79 is not present in the repo at all. I'm using the linux-stable git, because I need to use x.y.z tags, and Torvalds' git doesn't contain those. So it seems I'm not able to compile and bisect those on F29.
I'll try F27 buildroot once again.

Oops, sorry, b3348dab4e79 was the reference to my cherry-pick. The upstream commit is 854e55ad289e. If you build with pre-gcc8 you probably won't need that, though.
You can add a second remote with "git remote add -f torvalds git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git" so you'll have the upstream commits and tags. Then "git bisect start", "git bisect good v4.15", "git bisect bad v4.15.3", and then "git apply fixup.patch" where fixup.patch has those two commits. It should leave them modified as you bisect without making git-bisect upset.
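For anyone unfamiliar with what "git bisect" does between the good and bad tags: it is essentially a binary search over the commit history. Below is a simplified sketch over a linear list of commits. Real git bisect also handles non-linear history, and the commit names and the is_bad predicate (standing in for "build, boot, replug the cable, check the speed") are hypothetical.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Find the first bad commit by binary search.

    commits is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad (like "git bisect good" / "git bisect bad").
    """
    lo, hi = 0, len(commits) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]
```

The number of builds needed grows only with the logarithm of the number of commits, which is why bisecting between v4.15 and v4.15.3 is feasible even with slow kernel builds.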

I managed to bisect the commits using the F27 toolchain. Here's the bad commit:
commit 4e45815fcd38e0a335f9be45336fd95011f6275b (HEAD, refs/bisect/bad)
Author: Tomas Winkler <tomas.winkler@intel.com>
Date: Tue Jan 2 12:01:41 2018 +0200
mei: me: allow runtime pm for platform with D0i3
commit cc365dcf0e56271bedf3de95f88922abe248e951 upstream.
From the pci power documentation:
"The driver itself should not call pm_runtime_allow(), though. Instead,
it should let user space or some platform-specific code do that (user space
can do it via sysfs as stated above)..."
However, the S0ix residency cannot be reached without MEI device getting
into low power state. Hence, for mei devices that support D0i3, it's better
to make runtime power management mandatory and not rely on the system
integration such as udev rules.
This policy cannot be applied globally as some older platforms
were found to have broken power management.
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Reviewed-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff --git a/drivers/misc/mei/pci-me.c b/drivers/misc/mei/pci-me.c
index f4f17552c9b8..4a0ccda4d04b 100644
--- a/drivers/misc/mei/pci-me.c
+++ b/drivers/misc/mei/pci-me.c
@@ -238,8 +238,11 @@ static int mei_me_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
*/
mei_me_set_pm_domain(dev);
- if (mei_pg_is_enabled(dev))
+ if (mei_pg_is_enabled(dev)) {
pm_runtime_put_noidle(&pdev->dev);
+ if (hw->d0i3_supported)
+ pm_runtime_allow(&pdev->dev);
+ }
dev_dbg(&pdev->dev, "initialization successful.\n");
This commit causes my T480s ethernet to connect as 10Mb/s instead of 1000Mb/s.
Jeremy, can you communicate this to the upstream kernel maintainers?
Additionally, you could also provide koji scratch builds for cc365dcf0 and the previous good commit (76ee8f3d7) to get confirmation from other people with other laptops that this is also the broken commit for them, if you think that would help to confirm this.
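The pm_runtime_allow() call in the diff has a user-space counterpart: the power/control file of the device in sysfs, where "auto" means runtime PM is allowed and "on" means it is forbidden. Here is a small sketch for checking that state. The PCI address 0000:00:16.0 for the MEI device is an assumption (verify it with lspci), and the function name and the sysfs_root parameter are made up for illustration.

```python
from pathlib import Path


def runtime_pm_control(pci_addr: str, sysfs_root: str = "/sys/bus/pci/devices") -> str:
    """Return the runtime PM policy of a PCI device: "auto" or "on".

    "auto" corresponds to pm_runtime_allow() having been applied,
    "on" means runtime power management is forbidden for the device.
    """
    control = Path(sysfs_root) / pci_addr / "power" / "control"
    return control.read_text().strip()


# Assumed address of the MEI device; check with lspci before relying on it.
MEI_PCI_ADDR = "0000:00:16.0"
```

On a kernel containing the commit above, runtime_pm_control(MEI_PCI_ADDR) would be expected to return "auto" on platforms with D0i3 support. Whether writing "on" to that file works around the 10Mb/s problem has not been tested here.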

Kamil, excellent. I've started a v4.20.3 build with that commit reverted - if people can compare this to the standard v4.20.3 build in F29 that would be great: https://koji.fedoraproject.org/koji/taskinfo?taskID=32188526. I'll email upstream once I double-check the build since I happen to have access to a T480s this afternoon.

Thanks, do you know if your machine is labeled with vPro enabled?
If you are not sure you can compile
linux/samples/mei/mei-amt-version.c
cd
It needs a little fix
-me->fd = open("/dev/mei", O_RDWR);
+me->fd = open("/dev/mei0", O_RDWR);
cd samples/mei/
make
sudo ./mei-amt-version
It should print the vPro version if it is enabled;
otherwise it will print something like
Error: IOCTL_MEI_CONNECT_CLIENT receive message. err=-1