We currently have no generic concept of a PHY address, since all
existing implementations simply hardcode the PHY address within the
MII access methods.

A bit-bashing MII interface will need to be provided with an explicit
PHY address in order to generate the correct waveform. Allow for this
by separating out the concept of a MII device (i.e. a specific PHY
address attached to a particular MII interface).

When booting some versions of the UEFI shell, our driver binding
protocol's Supported() entry point is called at TPL_NOTIFY for no
discernible reason. Attempting to raise to TPL_CALLBACK triggers an
immediate assertion failure in the firmware.

Since our Supported() method can run at any TPL, fix by simply not
attempting to raise the TPL within this method.

[tls] Ensure that window change is propagated to plainstream interface

The cipherstream xfer_window_changed() message is used to retrigger
the TLS transmit state machine. If the transmit state machine is
idle, then the window change message will not be propagated to the
plainstream interface. This can potentially cause the plainstream
interface peer (e.g. httpcore) to block waiting for a window change
message that will never arrive.

Fix by ensuring that the window change message is propagated to the
plainstream interface if the transmit state machine is idle. (If the
transmit state machine is not idle then the plainstream window will be
zero anyway.)

Provide symbols constructed from the object name and line number for
code fragments placed into alternative sections, such as inline
REAL_CODE() assembly placed into .text16. This simplifies the
debugging task of finding the source code corresponding to a given
instruction pointer.

Note that we cannot use __FUNCTION__ since it is not a preprocessor
macro and so cannot be concatenated with string literals.

The existence of MMX and SSE is required by the System V x86_64 ABI
and so is assumed by gcc, but these registers are not preserved by our
own interrupt handlers and are unlikely to be preserved by other
context switch handlers in a boot firmware environment.

Explicitly prevent gcc from using MMX or SSE registers to avoid
potential problems due to silent register corruption.

We must remove the %xmm0-%xmm5 clobbers from the x86_64 version of
hv_call() since otherwise gcc will complain about unknown register
names. Theoretically, we should probably add code to explicitly
preserve the %xmm0-%xmm5 registers across a hypercall, in order to
guarantee to external code that these registers remain unchanged. In
practice this is difficult since SSE registers are disabled by
default: for background information see commits 71560d1 ("[librm]
Preserve FPU, MMX and SSE state across calls to virt_call()") and
dd9a14d ("[librm] Conditionalize the workaround for the Tivoli VMM's
SSE garbling").

We currently perform various min-entropy calculations using build-time
floating-point arithmetic. No floating-point code ends up in the
final binary, since the results are eventually converted to integers
and asserted to be compile-time constants.

Though this mechanism is undoubtedly cute, it inhibits us from using
"-mno-sse" to prevent the use of SSE registers by the compiler.

Some drivers are known to call the optional Map_Mem() callback without
first checking that the callback exists. Provide a usable basic
implementation of Map_Mem() along with the other callbacks that become
mandatory if Map_Mem() is provided.

Note that in theory the PCI I/O protocol is allowed to require
multiple calls to Map(), with each call handling only a subset of the
overall mapped range. However, the reference implementation in EDK2
assumes that a single Map() will always suffice, so we can probably
make the same simplifying assumption here.

Tested with the Intel E3522X2.EFI driver (which, incidentally, fails
to cleanly remove one of its mappings).

Some CAs provide non-functional OCSP servers, and some clients are
forced to operate on networks without access to the OCSP servers.
Allow the user to explicitly disable the use of OCSP checks by
undefining OCSP_CHECK in config/crypto.h.

Mark the link as blocked if the LACP partner is not reporting itself
as being in sync, collecting, and distributing.

This matches the behaviour for STP: we mark the link as blocked if we
detect that the switch is actively blocking traffic, in order to
extend the DHCP discovery period and so prevent boot failures on
switches that take an excessively long time to enable ports.

When DEBUG=librm_mgmt is enabled, intercept CPU exceptions and provide
a register and stack dump, then drop to an emergency shell. Exiting
from the shell will almost certainly not work, but this provides an
opportunity to view the register and stack dump and carry out some
basic debugging.

Note that we can intercept only the first 8 CPU exceptions, since a
PXE ROM is not permitted to rebase the PIC.

Commit c89a446 ("[efi] Run at TPL_CALLBACK to protect against UEFI
timers") introduced a regression in the EFI entropy gathering code.
When the EFI_RNG_PROTOCOL is not present, we fall back to using timer
interrupts (as for the BIOS build). Since timer interrupts are
disabled at TPL_CALLBACK, WaitForEvent() fails and no entropy can be
gathered.

Fix by dropping to TPL_APPLICATION while entropy gathering is enabled.

As noted in the comments, UEFI manages to combines the all of the
worst aspects of both a polling design (inefficiency and inability to
sleep until something interesting happens) and of an interrupt-driven
design (the complexity of code that could be preempted at any time,
thanks to UEFI timers).

This causes problems in particular for UEFI USB keyboards: the
keyboard driver calls UsbAsyncInterruptTransfer() to set up a periodic
timer which is used to poll the USB bus. This poll may interrupt a
critical section within iPXE, typically resulting in list corruption
and either a hang or reboot.

Work around this problem by mirroring the BIOS design, in which we run
with interrupts disabled almost all of the time.

Reporting a completion via usb_complete() will pass control outside
the scope of xhci.c, and could potentially result in a further call to
xhci_event_poll() before returning from usb_complete(). Since we
currently update the event consumer counter only after calling
usb_complete(), this can result in duplicate completions and
consequent corruption of the submission TRB ring structures.

Fix by updating the event ring consumer counter before passing control
to usb_complete().

The i219 appears to have a seriously broken reset mechanism. After
any transmit or receive activity, resetting the card will break both
the transmit and receive datapaths until the next PCI bus reset.

The Linux and BSD drivers include a convoluted workaround authored by
Intel which involves setting a bit in the undocumented FEXTNVM11
register, then transmitting a dummy 512-byte packet containing garbage
data, then reconfiguring the receive descriptor prefetch thresholds
and temporarily reenabling the receive datapath. The comments in the
Intel fix do not even remotely match what the code actually does, and
the code accidentally leaves the transmitter enabled after use.

Experimentation suggests that an equivalent fix is to simply set the
undocumented bit in FEXTNVM11 before enabling the transmit or receive
descriptor rings.

Some older versions of gcc (observed with gcc 4.7.2) report a spurious
uninitialised variable warning in ena_get_device_attributes(). Work
around this warning by manually inlining the relevant code (which has
only a single call site).

[netdevice] Make netdev_irq_enabled() independent of netdev_irq_supported()

The UNDI layer uses the NETDEV_IRQ_ENABLED flag to choose whether to
return PXENV_UNDI_ISR_OUT_OURS or PXENV_UNDI_ISR_OUT_NOT_OURS for a
given interrupt. For a network device that does not support
interrupts, the flag will never be set and so pxenv_undi_isr() will
always return PXENV_UNDI_ISR_OUT_NOT_OURS. This causes some NBPs
(such as lpxelinux.0) to hang.

Redefine NETDEV_IRQ_ENABLED as a simple administrative flag which can
be set even on network devices that do not support interrupts. This
allows pxenv_undi_isr() (which is the sole user of NETDEV_IRQ_ENABLED)
to function as expected by lpxelinux.0.

Using "ld --oformat binary" for mbr.bin and usbdisk.bin seems to cause
segmentation faults on some versions of binutils (observed on Fedora
27). Work around this problem by using ld to create an intermediate
ELF object, followed by objcopy (via the existing %.tmp -> %.bin rule)
to create the final binary.

Note that we cannot simply use a single-stage "objcopy -O binary"
since this will not process the relocation records for x86_64: see
commit 1afcccd ("[build] Do not use "objcopy -O binary" for objects
with relocation records").

The URIs printed as part of download progress messages are intended to
provide a quick visual progress indication to the user. Very long
query strings can render this visual indication useless in practice,
since the most important information (generally the URI host and path)
is drowned out by multiple lines of human-illegible URI-encoded data.

Omit the query string entirely from the download progress message.
For consistency and brevity, also omit the URI fragment along with the
username and password (which was previously redacted anyway).

Servers may provide multiple WWW-Authenticate headers, each offering a
different authentication scheme. We currently fail the request as
soon as we encounter an unrecognised scheme, which prevents subsequent
offers from succeeding.

Fix by silently ignoring headers for schemes that we do not recognise.
If no schemes are recognised then the request will eventually fail
anyway due to the 401 response code.

If multiple schemes are supported, arbitrarily choose the scheme
appearing first within the response headers.

Some HP BIOSes (observed with a Z840) seem to attempt to connect our
drivers in the middle of our call to DisconnectController(). The
precise chain of events is unclear, but the symptom is that we see
several calls to our Supported() and Start() methods, followed by a
system lock-up.

Work around this dubious BIOS behaviour by explicitly failing calls to
our Start() method while we are in the middle of attempting to
disconnect drivers.

When submitting binaries for UEFI Secure Boot signing, certain
known-dubious subsystems (such as 802.11 and NFS) must be excluded
from the build. Mark the directories containing these subsystems as
insecure, and allow the build target to include an explicit "security
flag" (a literal "-sb" appended to the build platform) to exclude
these source directories from the build process.

For example:

make bin-x86_64-efi-sb/ipxe.efi

will build iPXE with all code from the 802.11 and NFS subsystems
excluded from the build.

Some UEFI BIOSes will deliberately break the implementation of
ConnectController() to return errors for devices that have been
"disabled" via the BIOS setup screen. (As an added bonus, such BIOSes
may return garbage EFI_STATUS values such as 0xff.)

Work around these broken UEFI BIOSes by ignoring failures and
continuing to attempt to connect any remaining handles.

The UEFI specification does not state whether or not a return value of
EFI_BUFFER_TOO_SMALL from the SNP Receive() method should follow the
usual EFI API behaviour of allowing the caller to retry the request
with an increased buffer size.

Examination of the SnpDxe driver in EDK2 suggests that Receive() will
just return the truncated packet (complete with any requested
link-layer header fields), so match this behaviour.

Record and report the number of peers (calculated as the maximum
number of peers discovered for a block's segment at the time that the
block download is complete), and the percentage of blocks retrieved
from peers rather than from the origin server.

Some external code (such as the UEFI UNDI driver for the Realtek USB
NIC on a Microsoft Surface Book) will block during transmission
attempts and can take several seconds to report a transmit error. If
there is a large queue of pending transmissions, then the accumulated
time from a series of such failures can easily exceed the EFI watchdog
timeout, resulting in what appears to be a system lockup followed by a
reboot.

Work around this problem by immediately cancelling any pending
transmissions as soon as any transmit error occurs.

The only expected transmit error under normal operation is ENOBUFS
arising when the hardware transmit queue is full. By definition, this
can happen only for drivers that do not utilise deferred
transmissions, and so this new behaviour will not affect these
drivers.

The SnpDxe driver raises the task priority level to TPL_CALLBACK when
calling the UNDI entry point. This does not appear to be a documented
requirement, but we should probably match the behaviour of SnpDxe to
minimise surprises to third party code.

Calling discard_cache() is likely to result in a call to
free_memblock(), which will call valgrind_make_blocks_noaccess()
before returning. This causes valgrind to report an invalid read on
the next iteration through the loop in alloc_memblock().

Ensure that all headers (PCI, UNDI, PnP, iPXE) are aligned to at least
four bytes, so that all accesses to header fields will be correctly
aligned even when reading directly from the expansion ROM BAR.

We must not steal ownership from the Gen 2 UEFI firmware, since doing
so will cause an immediate system crash (most likely in the form of a
reboot).

This problem was masked before commit a0f6e75 ("[hyperv] Do not fail
if guest OS ID MSR is already set"), since prior to that commit we
would always fail if we found any non-zero guest OS identity. We now
accept a non-zero previous guest OS identity in order to allow for
situations such as chainloading from iPXE to another iPXE, and as a
prerequisite for commit b91cc98 ("[hyperv] Cope with Windows Server
2016 enlightenments").

A proper fix would be to reverse engineer the UEFI protocols exposed
within the Hyper-V Gen 2 firmware and use these to bind to the VMBus
device representing the network connection, (with the native Hyper-V
driver moved to become a BIOS-only feature).

As an interim solution, fail to initialise the native Hyper-V driver
if we detect the guest OS identity known to be used by the Gen 2 UEFI
firmware. This will cause the standard all-drivers build (ipxe.efi)
to fall back to using the SNP driver.

EDK2 commit 6440385 ("MdePkg/Include: Add enumeration size checks to
Base.h") enforced the UEFI specification mandate that enums should
always be 32 bits. This revealed a latent bug in iPXE, which does not
build with -fno-short-enums.

The inline assembly used in include/errno.h to generate the einfo
blocks requires the ability to generate an immediate constant with no
immediate-value prefix (such as the dollar sign for x86 assembly).

We currently achieve this via the undocumented "%c0" form of operand.
This causes an "invalid operand prefix" error on GCC 4.8 for ARM64
builds.

Fix by switching to the equally undocumented "%a0" form of operand,
which appears to work correctly on all tested versions of GCC.

The UEFI specification has an implicit and demonstrably incorrect
requirement (in the Mem_IO() calling convention) that any UNDI network
device has at most one memory BAR and one I/O BAR.

Some UEFI platforms have been observed to report the existence of
non-existent additional I/O BARs, causing iPXE to select the wrong
BAR. This problem does not affect the SnpDxe driver, since that
driver will always choose the lowest numbered existent BAR of each
type.

Adjust iPXE's behaviour to match that of SnpDxe, i.e. to always select
the lowest numbered BAR(s).