Overview

This document describes a heterogeneous AMP (Asymmetric Multi Processing) example design that is composed of the following:

HPS domain: ARM Cortex-A9 running Linux SMP

FPGA domain: Nios-II running uC/OS-II

The HPS domain is intended for non-real-time processing, while the FPGA domain is intended to be used for real-time processing.

Communication between the HPS and FPGA domains is accomplished through an MCAPI transport layer that uses shared HPS DDR memory as the underlying mechanism.

The FPGA domain has its own memories (OCRAM and DDRAM) so that the HPS impact on real-time processing is minimized. With this dual-domain implementation, memory access latency is better controlled and more deterministic. There are no shared peripherals between the HPS and Nios-II.

The purpose of this design example is to provide a foundation on which a custom system can be built.

A simple “hello” application demonstrates inter-processor communication for this example design. The ARM Cortex-A9 continuously sends a “hello” message to the Nios-II, and vice versa. The “hello” message received by the HPS is displayed on the HPS UART, while the message received by the Nios-II is displayed on the JTAG UART.

Deliverables

This section presents the set of deliverables that are part of the Heterogeneous AMP Example design release.

Hardware Design

The AMP example design system requires the following components:

FPGA

Nios II

Mutex IP

Mailbox IP

On Chip RAM

JTAG UART

Address span extender

SDRAM controller

EPCS flash controller

HPS (ARM Cortex-A9)

Components

Nios II

A single Nios-II core without an MMU is instantiated in the FPGA for real-time processing. The instruction and data caches are enabled. The Nios-II processor is connected to soft IP such as the JTAG UART, mailbox, and mutex. The CPU ID of the Nios-II core is defined by the user at hardware design time in Qsys; it is defined as 2 for this example design.

Mailbox

Mailbox soft IP is used for inter-processor interrupts and data passing. This is a new IP introduced in ACDS 13.1. Two mailboxes are needed for inter-processor communication between the ARM Cortex-A9 and the Nios-II, since the IP can only pass messages in a single direction. An interrupt is fired to the recipient when a message is written into the mailbox.

Mutex

Multiple mutexes are instantiated to protect critical sections of the code, preventing simultaneous accesses to those sections. Both the ARM Cortex-A9 and the Nios-II must obtain the mutex lock before they can write to the protected shared memory region.
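The mutex acquisition described above can be modelled in software as follows. This is only an illustrative sketch: the field layout (OWNER in the upper half-word, VALUE in the lower half-word) follows the mutex core documentation, and in real hardware the check-and-update is a single atomic register write rather than the C function shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of the mutex core's test-and-set register.
 * Assumed layout: OWNER in bits [31:16], VALUE in bits [15:0]. */
static uint32_t mutex_reg = 0;   /* VALUE == 0 means the mutex is free */

#define MUTEX_WORD(owner, value) (((uint32_t)(owner) << 16) | ((value) & 0xFFFFu))

/* A write takes effect only when the mutex is free or already owned
 * by the writer; the hardware performs this check atomically. */
static void mutex_write(uint32_t word)
{
    uint16_t cur_owner = (uint16_t)(mutex_reg >> 16);
    uint16_t cur_value = (uint16_t)(mutex_reg & 0xFFFFu);
    uint16_t new_owner = (uint16_t)(word >> 16);
    if (cur_value == 0 || cur_owner == new_owner)
        mutex_reg = word;
}

/* Try to acquire: write <cpuid, 1>, then read back to see if it stuck. */
static int mutex_trylock(uint16_t cpuid)
{
    mutex_write(MUTEX_WORD(cpuid, 1));
    return mutex_reg == MUTEX_WORD(cpuid, 1);
}

static void mutex_unlock(uint16_t cpuid)
{
    mutex_write(MUTEX_WORD(cpuid, 0));   /* ignored unless cpuid owns it */
}
```

In this design the ARM cores would present CPUIDs 0 and 1 and the Nios-II CPUID 2, which is why the CPUID values must not overlap.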

On Chip RAM (OCRAM)

The on-chip RAM in the FPGA domain is 256KB. The OCRAM is connected to the Nios-II and serves as its instruction and data memory. The uC/OS-II OS code and application code can be loaded into OCRAM if the code footprint is less than 256KB.

SDRAM controller

An SDRAM controller is needed when the uC/OS-II code footprint grows beyond 256KB and can no longer fit into OCRAM. The DDR connected to the FPGA is used instead of OCRAM for this example design.

EPCS Flash Controller

The EPCS flash controller is connected to the Nios-II so that the EPCS device can be used as storage for the FPGA bitstream and the uC/OS-II software binary.

JTAG UART

The JTAG UART is used to display messages received by the Nios-II for demonstration and validation purposes. It can be omitted if display is not needed.

Memory Maps

The L3 interconnect within the HPS supports a remap feature. Please refer to the Interconnect document for details of the remapping and the memory map of each HPS peripheral. Software needs to set the remap bit correctly in order to access the H2F and LWH2F interfaces. The remap configuration is handled by the Preloader.

MPU View

The memory map of system peripherals in the FPGA, as viewed by the MPU on top of the LWH2F bridge with base address 0xFF20_0000, is shown in the following table.

HPS SDRAM is located at 0x3000_0000 of the HPS system memory. This memory is currently not executable, as the HPS SDRAM MPFE port is not yet declared as executable memory.

fpga_sdram: base address 0x4000_0000, span 1G, FPGA SDRAM as executable memory

Software Design

The AMP example design showcases a system that runs two different operating systems in one SoC. The HPS runs Linux SMP, and the Nios II runs uC/OS-II. The example design software lays down a foundation for a communication channel between the two processors, which belong to different domains and different architectures. The communication channel is established via MCAPI, with each processor treated as an MCAPI node. The transport layer for the communication channel is shared memory.

The mutex soft IP is used to protect the critical sections of this shared memory region, enforcing mutually exclusive access to prevent memory corruption due to concurrent access. The mutex soft IP provides a hardware-based atomic test-and-set operation, allowing software in a multiprocessor environment to determine which processor owns the mutex.

The mailbox soft IP is used to notify and interrupt the processors when new messages arrive in the communication channel. A mailbox IP driver is required for both Linux SMP and uC/OS-II; this driver is used by the openMCAPI library for inter-processor communication when a packet needs to be sent to the other processor. In this example design, “hello” messages are exchanged between the processors; each received message is diverted to the UART port controlled by that processor for demonstration and validation purposes.

The Nios-II fetches its instructions from DDR in this example design, as this allows a bigger code footprint. An OCRAM of 256KB is also connected to the Nios-II; users may choose to run the Nios-II from OCRAM for faster performance, but then the code footprint cannot grow beyond 256KB. Slower performance is expected when the Nios-II runs from SDRAM, and the execution timing may not be deterministic since SDRAM access latency is not deterministic. A large OCRAM is expensive in an FPGA design, as it takes up a large portion of the FPGA area. There is therefore a tradeoff between FPGA area and performance for different code footprint sizes.

The boot medium for this example design is SD/MMC, but it can be modified to use other boot media. Users may choose to boot from QSPI or NAND for their own customization. Software changes are not needed, but the boot image creation and the boot settings on the board will differ.

Boot Flow

The HPS is brought up together with the FPGA. When the HPS boots, the first boot component is the Boot ROM. The Boot ROM fetches the preloader from SD/MMC and boots into the preloader. The preloader then fetches u-boot from SD/MMC and boots into u-boot. U-boot is responsible for fetching the Linux image from SD/MMC and booting into Linux.

The FPGA is configured from the EPCS device connected to the Nios-II. The reset vector of the Nios-II points to the boot copier in the EPCS controller memory. The Nios-II is part of the FPGA design, so it is only brought up after the FPGA is configured. The boot copier fetches the Nios-II ELF from the EPCS device, and the Nios-II then boots into uC/OS-II.

U-boot is also responsible for releasing the H2F, LWH2F and F2SDRAM bridges after the FPGA goes into user mode. These bridges are controlled by the HPS; the FPGA domain has no write access to the control register groups that control them. Accesses from the FPGA domain to the HPS domain are backpressured if the bridges have not been released, and the Nios-II may stall as a result of this backpressure mechanism.

Resource Partitioning

There are no shared peripherals in this example design; the ARM Cortex-A9 and the Nios-II control the dedicated peripherals in their own domains. The only shared resource between the ARM Cortex-A9 in the HPS and the Nios-II in the FPGA is the SDRAM physically connected to the HPS. The MCAPI transport layer is shared memory, so the HPS SDRAM needs to be accessible from both processors; the Nios-II accesses it via the F2SDRAM bridge. The SDRAM is partitioned into two regions through the u-boot boot arguments: Linux uses most of the SDRAM, and a small portion of the memory is used as a shared memory region between the HPS and the Nios II.

The shared memory must be marked as non-cacheable in both Linux and uC/OS-II. This is important because the Nios-II and the ARM Cortex-A9 each have their own caches, and there is no cache coherency control between the two processors. If caching were allowed, neither processor could know whether the contents of its cache are up to date.

MCAPI Library

The MCAPI specification is produced by the Multicore Association to standardize an API for communication and synchronization between processing cores in embedded systems. The following figure shows where the MCAPI framework resides in a multicore system.

The MCAPI APIs are defined in the MCAPI specification for user-space applications; they are OS and architecture agnostic. These characteristics allow an application developed using the MCAPI API to be portable across different platforms and OSes. However, the communication mechanism underlying the MCAPI API between different OSes and cores is OS and architecture dependent. The MCAPI library has been ported to support ARM Linux and Nios-II uC/OS-II for the Altera SoCFPGA platform.

Features

This subsection gives a short description of the features supported by the MCAPI specification:

Frame Formats

This subsection describes the frame formats for the different types of communication. The maximum frame size is implementation specific; it defaults to 1024 bytes.

Connectionless Message Communication

The following two figures show the frame format used for the message communication type. The first 12 bytes are the common openMCAPI header. Message communication is mainly used for control-path messaging (such as setting up connections, creating endpoints, etc.) and for exchanging messages between cores. For control-path messaging, the message type is defined in the Protocol Type field after the common header; the payload varies in size depending on the protocol type / request type. The maximum payload size for the data path is implementation specific; it defaults to 1012 bytes (the maximum frame size minus the 12-byte header overhead).

Message Frame Format (Control Path):

Message Frame Format (Data Path):

Connection-oriented Packet Channel Communication

The following figure shows the frame format used for the packet channel communication type. The first 12 bytes are the common openMCAPI header. Packet channel communication is mainly used for data-path messaging to exchange information between two applications residing on different cores. The application message carried in the payload field is application specific and is defined by the application. The maximum payload size is implementation specific; it defaults to 1012 bytes (the maximum frame size minus the 12-byte header overhead).

Packet Frame Format:

Connection-oriented Scalar Channel Communication

The following figure shows the frame format used for the scalar channel communication type. The first 12 bytes are the common openMCAPI header. Scalar channel communication is mainly used for data-path messaging to exchange scalar values between two applications residing on different cores. The payload of a scalar frame has a fixed length of 8, 16, 32, or 64 bits. The payload is expected to be consumed by the application directly without further decapsulation.

Scalar Frame Format:

Maximum Message and Packet Size

The maximum message/packet size is the same for both directions, since both OSes communicate via the same shared memory transport layer. The maximum message/packet size is 1024 bytes; it can be changed at build time via autotools configuration options.
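The size arithmetic above can be sanity-checked with a small illustrative struct. The field layout is not the real openMCAPI header (those definitions live in the library headers); what the sketch shows is only that a 1024-byte frame minus the 12-byte common header leaves 1012 bytes of payload.

```c
#include <assert.h>
#include <stdint.h>

#define SHM_MAX_FRAME_SIZE 1024u   /* default maximum frame size      */
#define SHM_HDR_SIZE         12u   /* common openMCAPI header overhead */

/* Illustrative layout only: an opaque header followed by the payload. */
struct shm_frame {
    uint8_t header[SHM_HDR_SIZE];
    uint8_t payload[SHM_MAX_FRAME_SIZE - SHM_HDR_SIZE];
};
```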

Configuration Parameters

For this MCAPI library, the number of nodes supported is 2: one node for the ARM Cortex-A9 and one for the Nios-II. Each node supports 4 endpoints: 2 endpoints for control-plane processing and 2 endpoints for data-plane messaging, as the messaging channels are uni-directional.

The following table shows several important settings which can be easily specified at build time and are applicable to both Linux and uC/OS-II builds.

Assumptions and Constraints

The number of nodes and the connectivity topology are known at design time, not run time. Therefore the MCAPI specification does not specify link configuration and link management.

The MCAPI specification does not address endianness; endianness is architecture and implementation specific. The Nios-II soft cores and the ARM Cortex-A9 cores are both little endian.

Some assumptions and constraints are inherited from the openMCAPI framework:

Only unicast is supported; multicast and broadcast are not.

Connection-oriented communication is assumed to be reliable; there is no acknowledgement mechanism like that of a TCP connection.

For connectionless communication, sending a message to a non-existent endpoint is not treated as an invalid send request; such requests are reported as successful by the MCAPI API. Error handling for this type of error is not within the scope of this implementation.

Packets are discarded if no working link to the specified destination node can be found; no error is reported by the MCAPI API. Error handling for this type of error is not within the scope of this implementation.

An error code is returned when the system runs out of buffers. It is up to the application layer to react to the failure and resend the message upon receiving the error code.

OpenMCAPI Framework

This section gives an overview of the OpenMCAPI framework, which was used as the starting point for the MCAPI implementation on both Linux and Nios-II.

The following figure shows an overview of the Mentor Graphics OpenMCAPI implementation on Linux on a PowerPC platform.

The MCAPI API called by application resides in user space.

A transport layer sits between the kernel driver and the MCAPI API layer. This transport layer is agnostic to the target platform and OS; OS-specific implementation details are abstracted by the OS abstraction layer (in this case known as the Linux layer). All platforms may use the transport layer to link to the platform- and OS-specific implementation of the physical layer in kernel space (kernel modules).

Shared memory (SHM) is used as the physical layer for the OpenMCAPI library.

The MCAPI generic layer and transport layer are compiled into a static library (libmcapi.a) so that they can be linked into applications.

A master-slave model is used in the SHM management driver. All cores share the same SHM management block; however, only the first node becomes the master, and the rest are slave nodes.

A pool of buffers is created from the shared memory region allocated in the kernel module for MCAPI communication.

Each node on a core is allocated a route interface that enables messages/packets to be routed to the right core. In addition, a buffer descriptor queue is created for each node to hold the incoming messages/packets from other nodes.

The kernel module beneath the transport layer is responsible for memory allocation and mapping of the shared memory region. Callback functions from the kernel modules are registered with this transport layer.

Shared Memory (SHM) Buffer Management

For communication between cores, a shared memory region is defined and allocated. The allocated memory must be a contiguous region. Both cores are able to read and write the shared memory region for inter-core communication.

The shared memory region is initialized by the master node (first MCAPI node in the system) to create a pool of fixed size (1024B) buffers that are indexed.

The maximum number of SHM buffers allocated for the system during initialization is 128 (SHM_BUFF_COUNT). Therefore, the buffers are indexed in the range [0..127].

A buffer is taken from the pool and allocated to a node for message/packet communication when needed. The availability of buffers in the pool is tracked via a counter (shm_buf_count) and bit masks (buff_bit_mask), from which the free buffers can be derived.
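The counter-plus-bitmask tracking can be sketched as follows. This is a simplified single-threaded model, not the library's actual allocator; the variable names mirror the ones quoted above, and in the real driver these variables live in shared memory and are only ever updated under the buffer pool lock.

```c
#include <assert.h>
#include <stdint.h>

#define SHM_BUFF_COUNT 128          /* buffers in the shared pool */

/* One bit per buffer: bit set = buffer in use. */
static uint32_t buff_bit_mask[SHM_BUFF_COUNT / 32];
static int      shm_buf_count = SHM_BUFF_COUNT;   /* free buffers left */

/* Allocate: find the first clear bit, set it, return its index. */
static int shm_buf_alloc(void)
{
    if (shm_buf_count == 0)
        return -1;                          /* pool exhausted */
    for (int i = 0; i < SHM_BUFF_COUNT; i++) {
        if (!(buff_bit_mask[i / 32] & (1u << (i % 32)))) {
            buff_bit_mask[i / 32] |= 1u << (i % 32);
            shm_buf_count--;
            return i;
        }
    }
    return -1;
}

/* Free: clear the buffer's bit and return it to the pool. */
static void shm_buf_free(int idx)
{
    buff_bit_mask[idx / 32] &= ~(1u << (idx % 32));
    shm_buf_count++;
}
```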

A locking mechanism is introduced to prevent race conditions when updating the shared memory buffer variables (such as the counter and bit mask), since the MCAPI library runs in a multi-threaded environment on a multi-core system.

Shared Memory (SHM) Queue Management

MCAPI messages/packets are enqueued into the SHM queue to reach their destination.

The queues are accessible by all nodes in the system.

The depth of the queue is 16 entries (SHM_BUFF_DESC_Q_SIZE).

The descriptor queue is implemented as a ring queue using the producer-consumer concept, with the sender as producer and the receiver as consumer. The sender enqueues the message/packet to be sent into the ring queue, and the receiver dequeues it.

There is a counter for each queue indicating how many entries are enqueued in that queue. The counter is treated as a mailbox: when the counter is greater than 0, the mailbox is active, which either triggers an interrupt to the destination node or wakes the receiving thread from sleep to process the message/packet.
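The per-node ring queue and its counter can be modelled roughly as shown below. This is an illustrative sketch, not the library's code: the struct and function names are invented, and the real queue lives in shared memory under the descriptor queue lock, with the non-zero count serving as the "mailbox active" condition described above.

```c
#include <assert.h>

#define SHM_BUFF_DESC_Q_SIZE 16   /* queue depth (entries) */

/* Model of a per-node descriptor ring queue: the sender (producer)
 * enqueues buffer indices, the receiver (consumer) dequeues them.
 * 'count' doubles as the mailbox flag: count > 0 means active. */
struct desc_queue {
    int entries[SHM_BUFF_DESC_Q_SIZE];
    int head;     /* next slot the consumer reads  */
    int tail;     /* next slot the producer writes */
    int count;    /* entries currently enqueued    */
};

static int q_enqueue(struct desc_queue *q, int buf_idx)
{
    if (q->count == SHM_BUFF_DESC_Q_SIZE)
        return -1;                              /* ring full */
    q->entries[q->tail] = buf_idx;
    q->tail = (q->tail + 1) % SHM_BUFF_DESC_Q_SIZE;
    q->count++;                                 /* "mailbox" becomes active */
    return 0;
}

static int q_dequeue(struct desc_queue *q, int *buf_idx)
{
    if (q->count == 0)
        return -1;                              /* nothing to consume */
    *buf_idx = q->entries[q->head];
    q->head = (q->head + 1) % SHM_BUFF_DESC_Q_SIZE;
    q->count--;
    return 0;
}
```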

Shared Memory and Buffer Management

Buffer pools are shared between cores in the existing implementation.

The number of SHM buffers allocated is 128. The maximum message/packet size (MCAPI_MAX_DATA_LEN) supported is 1024 bytes, and the header overhead for an MCAPI frame is 12 bytes.

In addition, each SHM buffer entry consists of a buffer index, next pointer, buffer size, etc., totalling 20 bytes per entry.

The MCAPI buffer is also aligned to a 4K address. Therefore, the total memory size required for the SHM buffers (the shared memory region) is at least 131KB.
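The 131KB figure follows from the numbers above: 128 buffers times (1024 + 20) bytes. The short sketch below just reproduces that arithmetic; the macro names mirror those quoted in the text, and the 4K alignment of the pool base is left out of the calculation.

```c
#include <assert.h>

#define SHM_BUFF_COUNT       128
#define MCAPI_MAX_DATA_LEN  1024   /* frame: 12-byte header + payload */
#define SHM_ENTRY_OVERHEAD    20   /* index, next pointer, size, etc. */

/* Raw bytes needed for the buffer pool, before 4K alignment. */
static unsigned shm_pool_bytes(void)
{
    return SHM_BUFF_COUNT * (MCAPI_MAX_DATA_LEN + SHM_ENTRY_OVERHEAD);
}
```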

The shared memory region for this buffer pool is located in SDRAM, with the region visible to both cores. To avoid data coherency problems, the shared memory region is mapped as non-cacheable in both OSes.

Locking Mechanism

There are three types of locks used to protect the critical sections of the shared memory region:

Initialization lock (_shm_drv_mgmt_struct_→shm_init_lock): a lock to prevent multiple initializations of the shared memory region by the cores running different OSes. The shared memory region needs to be initialized only once and is then used by both cores.

Buffer descriptor queue lock (_shm_drv_mgmt_struct_→_shm_buff_desc_q_→lock): a lock to prevent concurrent access and race conditions on the buffer descriptor queue. The buffer descriptor queue is a ring queue with a maximum of 16 entries, using the producer-consumer concept: the sender is the producer and the receiver is the consumer. The number of MCAPI messages/packets enqueued in the ring queue is tracked via a counter. Both the producer and the consumer manipulate this counter, one incrementing it after enqueuing and the other decrementing it after consuming; it must therefore be protected by a lock to prevent race conditions.

Buffer pool lock (_shm_drv_mgmt_struct_→_shm_buff_mgmt_blk_→lock): a lock to prevent concurrent access and race conditions on the buffer pool. The SHM buffer pool, located in the shared memory region, is the main structure for passing messages/packets between cores. Both cores can take free buffers from the pool and return unused buffers to it concurrently, as they run different OS instances that have no knowledge of each other. Variables such as the number of free buffers available and the buffer indices that mark buffers as used or unused need to be protected; therefore, the buffer pool is protected with a lock.

Data Handling Flow

For communication between endpoints, a sender creates a message and then sends the data over an MCAPI channel to the endpoint. The receiver then receives the message at its endpoint.

Both polling and interrupt mechanisms are supported through the Linux kernel module in the openMCAPI library. Upon initialization of the MCAPI library in Linux, two threads are created for each node to handle incoming messages to its endpoints. One thread (mcapi_receive_thread) processes all incoming messages/packets and enqueues them into their endpoints' receive queues; control-plane messages are enqueued by this thread into the control-plane endpoint's receive queue. The other thread (mcapi_process_ctrl_msg) processes the control-plane messages.

The control-plane thread's routine is an infinite loop that waits for incoming messages to the RX control endpoint of a particular node. The thread is suspended if there is no message in the RX control endpoint's queue; it is signaled and resumed when a control message arrives in the queue.

The mcapi_receive_thread thread can run in either poll mode or interrupt mode; its routine is also an infinite loop. In poll mode, the thread polls its SHM receive queue to check for message/packet entries from other endpoints. When entries are available, it dequeues each message/packet entry and enqueues it into the corresponding endpoint's receive queue. Polling continues until the thread is killed.

When interrupts are enabled for the system/cores, mcapi_receive_thread is put to sleep in an interruptible wait queue (in the Linux OS environment). The thread is woken either when the waiting condition becomes true or when an interrupt is received. An interrupt is generated targeting the destination node when a message/packet is sent to that node; the IRQ handler of the target node is then triggered and wakes the sleeping thread. The other way to wake the thread is for the waiting condition to evaluate to true when the wait queue is woken; the condition is true when the mailbox is active, that is, when the counter of entries in the SHM queue is non-zero.

Implementation Details

OpenMCAPI Library Porting

This section briefly describes a few points that need to be taken into consideration when enabling openMCAPI for the Altera SoCFPGA platform, from both hardware design and software design perspectives.

Autotools Enablement

The openMCAPI library has been enabled to be built and configured with autotools. The original build method using the Python-based ‘waf’ tool has been disabled. A few files were added or modified to enable autotools, as shown below:

Makefile.am

config.sub

configure.ac

libmcapi/Makefile.am

libmcapi/include/openmcapi.h

libmcapi/shm/linux/kmod/Makefile

util/Makefile.am

Altera SoCFPGA Platform Enablement

openMCAPI has been enabled for the Altera SoCFPGA platform for ARM Linux and Nios-II uC/OS-II. A few files were added or modified to enable this, as shown below:

include/openmcapi_cfg.h

include/ucosii/mcapi_os.h

libmcapi/include/arch/arm/barrier.h

libmcapi/include/nios2/barrier.h

libmcapi/include/lock.h

libmcapi/mcapi/ucosii/mcapi_os.c

libmcapi/shm/linux/kmod/Kbuild

libmcapi/shm/linux/kmod/common.c

libmcapi/shm/linux/kmod/loop.c

libmcapi/shm/linux/kmod/mcomm.h

libmcapi/shm/linux/kmod/socfpga.c

libmcapi/shm/linux/shm_os.c

libmcapi/shm/shm.c

libmcapi/shm/shm.h

libmcapi/shm/ucosii/shm_os.c

libmcapi/shm/ucosii/socfpga.c

libmcapi/shm/ucosii/ucosii_mcomm.h

util/memtool.c

Locking Mechanism

The physical layer of the transport layer for the MCAPI library is shared memory; both the Nios II and the ARM A9 have concurrent read and write access to the shared memory region. Therefore, a locking/mutex mechanism needs to be enforced to protect the critical sections of this shared memory region.

These locks are implemented with the mutex soft IP; the number of mutex soft IP instances required for this example design is 4 (LOCK_MAX_NUM = 2 + DESCQ_LOCK_NUM). Please note that the number of buffer descriptor queue locks needs to be increased when the number of supported MCAPI nodes is increased (DESCQ_LOCK_NUM == CONFIG_SHM_NR_NODES). The hardware design therefore needs to be modified to increase the number of mutexes when more MCAPI nodes are supported in the future.

The device tree for Linux and the SHM driver (socfpga.c) also need to be updated accordingly if the number of mutexes in the design changes. In addition, the configure.ac file needs to be updated to take in the additional mutex mappings.

Interrupt Mechanism

An interrupt mechanism is needed to notify the cores in a system of the arrival of new MCAPI packets/messages. The MCAPI library uses the mailbox IP to interrupt the other processor when a packet needs to be sent: a processor interrupts the other processor by writing the CMD and DATA registers at the mailbox IP's address.
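The notification path can be modelled roughly as shown below. This is a software sketch only: in the real design the mailbox is a memory-mapped peripheral whose CMD/DATA write raises the recipient's interrupt line, and the function names here are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one uni-directional mailbox: posting a CMD/DATA pair
 * latches the message and raises the receiver's interrupt flag. */
struct mailbox {
    uint32_t cmd;
    uint32_t data;
    int      irq_pending;   /* set on message arrival, cleared on read */
};

/* Sender side: write CMD and DATA, which "fires" the interrupt. */
static void mailbox_post(struct mailbox *mb, uint32_t cmd, uint32_t data)
{
    mb->cmd  = cmd;
    mb->data = data;
    mb->irq_pending = 1;
}

/* Receiver side: drain the message and acknowledge the interrupt. */
static int mailbox_pend(struct mailbox *mb, uint32_t *cmd, uint32_t *data)
{
    if (!mb->irq_pending)
        return -1;           /* no message waiting */
    *cmd  = mb->cmd;
    *data = mb->data;
    mb->irq_pending = 0;
    return 0;
}
```

Because each mailbox is uni-directional, the design instantiates one such channel per direction, which is why a pair of MCAPI nodes needs two mailboxes.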

Please note that the number of mailbox soft IP instances required needs to be increased when more than one Nios-II core is supported in the future. The existing design requires only 2: each pair of MCAPI nodes requires 2 mailboxes for the interrupt mechanism, as a mailbox is uni-directional.

The device tree for Linux and the SHM driver (socfpga.c) also need to be updated accordingly if the number of mailboxes in the design changes. In addition, the configure.ac file needs to be updated to take in the additional mailbox mappings.

Memory Barrier Support

For the ARM architecture, memory barrier opcodes are available for memory operation synchronization; these opcodes are used to keep memory operations in sync between the HPS and FPGA domains. Please refer to libmcapi/include/arch/arm/barrier.h for the implementation.

However, there is no memory barrier opcode in the Nios-II architecture. Memory barrier support is implemented by issuing a dummy read to any memory location; this ensures that preceding memory write operations are carried out before the read completes. This behavior is supported by the Qsys fabric hardware. Please refer to libmcapi/include/arch/nios2/barrier.h for the implementation. This mechanism may need to be updated if the Nios-II architecture changes in the future.

Non-cache support in Nios-II

The Nios-II has its data cache enabled, and without an MMU there is no way to specify a cache policy per memory region; by default, all data accesses are cached. However, accesses to the HPS SDRAM shared region must be uncached. This is achieved by setting bit 31 of the SDRAM address to bypass the data cache, a supported feature of the Nios-II. This mechanism may need to be updated if the Nios-II architecture changes in the future.
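The bit-31 cache bypass can be expressed as a simple address helper, sketched below. The macro and function names are illustrative; the Nios II HAL's alt_remap_uncached() serves a similar role for pointer remapping, if available in the BSP.

```c
#include <assert.h>
#include <stdint.h>

/* On Nios II/f, data accesses with address bit 31 set bypass the
 * data cache.  Addresses into the HPS SDRAM shared region are
 * therefore tagged with bit 31 before use. */
#define NIOS2_BYPASS_DCACHE_MASK (1u << 31)

static uint32_t shm_uncached_addr(uint32_t phys_addr)
{
    return phys_addr | NIOS2_BYPASS_DCACHE_MASK;
}
```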

Address Mapping of Shared Memory Region

The shared memory region resides in HPS SDRAM; a window bridge is used to map the HPS SDRAM into the Nios-II's view, with the base address specified in the Qsys design. BSP support for the window bridge in ACDS 13.1 is not yet complete; therefore the mapping of the window bridge onto the HPS SDRAM region seen by the Nios-II is hard coded as ADDRESS_SPAN_EXTENDER_NIOS2SDRAM1G_RESET_MAP (0x3000000) in this example design (socfpga.c for uC/OS-II). This should be replaced once the BSP editor exposes this definition in the BSP. Please note that the mapping value may change based on the design; it is a design-time parameter.

CPUID for ARM Processor and Nios-II Processor

A unique CPUID needs to be assigned to each processor in the system so that each can be identified separately. The CPUID field is used when acquiring the mutex lock, so it must be unique, with no overlap between the ARM and Nios-II CPUID values. The ARM Cortex-A9 is a dual-core processor; the CPUID for core 0 is 0 and for core 1 is 1. Linux SMP runs on the ARM processor, and the processor ID returned in SMP mode is 0. The CPUID for the Nios-II is user defined; it is defined as 2 in this example design.

uC/OS-II Task Priority

uC/OS-II is a preemptive RTOS: a higher-priority task preempts a lower-priority task, so it is very important to set task priorities correctly. Priorities 0 and 1 are reserved for system tasks with the highest priority. The default range of task priorities for uC/OS-II is 0 to 63; the task with priority 63 (OS_LOWEST_PRIO) has the lowest priority. Applications running on uC/OS-II can be assigned priorities between 3 (APP_CFG_TASK_START_PRIO) and 61.

Two tasks are spawned in the MCAPI implementation. One task (mcapi_receive_thread) is on the data plane; it continually checks for messages/packets available to a particular node and dispatches them to the intended endpoint queue. The other task (mcapi_process_ctrl_msg) is responsible for handling control-plane messages. The control-plane task should have higher priority than the data-plane task, as it handles management operations such as node creation, endpoint creation, connection establishment, etc. The data-plane task is the main entry point for the MCAPI library to receive incoming messages/packets.

The following table shows the task priorities assigned to all tasks executed on the uC/OS-II core. N can be fine-tuned based on the system design; the default value for N in this reference design is 5.
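The priority ordering described above can be captured as a handful of build-time constants, sketched below. Only OS_LOWEST_PRIO, APP_CFG_TASK_START_PRIO, and the default N = 5 come from the text; the MCAPI_* macro names and the exact offsets from N are illustrative assumptions.

```c
#include <assert.h>

/* uC/OS-II priorities: a lower number means a higher priority. */
#define OS_LOWEST_PRIO            63   /* idle task                     */
#define APP_CFG_TASK_START_PRIO    3   /* lowest number an app may use  */
#define MCAPI_TASK_BASE_PRIO       5   /* N in the table; default is 5  */

/* The control-plane task must outrank the data-plane task, since it
 * handles node/endpoint creation and connection setup. */
#define MCAPI_CTRL_TASK_PRIO  (MCAPI_TASK_BASE_PRIO)      /* mcapi_process_ctrl_msg */
#define MCAPI_DATA_TASK_PRIO  (MCAPI_TASK_BASE_PRIO + 1)  /* mcapi_receive_thread   */
```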

FPGA Configuration

There are multiple ways to configure the FPGA, such as via HPS software (bootloader or a Linux driver) or via external flash. This example design deviates from the CV SoC GSRD in its FPGA configuration approach, as the tool support for the AMP example design is currently limited: there is no complete solution yet for downloading the Nios-II ELF from the HPS domain. Therefore, this example design falls back to the traditional approach of using external flash (EPCS) for FPGA configuration and Nios-II ELF download.

Boot Script for u-boot

A u-boot boot script is used to release the bridges once the FPGA goes into user mode (L1 – L6 in Table 2) and to partition the HPS SDRAM (L8 in Table 2).

The shared memory region is partitioned in u-boot so that Linux cannot claim it as part of its free memory pool; the kernel can only map it as an I/O region. In this example design the upper 256MB of HPS SDRAM is reserved for the shared memory region, even though the memory size required is less than 256KB. Users may also make use of the shared memory region for larger data sharing; this can be achieved by further partitioning the shared memory region for data payloads and then having the MCAPI message carry only the address of the data payload to the recipients.

Users may change the buffer size via the autotools configuration, and the memory partition size in u-boot to the desired size through the u-boot memory parameter.

Device Tree for ARM Linux

The XML files needed by the device tree generator (DTG) to generate the dts for this example design are similar to the XML files in the CV SoC GSRD. The mcomm node for the openMCAPI library must be added into the XML files to generate a workable dts. For example:

In addition, <IRQMasterIgnore className="altera_nios2_qsys"/> must be added into the boardinfo file so that the dts can be generated correctly for the ARM view. This is needed because the interrupt lines are hooked up to multiple masters in this design.

Linux Kernel Configuration

Building the Reference Design

The reference design is delivered both in binary format (ready to run) and in source format. This section presents how to build the example design from sources.

Prerequisites

The hardware design requires a Linux host machine to be built, because the Linux Yocto recipes require a Linux host. All other steps can also be performed on a Windows machine.

The following tools are required in order to build the Reference Design:

SignalTap II

The SignalTap® II Logic Analyzer helps with the process of design debugging. This logic analyzer is a solution that allows you to examine the behavior of internal signals, without using extra I/O pins, while the design is running at full speed on an FPGA device.

The SignalTap II Logic Analyzer is scalable, easy to use, and is available as a stand-alone package or included with the Quartus® II software subscription. This logic analyzer helps debug an FPGA design by probing the state of the internal signals in the design without the use of external equipment. Defining custom trigger-condition logic provides greater accuracy and improves the ability to isolate problems.

The SignalTap II Logic Analyzer does not require external probes or changes to the design files to capture the state of the internal nodes or I/O pins in the design. All captured signal data is conveniently stored in device memory until you are ready to read and analyze the data.

System Console

System Console is a flexible system-level debugging tool that helps designers quickly and efficiently debug their design while the design is running at full speed in an FPGA.

System Console enables designers to send read and write system-level transactions into their Qsys system to help isolate and identify problems.

It also provides a quick and easy way to check system clocks and monitor reset states, which can be particularly helpful during board bring-up. In addition, System Console allows designers to create their own custom verification or demonstration tool using graphical elements, such as buttons, dials, and graphs, to represent many system-level transactions and monitor the processing of data.

Errata

In some instances, some text (printf output) is not printed from u-boot during boot. If the user attaches PuTTY, it may further cause the boot to halt. Tested against Tera Term and minicom, these only caused some text not to be printed, without halting the boot.