Fabien Le Mentec (contact)

Introduction

Writing this kind of logic in assembly language is not easy. First, the assembly language itself may be difficult to learn, depending on your background. Then, fixed and floating point arithmetic requires a lot of code. While macros help to manage the complexity, they are still error prone, as you must be very careful to enforce rules regarding register usage, for instance. This is especially difficult in projects involving several people who are not used to working together and are not full time on the project. Even if the rules are well established, people tend to forget them when they come back to the project from time to time, several weeks apart.

Someone pointed out that TI introduced C support in the latest Code Composer Studio (CCS) version. For me, using the C language clearly helps with the above issues, so I started to investigate this feature. Several people contacted me to ask whether I had made any progress on using the C language to program the PRU. My answer is that the toolchain proposed by TI, called CGT (Code Generation Tools), is very usable once the different tools and some details have been understood.

This article describes how to set up and use the PRU C toolchain, and includes a simple example.

Prerequisites

I have a GitHub repository, pru_sdk, containing all the materials related to this article. Please clone it from here:

The toolchain is located in the CCS installation directory, in a directory called pru_2.0.0B2. If you do not find it, you may have to start CCS and install the missing components using the GUI menus. The pru_2.0.0B2 directory is self-contained and can be moved anywhere and used as is. Please put it in the pru_sdk directory you created previously.

pru_2.0.0B2/bin contains the usual tools:

clpru: a C compiler,

asmpru: an assembler,

dispru: a disassembler,

lnkpru: a linker,

hexpru: an output file generation tool,

other tools to get information on binary (ELF) files.

Also, TI provides a C standard library and a runtime, with the sources included. It can be found in the pru_2.0.0B2/lib directory. More information can be found in the README.txt file and man pages in pru_2.0.0B2/man.

Installing the PRU loader

To load applications and interact with the PRU from Linux, I use the open source library found here:

One important feature is the loading and execution of PRU binary files. In our case, it must be noted that the starting address will not be 0. Rather, it is located at a symbol called _c_int00, which can be anywhere in the text section. Thus, I had to modify the loading library so that an address can be specified in the routine:

prussdrv_exec_program

This was done by adding a new routine, without breaking existing interfaces:

prussdrv_exec_program_at

It writes the PRU control register address fields, as specified in am335xPruReferenceGuide section 5.4. Refer to example/pruss_c/host_main.c for an example.

I put the new library in the pru_sdk repo, and I will submit a patch to the official repository soon.
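As a rough illustration of what starting at a given address involves, the sketch below builds the value written to the PRU CONTROL register. The bit positions (ENABLE at bit 1, the program counter reset value in bits 31:16) and the use of an instruction index rather than a byte address are assumptions to be checked against the AM335x PRU reference guide. The names PRU_CTRL_ENABLE, PRU_CTRL_PC_SHIFT and pru_ctrl_start_value are invented for this example; they are not part of the prussdrv API and not necessarily how prussdrv_exec_program_at is implemented.

```c
#include <stdint.h>

/* Assumed CONTROL register layout (check the AM335x PRU reference guide):
 * bit 1      : ENABLE, set to let the core run
 * bits 31:16 : program counter value loaded when the core leaves reset
 */
#define PRU_CTRL_ENABLE    (1u << 1)
#define PRU_CTRL_PC_SHIFT  16

/* Hypothetical helper: build the CONTROL value that starts the PRU at
 * word_addr, an instruction index (one PRU instruction is 4 bytes). */
static uint32_t pru_ctrl_start_value(uint16_t word_addr)
{
    return ((uint32_t)word_addr << PRU_CTRL_PC_SHIFT) | PRU_CTRL_ENABLE;
}
```

With a start address of 0, the helper returns 0x2, which is the plain "enable" value; any other start address only changes the upper 16 bits.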

Compiling and running a program

I added an example for this article in the pru_sdk repo, directory example/pruss_c.

The code itself simply does a floating point multiplication on the PRU and puts the result into the shared memory; refer to pru_main.c. The CPU reads and displays the result; refer to host_main.c.
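On the CPU side, a value read from the shared memory is just a 32-bit word until it is reinterpreted as a float. The sketch below shows one way to do that conversion, assuming both the PRU and the CPU use IEEE 754 single precision with the same byte order. The helper name word_to_float is hypothetical and is not taken from host_main.c.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit word read from shared memory as an IEEE 754
 * single-precision float. memcpy avoids the aliasing problems of a
 * pointer cast. */
static float word_to_float(uint32_t word)
{
    float value;
    memcpy(&value, &word, sizeof(value));
    return value;
}
```

For instance, the word 0x41fb51ec decodes to approximately 31.415, and 0x3f800000 decodes to exactly 1.0.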

I made a simple script that compiles everything; refer to example/pruss_c/build.sh.

I will integrate it into the pru_sdk build system later. The script works as follows:

invoke the compiler to produce object files from PRU C files,

link them to produce an ELF file,

extract code and data binary from the ELF file,

retrieve the start address,

compile the CPU program, to be run on the BeagleBone Black.
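The way build.sh retrieves the start address is best read from the script itself. As an alternative illustration only, the sketch below reads the entry point field (e_entry) from the header of a 32-bit little-endian ELF file. It assumes that the linker records _c_int00 as the ELF entry point, which should be verified against the MAP file, as should the unit of the address (bytes or instruction words) before passing it to the loader. The function name elf32_entry is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the entry point (e_entry) from a 32-bit little-endian ELF header.
 * hdr points to the first bytes of the file, len is how many are valid.
 * Returns 0 on success and stores the address in *entry, -1 otherwise. */
static int elf32_entry(const unsigned char *hdr, size_t len, uint32_t *entry)
{
    /* e_entry sits at offset 24 and is 4 bytes wide in an ELF32 header */
    if (hdr == NULL || entry == NULL || len < 28)
        return -1;
    /* magic number: 0x7f 'E' 'L' 'F' */
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return -1;
    /* EI_CLASS must be ELFCLASS32 (1), EI_DATA must be ELFDATA2LSB (1) */
    if (hdr[4] != 1 || hdr[5] != 1)
        return -1;
    /* assemble the little-endian 32-bit value byte by byte */
    *entry = (uint32_t)hdr[24]
           | ((uint32_t)hdr[25] << 8)
           | ((uint32_t)hdr[26] << 16)
           | ((uint32_t)hdr[27] << 24);
    return 0;
}
```

Reading the bytes one at a time keeps the function independent of the byte order of the machine running the build.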

At the end of the process, the following files are produced:

pru_enable-00A0.dtbo: the device tree overlay enabling the PRU,

main: the ELF program to be run on the CPU,

text.bin: the binary file to initialize the PRU code,

data.bin: the binary file to initialize the PRU data.

You must copy the following files to your BBB board:

pru_enable-00A0.dtbo into /lib/firmware,

run.sh,

main, text.bin and data.bin.

On the BBB, the program is then run using run.sh. It is a small wrapper taking care of loading the uio_pruss driver and enabling the PRU.

Using inline assembly

Accessing some parts of the hardware still requires assembly code. Previously, we took care of wrapping this code inside macros. I am in the process of porting these macros into functions usable from C code.

The functions are kept minimal and implemented using inline assembly. The inline assembly support is not as advanced as GCC's; in particular, it lacks a way to describe register usage.

One important thing is to know the calling convention and rules used by the PRU C compiler. They are described in the README.txt. To summarize:

r2 contains the stack pointer,

r3 contains the return address,

r14 to r29 are used for argument passing,

r14 is used for the return value,

r3 to r13 must be saved by the callee.

You can look at pru_hal.c for examples. It is still in progress, but gives the idea.

More to come

I am now in the process of rewriting the low level hardware related routines in C. I started with inline assembly, but it may be possible to use intrinsics instead. This is to be investigated.

Another thing to investigate is the generated code size, as the PRU program memory is limited to 8KB. From what I have seen, CLPRU does a good job and provides different optimization options to reduce the generated code size. I will soon see if it is enough.
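The limit can also be checked on the host before loading the code image. The sketch below is a minimal check under two assumptions: the 8KB program memory size stated above, and 32-bit wide PRU instructions, so a valid image is a non-empty multiple of 4 bytes. The names PRU_IRAM_BYTES, PRU_INSN_BYTES and pru_text_fits are hypothetical.

```c
#include <stddef.h>

#define PRU_IRAM_BYTES  8192u  /* 8 KB of program memory per PRU */
#define PRU_INSN_BYTES  4u     /* PRU instructions are 32 bits wide */

/* Hypothetical helper: return 1 if a code image of nbytes can be loaded
 * into the PRU program memory, 0 otherwise. */
static int pru_text_fits(size_t nbytes)
{
    /* an empty or truncated image is rejected */
    if (nbytes == 0 || nbytes % PRU_INSN_BYTES != 0)
        return 0;
    return nbytes <= PRU_IRAM_BYTES;
}
```

Calling it with the size of text.bin gives an early error instead of a silently truncated program.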

One may argue about using the PRU C toolchain directly from the command line instead of using the CCS software. It is a valid point, as CCS is freely available. Personally, I prefer the command line, especially as I do not always have a usable X connection to the machine hosting the build system we use for the PRU.

Updates

A reader gave a link to download the PRU C compiler without the whole CCS software:

posted by Fabien Le Mentec
My experience spans several domains, ranging from robotics, to industrial control monitoring and wireless networking. I am currently a software engineer at the European Synchrotron Radiation Facility (ESRF), working on high performance data acquisition systems. I am also an open {source,hardware} and DIY enthusiast. My full resume can be found here: https://github.com/texane/resume/raw/master/output/resume_fabien_lementec.pdf

Hugely helpful; many thanks. I knew about the PRU C compiler (thanks to prior communication with your previous commenter, actually) and had done some PRU assembly programming on the Beaglebone Black, but had not yet been able to connect the dots to get working C code on that platform. Like you, I prefer using command-line tools whenever possible.

They also modified the am335x_pru_package to support loading the data.bin and text.bin files separately. But they do not provide an entry address like you do, so I'm not sure if this is really necessary. There is a nice blinkled example in the subfolder am335x_pru_package\pru_sw\example_apps\blinkled of this repository that shows how to execute a PRU program compiled with the TI PRU C compiler.

Thanks for the tip. Actually, I do not know how they make it work without specifying an entry point. I guess that they managed (by chance, or by knowing it implicitly ...) to have the pre-main initialization code at 0, especially as they are using the stack for the local variable 'int i'. In the example I show (float multiplication), the entry point is not at 0 (refer to the resulting MAP file), so specifying the entry point is required. There are other solutions than changing the entry point, but I wanted to use the TI C library 'as is'. I hope it answers your question.

9 months ago

0

ukindler

Said:

Yes, indeed - you are right. If I compile the given blinkled example and link to the StarterWare library for the PRU, then my map file shows:

Hello again. I'm still struggling with how one would load the data.bin file into PRU RAM. I noticed that in your host_main.c program, you have this as a "TO DO". I also looked at the BeaglePilot/PRUSS-C blinkled example on his GitHub repo and see that he added functions to libprussdrv.so to handle loading data files into PRU data RAM. However, I can't seem to find source code for these functions, or for the other functions in libprussdrv, anywhere. Admittedly, I am no expert in any of this, but I am hoping that you might have written such a function for your libprussdrv, or could point me to information that might permit me to attempt to do so.

Duh... I just found the source code for the prussdrv - I think I will be able to write a function to load the data.bin file into PRU_Dataram... we'll see

9 months ago

0

texane

Replied:

Hi,

Actually, it was left as a TODO until someone needed it. I integrated the support for data loading, as done in the BeaglePilot/PRUSS-C GitHub repo. I am pretty confident it will work, but I did not test it yet. If you do, please let me know if it works. Everything is in the latest branch of the pru_sdk repo. Thanks for your comments!

9 months ago

+1

verminsky

Said:

Thank you! I just happened to look at your page again now and saw your reply, and the added data loading functions. I shall try this all out tomorrow and will let you know how it goes. I am continually surprised by the Linux/Beaglebone community and how people go out of their way to be so helpful. It gives one hope!

Yes, people in the BBB community are quite responsive. Let me know about your progress!

9 months ago

0

verminsky

Said:

Well, today, after getting distracted by non-electronic, family related issues, I was finally able to compile the new libpruss libraries, get all the CROSS_COMPILE paths and exports working, and successfully got your pruss_c example to generate all the files. But I got an error message when the build script tried to generate the pru_enable.dtbo overlay file. It complains about a missing "(" in the .dts file. I didn't have time yet to track that down, but I did add some global variables to the pru_main.c file and manipulated them, and they show up in the map file and the data.bin file size is larger. Tomorrow, hopefully, I can finish and actually load it onto my BBB and report back success! Thanks again.

The DTC tool is used to generate the DTBO from the DTS. The only reason I can see for it to fail is that I compiled DTC for a 64-bit architecture using shared libraries. It may fail to execute on your own platform, either because of the hardware architecture or because of missing dependencies. In this case, you may have to compile the DTC tool yourself.

I just added the DTC source code and a build.sh script in the pru_sdk repository. Please rebuild the DTC tool using the script, and put the resulting binary in the correct directory... I hope this should fix your issue.

Cheers !

8 months ago

0

verminsky

Said:

Hello again! Thanks for doing all this extra work. Indeed, I am running a 32-bit version of Debian on VirtualBox on the Windows side of an iMac boot camped into Windows 7... I had already downloaded and compiled a 32-bit version of DTC, so all that was necessary was for me to change your top.mk file in the build directory, and I finally got everything to compile correctly.

I then scp'd all the necessary files over to my Debian BBB, and voila - it works!

Now, I still have much to learn, but thanks to your help, I have a working example to guide me. This has also forced me to learn more about makefiles and shell scripts, and has propelled me further on my journey of knowledge.

Got it working! Thanks for doing all the legwork on this. I'm not sure if it is running correctly, it probably is since it ran at all, but it gives me more confidence if I can see the program outputs match. Do you have an example of the output? Here is what it said for me:

0xdeadbeef (-6259853398707798016.000000)
0x2b2b2c2d (0.000000)
0x41fb51ec (31.415001)
0x2404287b (0.000000)
0x9e9b6c26 (-0.000000)
0xad8883d0 (-0.000000)
0x56b5a8ac (99868022407168.000000)
0x2cd21c2e (0.000000)

When you have it, please post here the URL of your forum thread so that others can know about the bugs you are mentioning, and the eventual fixes.

Thanks !

6 months ago

0

turbobob

Said:

The bugs were really instabilities in my toolchain, and old library code that was not working well with the latest compiler. Everything is working pretty well now.

You created an _at version of the exec functions, but it seems there is a way to force the linker to put the entry point at 0 (the latest TI examples show it).

Putting the following in the linker command file (or using the latest example from TI) removes the need for presetting the PRU program counter, and allows the prudebug command line functions to work properly as well (I'm debugging remotely):

.text:_c_int00* > 0x0, PAGE 0

I set up the build to create the header file so I can use the exec_code functions (the new compiler won't create them like pasm did).

You must link against a C library that complies with your kernel. I cannot help you much on this particular point, sorry.

Texane.

6 months ago

0

longqi

Said:

I compiled, then I got the error: http://pastebin.com/kcSsc7CN but I did get the prumain.elf, data.bin and text.bin. So I guess the cross compiler is the wrong one. I changed to another one, which is from CCS 6, but I still have an error: http://pastebin.com/gRZNAJ48 Could you help me?

I got the pruss_c example to work and in an attempt to understand more I changed the pru_main.c file to write the entries at a different offset to the shared memory start. Something like: { shm_write_uint32(0x0120, 0xdeadbeef); shm_write_uint32(0x0124, 0x2b2b2c2d); shm_write_float(0x0128, x); }

and not changing host_main at all, I built it and ran it and the answers were the same as if I had not written it to a different offset. I'm looking for the reference for how to read the registers from the c function calls. Or any hints to make some progress. Even what my mistake in thinking is?

That was pretty silly. If I didn't erase it or reboot, why wouldn't it be there? Sorry for the bother.

5 months ago

0

texane

Replied:

Hi,

I do not know what the error is. Why are you using the 0x120 offset to write your values?

5 months ago

0

raymadigan

Said:

I just wanted to be general, when I write my app I will double buffer data into the shared memory. It works as it should, there is no error. I forgot that I didn't erase the memory at zero so it was still there. I write to offset 0x120 in the pru and read from the 30th uint32 in the host. It all works well.

Do you know of a forum or place to share or get ideas why code doesn't seem to work? As an example I am trying to get the pruss to block waiting for the host to interrupt and I can't seem to find the right combination.

First, thanks for all you did to get this started for me. I have done a few things but my biggest problem in trying things is that there are assembly instructions that result in Invalid instruction messages that people use in the assembler just fine. For example

" SBCO R14, c0, 0x24, 4 \n"

This is often used to clear system interrupts, but it won't compile for me. The README talks about some instructions, like MOV, that can't be used in specific circumstances. Is there a place to look to find how to alter the instruction so that it works?

Sorry, I said that wrong. My question is: when using the C language bindings, many times the instructions that are used in different samples I find on the web won't compile. The one I posted above, for instance.

I am trying to figure out how to clear an interrupt and I know what I wrote above isn't right, but it won't compile.

I am trying to clear the ARM_PRU0_INTERRUPT once pru0 responds to it, and I don't know if it is the right logic or if the compiler is giving me grief. It seems easy enough.

If you're still having trouble, I had the same problem and figured it out.

From the README:

6. Operands with a '&' prefix

The existing PASM assembler accepts operands with or without the & symbol: LBBO &r0 or LBBO r0. The assembler in this release requires the & for these operands.

Hello, I am trying to compile your example directly on the BeagleBone Black (rev C). I downloaded the standalone compiler to the BBB as well as your repository. When I run your build script, I am running into the following error:

/usr/bin/ld: error: main uses VFP register arguments, /home/..../pru_sdk/lib/libprussdrv.a(prussdrv.o) does not
/usr/bin/ld: failed to merge target specific data of file /home/...pru_sdk/lib/libprussdrv.a(prussdrv.o)

I tried compiling with gcc with different options such as -mfloat-abi=soft but it doesn't seem to make a difference. Do you know how to compile your example on the bbb directly using gcc?

Some people, especially those with a realtime background (i.e. DSP, FPGA), do not mind programming in assembly. And the PRU assembly is quite simple, which makes it even more appealing to them. Also, the C compiler PRU support is relatively recent.

That being said, I see more and more physics institutes interested in using the PRU to replace MCU + DSP/FPGA based solutions. The main reasons are cost reduction and the simplicity of hardware and software resource sharing between the PRU and the MCU.

I hope the BeagleBoard platform will democratize this approach to more open source projects. When needed, this is much more powerful than the competition (RPI...).

4 months ago

0

navin40

Said:

Hello Texane ,

Could you please share the GCC version used in crosstool-ng while building the compiler?