
T4.0 Memory - trying to make sense of the different regions

While trying to debug some issues where a program would stop USB from working in some cases but not in others, I decided to try to understand more about the memory organization of the new Teensy 4.0 (T4.0).
I put a lot of this up on the Teensy 4 First beta thread, maybe starting with the post: https://forum.pjrc.com/threads/54711...l=1#post213113

During this same time frame, it became clear that we probably need to enhance the data reporting at the end of the builds. For example, I built a sample application which contains the binary data for three full-size bitmaps to display on an ILI9341 display, and by default the build tries to put all of that data in the DTCM segment. The linker said I had used under 50% of memory, but in actuality all of it was trying to fit into the DTCM segment and the program failed to load.

It was suggested that others might find some of this useful, especially if they did not have to dig through something like 170 pages of a forum thread. Hopefully over time we can clean up these descriptions and maybe transfer some of this to a more appropriate spot like a Wiki or... Until then, those with moderator privileges, feel free to clean up some of this... Including probably removing some of this intro:

During the T4 beta, I kept seeing and reading about different terms like FlexRam, ITCM, DTCM, OCRAM, ...? Sometimes I hate 3 or 4 letter acronyms! What do all of these mean, and what impact does each of them have on me?

So again what does this mean? I know I can search the web and the like and I have...

First: 1024KB of RAM with 512KB tightly coupled: The memory of the T4 is divided into two main pieces, both of which are 512KB in size.

FlexRam:
The first part, which the documents call the FlexRam, has 16 banks of 32KB of memory. Each of these banks can be configured as one of the types ITCM, DTCM, OCRAM, or not used. In our setup we only use the types ITCM (Instruction Tightly Coupled Memory) and DTCM (Data Tightly Coupled Memory). I will describe each of these in more detail below. Basically the build process takes some or all of your code and allocates enough 32KB banks to hold it, and the startup code copies that code into those banks. The remaining banks of the FlexRam are marked as DTCM. So for example, if you have more than 32KB and less than 64KB of code, it will take 2 banks for ITCM and leave you with 14 banks for data.
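To make the bank arithmetic above concrete, here is a minimal sketch of the rounding, assuming only the numbers given in this post (16 banks of 32KB). The names flexram_split and FlexRamSplit are made up for illustration; this is not the actual startup or linker code.

```cpp
#include <cassert>
#include <cstdint>

// Figures taken from the description above: 16 FlexRam banks of 32KB each.
constexpr uint32_t kBankSize  = 32 * 1024;
constexpr uint32_t kBankCount = 16;

// Hypothetical helper type, for illustration only.
struct FlexRamSplit {
  uint32_t itcm_banks;  // banks that hold code copied from flash
  uint32_t dtcm_banks;  // banks left over for data and stack
};

// Round the code size up to whole banks; everything else becomes DTCM.
inline FlexRamSplit flexram_split(uint32_t itcm_code_bytes) {
  uint32_t itcm = (itcm_code_bytes + kBankSize - 1) / kBankSize;
  assert(itcm <= kBankCount && "code does not fit in FlexRam");
  return FlexRamSplit{itcm, kBankCount - itcm};
}
```

With 40KB of code, flexram_split(40 * 1024) gives 2 ITCM banks and 14 DTCM banks, which matches the example in the paragraph above. Note that a single byte over a 32KB boundary costs a whole extra bank of data space.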

OCRAM: On Chip RAM
This is the second 512KB of memory.

2048KB Flash memory - This is where your program is stored (maybe) - As I mentioned above about ITCM, a lot of your code may be moved into ITCM.

Now to describe some of these sections in more detail. During the early part of the T4 beta, I believe it was @FrankB who first developed a tool that you could hook into the build process (through an updated platform.txt) that would give us more information than the standard build did. I know others have also done parts of this as well, like @mjs513, @defragster, ... Over the last couple of days I decided to try to update it from the T4B1 (IMXRT 1052) to the current T4 (IMXRT 1062). It is still a Work In Progress: I want to clean up some of the output, and add some error checking. For example, if you run out of room in DTCM, the tool should report this and return an error status.

I am not sure yet, if I should post the code again, or put into github or if someone already has a github project... But here is an example output, for a real simple sketch which can blink any pin...

### Code ###: What goes into ITCM versus what goes into Flash?
I believe the simple answer is that, by default, all code will try to be placed into the ITCM section. As you can see, just adding the Serial.println increased this section's size.

I believe the way to leave code in the flash memory is by using the keyword: PROGMEM
Yes, the T4 is different from the T3.x and Teensy LC in that PROGMEM means something again. So for example, I can change my sketch's setup function to be defined like:

Code:

void PROGMEM setup() {
  // Blink any pin. Note: I put pin 13 as input to see if we can
  // jumper to it to see if we can find the pin...
  // (DBGSerial is defined elsewhere in the sketch.)
  while (!Serial && millis() < 5000);
  DBGSerial.begin(115200);
  delay(250);
  DBGSerial.println("Find Pin by blinking");
  DBGSerial.println("Enter pin number to blink");
  DBGSerial.println("Defaults to pin 13");
  pinMode(13, OUTPUT);
}

DTCM - By default I believe just about everything goes here. This includes all of your global variables, both initialized and uninitialized.

Unlike the T3.x, variables such as arrays that are defined as const, will not stay in Flash, but instead will be copied at startup time into DTCM. So some programs that for example work on T3.6 may run into issues of running out of RAM.

FLASH - as I mentioned under DTCM, const data is by default not left in Flash, but instead moved into DTCM. You can tell the system to leave some specific const structures in flash, by using the PROGMEM keyword. Like:

Code:

const unsigned short teensy40_front[76800] PROGMEM={...};

Also, I am not sure, but the earlier example might imply that strings you pass to things like Serial.println may also be left in Flash.

OCRAM - Again the other 512KB of memory...

So far I have found only two ways to use this memory. You can define a variable with the attribute: DMAMEM
Or you use malloc/new to allocate the memory.

So far I have not found any way to have a program put any initialized structures up in this region of memory.
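To put the three data placements described so far side by side, here is a small sketch. The variable names are made up for illustration, and the empty fallback macros are only there so the snippet also compiles off-target; on a Teensy 4 the real PROGMEM and DMAMEM come from the core. Since nothing initialized can be placed in OCRAM, the DMAMEM buffer is filled at run time before it is used.

```cpp
#include <cstdint>
#if defined(ARDUINO)
#include <Arduino.h>  // the Teensy core defines PROGMEM and DMAMEM
#endif
#ifndef PROGMEM
#define PROGMEM       // off-target fallback: no special placement
#endif
#ifndef DMAMEM
#define DMAMEM        // off-target fallback: no special placement
#endif

// Default placement: an ordinary global, copied into DTCM at startup.
uint16_t palette_in_dtcm[4] = {1, 2, 3, 4};

// const data normally also ends up in DTCM; PROGMEM asks to leave it in flash.
const uint16_t logo_in_flash[4] PROGMEM = {10, 20, 30, 40};

// DMAMEM places the buffer in OCRAM. It carries no initializer, so we must
// not rely on its contents before writing to it.
DMAMEM uint16_t frame_in_ocram[320];

// Flash is memory mapped on the T4, so PROGMEM data can be read directly.
inline uint32_t sum_logo() {
  uint32_t sum = 0;
  for (uint16_t v : logo_in_flash) sum += v;
  return sum;
}

// Give the OCRAM buffer known contents at run time.
inline void clear_frame() {
  for (uint16_t &px : frame_in_ocram) px = 0;
}
```

Calling clear_frame() from setup() before the first use of frame_in_ocram gives the buffer a defined starting state.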

The memory in the OCRAM section is defined as being cached WBWA (write-back, write-allocate), which can really screw up DMA. That is, DMA operations talk to the underlying memory, whereas normal instructions go through the cache, which may or may not match what is in memory.
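The usual workaround for the cache mismatch described above is explicit cache maintenance around each DMA transfer. Below is a minimal sketch of that idea. As far as I know the Teensy 4 core provides arm_dcache_flush() and arm_dcache_delete() for this; the wrapper names and the alignment check are my own, and the off-target branches are no-ops so the snippet compiles anywhere.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__IMXRT1062__)
#include <Arduino.h>  // assumed to bring in arm_dcache_flush / arm_dcache_delete
#endif

constexpr uintptr_t kCacheLine = 32;  // Cortex-M7 data cache line size in bytes

// True when the buffer covers only whole cache lines, so cache operations on
// it cannot flush or discard a neighbouring variable by accident.
inline bool cache_line_aligned(const void *addr, size_t len) {
  return (reinterpret_cast<uintptr_t>(addr) % kCacheLine == 0) &&
         (len % kCacheLine == 0);
}

// Call before DMA reads a buffer the CPU has written:
// pushes dirty cache lines out to the underlying RAM.
inline void before_dma_reads(void *buf, size_t len) {
#if defined(__IMXRT1062__)
  arm_dcache_flush(buf, len);
#else
  (void)buf; (void)len;  // nothing to do off-target
#endif
}

// Call before the CPU reads a buffer DMA has written:
// discards cached lines so the next read comes from RAM.
inline void after_dma_writes(void *buf, size_t len) {
#if defined(__IMXRT1062__)
  arm_dcache_delete(buf, len);
#else
  (void)buf; (void)len;  // nothing to do off-target
#endif
}
```

Declaring DMA buffers 32-byte aligned and sized in multiples of 32 bytes keeps these calls from touching unrelated data that shares a cache line.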

It is bad enough using it for DMA buffers, and I am not sure how to get it to work with things like DMASettings. At least I did not get them to work at all, especially when it involves replaceOnCompletion semantics.

Will add more soon, but tired of typing!

Again those with moderator access, feel free to correct, add, ...

Also let me know if there is some additional things I should add/remove/modify.

As per earlier request, I put my current version of imxrt-size up on github: https://github.com/KurtE/imxrt-size
I believe this includes the .exe for Windows. Sorry, I have not tried building a version for Linux or Mac...

Not sure if there are other versions up there or not and or what is the best location for such a tool...

I have made a few changes to the output since the earlier stuff.

There are probably cleaner ways to set this up, but currently I just added a line to the platform.txt file, similar to what I think @FrankB did earlier, that for my current setup looks like:

I am not sure how interesting this part might be, but with the uncannyeyes sketch, it appears that at least some of the issues we were running into were caused by uninitialized members in the st7735_t3 code, which you will only ever see if you do a new of the display class...

But while debugging some of this stuff, I did add to my version of the sketch a couple of functions, to help me debug some stuff:

If we assume that the stack will not need to grow beyond 2K, this implies we have over 400K in lower memory that we might want to make available for use.
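The estimate above is just subtraction, and a rough sketch of it follows. The function name and parameters are hypothetical, and this is not the debug code referred to above. It assumes DTCM starts at address 0x20000000 and that the stack starts at the top of DTCM and grows down; on a real T4 the end of the statically used data would come from a linker symbol such as _ebss, which is also an assumption here.

```cpp
#include <cstdint>

constexpr uint32_t kDtcmBase = 0x20000000;  // assumed start address of DTCM
constexpr uint32_t kBankSize = 32 * 1024;   // FlexRam bank size

// Estimate the unused DTCM between the end of the static data and the
// space reserved for the stack at the top of DTCM.
//   dtcm_banks    : number of FlexRam banks configured as DTCM
//   end_of_data   : first address after .data/.bss (e.g. address of _ebss)
//   stack_reserve : bytes we are willing to set aside for the stack
inline uint32_t dtcm_free_bytes(uint32_t dtcm_banks, uint32_t end_of_data,
                                uint32_t stack_reserve) {
  uint32_t top = kDtcmBase + dtcm_banks * kBankSize;  // initial stack pointer
  if (end_of_data + stack_reserve >= top) return 0;   // nothing left over
  return top - end_of_data - stack_reserve;
}
```

For example, with 15 DTCM banks, 20KB of globals and a 2KB stack reserve, dtcm_free_bytes(15, kDtcmBase + 20 * 1024, 2 * 1024) comes to 468992 bytes, so figures above 400K are plausible for small sketches.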

Example currently in the ST7789_t3 DMA update code, I currently have the displays malloc their frame buffer. Yes I also have the option to allocate this myself and tell the display to use my own buffer... But if the user object does not do that, maybe it might want to give preference to having the frame buffer in DTCM where you don't have DMA cache issues...

Likewise currently I define a structure with some smaller buffers as well as DMASetting and DMAChannel structures and I define a set of three static ones for this class on the off chance the user will do a new of this display class. Might be good if again we could simply allocate lower memory on the fly when we need it.

Maybe does not need to be anything more than just allocate (ie. maybe don't support free or realloc...)
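An allocate-only scheme like that can be as small as a bump allocator over a static pool. The sketch below is only an illustration of the idea: BumpArena, g_low_pool and g_low_arena are made-up names, the 4096-byte pool size is arbitrary, and it relies on ordinary globals landing in DTCM by default as described earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Allocate-only arena: hands out aligned chunks, never frees or reallocs.
class BumpArena {
 public:
  BumpArena(uint8_t *storage, size_t size)
      : base_(storage), size_(size), used_(0) {}

  // Returns nullptr when the pool is exhausted. The default alignment of 32
  // matches the Cortex-M7 cache line, which also suits DMA buffers.
  void *allocate(size_t bytes, size_t align = 32) {
    assert(align != 0 && (align & (align - 1)) == 0);  // power of two only
    uintptr_t start = reinterpret_cast<uintptr_t>(base_) + used_;
    uintptr_t aligned = (start + align - 1) & ~static_cast<uintptr_t>(align - 1);
    size_t padding = static_cast<size_t>(aligned - start);
    size_t left = size_ - used_;
    if (padding > left || bytes > left - padding) return nullptr;
    used_ += padding + bytes;
    return reinterpret_cast<void *>(aligned);
  }

  size_t remaining() const { return size_ - used_; }

 private:
  uint8_t *base_;
  size_t size_;
  size_t used_;
};

// Hypothetical pool: a plain global array, so by default it lives in DTCM.
alignas(32) static uint8_t g_low_pool[4096];
static BumpArena g_low_arena(g_low_pool, sizeof(g_low_pool));
```

A display class could then try g_low_arena.allocate(size) for its frame buffer first and fall back to malloc when that returns nullptr.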

Will show that it is indeed up in the other memory region and not copied into ITCM...

However it appears like you can not have both. That is if you uncomment the line: uint8_t const_progmem_array[] PROGMEM = "XYZ";

You will get a compiler error that it conflicts with the data...

Have we found a way to resolve this?

The only way to do it is to have two different names for the sections, one name for sections that contain functions and another name for sections that contain data, and then modify the linker script so it links both adjacent to each other. This will obviously require having two macros.

I believe this is due to the data .sections not wanting to set the executable bit in the ELF section information, and wanting to set that bit for functions.
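As a rough sketch of the two-macro idea described above: the macro names and section names below are hypothetical, not anything the Teensy core defines, and a real setup would also need the linker script to place both sections in flash next to each other. The point is only that functions and data in differently named sections no longer conflict within one compile unit (this uses GCC/Clang attribute syntax on an ELF target).

```cpp
#include <cstdint>

// Hypothetical macros: one section name for data, a different one for code.
#define FLASHDATA_DEMO __attribute__((section(".flashdata_demo")))
#define FLASHFUNC_DEMO __attribute__((section(".flashfunc_demo")))

// Data goes into a section that does not need the executable flag.
const uint8_t const_flash_array[] FLASHDATA_DEMO = "XYZ";

// The function goes into a separately named section that is executable.
FLASHFUNC_DEMO int add_in_flash(int a, int b) {
  return a + b;
}
```

Because the two sections have different names, the compiler can give each its own flags, which avoids the conflict you get when one section name is shared by both code and data.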

Yes, FrankB discovered this during T4_Beta - using PROGMEM on DATA and CODE in the same compile unit caused a fail. He tried a counting macro to edit the name progressively and that failed - maybe for the reason MMeisnner noted.

Like maybe a build option: that says leave all code up in this memory, except for those functions marked as FASTRUN.

I've been considering making an alternate linker script which would default code and const variables into slow-but-cached flash. This and the one we have now need user-friendly names. I've been thinking something like "Use RAM to improve speed" and "Save RAM for variables".

But before that, I guess we need another name like PROGMEM to be used on functions. Anyone care to suggest words to be forever commandeered.... and create thorny conflicts if anyone uses that word in libraries or programs?

That makes sense. I do think we should also consider adding something like the more detailed output I showed in post #6 to the standard setup, to help make it clear when you might actually be running out of a specific type of memory. But as you mentioned, we probably need better names.

I agree that you probably need a different term than PROGMEM for program space, hopefully someone will come up with good names for things. Names are not my best thing. I would probably tend to call it something like SLOWRUN but that might not be the best :lol:

Since you asked: how about just the corollary, FUNCMEM? I don't think I have ever seen that in any of the libraries I looked at. No, it's not all that much fun, but I'm too tired this morning.