ARM Cortex M7 Cache, RAM, and Flash Performance
I recently ran some Dhyrstone benchmarking tests on an ARM Cortex M7 core to see how the tests performed from various memory locations. The ARM Cortex M7 is a microcontroller core contained in microcontrollers from various manufacturers. In my case, I used an STM32F723E from STMicroelectronics. The ARM Cortex M7 includes a data and instruction cache that can be used to improve performance. In the case of the STM32F723E, there is 8KB of each type.
Because most microcontrollers have integrated flash memory and RAM, I was curious how much performance could be boosted with the cache. When I dug a little deeper, I found it can be boosted a lot. I discovered a few reasons why:
- Memory buses don’t always run at the same speed as the core
- Fetching memory can sometimes take multiple clock cycles
- Bus contention (from DMA or other hardware) can delay data and instructions from getting to the core
The cache overcomes these problems by tightly coupling the cache to the core processor in the Cortex M7.
The Setup
I performed the tests using an off-the-shelf 32F723E-DISCO development board from STMicroelectronics running StratifyOS. It includes:
- 216MHz Core CPU Speed
- 512KB Internal Flash memory
- 64KB Tightly Coupled RAM
- 176KB Internal RAM
- 512KB External RAM on a 16-bit data bus
If you have a 32F723E-DISCO board, you can install Stratify OS using the following commands after you have installed the sl command line tool.
sl os.bootstrap:bootloader
sl os.bootstrap:os
The Tests
Once Stratify OS is installed, you can run the dhrystone application using the following commands:
sl bench.test:id=QpXcn3w2P1YUcatvAZZd # runs in flash
sl bench.test:id=QpXcn3w2P1YUcatvAZZd,ram
sl bench.test:id=QpXcn3w2P1YUcatvAZZd,ram,tightlycoupled
sl bench.test:id=QpXcn3w2P1YUcatvAZZd,ram,external
The Results
{
"type": "bar",
"data": {
"labels": ["DMIPS Cache On", "DMIPS Cache Off"],
"datasets": [
{
"label": "Flash",
"data": [
245,
49
],
"backgroundColor":"orange",
"borderColor":"orange"
},
{
"label": "RAM",
"data": [
245,
69
],
"backgroundColor":"blue",
"borderColor":"blue"
},
{
"label": "External RAM",
"data": [
245,
69
],
"backgroundColor":"red",
"borderColor":"red"
},
{
"label": "Tightly Coupled RAM",
"data": [
239,
217
],
"backgroundColor":"green",
"borderColor":"green"
}
]
}
}
Memory | Cache On | Cache Off | Cache Speed Up |
---|---|---|---|
Flash | 245 DMIPS | 49 DMIPS | 5x |
RAM | 245 DMIPS | 69 DMIPS | 5x |
External RAM | 245 DMIPS | 69 DMIPS | 5x |
Tightly Coupled RAM | 239 DMIPS | 217 DMIPS | 1.1x |
The Conclusion
The big takeaway is that applications running in external RAM run just as fast as applications running in any other memory as long as the cache is on. Not surprisingly, execution from tightly coupled memory was the least affected by the cache.