Cortex-M7 Launches: Embedded, IoT and Wearables
by Stephen Barrett on September 23, 2014 7:01 PM EST
The Cortex-M7 CPU
The primary focus of the Cortex-M7 is improved performance. ARM’s goal was to raise M series performance to a level not previously seen, while maintaining the series' signature small die size and low power consumption. There are at least two reasons ARM focused on performance for the M7. First, ARM wants to widen the gap between its cores and traditional 8- and 16-bit microcontrollers, further differentiating its market position; second, the M7 will help support the IoT (Internet of Things) and wearable device markets. With its enhanced DSP capabilities, the M7 is better suited to audio and visual sensor hub processing than any previous M series design.
Digging into the details, the Cortex-M7 features a six-stage, in-order, dual-issue superscalar pipeline with single- and double-precision floating point units, instruction and data caches, branch prediction, SIMD support, and tightly coupled memory. Here's the high level view of the pipeline:
Instruction and data caches, branch prediction, and tightly coupled memory are differentiating features of the M7 versus previous M series processors. Microcontrollers often forgo caches and sometimes even operate with flash as the only memory interface. By providing high performance instruction and data caches, the M7 approaches more typical high performance processor design.
Tightly coupled memory (TCM) is a technology ARM’s partners can use to extend the effective caching of a single M7 processor, and it has previously been seen only in A and R series designs. In use, it can have the performance of a cache but, unlike a cache, its contents are directly controlled by the developer. That is, TCM is part of the physical memory map of the microcontroller. Developers can place critical code and data inside TCM, where it can be accessed deterministically and with high performance in routines such as interrupt service routines. The M7 supports up to 16 MB of tightly coupled memory.
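As a rough illustration of what "placing critical code and data inside TCM" can look like in practice, here is a minimal sketch using the GCC/Clang `section` attribute. The section names `.itcm_text` and `.dtcm_data`, the handler name `sensor_sample_isr`, and the ring buffer are all hypothetical; on real hardware the linker script would have to map those sections onto the TCM address ranges, and none of this comes from ARM's documentation.

```c
#include <stdint.h>

/* Hypothetical section names. On a real Cortex-M7 project the linker script
 * would have to map them onto the instruction and data TCM address ranges. */
#if defined(__GNUC__) && defined(__ELF__)
#define IN_ITCM __attribute__((section(".itcm_text")))
#define IN_DTCM __attribute__((section(".dtcm_data")))
#else
#define IN_ITCM /* no section placement on this toolchain */
#define IN_DTCM
#endif

#define RING_SIZE 64u /* power of two, so the index can wrap with a mask */

/* Data touched by the interrupt handler is tagged for data TCM. */
IN_DTCM static volatile int16_t ring[RING_SIZE];
IN_DTCM static volatile uint32_t head;

/* The handler body is tagged for instruction TCM, so its fetch latency does
 * not depend on whether the code happens to be in the instruction cache. */
IN_ITCM void sensor_sample_isr(int16_t sample)
{
    ring[head & (RING_SIZE - 1u)] = sample;
    head = head + 1u;
}

/* Most recently stored sample, or 0 if nothing has been stored yet. */
int16_t latest_sample(void)
{
    if (head == 0u) {
        return 0;
    }
    return ring[(head - 1u) & (RING_SIZE - 1u)];
}
```

Built for a desktop target the attributes only name the output sections, so the code behaves like ordinary C; the deterministic timing comes entirely from where the linker script places those sections on the actual part.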
Adding branch prediction allows ARM to target dedicated DSP devices with its Cortex-M7 microcontroller. DSP code often implements filters on analog data streams for applications such as audio input keyword detection, audio output equalization, and frequency domain amplitude peak searching. When running on an always-on microcontroller these tasks are almost always looped. Without a branch predictor, the processor must wait on each evaluation of a loop condition that 99.9% of the time results in the same outcome. Branch predictors cost extra die space but when DSP is your target, they are an obvious design benefit.
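To see why a loop branch is such an easy target for prediction, the sketch below simulates a textbook 2-bit saturating-counter predictor on the backward branch of a counted loop. This is a generic scheme chosen for illustration; the article does not describe the M7's actual predictor, and the function name `count_mispredictions` is made up for this example.

```c
#include <stdbool.h>

/* Counts mispredictions of a textbook 2-bit saturating-counter predictor on
 * the backward branch of a counted loop that is executed `runs` times with
 * `iterations` iterations each (iterations >= 1). */
unsigned count_mispredictions(unsigned iterations, unsigned runs)
{
    unsigned counter = 2;      /* 0..1 predict not taken, 2..3 predict taken */
    unsigned mispredictions = 0;

    for (unsigned r = 0; r < runs; r++) {
        for (unsigned i = 0; i < iterations; i++) {
            bool taken = (i + 1 < iterations);  /* falls through on last pass */
            bool predicted_taken = (counter >= 2);

            if (predicted_taken != taken) {
                mispredictions++;
            }
            /* Move the counter toward the actual outcome, saturating at 0 and 3. */
            if (taken && counter < 3) {
                counter++;
            } else if (!taken && counter > 0) {
                counter--;
            }
        }
    }
    return mispredictions;
}
```

For a 1000-iteration loop run 10 times, `count_mispredictions(1000, 10)` returns 10: only the loop exit is mispredicted, one branch in a thousand, which lines up with the 99.9% figure quoted above.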
Summarizing the M series cores can be done from an instruction features standpoint and from a die size and performance standpoint. Unfortunately ARM, which provides HDL (Hardware Description Language) that can be synthesized to physical chips, is not willing to provide die size numbers until its partners make their Cortex-M7 announcements, since the processor does not become physical until a partner gets involved. Until a partner releases data, we can simply assume the M7 is somewhat larger than its predecessors.
ARM Cortex-M Instruction Sets
| Feature | M0 | M0+ | M3 | M4 | M7 |
|---|---|---|---|---|---|
| Thumb | Most | Most | Entire | Entire | Entire |
| Thumb-2 | Subset | Subset | Entire | Entire | Entire |
| Hardware multiply | 1 or 32 cycles | 1 or 32 cycles | 1 cycle | 1 cycle | 1 cycle |
| Hardware divide | No | No | Yes | Yes | Yes |
| Saturated math | No | No | Yes | Yes | Yes |
| DSP extensions | No | No | No | Yes | Yes, enhanced |
| Floating-point | No | No | No | Optional single precision | Yes |
| Tightly coupled memory | No | No | No | No | Yes |
| Architecture | ARMv6-M | ARMv6-M | ARMv7-M | ARMv7-M | ARMv7-M |
| Memory architecture | Von Neumann | Von Neumann | Harvard | Harvard | Harvard |
ARM Cortex-M Area, Power, Performance
| Metric | M0 | M0+ | M3 | M4 | M7 |
|---|---|---|---|---|---|
| 90nm LP dynamic power (µW/MHz) | 16 | 9.8 | 32 | 33 | n/a |
| 90nm LP area (mm²) | 0.04 | 0.035 | 0.12 | 0.17 | n/a |
| 40nm G dynamic power (µW/MHz) | 4 | 3 | 7 | 8 | n/a |
| 40nm G area (mm²) | 0.01 | 0.009 | 0.03 | 0.04 | n/a |
| Dhrystone (official) DMIPS/MHz | 0.84 | 0.94 | 1.25 | 1.25 | 2.14 |
| Dhrystone (max options) DMIPS/MHz | 1.21 | 1.31 | 1.89 | 1.95 | 3.23 |
| CoreMark/MHz | 2.33 | 2.42 | 3.32 | 3.40 | 5.04 |
ARM did state that the M7's power consumption is roughly in line with previous cores in terms of performance/mW, so given its higher per-clock performance we could estimate a corresponding 50% to 75% increase in power consumption. Area is anyone's guess at the moment.
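One way to arrive at a range like this is to scale the M4's power figure by the M7's per-MHz benchmark gain, on the assumption that performance per mW stays constant. The helper names below are hypothetical and the results are back-of-the-envelope estimates derived from the tables above, not figures from ARM.

```c
/* Scales a baseline dynamic power figure (uW/MHz) by the ratio of per-MHz
 * benchmark scores, assuming performance per mW stays constant. */
double scaled_power_uw_per_mhz(double base_power, double base_score, double new_score)
{
    return base_power * (new_score / base_score);
}

/* Percentage increase implied by moving from base_score to new_score. */
double percent_increase(double base_score, double new_score)
{
    return (new_score / base_score - 1.0) * 100.0;
}
```

Using the M4 as the baseline, CoreMark/MHz (3.40 to 5.04) implies an increase of about 48%, and official Dhrystone (1.25 to 2.14) implies about 71%. Applied to the M4's 8 µW/MHz at 40nm G, that would put the M7 somewhere around 11.9 to 13.7 µW/MHz under this assumption.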
43 Comments
Guspaz - Tuesday, September 23, 2014 - link
"General purpose operating systems such as Linux (Android), Windows, OSX, and iOS require an MMU to function. That means M series processors, like all microcontrollers (MCUs), will never be tasked with running general purpose operating systems."
This is incorrect: Linux runs on the Cortex M and other platforms that lack an MMU via uClinux. There are some differences, yes, but for the most part it's transparent to the developer, and most software runs unmodified.
extide - Tuesday, September 23, 2014 - link
This is true, but there are some pretty big limitations when using uClinux. I wouldn't suggest it unless you really really need it, heh.
HardwareDufus - Tuesday, September 23, 2014 - link
Like the Cortex-M3 & M4, it is a 32-bit ARMv7-M core processor. It is said to use a six-stage superscalar pipeline.
The ARM press release says that it will have highly flexible system and memory interfaces. Looking forward to seeing more details on that... (though of course it will lack an MMU).
It launches manufactured on a 40nm process and runs up to 400 MHz.
However it will move to a 28nm process in the near future, where performance is expected to double (so one can assume a near doubling of clock speed as well).
Atmel is already said to have a license. Will be interesting to see if the Arduino folks pick this processor up for a new Arduino board. Arduino has picked up the Cortex-M0+ for the new Arduino ZERO and uses the Cortex-M3 for the Arduino DUE. But they've yet to use the Cortex-M4. Their latest board, the Arduino TRE, uses a Texas Instruments Sitara chip, which is a Cortex A series processor. So who knows what direction they are moving in.
Texas Instruments licenses the Cortex-M processors, but I haven't heard of a license for this new processor. I just picked up a nice development board that uses the Cortex-M4 from TI.
HardwareDufus - Tuesday, September 23, 2014 - link
Yes, I overused the phrase 'picked up' and cannot edit it. Feel free to substitute chose, selected, employed, used, etc...
xenol - Wednesday, September 24, 2014 - link
Speaking as someone who's used MCUs and the M3/M4 series micro-controllers in development (professionally and hobby), I don't really see a point in using a general purpose OS on a small footprint. Something like this should be doing one thing and doing it very well.
There are plenty of RTOSes out there that take up a very tiny footprint that will do memory management and "thread management" for you if you need it. Heck, all of my projects have gotten away with doing amazing things without needing a heap (i.e., calling malloc).
The biggest thing I don't like about uClinux is it requires external RAM. A lot of development boards don't have any.
akdj - Sunday, September 28, 2014 - link
I'm curious as to which RTOS, micro-controller or any other 'hobby or professional' use case rivals this M7? Its footprint, efficiency, capabilities and phenomenal 'future' opportunities this motion sensor will provide. You're using vacuum tubes. As are all wearables to date. Those utilizing this new ARM architecture will be moving directly to SolidState. Overnight. It's THAT HUGE!
Best to 're-think' your tools and options, as using a micro-controller in your new project slated for mid 2015 release will be chewed up and spat to the floor by anyone using a 'currently available' M7. Difference between a typewriter and computer with Word Processor, with today's capabilities!
:)
isa - Tuesday, September 23, 2014 - link
I can't apologize enough for my profound ignorance and stupidity, but what exactly were the 2 similar IoT announcements on Anandtech in the last 2 days as referenced in the article? I can only find the Mediatek announcement. In the future, links to references would help stupid people like me.
extide - Tuesday, September 23, 2014 - link
Mediatek, and this one
loftie - Tuesday, September 23, 2014 - link
If the embargo on the slides is correct, you published an hour early.
Stephen Barrett - Tuesday, September 23, 2014 - link
Yeah the CMS failed me on daylight savings. Hopefully we will be forgiven ;-)