Using the ESP32-S3 ULP Coprocessor with the Arduino IDE

By Matt Gaidica, PhD • October 10, 2024 (updated February 17, 2025)

The ESP32 devices’ Ultra-Low-Power (ULP) coprocessor enables access to peripherals such as RTC GPIO, RTC I2C, SAR ADC, or temperature sensors (TSENS). The features stated in the ESP32-S3 Technical Reference (TR; Section 2.2) are:

Access up to 8 KB of SRAM RTC slow memory for instructions and data
Clocked with 17.5 MHz RTC_FAST_CLK
Support working in normal mode and in monitor mode
Wake up the CPU or send an interrupt to the CPU
Access peripherals, internal sensors and RTC registers

To enable ESP32 programming, follow the guide to install ESP32 support in the Arduino IDE. You will later see a series of #includes; hover over them (or CTRL/⌘-click) to ensure you are referencing the correct board version (e.g., esp32s3).

Memory and ULP Architecture

The ESP32-S3 internal memory architecture is outlined in TR 4.3.2: Internal ROM, Internal SRAM, and RTC Memory. The RTC memory is 16 KB in total, split between RTC FAST and RTC SLOW memory (8 KB each).

RTC Memory

The RTC (Real Time Clock) memory is implemented as static RAM (SRAM), making it volatile. However, RTC memory has the added benefit of being retained throughout deep sleep, so its values survive from one deep-sleep cycle to the next (but not a loss of power).

RTC FAST Memory (8 KB):
RTC FAST memory can only be accessed by the CPU and is not accessible by the ULP co-processor. It typically stores instructions and data that need to persist across deep sleep.
RTC SLOW Memory (8 KB):
RTC SLOW memory can be accessed by both the CPU and the ULP co-processor, making it useful for storing instructions and sharing data between the CPU and the ULP co-processor.

ULP Coprocessor Overview

We will focus on the RTC SLOW memory (RTC_SLOW_MEM in Arduino IDE) because it bridges the main CPU and the ULP coprocessor (see TR Section 2.2).

The TR describes two coprocessors, ULP-FSM and ULP-RISC-V. The Arduino IDE approach here uses the FSM; the RISC-V core is only available through ESP-IDF. The work mode (normal or monitor) is determined automatically by the sleep functions you call.

RTC_SLOW_MEM & ULP in Practice

RTC_SLOW_MEM is where the entire ULP program and any data it exchanges with the main CPU are stored; there is nowhere else for either to live.

The best way to think about RTC_SLOW_MEM is as an optional block of data words you interact with, followed by the ULP program itself. As the figure shows, the data block is protected by loading the ULP program at an offset past it. If you modify any memory occupied by the ULP program after loading it, you are effectively overwriting its assembly instructions.
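
To make the layout concrete, here is a host-side sketch (it runs on a PC, not on the ESP32) that models RTC_SLOW_MEM as an array of 32-bit words, reserves a data slot with an enum, and copies a placeholder program in after it. `slow_mem`, `load_program`, `overlaps_program`, and the instruction words are hypothetical; on the device, `ulp_process_macros_and_load()` does the real loading and enforces a tighter size limit.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for RTC_SLOW_MEM: 8 KB of RTC slow memory as 32-bit words.
constexpr std::size_t kSlowMemWords = 8 * 1024 / sizeof(uint32_t);
uint32_t slow_mem[kSlowMemWords] = {};

// Data slots first, program after them.
enum {
  EDGE_COUNT,     // word 0: value shared with the main CPU
  SLOW_PROG_ADDR  // word 1: first word of the ULP program
};

// Hypothetical loader: copy `n` instruction words starting at word `offset`.
// Returns false instead of writing past the end of the array.
bool load_program(std::size_t offset, const uint32_t* words, std::size_t n) {
  if (offset > kSlowMemWords || n > kSlowMemWords - offset) return false;
  for (std::size_t i = 0; i < n; ++i) slow_mem[offset + i] = words[i];
  return true;
}

// True if writing to word `addr` would clobber one of the `n` loaded program words.
bool overlaps_program(std::size_t addr, std::size_t n) {
  return addr >= SLOW_PROG_ADDR && addr < SLOW_PROG_ADDR + n;
}
```

Writing to `slow_mem[EDGE_COUNT]` after loading is safe, while any address from `SLOW_PROG_ADDR` up to the end of the program lands on an instruction.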

You can read more about the ESP32 ULP Coprocessor Instruction Set. Note that although RTC_SLOW_MEM is addressed and written as 32-bit words, the ULP coprocessor effectively operates on 16-bit values (its general-purpose registers are 16 bits wide). Therefore, you should read values from RTC_SLOW_MEM with a 0xFFFF mask to isolate the lower 16 bits. This also means that assembly instructions involving integer comparisons (or the delay instruction) only work with values from 0x0000 to 0xFFFF.
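
A host-side sketch of the three 16-bit consequences described above: masking a word read from RTC_SLOW_MEM, a 16-bit counter wrapping past 0xFFFF, and the longest possible single I_DELAY. The helper names are hypothetical, and the 17.5 MHz figure is the RTC_FAST_CLK rate from the feature list, so treat the resulting times as estimates.

```cpp
#include <cstdint>

// Keep only the 16-bit value the ULP stored; the upper half-word is not part of it.
uint16_t ulp_value(uint32_t slow_mem_word) {
  return static_cast<uint16_t>(slow_mem_word & 0xFFFF);
}

// ULP registers are 16 bits wide, so an increment past 0xFFFF wraps to 0.
uint16_t ulp_increment(uint16_t reg) {
  return static_cast<uint16_t>(reg + 1);
}

// Duration of I_DELAY(cycles) in milliseconds for a given ULP clock.
// The argument is 16 bits, so a single call can never exceed 0xFFFF cycles.
double delay_ms(uint16_t cycles, double clock_hz) {
  return 1000.0 * cycles / clock_hz;
}
```

Under that clock assumption, I_DELAY(0xFFFF) lasts roughly 3.74 ms, so a longer wait needs several delays in a row (or the ULP timer).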

Coding for the ULP in Arduino IDE

There is some precedent for including pure assembly files using the ulptool on GitHub; however, this project has not been maintained for over four years and requires installing OS-specific toolchains. There is also some commentary on using MicroPython for ULP programming. You might be inspired (or misguided) by the ESP-IDF ULP Examples, but they require the ULP assembler toolchain that ships with ESP-IDF; an alternative is to refer to projects like this ADC sampler on GitHub that have the assembly code written.

To program the ULP “natively” in Arduino IDE, we will focus on the Espressif-documented method of Programming the ULP FSM Coprocessor Using C Macros. Note, you should refer to your own ulp.h file for the accurate list of co-processor instructions—the Espressif documentation differs slightly.

The ULP C macros are assembly-like macros that build a ULP program (an array of ulp_insn_t instructions), which is processed and loaded into RTC_SLOW_MEM at runtime. As the figure above highlights, the size of the ULP program is limited not by RTC_SLOW_MEM itself but by CONFIG_ULP_COPROC_RESERVE_MEM, which is currently set to 512 (bytes) for the ESP32-S3.

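CONFIG_ULP_COPROC_RESERVE_MEM is specified in bytes, so 512 bytes amounts to only 128 32-bit words (each ULP instruction occupies one word), far less than the 8 KB stated in the technical reference, and any data slots placed ahead of the program come out of the same budget. The function used to load the program (ulp_process_macros_and_load) returns an error if your ULP program is too large, so it is wise to check its return value.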
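
The loader's size check can be mimicked on a PC. This sketch treats CONFIG_ULP_COPROC_RESERVE_MEM as a byte count (which is how ESP-IDF describes that configuration option) and assumes each ULP instruction or data slot takes one 32-bit word; `reserved_words` and `ulp_fits` are hypothetical helpers that only approximate the check inside `ulp_process_macros_and_load()`.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: reserve_bytes is CONFIG_ULP_COPROC_RESERVE_MEM (a byte count) and
// each ULP instruction or data slot is one 32-bit word.
constexpr std::size_t reserved_words(std::size_t reserve_bytes) {
  return reserve_bytes / sizeof(uint32_t);
}

// Does a program of `program_words` instructions, loaded at word offset
// `load_addr` (i.e., after `load_addr` data slots), fit in the reserved region?
constexpr bool ulp_fits(std::size_t load_addr, std::size_t program_words,
                        std::size_t reserve_bytes) {
  return load_addr <= reserved_words(reserve_bytes) &&
         program_words <= reserved_words(reserve_bytes) - load_addr;
}
```

Under these assumptions, with two data slots ahead of the program (such as EDGE_COUNT and an init flag), 126 instructions is the most that fits in 512 bytes.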

The Pulse Counter Challenge

The particular problem I set out to solve is how to count the pulses of a square wave using the ULP while the main CPU is in deep sleep. Before solving that, let’s review some similar problems. We can easily wake up from deep sleep using level-based (state-based) wakeup sources on the ESP32.

For waking from deep sleep, the ESP32 does not have a concept of the onChange interrupts you might be familiar with from other Arduino devices. The ESP32-S3 has a pulse counter peripheral, but it does not operate in deep sleep; it is ideal for background encoder counting on the main core (see an Arduino Example on GitHub, *no deep sleep*).

Waking on each level change does not work well for counting pulses, because you would exit deep sleep just to increment a number, all while tracking states and flipping the wakeup level. Waking from deep sleep can take 100 to 500 milliseconds, depending on what needs re-initialization.

There is one caveat: you can implement a Deep-sleep Wake Stub that runs before virtually any other initialization. As seen in the Application Example, you can enter the stub and go right back to deep sleep (roughly 25 milliseconds awake; see a power profile in an Espressif blog post). However, as another GPIO counting example shows, it’s unclear whether you can swap the wakeup level from within the stub; in that example, the wake stub waits for the logic level to flip before entering deep sleep again.
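
Some rough arithmetic shows why waking on every edge is costly. This sketch uses the figures quoted above (100 to 500 ms for a full wakeup, about 25 ms for a wake stub) and a hypothetical `awake_fraction` helper; real wake times depend on your firmware.

```cpp
#include <algorithm>

// Fraction of each second spent awake if the chip wakes once per edge and stays
// awake `awake_ms` per wakeup. A result of 1.0 means the chip never gets to sleep
// (the edges arrive faster than a wakeup can be serviced).
double awake_fraction(double edges_per_second, double awake_ms) {
  return std::min(1.0, edges_per_second * awake_ms / 1000.0);
}
```

For a 1 Hz square wave (two edges per second), a 100 ms wakeup keeps the chip awake about 20% of the time, versus about 5% with a 25 ms wake stub; at 10 Hz (20 edges per second), a 100 ms wakeup cannot keep up at all.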

Arduino Pulse Counter using a ULP Program

View my ULP_Example.ino on GitHub. The pulse counter ULP program counts all transitions; divide the count by two if you only want one type of transition (e.g., all HIGH-to-LOW).

Transition Detection: The program monitors a GPIO pin (through RTC_GPIO_INDEX) to detect transitions (state changes) between HIGH and LOW.
Transition Counter: The program uses register R3 as a counter to track the number of transitions (edge detections) observed on the monitored GPIO pin.
State Comparison: The current GPIO state is stored in R1 and compared to the previous state stored in R2. If the state has changed, the transition counter (R3) is incremented.
Debounce Mechanism: After each GPIO state check, the program introduces a delay (I_DELAY(0xFFFF)) to debounce the input, ensuring that rapid transitions are not falsely counted. The total delay is approximately 22.44 ms.
Store Transition Count: The value of the transition counter is periodically stored in RTC_SLOW_MEM[EDGE_COUNT], making it available to the main processor for further use.
Looping Behavior: The program continuously loops, checking for GPIO state changes and updating the transition counter while applying the debounce delay to prevent noise from being detected as multiple transitions.
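
The loop described in this list can be simulated on a PC to sanity-check the counting logic. This is a sketch, not the ULP program itself: each element of `samples` stands for one GPIO read taken after the debounce delay, `r1`, `r2`, and `r3` play the roles of R1, R2, and R3, and the function name and the choice to seed R2 with the first sample are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulate the ULP loop: read the pin into R1, compare with the previous state in
// R2, increment the 16-bit counter R3 on a change, then store R3 to the shared slot.
uint16_t simulate_edges(const std::vector<int>& samples, uint32_t& edge_count_slot) {
  edge_count_slot = 0;
  if (samples.empty()) return 0;
  uint16_t r2 = static_cast<uint16_t>(samples[0]);  // previous state (assumed seed)
  uint16_t r3 = 0;                                  // transition counter
  for (std::size_t i = 1; i < samples.size(); ++i) {
    uint16_t r1 = static_cast<uint16_t>(samples[i]);   // current state
    if (r1 != r2) r3 = static_cast<uint16_t>(r3 + 1);  // edge: count it (16-bit wrap)
    r2 = r1;                                           // current becomes previous
    edge_count_slot = r3;                              // like RTC_SLOW_MEM[EDGE_COUNT]
  }
  return r3;
}
```

A signal that toggles more than once between two samples would be undercounted, since the simulation (like the ULP loop) only sees the level at each sample.
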
GPIO input bits are offset by RTC_GPIO_IN_NEXT_S within the RTC GPIO input register. This is the mapping provided in rtc_io_reg.h: GPIO numbers and RTC GPIO indexes are not the same.
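
Reading the pin level comes down to a shift and a mask on the RTC GPIO input register (on the ULP, a single-bit I_RD_REG read of RTC_GPIO_IN_REG). In this host sketch, the register value is passed in as a plain integer; `kRtcGpioInNextS = 10` is an assumption to verify against RTC_GPIO_IN_NEXT_S in your own rtc_io_reg.h, and `rtc_pin_level` is hypothetical.

```cpp
#include <cstdint>

// Assumed value of RTC_GPIO_IN_NEXT_S; confirm it in rtc_io_reg.h for your chip.
constexpr unsigned kRtcGpioInNextS = 10;

// Level (0 or 1) of RTC GPIO `rtc_index` in a snapshot of the RTC GPIO input
// register. The input field starts at bit RTC_GPIO_IN_NEXT_S, so the pin's bit
// sits at that offset plus the RTC index, not at bit `rtc_index`.
unsigned rtc_pin_level(uint32_t rtc_gpio_in_reg, unsigned rtc_index) {
  return (rtc_gpio_in_reg >> (kRtcGpioInNextS + rtc_index)) & 1u;
}
```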

Because some of the instruction macros expand to inline function calls, defining such an array in global scope may cause the compiler to produce an “initializer element is not constant” error. To fix this error, move the definition of the instruction array into local scope.

Other Resources

I found these examples of reading an ADC to be informative, and I like leveraging an enum to set both ULP variables and the program address in RTC_SLOW_MEM. By contrast, RTC_DATA_ATTR variables are placed by the linker somewhere downstream of the ULP region of RTC_SLOW_MEM, making it difficult to rely on a known location from within the ULP program. For example:

RTC_DATA_ATTR int countValue = 0; // located at 0x50000200

That address is 0x200 (512) bytes past the start of RTC slow memory at 0x50000000, which matches CONFIG_ULP_COPROC_RESERVE_MEM = 512, so the linker appears to place RTC_DATA_ATTR data immediately after the region reserved for the ULP.

Working ULP Program Snippets

10 Comments

Michael Thomas says:

December 7, 2024 at 10:21 pm

This is great. Loading the ULP program by macro is very convenient and gives a lot of flexibility, and avoids the need for installing the additional ULP compiler tool.
I was seeing an issue where the counter was sometimes resetting.
I added a check to see if the ULP program had already been initialised, and only load the ULP program if it hasn’t been initialised already – this seems to have fixed the issue.

Fix as follows:
// At definitions
#define ULP_INIT_MARKER_ADDR 100 // Address in RTC_SLOW_MEM for the marker
#define ULP_INIT_MARKER 0x1234 // Unique marker to identify initialization
// At start of init_ulp_program()
if (RTC_SLOW_MEM[ULP_INIT_MARKER_ADDR] == ULP_INIT_MARKER) {
  Serial.println("ULP program already initialized.");
  return;
}
// At the end of init_ulp_program()
// Set the marker to indicate initialization is complete
RTC_SLOW_MEM[ULP_INIT_MARKER_ADDR] = ULP_INIT_MARKER;

Also, because I was using an older ESP32 chip, I had to make the following changes:
– Use “esp32/ulp.h” instead of “esp32s3/ulp.h”
– Use GPIO_NUM_14, RTC index 16 (pin 18 didn’t seem to work, I think the newer chips may have more gpios available?)

1. Matt Gaidica, PhD says:
  
  December 8, 2024 at 4:49 am
  Hi Michael, that’s a really good point: my intended behavior is to wake up, log the counts, and then restart the counter from 0. In your case, you do need a flag that survives deep sleep in order to distinguish a user reset from a deep-sleep reset.
  
  A cleaner way to do this would be to leverage the enum:
```
enum {
  EDGE_COUNT,
  INIT_FLAG,
  SLOW_PROG_ADDR  // Program start address
};
```
  This gives you a 32-bit word ahead of the program (whose size you likely don’t know) for each flag, so the program can never overrun your flag; the program is loaded beyond the counter and the initialization flag by:
```
size_t size = sizeof(ulp_program) / sizeof(ulp_insn_t);
esp_err_t err = ulp_process_macros_and_load(SLOW_PROG_ADDR, ulp_program, &size);  // load after the data slots
if (err != ESP_OK) Serial.printf("ULP load failed: %s\n", esp_err_to_name(err));
```
  And yes, the ESP32-S3 has some additional GPIO that can be used with the ULP.
  1. Michael Thomas says:
    
    December 19, 2024 at 12:57 am
    
    Thanks for the cleaner way to set the non-volatile flag 👍
    
  2. Michael Thomas says:
    
    December 19, 2024 at 1:15 am
    
    Hi Matt
    One more question please.
    I recently bought the “DFRobot Firebeetle 2” board which is ESP32c6 chipset.
    The “ulp.h” library doesn’t seem to be available in the espressif SDK.
    You wouldn’t have any suggestion for this?
    Thanks
    Michael
    
    1. Matt Gaidica, PhD says:
      
      December 20, 2024 at 2:50 pm
      
      Hi Michael, I do not know the answer to this. I tend to be entrenched in Arduino for shareability with colleagues.
      
      1. Robert says:
        
        January 20, 2025 at 6:46 am
        
        I think you gents may be slightly talking past each other. The whole FSM/ULP scheme is a legacy thing that exists only in the Xtensa-based parts. See:
        
        https://github.com/espressif/esp-idf/blob/v5.4/components/ulp/ulp_fsm/include/ulp_fsm_common.h
        
        https://github.com/espressif/esp-idf/tree/v5.4/components/ulp/ulp_fsm/include/esp32
        
        https://github.com/espressif/esp-idf/tree/v5.4/components/ulp/ulp_fsm/include/esp32s2
        
        https://github.com/espressif/esp-idf/tree/v5.4/components/ulp/ulp_fsm/include/esp32s3
        
        Espressif licensed Xtensa from Cadence and needed something even smaller for low-power ops, but they only needed a few opcodes, so they built their own little _thing_ that they called their finite state machine “processor”. They never made a really awesome API to get to it. This is why there’s one common include and one ulp.h in each of the three chips that has it. This FSM has four registers and a total of about a dozen opcodes. You can see a representative list for S3 at https://github.com/espressif/esp-idf/blob/67c1de1eebe095d554d281952fde63c16ee2dca0/components/ulp/ulp_fsm/include/esp32s3/ulp.h#L39
        
        All this is why the old ULP stuff has its own compilers to handle register allocation, opcode generation, goofy preprocessor generation of opcodes, obscure linking rules, and all that.
        
        The CEO of Espressif said that all chips after S3 (later in 2020) would be RISC-V. This meant they didn’t have to wait for the GCC/GAS/GDB wizards to reverse engineer the Cadence LX6/LX7 style parts anymore, and they had access to a plethora of high-quality, open development tools for the RISC-V cores.
        
        Since they were using RISC-V for their “big” (160 MHz is large by embedded standards) cores that were user-facing, why not use smaller RISC-V parts – maybe even built in-house; it’s a pretty common college assignment – to handle the low power support? So they did. Thus all the FSM stuff just goes away on newer parts like C3, C2, C5, P4, and … taa daa … C6.
        
        So on the RISC-V parts, all the old FSM/ULP stuff is just not there. It’s replaced by what they call their ULP (Ultra Low Power) core that’s a real RISC-V ISA core that has all the standard base opcodes plus the extensions for IMAC (Integer, Mult & Div, Atomic, and Compressed). This part also has an interrupt controller of its own and other features that would have made it a reasonable CPU of its own in the early 90’s.
        
        So the good news is that you can program the low-power part (20 MHz or so) of the C6 (and C3 and P4 and …) almost exactly like you do the high-power (160-400 MHz) cores. You can program them in C or C++ even! You can use the real compiler. The process is spelled out in a new section of the C6 documentation (representative of modern RISC-V variants) at:
        
        https://docs.espressif.com/projects/esp-idf/en/stable/esp32c6/api-reference/system/ulp-lp-core.html
        
        The basic parts of Matt’s tutorial are still very much on the nose. The block diagram is still pretty similar, just imagine hoisting out one dedicated, single-use, (weird) CPU that handled the LP side of things and dropping in a tiny version of the main CPU.
        
        Now there was one more zinger in your conversation that may have prevented you from quite connecting. Espressif provides the awesome ESP-IDF toolkit with compilers and debuggers and a build system and all of that. For ESP32, Arduino is a layer of stuff smeared on the top of it to make it act a little more like an 8-bit part from the early 90’s. For the easy parts of it, it’s pretty thin wrapping: digitalWrite(p,v) vs. gpio_set_level(p,v). For the aspects that don’t exist on an ancient AVR ATmega8, the wrapping in Arduino gets weirder because it doesn’t exist there. Clearly, the 512-byte Arduino didn’t exactly have low power management with two cores, so it’s a forced fit. In Espressif’s eyes, the ESP-IDF support described at ulp-lp-core above is fine. (And it is…) But you can’t program it with the Arduino toolkit because Arduino doesn’t exactly have models for such things, as it’s just a whole class of chip that the native Arduino ecosystem doesn’t have to deal with… and it’s been only fairly recently added to ESP-IDF – which opens another riff in time.
        
        If PlatformIO is involved, C6 just isn’t supported because PlatformIO is not supporting any of the newer espressif chips – and barely supporting the older ones, including not accepting community-provided fixes to the platformio/arduino project. But you must have found your way around that (probably by moving to pioarduino instead of platformio if you needed c6) if you’re already working with Arduino on C6.
        
        So: to recap:
        There IS a very robust, much better (from a computer science view) ULP mode for C6. It’s documented in the link I provided above and works great in ESP-IDF. You can’t really edit it with the Arduino editors, but you can probably reference the IDF symbols from within the Pioarduino project.
        
        Hope this is enough to get more projects going on C6. Code on!
      2. Matt Gaidica, PhD says:
        
        January 20, 2025 at 7:09 am
        
        Fantastic, Robert! I truly appreciate the rundown and, most importantly, the deprecation notice for the ULP FSM in case I or anyone else moves to the RISC-V architecture. I also agree that doing this in Arduino is clunky… and ultimately a portability and distribution constraint for my [non-technical] end users. However, after many hours playing with the ULP and tuning things just so, it could be argued that if it plays a central role, the technical burden is increased because that section of the code is going to be impenetrable anyway vs. a more straightforward C implementation. Thanks again and stay in touch,
Marcos Silva says:

January 13, 2025 at 12:24 pm

Hi Matt

Congratulations on the pulse counter article. I’m a beginner and I’m trying to use the ULP processor in my esp32 project to capture a signal with a specific frequency. Is it possible to program the ULP to wake up with the desired frequency? Can I use I_delay (1000ms) to determine the number of pulses per second and consequently the frequency?

Thanks

1. Matt Gaidica, PhD says:
  
  January 13, 2025 at 1:05 pm
  
  Hi Marcos,
  
  Funny you mention this; I was recently implementing a fixed-time ULP program. You *can* use the ULP timer to wake the ULP at specific intervals; see the example here: ESP32 ULP Timer Example.
  
  That example runs the ULP program every 100 ms (each run ends at I_HALT()), resulting in this output:
  
    Deep sleep time: 5 seconds
    ULP timer period: 100 milliseconds
    Counter value: 54
    Initializing ULP timer program
    Entering deep sleep
  
  Notice you lose about 40ms between when the ULP runs and you grab the values at wakeup.
  
  There’s an important caveat: you must call I_HALT() for the timer to work. I also provided (commented out) a ULP program that never reaches an I_HALT() statement; with it, you will find the ULP only runs once per deep sleep cycle:
  
    Deep sleep time: 5 seconds
    ULP timer period: 100 milliseconds
    Counter value: 1
    Initializing ULP timer program
    Entering deep sleep
  
  This makes the ULP timer of limited use for loop sampling. One option is to advance your storage buffer (RTC_MEM) position at the beginning of every ULP cycle, accumulate “pulses-per-second” in each slot, then average them when deep sleep exits. Just make sure you don’t overrun the RTC_MEM space (CONFIG_ULP_COPROC_RESERVE_MEM bytes, shared with the ULP program; each 32-bit slot holds a 16-bit value).
  
  Your application depends on how often you want to wake up from deep sleep to do something with your count… I could imagine all sorts of tricks to make an efficient counting system.
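
  The buffer-and-average idea can be sketched on a PC. This is a hypothetical helper, not the code from the ULP Timer Example: it assumes each slot holds the edge count accumulated during one ULP timer period, keeps only the lower 16 bits of each 32-bit slot, and treats two edges as one cycle of a square wave.

```cpp
#include <cstddef>
#include <cstdint>

// Estimate the signal frequency from per-period edge counts stored in 32-bit slots.
// Each slot is masked to its lower 16 bits, edges are halved to get cycles, and the
// total is divided by the total observation time (n_slots * period_s).
double frequency_hz(const uint32_t* slots, std::size_t n_slots, double period_s) {
  if (n_slots == 0 || period_s <= 0.0) return 0.0;
  uint64_t edges = 0;
  for (std::size_t i = 0; i < n_slots; ++i) edges += slots[i] & 0xFFFF;
  return (edges / 2.0) / (n_slots * period_s);
}
```

  For example, four 100 ms slots each holding 2 edges give 4 cycles over 0.4 s, or about 10 Hz.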
  
Marcos Silva says:

January 21, 2025 at 7:11 am

Hi Matt! Your application is very interesting. I think I understood the logic of the program. I think it’s the way to develop my application. “An IR sensor captures the beating frequency of the mosquito wings, if it is true, it triggers the capture system.” I thought of a kind of frequency meter. Thank you for sharing your knowledge with us
