Case Study

Using the ESP32-S3 ULP Coprocessor with the Arduino IDE

The ESP32 family's Ultra-Low-Power (ULP) coprocessor can access peripherals such as RTC GPIO, RTC I2C, the SAR ADC, and the temperature sensor (TSENS). The features listed in the ESP32-S3 Technical Reference (TR; Section 2.2) are:

  • Access up to 8 KB of SRAM RTC slow memory for instructions and data
  • Clocked with 17.5 MHz RTC_FAST_CLK
  • Support working in normal mode and in monitor mode
  • Wake up the CPU or send an interrupt to the CPU
  • Access peripherals, internal sensors and RTC registers
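
To put the clock figure in perspective, here is a small host-side sketch (plain C++, not ULP code) that converts cycle counts to time. It assumes the nominal 17.5 MHz value quoted above; the real clock may differ from chip to chip, so treat the results as estimates. The names kUlpClockHz and cycles_to_us are made up for illustration.

```cpp
#include <cstdint>

// Nominal ULP clock (RTC_FAST_CLK) as quoted from the TR.
// Assumption: the actual frequency on a given chip may differ.
constexpr double kUlpClockHz = 17.5e6;

// Hypothetical helper: convert a number of ULP clock cycles to microseconds.
constexpr double cycles_to_us(uint32_t cycles) {
    return static_cast<double>(cycles) * 1e6 / kUlpClockHz;
}
```

With these numbers, 35 cycles take 2 µs, and 65535 cycles (the largest 16-bit count) come out to roughly 3.74 ms.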

To enable ESP32 programming, follow the guide to install ESP32 support in the Arduino IDE. Later you will see a series of #include directives; hover over each one (or CTRL/⌘-click it) to confirm that it resolves to the correct board version (e.g., esp32s3).

Memory and ULP Architecture

The ESP32-S3 internal memory architecture is outlined in TR 4.3.2: Internal ROM, Internal SRAM, and RTC Memory. The RTC memory is 16 KB in total, split between RTC FAST and RTC SLOW memory (8 KB each).

RTC Memory

The RTC (Real Time Clock) memory is implemented as static RAM (SRAM), so it is volatile. However, RTC memory has the added benefit of persisting through deep sleep: values written before sleeping are still there on wake-up.

  • RTC FAST Memory (8 KB):
    RTC FAST memory can only be accessed by the CPU and is not accessible by the ULP coprocessor. It typically stores instructions and data that need to persist across deep sleep.
  • RTC SLOW Memory (8 KB):
    RTC SLOW memory can be accessed by both the CPU and the ULP coprocessor, making it useful for storing instructions and sharing data between the two.

ULP Coprocessor Overview

We will focus on the RTC SLOW memory (RTC_SLOW_MEM in Arduino IDE) because it bridges the main CPU and the ULP coprocessor (see TR Section 2.2).

The TR describes two coprocessors, ULP-FSM and ULP-RISC-V. The Arduino IDE uses the FSM; the RISC-V is only available through ESP-IDF. The work mode is determined automatically by the sleep functions you call.

RTC_SLOW_MEM & ULP in Practice

RTC_SLOW_MEM holds the entire ULP program and all the data it exchanges with the CPU; nothing is stored anywhere else.

The best way to think about RTC_SLOW_MEM is that you will have an optional block of memory to interact with, followed by the ULP program itself. As the figure shows, the interactive memory is protected by loading the ULP program relative to an offset. If you modify any memory associated with the ULP program after loading it, you are effectively overwriting assembly code instructions.
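
The layout described above can be sketched with an enum of word indexes. This is only an illustration that compiles on a host machine: EDGE_COUNT is borrowed from the pulse counter example later in this post, while LAST_STATE, PROG_START, and is_data_slot are hypothetical names invented here.

```cpp
#include <cstddef>
#include <cstdint>

// Word indexes into RTC_SLOW_MEM. Data slots come first; the ULP program is
// loaded at PROG_START so that it sits after them.
// Assumption: all names except EDGE_COUNT are hypothetical.
enum SlowMemIndex : std::size_t {
    EDGE_COUNT = 0,   // shared counter written by the ULP, read by the CPU
    LAST_STATE,       // hypothetical extra data slot
    PROG_START        // first word of the ULP program (the load offset)
};

// True if a word index belongs to the data block, i.e. it can be written
// without overwriting ULP instructions.
constexpr bool is_data_slot(std::size_t index) {
    return index < PROG_START;
}
```

On the device, PROG_START would serve as the load offset for the program, and the CPU would only write to RTC_SLOW_MEM[i] for indexes where is_data_slot(i) holds. Adding a new data slot before PROG_START moves the program automatically.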

You can read more in the ESP32 ULP Coprocessor Instruction Set documentation; note that although RTC_SLOW_MEM uses 32-bit addressing and stores 32-bit values, the ULP coprocessor effectively operates on 16-bit values. Therefore, you should read values from RTC_SLOW_MEM with a 0xFFFF mask to isolate the lower 16 bits. This also means that assembly instructions involving integer comparisons (or the delay instruction) only work with values from 0x0000 to 0xFFFF.
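
A minimal sketch of the masking step, written as host-side C++ so it can be tried without a board. The helper names ulp_low16 and count_delta are made up; on the device the input word would come from RTC_SLOW_MEM. The second helper is an extra assumption on my part: since the value is only 16 bits wide, a counter kept by the ULP wraps at 0x10000, and unsigned 16-bit subtraction gives the right difference across one wrap.

```cpp
#include <cstdint>

// Keep only the lower 16 bits of a 32-bit RTC_SLOW_MEM word.
// The upper 16 bits are not guaranteed to be zero, so mask them off.
constexpr uint16_t ulp_low16(uint32_t word) {
    return static_cast<uint16_t>(word & 0xFFFFu);
}

// Hypothetical helper: difference between two 16-bit counter readings,
// correct even if the counter wrapped once between them.
constexpr uint16_t count_delta(uint16_t now, uint16_t before) {
    return static_cast<uint16_t>(now - before);
}
```

For example, ulp_low16(0x12345678) yields 0x5678, and count_delta(5, 65530) yields 11.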

Coding for the ULP in Arduino IDE

There is some precedent for including pure assembly files using the ulptool project on GitHub; however, it has not been maintained for over four years and requires installing OS-specific toolchains. There is also some commentary on using MicroPython for ULP programming. You might be inspired (or misguided) by the ESP-IDF ULP Examples, but they require an assembler native to ESP-IDF; an alternative is to refer to projects like this ADC sampler on GitHub that include hand-written assembly code.

To program the ULP “natively” in Arduino IDE, we will focus on the Espressif-documented method of Programming the ULP FSM Coprocessor Using C Macros.

The ULP C macros are assembly-like macros arranged into a ULP program, which is loaded into the ULP coprocessor at runtime. As the figure above highlights, the ULP program is not limited by the size of RTC_SLOW_MEM itself but by CONFIG_ULP_COPROC_RESERVE_MEM, which is currently set to 512 (bytes) for the ESP32-S3.

As I understand it, this value is in bytes, so it reserves 128 32-bit words (512 bytes), far less than the 8 KB stated in the technical reference. The function used to load the program returns an error if your ULP program is too large, so it is wise to check its return value.
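
A rough pre-check for that error can be sketched on the host. The sketch assumes the reserve is expressed in bytes (as the "(bytes)" note above suggests) and that each ULP instruction occupies one 32-bit word; reserve_words and ulp_program_fits are hypothetical helpers, not part of any Espressif API, and the real loader remains the authority on what fits.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: every ULP instruction occupies one 32-bit word.
constexpr std::size_t kWordBytes = sizeof(uint32_t);

// Number of 32-bit words available in a reserve of `reserve_bytes` bytes.
constexpr std::size_t reserve_words(std::size_t reserve_bytes) {
    return reserve_bytes / kWordBytes;
}

// Hypothetical pre-check: does a program of `program_words` instructions,
// loaded at word offset `load_offset`, fit inside the reserve?
constexpr bool ulp_program_fits(std::size_t load_offset,
                                std::size_t program_words,
                                std::size_t reserve_bytes) {
    return load_offset <= reserve_words(reserve_bytes) &&
           program_words <= reserve_words(reserve_bytes) - load_offset;
}
```

Under these assumptions, reserve_words(512) is 128 and reserve_words(2048) is 512, so a program loaded two words in has room for 126 instructions in a 512-byte reserve.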

The Pulse Counter Challenge

The particular problem I set out to solve is how to count the pulses of a square wave using the ULP (in deep sleep). Before solving that, let’s review some similar problems. We can easily wake up from deep sleep using state-based interrupts on the ESP32.

The ESP32 does not have a concept of onChange interrupts that you might be familiar with for other Arduino devices. The ESP32-S3 has a pulse counter, but this does not operate in deep sleep; it is ideal for main-core, background encoder counting (see an Arduino Example on GitHub, *no deep sleep*).

This interrupt-based wake-up does not work well for counting pulses: you would exit deep sleep just to increment a number, all while tracking state and adjusting the interrupt level each time. Waking from deep sleep can take 100-500 milliseconds, depending on what needs re-initialization.

That is, with one caveat: you can implement a Deep-sleep Wake Stub that runs before virtually any other initialization. As seen in the Application Example, you can hit the stub and go right back to deep sleep (roughly 25 milliseconds awake; see the power profile in an Espressif blog post). However, as another GPIO counting example shows, it is unclear whether you can swap the interrupt level; in that example, the wake stub waits for the logic level to flip before entering deep sleep again.

Arduino Pulse Counter using a ULP Program

View my ULP_Example.ino on GitHub. The pulse counter ULP program counts all transitions, which can be divided by two if you want a single-state transition (e.g., all HIGH-to-LOW).

  • Transition Detection: The program monitors a GPIO pin (through RTC_GPIO_INDEX) to detect transitions (state changes) between HIGH and LOW.
  • Transition Counter: The program uses register R3 as a counter to track the number of transitions (edge detections) observed on the monitored GPIO pin.
  • State Comparison: The current GPIO state is stored in R1 and compared to the previous state stored in R2. If the state has changed, the transition counter (R3) is incremented.
  • Debounce Mechanism: After each GPIO state check, the program introduces a delay (I_DELAY(0xFFFF)) to debounce the input, ensuring that rapid transitions are not falsely counted. The total delay is approximately 22.44 ms.
  • Store Transition Count: The value of the transition counter is periodically stored in RTC_SLOW_MEM[EDGE_COUNT], making it available to the main processor for further use.
  • Looping Behavior: The program continuously loops, checking for GPIO state changes and updating the transition counter while applying the debounce delay to prevent noise from being detected as multiple transitions.
  • GPIO Indexing: GPIO indexes are offset by RTC_GPIO_IN_NEXT_S. This mapping comes from rtc_io_reg.h; GPIO numbers and RTC GPIO indexes are not the same.
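
The loop described in the list above can be modeled in ordinary C++ to check the logic before writing ULP macros. This is a host-side model, not the ULP program from ULP_Example.ino: r1, r2, and r3 mirror the register roles described above, the samples array stands in for successive GPIO reads (one per debounce interval), and count_transitions and full_pulses are made-up names. It assumes the previous state starts LOW unless told otherwise.

```cpp
#include <cstddef>
#include <cstdint>

// Host-side model of the ULP loop: count every change between consecutive
// pin samples. The counter is 16 bits wide, like a ULP register.
constexpr uint16_t count_transitions(const bool* samples, std::size_t n,
                                     bool initial_state = false) {
    bool r2 = initial_state;   // previous state (R2 in the ULP program)
    uint16_t r3 = 0;           // transition counter (R3)
    for (std::size_t i = 0; i < n; ++i) {
        const bool r1 = samples[i];              // current state (R1)
        if (r1 != r2) {
            r3 = static_cast<uint16_t>(r3 + 1);  // wraps at 0x10000
            r2 = r1;                             // remember the new state
        }
    }
    return r3;
}

// Halving the transition count gives the number of edges in one direction
// (rounded down when the count is odd).
constexpr uint16_t full_pulses(uint16_t transitions) {
    return static_cast<uint16_t>(transitions / 2);
}
```

For the sample sequence LOW, HIGH, HIGH, LOW, HIGH the model counts three transitions, which full_pulses reduces to one complete pulse.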

Because some of the instruction macros expand to inline function calls, defining the ULP program array in global scope may cause the compiler to produce an “initializer element is not constant” error. To fix this error, move the definition of the instruction array into local scope (inside a function).

Other Resources

I found these examples of reading an ADC informative, and I like using an enum to set both the ULP variable slots and the program address within RTC_SLOW_MEM. I also found that variables declared with RTC_DATA_ATTR are placed somewhere downstream of the start of RTC_SLOW_MEM, making it difficult to rely on a known location from the ULP program (the location is likely determined by the linker). For example:

RTC_DATA_ATTR int countValue = 0; // locates to 0x50000200, not sure why
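
One possible reading of that address, offered as a guess rather than a confirmed fact: assuming RTC slow memory is mapped at 0x50000000 on the ESP32-S3, the arithmetic below shows how far into the region the variable landed. The constant and helper names are invented for this sketch.

```cpp
#include <cstdint>

// Assumption: base address of RTC slow memory on the ESP32-S3.
constexpr uint32_t kRtcSlowBase = 0x50000000u;

// Byte offset of an address from the start of RTC slow memory.
constexpr uint32_t slow_mem_offset_bytes(uint32_t address) {
    return address - kRtcSlowBase;
}

// The same offset expressed as a 32-bit word index, as used for RTC_SLOW_MEM[i].
constexpr uint32_t slow_mem_word_index(uint32_t address) {
    return slow_mem_offset_bytes(address) / 4u;
}
```

Under that assumption, 0x50000200 sits 512 bytes (word index 128) past the base, which matches the 512-byte CONFIG_ULP_COPROC_RESERVE_MEM value mentioned earlier. That would be consistent with the linker placing RTC_DATA_ATTR data directly after the ULP reserve, though nothing in this post confirms it.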

Working ULP Program Snippets
