SWO – Getting started with Tooling

Basic Single Wire Output replaces a serial port for debug purposes, but that’s hardly scratching the surface of the full capability of what’s behind that pin. To get more out of it needs additional software on the host side, and that’s where Orbuculum makes its first appearance.

If you’re following along at home, and you’re of that kind of engineering mentality, you will have looked at the SWO output from the last blog post and noticed that every valid data byte was interspersed with a 0x00. That doesn’t matter to most terminal programs (although it will screw up flashy terminal handling in case you were trying to get clever) and it’s really just a way of the ITM reminding you that it’s still there, and would still like to play.

The ITM is documented in The ARMv7-M Architecture Reference Manual which is a right riveting read. It can actually output data four different types of data;

  • Software Trace: Messages generated by program code
  • Hardware Trace: Messages generated by the DWT, which the ITM then outputs
  • Time Stamps: Either relative to the the CPU clock or the SWO clock
  • Extension Packets: These aren’t used much in CORTEX-M, but the one facility they do provide is a ‘page extension’ to extend the number of available stimulus ports from 32 to 256.

The minimalist pseudo-serial port output from the last post is actually a degenerate example of the use of Software Trace outputting one byte messages from ITM channel 0. That’s the reason you’re seeing the 0’s interspersed with the data… but a lot more functionality is available.

An ITM message is, in general, a data packet of 8 to 32 bits. Program code can send out chunks of 8-32 bits via 32 ‘stimulus ports’. A write to stimulus port 0..31 on the target side of 1, 2 or 4 bytes will result in a ITM Software message being encoded and sent over the link. This effectively means you’ve got 32 individual channels of up to 32 bit width multiplexed onto a single serial link, and handled by the hardware. You can do that kind of thing just using software and a conventional serial port, but the ITM embeds that functionality in code you don’t have to write.

This makes the ITM Software channels ideal for separating different types of debug information for processing by the host; Channel 31 is reserved for Operating System support information, and 0 is generally used for 8 bit serial data (as we’ve already seen). The others are pretty much available for whatever purpose you wish. There’s no CMSIS support for anything other than Channel 0, but adding support for the other channels is trivial;

static __INLINE uint32_t ITM_SendChar (uint32_t c, uint32_t ch)
{
    if ((CoreDebug->DEMCR & CoreDebug_DEMCR_TRCENA_Msk) && /* Trace enabled */
         (ITM->TCR & ITM_TCR_ITMENA_Msk) && /* ITM enabled */
         (ITM->TER & (1ul << c) ) /* ITM Port c enabled */
        )
    {
        while (ITM->PORT[c].u32 == 0); // Port available?
        ITM->PORT[c].u8 = (uint8_t) ch; // Write data
    }
    return (ch);
}

I’ll leave it as an exercise for the reader how to create 16 and 32 bit variants of the write routine…or extend this one.

Anyway, while we’re here we’ll take a quick look at the hardware messages that the ITM conveys. These messages originate from the DWT and are encoded in a very similar way to the software ones. However, the message types are much more standardised, and offer an incredibly rich insight into the operation of the CPU, considering how minimal the implementation is. The defined messages are;

  • ID0 : Event Counter: the DWT maintains event counters for a number of distinct event types. When these counters ‘wrap around’ to zero then this event is emitted.
  • ID1: Exception Trace: One of the most versatile messages, this reports which interrupt is Entered, Exited or Returned to. By monitoring exception trace messages the host can identify exactly how interrupts are being handled.
  • ID2: Periodic Program Counter Sample Packets: the DWT can be configured to sample and report the current value of the Program Counter (PC). This allows statistical profiling and code coverage of an application running on the target without any code changes.
  • ID3-23: Data Trace Packets: These messages allow you to trigger events when certain data locations are accessed, values are changed or program locations hit. You might question how these messages differ from the capability afforded by the Debug module, but it’s much more intended for monitoring flows and triggering actions, rather than the interventional stuff that the Debug macrocell is generally used for.

You can see why the DWT is a bit of a Cinderella…its doing quite a lot of useful work and there’s a rich seam to be mined here, so we’ll be back to give it more attention in a future post.

Obviously the ITM has limited bandwidth, especially in comparison to the TRACEDATA pins, and it’s quite possible that it can be flooded by multiple data sources contending for it’s use. When that occurs there is a priority order to the messages that are output, with the end result that if you start seeing overflow messages, you can be reasonably sure that you are losing useful data. Unfortunately, the available bandwidth is the Achilles heel of the TRACESWO pin.

Lets consider the flexibility that the software source packets afford as a simple example of the use of the ITM. Doing this requires some software on the host side which, until recently, was limited and mostly only available in expensive (costing more than zero) proprietary packages, although OpenOCD and Sigrok both have some decode capability.

Orbuculum was created during early summer 2017 to capture and decode these SWO (and, specifically, ITM) flows. Running on OSX or Linux Orbuculum has significantly opened up the potential that SWO offers. In its core form it receives the data stream from the ITM (which may, optionally, have been through the TPIU multiplexer) and both presents it via TCP/IP port 3443 to any number of subsidiary client applications while simultaneously creating FIFOs delivering the decoded serial data to any local application that wants to use it.

The TCP/IP link is another thing we’ll deal with later, but for now, as an example, let’s consider an application where we want three debug serial flows (debug, clientEvents and Actions) with a 32-bit signed value Z and a 16-bit signed value Temperature.

Orbuculum can connect via a USB logic level UART, a Segger debug probe or, the default, a Black Magic Debug probe. For now, let’s assume we’re using the BMP, but it’s only a couple of slightly different command line options to connect to either a Segger or a logic level USB UART.

Anyway, to achieve all this functionality in one window type;

>orbuculum -m 1000 (+any probe selection options you need)

and in a second window;

orbfifo -b swo/ -c 0,debug,”%c” -c 1,clientEvents,”%c” -c 2,Actions,”%c” -c 3,Z,”Z=%d\n” \
                                  -c 4,Temperature,”Temp=%d\n”

orbfifo will connect to orbuculum and it will create, in the directory swo/, the following files;

swo/
  debug
  clientEvents
  Actions
  Z
  Temperature

(+1 more file, which we’re not going to deal with in this post)

These can be streamed via the ‘cat’ command, or copied to a regular file. On the target side writing to one of the ITM channels (0 = debug, 1 = clientEvents etc.) with the appropriate length message will cause that number of octets (comms people say ‘Octets’ rather than ‘Bytes’ cos we’re pedantic) to be sent over the link to pop out and be processed by Orbuculum on the host.

As with the simple serial streaming case we talked about in the last post, some configuration is required to get all the various bits and pieces of SWO pointing in the right direction and running at the same speed. In general you’ll find it’s easier to do that from the debug port rather than target program code, and there are gdb scripts and libraries for exactly that purpose shipped with Orbuculum.

Orbuculum is designed to be a pretty hardy piece of code. It will deal with the target (and the debug interface) appearing and disappearing as the debug cycle takes place. The intention is that it behaves more as a daemon than as a regular application program so that it becomes part of the instrumentation infrastructure that supports your debug activities. Orbfifo then bolts into orbuculum to post-process those data. Typically, I have several windows open each cat’ing one of the debug flows, and those windows are maintained through restarts, pauses and reboots of the target. Nowadays, programs like openocd and pyocd also can feed data directly to orbfifo and orbuculum itself isn’t needed … checkout the documentation for those programs to see how that is done.

So, you now have the ability to stream multiple, independent, information flows from your target to your host. More sophisticated exploitation of this capability will be the subject of the next few posts, once we’ve dealt with the hardware side messages from the DWT, SWOs Cinderella.