FastSPI_LED2 is a rebuild from the ground up, using multiple layers of components that, it turns out, may have uses outside of just the LED library! The lowest level of these is the Pin access library. It is designed to let me write higher-level code that accesses pins using the fastest mechanisms available on known platforms, while falling back to workable methods on Arduino platforms that don’t have all the information in place for Mostest Speed[tm]. The goal with the Pin class is to make pin access as trivial as possible, and also as portable as possible. For example, here’s a simple bit of code to blink a pin:
#include "FastSPI_LED2.h"

void setup() { Pin<13>::setOutput(); }

void loop() {
  Pin<13>::hi(); delay(200);
  Pin<13>::lo(); delay(200);
}

// even shorter (use this loop() in place of the one above)
void loop() { Pin<13>::toggle(); delay(200); }
And that’s it! What you don’t see, though, is the mechanism that is used under the hood. Most introductory Arduino code recommends using functions like digitalWrite, which has a lot of overhead for turning a pin on or off. While that may not be important for a simple example like this, with something like FastSPI_LED2, high performance is part of the name and the game! The Pin class, under the hood, tries to use the fastest method it can for twiddling the pin.
For example, on an AVR, the hi() call compiles down to a single AVR instruction, sbi, as seen in the disassembly below (sbi takes two clock cycles on classic AVR cores such as the ATmega328, and a single cycle on XMEGA parts):
000000aa <loop>:
  aa:  2d 9a        sbi   0x05, 5   ; 5
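To make the idea behind that single instruction a bit more concrete, here is a minimal sketch of a compile-time pin. This is not the library's actual code: `SketchPin`, `fakePortB`, and `sketchDemo` are names made up for illustration, and a plain global variable stands in for the memory-mapped port register so the sketch can run on a desktop compiler. The point is only that when the bit number is a template parameter, the mask is a compile-time constant, which is what gives the compiler the chance to emit something like the `sbi 0x05, 5` shown above.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a memory-mapped port register (simulation only).
// On real hardware this would be a fixed I/O address, not a global variable.
static volatile std::uint8_t fakePortB = 0;

// Hypothetical compile-time pin: the bit index is a template parameter,
// so the mask is a constant known to the compiler at every call site.
template <std::uint8_t Bit>
struct SketchPin {
    static_assert(Bit < 8, "an 8-bit port only has bits 0..7");
    static constexpr std::uint8_t mask = static_cast<std::uint8_t>(1u << Bit);

    static void hi()     { fakePortB = static_cast<std::uint8_t>(fakePortB | mask); }
    static void lo()     { fakePortB = static_cast<std::uint8_t>(fakePortB & ~mask); }
    static void toggle() { fakePortB = static_cast<std::uint8_t>(fakePortB ^ mask); }
    static bool isHigh() { return (fakePortB & mask) != 0; }
};

// Small self-check showing the calls in use; returns the final port value.
inline std::uint8_t sketchDemo() {
    fakePortB = 0;
    SketchPin<5>::hi();              // sets bit 5
    assert(SketchPin<5>::isHigh());
    SketchPin<5>::toggle();          // clears bit 5 again
    SketchPin<0>::hi();              // sets bit 0
    return fakePortB;                // only bit 0 is left set
}
```

Because every member is static and the mask is constant, there is no object to construct and nothing to look up at run time, which matches the `Pin<13>::hi()` calling style used in the blink example.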
On an Arduino where I haven’t yet defined pin mappings, the same call looks more like:
00000100 <loop>:
 100:  e0 91 0a 01  lds   r30, 0x010A
 104:  f0 91 0b 01  lds   r31, 0x010B
 108:  80 81        ld    r24, Z
 10a:  90 91 09 01  lds   r25, 0x0109
 10e:  89 2b        or    r24, r25
 110:  80 83        st    Z, r24
A few more instructions, which is part of why I’m working on getting pin definitions in for as many platforms as possible. Still, that’s better than using digitalWrite, where the call becomes:
00000100 <loop>:
 100:  8d e0        ldi   r24, 0x0D   ; 13
 102:  61 e0        ldi   r22, 0x01   ; 1
 104:  0e 94 b5 01  call  0x36a       ; 0x36a <digitalWrite>
And that digitalWrite code? Well, let’s take a look at its disassembly:
0000036a <digitalWrite>:
 36a:  48 2f        mov   r20, r24
 36c:  50 e0        ldi   r21, 0x00   ; 0
 36e:  ca 01        movw  r24, r20
 370:  82 55        subi  r24, 0x52   ; 82
 372:  9f 4f        sbci  r25, 0xFF   ; 255
 374:  fc 01        movw  r30, r24
 376:  24 91        lpm   r18, Z+
 378:  ca 01        movw  r24, r20
 37a:  86 56        subi  r24, 0x66   ; 102
 37c:  9f 4f        sbci  r25, 0xFF   ; 255
 37e:  fc 01        movw  r30, r24
 380:  94 91        lpm   r25, Z+
 382:  4a 57        subi  r20, 0x7A   ; 122
 384:  5f 4f        sbci  r21, 0xFF   ; 255
 386:  fa 01        movw  r30, r20
 388:  34 91        lpm   r19, Z+
 38a:  33 23        and   r19, r19
 38c:  09 f4        brne  .+2         ; 0x390 <digitalWrite+0x26>
 38e:  40 c0        rjmp  .+128       ; 0x410 <digitalWrite+0xa6>
 390:  22 23        and   r18, r18
 392:  51 f1        breq  .+84        ; 0x3e8 <digitalWrite+0x7e>
 394:  23 30        cpi   r18, 0x03   ; 3
 396:  71 f0        breq  .+28        ; 0x3b4 <digitalWrite+0x4a>
 398:  24 30        cpi   r18, 0x04   ; 4
 39a:  28 f4        brcc  .+10        ; 0x3a6 <digitalWrite+0x3c>
 39c:  21 30        cpi   r18, 0x01   ; 1
 39e:  a1 f0        breq  .+40        ; 0x3c8 <digitalWrite+0x5e>
 3a0:  22 30        cpi   r18, 0x02   ; 2
 3a2:  11 f5        brne  .+68        ; 0x3e8 <digitalWrite+0x7e>
 3a4:  14 c0        rjmp  .+40        ; 0x3ce <digitalWrite+0x64>
 3a6:  26 30        cpi   r18, 0x06   ; 6
 3a8:  b1 f0        breq  .+44        ; 0x3d6 <digitalWrite+0x6c>
 3aa:  27 30        cpi   r18, 0x07   ; 7
 3ac:  c1 f0        breq  .+48        ; 0x3de <digitalWrite+0x74>
 3ae:  24 30        cpi   r18, 0x04   ; 4
 3b0:  d9 f4        brne  .+54        ; 0x3e8 <digitalWrite+0x7e>
 3b2:  04 c0        rjmp  .+8         ; 0x3bc <digitalWrite+0x52>
 3b4:  80 91 80 00  lds   r24, 0x0080
 3b8:  8f 77        andi  r24, 0x7F   ; 127
 3ba:  03 c0        rjmp  .+6         ; 0x3c2 <digitalWrite+0x58>
 3bc:  80 91 80 00  lds   r24, 0x0080
 3c0:  8f 7d        andi  r24, 0xDF   ; 223
 3c2:  80 93 80 00  sts   0x0080, r24
 3c6:  10 c0        rjmp  .+32        ; 0x3e8 <digitalWrite+0x7e>
 3c8:  84 b5        in    r24, 0x24   ; 36
 3ca:  8f 77        andi  r24, 0x7F   ; 127
 3cc:  02 c0        rjmp  .+4         ; 0x3d2 <digitalWrite+0x68>
 3ce:  84 b5        in    r24, 0x24   ; 36
 3d0:  8f 7d        andi  r24, 0xDF   ; 223
 3d2:  84 bd        out   0x24, r24   ; 36
 3d4:  09 c0        rjmp  .+18        ; 0x3e8 <digitalWrite+0x7e>
 3d6:  80 91 b0 00  lds   r24, 0x00B0
 3da:  8f 77        andi  r24, 0x7F   ; 127
 3dc:  03 c0        rjmp  .+6         ; 0x3e4 <digitalWrite+0x7a>
 3de:  80 91 b0 00  lds   r24, 0x00B0
 3e2:  8f 7d        andi  r24, 0xDF   ; 223
 3e4:  80 93 b0 00  sts   0x00B0, r24
 3e8:  e3 2f        mov   r30, r19
 3ea:  f0 e0        ldi   r31, 0x00   ; 0
 3ec:  ee 0f        add   r30, r30
 3ee:  ff 1f        adc   r31, r31
 3f0:  ee 58        subi  r30, 0x8E   ; 142
 3f2:  ff 4f        sbci  r31, 0xFF   ; 255
 3f4:  a5 91        lpm   r26, Z+
 3f6:  b4 91        lpm   r27, Z+
 3f8:  2f b7        in    r18, 0x3f   ; 63
 3fa:  f8 94        cli
 3fc:  66 23        and   r22, r22
 3fe:  21 f4        brne  .+8         ; 0x408 <digitalWrite+0x9e>
 400:  8c 91        ld    r24, X
 402:  90 95        com   r25
 404:  89 23        and   r24, r25
 406:  02 c0        rjmp  .+4         ; 0x40c <digitalWrite+0xa2>
 408:  8c 91        ld    r24, X
 40a:  89 2b        or    r24, r25
 40c:  8c 93        st    X, r24
 40e:  2f bf        out   0x3f, r18   ; 63
 410:  08 95        ret
Quite a bit of difference in generated code output, no? The Pin library is for those times when you absolutely have to bitbang (either you can’t use the hardware SPI port, or you’re doing something with pins that doesn’t involve SPI or anything SPI-like at all; I’m looking at you, WS2811) but still want it to be as fast as possible. Also, this library works on the Teensy 3.0 ARM platform as well, reducing the hi/lo calls to just a load and a write. (The load is required because the GPIO registers live in a high block of memory and need a full 32-bit address, so the address of a pin’s GPIO register has to be loaded into a register before you can write a pin value to it.)
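For comparison, the middle listing earlier in the post (the fallback for boards without pin definitions) loads a port address and a mask from memory, ORs them together, and stores the result. A rough sketch of code with that shape follows. Everything in it is an assumption for illustration rather than the library's implementation: `RuntimePin`, `lookupPort`, `lookupMask`, and the `simPort*` variables are invented names, the variables stand in for hardware registers, and the pin layout is an Uno-style mapping chosen only to have something to look up.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for three memory-mapped output registers (simulation only).
static volatile std::uint8_t simPortB = 0;
static volatile std::uint8_t simPortC = 0;
static volatile std::uint8_t simPortD = 0;

// Hypothetical runtime lookup, assuming an Uno-style layout:
// pins 0-7 -> port D, pins 8-13 -> port B, pins 14-19 -> port C.
inline volatile std::uint8_t* lookupPort(unsigned pin) {
    assert(pin < 20);
    if (pin < 8)  return &simPortD;
    if (pin < 14) return &simPortB;
    return &simPortC;
}

inline std::uint8_t lookupMask(unsigned pin) {
    assert(pin < 20);
    const unsigned bit = (pin < 8) ? pin : (pin < 14) ? pin - 8 : pin - 14;
    return static_cast<std::uint8_t>(1u << bit);
}

// Hypothetical fallback pin: the port pointer and mask are resolved once,
// so each hi()/lo() is just a load, an OR or AND, and a store.
class RuntimePin {
public:
    explicit RuntimePin(unsigned pin)
        : port_(lookupPort(pin)), mask_(lookupMask(pin)) {}

    void hi() { *port_ = static_cast<std::uint8_t>(*port_ | mask_); }
    void lo() { *port_ = static_cast<std::uint8_t>(*port_ & ~mask_); }

private:
    volatile std::uint8_t* port_;
    std::uint8_t mask_;
};
```

Note that, unlike the digitalWrite listing, which saves the status register and executes `cli` before touching the port, this sketch does nothing to guard the read-modify-write against interrupts.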
Right now, the Pin class is written and tuned for high-performance output: setting pins and toggling pins, with a variety of support functions to help achieve higher performance even in environments where the pin-to-GPIO-port mapping can’t happen at compile time. A future post will detail how to use some of these other methods in the Pin class to squeeze the most performance out of your bit-twiddling code (for example, when bitbanging SPI output, the inner loop of the fast SPI code can push out a bit every 4 cycles: 2 to determine whether the current bit is hi or lo, and 2 to set the data line appropriately and strobe the clock)! In addition, a future revision of the FastSPI_LED2 library will update the Pin class to support reading values as well as writing them.
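As a taste of what such a bitbanged inner loop can look like, here is a simplified sketch. It is not the library's fast SPI code and makes no claim about cycle counts; `shiftOutByte`, `simPort`, and the `sampled` vector are hypothetical, with the vector playing the part of the receiving chip so the behaviour can be checked off-hardware. The trick it illustrates is reading the port once and precomputing the "data high" and "data low" port values outside the loop, so that each bit costs one decision plus two plain writes. That trick assumes nothing else (an interrupt handler, say) changes other pins on the same port during the transfer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a memory-mapped GPIO output register (simulation only).
static volatile std::uint8_t simPort = 0;

// Hypothetical helper: shift one byte out MSB-first on dataMask, clocking
// each bit with a rising edge on clockMask. `sampled` records the data-line
// level at each rising edge and exists only to make the sketch testable.
inline void shiftOutByte(volatile std::uint8_t* port,
                         std::uint8_t dataMask,
                         std::uint8_t clockMask,
                         std::uint8_t value,
                         std::vector<int>& sampled) {
    assert((dataMask & clockMask) == 0);  // data and clock must be different pins

    // Read the port once and precompute both data levels with the clock low.
    const std::uint8_t base =
        static_cast<std::uint8_t>(*port & ~(dataMask | clockMask));
    const std::uint8_t dataLo = base;
    const std::uint8_t dataHi = static_cast<std::uint8_t>(base | dataMask);

    for (int bit = 7; bit >= 0; --bit) {
        const std::uint8_t level = (value & (1u << bit)) ? dataHi : dataLo;
        *port = level;                                         // data valid, clock low
        *port = static_cast<std::uint8_t>(level | clockMask);  // rising clock edge
        sampled.push_back((*port & dataMask) ? 1 : 0);         // simulated receiver
    }
    *port = base;  // leave data and clock low; other pins are unchanged
}
```

Calling `shiftOutByte(&simPort, 1u << 3, 1u << 5, 0xA5, bits)` with an empty `bits` vector leaves the eight sampled levels in it, most significant bit first.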