IR Modulation Processing Algorithm Development – Part VIII

In my last post on this subject, I showed how I could speed up ADC cycles for the Teensy 3.5 SBC, ending up with a configuration that took only about 5μSec/analog read. This in turn gave me some confidence that I could implement a full four-sensor digital BPF running at 20 samples/cycle at 520Hz without running out of time.

So, I decided to code this up in an Arduino sketch and see if my confidence was warranted. The general algorithm for one sensor channel is as follows:

1. Collect a quarter-cycle group of samples (5 samples at 20 samples/cycle) and add them together to form a ‘sample_group’.

2. For each sample_group, form I & Q components by multiplying the sample_group by the appropriate sign for that position in the cycle. The sign sequence for I is (+,+,-,-), and for Q it is (-,+,+,-).

3. Perform steps 1 & 2 four times to collect an entire cycle’s worth of samples. As each I/Q sample_group component is generated, add it to a ‘cycle_group_sum’, one for the I component and one for the Q component.

4. When a new pair of cycle_group_sums (one for I, one for Q) is complete, use it to update two N-element running sums (one for I, one for Q).

5. Add the absolute values of the I & Q running sums to form the final demodulated signal value for the sensor channel.
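The five steps above can be sketched for a single channel roughly as follows. This is a minimal illustration in portable C++, not the sketch that was actually run on the Teensy: the names `ChannelDemod`, `addSample`, and the constants are hypothetical, and the running-sum length of 64 is only a guess based on the insertion index wrapping from 63 to 0 in the printout further down.

```cpp
#include <array>
#include <cstdlib>

// Hypothetical constants chosen to match the numbers in the text.
constexpr int kSamplesPerGroup = 5;   // 20 samples/cycle divided into 4 quarter cycles
constexpr int kGroupsPerCycle  = 4;
constexpr int kRunningSumLen   = 64;  // N; assumed, see lead-in

// Sign sequences from step 2.
constexpr int kSignI[kGroupsPerCycle] = {+1, +1, -1, -1};
constexpr int kSignQ[kGroupsPerCycle] = {-1, +1, +1, -1};

struct ChannelDemod {
    long sampleGroup = 0;                    // step 1 accumulator
    long cycleSumI = 0, cycleSumQ = 0;       // step 3 accumulators
    std::array<long, kRunningSumLen> histI{}, histQ{};  // circular buffers
    long runningI = 0, runningQ = 0;         // step 4 running sums
    int sampleCount = 0, groupCount = 0, insertIdx = 0;
    long magnitude = 0;                      // step 5 result

    // Feed one ADC sample; returns true when a full cycle has been processed.
    bool addSample(int adc) {
        sampleGroup += adc;                                  // step 1
        if (++sampleCount < kSamplesPerGroup) return false;
        sampleCount = 0;

        cycleSumI += kSignI[groupCount] * sampleGroup;       // steps 2 and 3
        cycleSumQ += kSignQ[groupCount] * sampleGroup;
        sampleGroup = 0;
        if (++groupCount < kGroupsPerCycle) return false;
        groupCount = 0;

        // Step 4: replace the oldest cycle sum in each circular buffer.
        runningI += cycleSumI - histI[insertIdx];
        runningQ += cycleSumQ - histQ[insertIdx];
        histI[insertIdx] = cycleSumI;
        histQ[insertIdx] = cycleSumQ;
        insertIdx = (insertIdx + 1) % kRunningSumLen;
        cycleSumI = cycleSumQ = 0;

        magnitude = std::labs(runningI) + std::labs(runningQ);  // step 5
        return true;
    }
};
```

With this arrangement a constant (DC) input cancels in both the I and Q cycle sums, while a square wave at the modulation frequency survives, which is the point of the sign sequences.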

To generalize the above algorithm for K sensor channels, the ‘sample_group’ and ‘cycle_group_sum’ variables become K-element arrays, and each step becomes a K-step loop. The N-element running sum arrays (circular buffers) become [K][N] arrays, i.e. two N-element arrays for each sensor (one for I, one for Q).
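As a rough picture of that data layout, the state for K channels might look like the following. The struct and function names are made up for illustration, K = 4 comes from the four-sensor setup mentioned earlier, and N = 64 is an assumption; the actual sketch may organize things differently.

```cpp
#include <cstdlib>

// Hypothetical sizes: K = 4 sensors (from the text), N = 64 (assumed).
constexpr int kNumSensors = 4;   // K
constexpr int kRunLength  = 64;  // N

struct MultiChannelBpf {
    long sampleGroup[kNumSensors] = {};
    long cycleGroupSumI[kNumSensors] = {};
    long cycleGroupSumQ[kNumSensors] = {};
    long historyI[kNumSensors][kRunLength] = {};  // [K][N] circular buffers
    long historyQ[kNumSensors][kRunLength] = {};
    long runningSumI[kNumSensors] = {};
    long runningSumQ[kNumSensors] = {};
    int insertIndex = 0;  // one index shared by all channels
};

// Once per full cycle: fold each channel's cycle sums into its running sums
// and write the demodulated value for each channel into 'out'.
inline void updateRunningSums(MultiChannelBpf& s, long (&out)[kNumSensors]) {
    for (int k = 0; k < kNumSensors; ++k) {
        s.runningSumI[k] += s.cycleGroupSumI[k] - s.historyI[k][s.insertIndex];
        s.runningSumQ[k] += s.cycleGroupSumQ[k] - s.historyQ[k][s.insertIndex];
        s.historyI[k][s.insertIndex] = s.cycleGroupSumI[k];
        s.historyQ[k][s.insertIndex] = s.cycleGroupSumQ[k];
        s.cycleGroupSumI[k] = 0;
        s.cycleGroupSumQ[k] = 0;
        out[k] = std::labs(s.runningSumI[k]) + std::labs(s.runningSumQ[k]);
    }
    s.insertIndex = (s.insertIndex + 1) % kRunLength;
}
```

Keeping a running total and subtracting the entry being overwritten means each update costs a fixed handful of operations per channel, regardless of N.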

All of the above sampling, summing, and circular buffer management must take place within the ~96μSec ‘window’ between samples (1/(520Hz × 20 samples/cycle) ≈ 96μSec), but not all steps have to be performed on every pass. A new sample for each sensor channel is acquired on every pass, but sample groups are converted to cycle group sums only once every 5 passes, and the running sums and final values are updated only once every 20 passes.
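The window arithmetic and the pass schedule can be written down in a few lines. This is only a back-of-the-envelope sketch: all names are hypothetical, and the acquisition estimate assumes the four analog reads happen one after another at about 5μSec each.

```cpp
// Figures taken from the text; names are illustrative only.
constexpr double kModulationHz = 520.0;
constexpr unsigned long kPassesPerCycle = 20;  // samples per modulation cycle
constexpr unsigned long kPassesPerGroup = 5;   // samples per quarter-cycle group
constexpr int kSensorCount = 4;
constexpr double kAdcReadUs = 5.0;             // approximate, per analog read

// Time between samples: 1e6 / (520 * 20), roughly 96.15 microseconds.
constexpr double kWindowUs = 1.0e6 / (kModulationHz * kPassesPerCycle);
// Assumption: sequential reads, so about 20 microseconds of each window.
constexpr double kAcquireUs = kSensorCount * kAdcReadUs;
constexpr double kProcessingBudgetUs = kWindowUs - kAcquireUs;

struct PassWork {
    bool convertGroup;       // fold sample_group into the cycle_group_sums
    bool updateRunningSums;  // update running sums and the final value
};

// Which extra work is due on a given pass (passes counted from 0).
inline PassWork workForPass(unsigned long pass) {
    PassWork w;
    w.convertGroup = (pass % kPassesPerGroup) == kPassesPerGroup - 1;
    w.updateRunningSums = (pass % kPassesPerCycle) == kPassesPerCycle - 1;
    return w;
}
```

Under these assumptions, four passes out of every five only acquire and accumulate samples, and the heaviest pass (group conversion plus running-sum update) comes around once per modulation cycle.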

I built up the algorithm in VS2017 and added some print statements to show how the gears are turning. I also added code to set a digital output HIGH at the start of each sample window and LOW when all processing for that pass was finished. If the HIGH portion of the pulse is shorter than the available window time, all is well. When I ran this code on my Teensy 3.5, I got the following print output (truncated for brevity):

RunningSumInsertionIndex=63

SampleSumCount=0

SampleSumCount=1

SampleSumCount=2

SampleSumCount=3

SampleSumCount=4

CycleGroupSumCount=0

SampleSumCount=0

SampleSumCount=1

SampleSumCount=2

SampleSumCount=3

SampleSumCount=4

CycleGroupSumCount=1

SampleSumCount=0

SampleSumCount=1

SampleSumCount=2

SampleSumCount=3

SampleSumCount=4

CycleGroupSumCount=2

SampleSumCount=0

SampleSumCount=1

SampleSumCount=2

SampleSumCount=3

SampleSumCount=4

CycleGroupSumCount=3

RunningSumInsertionIndex=0

SampleSumCount=0

SampleSumCount=1

SampleSumCount=2

SampleSumCount=3

SampleSumCount=4

CycleGroupSumCount=0

SampleSumCount=0

SampleSumCount=1

SampleSumCount=2

SampleSumCount=3

SampleSumCount=4

CycleGroupSumCount=1

SampleSumCount=0

SampleSumCount=1

SampleSumCount=2

SampleSumCount=3

SampleSumCount=4

CycleGroupSumCount=2

SampleSumCount=0

SampleSumCount=1

SampleSumCount=2

SampleSumCount=3

SampleSumCount=4

CycleGroupSumCount=3

RunningSumInsertionIndex=1

SampleSumCount=0

SampleSumCount=1

The digital output pulse on the scope is shown in the following photo:

Timing pulse for BPF algorithm run, shown at 10μSec/cm. Note the time between rising edges is almost exactly 96μSec, and there is well over 60μSec ‘free time’ between the end of processing and the start of the next acquisition window.

As can be seen in the above photo, there appears to be plenty of time (over 60μSec) remaining between the end of processing for one acquisition cycle, and the start of the next acquisition window. Also, note the fainter ‘fill-in’ section over the LOW part of the digital output. I believe this shows that not all acquisition cycles take the same amount of processing time. Four acquisition cycles out of every 5 require much less processing, as all that happens is the individual samples are grouped into a ‘sample_group’. So the faint ‘fill-in’ section probably shows the additional time required for the processing that occurs after collection/summation of each ‘sample_group’.

The code for these measurements is included below:

//Program to implement John Jenkins I/Q sq wave demod scheme

//see his sync_filter_2c.xlsx document

//Purpose (v3):

// Produce a running estimate of the magnitude of a square-wave modulated IR

// signal in the presence of ambient interferers. The estimate is computed by

// adding the absolute values of the running sums of the I & Q outputs from a