Introduction

Refer to http://en.wikipedia.org/wiki/Electrocardiography for an understanding of ECG signal and leads. Conventionally such ECG signals are acquired by ECG acquisition devices and those devices generate a printout of the lead outputs. A cardiologist analyzes the data for checking the abnormality or normalcy of the signal. But in recent times, automatic ECG processing has been of tremendous focus. http://www.codeproject.com/KB/cpp/ecg_dsp.aspx gives a fantastic overview of acquiring and filtering ECG signals through inexpensive hardware into your PC. You can download ECG signal samples of various diseases from http://www.physionet.org/physiobank/database/mitdb/. Now the main point of concern is how to develop a system for extracting the features from ECG signal so that these features can be used for Automatic Diseases Diagnosis. In this Article we shall discuss a technique for extracting features from ECG signal and further analyze for ST-Segment for elevation and depression which are symptoms of Ischemia. You can Learn more about Cardio Vascular Abnormalities and their correlation with ECG peaks fromhttp://circ.ahajournals.org/content/110/17/2721.full.

Background

We will discuss about the algorithm in detail which process the ECG signal Obtained from MIT-BIH database and are in .mat format. For the current analysis, we consider signal of both Normal Sinus Rhythm and ST-Elevated signals. Finally Using a threshold we check the normalcy of the signals.

Using the Code

Append 100 zeros before and after the signal to remove the possibility of window crossing the signal boundaries while looking for peak locations.

z=zeros(100,1);
A=[z;A;z];

Perform wavelet decomposition. The process of wavelet decomposition down samples the signal. Which essentially means taking the samples at a much lower frequency than the orifinal signal. Therefore details are reduced and QRS complex is preserved.

If you plot the coefficients you will observe that the frequency bands are separated and ca1,ca2,ca3 and ca4 are cleaner signal. But they will have less number of samples than the actual signal due to downsampling. You can see that first signal resembles to the actual signal but has exactly one forth number of samples because the signal was decomposed in 4 levels. 2nd level has exactly half number of samples that of 1st level, 3rd level has exactly half number of samples than the 2nd level. Because the number of samples is reduced, such signals are also called down-sampled signal.

It is clear that 2nd level decomposed data is noise free. Therefore we consider this signal as ideal ECG signal from which QRS must be detected. But the first R is located in 3rd level decomposition signal at approximately 40th sample whereas the same is located in the original signal at 260th location. Therefore once R peak is detected in 3rd level reconstructed signal, it must be cross validated in the actual signal

Detecting R peak in the down sampled Signal

First find the values which are greater than 60% of the max value of the actual signal. Invariably these are R peaks. As the decomposed signals are noise free signals, First R peak needs to be detected in the Noise free signal. But remember the ultimate goal is to detect the Peak in the original Signal. The sample values in Original Signal will be different than the decomposed signal. So Our strategy here will be to first detect the R peaks in the down sampled signal and than cross verify those points the actual signal.

Let y1 be the decomposed signal.

m1=max(y1)*.60;
P=find(y1>=m1);

So P is now set of points which satisfies the above criteria. If you observe the signal very closely, R-Peak is not a single Impulse peak, therefore there are chances of multiple points in the same peak satisfying the criteria. One thing to remember is in 500Hz sampled signal No to R-Location will be found below 350 samples. In 4th Level decomposition order this value is around 20. So first we will remove the R locations that are too close.

Now Variable P2 represents the position of R-Peaks in the down sampled signal.

Detecting R peak in the Original Signal

Search for the position of all the location in signal y1 which are greater than this value m1. They are R locations. But before we proceed, you must know that A R Location in Rt is at least 1/4th of the actual R location of the same point. Hence we will first map the detected positions to original signal by first multiplying with 4.

P3=P2*4;
%Multiply the current location with 4 to get the actual scale.

Another important thing you must remember is that, R location in down sampled signal will never be on the original signal at a scale of 4. Down sampling process always deviate the signal positions. Hence we need to search for the maximum value in the Original Signal in a window of +-20 samples from the reference R point obtained as P3.

Rloc=[];
for( i=1:1:length(P3))
range= [P3(i)-20:P3(i)+20]
%Search within a window of +-20 samples in the original signal with reference to up scaled R locations detected in downsampled signal.
m=max(A(range));
l=find(A(range)==m);
pos=range(l);
Rloc=[Rloc pos];
end
Ramp=A(Rloc);

It is clear now that Ramp and Rloc represents the R peak amplitude and location at the original scale. Let us see the marking of the same in the waveform.

Detecting Other Peaks With Reference to R-Peaks

From R-Peak Traverse Forth and Back and Search for Minima and Maxima, these are P,Q,T,S peaks respectively. So loop in Rloc and search for the other peaks.

Firstly, If you observe the waveform, it will be very clear that from R location if you select a window of Rloc-100 to Rloc-50 and find the maximum, than that maxima is P peak.

Once All the peaks are correctly detected, you can find the Onset and Offset as points of zero crossing for each lead. ST Segment can be calculated from S-Offset and T-Onset.

Points of Interest

You might be interested to learn the feature extraction technique from 12-Lead data using PCA from this Link. 12-Lead ECG Feature extraction with PCA using Matlab. You could also consider cleaning the ECG signal before processing using Symlet or any other filtering technique.

Step 2-1: Obtain feature Fi using the following algorithm:
If eR
¼ 1, then feature Fi (see Fig. 3) is obtained
for 1 6 i 6 9, 1 6 k 6 5, 1 6 j 6 5, k – j, where eR
¼
Ri;k \ Ri;j, and the index i, j (or k) is defined as the same
as that of Ri,j. Remark 2-1: Let eR
¼ Ri;k \ Ri;j ¼ 1 if the
two feature value ranges Ri,k and Ri,j do not overlap.
Thus, feature Fi is obtained to discriminate heartbeats
case-k and case-j, and NFi is increased by 1, where NFi
denotes the total number of cases in which feature
Fi can discriminate two disjoint heartbeats case-k
and case-j. For instance, if R3,1 = [33,79] ms, R3,2 = [86,
153] ms (see Fig. 4c), then eR
¼ R3;1 \ R3;2 ¼ 1, meaning
that feature F3 (‘‘QRS-dur”) can discriminate heartbeats
case-1 (NORM case) and case-2 (LBBB case). Similarly,
let eR
¼ Ri;k \ Ri;j ¼ 0 if Ri,k and Ri,j overlap, that is, feature
Fi is not obtained and heartbeats case-k and case-j cannot
be discriminated. In this case, the value of NFi
remains unchanged. For instance, if R1,1 = [0.695,
2.69] mV, R1,2 = [0.205,2.06] mV (see Fig. 4a), then
eR
¼ R1;1 \ R1;2 ¼ 0, meaning that feature F1 (‘‘H-QR”)
cannot discriminate heartbeats case-1 (NORM case)
and case-2 (LBBB case).
Step 2-2: Sort NFi, i = 1,2,. . ., 9, with the order of
decreasing values, and then select the index with the
highest value, that is,
i ¼ argðmaxfNFig; i ¼ 1; 2; . . . ; 9Þ: ð3Þ
For instance, suppose that NF1 = 6, NF2 = 1, NF3 = 3, the
sorted sequence becomes NF1, NF3 and NF2. Thus, the
index with the biggest value is NF1 and the sequence
of sub-indexes NFi with the order of decreasing values
of NFi is 1, 3, and 2.
Step 2-3: Obtain qualitative feature Fi. A feature Fi is
selected as a qualitative feature if it satisfies both the
following conditions:
Condition 1: Feature Fi can discriminate heartbeats
case-k and case-j, where sub-indexes i are obtained
from Step 2-2.
Condition 2: The qualitative feature for discriminate
between heartbeats case-k and case-j is not found
yet, where k, j = 1,2,3,4,5, and k – j.
If the feature Fi is selected as qualitative feature, then
both heartbeat cases k and j are recorded in data items
for the feature Fi and OUT Fi (that is, Fi is a qualitative
feature). If the feature Fi cannot be selected as qualitative
feature, then go to Step 2-4. Fig. 5 shows the flowchart
of Step 2-3. Remark 2-3: Index k and
j = 1, 2, 3, 4,5 denote the heartbeat cases of ‘‘NORM”,
‘‘LBBB”, ‘‘RBBB”, ‘‘VPC” and ‘‘APC”, respectively.
Step 2-4: Obtain the next qualitative feature.If the
qualitative features obtained from Step 2-3 are
enough to discriminate each heartbeat case, then
go to Step 2-5, otherwise go to Step 2-3.
Step 2-5: End.Four qualitative features QRS-dur,
QTP-int, Ratio-RR, and Area-R0ST0 were selected after
performing the above procedure QFS. Each heartbeat
case had its own range of values for each qualitative
feature, as shown in Fig. 4c–e and i. Thus, these specific
value ranges can be adopted to determine
whether the patient has the case of cardiac arrhythmia
by the LDA method.

I think Question and Answer forum would be a better place to ask this question as you are seeking help in implementing an algorithm. Looks like it is part of a paper. It is a good option to contact the author of a paper.

Hi
"One thing to remember is in 500Hz sampled signal No to R-Location will be found below 350 samples. In 4th Level decomposition order this value is around 20" & "Firstly, If you observe the waveform, it will be very clear that from R location if you select a window of Rloc-100 to Rloc-50 and find the maximum, than that maxima is P peak"
these numbers are Initial data ????
if no,can say me how are obtained????

It's a normal heart beat sample. For normal human beings it is around 70 BPM, i.e. 1 beat par second.So if you are taking 500 samples par second, likely you will get a peak at about 350th sample. Hypothesis, nothing else.

Hi,
Good day.
I am working on " ecg recognition using intelligent technique" & i encountered the problem of how to collect the ecg arrhythmia cases with .dat format, for example : I need to collect many cases of "Atrail Premature beats" from the records (100, 118, 200, 201, 202) within the "MIT-BIH arrhythmia database" but I found it difficult.
I don’t know to collect them from that database; so, could you please let me know the proper “interval length” that I must cut, and how to do so? can u send me the data set u used in ur research ?
Kind regards,
Nidhal

From top drop down box select the Appropriate dataset
From Top right corner select Export Signal as mat. You will get a downloadable link to .mat file at the bottom of the page.
Sampling rate for Most of the ECG signals in Physionet ECG is 500Hz, a recording of 1 Minute is suffice for your problem.

1min recording is suffice. ECG data will not act as pattern. You need to extract pattern from it. ST segment, T amplitude, QT interval, RR interval, T' are some of the patterns that Identifies arrhythmia.

hi Brother. Thanks for all the information, i really appreciate the help you posted. however i was confused regarding how to calculate the onset of Q wave and offset of S wave. You said i should look for zero crossing, however many ECG's available are not on zero baseline. what to do with them an dhow to calculate onset/offset in this case. i mean i will have to study each signal to decide on the baseline but how can this process be automated?

The logic behind onset of Q is:
1) Find Zero Crossing at the start of Q
2) If found, mark that as Onset.
3) If not found: for peak or Q, Traverse back over a Window and look for a point which is <=0 & Its previous Point >0 ---> This point is Onset.

4) Similarly for Offset, from S traverse towards Right side, find a point which is <0, and it's next point>=0 this is S offset.

thanks for such a swift reply. however my confusion remains the same. in many ecg signals that i have observed in MITDB do not have zero aa the baseline i.e the Q wave may not start form zero rather would start from a higher value r a lower value. how to calculate the onset of Q then?

I came across another baseline drift removal technique which states that i have to apply 200ms Median filter followed by 600ms Median filter on ECG signals from MIT-BIH. Can you please help me out as to what is meant by 200ms median filter? does it mean i need to apply median filter over different 200ms segments of ECG signals?