1. Introduction
The Auditory Brainstem Response (ABR) is the electrical activity generated along the auditory nerve and brainstem in response to sound. It is recorded non-invasively with scalp electrodes, as in EEG. ABRs can be elicited by both non-verbal and verbal stimuli [1, 2]. Verbal (speech) stimuli are preferable because they reveal the biological processes underpinning both normal auditory processing and auditory processing disorders [2].
ABRs represent the temporal and spectral features of the stimulus through different neural populations in different regions of the brainstem. ABR signals are classified into two parts: transient responses, evoked by brief non-periodic stimuli, and sustained responses, evoked by longer periodic stimuli [2, 3]. More specifically, the speech-evoked or complex Auditory Brainstem Response (cABR) preserves the pitch, formants, and timing of speech stimuli, which are crucial for understanding speech both in quiet and in the presence of background noise [1-3].
Factors such as the native language of the listener, music or speech experience, the duration of auditory training (short-term or long-term), and hearing loss can influence ABRs [2, 3]. These factors shape the morphology of ABR signals and may reflect (or induce) plasticity of the neural networks at the level of the brainstem [1]. In addition, differences in age, in the elements of complex speech stimuli, and in stimulus frequency can evoke different responses [3, 4].
Consonant-Vowel (CV) combinations have rich harmonic structures, dynamic amplitude modulations, and rapid spectrotemporal fluctuations due to changes in the filter function of the vocal tract in the source-filter model. Different types of CVs influence neural phase locking and change response latency. High-frequency stimuli are processed in the basal region of the cochlea and have short latencies, whereas low-frequency stimuli are processed in the apical region and have longer latencies [2, 4, 5].
The source of the human voice is the vibration of the vocal folds, which vibrate at the fundamental frequency (voice pitch). The filter is the function of the passages above the larynx, which shape the sound coming from the vocal folds. The output of the source-filter model can be displayed as a spectrogram. For example, different synthesized CVs such as /ba/, /da/, and /ga/ produce different spectrograms [3]. The phonation length of CVs varies from 40 to 500 ms. The duration of phonation affects pitch and other acoustic features such as harmonics, formants, and formant transitions [1, 3, 4]. The formant transitions of vowels involve the F1, F2, and F3 components, whereas F0 is the fundamental frequency, that is, the vibration rate of the vocal folds; this rate can also influence the formant transition [1, 3]. Specific components of the brainstem response reflect the acoustic features of pitch and formants separately. High frequencies are encoded by basal regions of the cochlea and low frequencies by apical regions [1].
Different processing methods have been used to analyze ABR signals and extract the main waveform elements by detecting major and minor peaks and valleys. Generally, ABR signal analysis is performed in the temporal and frequency domains [2, 3]. Features of the brainstem response to consonant-vowel stimuli are extracted with linear and non-linear automatic methods. Automatic methods can improve the quality of response assessment and peak detection, and they can make processing more efficient, for example by stopping averaging automatically so that unnecessary sweeps are not recorded [6-8].
Numerous automatic methods have been presented so far, such as the zero-crossing method [5, 9], adaptive signal enhancement [5, 10], multi-filters [5, 11], single-trial covariance analysis [5, 12], and automatic peak picking [5, 13, 14]. The most commonly reported strategy for automatic ABR analysis is the correlation coefficient between two consecutive ABR signals [5, 15]. In practice, however, subjective evaluation by audiologists remains the most common way of assessing the ABR, and this evaluation may differ from one audiologist to another [9, 14, 16, 17].
Automatic methods can reduce observer bias and increase assessment accuracy, and they promote objective identification. Although a number of objective methods have been developed for the automatic evaluation of ABRs, only a few have been implemented in commercial devices. This article aimed to provide another objective method for automatically assessing the quality of ABR signals and identifying their peaks using template waveforms. It also examines the combined use of preprocessing measures in cABRs of Persian speakers, implemented in MATLAB: the Signal-to-Noise Ratio (SNR) of the response, the correlation coefficient between the grand average signal and the response to each stimulus (/da/, /ba/, /ga/), and the correlation coefficient between the stimulus signal and each response signal.
With this method, the recording of unnecessary sweeps and the marking of spurious or non-standard peaks are avoided. In addition, the language patterns of Persian speakers can be extracted and compared with those reported for English speakers [2].

2. Methods
This section describes a simple, novel, objective technique for extracting cABR features such as amplitude, width, and latency, and for detecting peaks automatically.

Participants
A total of 27 adult students (13 female and 14 male) with a Mean±SD age of 24.34±1.95 years (age range: 22-29 years) from Tehran University of Medical Sciences participated in this study. All participants were native monolingual speakers of Persian with normal hearing and no neurological disorders. The hearing thresholds of participants were 20 dB HL or better at octave frequencies (250–8000 Hz).

Stimuli and presentation
Three diotic synthesized consonant-vowel combinations, /da/, /ba/, and /ga/, each 170 ms long and sampled at 20 kHz, were presented to each participant. The stimuli were obtained from the study of Kraus et al. [2]. During the 50-ms formant transition, formant number 1 (F1) rose linearly from 400 to 720 Hz, whereas formant number 4 (F4, 3300 Hz), formant number 5 (F5, 3750 Hz), and formant number 6 (F6, 4900 Hz) remained flat (Figure 1). The initial frication lasted 10 ms and was centered at frequencies around F4 and F5.
After the 50-ms formant transition period, formant number 2 (F2) and formant number 3 (F3) remained constant at their transition endpoint frequencies of 1240 Hz and 2500 Hz, respectively [2, 18]. The stimuli differed in the starting points of F2 and F3. For [ba], F2 and F3 rose from 900 Hz and 2400 Hz, respectively. For [da], F2 and F3 fell from 1700 Hz and 2580 Hz, respectively. For [ga], F2 and F3 fell from 3000 Hz and 3100 Hz, respectively. These synthesized stimuli had an identical and constant F0 over their whole duration [2, 18].
The details of the formant transitions of these three stimuli are presented in Table 1.
Stimuli were presented at a rate of 4.65/s in both polarities (condensation and rarefaction). They were delivered to the right ear through Etymotic ER-3 earphones (Etymotic Research, Elk Grove Village, IL) at an intensity of 83 dB SPL. A videotaped program was shown to all participants to promote their cooperation and stillness [2, 10].

Recording parameters
Evoked potentials synchronized with the auditory stimuli were recorded with a continuous g.tec EEG system. Responses were recorded from Cz to the ipsilateral earlobe, with the forehead serving as ground. The signal was band-pass filtered from 0.05 Hz to 3000 Hz and digitized at 20000 Hz. All electrodes were Ag/AgCl, and their impedance was below 5 kΩ. For each stimulus, the EEG was processed offline to create averaged signals. The EEG was divided into 230-ms epochs (from 45 ms before stimulus onset to 185 ms after it), and each epoch was band-pass filtered from 70 to 2000 Hz to isolate the brainstem response frequencies.
An artifact criterion of ±35 µV was applied to reject myogenic artifacts. The processed epochs were averaged separately for each stimulus polarity, and the two averages were then added to isolate the neural response [2, 10, 19]. The final average for each stimulus comprised 4000 to 4100 sweeps per subject [2].
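The epoching, artifact rejection, and polarity handling described above can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code used in the study; the function names are hypothetical, the rejection limit is expressed in whatever amplitude unit the recording uses, and the band-pass filtering step is omitted.

```python
FS = 20000                 # sampling rate in Hz (from the text)
PRE_MS, POST_MS = 45, 185  # epoch window around stimulus onset (from the text)
REJECT_LIMIT = 35.0        # artifact criterion, same amplitude unit as the recording

def extract_epoch(eeg, onset_idx):
    """Return the 230 ms epoch around onset_idx, or None if it does not fit."""
    start = onset_idx - PRE_MS * FS // 1000
    stop = onset_idx + POST_MS * FS // 1000
    if start < 0 or stop > len(eeg):
        return None
    return eeg[start:stop]

def is_clean(epoch, limit=REJECT_LIMIT):
    """An epoch is kept only if no sample exceeds the artifact criterion."""
    return all(abs(v) <= limit for v in epoch)

def average(epochs):
    """Sample-by-sample mean of equally long epochs."""
    n = len(epochs)
    return [sum(column) / n for column in zip(*epochs)]

def polarity_sum(eeg, onsets, polarities):
    """Average clean epochs per polarity (+1 or -1), then add the two averages."""
    groups = {+1: [], -1: []}
    for onset, polarity in zip(onsets, polarities):
        epoch = extract_epoch(eeg, onset)
        if epoch is not None and is_clean(epoch):
            groups[polarity].append(epoch)
    if not groups[+1] or not groups[-1]:
        raise ValueError("need at least one clean epoch per polarity")
    return [a + b for a, b in zip(average(groups[+1]), average(groups[-1]))]
```

With the values given in the text, each epoch holds 4600 samples (900 before onset and 3700 after it).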

Analysis
Formant transition period analysis
The formant transition is the part of the response that begins at the onset and spans 0-70 ms. Latency in this portion varies with the stimulus. To isolate the formant transition and eliminate low-frequency activity that could obscure latency differences, the response waveform was additionally high-pass filtered at 300 Hz. The first 70 ms of the grand average response was selected for temporal analysis. For the Fast Fourier Transform (FFT) analysis, the 18-58 ms portion of the averaged formant transition response was used, and the amplitude was calculated in 50-Hz-wide bins surrounding F0 and the next 10 harmonics.
The range of 400-720 Hz referring to F0 frequency was chosen, and all 10 harmonic peaks were then marked for each stimulus in MATLAB. Intraclass Correlation Coefficients (ICC) and Paired t tests were used to evaluate frequency bin differences for each stimulus. The ABR of each subject had 16 peaks in this portion. Peaks 1 and 2 formed the starting point and were called the onset response. Peaks 3, 4, 6, 7, 9, 10, 12, and 13 were the major peaks, and peaks 5, 8, 11, and 14 were the minor peaks. Peaks 15 and 16 were the end point of the transient response, where the acoustic properties of the three stimuli were identical.

The grand average signal marked by the manual method is shown in Figure 2. The grand average signal for each stimulus, marked using the reference lookup table, is shown in Figures 2.A, 2.B, and 2.C. According to the visual analysis of an audiologist, the response onset appeared at about 9 ms, and major peaks of activity recurred approximately every 10 ms from around 23 ms. Based on this observation, we computed the grand average of the response to each of the three stimuli and divided it into 7 epochs of 10 ms each. The first epoch contained 2 peaks (1 and 2), which were called the starting point of the response. The last epoch also contained 2 peaks (15 and 16), called the offset. Each of the other epochs contained 3 peaks (a positive major peak, a negative major peak, and a minor peak). After the positive and negative major peaks were estimated, the maximum and minimum amplitude of each signal were evaluated by calculating f(c). In this phase, we set time limits for identifying the direction of curvature and detecting the peaks in each epoch, according to Equation (1).
(1)
Two audiology experts specializing in speech ABR marked the grand average signal visually, which yielded the reference lookup table. Latencies and amplitudes were measured manually by the two audiologists and automatically by the proposed technique. Using the information in the reference lookup table, a time window was set for each response and each epoch. We then computed the cross-correlation between the grand average response and the response signal of each subject. In this step, each response signal was shifted to align with the grand average signal, producing the template signal.
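The alignment step can be illustrated with a short sketch. It assumes a plain, unnormalized cross-correlation searched over a limited range of lags; the function names and the `max_lag` parameter are hypothetical and are not taken from the study's MATLAB implementation.

```python
def best_lag(reference, signal, max_lag):
    """Return the lag (in samples) at which signal best matches reference.

    A positive lag means the signal is delayed relative to the reference.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(signal):
                score += ref_value * signal[j]
        if score > best_score:
            best, best_score = lag, score
    return best

def shift(signal, lag):
    """Advance the signal by lag samples, padding with zeros at the edges."""
    n = len(signal)
    return [signal[i + lag] if 0 <= i + lag < n else 0.0 for i in range(n)]

# Example: a subject response that trails the grand average by 2 samples.
grand_average = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
subject = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
lag = best_lag(grand_average, subject, max_lag=4)
aligned = shift(subject, lag)
```

At the 20000 Hz sampling rate used here, a lag of one sample corresponds to 0.05 ms.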
The correlation coefficient between the template signal and the stimulus signal was then used to minimize the effect of artifacts. Finally, using the time windows described earlier and the pre-processing technique, all response signals of each subject were marked automatically. The latency and amplitude values were tabulated in a separate lookup table for each stimulus and each subject.
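For reference, the correlation coefficient between two equally long signals can be computed as in the sketch below, which assumes the Pearson product-moment coefficient. The text does not state how the coefficient was thresholded or otherwise used to suppress artifacts, so only the coefficient itself is shown; the function name is hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

A value near 1 indicates that the two waveforms have nearly the same shape, regardless of their overall amplitude or offset.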
To allow for jitter in the brainstem response, the automatic method permitted individual variance when marking peaks: the tolerance in each epoch was ±2 ms. ICC analyses were performed on the non-normalized latencies of 4 groups of peaks: onset peaks 1 and 2; major peaks 3, 4, 6, 7, 9, 10, 12, and 13; minor peaks 5, 8, 11, and 14; and end-point peaks 15 and 16. A 3 × K repeated measures ICC (where 3 is the number of stimulus conditions and K is the number of peaks) was conducted on each group. For groups of peaks in which the stimulus × peak interaction was significant, repeated measures ICC and Paired t tests were performed to examine differences in latency between stimuli.
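The windowed peak marking with a ±2 ms tolerance can be sketched as follows. This is an illustration under stated assumptions, not the study's algorithm: the expected latency stands in for an entry of the reference lookup table, the epoch is assumed to start 45 ms before stimulus onset, and the peak is taken as the largest (or, for negative peaks, the smallest) sample inside the window.

```python
FS = 20000   # sampling rate in Hz (from the text)
PRE_MS = 45  # pre-stimulus part of each epoch in ms (from the text)

def pick_peak(response, expected_ms, polarity=+1, tol_ms=2.0, fs=FS, pre_ms=PRE_MS):
    """Return (latency_ms, amplitude) of the extreme sample near expected_ms.

    polarity=+1 searches for a positive peak, polarity=-1 for a negative one.
    Latency is measured from stimulus onset.
    """
    first = round((expected_ms - tol_ms + pre_ms) * fs / 1000)
    last = round((expected_ms + tol_ms + pre_ms) * fs / 1000)
    first, last = max(first, 0), min(last, len(response) - 1)
    idx = max(range(first, last + 1), key=lambda i: polarity * response[i])
    return idx * 1000 / fs - pre_ms, response[idx]
```

Restricting the search to the tolerance window keeps a larger deflection elsewhere in the epoch from being mistaken for the expected peak.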

Frequency domain measures
A Fast Fourier Transform (FFT) analysis was used to evaluate the response in the spectral domain. We selected the 18-58 ms portion of the formant transition response and a frequency range of 2000 Hz. The average response amplitude was calculated in 50-Hz-wide bins surrounding F0 and the next 10 harmonics. The range of 400-720 Hz referring to F0 frequency was chosen, and all 10 harmonic peaks were then marked for each stimulus in MATLAB. Repeated measures ICC and Paired t tests were used to evaluate the significance of differences in the frequency bins for each stimulus.
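The bin-amplitude measure can be sketched as below. To stay self-contained, the sketch evaluates a direct discrete Fourier sum at selected frequencies instead of calling an FFT routine; the 5 Hz step inside each bin, the function names, and the F0 value passed by the caller are assumptions made for illustration.

```python
import cmath
import math

def dft_amplitude(signal, freq_hz, fs):
    """Amplitude of the component at freq_hz, from a direct Fourier sum."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs)
              for k, x in enumerate(signal))
    return 2 * abs(acc) / n

def bin_amplitude(signal, center_hz, fs, width_hz=50.0, step_hz=5.0):
    """Mean amplitude over a bin of width_hz centered on center_hz."""
    steps = int(width_hz / step_hz)
    freqs = [center_hz - width_hz / 2 + i * step_hz for i in range(steps + 1)]
    return sum(dft_amplitude(signal, f, fs) for f in freqs) / len(freqs)

def harmonic_bins(signal, f0_hz, fs, n_harmonics=10):
    """Bin amplitudes at F0 (index 0) and at the next n_harmonics harmonics."""
    return [bin_amplitude(signal, f0_hz * h, fs)
            for h in range(1, n_harmonics + 2)]
```

For the 18-58 ms segment sampled at 20000 Hz, the input would hold 800 samples, which gives a nominal frequency resolution of 25 Hz.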

3. Results
We compared the automatic and manual methods and then compared individual differences between Persian speakers and English speakers. The automatic algorithm correctly detected the location of each peak in the ABR signals. The performance of this objective method was estimated at 90% over all peaks; when peak 16 of the /da/ response was excluded, the performance rate was 95%. Further details are presented in Tables 2, 3, and 4. The mean and standard deviation of the non-normalized latencies of the 16 peaks picked for each stimulus condition by the automatic and manual methods are given in Tables 5, 6, and 7. The response waveform of the first 70 ms of the transition portion was used to compare the objective and subjective methods in Persian speakers. Figures 3, 3a, and 3b illustrate the automatic method, and Figures 4, 4a, and 4b illustrate the manual method.
For the end-point peaks, the within-subject main effect of stimulus was significant (F2,52=6.888, P=0.002), whereas the stimulus × peak interaction was non-significant (F2,53=0.863, P=0.428). English speakers showed the same significant within-subject main effect of stimulus. For onset peaks 1 and 2, neither the within-subject main effect of stimulus (F1.45,37.54=2.144, P=0.147) nor the stimulus × peak interaction (F2,52=2.339, P=0.107) was significant. The same result was obtained for English speakers. In addition, minor peaks showed greater between-stimulus latency differences than major peaks, and their latencies were longer. The same is true for English speakers.

Frequency domain
For the frequency-domain analysis of the transient portion, the 18-58 ms range of the transition part was chosen and the grand averaged signal was plotted. Both significant and non-significant differences were seen in this range for each stimulus. Figure 5 shows the grand averaged signal and the next 10 harmonics marked by the manual method [3]. Figures 5, 5a, and 5b show peak detection in the waveform of the 10 harmonics for each stimulus by the automatic method. Neither the stimulus × peak interaction nor the within-subject main effect of stimulus was significant for any bin (all P values were greater than 0.05).
A: Schematic of automatic /ba/ grand average; B: Schematic of automatic /ga/ grand average; C: Schematic of automatic /da/ grand average
The results were the same for English speakers, but the results of the follow-up Paired t tests differed between the two groups for each harmonic. Follow-up Paired t tests were performed on the harmonics of Persian speakers to assess between-stimulus differences. The results are displayed in Table 8. In harmonics 1, 3, 6, 9, and 10, no pair of stimuli differed significantly. In harmonic 2, only /da/ and /ba/ differed significantly. In harmonic 4, two pairs of stimuli, /da/ versus /ga/ and /ba/ versus /ga/, differed significantly. In harmonic 5, all pairs of stimuli differed significantly. In harmonic 7, only /ba/ and /ga/ differed significantly, and in harmonic 8, two pairs of stimuli, /ba/ versus /ga/ and /da/ versus /ga/, differed significantly.
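The statistic behind such a follow-up comparison can be sketched as follows: a paired t statistic computed from per-subject differences between two stimulus conditions. The function name and the example values are hypothetical and are not data from this study; converting t to a P value requires the t distribution with n - 1 degrees of freedom and is omitted here.

```python
import math

def paired_t(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("t is undefined when all differences are equal")
    return mean / math.sqrt(variance / n), n - 1

# Hypothetical bin amplitudes of four subjects for two stimuli.
t_value, dof = paired_t([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 2.0])
```

In the study design, each pair would hold the amplitudes of one harmonic bin for the same subject under two stimuli, for example /ba/ and /ga/.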

4. Discussion
The purpose of this study was to describe a new, simple automatic peak detection method for extracting the features of cABR. A cross-correlation between the grand average signal of all subjects' responses and each subject's response signal was used to shift the signal of each subject onto the grand average signal and produce a template signal. Using time windows, this automatic evaluation method was compared with a subjective evaluation by two experts in audiology and electrophysiology. The results indicated that the automatic method reached an average correlation of 90% with the visual assessment. This supports our first hypothesis, that the automatic peak detection algorithm performs similarly to visual analysis.
The figures show a notable bias among experts in the subjective method, which means that visual judgment is not fully reliable [15, 16, 18]. The comparison of the subjective and objective methods showed that automatic methods are consistent, widely applicable, and useful, and that they eliminate human bias. They also place no limit on the amount of data [11]. Therefore, the automatic algorithm can extract latencies with high accuracy, to within fractions of a millisecond.

The second purpose of this study was to compare the automatic and manual methods. Both methods showed differences among the brainstem responses to the synthesized voiced stop consonants /ga/, /da/, and /ba/, whose frequency transitions differ only in F2 and F3. Because the F2 and F3 ranges lie above the phase-locking limit of the brainstem, these frequency differences appear as latency differences among the responses. Responses to /ga/, which contains the highest F2 and F3 frequencies, would therefore have the earliest latencies; responses to /ba/, which has the lowest F2 and F3 frequencies, would have the latest latencies; and responses to [da] would have intermediate latencies [2].
Skoe et al. used fast Fourier analysis and noted that the F2 and F3 frequency ranges are higher than the phase-locking limit of the brainstem response; consequently, frequency differences are expressed as latency differences in the responses. The harmonics figure showed that the neural encoding of /da/ and /ga/ was the most similar and that of /ba/ and /ga/ the most dissimilar. This indicates that the formant frequencies of each stimulus determine the similarity or dissimilarity of the auditory responses [2]. These results are also consistent with our third hypothesis, that the automatic algorithm could be used as a co-observer in clinics. The onset response latencies for /ba/, /da/, and /ga/ were later in Persian speakers than in English speakers. We hypothesized that these latency differences between Persian and English speakers are related to differences in age, language training, and brainstem plasticity between the two groups.

Language experience
Language experience plays a critical role in the development of neural encoding in the auditory system at both cortical and subcortical levels [20]. Evidence has shown that when people listen to stimuli from their native language, the F0 representation in their brainstem response is larger than that of non-native speakers [20-22].
As a result of early learning, the structure and functional properties of neural circuits support high skill in detecting and predicting the native language [20]. The neural representation of pitch reveals that language experience can affect action potential behavior and sound processing in the brainstem and cortex [22, 23]. Cellular adaptations lead to plasticity in the brainstem and cortex. These adaptations include large somatic synapses, a fast release time course, and fast AMPA receptor kinetics; they produce brief synaptic responses that promote minimal temporal summation, balanced signaling, short-latency spikes, and a short refractory period [23, 24].

Age
Maturation influences transmission time. The peripheral auditory pathway matures during the first 2 months of life, whereas the central transmission time shortens up to the age of 5 to 8 years. The III-II and V-IV interpeak latencies (IPLs) showed maturational changes like those of the V-I IPLs; interestingly, the II-I and IV-III IPLs showed little change. A clear increase in the amplitude of peak V up to the age of 4 years, followed by a decreasing tendency, was observed. In this study, Persian speakers were between 22 and 28 years old, and English speakers were between 8 and 12 years old. This age difference leads to differences in the morphology of the V-I peaks and the amplitude of the response [25, 26]. The results indicated that separate latency and amplitude norms for English speakers and Persian speakers are worthwhile for cABR measures [25].
The results of this study suggest that involving more audiologists who specialize in speech ABR could increase the precision of the grand average in the visual method and facilitate comparison of the methods. Reducing the recording time could decrease artifactual noise and mental fatigue. Future studies could use a Linear Discriminant Analysis (LDA) classifier to analyze or extract the effect of mental fatigue on latency.

Gender
Gender is a physiological factor that can affect brainstem auditory evoked potential responses and their latencies. Head size and Body Mass Index (BMI) differ across ages and genders; these differences could show up in the latencies of peaks I-III [26].

5. Conclusion
The automatic algorithm could detect all 16 peaks in brainstem response signals and extract latencies with high accuracy. The algorithm is free of visual bias. Time windows that allow for individual variance provide higher precision in calculating latency. The greatest disagreement between the experts and the automatic algorithm was at peaks 15 and 16, because these peaks form the offset and endpoint of the signal and are most often obscured by artifacts.

Ethical Considerations
Compliance with ethical guidelines
Prior to the study, all participants signed a written informed consent form approved by the Ethics Committee of Tehran University of Medical Sciences. They were rewarded for their participation.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflict of interest
The authors declared no conflict of interest related to this research. The authors certify that they have no affiliation with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

Acknowledgments
This study was part of the master's thesis of Negar Amirian, supported by Islamic Azad University, Central Tehran Branch. The authors gratefully acknowledge Dr. Bram Van Dun for his unforgettable support during this study and for marking the peaks of the brainstem response waves, Dr. Zahra Shirjiyan for her kind assistance, and Dr. Mohsin Reza Heydari, Dr. Ali Akbar Tahaei, and Dr. Mohsen Ahadi for their guidance regarding neuroscience and the auditory system during this study.

3. Blumstein SE, Isaacs E, Mertus J. The role of the gross spectral shape as a perceptual cue to place of articulation in initial stop consonants. The Journal of the Acoustical Society of America. 1982; 72(1):43-50. [DOI:10.1121/1.388023] [PMID]