etd@IISc Community: http://hdl.handle.net/2005/1

URI: http://hdl.handle.net/2005/1030
Title: A Workload Based Lookup Table For Minimal Power Operation Under Supply And Body Bias Control
Authors: Sreejith, K
Abstract: Dynamic Voltage Scaling (DVS) and Adaptive Body Bias (ABB) techniques reduce the dynamic and static power components of an integrated circuit, respectively. Ideally, the two techniques can be combined to find the optimal operating voltages (VDD and VBB) that minimize power consumption. A combination of DVS and ABB may require the circuit to operate at supply and body bias voltages different from the values specified by the two methods working independently. Moreover, the VDD and VBB values for minimal power consumption vary with the workload of the circuit, so the workload can be used as an index to select the optimal VDD/VBB pair. This paper examines the optimal voltages for minimal power operation for typical data path circuits, such as adders and multiply-accumulate (MAC) units, across various process, voltage, and temperature conditions and under different workloads. In addition, a workload-based lookup table to minimize the power consumption is proposed. Simulation results for an adder and a multiply-accumulate circuit block indicate a power saving of 12-30% over a standard DVS scheme.
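[Editorial sketch] A minimal illustration of how such a workload-indexed table might be consulted at run time. The workload bins, voltage pairs, and names below are illustrative placeholders, not values from the thesis:

```python
# Sketch of a workload-indexed lookup table for (VDD, VBB) selection.
# Bin boundaries and voltage pairs are illustrative placeholders only.

import bisect

# Upper edges of workload bins (e.g., a normalized activity factor).
WORKLOAD_BINS = [0.25, 0.50, 0.75, 1.00]

# One (VDD, VBB) pair per bin, chosen offline to minimize total power.
VOLTAGE_TABLE = [
    (0.8, -0.4),   # light workload: low VDD, strong reverse body bias
    (0.9, -0.2),
    (1.0,  0.0),
    (1.1,  0.2),   # heavy workload: high VDD, forward body bias
]

def lookup_operating_point(workload: float) -> tuple[float, float]:
    """Return the (VDD, VBB) pair for the bin containing `workload`."""
    w = min(max(workload, 0.0), 1.0)
    idx = bisect.bisect_left(WORKLOAD_BINS, w)
    return VOLTAGE_TABLE[min(idx, len(VOLTAGE_TABLE) - 1)]

print(lookup_operating_point(0.3))  # -> (0.9, -0.2)
```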
URI: http://hdl.handle.net/2005/55
Title: Why only two ears? Some indicators from the study of source separation using two sensors
Authors: Joseph, Joby
Abstract: In this thesis we develop algorithms for estimating broadband source signals from a mixture using only two sensors. This is motivated by what is known in the literature as the cocktail party effect: the ability of human beings to listen to a desired source in a mixture of sources using at most two ears. Such a study lets us achieve a better understanding of the auditory pathway in the brain, confirm results from physiology and psychoacoustics, look for an equivalent structure in the brain corresponding to any modification that improves the algorithm, build a benchmark system to automate the evaluation of systems like 'surround sound', and perform speech recognition in noisy environments. Moreover, what we learn about replicating the functional units in the brain may help us replace defective units with signal processing hardware for patients suffering from defects in them.
There are two parts to the thesis. In the first part we assume the source signals to be broadband with strong spectral overlap, and the channel to have a few strong multipaths. We propose an algorithm to estimate all the strong multipaths from each source to the sensors, for more than two sources, from measurements at only two sensors. Because the channel matrix is not invertible when the number of sources exceeds the number of sensors, we use the estimated multipath delays for each source to improve the signal-to-interference ratio (SIR) of the sources. In the second part we look at a specific scenario of colored signals and a channel with a prominent direct path. Speech sources in a weakly reverberant room recorded with a pair of microphones satisfy these conditions. We consider the cases with and without a head-like structure between the microphones; the head-like structure we used was a cubical block of wood. We propose an algorithm for separating sources under this scenario, and we identify the features of speech and of the channel that make it possible for the human auditory system to solve the cocktail party problem; these are the same properties satisfied by our model. The algorithm works well in a partly acoustically treated room (with three persons speaking, two microphones, and data acquired using a standard PC setup), but not as well in a heavily reverberant scenario.
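[Editorial sketch] The abstract does not specify the delay estimator; a common two-sensor choice for estimating a dominant-path delay is generalized cross-correlation with phase transform (GCC-PHAT). A minimal sketch under the assumption of a single dominant path, not necessarily the estimator used in the thesis:

```python
# Sketch: estimating the inter-sensor delay of a dominant path with
# GCC-PHAT. A standard technique, not necessarily the thesis's estimator.

import numpy as np

def gcc_phat_delay(x: np.ndarray, y: np.ndarray, fs: float) -> float:
    """Estimate the delay (seconds) of y relative to x for the dominant path."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = np.conj(X) * Y
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return float(np.argmax(np.abs(cc)) - max_shift) / fs

# Toy usage: a 5-sample delay at fs = 8 kHz.
fs = 8000.0
s = np.random.default_rng(0).standard_normal(4096)
print(gcc_phat_delay(s, np.roll(s, 5), fs))  # ~ +5/8000 s
```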
We see that there are similarities between the processing steps of the algorithm and what we know of the way our auditory system works, especially in the regions of the auditory pathway before the auditory cortex. Based on the above experiments, we give reasons to support the hypothesis that all known organisms need only two ears, while more than two eyes can be used to advantage. Our results also indicate that part of pitch estimation for individual sources might occur in the brain after the individual source components have been separated; this would resolve the dilemma of having to perform multi-pitch estimation. Recent works suggest that there are parallel pathways in the brain, up to the primary auditory cortex, dealing with temporal-cue-based and spatial-cue-based processing. Our model seems to mimic the pathway that makes use of the spatial cues.
URI: http://hdl.handle.net/2005/2569
Title: Who Spoke What And Where? A Latent Variable Framework For Acoustic Scene Analysis
Authors: Sundar, Harshavardhan
Abstract: Speech is by far the most natural form of communication between human beings. It is intuitive, expressive, and contains information at several cognitive levels. We, as humans, are perceptive to several of these levels: we can gather information pertaining to the identity of the speaker, the speaker's gender, emotion, location, and language, in addition to the content of what is being spoken. This makes speech-based human machine interaction (HMI) both desirable and challenging, for the same set of reasons. For HMI to be natural for humans, it is imperative that a machine understand the information present in speech, at least at the level of speaker identity, language, location in space, and the summary of what is being spoken.
Although one can draw parallels between human-human interaction and HMI, the two differ in their purpose. We, as humans, interact with a machine mostly to get a task done more efficiently than is possible without it. Thus, in HMI, controlling the machine in a specific manner is typically the primary goal. In this context, it can be argued that HMI with a limited vocabulary of specific commands suffices for more efficient use of the machine.
In this thesis, we address the problem of "Who spoke what and where?", in the context of a machine understanding the information pertaining to the identities of the speakers, their locations in space, and the keywords they spoke, thus considering three levels of information: speaker identity (who), location (where), and keywords (what). This could be addressed with the help of multiple sensors such as microphones, video cameras, proximity sensors, and motion detectors, combining all these modalities; however, we explore the use of microphones alone. In practical scenarios, there are often times when multiple people talk simultaneously. Thus, the goal of this thesis is to detect all the speakers, their keywords, and their locations in mixture signals containing speech from simultaneous speakers. Addressing this problem using only microphone signals forms a part of acoustic scene analysis (ASA) of speech-based acoustic events.
We divide the problem of "Who spoke what and where?" into two sub-problems: "Who spoke what?" and "Who spoke where?". Each of these is cast in a generic latent variable (LV) framework to capture information in speech at different levels. We associate an LV with each level and model the relationship between the levels using conditional dependency.
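[Editorial note] Concretely, this dependency structure amounts to a factorization of the mixture likelihood over the latent levels, as the mass functions named in the next paragraph suggest; in illustrative notation not taken from the thesis (s: speaker identity, k: keyword, x: acoustic observation):

```latex
% Illustrative factorization over latent levels; the notation is ours.
p(x) = \sum_{s} \sum_{k} P(s)\, P(k \mid s)\, p(x \mid s, k)
```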
The sub-problem of "Who spoke what?" is addressed using a single-channel microphone signal, by modeling the mixture signal in terms of the LV mass function of speaker identity, the conditional mass function of the keyword given the speaker identity, and a speaker-specific keyword model. The LV mass functions are estimated in a maximum likelihood (ML) framework via the Expectation-Maximization (EM) algorithm, with Student's-t mixture models (tMMs) as the speaker-specific keyword models. Motivated by HMI in a home environment, we created our own database. On mixture signals containing two speakers uttering keywords simultaneously, the proposed framework achieves an accuracy of 82% in detecting both the speakers and their respective keywords.
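[Editorial sketch] A minimal illustration of EM re-estimation of the LV mass functions described above, assuming the per-frame likelihoods p(x_t | s, k) have already been computed from pre-trained speaker-specific keyword models (tMMs in the thesis). The placeholder data and iteration count are illustrative assumptions:

```python
# Sketch: EM re-estimation of P(s) and P(k|s) from per-frame likelihoods.
# The likelihood tensor is a random placeholder, not real model output.

import numpy as np

def em_lv_mass_functions(lik, n_iter=50):
    """lik: array [T, S, K] of p(x_t | s, k). Returns (P_s, P_k_given_s)."""
    T, S, K = lik.shape
    P_s = np.full(S, 1.0 / S)                 # P(s), uniform init
    P_ks = np.full((S, K), 1.0 / K)           # P(k | s), uniform init
    for _ in range(n_iter):
        # E-step: posterior responsibility of each (s, k) per frame.
        joint = lik * P_s[None, :, None] * P_ks[None, :, :]
        gamma = joint / joint.sum(axis=(1, 2), keepdims=True)
        # M-step: re-normalize expected counts.
        counts = gamma.sum(axis=0)            # [S, K]
        P_s = counts.sum(axis=1) / T
        P_ks = counts / counts.sum(axis=1, keepdims=True)
    return P_s, P_ks

lik = np.random.default_rng(1).random((200, 4, 5))   # placeholder
P_s, P_ks = em_lv_mass_functions(lik)
print(P_s.sum(), P_ks.sum(axis=1))                   # each sums to 1
```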
The other sub-problem, "Who spoke where?", is addressed in two stages. In the first stage, the enclosure is discretized into sectors. The speakers and the sectors in which they are located are detected in an approach similar to that for "Who spoke what?", using signals collected from a Uniform Circular Array (UCA); however, in place of speaker-specific keyword models, we use tMM-based speaker models trained on clean speech, along with a simple Delay and Sum Beamformer (DSB). In the second stage, the speakers are localized within the active sectors using a novel region-constrained localization technique based on time difference of arrival (TDOA). Since the problem is a multi-label classification task, we use the average Hamming score (accuracy) as the performance metric. Although the proposed approach yields an accuracy of 100% in an anechoic setting for detecting both the speakers and their corresponding sectors in two-speaker mixture signals, the performance degrades to 67% in a reverberant setting with a 60 dB reverberation time (RT60) of 300 ms. To improve the performance under reverberation, prior knowledge of the locations of multiple sources is derived using a novel technique based on geometrical insights into TDOA estimation. With this prior knowledge, the accuracy of the proposed approach improves to 91%. It is worthwhile to note that these accuracies are computed on mixture signals containing more than 90% overlap between the competing speakers.
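[Editorial sketch] A minimal illustration of a sector scan with a delay-and-sum beamformer over a UCA, of the kind the first stage builds on. The array geometry, sector count, and sampling rate are illustrative assumptions, not the thesis configuration:

```python
# Sketch: DSB energy scan over discretized sectors with a UCA.
# All constants below are illustrative placeholders.

import numpy as np

C = 343.0          # speed of sound, m/s
FS = 16000.0       # sampling rate, Hz
RADIUS = 0.05      # UCA radius, m
N_MICS = 8
N_SECTORS = 12

mic_angles = 2 * np.pi * np.arange(N_MICS) / N_MICS

def sector_energies(frames: np.ndarray) -> np.ndarray:
    """frames: [N_MICS, T]. Return DSB output energy per candidate sector."""
    T = frames.shape[1]
    spectra = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(T, d=1.0 / FS)
    energies = np.empty(N_SECTORS)
    for j in range(N_SECTORS):
        theta = 2 * np.pi * j / N_SECTORS
        # Far-field arrival-time offset of each mic for azimuth theta.
        delays = -RADIUS * np.cos(theta - mic_angles) / C
        # Compensating phase shifts align the channels before averaging.
        steer = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        y = (spectra * steer).mean(axis=0)
        energies[j] = np.sum(np.abs(y) ** 2)
    return energies

frames = np.random.default_rng(2).standard_normal((N_MICS, 1024))
print(int(np.argmax(sector_energies(frames))))  # loudest sector index
```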
The proposed LV framework offers a convenient methodology for representing information at broad levels. In this thesis, we have shown its use with three such levels; it can be extended to several more, making it applicable to a generic analysis of an acoustic scene consisting of broad levels of events. Not all levels are dependent on each other, so the LV dependencies can be reduced through independence assumptions, which leads to several smaller sub-problems, as shown above. The LV framework is also attractive for incorporating prior knowledge about the acoustic setting, which is combined with the evidence from the data to infer the presence of an acoustic event. The performance of the framework depends on the choice of the stochastic models for the likelihood of the data given the presence of acoustic events; the framework, however, provides a means to compare and contrast different stochastic models for representing this likelihood.
URI: http://hdl.handle.net/2005/2551
Title: Weighted Average Based Clock Synchronization Protocols For Wireless Sensor Networks
Authors: Swain, Amulya Ratna
Abstract: Wireless Sensor Networks (WSNs) consist of a large number of resource-constrained sensor nodes equipped with various sensing devices that can monitor events in the real world. Applications such as environmental monitoring, target tracking, and forest fire detection require clock synchronization among the sensor nodes with a certain accuracy. A major constraint in the design of clock synchronization protocols for WSNs, however, is that sensor nodes have limited energy and computing resources. Clock synchronization in WSNs is carried out at each sensor node either synchronously, i.e., periodically during the same real-time interval, which we call the synchronization phase, or asynchronously, i.e., independently of what other nodes are doing for clock synchronization. A disadvantage of asynchronous protocols is that they require the sensor nodes to remain awake all the time; they therefore cannot be integrated with any sleep-wakeup scheduling scheme, a major technique for reducing energy consumption in WSNs. Synchronous protocols, on the other hand, can easily be integrated with a synchronous sleep-wakeup scheduling scheme and, at the same time, can support sleep-wakeup scheduling of the sensor nodes.

Essentially, there are two ways to synchronize the clocks of a WSN: internal clock synchronization and external clock synchronization. The existing approaches to internal clock synchronization in WSNs are mostly hop-by-hop in nature, which is difficult to maintain, and there are many application scenarios in which external clock synchronization is the only option. It is also desirable that the internal clock synchronization protocol be tolerant to message loss and node failures, and that, when the external source or reference node fails, the external protocol revert to an internal protocol with or without a reference node. Towards this goal, we first propose three fully distributed synchronous clock synchronization protocols based on a peer-to-peer approach: the Energy Efficient and Fault-tolerant Clock Synchronization (EFCS) protocol, the Weighted Average Based Internal Clock Synchronization (WICS) protocol, and the Weighted Average Based External Clock Synchronization (WECS) protocol. These three protocols are dynamically interchangeable depending on the availability of an external source or reference nodes. To keep the synchronization error consistent in the long run, neighboring nodes need to be synchronized with each other at about the same real time, which requires that their synchronization phases always overlap. To realize this, we propose a novel pullback technique that ensures the synchronization phases of neighboring nodes always overlap. To further improve the synchronization accuracy of the EFCS, WICS, and WECS protocols, we propose a generic technique that can be applied to any of them; the improved protocols are referred to as IEFCS, IWICS, and IWECS, respectively. We then argue that the synchronization error in the improved protocols is much smaller than in the original protocols.
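[Editorial sketch] A minimal illustration of one weighted-average correction step of the kind these protocols build on. The weighting rule here is illustrative, not the exact WICS/WECS update:

```python
# Sketch: one synchronization phase of a weighted-average clock update.
# A node moves its clock toward the weighted mean of the readings it
# hears from its neighbors; weights are illustrative placeholders.

def weighted_average_update(own_clock: float,
                            neighbor_clocks: list[float],
                            own_weight: float = 1.0) -> float:
    """Return the corrected clock after one synchronization phase."""
    total = own_weight * own_clock + sum(neighbor_clocks)
    count = own_weight + len(neighbor_clocks)
    return total / count

# Toy usage: a fast node pulled toward its neighborhood average.
print(weighted_average_update(100.7, [100.1, 100.2, 99.9]))  # 100.225
```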
We have analyzed these protocols for bounds on the synchronization error and shown that the error is always upper bounded. We have evaluated the performance of the protocols through simulation and experimental studies, and shown that the synchronization accuracy they achieve is of the order of a few clock ticks even in very large networks. The proposed protocols use the estimated drift rate to provide a logical time from the physical clock value at any instant, while ensuring the monotonicity of logical time even though the physical clock is updated at the end of each synchronization phase. We have also proposed an energy-aware routing protocol with sleep scheduling, which can be integrated with the proposed clock synchronization protocols to further reduce energy consumption in WSNs.
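[Editorial sketch] A minimal illustration of deriving a monotonic logical time from a drift-compensated physical clock, as described above. The re-anchoring rule is an illustrative assumption, not the thesis's exact formula:

```python
# Sketch: monotonic logical time over a physical clock that may be set
# backwards at the end of a synchronization phase.

class LogicalClock:
    def __init__(self):
        self.base_logical = 0.0    # logical time at the last correction
        self.base_physical = 0.0   # physical reading at the last correction
        self.rate = 1.0            # estimated progress rate vs. real time

    def read(self, physical_now: float) -> float:
        """Logical time extrapolated from the last anchor point."""
        return self.base_logical + self.rate * (physical_now - self.base_physical)

    def resynchronize(self, physical_now: float, corrected_physical: float,
                      drift_estimate: float) -> None:
        # Re-anchor so logical time never jumps backwards even if the
        # physical clock is set back by the correction.
        self.base_logical = self.read(physical_now)
        self.base_physical = corrected_physical
        self.rate = 1.0 - drift_estimate   # compensate the estimated drift

clk = LogicalClock()
t1 = clk.read(10.0)
clk.resynchronize(10.0, 9.8, drift_estimate=0.02)  # clock pulled back 0.2
assert clk.read(9.9) >= t1   # logical time remains monotonic
```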