The adaptive gain equalizer (AGE) is a commonly used single-channel speech enhancement algorithm, and AGE and its variants have been widely used in speech enhancement applications. These variants fall into two broad categories. The first improves AGE in the time-frequency domain by readjusting its parameters; the second performs the main filtering operation in the modulation frequency domain. This paper evaluates AGE in the modulation frequency domain using a demodulation technique that formulates demodulation as a convex optimization problem. The performance of the modified AGE is compared with that of the traditional AGE and of another modulation frequency domain AGE based on demodulation using the spectral center-of-gravity. The performance measures used are Signal to Noise Ratio Improvement (SNRI), Spectral Distortion (SD) and Mean Opinion Score (MOS).
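
As a rough illustration of the gain rule behind an AGE, the sketch below computes a per-sample gain for one sub-band from the ratio of a short-term level estimate to a slowly rising noise-floor estimate. This is one common formulation and is assumed here, not taken from the paper; the function name `age_gains` and the parameters `alpha`, `beta`, `exponent` and `max_gain` are hypothetical, and the input is expected to be a non-empty sequence of samples.

```python
def age_gains(subband, alpha=0.05, beta=1e-3, exponent=2.0, max_gain=4.0):
    """Return one AGE-style gain per sample of a single sub-band signal.

    Assumed formulation (not taken from the paper): the gain is the ratio of a
    short-term level estimate to a slowly rising noise-floor estimate, raised
    to a power and limited to max_gain.
    """
    tiny = 1e-12                          # guards against division by zero
    short_avg = abs(subband[0])           # short-term level estimate
    noise_floor = max(short_avg, tiny)    # slowly varying noise-floor estimate
    gains = []
    for sample in subband:
        # Exponential averaging of the rectified sub-band signal.
        short_avg = (1.0 - alpha) * short_avg + alpha * abs(sample)
        if short_avg > noise_floor:
            # Level is above the floor: let the floor creep upwards slowly.
            noise_floor *= 1.0 + beta
        else:
            # Level dropped to or below the floor: follow it immediately.
            noise_floor = max(short_avg, tiny)
        gains.append(min((short_avg / noise_floor) ** exponent, max_gain))
    return gains


# Usage: a quiet stretch followed by a louder stretch (speech-like onset).
gains = age_gains([0.1] * 200 + [1.0] * 200)
```

With these assumed settings the gain stays close to one while the level is stationary and rises to the limit shortly after the level jumps, which is the behaviour an AGE relies on to emphasise speech over stationary noise.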

This paper presents a single-channel speech enhancement technique based on sub-band modulator Kalman filtering for laryngeal (normal) and alaryngeal (esophageal) speech signals. The noisy speech signal is decomposed into sub-bands, and each sub-band is subsequently demodulated into its modulator and carrier components. A Kalman filter is applied to the modulators of all sub-bands without altering the carriers. The performance of the proposed system has been validated by Mean Opinion Score (MOS) for laryngeal speech and Harmonic to Noise Ratio (HNR) for alaryngeal speech. An improvement of 20% in MOS has been observed over sub-band Kalman filtering for laryngeal speech, while a 3 to 4 dB enhancement in HNR has been observed over full-band Kalman filtering for alaryngeal speech.
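
To make the modulator filtering step concrete, the following is a minimal sketch of a scalar Kalman filter applied to one sub-band modulator (envelope) sequence. The abstract does not state the state model, so a random-walk model is assumed here purely for illustration; `kalman_smooth_modulator`, `process_var` and `meas_var` are hypothetical names and values, and the input is expected to be non-empty.

```python
def kalman_smooth_modulator(modulator, process_var=1e-4, meas_var=1e-2):
    """Filter a modulator sequence with a scalar random-walk Kalman filter.

    Assumption for illustration only: the clean modulator follows a random
    walk with variance process_var and is observed in additive noise with
    variance meas_var.
    """
    estimate = modulator[0]   # initial state estimate
    error_var = 1.0           # initial uncertainty of that estimate
    filtered = []
    for observation in modulator:
        # Predict: the random-walk model keeps the estimate, adds uncertainty.
        error_var += process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = error_var / (error_var + meas_var)
        estimate += gain * (observation - estimate)
        error_var *= 1.0 - gain
        filtered.append(estimate)
    return filtered


# Usage: an envelope of 1.0 with alternating +/-0.1 disturbance.
noisy = [1.0 + (0.1 if i % 2 == 0 else -0.1) for i in range(200)]
smoothed = kalman_smooth_modulator(noisy)
```

In a full system the filtered modulator of each sub-band would be recombined with its unaltered carrier before the sub-bands are summed; that recombination is not shown here.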

Video streaming services are offered over the Internet, and since service providers do not have full control over network conditions all the way to the end user, streaming technologies have been developed to maintain the quality of service under these varying network conditions, i.e., so-called adaptive video streaming. In order to cater for users’ Quality of Experience (QoE) requirements, HTTP-based adaptive streaming solutions for video services have become popular. However, the keys to ensuring a good QoE for users with this technology are still not completely understood. User QoE feedback is therefore instrumental in improving this understanding. Controlled laboratory-based perceptual quality experiments that involve a panel of human viewers are considered to be the most valid method of assessing QoE. Besides laboratory-based subjective experiments, crowdsourcing-based subjective assessment of video quality is gaining popularity as an alternative method. This article presents insights into a study that investigates perceptual preferences for various adaptive video streaming scenarios through crowdsourcing-based and laboratory-based subjective assessment. The major novel contribution of this study is the application of Paired Comparison based subjective assessment in a crowdsourcing environment. Besides confirming earlier published trends, the obtained results provide some novel indications of perceptual preferences for adaptive video streaming scenarios. Our study suggests that in a network environment with fluctuating bandwidth, a medium or low video bitrate that can be kept constant is the best approach. Moreover, if there are only a few drops in bandwidth, one can choose a medium or high bitrate with a single or a few buffering events.
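
A Paired Comparison test yields a list of pairwise choices rather than ratings. The sketch below shows one simple way such votes could be tallied into a preference fraction per scenario. It is only an illustration: the abstract does not say which analysis was applied, and the function name and the scenario labels are hypothetical.

```python
from collections import defaultdict


def preference_fractions(votes):
    """Return, per scenario, the fraction of its comparisons that it won.

    votes is a list of (preferred, other) pairs, one per judgement.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for preferred, other in votes:
        wins[preferred] += 1
        appearances[preferred] += 1
        appearances[other] += 1
    return {name: wins[name] / appearances[name] for name in appearances}


# Usage with hypothetical scenario labels.
votes = [
    ("constant_low", "fluctuating"),
    ("constant_low", "fluctuating"),
    ("fluctuating", "constant_low"),
    ("constant_low", "buffering"),
]
fractions = preference_fractions(votes)
```

A win fraction ignores which opponents a scenario faced, so a study with an unbalanced design would usually need a proper paired-comparison model instead.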

In real-time video streaming, video quality can be degraded due to network performance issues. Among other artefacts, video freezing and video jumping are factors that influence user experience. Service providers, operators and manufacturers are interested in evaluating the quality of experience (QoE) objectively because subjective assessment of QoE is expensive and, in many use cases, not possible to perform. Different algorithms have been proposed and implemented in this regard. Some of them are in the recommendation list of the ITU Telecommunication Standardization Sector (ITU-T). In this paper, we study the effect of the freezing artefact on user experience and compare the mean opinion score of these videos with the results of two algorithms, the perceptual evaluation of video quality (PEVQ) and the temporal quality metric (TQM). Both metrics are part of ITU-T Recommendation J.247, Annexes B and C. PEVQ is a full-reference video quality metric, whereas TQM is a no-reference quality metric. Another contribution of this paper is the study of the impact of different resolutions and frame rates on user experience and of how accurately PEVQ and TQM measure varying frame rates.

Video streaming and multimedia applications are becoming popular with the growth of networks. In real-time video streaming, video quality can be degraded due to network performance issues. Among other artifacts, freezing and frame dropping are factors that influence user experience. Service providers, operators, and researchers are interested in measuring the Quality of Experience objectively. Different algorithms have been proposed and implemented in this regard. Some of them are in the recommendation list of the ITU Telecommunication Standardization Sector (ITU-T). In this paper, we study the effect of the freezing artifact on user experience and compare the mean opinion score (MOS) of these videos with the results of two algorithms, Perceptual Evaluation of Video Quality (PEVQ) and Temporal Quality Metric, which are part of ITU-T Recommendation J.247, Annexes B and C, respectively. Another contribution of this paper is the investigation of the impact of different resolutions and frame rates on user experience.

In order to estimate subjective video quality, we usually deal with a large number of features and a small sample set. Applying regression to complex datasets may lead to imprecise solutions due to possibly irrelevant or noisy features as well as the effect of overfitting. In this work, we propose a No-Reference (NR) method for estimating the quality of videos that are impaired by both compression artifacts and packet losses. In particular, in an effort to establish a robust regression model that generalizes well to unknown data and to increase Mean Opinion Score (MOS) estimation accuracy, we propose a frame-level MOS estimation approach, where the MOS estimate of a sequence is obtained by averaging the per-frame MOS estimates instead of performing regression directly at the sequence level. Since it is impractical to obtain the actual per-frame MOS values through subjective experiments, we propose an objective metric able to do this task. Thus, our proposed NR method has the dual benefit of offering improved sequence-level MOS estimation accuracy while giving an indication of the relative quality of each individual video frame.

In this work, we propose a No-Reference (NR) bitstream-based model for predicting the quality of H.264/AVC video sequences affected by both compression artifacts and transmission impairments. The concept of the article is based on a feature extraction procedure, where a large number of features are calculated from the impaired bitstream. Many of the features are proposed in this work, while the specific set of features as a whole is applied for the first time to NR video quality prediction. All feature observations are taken as input to the Least Absolute Shrinkage and Selection Operator (LASSO) regression method. LASSO indicates the most important features and, using only them, is able to estimate the Mean Opinion Score (MOS) with high accuracy. Indicatively, we point out that only 13 features are able to produce a Pearson Correlation Coefficient of 0.92 with the MOS. Interestingly, the performance statistics we computed in order to assess our method for predicting the Structural Similarity Index and the Video Quality Metric are equally good. Thus, the obtained experimental results verify the suitability of the features selected by LASSO as well as the ability of LASSO to make accurate predictions through sparse modeling.
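
The feature-selection effect of LASSO can be seen in a small sketch. The code below minimises (1/2n)·||y − Xw||² + λ·||w||₁ by cyclic coordinate descent with soft-thresholding, a standard way to solve the LASSO problem; it is not the authors' implementation, it omits an intercept and feature scaling, and the toy data and names (`lasso_coordinate_descent`, `lam`) are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    weights = [0.0] * d
    residual = list(y)  # equals y - Xw for the initial all-zero weights
    for _ in range(n_iter):
        for j in range(d):
            column = [row[j] for row in X]
            scale = sum(c * c for c in column) / n
            if scale == 0.0:
                continue  # an all-zero feature can never be selected
            # Correlation of feature j with the residual that excludes its own term.
            rho = sum(c * (r + c * weights[j]) for c, r in zip(column, residual)) / n
            new_weight = soft_threshold(rho, lam) / scale
            delta = new_weight - weights[j]
            if delta:
                residual = [r - c * delta for c, r in zip(column, residual)]
                weights[j] = new_weight
    return weights


# Toy data: the target depends on the first feature only.
X = [[1, 0.5, -0.3], [2, -0.1, 0.4], [3, 0.3, 0.1],
     [4, -0.4, -0.2], [5, 0.2, 0.3], [6, -0.2, -0.1]]
y = [2.0 * row[0] for row in X]
weights = lasso_coordinate_descent(X, y, lam=0.1)
```

On this toy data the weights of the two irrelevant features are driven exactly to zero while the first weight stays close to 2, which mirrors how LASSO reduces a large feature set to a few influential features.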

The growing consumer interest in video communication has increased users' awareness of the visual quality of the delivered media. This in turn increases, at the service provider end, the need for intelligent, optimal techniques for adapting to varying network conditions. Recent studies show that constraints on the bandwidth of the transmission medium should not always be translated into an increase in compression ratio to lower the bitrate of the video. Instead, a suitable option for adaptive streaming is to scale down the video temporally or spatially before encoding to maintain a desirable level of perceptual quality while the viewing resolution is kept constant. Most of the existing studies that examine these scenarios are either limited to low-resolution videos or lack a subjective assessment of quality. We present here the results of our campaign of subjective quality assessment experiments performed on a range of spatial and temporal resolutions, up to VGA and 30 frames per second respectively, under a number of bitrate conditions. The analysis shows, among other things, that among the three parameters that affect video quality, preserving the spatial resolution is perceptually preferred, even for content with high temporal activity.

The overwhelming trend of the usage of multimedia services has raised the consumers' awareness about quality. Both service providers and consumers are interested in the delivered level of perceptual quality. The perceptual quality of an original video signal can get degraded due to compression and due to its transmission over a lossy network. Video quality assessment (VQA) has to be performed in order to gauge the level of video quality. Generally, it can be performed by following subjective methods, where a panel of humans judges the quality of video, or by using objective methods, where a computational model yields an estimate of the quality. Objective methods and specifically No-Reference (NR) or Reduced-Reference (RR) methods are preferable because they are practical for implementation in real-time scenarios. This doctoral thesis begins with a review of existing approaches proposed in the area of NR image and video quality assessment. In the review, recently proposed methods of visual quality assessment are classified into three categories. This is followed by the chapters related to the description of studies on the development of NR and RR methods as well as on conducting subjective experiments of VQA. In the case of NR methods, the required features are extracted from the coded bitstream of a video, and in the case of RR methods additional pixel-based information is used. Specifically, NR methods are developed with the help of suitable techniques of regression using artificial neural networks and least-squares support vector machines. Subsequently, in a later study, linear regression techniques are used to elaborate the interpretability of NR and RR models with respect to the selection of perceptually significant features. The presented studies on subjective experiments are performed using laboratory based and crowdsourcing platforms. 
In the laboratory based experiments, the focus has been on using standardized methods in order to generate datasets that can be used to validate objective methods of VQA. The subjective experiments performed through crowdsourcing relate to the investigation of non-standard methods in order to determine perceptual preferences for various adaptation scenarios in the context of adaptive streaming of high-definition videos. Lastly, the use of the adaptive gain equalizer in the modulation frequency domain for speech enhancement has been examined. To this end, two methods of demodulating speech signals, namely spectral center of gravity carrier estimation and convex optimization, have been studied.

This paper evaluates speech enhancement by filtering in the modulation frequency domain as an alternative to filtering in the conventional frequency domain. The Adaptive Gain Equalizer (AGE) is a commonly used single-channel speech enhancement algorithm. A recently introduced class of signal transformations called modulation transforms has successfully taken its place alongside classical time/frequency representations. This paper presents an implementation of AGE within a modulation system for the purpose of enhancing the speech signal. The proposed system has been validated with various performance measures, i.e., Signal to Noise Ratio Improvement (SNRI), Mean Opinion Score (MOS) and Spectral Distortion (SD). A spectrogram analysis is also presented to further substantiate the performance of this work.

The popularity of streaming media content such as videos can be ascribed, to some extent, to the perceptual quality of the content. Traditional methods of audio/video quality assessment do not account for input from the higher cognitive levels of human perception. Some studies have revealed that liking or disliking certain content can bias human judgement of video quality. In this paper, we examine the impact of using semantic quality indicators, namely audio content, audio quality, video content, and video quality, in the assessment of the quality of a video. Further, we propose a methodology that uses these indicators to design a prediction model for the popularity of streaming videos.

Reduced-reference (RR) and no-reference (NR) models for video quality estimation, using features that account for the impact of coding artifacts, spatio-temporal complexity, and packet losses, are proposed. The purpose of this study is to analyze a number of potentially quality-relevant features in order to select the most suitable set of features for building the desired models. The proposed sets of features have not been used in the literature and some of the features are used for the first time in this study. The features are employed by the least absolute shrinkage and selection operator (LASSO), which selects only the most influential of them toward perceptual quality. For comparison, we apply feature selection in the complete feature sets and ridge regression on the reduced sets. The models are validated using a database of H.264/AVC encoded videos that were subjectively assessed for quality in an ITU-T compliant laboratory. We infer that just two features selected by RR LASSO and two bitstream-based features selected by NR LASSO are able to estimate perceptual quality with high accuracy, higher than that of ridge, which uses more features. The comparisons with competing works and two full-reference metrics also verify the superiority of our models.

In the video encoding process, motion estimation usually consumes a large part of the encoder computations. This paper presents motion estimation techniques, targeted mainly at MPEG-4 video encoding but also applicable to other video codecs, e.g., H.264. A high-quality adaptive algorithm with adjustable complexity, based on partially blind prediction for motion estimation, is proposed. The computational complexity of motion estimation is reduced with a minor loss in video quality. In the paper, the quality metrics PSNR, BD-PSNR and PEVQ are used, and the possible trade-off between complexity and visual quality is studied.
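
To show why motion estimation dominates encoder cost, the sketch below implements the exhaustive baseline: full-search block matching with the sum of absolute differences (SAD) as the cost. This is the reference approach that reduced-complexity algorithms try to avoid, not the adaptive algorithm proposed in the paper; the function names and the synthetic frames are hypothetical.

```python
def block_sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Try every displacement in the search window; return ((dx, dy), cost) of the best."""
    height, width = len(ref), len(ref[0])
    best_vector, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost


# Synthetic 8x8 frames: cur shows the content of ref shifted by (dx=2, dy=1).
ref = [[16 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
vector, cost = full_search(cur, ref, bx=2, by=2, size=2, search_range=2)
# vector == (2, 1) and cost == 0 for this synthetic shift
```

Each block costs (2·search_range + 1)² SAD evaluations in this baseline, which is the quantity a reduced-complexity search trades against a small loss in match quality.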

The growing need for quick, online estimation of video quality necessitates the study of new frontiers in the area of no-reference visual quality assessment. Bitstream-layer model based video quality predictors use certain visual quality relevant features from the encoded video bitstream to estimate the quality. Contemporary techniques vary in the number and nature of the features employed and in the prediction model used. This paper proposes a prediction model with a concise set of bitstream-based features and a machine learning based quality predictor. Several full-reference quality metrics are predicted using the proposed model with reasonably good levels of accuracy, monotonicity and consistency.

There is a growing need for robust methods for reference free perceptual quality measurements due to the increasing use of video in hand-held multimedia devices. These methods are supposed to consider pertinent artifacts introduced by the compression algorithm selected for source coding. This paper proposes a model that uses readily available encoder parameters as input to an artificial neural network to predict objective quality metrics for compressed video without using any reference and without need for decoding. The results verify its robustness for prediction of objective quality metrics in general and for PEVQ and PSNR in particular. The paper also focuses on reducing the complexity of the neural network.

The field of perceptual quality assessment has gone through a wide range of developments and it is still growing. In particular, the area of no-reference (NR) image and video quality assessment has progressed rapidly during the last decade. In this article, we present a classification and review of recently published research work in the area of NR image and video quality assessment. The NR methods of visual quality assessment considered for review are structured into categories and subcategories based on the types of methodologies used for the underlying processing employed for quality estimation. Overall, the classification has been done into three categories, namely, pixel-based methods, bitstream-based methods, and hybrid methods of the aforementioned two categories. We believe that the review presented in this article will be helpful for practitioners as well as for researchers to keep abreast of the recent developments in the area of NR image and video quality assessment. This article can be used for various purposes, such as gaining a structured overview of the field and carrying out performance comparisons of state-of-the-art methods.

This article presents a case study, performed at Blekinge Institute of Technology (BTH), Sweden, of the topic selection routines for a graduate thesis. The study focuses on international graduate students, who come from the different academic cultures of their respective countries. BTH has succeeded in providing an academic environment that absorbs different academic cultures productively and at a reasonably good scale. However, in a multi-cultural educational environment, it is a challenge for most international students to adapt to the new academic culture and to select a graduate thesis topic that matches their real potential. Our findings, gathered through an online survey, a questionnaire, and a focus group discussion, are presented. The conclusions indicate that, although BTH has well-defined routines for thesis selection, international graduate students face problems at the thesis selection stage. The article concludes with suggestions for refining the thesis selection process at the micro level to help both students and staff.

Advancements in the video processing area have been driven by services that require low delay. Such services involve applications offered at various temporal and spatial resolutions. This makes it necessary to study the impact of the related video coding conditions on perceptual quality. However, most studies concerned with the quality assessment of videos affected by coding distortions lack variety in spatio-temporal resolutions. This paper presents work on the quality assessment of videos encoded with the state-of-the-art H.264/AVC standard at different bitrates and frame rates. Overall, 120 test scenarios for video sequences with different spatial and temporal spectral information were studied. The coded bitstreams used in this work and the corresponding subjective assessment scores have been made public for the research community to facilitate further studies.

In order to cater for user’s quality of experience (QoE) requirements, HTTP adaptive streaming (HAS) based solutions of video services have become popular recently. User QoE feedback can be instrumental in improving the capabilities of such services. Perceptual quality experiments that involve humans are considered to be the most valid method of the assessment of QoE. Besides lab-based subjective experiments, crowdsourcing based subjective assessment of video quality is gaining popularity as an alternative method. This paper presents insights into a study that investigates perceptual preferences of various adaptive video streaming scenarios through crowdsourcing based subjective quality assessment.

With the recent increase in popularity and usage of HTTP Adaptive Streaming (HAS) techniques, various studies have been carried out in this area, generally focusing on the technical enhancement of HAS technology and applications. However, the lack of a common HAS standard has led to multiple proprietary approaches developed by major Internet companies. In the emerging MPEG-DASH standard, the packaging of the video content and the HTTP syntax have been standardized, but all the details of the adaptation behavior are left to the client implementation. Nevertheless, to design an adaptation algorithm that optimizes the viewing experience of the end user, multimedia service providers need to know the Quality of Experience (QoE) of different adaptation schemes. Taking this into account, the objective of this experiment was to study the QoE of a HAS-based video broadcast model. The experiment was carried out as a subjective study of the end user's response to various possible client behaviors for changing the video quality, taking different QoE influence factors into account. The experimental conclusions provide good insight into the QoE of different adaptation schemes, which can be exploited by HAS clients when designing adaptation algorithms.

The growing popularity of adaptive streaming-based video delivery has raised interest in the user's perception when experiencing quality adaptation. The impact of video content characteristics on the user's perceptual quality has already become evident. The aim of this study is to investigate the influence of this factor on the quality of experience of adaptive streaming scenarios. Our results show that the perceptual quality of adaptation strategies applied to videos with a high amount of spatial activity and a low amount of temporal activity is significantly lower than for the other content types.

In wireless networks, due to limited bandwidth and packet losses, seamless and ubiquitous delivery of high-quality video streaming services is a big challenge for operators. In order to improve online video quality monitoring, no-reference (NR) objective video quality assessment (VQA) methods are required. In some networks, the video decoder on the receiving side adopts a mechanism in which the last correctly received frame is frozen and displayed on the video display terminal until the next correct frame is received. This mechanism, employed as an error concealment technique, can cause perceptual jerkiness on the video display terminal. In this paper, we propose an enhanced model of objective VQA based on the estimation of jerkiness. A study of three contemporary NR methods used for objective VQA and online monitoring of videos is included, along with subjective VQA tests. The subjective tests were performed for a set of video sequences with specific spatial and temporal information. The proposed NR method is based on our careful observations from the subjective test results, and our main focus is to cater for the effect of multiple frame freeze impairments in video streaming. Comparison with other NR methods shows that the proposed method performs better in terms of estimating the impact of multiple frame freezing impairments and agrees more closely with the subjective test results.
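
The freezing mechanism described above can be located without a reference video by looking for repeated frames. The sketch below is a minimal illustration of that first step only; it is not the proposed jerkiness model, and the function name, the flat-list frame representation and the threshold are assumptions made for this example.

```python
def find_freeze_events(frames, threshold=0.0):
    """Return (start_index, length) for each run of repeated frames.

    frames is a list of equally sized, flat lists of pixel values. A frame
    counts as frozen when its mean absolute difference from the previous
    frame is at or below threshold.
    """
    events = []
    run_start = None
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i], frames[i - 1]))
        diff /= len(frames[i])
        if diff <= threshold:
            if run_start is None:
                run_start = i          # first repeated frame of a new freeze
        elif run_start is not None:
            events.append((run_start, i - run_start))
            run_start = None
    if run_start is not None:          # the sequence ended during a freeze
        events.append((run_start, len(frames) - run_start))
    return events


# Usage: frame 1 is repeated twice, and frame 5 is repeated once at the end.
frames = [[0, 0], [1, 1], [1, 1], [1, 1], [2, 2], [3, 3], [3, 3]]
events = find_freeze_events(frames)
# events == [(2, 2), (6, 1)]
```

The number, length and spacing of such events are the kind of inputs a jerkiness estimate for multiple freezes could build on; how they map to perceived quality is what the subjective tests are needed for.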

A digital video stabilization (DVS) system removes the unwanted shaking in videos acquired by hand-held cameras while preserving intentional panning. In this paper, a digital video stabilization system based upon adaptive cerebellar model articulation controller (CMAC) filtering is proposed. A CMAC is a manifestation of the associative memory learning structure present in the human cerebellum. Adaptive CMAC filtering has the favorable properties of small size, good generalization, rapid learning and dynamic response, which make it well suited to high-speed signal processing applications. The adaptive CMAC is used to adjust the coefficients of the IIR filter employed in the proposed model. The training of the CMAC is based upon fuzzy rules. The efficacy of the proposed adaptive CMAC filtering has been validated by evaluating it on a set of test video sequences.