PeerJ Computer Science Preprints: Artificial Intelligence
https://peerj.com/preprints/index.atom?journal=cs&subject=8300
Artificial Intelligence articles published in PeerJ Computer Science Preprints

Multilayered model of speech
https://peerj.com/preprints/1576 (2017-11-20)
Andrey Chistyakov
Today there are no grammar systems that allow the creation of a fundamentally new word or concept. All existing grammar systems work only by referring to previously chosen terms, on the basis of which all definitions are created. Operations on grammar systems are introduced for the creation of new terms. The main premise of the research was the rejection of a universal solution that holds for any person. Instead, all people were divided into groups according to the MBTI classification, with each group expected to have a uniform perception of new knowledge. An assessment was conducted for each group, and a mathematical model was created as a result of the communication. The scheme for dividing words and sentences into components is shown in the first section of the article. The second section shows the construction of a notation from these components and the packing of the notation in memory. The third section shows the ability for conscious memory access (self-action and self-image). As a result, a model of human speech was structured in which it is possible to create new terms from new knowledge independently.

Assistive guidance system for the visually impaired
https://peerj.com/preprints/3410 (2017-11-14)
Rohit Takhar, Tushar Sharma, Udit Arora, Sohit Verma
In recent years, with improvements in imaging technology, the quality of small cameras has significantly improved. Coupled with the introduction of credit-card-sized single-board computers such as the Raspberry Pi, it is now possible to integrate a small camera with a wearable computer. This paper aims to develop a low-cost product, using a webcam and a Raspberry Pi, that can assist visually impaired people in detecting and recognising pedestrian crosswalks and staircases. There are two steps involved in the detection and recognition of these obstacles, i.e., pedestrian crosswalks and staircases. In the detection algorithm, we extract Haar features from the video frames and pass them to a Haar classifier. In the recognition algorithm, we first convert the RGB image to HSV and apply histogram equalization to make the pixel intensity uniform. This is followed by image segmentation and contour detection. The detected contours are passed through a pre-processor that extracts the regions of interest (ROIs). We applied different statistical methods to these ROIs to differentiate between staircases and pedestrian crosswalks. The detection and recognition results on our datasets demonstrate the effectiveness of our system.
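
The histogram-equalization step the abstract describes can be sketched in a few lines. This is a minimal pure-Python illustration of the idea of making pixel intensities uniform, not the authors' implementation; an OpenCV pipeline would typically call cv2.equalizeHist instead, and the function name here is illustrative.

```python
def equalize_histogram(pixels, levels=256):
    """Remap grayscale intensities so their cumulative distribution is roughly linear."""
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function of the intensities
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # flat image: nothing to equalize
        return list(pixels)
    # Standard equalization mapping: stretch the CDF over the full intensity range
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1)) for p in pixels]
```

On a skewed image such as `[10, 10, 10, 200, 200, 255]`, the mapping spreads the three distinct intensities across the full 0-255 range, which makes the subsequent segmentation less sensitive to lighting.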

Cricket umpire assistance and ball tracking system using a single smartphone camera
https://peerj.com/preprints/3402 (2017-11-10)
Udit Arora, Sohit Verma, Sarthak Sahni, Tushar Sharma
Several ball tracking algorithms have been reported in the literature. However, most of them use high-quality video and multiple cameras, and the emphasis has been on coordinating the cameras or visualizing the tracking results. This paper aims to develop a system for assisting the umpire in the sport of cricket in making decisions such as the detection of no-balls, wide balls, leg before wicket and bouncers, with the help of a single smartphone camera. It involves the implementation of computer vision algorithms for object detection and motion tracking, as well as the integration of machine learning algorithms to optimize the results.
Techniques such as the Histogram of Oriented Gradients (HOG) and the Support Vector Machine (SVM) are used for object classification and recognition. Frame subtraction, minimum enclosing circle, and contour detection algorithms are optimized and used for the detection of the cricket ball. These algorithms are applied using the open-source Python library OpenCV. The machine learning techniques of linear and quadratic regression are used to track and predict the motion of the ball. The system also uses the open-source Python library VPython for the visual representation of the results. The paper describes the design and structure of the approach undertaken in the system for analyzing and visualizing off-air, low-quality cricket videos.
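
The trajectory-prediction step can be isolated as a small sketch: fit a quadratic y(t) = a*t^2 + b*t + c to observed ball positions by least squares, then extrapolate. The detection pipeline (HOG/SVM, OpenCV, VPython) is not reproduced here, and all function names are illustrative.

```python
def fit_quadratic(ts, ys):
    """Least-squares fit of y = a*t^2 + b*t + c via the normal equations."""
    # Design matrix columns: t^2, t, 1
    rows = [[t * t, t, 1.0] for t in ts]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented matrix
    m = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    w = [0.0] * 3
    for i in reversed(range(3)):
        w[i] = (m[i][3] - sum(m[i][j] * w[j] for j in range(i + 1, 3))) / m[i][i]
    return w  # [a, b, c]

def predict(w, t):
    """Extrapolate the fitted trajectory to a future time t."""
    a, b, c = w
    return a * t * t + b * t + c
```

Given positions sampled from a parabolic flight, the fitted coefficients let the system predict where the ball would have travelled, which is the basis for decisions such as leg before wicket.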

GenHap: A novel computational method based on genetic algorithms for haplotype assembly
https://peerj.com/preprints/3246 (2017-09-12)
Andrea Tangherloni, Simone Spolaor, Leonardo Rundo, Marco S Nobile, Ivan Merelli, Paolo Cazzaniga, Daniela Besozzi, Giancarlo Mauri, Pietro Liò
The process of inferring the full haplotype of a cell is known as haplotyping, which consists of assigning all heterozygous Single Nucleotide Polymorphisms (SNPs) to exactly one of the two chromosomes. In this work, we propose a novel computational method for haplotype assembly based on Genetic Algorithms (GAs), named GenHap. Our approach can efficiently solve large instances of the weighted Minimum Error Correction (wMEC) problem, yielding optimal solutions by means of a global search process. wMEC consists of computing the two haplotypes that partition the sequencing reads into two unambiguous sets with the least number of corrections to the SNP values. Since wMEC was proven to be NP-hard, we tackle it using GAs, a population-based optimization strategy that mimics Darwinian processes. In GAs, a population of randomly generated individuals undergoes a selection mechanism and is modified by genetic operators. Based on a quality measure (i.e., the fitness value), inspired by Darwin's "survival of the fittest", each individual is involved in a selection process.
Our preliminary experimental results show that GenHap is able to achieve correct solutions in short running times. Moreover, the approach can be used to compute haplotypes in organisms with different ploidy. The proposed evolutionary technique has the additional advantage that it could be reformulated and extended with a multi-objective fitness function taking into account further insights, such as the methylation patterns of the different chromosomes or gene proximity in maps obtained through Chromosome Conformation Capture (3C) experiments.
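
The GA-for-wMEC idea can be sketched on a toy instance, under a strong simplifying assumption: every SNP is heterozygous, so the second haplotype is the bitwise complement of the first. An individual is then one candidate haplotype, and its fitness is the wMEC cost, i.e. the number of SNP-value corrections needed to make every read consistent with one of the two haplotypes. GenHap's actual encoding, operators and parallelisation are not reproduced; everything below is illustrative.

```python
import random

def wmec_cost(haplotype, reads):
    """Total corrections needed if the haplotypes are `haplotype` and its complement."""
    complement = [1 - allele for allele in haplotype]
    cost = 0
    for read in reads:  # a read is a list of (SNP position, observed allele)
        d1 = sum(allele != haplotype[pos] for pos, allele in read)
        d2 = sum(allele != complement[pos] for pos, allele in read)
        cost += min(d1, d2)  # assign the read to whichever haplotype fits best
    return cost

def genetic_algorithm(reads, n_snps, pop=60, gens=60, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_snps)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda h: wmec_cost(h, reads))
        survivors = population[: pop // 2]          # elitist truncation selection
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_snps)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        population = survivors + children
    return min(population, key=lambda h: wmec_cost(h, reads))
```

On error-free reads the GA drives the wMEC cost to zero, recovering the true haplotype (or its complement, which is an equivalent solution).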

PGxO: A very lite ontology to reconcile pharmacogenomic knowledge units
https://peerj.com/preprints/3140 (2017-08-11)
Pierre Monnin, Clément Jonquet, Joël Legrand, Amedeo Napoli, Adrien Coulet
We present in this article a lightweight ontology named PGxO and a set of rules for its instantiation, which we developed as a frame for reconciling and tracing pharmacogenomics (PGx) knowledge. PGx studies how genomic variations impact variations in drug response phenotypes. Knowledge in PGx is typically composed of units that take the form of ternary relationships gene variant–drug–adverse event, stating that an adverse event may occur for patients having the gene variant when exposed to the drug. These knowledge units are (i) available in reference databases such as PharmGKB, (ii) reported in the scientific biomedical literature, and (iii) discoverable by mining clinical data such as Electronic Health Records (EHRs). Consequently, knowledge in PGx is heterogeneously described (i.e., with varying quality, granularity, vocabulary, etc.), and it is worthwhile to extract, and then compare, assertions from distinct resources. Using PGxO, one can represent multiple provenances for pharmacogenomic knowledge units and reconcile duplicates when they come from distinct sources.
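
The reconciliation idea can be sketched minimally: a knowledge unit is a ternary gene variant–drug–adverse event relationship, and duplicates reported by different resources are merged while every provenance is kept. The class and function names below are illustrative, not PGxO's actual vocabulary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KnowledgeUnit:
    """One ternary PGx relationship: variant - drug - adverse event."""
    variant: str
    drug: str
    adverse_event: str

def reconcile(observations):
    """observations: iterable of (KnowledgeUnit, source name).

    Returns a mapping from each distinct unit to the set of sources
    that reported it, so duplicates collapse but provenance is traced.
    """
    merged = {}
    for unit, source in observations:
        merged.setdefault(unit, set()).add(source)
    return merged
```

Because the dataclass is frozen (hashable), two identical triples from PharmGKB and from EHR mining map to the same key, and the merged entry records both provenances.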

Multi-label classification of frog species via deep learning
https://peerj.com/preprints/3007 (2017-06-06)
Jie Xie
Acoustic classification of frogs has received increasing attention for its promising application in ecological studies. Various methods have been proposed for classifying frog species, but most assume that a recording contains only a single species. In this study, a method to classify multiple frog species in an audio clip is presented. To be specific, continuous frog recordings are first cropped into 10-second audio clips. Then, various time-frequency representations are generated for each 10-s clip. Next, instead of using traditional hand-crafted features, a deep learning algorithm is used to find the most important features. Finally, a binary-relevance-based multi-label classification approach is proposed to classify simultaneously vocalizing frog species using our proposed features. Experimental results show that the features extracted using deep learning achieve better classification performance than hand-crafted features for frog call classification.
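
Binary relevance reduces multi-label classification to one independent binary classifier per label (here, per frog species), so a clip can be assigned several species at once. The base learner below is a deliberately simple nearest-centroid classifier standing in for the paper's deep-learning features and classifiers; all names are illustrative.

```python
class NearestCentroid:
    """Toy binary base learner: classify by the closer class centroid."""
    def fit(self, X, y):
        mean = lambda rows: [sum(col) / len(rows) for col in zip(*rows)]
        self.pos_c = mean([x for x, label in zip(X, y) if label == 1])
        self.neg_c = mean([x for x, label in zip(X, y) if label == 0])
        return self

    def predict(self, x):
        dist = lambda c: sum((a - b) ** 2 for a, b in zip(x, c))
        return 1 if dist(self.pos_c) < dist(self.neg_c) else 0

def binary_relevance_fit(X, Y):
    """Y has one row per clip and one 0/1 column per species; train one model per column."""
    return [NearestCentroid().fit(X, [row[j] for row in Y]) for j in range(len(Y[0]))]

def binary_relevance_predict(models, x):
    """Each model votes independently, so several species can be predicted together."""
    return [m.predict(x) for m in models]
```

A clip whose features sit near both positive centroids is labelled with both species, which is exactly the simultaneous-vocalization case a single-label classifier cannot express.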

Parallel and in-process compilation of individuals for genetic programming on GPU
https://peerj.com/preprints/2936 (2017-04-19)
Hakan Ayral, Songül Albayrak
Three approaches to implementing genetic programming on GPU hardware are compilation, interpretation and direct generation of machine code. The compiled approach is known to have prohibitive overhead compared to the other two.
This paper investigates methods to accelerate the compilation of individuals for genetic programming on GPU hardware. We apply in-process compilation to minimize the compilation overhead at each generation, and we investigate ways to parallelize in-process compilation. In-process compilation does not lend itself to trivial parallelization with threads; we therefore propose a multi-process parallelization using shared memory and operating system interprocess communication primitives. With parallelized compilation we achieve further reductions in compilation overhead. Another contribution of this work is the code framework we built in C# for the experiments. The framework makes it possible to build arbitrary grammatical genetic programming experiments that run on the GPU with minimal extra coding effort, and it is available as open source.
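
"In-process" compilation means compiling each individual inside the host process rather than shelling out to an external compiler every generation. The paper does this for C# kernels and parallelises it across worker processes; the sketch below shows only the in-process idea, using Python's built-in compile() on generated expression sources as the closest analogue. Function names are illustrative.

```python
def compile_individuals(sources):
    """Compile each individual's source once, in-process; no external compiler is spawned.

    The returned code objects can be reused for every fitness case,
    so the compilation cost is paid once per generation.
    """
    return [compile(src, "<individual-%d>" % i, "eval") for i, src in enumerate(sources)]

def evaluate_all(code_objects, x):
    """Evaluate every compiled individual on one fitness case."""
    # Empty __builtins__ keeps evaluation restricted to the supplied variable.
    return [eval(code, {"__builtins__": {}}, {"x": x}) for code in code_objects]
```

In the multi-process variant the paper proposes, each worker process would run compile_individuals on its own chunk of the population and return the results through shared memory; that split is straightforward because the individuals are independent.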

Accelerating the XGBoost algorithm using GPU computing
https://peerj.com/preprints/2911 (2017-04-04)
Rory Mitchell, Eibe Frank
We present a CUDA-based implementation of a decision tree construction algorithm within the gradient boosting library XGBoost. The tree construction algorithm is executed entirely on the GPU and shows high performance on a variety of datasets and settings, including sparse input matrices. Individual boosting iterations are parallelized, combining two approaches: an interleaved approach is used for shallow trees, switching to a more conventional radix-sort-based approach at larger depths. We show speedups of 3-6x using a Titan X compared to a 4-core i7 CPU, and of 1.2x using a Titan X compared to two Xeon CPUs (24 cores). We also show that it is possible to process the Higgs dataset (10 million instances, 28 features) entirely within GPU memory. The algorithm is made available as a plug-in within the XGBoost library and fully supports all XGBoost features, including classification, regression and ranking tasks.
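
The per-node work that such GPU kernels parallelise can be illustrated on a single feature: scan the instances in sorted feature order, accumulate gradient and hessian sums, and pick the split with the highest gain. The sketch below uses a simplified form of XGBoost's gain (the constant 1/2 factor and the complexity penalty are omitted); it illustrates the scan only, not the CUDA implementation.

```python
def best_split(values, grads, hess, lambda_=1.0):
    """Find the best split point for one feature by a left-to-right prefix scan.

    grads/hess are the per-instance first and second derivatives of the loss;
    lambda_ is the L2 regularisation term from the XGBoost objective.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    g_total, h_total = sum(grads), sum(hess)

    def score(g, h):
        return g * g / (h + lambda_)

    base = score(g_total, h_total)
    g_left = h_left = 0.0
    best_gain, best_value = 0.0, None
    for idx in order[:-1]:  # every prefix of the sorted instances is a candidate left child
        g_left += grads[idx]
        h_left += hess[idx]
        gain = score(g_left, h_left) + score(g_total - g_left, h_total - h_left) - base
        if gain > best_gain:
            best_gain, best_value = gain, values[idx]
    return best_value, best_gain
```

On the GPU, the sort is what the radix-sort-based approach provides, and the prefix sums over grads and hess are computed by parallel scan primitives rather than the sequential loop shown here.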

Finding the optimal Bayesian network given a constraint graph
https://peerj.com/preprints/2872 (2017-03-14)
Jacob M Schreiber, William S Noble
Despite recent algorithmic improvements, learning the optimal structure of a Bayesian network from data is typically infeasible beyond a few dozen variables. Fortunately, domain knowledge can frequently be exploited to achieve dramatic computational savings, and in many cases it can even make structure learning tractable. Several methods have previously been described for representing this type of structural prior knowledge, including global orderings, super-structures, and constraint rules. While super-structures and constraint rules are flexible in what prior knowledge they can encode, they achieve savings in memory and computational time simply by avoiding the consideration of invalid graphs. We introduce the concept of a "constraint graph" as an intuitive method for incorporating rich prior knowledge into the structure learning task. We describe how this graph can be used to reduce the memory cost and computational time required to find the optimal graph subject to the encoded constraints, beyond merely eliminating invalid graphs. In particular, we show that a constraint graph can break the structure learning task into independent subproblems even in the presence of cyclic prior knowledge. These subproblems are well suited to being solved in parallel on a single machine or distributed across many machines without excessive communication cost.
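
The decomposition claim can be illustrated with a standard graph step: collapse the strongly connected components of the (possibly cyclic) constraint graph, which yields groups of variables that can be solved as separate subproblems in a valid processing order. Below is a sketch using Kosaraju's algorithm, assuming the constraint graph is given as an adjacency dict; it shows only the decomposition, not the structure-learning solver itself.

```python
def constraint_subproblems(graph):
    """Strongly connected components of `graph`, in topological order of the condensation."""
    nodes = set(graph) | {w for ws in graph.values() for w in ws}
    # Pass 1: DFS on the original graph, recording finish order.
    order, seen = [], set()
    def dfs(v):
        seen.add(v)
        for w in graph.get(v, []):
            if w not in seen:
                dfs(w)
        order.append(v)
    for v in nodes:
        if v not in seen:
            dfs(v)
    # Pass 2: DFS on the reversed graph, in decreasing finish time.
    rev = {}
    for v, ws in graph.items():
        for w in ws:
            rev.setdefault(w, []).append(v)
    comps, done = [], set()
    def dfs_rev(v, comp):
        done.add(v)
        comp.append(v)
        for w in rev.get(v, []):
            if w not in done:
                dfs_rev(w, comp)
    for v in reversed(order):
        if v not in done:
            comp = []
            dfs_rev(v, comp)
            comps.append(frozenset(comp))
    return comps
```

A cycle such as A->B->A collapses into one component, so even cyclic prior knowledge still yields a sequence of independent subproblems that can be distributed across machines.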

Aesthetic local search of wind farm layouts
https://peerj.com/preprints/2864 (2017-03-10)
Michael Mayo, Maisa Daoud
The visual impact of wind farm layouts has seen little consideration in the literature on the wind farm layout optimisation problem to date. Most existing algorithms focus on optimising layouts for power or cost of energy alone. In this paper, we consider the geometry of wind farm layouts and whether it is possible to bi-optimise a layout for both energy efficiency and its degree of visual impact. We develop a novel optimisation approach for solving the problem, drawing inspiration from the field of architecture for our mathematical measure of visual impact. To evaluate our ideas, we demonstrate them on three benchmark wind farm layout optimisation problems in conjunction with two recently published stochastic local search algorithms. Optimal patterned layouts are shown to be very close in energy efficiency to optimal non-patterned layouts.
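
A toy version of the bi-optimisation idea: hill-climb a layout on a grid, scoring a weighted sum of an "efficiency" proxy (minimum pairwise spacing, standing in for low wake interference) and an "aesthetics" proxy (closeness of turbines to a regular grid pattern). Both scoring terms and all names are illustrative assumptions, not the paper's wake or visual-impact models.

```python
import random

def efficiency(layout):
    """Proxy for energy efficiency: the smallest pairwise (Manhattan) spacing."""
    return min(abs(x1 - x2) + abs(y1 - y2)
               for i, (x1, y1) in enumerate(layout)
               for (x2, y2) in layout[i + 1:])

def aesthetics(layout, pitch=3):
    """Proxy for low visual impact: penalise distance from a regular grid of the given pitch."""
    off = lambda v: min(v % pitch, pitch - v % pitch)
    return -sum(off(x) + off(y) for x, y in layout)

def local_search(layout, weight=0.5, iters=500, size=9, seed=1):
    """Greedy hill climbing on the weighted sum of the two objectives."""
    rng = random.Random(seed)
    score = lambda l: weight * efficiency(l) + (1 - weight) * aesthetics(l)
    best, best_score = list(layout), score(layout)
    for _ in range(iters):
        cand = list(best)
        i = rng.randrange(len(cand))
        x, y = cand[i]
        dx, dy = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])  # move one turbine one step
        cand[i] = (min(size, max(0, x + dx)), min(size, max(0, y + dy)))
        if len(set(cand)) == len(cand) and score(cand) >= best_score:
            best, best_score = cand, score(cand)
    return best, best_score
```

Varying `weight` between 0 and 1 trades off the two objectives, which is the scalarised form of the bi-optimisation question the paper studies with more realistic models and search algorithms.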
