2016/12/24

What is inconvenient about for loops in R? The results computed in each iteration are lost unless you store them yourself. So we have created a package that stores the results automatically. All you need is to cast a one-line spell, magic_for(). This article explains how to use the magic.

1. Overview

for() is one of the most popular functions in R. As you know, it is used to create loops.
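To make the inconvenience concrete, here is a minimal base-R sketch. It uses no package at all; the variable names squared and result are only illustrative. It shows that a plain for loop keeps just the last value, and the bookkeeping you would normally write by hand to keep them all.

```r
# Values computed inside a for loop are not collected anywhere by default.
for (i in 1:3) {
  squared <- i ^ 2
}
squared  # only the value from the last iteration is still available

# The usual manual workaround: preallocate a container and fill it in.
result <- numeric(3)
for (i in 1:3) {
  result[i] <- i ^ 2
}
result
#> [1] 1 4 9
```

The second loop is the kind of boilerplate that a result-storing helper is meant to remove.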

magic_for() takes a function name and reconstructs for() so that it remembers the values passed to the specified function inside for loops. We call this magicalization. Once you have called magic_for(), you just execute for() as usual, and the results are stored in memory automatically.

Here, let’s use magic_result_as_vector() to access the stored values.

magic_result_as_vector() # Get the result
#> [1] 1 4 9

This is one of the functions for obtaining results from magicalized for loops; it returns the results as a vector.

Even if the number of observed variables increases, you can do it the same way.
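For comparison, the following base-R sketch shows what tracking several variables per iteration looks like when done by hand. It is not the package's API; the column names i, squared, and cubed are assumptions chosen for illustration.

```r
# Hand-written bookkeeping for several variables per iteration (base R only).
n <- 3
result <- data.frame(
  i       = integer(n),
  squared = numeric(n),
  cubed   = numeric(n)
)
for (i in seq_len(n)) {
  result$i[i]       <- i
  result$squared[i] <- i ^ 2
  result$cubed[i]   <- i ^ 3
}
result  # one row per iteration, one column per observed variable
```

Every additional variable needs another preallocated column and another assignment, which is where the manual approach becomes tedious.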

New Feature

Developers follow different policies for managing R packages on GitHub. If a package is developed on a "develop" branch, you may want to install it from that branch.

gh_install_packages() has a ref argument for specifying Git references. For instance, you can install awaptools from the "develop" branch as follows:

githubinstall("awaptools", ref = "develop")

Installation sometimes fails because the HEAD of a repository is broken. In such a case, you can pass a tag or commit to ref. In most cases, tags are placed on unbroken commits. For instance, you can install densratio from the “v0.0.3” tag as follows:

githubinstall("densratio", ref = "v0.0.3")

Even if you cannot find such tags, you can install packages from any commit that is not broken. For instance, you can install densratio from the “e8233e6” commit as follows:

githubinstall("densratio", ref = "e8233e6")

Finally, you may find a bug fix that exists only as a pull request. In such a case, you can pass the pull request to ref using github_pull(). For instance, you can install dplyr from the pull request #2058 as follows:

2016/06/15

1. Overview

A growing number of R packages are being created by people all over the world. One reason is the devtools package, which makes it easy to develop R packages [1]. The devtools package not only facilitates the development process but also provides another way to distribute R packages.

When developers publish R packages, CRAN [2] is commonly used. You can install the packages that are available on CRAN using install.packages(). For example, you can install the dplyr package as follows:

Therefore, developers can distribute R packages that are still under development on GitHub. Moreover, some developers have no intention of submitting to CRAN. For instance, Twitter, Inc. provides the AnomalyDetection package on GitHub, but it will not be available on CRAN [3]. You can install such packages easily using devtools.

library(devtools)
install_github("twitter/AnomalyDetection")

There is a difference between install.packages() and install_github() in the required argument. install.packages() takes package names, while install_github() needs repository names. It means that when you want to install a package on GitHub you must remember its repository name correctly.
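The difference between the two kinds of argument can be seen by taking a repository name apart. The sketch below handles only the simplest "username/repository" form; split_repo() is a hypothetical helper written for this illustration and is not part of devtools.

```r
# Hypothetical helper: split a "username/repository" string into its parts.
split_repo <- function(repo) {
  parts <- strsplit(repo, "/", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2)
  list(username = parts[1], package = parts[2])
}

spec <- split_repo("twitter/AnomalyDetection")
spec$username  # the part install.packages() never asks for
spec$package   # the part install.packages() would take on its own
```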

The trouble is that GitHub usernames are often hard to remember. Developers choose package names carefully so that users can understand the functionality intuitively. However, they often choose their usernames without much thought. For instance, ggfortify is a great package on GitHub, but who created it? What is the username? The answer is sinhrks [4]. That is difficult to remember.

The githubinstall package provides a way to install packages on GitHub by package name alone, just like install.packages().

The function suggests GitHub repositories. If you type ‘1’ and press ‘enter’, installation of the package begins. The suggestions are made by searching a list of R packages on GitHub. The list is provided by Gepuro Task Views.
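The idea of suggesting repositories from a package list can be sketched in a few lines of base R. This is not the package's actual implementation: the two-row table stands in for the real list, and suggest_repo() is a hypothetical function name. The entries are the two repositories mentioned in this article.

```r
# A tiny stand-in for the list of R packages on GitHub (assumed structure).
package_list <- data.frame(
  username = c("twitter", "sinhrks"),
  package  = c("AnomalyDetection", "ggfortify"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: return every "username/package" whose package name matches.
suggest_repo <- function(package_name, packages = package_list) {
  hits <- packages[packages$package == package_name, , drop = FALSE]
  if (nrow(hits) == 0) {
    return(character(0))  # no candidate repository found
  }
  paste(hits$username, hits$package, sep = "/")
}

suggest_repo("ggfortify")
#> [1] "sinhrks/ggfortify"
```

When several users publish a package with the same name, a lookup like this returns more than one candidate, which is why the user is asked to choose.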

3.5. Show the Source Code of Functions on GitHub

gh_show_source() looks for the source code of a given function on GitHub and tries to open that location in a web browser.

gh_show_source("mutate", "dplyr")

If you have loaded the package that the function belongs to, you can input the function directly.

library(dplyr)
gh_show_source(mutate)

This function may not work well with Safari.

3.6. Update the List of R Packages

The githubinstall package uses Gepuro Task Views to get the list of R packages on GitHub. Gepuro Task Views crawls GitHub and updates its information every day. The package downloads the list of R packages from Gepuro Task Views each time it is loaded. Thus, a new R session always starts with the newest list of packages.

However, you may keep an R session open for a long time. In such a case, gh_update_package_list() is useful.

gh_update_package_list() updates the downloaded list of the R packages explicitly.

2016/04/01

1. Overview

Density ratio estimation is described as follows: given two data samples $x$ and $y$ drawn from unknown distributions $p(x)$ and $q(y)$ respectively, estimate
$$
w(x) = \frac{p(x)}{q(x)}
$$
where $x$ and $y$ are $d$-dimensional real vectors.
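To make the target quantity concrete, the sketch below computes the true density ratio for a case where both densities are known. The two normal distributions are assumptions chosen for illustration; in a real estimation problem, p and q are unknown and only the samples are available.

```r
# Assumed example densities (not taken from the text): both normal with mean 1.
p <- function(x) dnorm(x, mean = 1, sd = 1/8)  # numerator density
q <- function(x) dnorm(x, mean = 1, sd = 1/2)  # denominator density

# The density ratio w(x) = p(x) / q(x).
w <- function(x) p(x) / q(x)

w(1)  # at the common mean the ratio equals sd_q / sd_p
#> [1] 4
```

Far from the mean, the narrower numerator density vanishes faster than the denominator, so w(x) falls towards zero.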

The estimated density ratio function $w(x)$ can be used in many applications, such as inlier-based outlier detection [1] and covariate shift adaptation [2]. Other useful applications of density ratio estimation were summarized by Sugiyama et al. (2012) [3].

The densratio package provides a function densratio(). The result it returns contains a function, compute_density_ratio(), that estimates the density ratio.

The number of kernels is the number of Gaussian kernels in the linear model. You can change it by setting the kernel_num parameter. By default, kernel_num = 100.

Bandwidth (sigma) is the Gaussian kernel bandwidth. By default, sigma = "auto", and the algorithms automatically select the optimal value by cross-validation. If you set sigma to a number, that value is used. If you set it to a numeric vector, the algorithms select the optimal value among its elements by cross-validation.

Centers are the centers of the Gaussian kernels in the linear model. They are selected at random from the data sample x, which is drawn from the numerator distribution p_nu(x). You can find all of the values in result$kernel_info$centers.

Kernel weights are the alpha parameters in the linear model. They are optimized by the algorithms. You can find all of the values in result$alpha.
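How centers, bandwidth, and kernel weights fit together can be sketched as follows. The code assumes the common form of a linear model with Gaussian kernels, w(x) = sum over l of alpha[l] * exp(-(x - centers[l])^2 / (2 * sigma^2)), for one-dimensional x. kernel_linear_model() is a hypothetical helper written for this illustration, not a function of the package.

```r
# Hypothetical helper: evaluate a linear model of Gaussian kernels at points x.
kernel_linear_model <- function(x, centers, alpha, sigma) {
  stopifnot(length(centers) == length(alpha), sigma > 0)
  # phi has one row per evaluation point and one column per kernel center.
  phi <- exp(-outer(x, centers, "-")^2 / (2 * sigma^2))
  as.vector(phi %*% alpha)
}

# A single kernel centered at 0 with weight 2, evaluated at its own center.
kernel_linear_model(0, centers = 0, alpha = 2, sigma = 1)
#> [1] 2
```

In this form, the number of kernels is simply length(centers), and the bandwidth controls how quickly each kernel's contribution decays with distance from its center.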

The function that estimates the density ratio is named compute_density_ratio().

4. Multi Dimensional Data Samples

In the above, the input data samples x and y were one-dimensional. densratio() also accepts multidimensional data samples as matrices.
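As a rough sketch of what matrix-shaped samples look like, the code below builds two small two-dimensional samples in base R and evaluates a Gaussian kernel between each row and one center. It assumes that rows are observations and columns are dimensions. The sample sizes, distribution parameters, and the bandwidth 0.5 are arbitrary choices for illustration.

```r
set.seed(1)
n <- 5  # observations per sample (kept small for display)
d <- 2  # dimensions

# Assumed layout: one observation per row, one dimension per column.
x <- matrix(rnorm(n * d, mean = 1, sd = 1/8), nrow = n, ncol = d)
y <- matrix(rnorm(n * d, mean = 1, sd = 1/2), nrow = n, ncol = d)
dim(x)
#> [1] 5 2

# Gaussian kernel between every row of x and one center (here the first row).
center  <- x[1, ]
sq_dist <- rowSums(sweep(x, 2, center)^2)  # squared Euclidean distances
kernel_values <- exp(-sq_dist / (2 * 0.5^2))
kernel_values[1]  # distance to itself is zero, so the kernel value is 1
```

The only change from the one-dimensional case is that the squared difference becomes a squared Euclidean distance between rows.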