Estimating the relevance or proximity between vertices in a network is a fundamental building block of network analysis and is useful in a wide range of important applications such as network-aware search and network structure prediction. In this paper, we (1) propose to use the top-k shortest-path distance as a relevance measure, and (2) design an efficient indexing scheme for answering top-k distance queries. Although many indexing methods have been developed for standard (top-1) distance queries, none of them can be directly applied to top-k distances. Therefore, we develop a new framework for top-k distance queries based on 2-hop cover, and then present an efficient indexing algorithm based on the recently proposed pruned landmark labeling scheme. The scalability, efficiency, and robustness of our method are demonstrated in extensive experimental results. It can construct indices from large graphs comprising millions of vertices and tens of millions of edges within a reasonable running time. Having obtained the indices, we can compute top-k distances within a few microseconds, six orders of magnitude faster than existing methods, which require a few seconds. Moreover, we demonstrate the usefulness of the top-k distance as a relevance measure by applying it to link prediction, one of the most fundamental problems in graph data mining. We emphasize that the proposed indexing method enables the first use of top-k distances for such tasks.
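For intuition, and setting the index aside, top-k distances between two vertices can be computed online by a standard Dijkstra variant that settles each vertex up to k times. This baseline sketch (our own illustration, with an adjacency-list graph representation) shows the kind of per-query computation that an index replaces; paths may revisit vertices, as in the top-k distance formulation:

```python
import heapq
from collections import defaultdict

def top_k_distances(adj, s, t, k):
    """Return the k smallest path lengths from s to t.

    adj maps a vertex to a list of (neighbor, edge_weight) pairs.
    Each vertex may be settled up to k times, so returned walks may
    revisit intermediate vertices.
    """
    settled = defaultdict(int)
    heap = [(0, s)]
    result = []
    while heap and len(result) < k:
        d, v = heapq.heappop(heap)
        if settled[v] >= k:
            continue
        settled[v] += 1
        if v == t:
            result.append(d)
        for w, weight in adj[v]:
            if settled[w] < k:
                heapq.heappush(heap, (d + weight, w))
    return result
```

On a three-vertex path graph with unit weights, the three shortest 0-to-2 lengths are 2, 4, and 4, since the longer walks backtrack over intermediate vertices.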

In this paper, we propose a verification methodology for large-scale legal knowledge. When a legal code is revised, other affected codes must also be revised to keep the law consistent. Our task is therefore to revise the affected areas properly and to verify the adequacy of the revision. In this study, we extend the notion of inconsistency beyond ordinary logical inconsistency to include conceptual conflicts. We obtain these conflicts from taxonomy data, and can thus avoid tedious manual declarations of conflicting words. In the verification process, we adopt extended disjunctive logic programming (EDLP) to tolerate multiple consequences for a given set of antecedents. In addition, we employ abductive logic programming (ALP), regarding the situations to which the rules are applied as premises. We also restrict the legal knowledge base to acyclic programs to avoid circular definitions and to justify the relevance of verdicts; detecting cyclic parts of legal knowledge is therefore one of our objectives. The system is composed of two subsystems: a preprocessor implemented in Ruby to facilitate string manipulation, and a verifier implemented in Prolog to perform logical inference. We also employ the XML format in the system to retain readability. In this study, we verify actual ordinances of Toyama Prefecture and present the experimental results.
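The acyclicity restriction can be checked mechanically. As a minimal sketch (our own illustration in Python, separate from the Ruby/Prolog system described above), a depth-first search over a predicate dependency graph detects a cyclic chain of definitions:

```python
def find_cycle(deps):
    """Return one cyclic chain of predicate definitions, or None.

    deps maps each predicate to the predicates its definition uses.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on stack / finished
    color = {}

    def dfs(p, path):
        color[p] = GRAY
        path.append(p)
        for q in deps.get(p, []):
            c = color.get(q, WHITE)
            if c == GRAY:  # back edge: q is on the current path
                return path[path.index(q):] + [q]
            if c == WHITE:
                cyc = dfs(q, path)
                if cyc:
                    return cyc
        path.pop()
        color[p] = BLACK
        return None

    for p in deps:
        if color.get(p, WHITE) == WHITE:
            cyc = dfs(p, [])
            if cyc:
                return cyc
    return None
```

Any reported chain pinpoints the definitions that must be rewritten before verification proceeds.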

Web technology enables numerous people to collaborate in creation; we designate this massively collaborative creation via the Web. As an example of massively collaborative creation, we examine video development on Nico Nico Douga, a video-sharing website that is popular in Japan. We specifically examine videos featuring Hatsune Miku, a singing-synthesizer application that has inspired not only song creation but also songwriting, illustration, and video editing. Creators interact to create new content through their social network. In this paper, we analyze the development process of thousands of videos on the basis of creators' social networks and investigate the relationships between creation activity and those networks. The social networks reveal interesting features: creators generate large, sparse social networks that include some centralized communities, and the members of such communities share distinctive tags. Different categories of creators play different roles in the network's evolution; e.g., songwriters gather more links than other categories, implying that they act as triggers of network evolution.

It is not easy to test software used in machine learning studies within statistical frameworks. In particular, software implementing randomized algorithms such as Monte Carlo methods complicates the testing process. Combined with the underestimation of the importance of software testing in academic fields, many software programs are being used without appropriate validation and are causing problems. In this article, we discuss the importance of writing test code for software used in research, and present a practical approach to testing, focusing on programs that use Monte Carlo methods.
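As a small illustration of the approach (function names and tolerances here are ours), a seeded Monte Carlo estimator can be tested statistically: rather than asserting an exact value, the test asserts that the estimate falls within a few standard errors of the known truth:

```python
import math
import random

def mc_pi(n, seed=0):
    """Monte Carlo estimate of pi by sampling points in the unit square."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / n

def within_tolerance(estimate, truth, n, p, z=5.0):
    """Check that |estimate - truth| <= z standard errors.

    p is the per-sample success probability (pi/4 here); the factor 4
    rescales the binomial standard error to the estimator's scale.
    """
    se = 4.0 * math.sqrt(p * (1.0 - p) / n)
    return abs(estimate - truth) <= z * se
```

A fixed seed keeps the test reproducible, while the z-multiplier trades false alarms against sensitivity to real bugs.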

This paper proposes methods for searching for VOCALOID creators who publish music videos on the Niconico video-hosting service. For VOCALOID creator search, the user can utilize three clues: VOCALOID character name, music genre, and impressions. We defined the music genres by extending generic digital-music genres in consideration of the social tags annotated on VOCALOID music videos. We also implemented an SVM-based music-impression estimator that utilizes viewer comments and achieves an F-value above 0.8. We compared the proposed method with three baseline methods on 12 search tasks and clarified the effectiveness of music genres and impressions.
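To convey the flavor of such an estimator (a sketch only: the actual system's features, kernel, and training data differ), a linear SVM over bag-of-words comment features can be trained in plain Python with the Pegasos sub-gradient method:

```python
from collections import defaultdict

def featurize(text):
    """Bag-of-words feature vector as a sparse dict."""
    f = defaultdict(float)
    for tok in text.lower().split():
        f[tok] += 1.0
    return f

def train_svm(data, lam=0.01, epochs=200):
    """Pegasos sub-gradient training of a linear SVM.

    data is a list of (sparse feature dict, label in {+1, -1}) pairs.
    """
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(w[k] * v for k, v in x.items())
            for k in w:                      # regularization shrinkage
                w[k] *= (1.0 - eta * lam)
            if margin < 1:                   # hinge-loss sub-gradient step
                for k, v in x.items():
                    w[k] += eta * y * v
    return w

def predict(w, x):
    return 1 if sum(w[k] * v for k, v in x.items()) >= 0 else -1
```

On a toy corpus with separable vocabularies, the learned weights cleanly split the two impression classes.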

Nowadays, anybody can easily express their opinions publicly through consumer-generated media. As a result, floods of criticism on the Internet, called flaming, frequently occur. Although there is strong demand for flaming management, i.e., services that reduce the damage caused by flaming after it occurs, doing so properly is very difficult in practice. We instead aim to prevent flaming before it happens; to this end, it is necessary to identify the situations and remarks that are likely to cause it. Concretely, we propose methods to identify tweets that are likely to trigger flaming on Twitter, taking public opinion among Twitter users into account. Among the three categories of flaming, our main focus is Struggles between Conflicting Values (SBCV), defined as a remark that forces one's own opinion about a topic on others. Forecasting this type of flaming is particularly desirable because most of its victims are celebrities, who must take care of their public images. We proceed with a working hypothesis: an SBCV flaming is caused by a gap between the polarity of the remark and that of public opinion. First, we visualize the process by which a remark gets flamed when its content is far from public opinion, by means of our original parameter, daily polarity (dp). Second, we build a highly accurate flaming-prediction model with decision-tree learning, using cumulative dp as an attribute along with parameters available from the Twitter APIs. The experimental results suggest that the hypothesis is correct.
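As a rough illustration only (the paper defines dp precisely; here we assume dp is a per-day mean polarity of on-topic tweets, and all names are ours), the cumulative-dp and polarity-gap attributes could be computed as:

```python
def daily_polarity(scores):
    """dp sketch: mean sentiment polarity of one day's on-topic tweets.

    Illustrative assumption; the paper's actual definition may differ.
    """
    return sum(scores) / len(scores) if scores else 0.0

def flaming_features(remark_polarity, days):
    """Build two candidate attributes for the prediction model.

    days is a list of per-day lists of tweet polarity scores.
    """
    dps = [daily_polarity(day) for day in days]
    cumulative_dp = sum(dps)
    # gap between the remark and the most recent public opinion
    gap = remark_polarity - (dps[-1] if dps else 0.0)
    return {"cumulative_dp": cumulative_dp, "gap": gap}
```

A large positive gap would mark a remark whose polarity runs against prevailing opinion, the hypothesized flaming trigger.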

When the presence and actions of an android approach those of a human, the android can elicit multi-modal actions from humans. How do human actors interact with an android so as to organize the interaction and accept the android as a social actor? We observed the development process of the play ``Three Sisters, Android Version'' and analyzed the multi-modal interaction between the android and the human actors during that process. As a result, we found that the actors expressed their assessment of the android's human-likeness through their utterances and body movements, and that the border between human and machine was expressed differently in each modality. Moreover, these expressions were not a one-way product of the writer and director, but the product of repeated interactions between the actors and the android through practice and rehearsals. Finally, we discuss the possibility of ``media equation'' studies based on direct observation of human-machine interaction.

In this study, we developed a new method for long-term market analysis based on text mining of news articles. Using our method, we conducted extrapolation tests to predict average stock prices for 19 industry sectors and two market averages, TOPIX and Nikkei 225, over about 10 years. As a result, 8 of the 21 sectors (about 40%) showed accuracy above roughly 60%, and 15 of the 21 sectors (over 70%) showed accuracy above roughly 55%. We also developed a financial text-mining web system based on our method for financial professionals.

This paper focuses on a classification problem for volatile time series. Among the most popular approaches for time-series classification are dynamic time warping and feature-based machine-learning architectures. In many previous studies, these algorithms have performed satisfactorily on various datasets. However, most of these methods are not suitable for chaotic time series, because superficial changes in measured values are not essential for such series. In general, time-series datasets include both chaotic and non-chaotic series; thus, it is necessary to extract more essential features of a time series. In this paper, we propose a new approach to volatile time-series classification. Our approach generates a novel feature by extracting the structure of the attractor using topological data analysis, thereby representing the transition rules of the time series. As this feature captures an essential property of the system underlying the time series, our approach is effective for both chaotic and non-chaotic types. We applied a learning architecture inspired by convolutional neural networks to this feature and found that the proposed approach improves performance on a human activity recognition problem by 18.5% compared with conventional approaches.
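The attractor structure is extracted from a point cloud obtained by delay embedding of the scalar series. A minimal sketch of this first step follows (the topological data analysis applied to the resulting point cloud is not shown):

```python
def delay_embed(x, dim, tau):
    """Map a scalar series to points in R^dim by delay embedding:
    the i-th point is (x[i], x[i + tau], ..., x[i + (dim-1) * tau])."""
    n = len(x) - (dim - 1) * tau
    return [tuple(x[i + j * tau] for j in range(dim)) for i in range(n)]
```

The embedded point cloud approximates the attractor, so its shape, rather than the raw measured values, carries the transition rules of the system.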

Pokémon is one of the most famous video games, with more than 3.4 million players around the world. An interesting part of this game is guessing the invisible information and the character of the opponent. However, the existing non-player character (NPC) of this game is not a good alternative to a human opponent, because the NPC lacks variety in its characteristics. In this paper, we propose a novel method to represent the reflective-impulsive characteristics of an NPC through differences in the initial prior distribution of the Bayesian estimation used for the NPC's decision-making. In our experiment, we asked human players to play against three types of the proposed NPC and to report their impressions of them. As a result, the players formed different impressions of the three NPC types, although they could not identify the types themselves (reflective, intermediate, impulsive).
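A minimal sketch of the mechanism, assuming a Beta-Binomial model of the opponent's action choice (the game's actual decision model is richer): a flat prior lets the posterior swing after a few observations (impulsive), while a concentrated prior keeps it near its initial value (reflective).

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of P(opponent picks action A) under a Beta prior.

    prior_a, prior_b are the Beta pseudo-counts; successes is how often
    action A was observed in the given number of trials.
    """
    return (prior_a + successes) / (prior_a + prior_b + trials)
```

After seeing action A three times in three turns, a Beta(1, 1) NPC estimates 0.8 while a Beta(20, 20) NPC estimates about 0.53, so the former commits to a counter-move much sooner.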

The purpose of this paper is to test the spiral of silence theory in Internet society. Even today, Noelle-Neumann's spiral of silence theory is an important topic in the formation of public opinion. In spiral of silence research to date, the willingness to speak out has been handled as the dependent variable. However, it is questionable to what extent the willingness to speak out actually influences the number of times a person speaks out. In addition, snowball sampling has been used, even with regard to the distribution of opinions of persons close to an individual. Accuracy increases because the attitudes of directly connected close users can be studied; however, only a small portion of close users can be studied, and a further defect of this approach is that it is quite costly. We instead use the actual number of `tweets' on Twitter, rather than the willingness to speak out, as the dependent variable. In addition, for the attitudes of close users, we used machine learning to estimate the attitudes of the persons each user came in contact with, and we quantified homogeneity. We combined social surveys and behavior-log analysis, which allowed us to incorporate into a single model both 1) individuals' internal states, which can only be clarified by a questionnaire, and 2) the actual quantity of behavior and the structure of communication networks, which can only be clarified through the analysis of behavior logs. As a result, we found that a person's perception that their opinion is in the majority, together with the estimated homogeneity, had a positive effect on the number of times a person spoke out. Our results suggest that a spiral of silence exists with regard to actual speaking out on Twitter.

At the time of the Great East Japan Earthquake, many tweets about the disaster were posted, and Twitter was effectively utilized as an infrastructure for sharing disaster information and confirming safety. However, Twitter carries various kinds of information in extremely large volumes, so technologies that efficiently extract disaster information or filter users according to the purpose of use are essential for Twitter to be effective at the time of a disaster. In particular, some kind of filtering mechanism that easily captures real humans' voices is assumed to be important for getting better performance out of Twitter at such times. The aim of this study is to numerically express the characteristics of Twitter users by applying the concept of entropy to each user's tweeting, replying, and retweeting activities, which are assumed to be the source of Twitter's real-time nature; to show the details of Twitter users' activities at the time of a disaster; and to verify the possibility of using this method for automatic user filtering. We analyze real Twitter data distributed around the time of the earthquake; in particular, in this paper we examine the differences in user attributes among bots, cyborgs, and humans. The experimental results clarified the characteristics of Twitter users with multidimensional quantitative values and showed the possibility of automatic user filtering.
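Assuming the characteristic in question is the Shannon entropy of a user's distribution over activity types (a sketch with illustrative names, one dimension of the multidimensional values):

```python
import math
from collections import Counter

def activity_entropy(events):
    """Shannon entropy (bits) of a user's mix of activity types,
    e.g. "tweet", "reply", "retweet"."""
    counts = Counter(events)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A bot that only posts original tweets has entropy 0, while a human mixing the three activity types evenly scores log2(3), about 1.58 bits, so low entropy is one signal of automated accounts.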

The home locations of Twitter users can be estimated using a social network generated by various relationships between users. Many network-based location estimation methods exploit such user relationships; however, the estimation accuracy of the various methods and relationships is unclear. In this study, we estimate users' home locations using four network-based location estimation methods on four types of social networks in Japan. We obtained two results: (1) among the location estimation methods, the method that selects the most frequent location among the friends of the user shows the highest precision and recall; (2) among the four types of social networks, the follower relationship yields the highest precision and recall.
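The best-performing method, which selects the most frequent home location among a user's friends, can be sketched as follows (the data-structure names are ours):

```python
from collections import Counter

def estimate_home(user, friends_of, home_of):
    """Assign the most frequent known home location among a user's friends.

    friends_of maps a user id to a list of friend ids; home_of maps a
    user id to a known home location (users without one are skipped).
    """
    locs = [home_of[f] for f in friends_of.get(user, []) if f in home_of]
    if not locs:
        return None
    return Counter(locs).most_common(1)[0][0]
```

Returning None when no friend has a known location makes coverage explicit, which matters when comparing precision and recall across methods.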

We suggest the use of onomatopoeia as an important tool in psychotherapy. Mood disorders and anxiety disorders are among the most prevalent mental disorders, and behavior therapy (BT) is an evidence-based psychological treatment suitable for these cases. Interoceptive sensation is important in BT because it serves as a barometer of responses. On the other hand, standard assessment methods such as the subjective units of disturbance scale (SUDs) are not optimal: when feeling an emotion, we simultaneously feel a bodily form of it, e.g., doki-doki (heart-pounding), yet the SUDs is assessed without taking such somesthesis into consideration. In addition, BT requires information on somesthesis in order to perform the therapy optimally. Here we propose a solution to this problem based on using onomatopoeia together with the SUDs, which makes it possible to appropriately assess the interoceptive sensations that accompany a patient's anxiety. We report two clinical cases using onomatopoeia for the SUDs, in which this approach improved the therapy. Because the internal sense appears during the course of the disease, a treatment that is not tied to a diagnosis name, but instead emphasizes the ``internal sense,'' is more effective in producing an improvement toward a cure.

Cooperative behaviors are common in humans and are fundamental to our society. Theoretical and experimental studies have modeled environments in which the behaviors of humans, or agents, have been restricted to analyze their social behavior. However, it is important that such studies are generalized to less restrictive environments to understand human society. Social network games (SNGs) provide a particularly powerful tool for the quantitative study of human behavior. In SNGs, numerous players can behave more freely than in the environments used in previous studies; moreover, their relationships include apparent conflicts of interest and every action can be recorded. We focused on reciprocal altruism, one of the mechanisms that generate cooperative behavior. This study aims to investigate cooperative behavior based on reciprocal altruism in a less restrictive environment. For this purpose, we analyzed the social behavior underlying such cooperative behavior in an SNG. We focused on a game scenario in which the relationship between the players was similar to that in the Leader game. We defined cooperative behaviors by constructing a payoff matrix in the scenario. The results showed that players maintained cooperative behavior based on reciprocal altruism, and cooperators received more advantages than noncooperators. We found that players constructed reciprocal relationships based on two types of interactions, cooperative behavior and unproductive communication.
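As an illustration (the ordinal payoffs below are a hypothetical Leader-type matrix, not the values constructed in the paper), reciprocal cooperation can be measured as the rate of role-swapping between consecutive rounds:

```python
# Hypothetical ordinal payoffs (4 = best) for a Leader-type game:
# each player prefers to "lead" while the other "waits", mutual
# leading is the worst outcome for both.
PAYOFF = {
    ("lead", "wait"): (4, 3),
    ("wait", "lead"): (3, 4),
    ("wait", "wait"): (2, 2),
    ("lead", "lead"): (1, 1),
}

def reciprocity(history):
    """Fraction of consecutive rounds in which the two players swap
    their (distinct) roles, a signature of reciprocal altruism."""
    if len(history) < 2:
        return 0.0
    swaps = sum(1 for (a, b), (c, d) in zip(history, history[1:])
                if (c, d) == (b, a) and a != b)
    return swaps / (len(history) - 1)
```

A pair alternating between (lead, wait) and (wait, lead) scores 1.0, capturing the turn-taking that lets both players do better than mutual defection.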

Japanese “onomatopoeic” words (also called mimetics and ideophones) are more frequent in spoken discourse, especially in informal daily conversations, than in writing. It is a common belief that onomatopoeia is particularly frequent in some areas, such as the Kinki region. To examine the plausibility of this folk dialectology, we investigated the frequency of onomatopoeia in the Minutes of the Diet as a corpus of spoken Japanese. We examined whether there is really a difference in the use of onomatopoeia among the eleven major regions of Japan. We analyzed the conversation data (limited to the last two decades) according to the hometowns of the speakers. The results revealed no cross-regional difference in the overall frequency of onomatopoeia and non-onomatopoeic adverbs. However, a particular morphological type of onomatopoeia, i.e., “emphatic” onomatopoeia such as hakkiri ‘clearly’, did show regional variation in frequency. The results suggest that different types of onomatopoeia have different functions. The present study introduced a “macro-viewpoint” method based on a large-scale database. Further investigations into the functional aspects of onomatopoeia will also benefit from a dialectological method that adopts a “micro-viewpoint,” based on detailed descriptions of a small number of speakers from each region. We hope that the present quantitative approach to the sociolinguistics of onomatopoeia will offer a new perspective on dialectology and on the effective utilization of onomatopoeia in the field of information science.

Topic models are generative models of documents that automatically cluster frequently co-occurring words (topics) from corpora. Topics can be used as stable features representing the substance of documents, so topic models have been extensively studied as a technology for extracting latent information behind large data. Unfortunately, the typical time complexity of topic-model computation is the product of the data size and the number of topics, so the traditional Markov chain Monte Carlo (MCMC) method cannot estimate many topics on large corpora within a realistic time. The data size is a common concern in Bayesian learning, and there are general approaches to address it, such as variational Bayes and stochastic-gradient MCMC. On the other hand, the number of topics is a problem specific to topic models, and most solutions have been proposed for the traditional Gibbs sampler. However, it is natural to solve both problems at once, because as the data size grows, so does the number of topics in a corpus. Accordingly, we propose new methods that cope with both data and topic scalability by applying fast Gibbs-sampler computing techniques to stochastic-gradient MCMC. Our experiments demonstrate that the proposed method outperforms the state of the art in traditional MCMC in the mini-batch setting, showing a better mixing rate and faster updates.
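For reference, the traditional collapsed Gibbs sampler, whose per-token cost is linear in the number of topics (the bottleneck the proposed methods attack), can be sketched as follows; this is the baseline, not the proposed method:

```python
import random

def lda_gibbs(docs, K, V, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA over docs of word ids in [0, V)."""
    rng = random.Random(seed)
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    ndk = [[0] * K for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(K)]    # topic-word counts
    nk = [0] * K                         # topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove the current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # full conditional p(z_di = t | rest): O(K) per token
                p = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                     / (nk[t] + V * beta) for t in range(K)]
                r = rng.random() * sum(p)
                k = 0
                while k < K - 1 and r > p[k]:
                    r -= p[k]; k += 1
                z[d][i] = k  # record the new assignment
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return nkw
```

The O(K) inner loop, repeated over every token of the corpus, is exactly the product of data size and topic count described above.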

We describe a procedure for constructing a website for publishing open data by focusing on the case of Open DATA METI, a website of the Ministry of Economy, Trade, and Industry. We developed two sites for publishing open data: a data catalog site and one for searching linked open data (LOD). The former allows users to find the relevant data they want to use, and the latter allows them to utilize the found data by connecting them. To implement the data catalog site, we constructed a site tailored to the needs of the organization, and then extracted a large amount of metadata from the individual open data and put it on the site. These activities would have taken a long time with existing methods, so we devised our own solutions for them. To implement the LOD search site, we converted the data into LOD form in the Resource Description Framework (RDF). We focused on converting statistical data published as tables, which are widely used. Regarding the conversion, several kinds of missing information needed to be associated with the data in the tables, so we created a template for incorporating the information necessary for LOD into the original table. The conversion into LOD was then performed automatically using the template.
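A minimal sketch of the table-to-RDF step, emitting N-Triples (one concrete RDF syntax); the base URI and property scheme here are hypothetical, not those of Open DATA METI:

```python
def table_to_ntriples(base, headers, rows):
    """Convert a simple statistics table to N-Triples.

    Each row becomes a subject URI; each column becomes a property.
    All URIs minted under `base` are illustrative placeholders.
    """
    triples = []
    for i, row in enumerate(rows):
        subj = f"<{base}/row/{i}>"
        for h, v in zip(headers, row):
            pred = f"<{base}/prop/{h}>"
            obj = f"\"{v}\""  # plain string literal
            triples.append(f"{subj} {pred} {obj} .")
    return triples
```

In practice the template described above supplies the missing context (units, reference areas, time periods) that raw table cells lack, so real output would carry typed literals and shared vocabulary URIs rather than ad hoc ones.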

Humans develop their concept of an object by classifying it into a category, and acquire language by interacting with others at the same time. Thus, the meaning of a word can be learnt by connecting the recognized word and concept. We consider such an ability to be important in allowing robots to flexibly develop their knowledge of language and concepts. Accordingly, we propose a method that enables robots to acquire such knowledge. The object concept is formed by classifying multimodal information acquired from objects, and the language model is acquired from human speech describing object features. We propose a stochastic model of language and concepts, and knowledge is learnt by estimating the model parameters. The important point is that language and concepts are interdependent: there is a high probability that the same words will be uttered about objects in the same category, and similarly, objects to which the same words are uttered are highly likely to have the same features. Using this relation, the proposed method can improve the accuracy of both speech recognition and object classification. However, it is difficult to directly estimate the parameters of the proposed model because it requires many parameters. Therefore, we approximate the proposed model and estimate its parameters using a nested Pitman--Yor language model and multimodal latent Dirichlet allocation to acquire the language and concepts, respectively. The experimental results show that the accuracy of speech recognition and object classification is improved by the proposed method.
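A minimal sketch of the interdependence (the distributions and names here are hypothetical; the actual model uses a nested Pitman--Yor language model and multimodal LDA): speech-recognition hypotheses can be re-ranked by how probable their words are under the object's inferred categories:

```python
def rescore(hypotheses, p_word_given_cat, p_cat):
    """Pick the hypothesis (word list) most probable under a mixture
    of category-conditional word distributions.

    p_cat maps category -> prior probability for the observed object;
    p_word_given_cat maps category -> {word: probability}. Unseen
    words get a small floor probability.
    """
    def score(words):
        s = 0.0
        for c, pc in p_cat.items():
            prod = pc
            for w in words:
                prod *= p_word_given_cat[c].get(w, 1e-6)
            s += prod
        return s
    return max(hypotheses, key=score)
```

Words commonly uttered about the object's likely category pull their hypothesis up, which is the direction in which concepts improve recognition; the reverse direction, words improving classification, works symmetrically.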

Using emotional expressions in a conversation is an efficient way to convey one's thoughts. The emotional expressions of a persuader have a strong impact on the recipient's attitude in a negotiation. Studies of persuasive dialog systems, which try to lead users to the system's specific goals, show that incorporating users' emotional factors can enhance the system's ability to persuade. However, in human-human negotiation, the persuader can obtain better outcomes not only by considering the other person's emotions but also by expressing his or her own. In this paper, we propose an example-based persuasive dialog system capable of expressing emotion. The proposed system is trained on a newly collected corpus by statistical learning, in which emotional states and the user's acceptance of the persuasion are annotated. Experimental results obtained through crowdsourcing suggest that the system using emotional expressions can effectively persuade some users, namely those who prefer emotional expressions to be used.