Experimentation is a valuable tool in computer science. Random experimentation is, however, a danger. This post examines current research practices and casts doubt on their benefits. Inspired by the French writer Rousseau (1972), who held that society corrupts the individual, I am convinced that the experimental community corrupts the researcher.

The scientific method has evolved throughout the centuries, and philosophers have had a distinguished role in that change by questioning the beliefs used to guide discoveries and by challenging new ways of thinking, sometimes without providing any answers, for the pleasure of asking questions (Russell 1997). This passion and awakened mind are missing in the current computer science community. Empiricists, especially, write large amounts of plain technical reports tracing experiments, oblivious to the beauty of essays and the excitement of sharing outlandish ideas. Going through the literature has often become a mechanical skimming and scanning task: a search for the decimals by which the proposed techniques outperform their competitors. These results sustain a sort of research based on mere parameter tuning of established algorithms.
The purpose of this viewpoint is to show the disenchantment that would-be researchers—even tenured researchers—suffer and to denounce the proliferation of questionable practices that are killing innovation.
We first review the effect of the modern obsession for publishing and to what extent academic research has distorted experimental science. Next, we see what calls the current methodology into question and why approaches are simply ignored or incomprehensibly revived.

The perversion of the community
Pure research is becoming less attractive nowadays. Many research lines are abandoned since investors are more interested in applications, despite the relevance of fundamental investigation. Hence, the groups that subsist do so because either their research leads in applied domains or their volume of publications is high. What is behind these numbers? Parnas (2007) makes a strong point about them and guts every single perversion of the community (authorship in pacts, monthly instalments, tailor-made conferences) that has encouraged superficial research: overly large groups, repetition, insignificant studies, half-baked ideas… Publish or perish has wreaked havoc on daily investigation, at least in the halls of European academia.
Eventually, impact factors, the h-index, and the g-index have become the fallacious indicators of good research and good researchers, and have fired up the paper factory. Fresh Ph.D. students become burdened junk writers as soon as they learn that their career will be measured by these statistics. The pressure is intensified by supervisors, assessed by the same yardstick, who need to keep CVs up to date or repay colleagues in the favour chain, which promotes quantity over substance.
This compulsive publishing has plagued conferences and journals with so many papers that it is getting difficult to track innovative ideas. The more one reads, the more one bumps into similar attempts, déjà vus—which slow down the learning curve and discourage further reading. Showing off the abilities of regular methods to non-technical experts and cherry-picking results from much wider experimentation are the most common schemes. The raison d’être of empiricism has been abused and now entails repeated preliminary results with no further continuation.

Experimental computer science
Experimental computer science, defined as “an apparatus to be measured, a hypothesis to be tested, and systematic analysis of the data (to see whether it supports the hypothesis)” by Denning (1980), is recurrent in machine learning, algorithm development, and software engineering. Nevertheless, experimental methodology has been twisted; instead of sustaining conjectures, experiments are run to provide material to decide them retroactively, that is, to build a posteriori theories.
Machine learning, for instance, is based on trials with performance measures, learners, and data. The combination of these elements led Langley (1988) to encourage practitioners to embrace empirical testing as a process of theory formation. Competition testing, a term coined by Hooker (1995) in relation to heuristics, has been the chaotic outcome of that call. Many years later, no new learning paradigm has been introduced, some progress in standards has been made, and micro-tuning of the existing techniques is the trendy research, the latter being the gold mine for publications. Superiority of techniques is usually claimed following a three-step procedure: selection of a few data sets, selection of reference learners to compare with, and extraction of performance conclusions supported by erroneous statistical tests. With a pessimistic but very realistic description of the scene, Demsar (2008) warned of the misuse of such experimentation. Conventional statistical tests are designed for single comparisons in isolation; they are ill-suited to multiple comparisons.
Hypothesis testing is useful to say whether the apparent accuracy of a learner is likely due to chance, but its reliability goes down as the number of data sets, and hence of tests, increases. It is therefore worth determining what the ideal size of the test set is and what problems have to be involved, and strengthening the testing methodology with sufficient data analysis. These old claims are things that one expects to be delighted with when reading papers. Yet they are complicated milestones, and many negative results are derived from such studies. Although negative results are also meaningful for progress, the community does not consider them. This forces researchers to move back to the classical developments. In addition, groundless rejections cause frustration in researchers, which is reflected in their subsequent reviews. In turn, after being taught that going against the mass culture is not profitable, they will unwittingly stop promising ideas, frustrating new generations again.
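As a minimal illustration of why uncorrected multiple comparisons mislead, the sketch below computes the probability of at least one spurious "significant" result when k tests are each run at level alpha, together with the Bonferroni-corrected per-test level. The independence of the tests, the function names, and the example values (alpha = 0.05, k up to 20) are assumptions made for illustration; they are not taken from the works cited above.

```python
def family_wise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one false positive among k tests.

    Assumes the k tests are independent and each is run at level alpha.
    """
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / k


if __name__ == "__main__":
    # Illustrative values only: a 0.05 level and a growing number of tests.
    for k in (1, 5, 10, 20):
        fwer = family_wise_error_rate(0.05, k)
        corrected = bonferroni_alpha(0.05, k)
        print(f"k={k:2d}  uncorrected FWER={fwer:.3f}  Bonferroni level={corrected:.4f}")
```

With ten independent tests at the 0.05 level, the chance of at least one false positive is already about 40%, which is why a claim of superiority drawn from many uncorrected pairwise tests deserves suspicion.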

Gaming the system in lieu of research
The clout of journals and reviewers, and the inertia of the scientific community as a society, have a lot to do with how incoming contributions are validated.
Current research is like politics: each tendency has its own press. No matter the thoroughness of the content, if the work submitted to a journal is not aligned with the thinking of its staff, it will never get the green light. This results in contributions focused on pre-empting reviewers’ opinions rather than on disseminating the work. Demsar (2008) suggests the web-to-peer review. This unlikely idea, which appears to enable critical and fair evaluations of “correctness, interestingness, usefulness, beauty, novelty”, also evidences the urge to adopt other measures of productivity and recognition to put an end to the fake tenure of rigour and biased opinions. The new peer-review process should give back credibility to publications, and researchers should not be able to game it.
Indeed, references have a crucial role in the shallow statistics above. Everyone knows they provide the information for the productivity computation. Thus, self-citations, citations to friends and the community clique, or citations to particular journals are some of the mechanisms used to climb the rankings. Citing has lost its purpose: guiding the reader to the background necessary to understand the paper.
A reinterpreted experimental science and a deep knowledge of the system have been the means for academic researchers to satisfy demanding productivity targets. Unfortunately, this praxis is learnt by the new generation of researchers, who will mistake research for poor scientific journalism or scientific patter. Publications should be the recognition of mature work and should slow down to gain in quality.

This thesis takes a close look at data complexity and its role in shaping the behaviour of machine learning techniques in supervised learning, and explores the generation of synthetic data sets through complexity estimates. The work has been built upon four principles which have naturally followed one another. (1) A critique of the current methodologies used by the machine learning community to evaluate the performance of new learners unleashes (2) the interest in alternative estimates based on the analysis of data complexity and its study. However, both the early stage of the complexity measures and the limited availability of real-world problems for testing inspire (3) the generation of synthetic problems, which becomes the backbone of this thesis, and (4) the proposal of artificial benchmarks resembling real-world problems.

The ultimate goal of this research flow is, in the long run, to provide practitioners (1) with some guidelines to choose the most suitable learner given a problem and (2) with a collection of benchmarks to either assess the performance of the learners or test their limitations.

DCoL provides the implementation of a set of measures designed to characterize the apparent complexity of data sets for supervised learning, which were originally proposed by Ho and Basu (2002). More specifically, the implemented measures focus on the complexity of the class boundary and estimate (1) the overlaps in the feature values from different classes, (2) the class separability, and (3) the geometry, topology, and density of manifolds. In addition, two complementary functionalities, (4) stratified k-fold partitioning and (5) routines to transform m-class data sets (m > 2) into m two-class data sets, are included in the library. The source code can be compiled on multiple platforms (Linux, Mac OS X, and MS Windows) and can be easily configured and run from the command line.
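To make three of these functionalities concrete, here is a small Python sketch: the maximum Fisher's discriminant ratio, one of the feature-overlap measures described by Ho and Basu (2002), a one-vs-rest decomposition of an m-class labeling, and a deterministic stratified fold assignment. This is not DCoL's code or interface; the function names, the use of population variance, and the round-robin fold assignment are assumptions made for illustration.

```python
from statistics import mean, pvariance
from typing import Dict, Hashable, List, Sequence


def fisher_ratio_f1(X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> float:
    """Maximum over features of (mu_a - mu_b)^2 / (var_a + var_b), two classes only.

    Population variance is an assumption of this sketch.
    """
    classes = list(dict.fromkeys(y))  # distinct labels, in order of appearance
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a, b = classes
    best = 0.0
    for j in range(len(X[0])):
        col_a = [row[j] for row, label in zip(X, y) if label == a]
        col_b = [row[j] for row, label in zip(X, y) if label == b]
        num = (mean(col_a) - mean(col_b)) ** 2
        den = pvariance(col_a) + pvariance(col_b)
        if den == 0:
            # Zero spread: perfectly separated if the means differ at all.
            ratio = float("inf") if num > 0 else 0.0
        else:
            ratio = num / den
        best = max(best, ratio)
    return best


def one_vs_rest(y: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    """Turn an m-class labeling into m two-class labelings (1 = class, 0 = rest)."""
    return {c: [1 if label == c else 0 for label in y] for c in dict.fromkeys(y)}


def stratified_folds(y: Sequence[Hashable], k: int) -> List[List[int]]:
    """Assign instance indices to k folds, dealing each class out round-robin."""
    by_class: Dict[Hashable, List[int]] = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    slot = 0
    for indices in by_class.values():
        for i in indices:
            folds[slot % k].append(i)
            slot += 1
    return folds
```

A high ratio on any single feature suggests that the two classes barely overlap along that feature, while a value close to zero on every feature suggests heavy overlap. The fold assignment above does not shuffle, so a real partitioning routine would normally randomize the order within each class first.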

Practitioners are encouraged to consider the use of this software in the analysis of their data. A closer reading of data complexity can help them to understand the performance of machine learning techniques and their behavior.

“My name is Marc and I come from Andorra” is the new “Yes, we can” at NYU Stern School of Business. Last week, Marc Visent won the election for block leader and became the first Andorran representative in the history of this university.

It could seem an uninteresting and isolated fact, but this young, maybe not-so-young, student could be a candidate to lead my country to a thriving future, so I think it is worth watching his progress and how well he does in the adventure that he undertook last July.

A graduate in both computer engineering and law, with a brilliant three-year career at a prestigious law firm, Marc decided to give a new spin to his background and applied for an MBA at Stern. With a creative application, Marc was accepted and cherry-picked for the summer start. In just a couple of months, this born storyteller has captivated the student community and has turned Andorra into his trademark, to the extent that his classmates acknowledge that to fit into Stern culture one should make sure to know where Andorra is.

So far, as an Andorran, I feel proud of him and firmly believe that we need more ambassadors like him. Many congratulations, Marc, and keep working hard!

Classifier domains of competence: The landscape contest is a research competition aimed at finding out the relation between data complexity and the performance of learners. Comparing your techniques to those of other participants on targeted-complexity problems may enrich our understanding of the behavior of machine learning techniques and open further research lines.

The contest will take place on August 22, during the 20th International Conference on Pattern Recognition (ICPR 2010) in Istanbul, Turkey.

The landscape contest involves the running and evaluation of classifier systems over synthetic data sets. Over the last two decades, the pattern recognition and machine learning communities have developed many supervised learning techniques. Nevertheless, the competitiveness of such techniques has always been claimed over a small and repetitive set of problems. This contest provides a new and configurable testing framework, reliable enough to test the robustness of each technique and detect its limitations.

INSTRUCTIONS FOR PARTICIPANTS

Contest participants are allowed to use any type of technique. However, we highly encourage and appreciate the use of novel algorithms.

Participants are required to submit the results by email to the organizers.

Submission e-mail: nmacia@salle.url.edu

Submission deadline: Wednesday May 26, 2010

The contest is divided into two phases: (1) offline test and (2) live test. For the offline test, participants should run their algorithms over two sets of problems, S1 and S2. However, the real competition, the live test, will take place during the conference. Two more collections of problems, S3 and S4, will be presented.

S1: Collection of data sets spread along the complexity space to train the learner. All the instances will be duly labeled.

S2: Collection of data sets spread along the complexity space with no class labeling to test the learner performance.

S3: Collection of data sets with no class labeling, like S2, to be run for a limited period of time.

S4: Collection of data sets with no class labeling covering specific regions of the complexity space to determine the neighborhood dominance.

For the offline test, the results report consists of:

1. Labeling the data sets of the collection S2.

The procedure is the following:

Train the learner using Dn-trn.arff in S1.

Provide the rate of the correctly classified instances over a 10-fold cross validation.

Label the corresponding data set Dn-tst.arff in S2.

Store the n models generated for each data set to perform the live contest on August 22. Be ready to load them on this day.

2. Describing the techniques used.

A brief summary (1 to 2 pages) of the machine learning techniques used in the experiments must be submitted. We expect details such as the learning paradigm, configuration parameters, strengths and limitations, and computational cost.
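The offline procedure in item 1 above (train, report 10-fold cross-validation accuracy, label the test set, store the model) can be sketched as follows. This is a hedged illustration, not the organizers' tooling: it assumes the ARFF files have already been parsed into plain Python lists, uses a nearest-centroid classifier purely as a placeholder learner, and all function names are hypothetical.

```python
import pickle
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Sequence[float]
Model = Dict[Hashable, List[float]]


def train_nearest_centroid(X: Sequence[Row], y: Sequence[Hashable]) -> Model:
    """Placeholder learner: the model is one centroid per class."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model: Model, row: Row) -> Hashable:
    """Return the label of the closest centroid (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, model[c])))


def cross_val_accuracy(X: Sequence[Row], y: Sequence[Hashable], k: int = 10) -> float:
    """Rate of correctly classified instances over a k-fold cross-validation."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th instance is held out
        X_trn = [X[i] for i in range(n) if i not in test_idx]
        y_trn = [y[i] for i in range(n) if i not in test_idx]
        model = train_nearest_centroid(X_trn, y_trn)
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / n


def run_offline_test(
    X_trn: Sequence[Row], y_trn: Sequence[Hashable], X_tst: Sequence[Row]
) -> Tuple[float, List[Hashable], bytes]:
    """One data set Dn: CV accuracy on S1, labels for S2, and the serialized model."""
    accuracy = cross_val_accuracy(X_trn, y_trn, k=10)
    model = train_nearest_centroid(X_trn, y_trn)  # final model uses all of Dn-trn
    labels = [predict(model, row) for row in X_tst]
    return accuracy, labels, pickle.dumps(model)  # bytes can be written to disk
```

Calling `run_offline_test` once per data set yields the n stored models that the instructions ask participants to keep ready for the live test; `pickle.loads` restores each one on the day.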

IMPORTANT DATES

* May 26, 2010: Deadline for submission of the results and technical report

Classifier domains of competence: The landscape contest is a research competition aimed at finding out the relation between data complexity and the performance of learners. Comparing your techniques to those of other participants may enrich our understanding of the behavior of machine learning and open further research lines. Contest participants are allowed to use any type of technique. However, we highly encourage and appreciate the use of novel algorithms.

We are planning to hold a one-day workshop during ICPR 2010, so that participants will be able to present and discuss their results.

We encourage everyone to participate and share your work with us! For further details about dates and submission, please visit The landscape contest webpage.

How to design the engineer of the future was the subject of the workshop EF2009, which was held on November 12 at La Salle, in Barcelona.

The two keynotes, “How can engineering education address the challenges of the 21st century?” by Ms. Lueny Morell and “The missing basics: What engineers don’t know and why they don’t know them” by Prof. David E. Goldberg, were enlightening and inspiring, and an excellent starting point for professors, students, and engineers eager to take part in a new era for engineering. Both told beautiful and encouraging stories about change, stories that took place in very different lands and under different circumstances but with a common aim, implementing approaches to train the new generation of engineers, and a common villain, the education Establishment.

A passionate Lueny Morell put on an impressive show and captivated her audience. In her experience, the change has to come from professors. Their role is to take care of students, teach them how to face problems, stop punishing failure, and help them to succeed by using innovative techniques.

Creativity, flexibility, communication, and leadership were mentioned many times throughout the workshop as the characteristics of the engineer of the future. This set of soft skills should be present in the engineering education program to give students the capacity for questioning, labeling, modeling, decomposing, measuring, ideating, and communicating, skills that Prof. Goldberg has termed “the missing basics”.

Nowadays, engineers rush to MBAs to acquire these non-technical skills. Taking courses of this kind, however, does not guarantee mastering them. In fact, in the panel debates, the most inspiring ideas came from “the normal guy with no MBA”, Mr. Miguel Vidal. With fresh and creative comments, he pointed out demotivation as a recurrent trait of present-day engineers, feeding Prof. Goldberg’s proposal, in his bottom-up change, for recovering “the joy of engineering”.

Unfortunately, time ran out and the capacity for synthesis was not on the agenda. Nobody gave insight into how to teach and assess these skills, and questions such as how to awaken these abilities in students and whether or not engineers of the past will be able to build the engineer of the future remained unanswered.