Pages

Sunday, November 30, 2014

Do you use simulation as a research tool? If you write papers with results based on simulation and submit them for peer-review, then be warned: if I should review your paper then I will probably recommend it is rejected. Why? Because all of the many simulation-based papers I've reviewed in the last couple of years have been flawed. These papers invariably fall into the pattern: propose new/improved/extended algorithm X; test X in simulation S and provide test results T; on the basis of T declare X to work; the end.

So, what exactly is wrong with these papers? Here are my most common review questions and criticisms.

Which simulation tool did you use? Was it a well-known robot simulator, like Webots or Player-Stage-Gazebo, or a custom written simulation..? It's amazing how many papers describe X, then simply write "We have tested X in simulation, and the results are..."

If your simulation was custom built, how did you validate the correctness of your simulator? Without such validation how can you have any confidence in the the results you describe in your paper? Even if you didn't carry out any validation, please give us a clue about your simulator; is it for instance sensor-based (i.e. models specific robot sensors, like infra-red collision sensors, or cameras)? Does it model physics in 3D (i.e. dynamics), or 2D kinematics?

You must specify the robots that you are using to test your algorithm X. Are they particular real-world robots, like e-pucks or the NAO, or are they an abstraction of a robot, i.e. an idealised robot? If the latter describe that idealised robot: does it have a body with sensors and actuators, or is your idealised robot just a point moving in space? How does it interact with other robots and its environment?

How is your robot modelled in the simulator? If you're using a well-know simulator and one if its pre-defined library robots then this is an easy question to answer. But for a custom designed simulator or an idealised robot it is very important to explain how your robot is modelled. Equally important is how your robot model is controlled, since the algorithm X you are testing is - presumably - instantiated or coded within the controller. It's surprising how many papers leave this to the reader's imagination.

In your results section you must provide some analysis of how the limitations of the simulator, the simulated environment and the modelled robot, are likely to have affected your results. It is very important that your interpretation of your results, and any conclusions you draw about algorithm X, explicitly take account of these limitations. All robot simulators, no matter how well proven and well debugged, are simplified models of real robots and real environments. The so-called reality gap is especially problematical if you are evolving robots in simulation, but even if you are not, you cannot confidently interpret your results without understanding the reality gap.

If you are using an existing simulator then specify exactly which version of the simulator you used, and provide somewhere - a link perhaps to a github project - your robot model and controller code. If your simulator is custom built then you need to provide access to all of your code. Without this your work is unrepeatable and therefore of very limited value.

Ok. At this point I should confess that I've made most of these mistakes in my own papers. In fact one of my most cited papers was based on a simple custom built simulation model with little or no explanation of how I validated the simulation. But that was 15 years ago, and what was acceptable then is not ok now.

Modern simulation tools are powerful but also dangerous. Dangerous because it is too easy to assume that they are telling us the truth. Especially beguiling is the renderer, which provides an animated visualisation of the simulated world and the robots in it. Often the renderer provides all kinds of fancy effects borrowed from video games, like shadows, lighting and reflections, which all serve to strengthen the illusion that what we are seeing is real. I puzzle and disappoint my students because, when they proudly show me their work, I insist that they turn off the renderer. I don't want to see a (cool) animation of simulated robots, instead I want to see (dull) graphs or other numerical data showing how the improved algorithm is being tested and validated, in simulation.

An engineering simulation is a scientific instrument* and, like any scientific instrument, it must be (i) fit for purpose, (ii) setup and calibrated for the task in hand, and (iii) understood - especially its limitations - so that any results obtained using it are carefully interpreted and qualified in the light of those limitations.

In my view we should all be practising level 0 open science - but don't underestimate the challenge of even this minimal set of practices; making data sets and source code, etc, available, with the aim of enabling our work to be reproducible, is not straightforward.

Level 0 open science is all one way, from your lab to the world. Level 1 introduces public engagement via blogging and social media, and the potential for feedback and two-way dialogue. Again this is challenging, both because of the time cost and the scary - if you're not used to it - prospect of inviting all kinds of questions and comments about your work. In my experience the effort is totally worthwhile - those questions often make me really think, and in ways that questions from other researchers working in the same field do not.

Level 2 builds on levels 0 and 1 by adding open notebook science. This takes real courage because it opens up the process, complete with all the failures as well as successes, the bad ideas as well as the good; open notebook science exposes science for what it really is - a messy non-linear process full of uncertainty and doubts, with lots of blind alleys and very human dramas within the team. Have I done open notebook science? No. I've considered it for recent projects, but ruled it out because we didn't have the time and resources or, if I'm honest, team members who were fully persuaded that it was a good idea.

Open science comes at a cost. It slows down projects. But I think that is a good, even necessary, thing. We should be building those costs into our project budgets and work programmes, and if that means increasing the budget by 25% then so be it. After all, what is the alternative? Closed science..? Closed science is irresponsible science.