
A Cognitive Vision System for Space Robotics

Faisal Z. Qureshi (1), Demetri Terzopoulos (1,2), and Piotr Jasiobedzki (3)

(1) Dept. of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada
{faisal,dt}@cs.toronto.edu
(2) Courant Institute, New York University, New York, NY 10003, USA
dt@nyu.edu
(3) MD Robotics Limited, Brampton, ON L6S 4J3, Canada
pjasiobe@mdrobotics.ca

Abstract. We present a cognitively-controlled vision system that combines low-level object recognition and tracking with high-level symbolic reasoning, with the practical purpose of solving difficult space robotics problems: satellite rendezvous and docking. The reasoning module, which encodes a model of the environment, performs deliberation to 1) guide the vision system in a task-directed manner, 2) activate vision modules depending on the progress of the task, 3) validate the performance of the vision system, and 4) suggest corrections to the vision system when the latter is performing poorly. Reasoning and related elements, among them intention, context, and memory, contribute to improved performance (i.e., robustness, reliability, and usability). We demonstrate the vision system controlling a robotic arm that autonomously captures a free-flying satellite. Currently such operations are performed either manually or by constructing detailed control scripts. The manual approach is costly and exposes the astronauts to danger, while the scripted approach is tedious and error-prone. Therefore, there is substantial interest in performing these operations autonomously, and the work presented here is a step in this direction. To the best of our knowledge, this is the only satellite-capturing system that relies exclusively on vision to estimate the pose of the satellite and can deal with an uncooperative satellite.

1 Introduction

Since the earliest days of the field, computer vision researchers have struggled with the challenge of effectively combining low-level vision with classical artificial intelligence. Some of the earliest work involved the combination of image
analysis and symbolic AI to construct autonomous robots [1,2]. These attempts met with limited success because the vision problem was hard, and the focus of vision research shifted from vertically-integrated, embodied vision systems to low-level, stand-alone vision systems. Currently available low- and medium-level vision systems are sufficiently competent to support subsequent levels of processing. Consequently, there is now a renewed interest in high-level, or cognitive, vision, which is necessary if we are to realize autonomous robots capable of performing useful work. In this paper, we present an embodied, task-oriented vision system that combines object recognition and tracking with high-level symbolic reasoning. The latter encodes a symbolic model of the environment and uses the model to guide the vision system in a task-directed manner.

We demonstrate the system guiding a robotic manipulator during a satellite servicing operation involving rendezvous and docking with a mockup satellite under lighting conditions similar to those in orbit. On-orbit satellite servicing is the task of maintaining and repairing a satellite in its orbit. It extends the operational life of the satellite, mitigates technical risks, and reduces on-orbit losses, so it is of particular interest to multiple stakeholders, including satellite operators, manufacturers, and insurance companies. Currently, on-orbit satellite servicing operations are carried out manually; i.e., by an astronaut. However, manned missions usually have a high price tag and there are human safety concerns. Unmanned, tele-operated, ground-controlled missions are infeasible due to communications delays, intermittence, and limited bandwidth between the ground and the servicer. A viable option is to develop the capability of autonomous satellite rendezvous and docking (AR&D). Most national and international space agencies realize the important future role of AR&D and have technology programs to develop this capability [3,4].

Autonomy entails that the on-board
controller be capable of estimating and tracking the pose (position and orientation) of the target satellite and guiding the servicing spacecraft as it 1) approaches the satellite, 2) manoeuvres itself to get into docking position, and 3) docks with the satellite. Our vision system meets these challenges by controlling the visual process and reasoning about the events that occur in orbit; these abilities fall under the domain of cognitive vision. Our system functions as follows: (Step 1) captured images are processed to estimate the current position and orientation of the satellite (Fig. 1); (Step 2) behavior-based perception and memory units use contextual information to construct a symbolic description of the scene; (Step 3) the cognitive module uses knowledge about scene dynamics, encoded using the situation calculus, to construct a scene interpretation; and finally (Step 4) the cognitive module formulates a plan to achieve the current goal. The scene interpretation constructed in Step 3 provides a mechanism to verify the findings of the vision system. The ability to plan allows the system to handle unforeseen situations.

Fig. 1. Images observed during satellite capture. The left and center images were captured using the shuttle bay cameras. The right image was captured by the end-effector camera. The center image shows the arm in hovering position prior to the final capture phase. The shuttle crew use these images during satellite rendezvous and capture to locate the satellite at a distance of approximately 100 m, to approach it, and to capture it with the Canadarm, the shuttle manipulator.

To our knowledge, the system described here is unique inasmuch as it is the only AR&D system that uses vision as its primary sensor and that can deal with an uncooperative target satellite. Other AR&D systems either deal with cooperative target satellites, where the satellite itself communicates with the servicer craft about its heading and pose, or use other sensing aids, such as radars and geostationary position satellite
systems [5].

1.1 Related Work

The state of the art in space robotics is the Mars Exploration Rover, Spirit, that is now visiting Mars [6]. Spirit is primarily a tele-operated robot that is capable of taking pictures, driving, and operating instruments in response to commands transmitted from the ground. It lacks any cognitive or reasoning abilities. The most successful autonomous robot to date that has cognitive abilities is Minerva, which takes visitors on tours through the Smithsonian's National Museum of American History; however, vision is not Minerva's primary sensor [7]. Minerva has a host of other sensors at its disposal, including laser range finders and sonars. Such sensors are undesirable for space operations, which have severe weight/energy limitations.

A survey of work on constructing high-level descriptions from video can be found in [8]. Knowledge modeling for the purposes of scene interpretation can either be hand-crafted [9] or automatic [10] (as in machine learning). The second approach is not feasible for our application: it requires a large training set, which is difficult to gather in our domain, in order to ensure that the system learns all the relevant knowledge, and it is not always clear what the system has learnt. Scene descriptions constructed in [11] are richer than those in our system, and their construction approach is more sound; however, they do not use scene descriptions to control the visual process and formulate plans to achieve goals.

In the next section, we explain the object recognition and tracking module. Section 3 describes the high-level vision module. Section 4 describes the physical setup and presents results. Section 5 presents our conclusions.

2 Object Recognition and Tracking

The object recognition and tracking module [12] processes images from a calibrated passive video camera-pair mounted on the end-effector of the robotic manipulator and computes an estimate of the relative position and orientation of the target satellite.

Fig. 2. Object recognition and tracking system.

It supports medium and short range satellite proximity operations; i.e., approximately from 20 m to 0.2 m.

During the medium range operation, the vision system cameras view either the complete satellite or a significant portion of it (image 1 in Fig. 3), and the system relies on natural features observed in stereo images to estimate the motion and pose of the satellite. The medium range operation consists of the following three phases:

- In the first phase (model-free motion estimation), the vision system combines stereo and structure-from-motion to indirectly estimate the satellite motion in the camera reference frame by solving for the camera motion, which is just the opposite of the satellite motion [13].
- The second phase (motion-based pose acquisition) performs binary template matching to estimate the pose of the satellite without using prior information [14]. It matches a model of the observed satellite with the 3D data produced by the last phase and computes a rigid transformation, generally comprising 3 translations and 3 rotations, that represents the relative pose of the satellite. The six degrees of freedom (DOFs) of the pose are solved in two steps. The first step, which is motivated by the observation that most satellites have an elongated structure, determines the major axis of the satellite, and the second step solves the four unresolved DOFs (the rotation around the major axis and the three translations) by an exhaustive 3D template matching over the remaining four DOFs.
- The last phase (model-based pose tracking) tracks the satellite with high precision and update rate by iteratively matching the 3D data with the model using a version of the iterative
closest point algorithm [15]. This scheme does not match high-level features in the scene with the model at every iteration, which reduces its sensitivity to partial shadows, occlusion, and local loss of data caused by reflections and image saturation. Under normal operative conditions, model-based tracking returns an estimate of the satellite's pose at 2 Hz with an accuracy on the order of a few centimeters and a few degrees.

At close range, the target satellite is only partially visible and it cannot be viewed simultaneously from both cameras (the second and third images in Fig. 3); hence, the vision system processes monocular images.

Fig. 3. Images from a sequence recorded during an experiment (first image at 5 m; third at 0.2 m).

The constraints on the approach trajectory ensure that the docking interface on the target satellite is visible from close range, so markers on the docking interface are used to determine the pose and attitude of the satellite efficiently and reliably at close range [12]. Here, visual features are detected by processing an image window centered around their predicted locations. These features are then matched against a model to estimate the pose of the satellite. The pose estimation algorithm requires at least 4 points to compute the pose. When more than four points are visible, sampling techniques choose the group of points that gives the best pose information. For the short range vision module, the accuracy is on the order of a fraction of a degree and 1 mm right before docking.

The vision system can be configured on the fly depending upon the requirements of a specific mission. It provides commands to activate/initialize/deactivate a particular configuration. The vision system returns a 4x4 matrix that specifies the relative pose of the satellite, a value between 0 and 1 quantifying the confidence in that estimate, and various flags that describe the state of the vision system.
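The core alignment step of the model-based tracker, registering the sensed 3D points against the satellite model, can be sketched as a generic point-to-point iterative closest point loop. The sketch below is an illustrative reconstruction, not the authors' implementation: it uses the standard SVD-based closed-form rigid alignment and a brute-force nearest-neighbour search, and the iteration count is an arbitrary choice.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Closed-form least-squares rigid transform (R, t) mapping src -> dst,
    via the standard SVD method; assumes row-wise matched point pairs."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def icp(points, model, iters=20):
    """Minimal point-to-point ICP: align sensed `points` to `model`."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = points.copy()
    for _ in range(iters):
        # nearest-neighbour correspondences by brute force (fine for a sketch)
        d = np.linalg.norm(cur[:, None, :] - model[None, :, :], axis=2)
        matched = model[d.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        # compose the incremental transform into the running total
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

Like the tracker described above, this converges reliably only when the initial pose error is small, which is why the system bootstraps tracking with the motion-based pose acquisition phase.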
3 Cognitive Vision Controller

The cognitive vision controller controls the image recognition and tracking module by taking into account several factors, including 1) the current task, 2) the current state of the environment, 3) the advice from the symbolic reasoning module, and 4) the characteristics of the vision module, including processing times, operational ranges, and noise. It consists of a behavior-based, reactive perception and memory unit and a high-level deliberative unit. The behavior-based unit acts as an interface between the detailed, continuous world of the vision system and the abstract, discrete world representation used by the cognitive controller. This design facilitates a vision controller whose decisions reflect both short-term and long-term considerations.

3.1 Perception and Memory: Symbolic Scene Description

The perception and memory unit performs many critical functions. First, it provides tight feedback loops between sensing and action that are required for reflexive behavior, such as closing the cameras' shutters when detecting strong glare in order to prevent harm. Second, it corroborates the readings from the vision system by matching them against the internal world model. Third, it maintains an abstracted world state (AWS) that represents the world at a symbolic level and is used by the deliberative module. Fourth, it resolves the issue of perception delays by projecting the internal world model forward to the current instant. Fifth, it performs sensor fusion to combine information from multiple sensors; e.g., when the vision system returns multiple estimates of the satellite's pose. Finally, it ensures that the internal mental state reflects the effects of egomotion and the passage of time.

Fig. 4. (a) Behavior-based perception and memory unit. (b) The abstracted world state represents the world symbolically. For example, the satellite is either Captured, Close, Near, Medium, or Far. The conversion from numerical quantities in the memory center to the symbols in the abstracted world state takes into account the current situation. For example, the translation from the numerical value of satellite pose confidence to the symbolic value Good or Bad depends upon the active behavior: for behavior Monitor, satellite position confidence is Good when it is greater than 0.67, whereas for behavior Capture, satellite position confidence is Good only when it is greater than 0.8.

At each instant, the perception unit receives the most current information from the active vision configurations (Fig. 2) and computes an estimate of the satellite position and orientation. In doing so, it takes into account contextual information, such as the current task, the predicted distance from the satellite, the operational ranges of the various vision configurations, and the confidence values returned by the active configurations. An alpha-beta tracker then validates and smoothes the computed pose. Validation is done by comparing the new pose against the predicted pose using an adaptive threshold.

The servicer craft sees its environment egocentrically. The memory center constantly updates the internal world representation to reflect the current position, heading, and speed of the robot. It also ensures that, in the absence of new readings from the perception center, the confidence in the world state decreases with time. The reactive module requires detailed sensory information, whereas the deliberative module deals with abstract features of the world. The memory center filters out unnecessary details from the sensory information and generates the AWS (Fig. 4), which describes the world symbolically.
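The validate-and-smooth step performed by the alpha-beta tracker can be sketched for a single scalar pose component as follows. The gains, the sampling interval, and the fixed validation gate are illustrative assumptions; the actual system adapts its threshold and tracks the full 6-DOF pose.

```python
def alpha_beta_step(x, v, z, dt=0.5, alpha=0.85, beta=0.005, gate=1.0):
    """One alpha-beta filter update for a scalar pose component.

    x, v : current state estimate (value and rate)
    z    : new measurement from the vision system
    gate : validation threshold; measurements too far from the prediction
           are rejected (the system uses an adaptive threshold; a fixed
           one keeps the sketch simple).
    Returns (x, v, accepted).
    """
    x_pred = x + v * dt            # predict forward by one sample period
    r = z - x_pred                 # innovation (residual)
    if abs(r) > gate:              # validation: reject outlier measurements
        return x_pred, v, False    # coast on the prediction
    x_new = x_pred + alpha * r     # smooth the value with gain alpha
    v_new = v + (beta / dt) * r    # smooth the rate with gain beta
    return x_new, v_new, True
```

A rejected measurement leaves the filter coasting on its prediction, which is what lets the controller bridge brief sensing dropouts without losing track.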
3.2 Symbolic Reasoning: Planning and Scene Interpretation

The symbolic reasoning module constructs plans 1) to accomplish goals and 2) to explain the changes in the AWS. The plan that best explains the evolution of the AWS is an interpretation of the scene, as it consists of events that might have happened to bring about the changes in the AWS. The cognitive vision system monitors the progress of the current task by examining the AWS, which is maintained in real time by the perception and memory module. Upon encountering an undesirable situation, the reasoning module tries to explain the errors by constructing an interpretation. If the reasoning module successfully finds a suitable interpretation, it suggests appropriate corrective steps; otherwise, it suggests the default procedure for handling anomalous situations.

The current prototype consists of two planners: Planner A specializes in the satellite capturing task, and Planner B monitors the abstracted world state and detects and resolves undesirable situations. We have developed the planners in GOLOG, which is an extension of the situation calculus [16]. GOLOG uses logical statements to maintain an internal world state (fluents) and to describe what actions an agent can perform (primitive action predicates), when these actions are valid (precondition predicates), and how these actions affect the world (successor state predicates). GOLOG provides high-level constructs, such as procedure calls, conditionals, loops, and non-deterministic choice, to specify complex procedures that model an agent and its environment. The logical foundations of GOLOG enable us to prove plan correctness properties, which is desirable.

Fig. 5. Examples of the plans generated by Planner A and Planner B. Planner A: initial state fStatus(off), fLatch(unarmed), fSensor(all,off), fSatPos(medium), fSatPosConf(no), fSatCenter(no), fAlign(no), fSatAttCtrl(on), fSatContact(no), fSatSpeed(yes), fError(no); goal state fSatContact(yes); the plan: aTurnon(on), aSensor(medium,on), aSearch(medium), aMonitor, aGo(medium,near,vis), aSensor(short,on), aSensor(medium,off), aAlign, aLatch(arm), aSatAttCtrl(off), aContact. Planner B: initial state fRange(unknown), fSun(unknown), fSatPosConf(yes); goal state fSatConf(no); Explanation 1: aBadCamera (default), Solution 1: aRetry; Explanation 2: aSun(front), aGlare, Solution 2: aAbort; Explanation 3: aRange(near), aSun(behind), aSelfShadow, Solution 3: aRetryAfterRandomInterval.

The planners cooperate to achieve the goal: safely capturing the satellite. The two planners interact through a plan execution and monitoring unit, which uses plan execution control knowledge. Upon receiving a new satellite capture task from the ground station, the plan execution and monitoring module activates Planner A, which generates a plan that transforms the current state of the world to the goal state, a state where the satellite is secured. Planner B, on the other hand, is only activated when the plan execution and monitoring module detects a problem, such as a sensor failure. Planner B generates all plans that will transform the last known good world state to the current bad world state. Next, it determines the most likely cause of the current fault by considering each plan in turn. After identifying the cause, Planner B suggests corrections. In the current prototype, corrections consist of "abort mission," "retry immediately," and "retry after a random interval of time" (the task is aborted if the total time exceeds the maximum allowed time for the current task). Finally, after the successful handling of the situation, Planner A resumes.
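Planner B's diagnose-and-correct behavior can be caricatured as a mapping from symptom patterns in the abstracted world state to the explanations and corrections listed in Fig. 5. This sketch enumerates the three explanations directly; the actual system instead searches for GOLOG plans that explain the state change, and the state keys used here are hypothetical names invented for illustration.

```python
# Hypothetical sketch of Planner B's diagnose-and-correct step.
# The real system searches for GOLOG action sequences that explain the
# transition from the last good abstracted world state to the current
# bad one; here the explanations from Fig. 5 are enumerated directly.

EXPLANATIONS = [
    # (test on the abstracted world state, explanation, correction)
    (lambda s: s.get("sun") == "front",
     ["aSun(front)", "aGlare"], "aAbort"),
    (lambda s: s.get("range") == "near" and s.get("sun") == "behind",
     ["aRange(near)", "aSun(behind)", "aSelfShadow"],
     "aRetryAfterRandomInterval"),
    (lambda s: True,  # default explanation: blame the camera
     ["aBadCamera"], "aRetry"),
]

def diagnose(state):
    """Return (explanation, correction) for a bad world state."""
    for applies, explanation, correction in EXPLANATIONS:
        if applies(state):
            return explanation, correction
    raise RuntimeError("unreachable: the default explanation always applies")
```

For example, `diagnose({"sun": "front"})` attributes a lost satellite lock to glare and recommends aborting, mirroring Explanation 2 / Solution 2 in the figure.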
4 Results

We have tested the cognitive vision controller in a simulated virtual environment and in a physical lab environment that faithfully reproduces the illumination conditions of the space environment: a strong light source, very little ambient light, and harsh shadows. The physical setup consisted of the MD Robotics Ltd. proprietary "Reusable Space Vehicle Payload Handling Simulator," comprising two Fanuc robotic manipulators and the associated control software. One robot, with the camera stereo pair mounted on its end-effector, acts as the servicer. The other robot carries a grapple fixture-equipped satellite mockup and exhibits realistic satellite motion.

The cognitive vision controller met its requirements; i.e., safely capturing the satellite using vision-based sensing (see Fig. 3 for the kind of images used), while handling anomalous situations. We performed 800 test runs in the simulated environment and over 25 test runs on the physical robots. The controller never jeopardized its own safety or that of the target satellite. It gracefully recovered from sensing errors. In most cases, it was able to guide the vision system to re-acquire the satellite by identifying the cause and initiating a suitable search pattern. In situations where it could not resolve the error, it safely parked the manipulator and informed the ground station of its failure.

Fig. 6. The chaser robot captures the satellite using vision in harsh lighting conditions like those in orbit.

5 Conclusion

Future applications of computer vision will require more than just low-level vision; they will also have a high-level AI component to guide the vision system in a task-directed and deliberative manner, diagnose sensing problems, and suggest corrective steps. Also, an ALife-inspired reactive module that implements computational models of attention, context, and memory can act as the interface between the vision system and the symbolic reasoning module. We have demonstrated such a system within the context of space robotics. Our practical vision system interfaces object recognition and tracking with classical AI through a behavior-based perception and memory unit, and it successfully performs the complex task of autonomously capturing a free-flying satellite in harsh environmental conditions. After receiving a single high-level "dock" command, the system successfully captured the target satellite in most of our tests, while handling anomalous situations using its reactive and reasoning abilities.

Acknowledgments

The authors
acknowledge the valuable technical contributions of R. Gillett, H.K. Ng, S. Greene, J. Richmond, Dr. M. Greenspan, M. Liu, and A. Chan. This work was funded by MD Robotics Limited and Precarn Associates.

References

[1] Roberts, L.: Machine perception of 3-D solids. In Trippit, J., Berkowitz, D., Chapp, L., Koester, C., Vanderburgh, A., eds.: Optical and Electro-Optical Information Processing, MIT Press (1965) 159-197
[2] Nilsson, N.J.: Shakey the robot. Technical Report 323, Artificial Intelligence Center, SRI International, Menlo Park, USA (1984)
[3] Wertz, J., Bell, R.: Autonomous rendezvous and docking technologies: status and prospects. In: SPIE's 17th Annual International Symposium on Aerospace/Defense Sensing, Simulation, and Controls, Orlando, USA (2003)
[4] Gurtuna, O.: Emerging space markets: Engines of growth for future space activities (2003) www.futuraspace.com/EmergingSpaceMarkets_fact_sheet.htm
[5] Polites, M.: An assessment of the technology of automated rendezvous and capture in space. Technical Report NASA/TP-1998-208528, Marshall Space Flight Center, Alabama, USA (1998)
[6] NASA JPL: Mars exploration rover mission home (2004) marsrovers.nasa.gov
[7] Burgard, W., Cremers, A.B., Fox, D., Hahnel, D., Lakemeyer, G., Schulz, D., Steiner, W., Thrun, S.: Experiences with an interactive museum tour-guide robot. Artificial Intelligence 114 (1999) 3-55
[8] Howarth, R.J., Buxton, H.: Conceptual descriptions from monitoring and watching image sequences. Image and Vision Computing 18 (2000) 105-135
[9] Arens, M., Nagel, H.H.: Behavioral knowledge representation for the understanding and creation of video sequences. In Gunther, A., Kruse, R., Neumann, B., eds.: Proceedings of the 26th German Conference on Artificial Intelligence (KI-2003), Hamburg, Germany (2003) 149-163
[10] Fernyhough, J., Cohn, A.G., Hogg, D.C.: Constructing qualitative event models automatically from video input. Image and Vision Computing 18 (2000) 81-103
[11] Arens, M., Ottlik, A., Nagel, H.H.: Natural language texts for a cognitive vision system. In van Harmelen, F., ed.: Proceedings of the 15th European Conference on Artificial Intelligence (ECAI-2002), Amsterdam, The Netherlands, IOS Press (2002) 455-459
[12] Jasiobedzki, P., Greenspan, M., Roth, G., Ng, H., Witcomb, N.: Video-based system for satellite proximity operations. In: 7th ESA Workshop on Advanced Space Technologies for Robotics and Automation (ASTRA 2002), ESTEC, Noordwijk, The Netherlands (2002)
[13] Roth, G., Whitehead, A.: Using projective vision to find camera positions in an image sequence. In: Vision Interface (VI 2000), Montreal, Canada (2000) 87-94
[14] Greenspan, M., Jasiobedzki, P.: Pose determination of a free-flying satellite. In: Motion Tracking and Object Recognition (MTOR02), Las Vegas, USA (2002)
[15] Jasiobedzki, P., Greenspan, M., Roth, G.: Pose determination and tracking for autonomous satellite capture. In: Proceedings of the 6th International Symposium on Artificial Intelligence and Robotics & Automation in Space (i-SAIRAS 01), Montreal, Canada (2001)
[16] Lespérance, Y., Reiter, R., Lin, F., Scherl, R.: GOLOG: A logic programming language for dynamic domains. Journal of Logic Programming 31 (1997) 59-83