Video Collection

The dataset consists of 50 videos of cataract surgeries performed at Brest University Hospital between January 22, 2015 and September 10, 2015. Reasons for surgery included age-related cataract, traumatic cataract and refractive errors. Patients were 61 years old on average (minimum: 23, maximum: 83, standard deviation: 10). There were 38 females and 12 males. Informed consent was obtained from all patients. Surgeries were performed by three surgeons: a renowned expert (48 surgeries), a surgeon with one year of experience (1 surgery) and an intern (1 surgery). Surgeries were performed under an OPMI Lumera T microscope (Carl Zeiss Meditec, Jena, Germany). Videos were recorded with a 180I camera (Toshiba, Tokyo, Japan) and a MediCap USB200 recorder (MediCapture, Plymouth Meeting, USA). The frame resolution was 1920x1080 pixels and the frame rate was approximately 30 frames per second. Videos lasted 10 minutes 56 s on average (minimum: 6 minutes 23 s, maximum: 40 minutes 34 s, standard deviation: 6 minutes 5 s). In total, more than nine hours of surgery were recorded.

Reference Standard

Tool Usage Annotation

All surgical tools visible in the microscope videos were first listed and labeled by the surgeons (see Fig. 1). Then, the usage of each tool was annotated independently in every video by two non-M.D. experts. A tool was considered to be in use whenever it was in contact with the eyeball. Accordingly, each expert recorded a timestamp whenever a tool came into contact with the eyeball and another when it stopped touching the eyeball. Up to three tools may be used simultaneously: two by the surgeon (one per hand) and, occasionally, one by an assistant. Annotations were performed at the frame level, using a web interface connected to an SQL database.
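Because annotations are made at the frame level, each expert's contact intervals can be turned into a per-frame binary label matrix with one column per tool. The minimal NumPy sketch below assumes, hypothetically, that an expert's annotations for one video are a dictionary mapping a tool name to (first_frame, last_frame) pairs, both inclusive; the function name `frame_labels` and this input format are illustrative only and do not describe the dataset's published files.

```python
import numpy as np

def frame_labels(annotations, tools, n_frames):
    """Build an (n_frames, n_tools) 0/1 matrix from tool/eyeball contact intervals.

    annotations: dict mapping a tool name to a list of (first_frame, last_frame)
        pairs, both inclusive, during which that tool touches the eyeball.
        This input format is a hypothetical stand-in, not the dataset's file layout.
    """
    labels = np.zeros((n_frames, len(tools)), dtype=np.uint8)
    for j, tool in enumerate(tools):
        for first, last in annotations.get(tool, []):
            # Clip to the video length; +1 because last_frame is inclusive.
            labels[max(first, 0):min(last + 1, n_frames), j] = 1
    return labels

tools = ["Rycroft cannula", "micromanipulator", "phacoemulsifier handpiece"]
annotations = {"micromanipulator": [(10, 19)],
               "phacoemulsifier handpiece": [(12, 30)]}
y = frame_labels(annotations, tools, n_frames=40)
print(y.sum(axis=0))        # frames in use per tool: 0, 10 and 19
print(y.sum(axis=1).max())  # at most 2 tools in contact at once here
```

Summing a row gives the number of tools in contact with the eyeball in that frame, which, per the description above, should never exceed three.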

1. biomarker
2. Charleux cannula
3. hydrodissection cannula
4. Rycroft cannula
5. viscoelastic cannula
6. cotton
7. capsulorhexis cystotome
8. Bonn forceps
9. capsulorhexis forceps
10. Troutman forceps
11. needle holder
12. irrigation/aspiration handpiece
13. phacoemulsifier handpiece
14. vitrectomy handpiece
15. implant injector
16. primary incision knife
17. secondary incision knife
18. micromanipulator
19. suture needle
20. Mendez ring
21. Vannas scissors

Adjudication

Finally, the annotations from both experts were adjudicated: whenever expert 1 annotated that tool A was being used while expert 2 annotated that a different tool B was being used instead, the two experts watched the video together and jointly determined the actual tool usage. However, the precise timing of tool/eyeball contacts was not adjudicated, so residual disagreements about when a contact started or ended were kept. Therefore, a probabilistic reference standard was obtained:

0: both experts agree that the tool is not being used,

1: both experts agree that the tool is being used,

0.5: experts disagree.
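The three-level reference standard can be read as the mean of the two experts' binary labels. The sketch below assumes each expert's labels are available as an (n_frames, n_tools) 0/1 NumPy matrix; `probabilistic_reference` and `tool_substitutions` are hypothetical helper names. The second helper flags the tool-substitution case described above (expert 1 marks tool A while expert 2 marks tool B in the same frame), which is the kind of disagreement that was reviewed jointly; timing offsets on the same tool are not flagged and simply yield 0.5.

```python
import numpy as np

def probabilistic_reference(expert1, expert2):
    """Average two (n_frames, n_tools) 0/1 label matrices:
    0 = both say 'not in use', 1 = both say 'in use', 0.5 = they disagree."""
    return (np.asarray(expert1, dtype=float) + np.asarray(expert2, dtype=float)) / 2.0

def tool_substitutions(expert1, expert2):
    """Hypothetical helper: list (frame, tool_a, tool_b) triples where only
    expert 1 marks tool_a and only expert 2 marks tool_b in the same frame,
    i.e. candidates for joint review by both experts."""
    e1 = np.asarray(expert1, dtype=bool)
    e2 = np.asarray(expert2, dtype=bool)
    only1 = e1 & ~e2
    only2 = e2 & ~e1
    conflicts = []
    for f in np.flatnonzero(only1.any(axis=1) & only2.any(axis=1)):
        for a in np.flatnonzero(only1[f]):
            for b in np.flatnonzero(only2[f]):
                conflicts.append((int(f), int(a), int(b)))
    return conflicts

# Two tools, six frames. Frame 0: timing offset on tool 0 (expert 2 starts late).
# Frame 5: expert 1 saw tool 0, expert 2 saw tool 1 (a substitution).
e1 = np.array([[1, 0], [1, 0], [1, 0], [1, 0], [0, 0], [1, 0]])
e2 = np.array([[0, 0], [1, 0], [1, 0], [1, 0], [0, 0], [0, 1]])
print(probabilistic_reference(e1, e2)[:, 0])  # 0.5, 1, 1, 1, 0, 0.5
print(tool_substitutions(e1, e2))             # [(5, 0, 1)]
```

In this sketch, the jointly agreed decision for a flagged frame would be written into both experts' matrices before averaging, so substitutions resolve to 0 or 1 while timing offsets remain at 0.5; that storage step is an assumption.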

Inter-rater agreement, before and after adjudication, is reported in Table 1.

Table 1. Inter-rater agreement per tool, before and after adjudication.

Tool                               Before adjudication   After adjudication
biomarker                          0.835                 0.835
Charleux cannula                   0.949                 0.963
hydrodissection cannula            0.868                 0.982
Rycroft cannula                    0.882                 0.919
viscoelastic cannula               0.860                 0.975
cotton                             0.947                 0.947
capsulorhexis cystotome            0.994                 0.995
Bonn forceps                       0.793                 0.798
capsulorhexis forceps              0.836                 0.849
Troutman forceps                   0.764                 0.764
needle holder                      0.630                 0.630
irrigation/aspiration handpiece    0.995                 0.995
phacoemulsifier handpiece          0.996                 0.997
vitrectomy handpiece               0.998                 0.998
implant injector                   0.980                 0.980
primary incision knife             0.959                 0.961
secondary incision knife           0.846                 0.852
micromanipulator                   0.990                 0.995
suture needle                      0.893                 0.893
Mendez ring                        0.941                 0.953
Vannas scissors                    0.823                 0.823
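This section does not name the agreement statistic behind Table 1. Purely as an illustration, and assuming a chance-corrected statistic such as Cohen's kappa computed over per-frame binary labels for a single tool, agreement between two annotators could be computed as in the sketch below; this is not necessarily how the table values were obtained, and `cohen_kappa` is a hypothetical helper name.

```python
import numpy as np

def cohen_kappa(a, b):
    """Cohen's kappa between two binary per-frame label vectors (assumed metric)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    p_observed = np.mean(a == b)                  # fraction of frames where raters agree
    p_a, p_b = a.mean(), b.mean()                 # each rater's 'in use' rate
    p_chance = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if p_chance == 1.0:
        # Both raters constant and identical: kappa is undefined; treat as perfect.
        return 1.0
    return float((p_observed - p_chance) / (1 - p_chance))

expert1 = [1, 1, 0, 0, 0, 0, 1, 0]
expert2 = [1, 0, 0, 0, 0, 0, 1, 0]
print(round(cohen_kappa(expert1, expert2), 3))  # 0.714
```

A chance-corrected statistic matters for rarely used tools: two raters who answer "not in use" for almost every frame would otherwise appear to agree almost perfectly, whatever they did during the few frames where the tool actually appears.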

Example of Result

Fig. 2 illustrates tool usage during a typical surgery without complications.

Training and Test Sets

The dataset was divided into a training set (25 videos) and a test set (25 videos). The division was constrained so that 1) each tool appears in the same number of videos in both subsets (plus or minus one) and 2) the test set only contains videos of surgeries performed by the renowned expert. Otherwise, videos were assigned to subsets at random. In total, the training set contains 4 hours and 42 minutes of video and the test set contains 4 hours and 24 minutes of video.
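One simple way to satisfy both split constraints is rejection sampling: draw a random test set from the expert's videos and redraw until every tool is balanced to within one video. The sketch below assumes a hypothetical metadata dictionary mapping each video id to its surgeon and the set of tools used; `constrained_split`, `max_tries` and the seed are illustrative choices, not a description of how the published split was actually drawn.

```python
import random
from collections import Counter

def constrained_split(videos, n_test, expert, seed=0, max_tries=100_000):
    """Randomly split videos so that (1) the test set only holds videos by
    `expert` and (2) every tool appears in the same number of training and
    test videos, plus or minus one. Redraw until both constraints hold.

    videos: dict mapping a video id to (surgeon, set of tools used); this
        format is a hypothetical stand-in for the dataset's metadata.
    """
    rng = random.Random(seed)
    eligible = sorted(v for v, (surgeon, _) in videos.items() if surgeon == expert)
    all_tools = set().union(*(tools for _, tools in videos.values()))
    for _ in range(max_tries):
        test = set(rng.sample(eligible, n_test))
        train = set(videos) - test
        # Count, for each tool, how many videos of each subset use it.
        n_test_with = Counter(t for v in test for t in videos[v][1])
        n_train_with = Counter(t for v in train for t in videos[v][1])
        if all(abs(n_train_with[t] - n_test_with[t]) <= 1 for t in all_tools):
            return sorted(train), sorted(test)
    raise RuntimeError("no split satisfying the constraints was found")

videos = {
    "v1": ("expert", {"knife", "cannula"}),
    "v2": ("expert", {"knife", "cannula"}),
    "v3": ("expert", {"knife"}),
    "v4": ("intern", {"knife"}),
}
train, test = constrained_split(videos, n_test=2, expert="expert")
print(train, test)
```

With 21 tools and 48 eligible videos, plain rejection sampling may need many draws; a swap-based local search that starts from a random split and exchanges videos until the counts balance would be a reasonable alternative under the same constraints.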