About Me

I am Aman Pratap Singh, a second-year undergraduate student of Computer Science and Engineering at the Indian Institute of Technology Bhubaneswar. I have programming experience in several languages, including Python, C/C++, Java and JavaScript. I frequently use Jupyter Notebooks for lab assignments and similar work because they provide a simple, user-friendly programming interface while also letting me document explanations and output alongside the code. I enjoy coding for fun and have worked on various small projects, which can be found on my GitHub profile.

Recently I have been involved in the JupyterLab community. JupyterLab is the next-generation user interface for Project Jupyter and provides all the features of the classic Jupyter Notebook. I fixed a few documentation and UI bugs in the JupyterLab repository and created a JupyterLab extension that scrapes comics from XKCD and displays them in JupyterLab.

I also enjoy competitive programming and actively participate in coding challenges on sites such as Codeforces and CodeChef. I have a deep interest in physics, Indian history and cricket as well.

Why GSoC with CERN-HSF?

I have always been passionate about projects that link the basic sciences with programming, which is the main inspiration for me to work with CERN. I am eager to work on this project because it will help many scientists and researchers in their work. As a regular user of Jupyter Notebook, I strongly believe that interactive programming greatly reduces the effort required to perform complex experiments and interpret their output. This project integrates powerful backends for large-scale computation with the interactive programming environment of notebooks, so I believe I am a perfect match for it.

How much time will I be able to contribute to this project?

I will be working 6-8 hours per day for the entire duration of the project.

Other commitments during summer

I have my end-semester exams from April 28 to May 5, during which I will be a little busy. Other than that, I do not have any commitments for the summer.

Preferred medium for communication

I am comfortable with IRC, email, Skype or any similar medium of communication. My preferred language for communication is English.

Synopsis

Jupyter Notebook is an interactive computing environment for creating notebooks that contain computer code as well as rich text elements such as equations, figures, plots, widgets and theory. These notebooks are easy to understand and can be executed to perform interactive data analysis, scientific computing and code prototyping.

In experiments such as those at the LHC (Large Hadron Collider), a very large amount of data (on the order of petabytes) is generated. This data is processed by a collection of powerful computers at multiple computing sites: the data is split into small chunks, each chunk is processed individually on a remote distributed computing network, and the results are finally collected. These sites are interconnected by a grid. Grids of this type can be accessed through a toolkit called Ganga.
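The split, process and collect pattern described above can be sketched on a single machine. This is only an illustration under stated assumptions: the function names are hypothetical, a local thread pool stands in for remote grid sites, and summing numbers stands in for a real analysis.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def process_chunk(chunk):
    """Stand-in for the per-chunk analysis (assumption: a simple sum)."""
    return sum(chunk)


def run_distributed(data, chunk_size, workers=4):
    """Process every chunk independently, then merge the partial results."""
    chunks = split_into_chunks(data, chunk_size)
    # The thread pool plays the role of the remote computing sites.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    # Final collection step: combine the per-chunk results.
    return sum(partial_results)


if __name__ == "__main__":
    print(run_distributed(list(range(100)), chunk_size=10))
```

On a real grid the chunks would be shipped to different sites and the merge step would run once all of them report back; the control flow stays the same.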

Ganga is an open-source, IPython-based interface to the computing grid. It leverages the power of the distributed computing grid and gives scientists an interface, supported by a powerful backend, through which they can submit computation-intensive programs as batch jobs. After a job is submitted, Ganga runs the program somewhere on the grid, keeps track of the job's status and, once the job completes, returns the output to the user. It can also provide job statistics and report job errors, if any.
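The job lifecycle described above (submit, track status, return output or errors) can be modelled as a small state machine. The sketch below is a toy model and does not use Ganga's API: the class `ToyJob`, the `ALLOWED` table and the status names are assumptions chosen for illustration.

```python
# Hypothetical transition table: which status may follow which.
ALLOWED = {
    "new": {"submitted"},
    "submitted": {"running", "failed", "killed"},
    "running": {"completed", "failed", "killed"},
    "completed": set(),
    "failed": set(),
    "killed": set(),
}


class ToyJob:
    """Toy stand-in for a batch job whose status is tracked over time."""

    def __init__(self, name):
        self.name = name
        self.status = "new"
        self.history = ["new"]

    def advance(self, new_status):
        """Move to new_status, rejecting transitions the table forbids."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"cannot go from {self.status!r} to {new_status!r}"
            )
        self.status = new_status
        self.history.append(new_status)

    @property
    def finished(self):
        """A job is finished once no further transition is possible."""
        return not ALLOWED[self.status]


if __name__ == "__main__":
    job = ToyJob("demo")
    for status in ("submitted", "running", "completed"):
        job.advance(status)
    print(job.history, job.finished)
```

A monitoring front end only needs to read `status` and `finished`, which is why an explicit transition table is convenient: invalid updates are rejected early instead of being displayed.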

HTCondor is a workload management system developed at the University of Wisconsin-Madison. It is based on high-throughput computing: it makes effective use of idle computers on a network or computing grid by offloading compute-intensive tasks onto them. It provides features such as job queueing, job prioritization, and resource monitoring and management. HTCondor manages resources intelligently by matching the resources available on different machines with the resources required by each program.
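The match-making idea can be illustrated with a toy scheduler. This is a simplified stand-in, not HTCondor's actual mechanism: the dictionary fields (`cpus`, `memory_mb`, `priority`, `idle`) and the best-fit policy are assumptions made for the example.

```python
def matches(job, machine):
    """True if the machine is idle and offers at least what the job needs."""
    return (
        machine["idle"]
        and machine["cpus"] >= job["cpus"]
        and machine["memory_mb"] >= job["memory_mb"]
    )


def match_jobs(jobs, machines):
    """Assign jobs to machines, highest priority first, smallest fit first.

    Returns a dict mapping job name to machine name; jobs that cannot be
    placed are left out of the result.
    """
    pool = [dict(m) for m in machines]  # copy so the caller's data is untouched
    assignments = {}
    for job in sorted(jobs, key=lambda j: -j["priority"]):
        candidates = [m for m in pool if matches(job, m)]
        if not candidates:
            continue  # the job stays queued in this toy model
        # Best fit: pick the smallest machine that still satisfies the job.
        chosen = min(candidates, key=lambda m: (m["cpus"], m["memory_mb"]))
        chosen["idle"] = False
        assignments[job["name"]] = chosen["name"]
    return assignments


if __name__ == "__main__":
    jobs = [
        {"name": "a", "cpus": 4, "memory_mb": 8000, "priority": 1},
        {"name": "b", "cpus": 1, "memory_mb": 1000, "priority": 5},
    ]
    machines = [
        {"name": "m1", "cpus": 8, "memory_mb": 16000, "idle": True},
        {"name": "m2", "cpus": 2, "memory_mb": 2000, "idle": True},
        {"name": "m3", "cpus": 16, "memory_mb": 64000, "idle": False},
    ]
    print(match_jobs(jobs, machines))
```

Choosing the smallest sufficient machine keeps larger machines free for demanding jobs; a first-fit policy would have placed job `b` on `m1` and left job `a` unplaced.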

This project aims to create a plugin for Jupyter Notebook and to integrate it with the SWAN notebook service, a cloud data analysis service developed and powered by CERN. The plugin will make it easy to submit batch computation jobs to HTCondor through the Ganga toolkit and to monitor them. It will display the status of ongoing jobs, a progress bar, job statistics and errors in the notebook itself, and will also allow ongoing jobs to be terminated. The plugin shall provide a user-friendly notebook interface for computation-intensive tasks by using the notebook's cell-based structure to submit jobs and to inspect the progress and statistics of a job launched from a cell.
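As a rough idea of the status display, the sketch below turns a list of per-job statuses into a one-line text progress bar. It is only a sketch: the function name `render_progress` and the status strings are hypothetical, and the actual plugin would likely render a notebook widget rather than plain text.

```python
def render_progress(statuses, width=20):
    """Summarise job statuses as a text progress bar.

    statuses: list of strings such as "running", "completed" or "failed"
    width:    number of characters used for the bar itself
    """
    total = len(statuses)
    done = sum(1 for s in statuses if s == "completed")
    failed = sum(1 for s in statuses if s == "failed")
    # Guard against an empty job list to avoid dividing by zero.
    filled = (done * width) // total if total else 0
    percent = (100 * done) // total if total else 0
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {percent}% ({done}/{total} completed, {failed} failed)"


if __name__ == "__main__":
    print(render_progress(["completed", "running", "completed", "failed"], width=10))
```

Calling the function again after each status poll and replacing the cell output would give a simple live view of the submitted jobs.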

Benefits to Community

This project streamlines large-scale computation by integrating a powerful backend with Jupyter Notebook, an interactive web application that is easily deployable in the cloud and remotely accessible. It will give scientists and researchers a single application in which to write compute-intensive programs interactively, execute them, monitor their progress and run-time statistics, and retrieve the output on successful execution. The project will improve large-scale batch computing at CERN and similar organizations.

Project Goals

Objectives

Create a plugin for Jupyter Notebook that can offload batch jobs from a notebook.

Using HTCondor, apply the plugin to real batch jobs at CERN.

Test the plugin on CERN’s batch infrastructure.

Integrate the plugin into CERN’s notebook service, SWAN.

Tasks

Create a plugin to submit and monitor batch computation jobs from a notebook.

Design a prototype of the plugin for submitting and monitoring jobs from Jupyter Notebook.

Design prototypes of the user interface and the kernel-side module.

Determine an architecture for the plugin.

Explore all widgets and features of Jupyter Notebook that could be used in the plugin.

Determine how the plugin will interact with the Ganga toolkit.

Design an interface to display the progress bar, job statistics and job output.

Implement the kernel side of the plugin.

Integrate the designed user interface with the kernel-side module.

Test the plugin on a local backend server.

Test the plugin by running small jobs on a local backend server.

Perform tests for the various corner cases that can arise.

Implement the plugin's error-handling mechanism.

Intentionally create errors to test the various event listeners.

Implement how the plugin should respond to any unexpected request or error.

Write comprehensive documentation of the code written for Task 1.

Apply the plugin to real batch jobs at CERN using HTCondor.

Apply the plugin to small real batch jobs at CERN on the local backend.

Test the plugin on complex but computationally light real batch jobs at CERN.

Use HTCondor instead of the local backend.

Change the backend server from the local one to one provided by HTCondor.

Test the plugin on complex and relatively large batch jobs at CERN.

Implement some sample notebooks illustrating the process.

Ask for feedback from users and implement the suggestions.

Write comprehensive documentation of the code written for Task 2.

Deploy and test the plugin on CERN IT infrastructure.

Test the plugin on CERN IT infrastructure.

Integrate the plugin with the SWAN notebook service.

Ask for feedback from users and implement the suggestions.

Write comprehensive documentation of the code written for Task 3.

Timeline

Duration

Task

March 27

Deadline for submitting Project Proposal

March 27 - April 23

Learn more about the Ganga toolkit.

Read the documentation and learn more about HTCondor.

Learn more about Jupyter Notebook.

April 23 - May 14

Official Community Bonding Period

Get involved with the CERN, HTCondor and Jupyter communities.

Learn more about the mentors, such as their time zones and preferred medium of communication.