Cancer genomics big_datascience_meetup_july_14_2014

1.
Java and Scala for Cancer Genomics
By Ayush Sarkar
Irvington High School
July 14, 2014

2.
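Comparing Reference Genome and Subject Genome
[Figure: two sequences labeled R (reference) and S (subject)]
Relative to the reference, the subject genome contains gaps and mutations.
Analysis is needed to identify where these gaps and mutations occur.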
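A minimal sketch of the idea, assuming the two sequences have already been aligned to equal length and that '-' marks a gap. The class and method names are hypothetical and not from the slides; real pipelines use an alignment step first, which this sketch skips.

```java
import java.util.ArrayList;
import java.util.List;

class AlignedCompare {
    /** Lists every position where two pre-aligned sequences differ. */
    static List<String> differences(String reference, String subject) {
        if (reference.length() != subject.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < reference.length(); i++) {
            char r = reference.charAt(i);
            char s = subject.charAt(i);
            if (r == s) {
                continue; // identical at this position
            }
            if (r == '-' || s == '-') {
                result.add("gap at " + i);            // assumed gap symbol: '-'
            } else {
                result.add("mutation at " + i + ": " + r + "->" + s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [gap at 2, mutation at 4: A->G]
        System.out.println(differences("ACGTAC", "AC-TGC"));
    }
}
```

Positions are zero-based. The sketch only reports single-position differences; it does not merge adjacent gaps into one event.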

3.
First, we walk through creating and running an example Java program.
Next, we look at Java code in BioJava
(an open-source project for bioinformatics).
BioJava location:
http://biojava.org/wiki/Main_Page

4.
An Example Java Program
to Check Validity and to Compare Two Strings
• In this example we examine the SerialNumber class, which is
used by the Home Software Company to validate software serial
numbers. A valid software serial number has the form
LLLLL-DDDD-LLLL, where L indicates an alphabetic letter and D
indicates a numeric digit. For example, WRXTQ-7786-PGVZ is a
valid serial number. Notice that a serial number consists of three
groups of characters, delimited by hyphens.
• After its validity is checked, a serial number assigned to a
customer is compared with a serial number stored in a database
to check for equality.
• The steps in this example are similar to those of DNA sequence alignment.

5.
SerialNumber Class Definition
The fields first, second, and third hold the first,
second, and third groups of characters in a serial number. The
constructor sets the valid field to true to indicate a valid
serial number, or to false to indicate an invalid serial number.
[Figure: class listing annotated with its internal (instance) variables, class instance constructor, and general methods]

6.
Method Description for the Class
• Constructor: The constructor accepts a string argument that contains a
serial number. The string is tokenized and its tokens are stored in the first,
second, and third fields. The validate method is then called.
• isValid: This method returns the value in the valid field.
• validate: This method calls the isFirstGroupValid, isSecondGroupValid,
and isThirdGroupValid methods to validate the first, second, and third
fields.
• isFirstGroupValid: This method returns true if the value stored in the
first field is valid. Otherwise, it returns false.
• isSecondGroupValid: This method returns true if the value stored in
the second field is valid. Otherwise, it returns false.
• isThirdGroupValid: This method returns true if the value stored in the
third field is valid. Otherwise, it returns false.
• EqualityTest: This method is called to check whether a serial number is equal to a
serial number in the database.
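The method descriptions above can be sketched as the following class. This is one possible reading, not the original listing from the slides: the allMatch helper is hypothetical, the equality method is written as equalityTest to follow Java naming conventions, and its signature (taking another SerialNumber rather than querying a database) is an assumption.

```java
import java.util.StringTokenizer;

class SerialNumber {
    private String first;   // first group: expected 5 letters
    private String second;  // second group: expected 4 digits
    private String third;   // third group: expected 4 letters
    private boolean valid;  // set by validate()

    SerialNumber(String sn) {
        // Note: StringTokenizer skips empty tokens, so doubled hyphens
        // are not rejected by this sketch.
        StringTokenizer st = new StringTokenizer(sn, "-");
        if (st.countTokens() == 3) {
            first = st.nextToken();
            second = st.nextToken();
            third = st.nextToken();
            validate();
        } else {
            valid = false; // wrong number of groups
        }
    }

    boolean isValid() {
        return valid;
    }

    private void validate() {
        valid = isFirstGroupValid() && isSecondGroupValid() && isThirdGroupValid();
    }

    private boolean isFirstGroupValid()  { return allMatch(first, 5, true); }
    private boolean isSecondGroupValid() { return allMatch(second, 4, false); }
    private boolean isThirdGroupValid()  { return allMatch(third, 4, true); }

    // Hypothetical helper: checks the length and character class of one group.
    private static boolean allMatch(String group, int length, boolean letters) {
        if (group.length() != length) {
            return false;
        }
        for (char c : group.toCharArray()) {
            boolean ok = letters ? Character.isLetter(c) : Character.isDigit(c);
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    // Assumed signature: compares against a serial number already loaded
    // from the database. Invalid serial numbers never compare equal.
    boolean equalityTest(SerialNumber other) {
        return valid && other.valid
                && first.equals(other.first)
                && second.equals(other.second)
                && third.equals(other.third);
    }

    public static void main(String[] args) {
        SerialNumber customer = new SerialNumber("WRXTQ-7786-PGVZ");
        SerialNumber stored = new SerialNumber("WRXTQ-7786-PGVZ");
        System.out.println(customer.isValid());            // true
        System.out.println(customer.equalityTest(stored)); // true
        System.out.println(new SerialNumber("WRXTQ-77A6-PGVZ").isValid()); // false
    }
}
```

Validation runs before comparison, so a malformed serial number is rejected without ever being compared.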

9.
BioJava
BioJava is an open-source project dedicated to providing
a Java framework for processing biological data. It includes
objects for manipulating biological sequences, file
parsers, access to BioSQL and Ensembl databases, tools for
making sequence analysis GUIs and powerful analysis and
statistical routines including a dynamic programming toolkit.
BioJava takes part in Google Summer of Code as part of
the OBF - the Open Bioinformatics Foundation. Please visit:
https://developers.google.com/open-source/soc/?csw=1

11.
BioJava
DNA translation follows the normal biological flow, where a portion of DNA
(assumed to be CDS, a coding sequence) is transcribed to mRNA. The mRNA is then
translated to a protein sequence, one codon (three bases) at a time.
ProteinSequence protein =
new DNASequence("ATG").getRNASequence().getProteinSequence();
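To make the two steps concrete without depending on BioJava, here is a plain-Java sketch. It assumes the input is the coding strand, and it uses a deliberately partial codon table (five codons plus the three stop codons), so it is an illustration only. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class MiniTranslator {
    // Partial codon table for illustration; the full genetic code has 64 codons.
    private static final Map<String, Character> CODONS = new HashMap<>();
    static {
        CODONS.put("AUG", 'M'); // methionine (start)
        CODONS.put("UUU", 'F'); // phenylalanine
        CODONS.put("GGC", 'G'); // glycine
        CODONS.put("GCU", 'A'); // alanine
        CODONS.put("UGG", 'W'); // tryptophan
        CODONS.put("UAA", '*'); // stop
        CODONS.put("UAG", '*'); // stop
        CODONS.put("UGA", '*'); // stop
    }

    /** Transcribes a coding-strand DNA sequence to mRNA (T becomes U). */
    static String transcribe(String codingDna) {
        return codingDna.toUpperCase().replace('T', 'U');
    }

    /** Translates mRNA codon by codon, stopping at the first stop codon. */
    static String translate(String mrna) {
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= mrna.length(); i += 3) {
            String codon = mrna.substring(i, i + 3);
            Character aminoAcid = CODONS.get(codon);
            if (aminoAcid == null) {
                throw new IllegalArgumentException("codon not in the partial table: " + codon);
            }
            if (aminoAcid == '*') {
                break; // stop codon ends the protein
            }
            protein.append(aminoAcid);
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        String mrna = transcribe("ATGTTTGGCTAA");
        System.out.println(mrna);            // AUGUUUGGCUAA
        System.out.println(translate(mrna)); // MFG
    }
}
```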
The BioJava sequence I/O code is designed to be flexible and easy to
adapt for a wide variety of purposes. All methods take a
Java BufferedReader object, and return an iterator which allows you to
scan through the sequences in a file. For example:
// SeqIOTools and SequenceIterator belong to the legacy BioJava 1.x API
BufferedReader br = new BufferedReader(new FileReader(fileName));
SequenceIterator stream = SeqIOTools.readFastaDNA(br);
while (stream.hasNext()) {
    Sequence seq = stream.nextSequence();
    // do something with the sequence
}
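For readers without BioJava on the classpath, the same reader-in, sequences-out pattern can be sketched in plain Java. This is not BioJava's implementation. The class name is hypothetical, and the parser handles only the basic FASTA layout: a header line starting with ">" followed by one or more sequence lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

class SimpleFastaReader {
    /** Reads FASTA records into a map from header (without '>') to sequence. */
    static Map<String, String> read(BufferedReader br) throws IOException {
        Map<String, String> records = new LinkedHashMap<>(); // keeps file order
        String header = null;
        StringBuilder sequence = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                if (header != null) {
                    records.put(header, sequence.toString()); // close previous record
                }
                header = line.substring(1);
                sequence.setLength(0);
            } else if (header != null) {
                sequence.append(line); // sequences may span several lines
            }
        }
        if (header != null) {
            records.put(header, sequence.toString()); // close the last record
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String fasta = ">seq1\nACGT\nAC\n>seq2\nTTGA\n";
        // prints {seq1=ACGTAC, seq2=TTGA}
        System.out.println(read(new BufferedReader(new StringReader(fasta))));
    }
}
```

The example reads from a StringReader so that it runs without a file; wrapping a FileReader instead gives the file-based form shown above.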