/***
StyleSheet for use when a translation requires any css style changes.
This StyleSheet can be used directly by languages such as Chinese, Japanese and Korean which need larger font sizes.
***/
/*{{{*/
body {font-size:0.8em;}
#sidebarOptions {font-size:1.05em;}
#sidebarOptions a {font-style:normal;}
#sidebarOptions .sliderPanel {font-size:0.95em;}
.subtitle {font-size:0.8em;}
.viewer table.listView {font-size:0.95em;}
/*}}}*/

To get started with this blank [[TiddlyWiki]], you'll need to modify the following tiddlers:
* [[SiteTitle]] & [[SiteSubtitle]]: The title and subtitle of the site, as shown above (after saving, they will also appear in the browser title bar)
* [[MainMenu]]: The menu (usually on the left)
* [[DefaultTiddlers]]: Contains the names of the tiddlers that you want to appear when the TiddlyWiki is opened
You'll also need to enter your username for signing your edits: <<option txtUserName>>

|''Type:''|file|
|''URL:''|http://cs.nyu.edu/~yap/classes/datastruc/10f|
|''Workspace:''|(default)|
This tiddler was automatically created to record the details of this server.


You may be curious about this web document.
It is written using a web publishing technology called ''~TiddlyWiki'' (or TW). The distinctive feature of TW is that all the content of this wiki lives in one single HTML file (so it is somewhat larger than the typical HTML file). You can even download a copy of this file and add your own content (let me know if you do something interesting that is sharable with the class).
I am not an expert on these technologies, but if you have any questions about them, you can ask me and I will try to find answers.
!! Click on one of these tabs for more information:
<<tabs tabsID
"Navigating These Pages" "?" [[NavigatingThesePages]]
"Wiki Technology" "?" [[WikiTechnology]]
"Tiddly Help" "?" [[TiddlyHelp]]
>>

The university and department have an academic integrity policy that must be taken seriously. Here is what it means for this class:
*All handed in assignments must represent your own work.
*If you use any program or solution from sources in the open literature, you must give full attribution.
*You should never copy work of other students, nor let your work be copied by others.
*We do encourage you to discuss problems and material with other students in the class, as this fosters learning. But after the discussions, you must write up your own solutions separately.

Just looking ahead -- what next?
After you finish this course, the natural next course to take is
//Basic Algorithms// (V22.0310) where you learn basic paradigms for
constructing algorithms. These paradigms are also called ''algorithmic techniques''
and they often go hand-in-hand with data structures. You will further develop
skills in analysis of algorithms, which you already have a foretaste of
in data structures.

!! Introduction
Computer programming is a powerful productivity tool.
More and more, programming is used not just for technical applications, but also
for personal-interest applications, including the web.
After learning the basics of programming, you soon realize that we need to
scale up to writing larger and more complex programs. To do this systematically,
one needs to develop some further theory and techniques.
''Data Structures'' is fundamental to this next step, and is the concern of this course.
What //is// a Data Structure? You can think of the ''array'' which you have
encountered in introductory programming as a data structure. But this array could be viewed
more abstractly, as a ''queue'' or a ''list'' data structure. This abstract viewpoint is
the preferred one in this course. You may have also heard of the
''tree'' data structure (but be forewarned -- there are many, many variants of this
data structure). This course has three general goals:
* Introduction to basic Data Structures (queues, trees, etc)
* Applications of these Data Structures in algorithms
* Analysis of Data Structures
This third goal of //analysis// might not be obvious -- but part of understanding data
structures beyond using them mechanically is understanding the //why//'s and //what if//'s.
Analysis will require some basic mathematical skills which we will provide.
Programming these concepts in {{{Java}}} will be an integral part of this course.
!Additional Information (click on any tab):
<<tabs cookie-topics
"Pre-requisites" "Do you have it?" [[Prerequisites]]
"Topics" "What will we learn?" [[DatastructureTopics]]
"Text Book" "Required text" [[TextBook]]
"Programming Requirement" "Java needed!" [[ProgrammingRequirement]]
"Beyond Data Structures" "What next?" [[BeyondDataStructures]]
>>

My absolute priority is to help you master the subject of Data Structures.
Do not let yourself fall behind.
I want to be as available as possible, outside of regular lectures or office hours.
You can always make an appointment to see me.

How do we grade you?
You might find these [[instructions| http://cs.nyu.edu/~yap/classes/info/GraderInstr.htm]] for my graders and teaching assistants useful.

;Sep 7: First Class
* We went over the information in this webpage (Course Description and Mechanics)
** Course grade is curved: 30% homework/quizzes, 25% midterm, 35% final
* We introduced some mathematical background from Chapter 1.
** Logarithms: the basic definition: {{block{
@@ log~~b~~x = u@@ if and only if b^^u^^ = x }}}. Note that this assumes that the exponential function b^^u^^ is understood.
** Some conventions on bases of logarithms: @@lg(x)@@ (base 2), @@ln(x)@@ (base e), @@Log(x)@@ (base 10), and @@log(x)@@ (base left unspecified).
** The two laws of logarithm (see chapter 1)
* We also started using the big-Oh notation (from Chapter 2.1)
** MOTIVATION: How can we compare two functions f and g?
** @@Function g dominates function f@@ if: there is some x~~0~~ such that for all x>x~~0~~, we have g(x) >= f(x).
** @@Function g is Big-Oh of f@@ if: there is some constant C>0 such that g is dominated by Cf. Notationally, we write "@@g = O(f)@@"
* We discussed the Selection Problem.
;Preparation for next class:
:Please read Chapter 1, especially the Selection Problem on the first page. Historical background of the selection problem: it comes from Lewis Carroll, Oxford don and author of Alice in Wonderland -- how to design a more perfect tennis tournament.
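The logarithm conventions above can be checked numerically. Here is a small Java sketch (Java being the course's language); `LogDemo` and its method names are illustrative, and `Math.log` is Java's natural logarithm, with other bases obtained via the change-of-base rule.

```java
// Numeric sketch of the logarithm conventions from Chapter 1.
// Math.log is the natural log ln; any other base follows from the
// change-of-base rule: log_b(x) = ln(x) / ln(b).
public class LogDemo {
    // log of x in base b
    static double log(double b, double x) {
        return Math.log(x) / Math.log(b);
    }

    public static void main(String[] args) {
        System.out.println(log(2, 8));      // lg 8 is 3, since 2^3 = 8
        System.out.println(log(10, 1000));  // Log 1000 is 3
        // product law: log(x*y) = log(x) + log(y), up to rounding
        System.out.println(log(2, 4 * 8) - (log(2, 4) + log(2, 8)));
    }
}
```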

!!!Week of Sep 14: Professor Davis covered the 2 lectures this week.
Here are his notes:
!!!!Tue Sep 14:
-- Explained "asymptotic, order-of-magnitude, worst case, running time analysis" and why it is the "right" measure for algorithms
-- Explained o(F), O(F), omega(F), Omega(F), and Theta(F)
-- Gave the basic rules for doing order-of-magnitude comparison
!!!!Thur Sep 16:
: I will show how one derives order-of-magnitude running times for simple algorithms, and time permitting do more examples of simple comparisons.
Example 1: Copy array A to array B. n is length of arrays A and B.
1. for (i=0; i<n ; i++)
2. { B[i] = A[i]; }
Analysis:
Step 2 is O(1)
Loop 1-2 executes n times
Total time: O(n)
Example 2: Search for value x in array A. n is length of array A
1. { found = false;
2. for (i=0; i<n; i++) {
3. if (A[i] == x) {
4. found = true;
5. exitloop; }
6. }
7. }
Analysis:
Step 1 is O(1).
Steps 4 and 5 are O(1)
Block 4-5 is O(1)
Conditional 3-5 is O(1)
Loop 2-6 executes at most n times (in the case where x is either the
last element or not in A). Total: O(n)
Block 1-7 is O(n)
Note this illustrates the difficulty with average case analysis. What is the
probability that a randomly chosen value x is _not_ in a randomly chosen
array A? Very ill-defined.
Students sometimes do this kind of calculation as follows:
Step 1 is O(1).
Steps 4 and 5 are O(1)
Block 4-5 is O(2)
Conditional 3-5 is O(2)
Loop 2-6 executes at most n times (in the case where x is either the
last element or not in A). Total: O(2n)
Block 1-7 is O(2n+1)
Strictly speaking this is not _incorrect_ but it looks _stupid_ because
O(2n+1) is the same thing as O(n). The expression inside a O(...) should
always be in the simplest form.
Example 3: Are there any repeated values in A? n=A.length
1. found=false;
2. for (i=0; i<n-1; i++) {
3. for (j=i+1; j<n; j++) {
4. if (A[i]==A[j]) {
5. found=true;
6. exitloop;
7. }
8. if (found) then exitloop;
9. }
10. }
Analysis:
1, 5, 6 are O(1)
Block 5-6 is O(1)
Conditional 4-7 is O(1)
Inner loop 3-8 executes at most O(n) times. Total time is O(n)
Outer loop 2-9 executes O(n) times. Total time is O(n^2)
Block 1-10 is O(n^2)
More careful analysis;
On the mth iteration of the outer loop (i=m), the inner loop
iterates at most n-(m+1) times. Therefore the total number of
iterations of the inner loop, over the entire execution of the algorithm,
(n-1) + (n-2) + ... + 3 + 2 + 1 = n(n-1)/2 is O(n^2)
In this case, you get the same answer (in terms of order of magnitude) but
in other cases, a more careful analysis gives a tighter bound.
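To see the careful count concretely, here is Example 3 in Java, instrumented to count inner-loop iterations; with all-distinct input the count is exactly n(n-1)/2, matching the sum above (the class and counter names are mine, not the lecture's).

```java
// Example 3 in Java: are there any repeated values in A?
// The static counter records inner-loop iterations, so we can
// compare the actual count against the n(n-1)/2 analysis.
public class Repeats {
    static long comparisons;   // inner-loop iteration counter

    static boolean hasRepeat(int[] A) {
        comparisons = 0;
        for (int i = 0; i < A.length - 1; i++) {
            for (int j = i + 1; j < A.length; j++) {
                comparisons++;
                if (A[i] == A[j]) return true;   // "exitloop"
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] A = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};   // all distinct, n = 10
        System.out.println(hasRepeat(A));            // false
        System.out.println(comparisons);             // 45 = 10*9/2
    }
}
```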
Example 4. U and V are character strings of length k. Are they the same string?
function streql(U,V)
1. { for (i=0; i<k; i++) {
2. if (U[i] != V[i])
3. return false;
4. }
5. return true;
end streql.
Analysis: Lines 3 and 5 are O(1)
Conditional 2-3 is O(1)
Loop 1-4 has at most k iterations. Total time is O(k)
Block 1-5 is O(k)
Example 5. W is a string of k characters. T is an array of n k-character
strings. Is W one of the elements of T?
1. { for (i=0; i<n; i++) {
2. if (streql(W,T[i]))
3. return true;
4. }
5. return false;
6. }
Analysis:
Lines 3 and 5 are O(1)
Evaluating the condition in line 2 is O(k). So the conditional is O(k)
The loop 1-4 executes at most n times. Total running time is O(nk)
Block 1-6 is O(nk)
Two points to notice:
1. In analyzing a conditional you have to consider the time to evaluate
the condition. In analyzing a loop, you have to consider the time to
evaluate the loop operations. For example if you have the loop
for(i=f(x); g(i); i=h(i))
then the initialization i=f(x) is executed once, when execution enters the
loop. The increment i=h(i) and the continuation test g(i) are executed
on each iteration. So you have to include these in the time for the loop.
This didn't come up in the first few examples, because all these were O(1)
operations.
2. If you call a function, then the time required is the time associated
with the function on that particular argument.
Example 6. W is a string of k characters. T is a string of n characters.
Assume k < n. Is W a substring of T?
1. { found = false;
2. for (i=0; i < n-k; i++) {
3. match=true;
4. for (j=0; j<k; j++) {
5. if (W[j] != T[i+j]) {
6. match = false;
7. exitloop
8. }
9. }
10. if (match) {
11. found = true;
12. exitloop;
13. }
14. }
15. return found;
16.}
Analysis:
1, 3, 6, 7, 11, 12, 15 are O(1)
Conditional 5-8 is O(1)
Loop 4-8 executes at most k times. Running time is O(k)
Conditional 10-13 is O(1)
Block 3-13 is O(k)
Loop 2-14 executes n-k times; so time O(n-k)
Total running time is O((n-k)*k)
Block 1-15 is O((n-k)*k)
Question for thought: should we abbreviate this as O(nk)?
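As a check, here is Example 6 rendered in Java. One caveat: this sketch uses the outer bound i + k <= n so that a match ending at the very last character of T is also found (the pseudocode's i < n-k stops one position short of that).

```java
// Example 6 in Java: is W a substring of T?
// Outer loop tries each starting position i; inner loop does the
// O(k) character comparison, giving O((n-k)*k) overall.
public class Substring {
    static boolean isSubstring(String W, String T) {
        int k = W.length(), n = T.length();
        for (int i = 0; i + k <= n; i++) {        // at most n-k+1 iterations
            boolean match = true;
            for (int j = 0; j < k; j++) {         // O(k) per iteration
                if (W.charAt(j) != T.charAt(i + j)) {
                    match = false;
                    break;                         // "exitloop"
                }
            }
            if (match) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSubstring("ell", "hello"));  // true
        System.out.println(isSubstring("elo", "hello"));  // false
    }
}
```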

; What is an ADT or Abstract Data Type?
: e.g., a list of items, (A, B, C,...)
: abstract operations on list:
item FirstItem( ): A
void PrintList( )
boolean Find(item X)
item Find-kth(int K)
: So, the List ADT is characterized by this list of operations
: It is "abstract" because the ADT might be implemented in different
ways (e.g., as arrays or as linked lists).
; Implementation with Arrays
: (see Text)
; Implementation with Linked Lists
: (see Text)
; Java's List Interface:
: It is important to realize that it is a ''generic'' interface (i.e., the types of the list items are parametrized):
: Here is the outline:
{{{
public interface List <T> extends Collection <T> {
T get( int );
T set( int, T );
void add( int, T );
T remove( int );
ListIterator <T> listIterator( );
}
}}}
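A quick sketch of that generic interface in use, via the real java.util.List and its ArrayList implementation (the class name `ListDemo` is illustrative); the type parameter <String> fixes the item type at compile time.

```java
import java.util.ArrayList;
import java.util.List;

// Exercising Java's generic List interface (java.util.List).
public class ListDemo {
    static List<String> build() {
        List<String> items = new ArrayList<>();
        items.add("A");
        items.add("B");
        items.add(1, "X");   // insert at index 1: [A, X, B]
        items.set(0, "Z");   // replace index 0:  [Z, X, B]
        items.remove(2);     // remove index 2:   [Z, X]
        return items;
    }

    public static void main(String[] args) {
        System.out.println(build());   // prints [Z, X]
    }
}
```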
; What is an Iterator?
: This is the concept of a ''current position'' as you go through a collection of items. So it must have a memory of where you are at any moment, and this is implemented as a private member:
{{{
public class MyArrayList<T> implements Iterable <T> {
}
}}}
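Here is a minimal sketch of how that private "current position" member might look, with the list backed by a plain array (the field and method names below are illustrative, not from the text).

```java
import java.util.Iterator;

// Sketch: an Iterable list backed by an array. The iterator's
// "current position" is the private field pos.
public class MyArrayList<T> implements Iterable<T> {
    private final T[] data;

    @SafeVarargs
    public MyArrayList(T... items) { this.data = items; }

    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int pos = 0;               // remembered position
            public boolean hasNext() { return pos < data.length; }
            public T next() { return data[pos++]; }
        };
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (String s : new MyArrayList<>("a", "b", "c"))
            sb.append(s);                      // for-each calls iterator()
        System.out.println(sb);                // prints abc
    }
}
```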
; The Stack ADT
: This is characterized by two key methods: push and pop
: Push amounts to adding an item at the top of the list, and pop amounts to removing an item from that same end. Hence a Stack is also known as a ''LIFO'' or last-in-first-out list.
> void push( Item );
> Item pop( );
: The stack is just a restricted version of a list, so the implementation can be based directly on any List class.
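A quick sanity check of the last-in-first-out behavior, sketched with java.util.ArrayDeque (the usual stack class in modern Java; the class and helper names below are illustrative).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A stack is LIFO: the last item pushed is the first popped.
public class StackDemo {
    // push 1, 2, 3 and return the first two pops
    static int[] pushThreePopTwo() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        return new int[]{ stack.pop(), stack.pop() };
    }

    public static void main(String[] args) {
        int[] out = pushThreePopTwo();
        System.out.println(out[0] + " " + out[1]);   // prints 3 2
    }
}
```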
; The Queue ADT
: This is characterized by two key methods: enqueue and dequeue
: Enqueue amounts to adding an item at the tail-end of the list, and dequeue amounts to removing an item from the front-end of the list. Hence a Queue is also known as a ''FIFO'' or first-in-first-out list.
> void enqueue( Item );
> Item dequeue( );
: Again, a queue is trivially implemented by any List class.
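And the matching first-in-first-out check, again sketched with java.util.ArrayDeque (note that Java's own Queue interface names the operations add/offer and remove/poll rather than enqueue and dequeue).

```java
import java.util.ArrayDeque;
import java.util.Queue;

// A queue is FIFO: items leave in the order they arrived.
public class QueueDemo {
    // enqueue 1, 2, 3 and return the first dequeue
    static int firstOut() {
        Queue<Integer> q = new ArrayDeque<>();
        q.add(1);            // "enqueue"
        q.add(2);
        q.add(3);
        return q.remove();   // "dequeue"
    }

    public static void main(String[] args) {
        System.out.println(firstOut());   // prints 1
    }
}
```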

; Trees in general
: A tree is a collection of nodes, and these nodes are related by a binary relation called the parent-child relation.
: Trees are special kinds of graphs (specialization in two main ways: no cycles and connected)
: All our trees (if non-empty) have a distinguished node called the root.
: In general, a node can have zero or more children.
: Some terminology: @@root, leaf/internal node, parent/child, ancestor/descendant, depth/height, edge, path, traversal, subtree.@@ Also: proper ancestor/descendant, empty or non-empty tree.
; Binary Trees
: ''Definition of a binary tree T'': it is either an empty tree, or it has a root R and two subtrees T~~L~~ and T~~R~~ (the left and right subtrees), both of which are also binary trees. If either subtree is non-empty, then its root is a child of R.
: This recursive definition is very important -- we can easily prove many properties about binary trees using this definition.
: Thus binary trees are the special case of trees in which each node has at most 2 children.
: ''Full'' binary trees are those whose nodes have either zero or two children (so, one child only is disallowed).
: @@Basic Facts:@@
# A full binary tree with n leaves must have exactly n-1 internal nodes.
# A binary tree with n nodes has height at least log n.
; Binary Search Trees (BST)
: A BST is a binary tree T in which each node is associated with an Item. Each Item comprises a pair (key, data), where the key must be a comparable value (typically an integer). Moreover, the key in the root of T is less than the keys in every node of T~~R~~, and greater than the keys in every node of T~~L~~.
: Remark: this definition implies that the keys are unique in a BST. If non-unique, we need to be a bit more careful in this definition.
: We can search for any key in a BST T in time O(height(T)). NOTE: what we return from the search is the data associated with the key. Recall that an item is a pair of (key, data) values.
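A sketch of that O(height(T)) search on a small hand-built BST; the `Node` class and `find` method are illustrative names, not the textbook's code. Each recursive call descends one level, so the number of comparisons is at most the height of the tree.

```java
// Sketch of BST search in time O(height).
public class BST {
    static class Node {
        int key; String data; Node left, right;
        Node(int key, String data, Node left, Node right) {
            this.key = key; this.data = data;
            this.left = left; this.right = right;
        }
    }

    // returns the data for key k, or null if absent; one step per level
    static String find(Node t, int k) {
        if (t == null) return null;
        if (k < t.key) return find(t.left, k);
        if (k > t.key) return find(t.right, k);
        return t.data;
    }

    // a tiny BST: left keys < root key < right keys
    static Node sample() {
        return new Node(5, "five",
                new Node(3, "three", null, null),
                new Node(8, "eight", null, null));
    }

    public static void main(String[] args) {
        System.out.println(find(sample(), 8));   // eight
        System.out.println(find(sample(), 4));   // null
    }
}
```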

Although our primary reference is the text book, I may find it useful
to augment it with my own notes, which would be posted here. These notes are a summary of what we discussed in class -- they are not meant to be complete in any way!
In addition, I have some lecture notes (generally at a graduate level)
on data structures and algorithms which may be found [[here|http://cs.nyu.edu/faculty/yap/bks/algo]],
in case you are interested.
#<<slider lect1 [[Lecture 1]] "Lecture 1: Introduction to Data Structures and its Analysis (some math background)" "">>
#<<slider rec1 [[Recitation 1]] "Recitation 1" "">>
#<<slider lect2 [[Lecture 2]] "Lecture 2 (Covered by Prof.Davis): Asymptotic Notations and Examples of Complexity Analysis" "">>
#<<slider lect3 [[Lecture 3]] "Lecture 3: ADT: Lists, Stacks, Queues" "">>
#<<slider lect4 [[Lecture 4]] "Lecture 4: Trees (binary search trees)" "">>

>Generally speaking, when you click on a clickable link, a short page of information (called a ''tiddler'') will be added to your current view, usually at the top of the current view. So the order in which these tiddlers are presented on your page is determined by your clicks! You can also close any tiddler by clicking on the ''close'' link on its upper right.

We assume that you have taken a basic programming course, and you
know the programming language Java.
In our department, it is the course V22.0101, Intro to Programming.
If you did not take this course, please talk to me about this.

But in this course (and in our department), we will require the programming language ''Java''.
You will be asked to program in Java in homework.
; General remark:
Once you have learned how to program, most of this skill is transferable
to any programming language you like.
Of course, different languages have different strengths and
weaknesses.

!!! Reading Guide:
: In each chapter, the sections or subsections can be classified into one of three categories: @@"focus", "read" or "skip"@@. Here I explicitly state which sections to @@"focus"@@ on and which to @@"skip"@@; all other sections should be read at some level of understanding. Note that Chapters 1-4 are unchanged from the Midterm Reading Guide.
# Chapter 1: Focus: Section 1.2. Skip: Section 1.4.
# Chapter 2: Focus: Section 2.1, Subsection 2.4.2.
# Chapter 3: Focus: Section 3.5 (LinkedList).
# Chapter 4: Skip: Section 4.3.5. Focus: Section 4.4 (AVL tree), Section 4.5 (Splay tree), 4.8.2 (Maps).
# @@Chapter 5, Hashing: @@ The entire chapter will be in our focus. Some of these sections include 5.2 (Hash Functions), 5.3 (Separate Chaining), 5.4 (Hash Tables without Linked Lists) and 5.7 (Extendible Hashing). Please note that you must understand Theorem 5.1 (p.180) on quadratic probing.
# @@Chapter 6, Priority Queues:@@ Focus on Binary Heaps (6.3) but you may skip Subsection 6.3.4, also skip Skew Heaps (6.7) and Binomial Heaps (6.8). Remember that you have to internalize the algorithms so that you can do "hand-simulation" by drawing the results of insertion, build-heaps, deletion, etc.
# @@Chapter 7, Sorting:@@ Focus on Insertion Sort (7.2), Heapsort (7.5), Mergesort (7.6), Quicksort (7.7), and the General Lower Bound (7.8). Solve recurrences such as the mergesort recurrence using the "EGVS" method -- expand, guess, verify and stop -- and the transformation method. Understand the concept of a "Decision Tree" (I called this a "Tree Program" in class -- it is a binary tree in which each internal node has a comparison like a:b or b:c). Skip Analysis of Quicksort (7.7.5), Analysis of Selection (7.7.6), and External Sorting (7.10).
!!! Final Exam (Tue Dec 21, 2010)
# It will take place during regular class hours, in the usual room.
# Please prepare TWO sheets of 8"x11" (2-sided) "crib" notes. You can put anything on them, in any size font.
# You may reuse the sheet from the midterm and prepare one new sheet, or you can prepare two new sheets.
# The style of the exam will be as in the midterm. But remember that the midterm is 1 hour, and the final is 2 hours.
!!! Midterm (Tue Oct 19, 2010)
# Please prepare an 8"x11" sheet (2-sided) of "crib" notes. You can put anything on it, in any size font.
# Hint: for the Final Exam, you are allowed two such sheets, so one of them can be the one from Midterm.
# The exam will have a programming aspect in which you are to write simple Java code that is accurate and compilable.
# As for the algorithms, you must know how to do @@"hand-simulation"@@. E.g., given an AVL tree T and a key k, you should know how to insert k into T and draw the new tree correctly.
# Here are two sample midterms from the past (unfortunately, the closest I could get are from the course "Basic Algorithms" (V22.0310), which can be viewed as a slightly more advanced version of "Data Structures"): [[2000 fall| http://cs.nyu.edu/~yap/classes/datastruc/10f/pickup/sample-midterm-basic2000f.pdf]] and [[2001 spring| http://cs.nyu.edu/~yap/classes/datastruc/10f/pickup/sample-midterm-basic2001s.pdf]].
# Since we did not have time to cover Splay Trees and B-Trees in Chapter 4, they will not be required for the midterm.
!!! Midterm Solutions (Thu Oct 21, 2010)
# [[Section 1|http://cs.nyu.edu/~yap/classes/datastruc/10f/pickup/mid_Sec1_Sol.pdf]]
# [[Section 2|http://cs.nyu.edu/~yap/classes/datastruc/10f/pickup/mid_Sec2_Sol.pdf]]
!!! Additional Lecture Notes
: I have some advanced lecture notes on topics we cover (mainly for Masters or PhD level courses). Feel free to consult them if you are interested.
# [[Search Trees | http://cs.nyu.edu/~yap/classes/datastruc/10f/pickup/l3_BASE.pdf]]
# [[Amortization (Splay Trees) | http://cs.nyu.edu/~yap/classes/datastruc/10f/pickup/l6_BASE.pdf]]

* General [[Collection | http://cs.nyu.edu/~yap/prog/]] of useful free tools.
: Let me know if something is severely in need of update.
: Following are direct links to some of them.
* [[Java | http://cs.nyu.edu/~yap/prog/java]]:
: Of course, this is the required programming language for this course.
* [[Gvim Editor | http://cs.nyu.edu/~yap/prog/vi]]:
: It is vital to learn a non-WYSIWYG editor if you are serious about programming. My favorite is gvim, but many people like emacs.
* [[Make Program | http://cs.nyu.edu/~yap/prog/make]]:
: This simple tool will make programming less tedious, and it is a window into related productivity tools.
* [[Tar Program | http://cs.nyu.edu/~yap/prog/tar]]:
: You need this to submit programming assignments.
* [[Cygwin | http://cs.nyu.edu/~yap/prog/cygwin]]:
: Cygwin is the painless way to get a unix-like environment on Windows.
* [[Eclipse | http://cs.nyu.edu/~yap/prog/eclipse]]: to be added
: Eclipse is one of many IDEs (integrated development environments), and one of the most widely used.

You may contribute links to this page -- just send me a note of what you feel
would be helpful to this class.
<<tabs usefulLinks
[[Click any tab:]] "tips..." [[noTiddler]]
[[Various Software Tools]] "tips..." [[Software Tools]]
[[How are you graded?]] "tips..." [[Grading]]>>
!Contributions and Jokes
[[Computer Science Joke|http://cs.nyu.edu/~yap/classes/fun/stringJavaJoke.jpg]] from Philippe Juncker... OK, Java Programmers, the joke is on you.
[[Heaps and Trees from xkcd|http://xkcd.com/835]] from Shohan Hasan

Welcome to my Data Structures Course Page !!
If you are registered for my sections (viz., 1, 2 or 3) of this course,
I suggest you begin by sending me an email to say hello,
to introduce yourself. Just say zero or more words
about yourself (e.g., what you look forward
to in this course). I will be sure to respond to you.
If you are curious about the technology behind this webpage,
you may click this link: [[About this document|About this document]].
- - - Chee Yap

> For an introduction to ~TiddlyWiki, please look (for example) at [[tiddlywiki.org's introduction|http://tiddlywiki.org/wiki/Introduction]]. The term "wiki" should be familiar from the ubiquitous "wikipedia". Indeed, there are many variant wiki technologies, and "wikipedia" itself uses the variant called ''~MediaWiki''. Another one I have used is called ''~PmWiki'' (see my [[PmWiki|http://cs.nyu.edu/~yap/wiki/pm/]] site).