Step 3: Store and Explore the dataset

Step 4: Find the state with the Highest Population

# Based on the census2010 data (updated: the 2011 data was removed), what is the population of the state with the highest population?
# 8. find the highest population in column census2010, and store the population number in "highestPopulation"
highestPopulation <- max(dfStates$census2010)
# print the stored value
highestPopulation

## [1] 37253956

# get the index of highest population first, and then get its state name
dfStates[which.max(dfStates$census2010), 1]

## [1] "California"
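One detail worth noting: which.max() returns only the first index of the maximum, so tied values are not all reported. The sketch below shows the difference on a made-up vector; toyPop and its values are hypothetical and are not taken from dfStates.

```r
# hypothetical values with a tie for the maximum
toyPop <- c(5, 9, 9, 2)
# which.max() gives the first position of the maximum only
firstMax <- which.max(toyPop)
firstMax
# comparing against max() gives every position that holds the maximum
allMax <- which(toyPop == max(toyPop))
allMax
```

For the census data this is unlikely to matter, since two states rarely share an identical population, but the second form is safer when ties are possible.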

# 9. Sort the data, in increasing order, based on the census2010 data
# create a permutation which can rearrange column "census2010" into ascending order
sortOrder <- order(dfStates$census2010)
# rearrange the dataframe by the created permutation
sortState <- dfStates[sortOrder,]
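
To make the role of order() concrete, here is a minimal sketch on a toy data frame. The name toyStates, its column stateName, and all values are made up for illustration; only census2010 mirrors a column name used above.

```r
# hypothetical toy data, not the real census figures
toyStates <- data.frame(
  stateName = c("A", "B", "C"),
  census2010 = c(300, 100, 200),
  stringsAsFactors = FALSE
)
# order() returns the row indices that put the column into ascending order
toyOrder <- order(toyStates$census2010)
toyOrder
# indexing the rows with those indices rearranges the whole data frame
toySorted <- toyStates[toyOrder, ]
toySorted$stateName
```

Here toyOrder is 2 3 1, so the sorted names come out as "B", "C", "A". Passing decreasing = TRUE to order() would reverse the direction.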

Step 5: Explore the distribution of the states

# 10. Write a function that takes two parameters. The first is a vector and the second is a number.
# create a function called "Distribution" that takes two parameters
Distribution <- function(vector, number)
{
  # keep only the elements of the vector that are less than the number,
  # and store how many such elements there are in the variable "count"
  count <- length(vector[vector < number])
  # calculate the proportion of such elements and return the result
  return(count / length(vector))
}
# 12. test the function, the result should be 0.2
Distribution(c(1,2,3,4,5), 2)

## [1] 0.2

# 13. test the function with the vector 'dfStates$census2010', and the mean of 'dfStates$census2010'.
Distribution(dfStates$census2010, mean(dfStates$census2010))
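
The same proportion can also be computed by averaging a logical vector, since R treats TRUE as 1 and FALSE as 0. The sketch below is an alternative, not the method required by the exercise; the name DistributionAlt is hypothetical, and the function assumes the vector contains no missing values.

```r
# hypothetical alternative to Distribution()
DistributionAlt <- function(vector, number) {
  # vector < number is a logical vector; its mean is the share of TRUE values
  mean(vector < number)
}
# same test as above: one of the five elements is less than 2
altResult <- DistributionAlt(c(1, 2, 3, 4, 5), 2)
altResult
```

On the test vector this returns 0.2, matching Distribution().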