For institutions that report cost of attendance for the full academic year, an average yearly net price is generated by subtracting the average amount of federal, state or local government, or institutional grant and scholarship aid from the total cost of attendance. Total cost of attendance is the sum of published tuition and required fees, books and supplies, and the weighted average of room and board and other expenses.

The weighted average for room and board and other expenses is generated as follows: [(amount for on-campus room, board and other expenses × number of students living on-campus) + (amount for off-campus (with family) room, board and other expenses × number of students living off-campus with family) + (amount for off-campus (not with family) room, board and other expenses × number of students living off-campus not with family)] ÷ total number of students.

Students whose living arrangements are unknown are excluded from the calculation. For some institutions, the number of students by living arrangement will be known, but the dollar amounts will not. In this case, the students with no corresponding dollar amount are excluded from the denominator.
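The yearly calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `(amount, students)` tuple layout, and the sample figures are hypothetical and do not come from IPEDS or Data USA.

```python
from typing import List, Optional, Tuple

# Hypothetical record for one living arrangement:
# (dollar amount or None if not reported, number of students).
Arrangement = Tuple[Optional[float], int]


def weighted_room_board(arrangements: List[Arrangement]) -> float:
    """Weighted average of room, board and other expenses.

    Arrangements without a dollar amount are dropped from both the
    numerator and the denominator, as the text describes.
    """
    known = [(amount, n) for amount, n in arrangements if amount is not None]
    total_students = sum(n for _, n in known)
    if total_students == 0:
        raise ValueError("no students with a known dollar amount")
    return sum(amount * n for amount, n in known) / total_students


def yearly_net_price(
    tuition_and_fees: float,
    books_and_supplies: float,
    arrangements: List[Arrangement],
    avg_grant_aid: float,
) -> float:
    """Total cost of attendance minus average grant and scholarship aid."""
    total_cost = (
        tuition_and_fees + books_and_supplies + weighted_room_board(arrangements)
    )
    return total_cost - avg_grant_aid


# Made-up figures: on-campus, off-campus with family, off-campus not with family.
sample = [(12000.0, 600), (4000.0, 300), (None, 100)]
print(round(weighted_room_board(sample), 2))
print(round(yearly_net_price(10000.0, 1000.0, sample, 5000.0), 2))
```

In the sample, the 100 students whose arrangement has no dollar amount do not count toward the denominator, so the average is taken over 900 students rather than 1,000.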

For institutions that report the cost of attendance for their largest program for the length of that program, the cost of attendance is generated as stated above, but is then divided by the number of months to complete the program to produce a monthly cost of attendance. The average amount of federal, state or local government, or institutional grant and scholarship aid is also divided by 12 to produce a monthly average. The average net price is generated by subtracting the average monthly grant and scholarship aid from the average monthly cost of attendance and the difference is multiplied by the number of months to complete the program.
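A literal reading of the program-length calculation above can be sketched as follows. The function and parameter names are hypothetical, and the sketch follows the wording of the text exactly, including dividing the aid amount by 12 rather than by the program length.

```python
def program_net_price(
    program_cost: float, avg_grant_aid: float, program_months: int
) -> float:
    """Average net price for an institution reporting by program length."""
    if program_months <= 0:
        raise ValueError("program_months must be positive")
    # Cost of attendance spread over the months needed to finish the program.
    monthly_cost = program_cost / program_months
    # The text divides grant and scholarship aid by 12, not by program length.
    monthly_aid = avg_grant_aid / 12
    # The monthly difference is scaled back up to the full program length.
    return (monthly_cost - monthly_aid) * program_months


# Made-up figures: an $18,000 program lasting 9 months with $6,000 in aid.
print(program_net_price(18000.0, 6000.0, 9))
```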

Full-time, first-time degree/certificate-seeking undergraduates - A student enrolled in a 4- or 5-year bachelor's degree program, an associate's degree program, or a vocational or technical program below the baccalaureate level, who has no prior postsecondary experience, and is enrolled for 12 or more semester credits, or 12 or more quarter credits, or 24 or more contact hours a week each term.

The GINI coefficient is a measure of statistical dispersion intended to represent how equally a quantity is distributed, and is the most commonly used measure of inequality. Values range from 0 to 1, with 0 being perfect equality and 1 being maximal inequality. Note that the GINI measures the spread of a distribution and does not imply a higher or lower average value of the distribution. For instance, if everyone in a given distribution earned a salary of $1,000,000, the GINI of that distribution would be 0, or perfect equality. For more information on GINI, visit this Wikipedia article.
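The point about spread versus average can be checked with a small sketch. It uses a standard rank-based formula for the coefficient of a finite sample; the function name is hypothetical, and this is not necessarily how the published figures are computed.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or xs[0] < 0 or total <= 0:
        raise ValueError("need non-negative values with a positive sum")
    # Rank-weighted sum over the ascending values, with ranks starting at 1.
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * ranked / (n * total) - (n + 1) / n


# Everyone earns the same salary, however large: perfect equality.
print(gini([1_000_000] * 5))  # 0.0
# One person holds everything: for n values the maximum is (n - 1) / n.
print(gini([0, 0, 0, 100]))   # 0.75
```

With a finite sample the coefficient tops out at (n - 1) / n, which approaches 1 as the number of values grows.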

"The margin of error is a statistic expressing the amount of random sampling error in a survey's results. It asserts a likelihood (not a certainty) that the result from a sample is close to the number one would get if the whole population had been queried." (Wikipedia).

ACS data is computed at a 90% confidence level (p=0.10). This means that the U.S. Census Bureau is 90% confident (there is a 90% chance) that the true value for the population is within the bounds of the margin of error for the estimate.

For ACS data, estimates and margins of error are provided by the Census Bureau. For ACS PUMS data, estimates and margins of error are computed by aggregating the records in the PUMS files. The formula below shows how the standard error is calculated for PUMS estimates using the provided replicate weights:

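$$\mathrm{SE}(X) = \sqrt{\frac{4}{80} \sum_{r=1}^{80} \left(X_r - X\right)^2}, \qquad \mathrm{MOE}(X) = Z \times \mathrm{SE}(X)$$

where $X_r$ is a replicate estimate (one for each of the 80 replicate weights in the PUMS files), $X$ is the full PUMS weighted estimate, and $Z = 1.645$ for the 90% confidence interval.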
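As a rough sketch, the replicate-weight calculation could be implemented as below. It assumes the method documented by the Census Bureau for PUMS, in which 80 replicate estimates are compared against the full-weight estimate with a 4/80 scaling factor. The function names and the sample numbers are hypothetical.

```python
import math
from typing import Sequence

Z_90 = 1.645        # critical value for a 90% confidence level
N_REPLICATES = 80   # number of replicate weights in ACS PUMS files


def replicate_standard_error(
    full_estimate: float, replicate_estimates: Sequence[float]
) -> float:
    """Standard error from the squared deviations of the replicate estimates."""
    if len(replicate_estimates) != N_REPLICATES:
        raise ValueError("expected exactly 80 replicate estimates")
    sum_sq = sum((xr - full_estimate) ** 2 for xr in replicate_estimates)
    return math.sqrt(4.0 / N_REPLICATES * sum_sq)


def margin_of_error(
    full_estimate: float, replicate_estimates: Sequence[float], z: float = Z_90
) -> float:
    """Margin of error at the confidence level implied by z."""
    return z * replicate_standard_error(full_estimate, replicate_estimates)


# Made-up data: the full estimate is 1,000 and each replicate is off by 10.
replicates = [1010.0] * 40 + [990.0] * 40
moe = margin_of_error(1000.0, replicates)
print(round(moe, 2))
print(round(1000.0 - moe, 2), round(1000.0 + moe, 2))  # 90% bounds
```

Each replicate estimate is obtained by recomputing the same aggregate with one of the replicate weight columns in place of the main person or housing weight.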

The number of records is a technical variable we track for ACS PUMS data. The num_records variable tracks how many rows from the raw PUMS data file were collapsed to form the estimate. It can be useful in conjunction with the margin of error for understanding data quality.

Revealed Comparative Advantage (RCA) is a calculation used to determine what is special or unique about a certain location/occupation or location/industry combination. The calculation compares two shares: the share of workers in a location who are employed in a given occupation, and the share of all employees, across all locations, who work in that occupation. RCA is the ratio of the first share to the second. This is useful because if we were to use nominal values, the most populated locations would always dominate; on the flip side, if we were to use percentages, smaller locations with only a few employees in a rare occupation would dominate, biasing the dataset. An RCA above 1 indicates that an occupation is over-expressed in a location, and an RCA below 1 indicates that it is under-expressed.
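A minimal sketch of the RCA calculation follows, assuming the standard ratio-of-shares form: the occupation's share within the location divided by the occupation's share across all locations. The function name, the dictionary layout, and the sample counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str]  # (location, occupation)


def rca(counts: Dict[Key, float]) -> Dict[Key, float]:
    """RCA for every (location, occupation) pair with a worker count."""
    loc_total: Dict[str, float] = defaultdict(float)
    occ_total: Dict[str, float] = defaultdict(float)
    grand_total = 0.0
    for (loc, occ), n in counts.items():
        loc_total[loc] += n
        occ_total[occ] += n
        grand_total += n
    if grand_total == 0:
        raise ValueError("no workers in the input")

    result: Dict[Key, float] = {}
    for (loc, occ), n in counts.items():
        if n == 0:
            result[(loc, occ)] = 0.0
            continue
        local_share = n / loc_total[loc]               # occupation within location
        overall_share = occ_total[occ] / grand_total   # occupation everywhere
        result[(loc, occ)] = local_share / overall_share
    return result


# Made-up counts for two equally sized locations.
sample = {
    ("A", "nurse"): 10, ("A", "coder"): 90,
    ("B", "nurse"): 30, ("B", "coder"): 70,
}
for key, value in sorted(rca(sample).items()):
    print(key, round(value, 3))
```

In the sample, nurses are 20% of all workers but 30% of location B's workers, so B's RCA for nurses is 1.5, while location A's is 0.5.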

Data USA implements a “similarity” measure to suggest potentially relevant universities for comparison purposes. When users view a particular university profile, data for “similar” universities are automatically shown to provide additional context for the data and visualizations that users see. We measure similarity between universities by analyzing relationships in admissions criteria (admission rates and SAT scores) alongside patterns in the relative concentration of graduates in particular areas of study.

To calculate the similarity metric, we segment universities into groups based on their parent Carnegie classification group. For every university in the same Carnegie group, we calculate the number of degree completions across the 2-digit Classification of Instructional Programs (CIP) codes. We then compute the logarithm of the revealed comparative advantage (RCA) of the course completions for each school. Next, we join in data on admission rates and total SAT scores for the school’s 75th percentile. Then, we take the log RCA values alongside the admissions and test data and feed them through a dimensionality reduction process known as t-distributed stochastic neighbor embedding (t-SNE). The metric for the t-SNE process is defined as the squared weighted Euclidean distance between two university vectors. The weights are assigned as follows: the 38 CIP codes together carry a combined weight of approximately 45%, while the admission rate and SAT scores make up the remaining 55%. The result of the t-SNE process is a projection from the 40-dimensional vector to a two-dimensional vector, where the most similar universities are those separated by the shortest distance.
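The distance metric described above can be sketched as follows. The text gives only the group totals (about 45% and 55%), so splitting each total evenly within its group is an assumption made here for illustration, as are the function names and the ordering of the vector (38 log RCA values followed by admission rate and SAT score). The sketch covers only the metric, not the t-SNE projection itself.

```python
from typing import List, Sequence

N_CIP = 38                # 2-digit CIP codes (log RCA values)
N_ADMISSIONS = 2          # admission rate and 75th-percentile SAT score
CIP_WEIGHT = 0.45         # combined weight of all CIP dimensions
ADMISSIONS_WEIGHT = 0.55  # combined weight of the admissions dimensions


def build_weights() -> List[float]:
    """Per-dimension weights; the even split within each group is assumed."""
    cip = [CIP_WEIGHT / N_CIP] * N_CIP
    admissions = [ADMISSIONS_WEIGHT / N_ADMISSIONS] * N_ADMISSIONS
    return cip + admissions


def weighted_sq_distance(
    u: Sequence[float], v: Sequence[float], weights: Sequence[float]
) -> float:
    """Squared weighted Euclidean distance between two university vectors."""
    if not (len(u) == len(v) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v))


weights = build_weights()
school_a = [0.0] * 40
school_b = [0.0] * 38 + [1.0, 1.0]  # differs only on the admissions features
print(round(weighted_sq_distance(school_a, school_b, weights), 4))
```

Because the two admissions features carry more than half of the total weight between them, a difference there moves the distance far more than the same difference in any single CIP dimension.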

Data from the Dartmouth Atlas reports only two racial categories: black and non-black. Separate analyses of the Hispanic population are challenging because fewer than half of self-designated Hispanics are coded as such in the Medicare data, Hispanics constitute less than 6% of the elderly population (as counted by the U.S. Census), and they are highly clustered in a few communities, making it difficult to compare communities and regions. Although racial designation for Asians and American Indians is more accurate, their small numbers (less than 3%) also limit the precision of race-specific analyses. At the same time, excluding any of these populations from the regional comparisons in this report was judged to be undesirable. We therefore restricted the analyses in the current report to blacks and non-blacks, and, for ease of exposition, we refer to the non-black population as white. These challenges, and the future growth of the Hispanic population, underscore the importance of improving the coding of race and ethnicity.

As defined by the Census Bureau, “Incorporated Places are those reported to the Census Bureau as legally in existence as of January 1, 2010, as reported in the latest Boundary and Annexation Survey (BAS), under the laws of their respective states. An incorporated place is established to provide governmental functions for a concentration of people as opposed to a minor civil division, which generally is created to provide services or administer an area without regard, necessarily, to population. Places always are within a single state or equivalent entity, but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions.”

As defined by the Census Bureau, “County Subdivisions are the primary divisions of counties and equivalent entities. They include census county divisions, census subareas, minor civil divisions, and unorganized territories and can be classified as either legal or statistical. Each county subdivision is assigned a five-character numeric Federal Information Processing Series (FIPS) code based on alphabetical sequence within state and an eight-digit National Standard feature identifier.”

On the Data USA platform, users can view profiles for individual counties or a map of the entire United States showing data at the county level.

As defined by the Census Bureau, “Metropolitan and micropolitan statistical areas (metro and micro areas) are geographic entities delineated by the Office of Management and Budget (OMB) for use by Federal statistical agencies in collecting, tabulating, and publishing Federal statistics.”

According to the Census Bureau: "Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity that are updated by local participants prior to each decennial census as part of the Census Bureau's Participant Statistical Areas Program. The Census Bureau delineates census tracts in situations where no local participant existed or where state, local, or tribal governments declined to participate. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of statistical data."