In this course, the second in the Geographic Information Systems (GIS) Specialization, you will go in-depth with common data types (such as raster and vector data), structures, quality and storage during four week-long modules:
Week 1: Learn about data models and formats, including a full understanding of vector data and raster concepts. You will also learn about the implications of a dataset's scale and how to load layers from web services.
Week 2: Create a vector data model by using vector attribute tables, writing query strings, defining queries, and adding and calculating fields. You'll also learn how to create new data through the process of digitizing and you'll use the built-in Editor tools in ArcGIS.
Week 3: Learn about common data storage mechanisms within GIS, including geodatabases and shapefiles. Learn how to choose between them for your projects and how to optimize them for speed and size. You'll also work with rasters for the first time, using digital elevation models and creating slope and distance analysis products.
Week 4: Explore datasets and assess them for quality and uncertainty. You will also learn how to bring your maps and data to the Internet and create web maps quickly with ArcGIS Online.
Take GIS Data Formats, Design and Quality as a standalone course or as part of the Geographic Information Systems (GIS) Specialization. You should have equivalent experience to completing the first course in this specialization, Fundamentals of GIS, before taking this course. By completing the second class in the Specialization you will gain the skills needed to succeed in the full program.

Reviews

GU

This is a really nice course that lets you go deeper into GIS concepts. At this point one can feel the power at one's fingertips: the ability to create and publish your own online maps cannot be underestimated!

OM

Jan 27, 2018

Good course, well structured to deliver the invaluable skills, ranging from data management to final output after processing. Good exposure to the toolbox, expecting more in the next course.

From the lesson

Data Quality and Creating Web Maps

The first half of this module goes over uncertainty and data quality, including a lecture on topology, which affects data relationships in your vector feature classes. In Lesson 8, guest lecturer Megan Nguyen will talk all about using ArcGIS Online, including sharing our maps with our colleagues.

Taught By

Nick Santos

Transcript

[MUSIC] Hello again, and welcome back. In this lecture, we'll continue where we left off in the last lecture by discussing types of uncertainty that affect our analysis, and how we can consider them when we design our spatial studies. In this lecture, we'll cover uncertainty from our measuring devices, uncertainty from how we represent and store our data, and then uncertainty from how we analyze our data. The first type of uncertainty, uncertainty in measurement, is probably the easiest type to understand intuitively. This is the error that is introduced because of the limits of our sensing devices or because of the conditions that we were collecting our data in. For example, some GPSs are only accurate to within eight meters or so. Within that area, we can't say for certain that the coordinates we have are the exact coordinates, whether they're here or they're here. But we know we're close. For a collection of measurements that are farther apart than the margin of error, this is usually fine. But when trying to analyze distances between points that are all within the margin of error of the sensing device, we need to understand the limitations that this imposes on our analysis. We can also put error bounds on our analysis and say things like Mount Everest is 8850 meters tall, plus or minus about five meters. We don't actually know the exact height. It could be 8855, or it could be 8845, or anywhere in between. Again, it may not matter, because our analysis may not rely on that high level of precision. But there are many situations where you might need to take it into account. Additionally, it's important to remember that the earth is a dynamic, changing system, and this can affect the measurements we've previously taken and how we work with them alongside measurements we will take in the future.
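The margin-of-error reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coordinates are hypothetical points in a projected coordinate system measured in meters, and the rule that a distance is trustworthy only when it exceeds twice the device error (each point could be off by the full margin) is a simple worst-case assumption, not a method from the lecture.

```python
import math

GPS_ERROR_M = 8.0  # "accurate to within eight meters or so"


def distance_m(p, q):
    """Planar distance in meters between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def distance_is_resolvable(p, q, error_m):
    """True when two points are farther apart than the combined error.

    Assumption: each point may be off by up to error_m, so the
    worst-case error on the distance between them is 2 * error_m.
    """
    return distance_m(p, q) > 2 * error_m


# Hypothetical points: one pair about 7 m apart, one pair 500 m apart.
near = distance_is_resolvable((0.0, 0.0), (5.0, 5.0), GPS_ERROR_M)
far = distance_is_resolvable((0.0, 0.0), (300.0, 400.0), GPS_ERROR_M)

# Error bounds on a single measurement, as in the Mount Everest example.
everest_m, everest_error_m = 8850, 5
everest_bounds = (everest_m - everest_error_m, everest_m + everest_error_m)

print(near, far, everest_bounds)
```

The nearby pair falls inside the margin of error, so the measured distance between those two points says little, while the distant pair is unaffected for most purposes.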
Back in the office, if we were creating data through the process of digitizing, as we did in this course, we need to understand how we can introduce error into our data in the process. We discussed this a bit when we learned about digitizing, but know that sometimes you may not align correctly with the original data that you're digitizing. You'll be off by a little bit to one side, or you may end a line too soon, or a little too late. And so we're inserting tiny bits of error into our data. This is where it's important to know what scale the data was digitized at, so you can know the limits of the analysis. At a more refined scale, you're going to see those errors start to creep in. This uncertainty also extends to different lineages of data. For a while in California, there were multiple competing data sets for river information. If I wanted to analyze data built on one of those together with data built on the other data set, I would need to take some sort of corrective measure that takes into account that these data were generated from different sources. The next major source of uncertainty is uncertainty in how we represent our data. The classic example of this is a mixed raster pixel, where the underlying features don't necessarily align perfectly with the boundaries of the raster pixel. This is similar to the regionalization problems we discussed in the previous lecture, but slightly different. Our choice of data type and parameters necessarily generalizes here: where the real world is highly detailed, a raster cell contains just the one value. Our method for choosing the value that the raster cell contains based on the real world can significantly impact our data analysis, because entire classifications of information can disappear based on the choice. Do we classify our data based upon which class is most dominant within the raster cell? Or do we choose whichever one is at the center of the raster cell? Or do we use some other criteria?
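The mixed-pixel choice described above can be made concrete with a small Python sketch. The 3x3 block of land-cover samples is hypothetical, and the two functions are simplified stand-ins for the "most dominant" and "center of the cell" rules; real GIS tools expose their own resampling and cell-assignment options.

```python
from collections import Counter

# Hypothetical fine-resolution land-cover samples that all fall inside
# one coarse raster cell.
fine_cells = [
    ["forest", "forest", "water"],
    ["forest", "urban", "water"],
    ["forest", "forest", "water"],
]


def majority_value(block):
    """Return the class that covers the most fine cells in the block."""
    counts = Counter(value for row in block for value in row)
    return counts.most_common(1)[0][0]


def center_value(block):
    """Return the class found at the center of the block."""
    return block[len(block) // 2][len(block[0]) // 2]


classes_present = {value for row in fine_cells for value in row}

print("classes in the real world:", sorted(classes_present))
print("majority rule stores:", majority_value(fine_cells))
print("center rule stores:", center_value(fine_cells))
```

Three classes go in and only one comes out under either rule. In this sample the two rules also disagree with each other, and water, which covers a third of the cell, is lost in both cases.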
A similar problem occurs if we try to aggregate information in point data into polygon data. Some polygons may only contain a handful of points, while others have detailed information in many points. The polygons with few data samples may be biased relative to the other polygons. If we're aggregating to polygons that weren't drawn based upon some shared theme with the points, then maybe the polygon boundaries are biasing our aggregation. That is, the shapes of the polygon features were chosen based on some other criteria than what we're choosing to aggregate on. Similarly, maybe some polygons only have points in one corner, meaning that the polygon is biased by missing information from the rest of the locations within the polygon. Imagine we have points representing crime incidents in a city. We could potentially aggregate statistics from these incidents to police districts in the city. But we might see different patterns doing that than if we were to aggregate to electoral districts or utilities districts, etc. Each of these could be informative, but when boundaries don't mean anything to the original data, and sometimes even when they do, you might get artifacts in your result. You will have to assess whether the trends you see are real or the result of a poorly chosen and poorly aligned aggregation. A more general case of this is the modifiable areal unit problem, or the MAUP, which basically says that when we're creating analysis zones, or polygons, the number, size, and shape of these zones can dramatically affect the analysis. If you were to double the size of your analysis zones for a public health analysis, for example, you might get very different results than with your original smaller zones. Choosing between them isn't simple, and there may not be an objective set of criteria to make the decision of what size and shape your zones should be. One final consideration is something we call the ecological fallacy.
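The modifiable areal unit problem described above can be sketched with a one-dimensional toy example in Python. Everything here is an assumption for illustration: the incident positions are made up, the "zones" are equal-width intervals along a line rather than real polygons, and count_by_zone is a hypothetical helper.

```python
import math
from collections import Counter


def count_by_zone(xs, zone_width, origin=0.0):
    """Count points per zone for equal-width zones starting at origin."""
    return Counter(math.floor((x - origin) / zone_width) for x in xs)


# Hypothetical incident positions in kilometers along a road.
incident_x_km = [1.6, 1.8, 1.9, 2.1, 2.2, 2.4, 6.5, 7.0]

# Same points, three zoning schemes: 2 km zones, the same zones shifted
# by 1 km, and zones doubled to 4 km.
small_zones = count_by_zone(incident_x_km, zone_width=2.0)
shifted_zones = count_by_zone(incident_x_km, zone_width=2.0, origin=1.0)
large_zones = count_by_zone(incident_x_km, zone_width=4.0)

for label, counts in [
    ("2 km zones", small_zones),
    ("2 km zones, shifted", shifted_zones),
    ("4 km zones", large_zones),
]:
    print(label, dict(sorted(counts.items())))
```

With the first scheme, a boundary splits the cluster near 2 km into two zones of three incidents each. Shifting the boundaries or doubling the zone size puts all six in a single zone, so the same data appears to contain a much stronger hotspot.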
The ecological fallacy is a logical fallacy that deals with whether a characteristic of a zone or polygon is actually a characteristic of the locations or individuals within that zone. While we treat the data as if that's the case, we know from experience that it's often not. If our data has a median income level for a polygon, it's not that everyone within that zone makes that amount of money. There could be people making substantially less or substantially more. Similarly, if we make inferences based on this data, we need to understand that they may not, and likely will not, apply precisely to the individuals in the group. When we're constructing our data, it's important that we understand the variability like this that we discard in the pursuit of data that meets our needs. Okay. That's it for this lecture. In this lecture, we discussed sources of error from our measurement tools as well as from how we represent the data. We also discussed the modifiable areal unit problem and how it can distort our results, and the ecological fallacy and how we can incorrectly apply valid results to an invalid situation by applying inferences from a group or area to an individual. Up next, we're going to talk all about topology. See you next time.
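As a small illustration of the ecological fallacy discussed in this lecture, the Python sketch below compares a zone's median income with the incomes of the individuals inside it. The income values are invented for the example, and the 10% tolerance is an arbitrary assumption used only to count who is "close" to the zone value.

```python
import statistics

# Hypothetical individual incomes for residents of one zone (polygon).
zone_incomes = [18000, 22000, 25000, 52000, 54000, 140000, 310000]

# The single value that would be stored as the polygon's attribute.
zone_median = statistics.median(zone_incomes)

# Count the individuals whose income is within 10% of the zone's median.
tolerance = 0.10 * zone_median
close_to_median = [
    income for income in zone_incomes
    if abs(income - zone_median) <= tolerance
]

print("zone median income:", zone_median)
print("individuals near the median:", len(close_to_median), "of", len(zone_incomes))
print("individual range:", min(zone_incomes), "to", max(zone_incomes))
```

In this made-up zone, only two of the seven residents earn something close to the stored median, which is the variability that gets discarded when only the zone-level value is kept.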
