The Summarize Data module contains functions that calculate total counts, lengths, areas, and basic descriptive statistics of features and their attributes within areas or near other features.

aggregate_points calculates statistics about points that fall within specified areas or bins.
join_features calculates statistics about features that share a spatial, temporal, or attribute relationship with other features.
reconstruct_tracks calculates statistics about points or polygons that belong to the same track and reconstructs inputs into tracks.
summarize_attributes calculates statistics about feature or tabular data that share attributes.
summarize_within calculates statistics for area features and attributes that overlap each other.

Using a layer of point features and either a layer of area features or bins defined by a specified distance, this tool determines which points fall within each area or bin and calculates statistics about all the points within each area or bin. You may optionally apply time slicing with this tool.

For example

Given point locations of crime incidents, count the number of crimes per county or other administrative district.

This tool works with a layer of point features and a layer of area features. Input area features can come from a polygon layer, or they can be square or hexagonal bins calculated when the tool is run. The tool first determines which points fall within each specified area. After determining this point-in-area spatial relationship, statistics about all points in the area are calculated and assigned to the area. The most basic statistic is the count of the number of points within the area, but you can get other statistics as well.

For example, suppose you have point features of coffee shop locations and area features of counties, and you want to summarize coffee sales by county. Assuming the coffee shops have a TOTAL_SALES attribute, you can get the sum of all TOTAL_SALES within each county, the minimum or maximum TOTAL_SALES within each county, or other statistics such as the count, range, standard deviation, and variance.
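The point-in-bin step and the per-bin statistics described above can be sketched in plain Python. This is an illustration of the idea only, not the GeoAnalytics implementation: the function name `aggregate_into_square_bins`, the planar coordinates, and the sample data are all assumptions made for the example.

```python
import math
from collections import defaultdict


def aggregate_into_square_bins(points, bin_size, field):
    """Group (x, y, attrs) points into square bins and summarize one field.

    Hypothetical helper: coordinates are assumed to be planar and in the
    same unit as bin_size.
    """
    bins = defaultdict(list)
    for x, y, attrs in points:
        # A bin is identified by the column/row of the cell containing the point.
        key = (math.floor(x / bin_size), math.floor(y / bin_size))
        bins[key].append(attrs[field])

    summary = {}
    for key, values in bins.items():
        summary[key] = {
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "range": max(values) - min(values),
        }
    return summary


# Made-up coffee shop locations with a TOTAL_SALES attribute.
shops = [
    (0.2, 0.3, {"TOTAL_SALES": 100.0}),
    (0.8, 0.9, {"TOTAL_SALES": 300.0}),
    (1.5, 0.1, {"TOTAL_SALES": 50.0}),
]
result = aggregate_into_square_bins(shops, bin_size=1.0, field="TOTAL_SALES")
print(result[(0, 0)])
```

With polygon areas such as counties, the bin lookup would be replaced by a point-in-polygon test; the statistics step stays the same.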

This tool can also work with data that is time-enabled. If time is enabled on the input points, then the time slicing options are available. Time slicing allows you to calculate the point-in-area relationship while looking at a specific slice in time. For example, you could look at hourly intervals, which would result in outputs for each hour.

For an example with time, suppose you have point features of every transaction made at various coffee shop locations and no area layer. The data has been recorded over a year, and each transaction has a location and a time stamp. Assuming each transaction has a TOTAL_SALES attribute, you can get the sum of all TOTAL_SALES within the space and time of interest. If these transactions are for a single city, you could generate areas that are 1-kilometer grids and look at weekly time slices to summarize the transactions in both time and space.
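A minimal sketch of summarizing in both space and time follows, assuming planar coordinates in meters, a 1-kilometer square grid, and weekly slices counted from a chosen origin. The helper name `summarize_space_time` and the sample transactions are hypothetical and only illustrate the binning logic.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def summarize_space_time(transactions, cell_size_m, slice_length, origin):
    """Sum TOTAL_SALES per (grid cell, time slice) pair."""
    totals = defaultdict(float)
    for x, y, timestamp, total_sales in transactions:
        cell = (math.floor(x / cell_size_m), math.floor(y / cell_size_m))
        # Index of the time slice that contains the timestamp.
        time_slice = (timestamp - origin) // slice_length
        totals[(cell, time_slice)] += total_sales
    return dict(totals)


origin = datetime(2023, 1, 2)  # assumed start of the first weekly slice
transactions = [
    (250.0, 400.0, datetime(2023, 1, 3, 9, 0), 4.5),
    (900.0, 100.0, datetime(2023, 1, 6, 14, 30), 3.0),
    (250.0, 400.0, datetime(2023, 1, 10, 8, 15), 5.0),
    (1500.0, 400.0, datetime(2023, 1, 10, 8, 20), 2.0),
]
weekly = summarize_space_time(transactions, 1000.0, timedelta(weeks=1), origin)
for (cell, week), total in sorted(weekly.items()):
    print(cell, week, total)
```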

Parameters:

point_layer: Required. Input points layer (features).

bin_type: Optional string. Required if polygon_layer is not defined. Choice list: ['Square', 'Hexagon']

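# Usage Example: Using summary_fields on a layer.
from arcgis.geoanalytics.summarize_data import aggregate_points

# input_points_layer is assumed to be an existing point feature layer.
agg_pts_item = aggregate_points(
    input_points_layer,
    bin_size=0.5,
    bin_type='Hexagon',
    bin_size_unit='Miles',
    summary_fields=[
        {"statisticType": "Count", "onStatisticField": "fieldName1"},
        {"statisticType": "Any", "onStatisticField": "fieldName2"},
    ],
)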

The Build Multi-Variable Grid task works with one or more layers of point, line, or polygon
features. The task generates a grid of square or hexagonal bins and compiles information about
each input layer into each bin. For each input layer, this information can include the following
variables:

Distance to Nearest - The distance from each bin to the nearest feature.

Attribute of Nearest - An attribute value of the feature nearest to each bin.

Attribute Summary of Related - A statistical summary of all features within
search distance of each bin.

Only variables you specify in variable_calculations will be included in the result layer. These
variables can help you understand the proximity of your data throughout the extent of your
analysis. The results can help you answer questions such as the following:

Given multiple layers of public transportation infrastructure, where in the city is least
accessible by public transportation?

Given layers of lakes and rivers, what is the name of the water body closest to each
location in the US?

Given a layer of household income, where in the US is the variation of income in the
surrounding 50 miles the largest?

The result of Build Multi-Variable Grid can also be used in prediction and classification
workflows. The task allows you to calculate and compile information from many different data
sources into a single, spatially continuous layer in one step, reducing the amount of effort
required to build prediction and classification models.
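
The three variable types listed above (Distance to Nearest, Attribute of Nearest, and Attribute Summary of Related) can be illustrated with a small plain-Python sketch. It assumes point features only, planar coordinates, and a mean as the summary statistic; the function `grid_variables` and the sample data are hypothetical and do not reflect how the task is implemented.

```python
import math
from statistics import mean


def grid_variables(bin_centers, features, field, search_distance):
    """Compute three illustrative variables at the center of each bin."""
    rows = []
    for cx, cy in bin_centers:
        # Distance from the bin center to every feature, paired with its attributes.
        distances = [
            (math.hypot(fx - cx, fy - cy), attrs) for fx, fy, attrs in features
        ]
        nearest_distance, nearest_attrs = min(distances, key=lambda pair: pair[0])
        related = [attrs[field] for d, attrs in distances if d <= search_distance]
        rows.append(
            {
                "center": (cx, cy),
                "distance_to_nearest": nearest_distance,
                "attribute_of_nearest": nearest_attrs[field],
                "mean_of_related": mean(related) if related else None,
            }
        )
    return rows


households = [
    (0.0, 0.0, {"INCOME": 40000}),
    (3.0, 4.0, {"INCOME": 60000}),
]
rows = grid_variables(
    [(0.0, 0.0), (3.0, 0.0)], households, "INCOME", search_distance=4.5
)
for row in rows:
    print(row)
```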

Parameters:

input_layers: Required list of FeatureLayers. A list of input layers that will be used in analysis. Each input layer follows the same formatting as described in the Feature Input topic. This can be one of the following:

A URL to a feature service layer with an optional filter to select specific features

A URL to a big data catalog service layer with an optional filter to select specific features

A feature collection

variable_calculations: Required list of dictionaries. A JSON array containing objects that describe the input layers and the attributes that will be calculated for each layer.

bin_size: Required float. The distance for the bins of type bin_type in the output polygon layer. Enrichment attributes will be calculated at the center of each bin. When generating bins, for Square, the number and units specified determine the height and length of the square. For Hexagon, the number and units specified determine the distance between parallel sides.

bin_unit: Optional string. The distance unit for the bins that will be used to calculate enrichment attributes.
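
As a worked illustration of how the bin size relates to bin geometry, the following sketch computes the area of one bin. It relies only on standard geometry: a square with side s has area s², and a regular hexagon whose parallel sides are d apart has area (√3 / 2) · d². The helper `bin_area` is hypothetical and not part of the module.

```python
import math


def bin_area(bin_size, bin_type):
    """Area of a single bin, in squared units of bin_size."""
    if bin_type == "Square":
        # bin_size is the height and length of the square.
        return bin_size ** 2
    if bin_type == "Hexagon":
        # bin_size is the distance between parallel sides of a regular hexagon.
        return math.sqrt(3) / 2 * bin_size ** 2
    raise ValueError(f"Unsupported bin type: {bin_type!r}")


print(bin_area(2.0, "Square"))
print(round(bin_area(2.0, "Hexagon"), 3))
```

For the same bin size, a hexagonal bin covers about 87% of the area of a square bin.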

Using either feature layers or tabular data, you can join features and records based on specific relationships between the input layers or tables. Joins will be determined by spatial, temporal, and attribute relationships, and summary statistics can be optionally calculated.

For example

Given point locations of crime incidents with a time, join the crime data to itself, specifying a spatial relationship of crimes within 1 kilometer of each other and a temporal relationship of crimes that occurred within 1 hour of each other, to determine whether there is a sequence of crimes close to each other in space and time.

Given a table of ZIP Codes with demographic information and area features representing residential buildings, join the demographic information to the residences so each residence now has the information.

The Join Features task works with two layers. Join Features joins attributes from one feature to another based on spatial, temporal, and attribute relationships or some combination of the three. The tool determines all input features that meet the specified join conditions and joins the second input layer to the first. You can optionally join all features to the matching features or summarize the matching features.

Join Features can be applied to points, lines, areas, and tables. A temporal join requires that your input data is time-enabled, and a spatial join requires that your data has a geometry.
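
The combined spatial and temporal join condition from the crime example can be sketched in plain Python. This is a simplified illustration, assuming planar coordinates in meters and point features only; the function `join_near` is hypothetical, and skipping self-matches is a choice made for this sketch rather than documented tool behavior.

```python
import math
from datetime import datetime, timedelta


def join_near(target, join, max_distance, max_time_gap):
    """Return (target_id, join_id) pairs that satisfy both join conditions."""
    matches = []
    for t_id, tx, ty, t_time in target:
        for j_id, jx, jy, j_time in join:
            if t_id == j_id:
                continue  # skip self-matches when a layer is joined to itself
            close_in_space = math.hypot(tx - jx, ty - jy) <= max_distance
            close_in_time = abs(t_time - j_time) <= max_time_gap
            if close_in_space and close_in_time:
                matches.append((t_id, j_id))
    return matches


crimes = [
    ("A", 0.0, 0.0, datetime(2023, 5, 1, 12, 0)),
    ("B", 600.0, 0.0, datetime(2023, 5, 1, 12, 30)),
    ("C", 700.0, 0.0, datetime(2023, 5, 1, 15, 0)),
    ("D", 5000.0, 0.0, datetime(2023, 5, 1, 12, 10)),
]
pairs = join_near(crimes, crimes, max_distance=1000.0, max_time_gap=timedelta(hours=1))
print(pairs)
```

Counting the matches per target feature, instead of listing them, corresponds to the option of summarizing the matching features.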

Using a time-enabled layer of point or polygon features that represent an instant in time, this tool determines which input features belong in a track and will order the inputs sequentially in time. Statistics are optionally calculated for the input features within each track.

For example

Given point locations and time of hurricane measurements, calculate the mean wind speed and max wind pressure of the hurricane.

This tool works with a time-enabled layer of either point or polygon features that represent an instant in time. It first determines which features belong to a track using an identifier. Using the time at each location, the tracks are ordered sequentially and transformed into a line or polygon representing the path of movement over time. Optionally, the input may be buffered by a field, which creates a polygon at each location. These buffered points, or the input polygons themselves if the inputs are polygons, are then joined sequentially to create a track as a polygon whose width is representative of the attribute of interest. Resulting tracks have a start and end time, which correspond to the first and last feature in a given track. When the tracks are created, statistics about the input features are calculated and assigned to the output track. The most basic statistic is the count of points within the track, but other statistics can be calculated as well.

Features in time-enabled layers can be represented in one of two ways:

Instant - A single moment in time

Interval - A start and end time

For example, suppose you have GPS measurements of hurricanes every 10 minutes. Each GPS measurement records the hurricane’s name, location, time of recording, and wind speed. With this information, you could create a track for each hurricane, using the name for track identification. Additionally, you could calculate statistics such as the mean, maximum, and minimum wind speed of each hurricane, as well as the count of measurements within each track.

Using the same example, you could buffer your tracks by the wind speed. This would buffer each measurement by the wind speed field at that location, and join the buffered areas together, creating a polygon representative of the track path, as well as the changes in wind speed as the hurricanes progressed.
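
The grouping, ordering, and statistics steps of the hurricane example can be sketched as follows. This is a plain-Python illustration under simplifying assumptions (no buffering, tracks represented as ordered coordinate lists); the function `build_tracks`, the field names, and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def build_tracks(observations):
    """Group observations by name, order them in time, and summarize each track."""
    grouped = defaultdict(list)
    for obs in observations:
        grouped[obs["name"]].append(obs)

    tracks = {}
    for name, group in grouped.items():
        ordered = sorted(group, key=lambda obs: obs["time"])
        speeds = [obs["wind_speed"] for obs in ordered]
        tracks[name] = {
            "path": [obs["location"] for obs in ordered],
            "start_time": ordered[0]["time"],
            "end_time": ordered[-1]["time"],
            "count": len(ordered),
            "mean_wind_speed": mean(speeds),
            "max_wind_speed": max(speeds),
        }
    return tracks


# Made-up measurements, deliberately out of time order.
observations = [
    {"name": "Alma", "time": datetime(2023, 9, 1, 0, 20), "location": (-60.2, 15.3), "wind_speed": 90.0},
    {"name": "Alma", "time": datetime(2023, 9, 1, 0, 0), "location": (-60.0, 15.0), "wind_speed": 80.0},
    {"name": "Boris", "time": datetime(2023, 9, 2, 6, 0), "location": (-70.0, 20.0), "wind_speed": 60.0},
    {"name": "Alma", "time": datetime(2023, 9, 1, 0, 10), "location": (-60.1, 15.1), "wind_speed": 100.0},
]
tracks = build_tracks(observations)
print(tracks["Alma"]["path"], tracks["Alma"]["mean_wind_speed"])
```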

distance_split: A distance used to split tracks. Any features in the input_layer that are in the same track and are greater than this distance apart will be split into a new track. The units of the distance values are supplied by the distance_split_unit parameter.

distance_split_unit: The distance unit to be used with the distance value specified in distance_split.

Values: Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards

output_name: Output Features Name (str). Required parameter.

gis: Optional, the GIS on which this tool runs. If not specified, the active GIS is used.
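
One way to picture the effect of distance_split is the sketch below, which starts a new track whenever two consecutive, time-ordered features are farther apart than the split distance. It assumes planar coordinates in the same unit as the split distance and compares only consecutive features; the helper `split_by_distance` is hypothetical, and the tool's exact splitting rule may differ.

```python
import math


def split_by_distance(ordered_points, distance_split):
    """Split one time-ordered track into segments at large spatial gaps."""
    segments = []
    current = []
    for x, y in ordered_points:
        if current:
            last_x, last_y = current[-1]
            if math.hypot(x - last_x, y - last_y) > distance_split:
                # The gap exceeds the split distance, so close the current track.
                segments.append(current)
                current = []
        current.append((x, y))
    if current:
        segments.append(current)
    return segments


track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
segments = split_by_distance(track, distance_split=5.0)
print(segments)
```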

Using either feature or tabular data, this tool summarizes statistics for specified fields.

For example

Given locations of grocery stores with a field COMPANY_NAME, summarize the stores by the company name to determine statistics for each company.

Given a table of grocery stores with fields COMPANY_NAME and COUNTY, summarize the stores by the company name and county to determine statistics for each company within each county.

This tool summarizes all the matching values in one or more fields and calculates statistics on them. The most basic statistic is the count of features that have been summarized together, but you can calculate more advanced statistics as well.

For example, suppose you have point features of store locations with a field representing the DISTRICT_MANAGER_NAME and you want to summarize coffee sales by manager. You can specify the field DISTRICT_MANAGER_NAME as the field to dissolve on, and all rows of data representing individual managers will be summarized. This means all store locations that are managed by Manager1 will be summarized into one row with summary statistics calculated. In this instance, statistics like the count of the number of stores and the sum of TOTAL_SALES for all stores that Manager1 manages would be calculated as well as for any other manager listed in the DISTRICT_MANAGER_NAME field.
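
The summarization described above is essentially a group-by on the chosen fields. A minimal plain-Python sketch follows, assuming the rows are dictionaries and that only a count and a sum are needed; the function `summarize_by_fields` and the sample rows are hypothetical.

```python
from collections import defaultdict


def summarize_by_fields(rows, fields, sum_field):
    """Collapse rows that share the same values in `fields` into one summary row."""
    summary = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for row in rows:
        # Rows with identical values in all group fields share one key.
        key = tuple(row[field] for field in fields)
        summary[key]["count"] += 1
        summary[key]["sum"] += row[sum_field]
    return dict(summary)


stores = [
    {"DISTRICT_MANAGER_NAME": "Manager1", "TOTAL_SALES": 1200.0},
    {"DISTRICT_MANAGER_NAME": "Manager2", "TOTAL_SALES": 800.0},
    {"DISTRICT_MANAGER_NAME": "Manager1", "TOTAL_SALES": 300.0},
]
by_manager = summarize_by_fields(stores, ["DISTRICT_MANAGER_NAME"], "TOTAL_SALES")
print(by_manager[("Manager1",)])
```

Passing two fields, such as a company name and a county, yields one summary row per unique combination of the two values.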

Parameters:

input_layer: Input Features (feature input). Required parameter.

fields: Summary Fields (str). Required parameter.

summary_fields: Summary Statistics (str/list). Optional parameter.

output_name: Output Features Name (str). Required parameter.

gis: Optional, the GIS on which this tool runs. If not specified, the active GIS is used.

group_by_field: A field of the summarized_layer features that you can use to calculate statistics separately for each unique attribute value. For example, suppose the polygons to summarize within are city boundaries and the summarized_layer features are parcels. One of the fields of the parcels is Status, which contains two values: VACANT and OCCUPIED. To calculate the total area of vacant and occupied parcels within the boundaries of cities, use Status as the group_by_field. This parameter is available at ArcGIS Enterprise 10.6.1+.

minority_majority: This boolean parameter is applicable only when a group_by_field is specified. If true, the minority (least dominant) and the majority (most dominant) attribute values for each group field are calculated. Two new fields are added to the result layer, prefixed with Majority_ and Minority_. The default is false. This parameter is available at ArcGIS Enterprise 10.6.1+.

percent_shape: This boolean parameter is applicable only when a group_by_field is specified. If set to true, the percentage of each unique group_by_field value is calculated for each polygon being summarized within. The default is false. This parameter is available at ArcGIS Enterprise 10.6.1+.

output_name: Output Features Name (str). Required parameter.

gis: Optional, the GIS on which this tool runs. If not specified, the active GIS is used.
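
The effect of group_by_field, minority_majority, and percent_shape can be pictured with the parcel example. The sketch below assumes the spatial overlay has already been done, so each parcel record carries its city, its Status value, and its area, and it assumes dominance is measured by area. The function `summarize_area_by_group` and the sample data are hypothetical.

```python
from collections import defaultdict


def summarize_area_by_group(parcels):
    """Summarize parcel area per city, broken down by the group field value."""
    area = defaultdict(lambda: defaultdict(float))
    for city, status, parcel_area in parcels:
        area[city][status] += parcel_area

    result = {}
    for city, by_status in area.items():
        total = sum(by_status.values())
        result[city] = {
            "area_by_group": dict(by_status),
            # Most and least dominant group values, measured by total area.
            "majority": max(by_status, key=by_status.get),
            "minority": min(by_status, key=by_status.get),
            "percent": {s: 100.0 * a / total for s, a in by_status.items()},
        }
    return result


parcels = [
    ("Springfield", "VACANT", 2.0),
    ("Springfield", "OCCUPIED", 5.0),
    ("Springfield", "OCCUPIED", 1.0),
    ("Shelbyville", "VACANT", 4.0),
]
summary = summarize_area_by_group(parcels)
print(summary["Springfield"])
```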