
Data Management

Data management is an important aspect of geospatial analysis and project management. Geospatial data files can be large, complex and difficult to manage. Understanding the data structure and implementing data management best practices will improve your workflow and reduce headaches down the road. A logical, clear file structure and labeling system not only enables others to access your data, but also makes it easier for you to find your own data.

Best Practices

When viewing data in ArcCatalog (or any ArcGIS application), you will see only one file representing the shapefile or raster; however, you can use Windows Explorer to view all the files associated with the dataset. When copying geospatial data, it is recommended that you do so in ArcCatalog or with a geoprocessing tool. If you do copy a dataset outside ArcGIS, be sure to copy all the files that make it up.
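If you do need to copy a dataset outside ArcGIS, the "copy every associated file" rule can be scripted. The following is a minimal sketch in plain Python, not an ArcGIS tool. The function names are hypothetical, and it assumes that companion files share the dataset's base name up to the first dot and that the base name itself contains no dots.

```python
import shutil
from pathlib import Path


def dataset_files(dataset: Path) -> list[Path]:
    """Return every file in the dataset's folder that shares its base name."""
    # Assumption: the base name is everything before the first dot,
    # so "roads.shp" and "roads.shp.xml" both map to "roads".
    base = dataset.name.split(".")[0]
    return sorted(
        p for p in dataset.parent.iterdir()
        if p.is_file() and p.name.split(".")[0] == base
    )


def copy_dataset(dataset: Path, destination: Path) -> list[Path]:
    """Copy the dataset and all of its companion files into destination."""
    destination.mkdir(parents=True, exist_ok=True)
    return [
        Path(shutil.copy2(p, destination / p.name))
        for p in dataset_files(dataset)
    ]
```

For example, copy_dataset(Path("data/roads.shp"), Path("backup")) would copy roads.shp together with any roads.* companions in the data folder. Note that an unrelated dataset with the same base name (say, roads.tif) would be copied too, so review the result.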

Data Files

Vector (Shapefiles)

The most common vector format you will encounter is the ESRI shapefile. A shapefile stores the location, shape and attributes of geographic features. It is made up of a set of related files; all of these files must share the same prefix (name) and should be stored in the same folder. Common shapefile extensions include .shp (feature geometry), .shx (index of the feature geometry), .dbf (attribute table) and .prj (coordinate system information).
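Because a shapefile only works when its companion files travel with it, a quick completeness check can catch a bad copy early. This is a minimal sketch: the function name is hypothetical, the check covers only the three files the shapefile format requires (.shp, .shx and .dbf), and it assumes lowercase extensions.

```python
from pathlib import Path

# The three files every shapefile needs; .prj and others are optional.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path: Path) -> list[str]:
    """Return the required extensions that are absent next to shp_path."""
    return [
        ext for ext in REQUIRED_EXTENSIONS
        if not shp_path.with_suffix(ext).exists()
    ]
```

An empty list means all required parts are present; anything else names the files that were left behind.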

Raster

Earlier in this section we reviewed a variety of raster formats. As with shapefiles, many of these formats have associated files that contain information such as the coordinate system and statistics. It is important to keep all associated files in the same folder and to copy all of them when moving data.

Auxiliary Files

An auxiliary (AUX or AUX.XML) file accompanies the raster and is stored in the same location. The auxiliary file stores any supplementary information that cannot be stored in the raster file itself. This can include a color map; statistics, a histogram, or a table; a pointer to the pyramid file; and coordinate system, transformation and projection information.

World Files

Many rasters store the georeferencing information in the header of the image file. However, several image formats store this information in a separate ASCII world file. Where the georeferencing information is stored often depends on the capabilities of the software used to generate the files or on the user's preference. For example, a TIFF raster dataset named raster.tif has an associated world file named raster.tfw.
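To make the role of a world file concrete, here is a minimal sketch of how its contents map pixel positions to map coordinates. It relies on the usual world file convention, which is not spelled out above: six numbers, one per line, in the order A, D, B, E, C, F, where A and E are the pixel sizes in x and y (E is normally negative), D and B are rotation terms, and C and F are the map coordinates of the center of the upper-left pixel. The function names and the sample values are hypothetical.

```python
def parse_world_file(text: str) -> tuple[float, ...]:
    """Return the six coefficients in file order: A, D, B, E, C, F."""
    values = tuple(float(token) for token in text.split())
    if len(values) != 6:
        raise ValueError(f"expected 6 values, found {len(values)}")
    return values


def pixel_to_map(col: float, row: float, coeffs: tuple[float, ...]) -> tuple[float, float]:
    """Convert a pixel (column, row) to map (x, y) with the affine transform."""
    a, d, b, e, c, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Made-up contents of a raster.tfw for a 30 m, north-up image.
SAMPLE_TFW = "30.0\n0.0\n0.0\n-30.0\n500015.0\n4500985.0\n"
coeffs = parse_world_file(SAMPLE_TFW)
upper_left = pixel_to_map(0, 0, coeffs)
```

With the sample values, pixel (0, 0) maps to the coordinates on the last two lines of the file, and each step of one column moves 30 map units east.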

Header Files

The ENVI header file (.hdr) contains information for ENVI-format images (.dat files). ENVI creates a new header file whenever you save an image to ENVI raster format. The header file uses the same name as the image file, with the file extension .hdr. Both ArcGIS and ENVI read the header file; without it, neither program can open an ENVI-format raster (.dat file).
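An ENVI header is a plain text file that begins with the word ENVI followed by "field = value" lines, so it can be inspected without opening the image. The following is a minimal sketch of a reader, not ENVI's own parser. The function name is hypothetical, the sample header is invented for illustration, and values wrapped in braces are simply returned as text.

```python
def parse_envi_header(text: str) -> dict[str, str]:
    """Parse ENVI header text into a dictionary of field names and values."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "ENVI":
        raise ValueError("not an ENVI header: first line must be 'ENVI'")
    fields: dict[str, str] = {}
    pending_key = None  # set while reading a {...} value that spans lines
    pending_value = ""
    for line in lines[1:]:
        if pending_key is not None:
            pending_value += " " + line.strip()
            if "}" in line:
                fields[pending_key] = pending_value.strip("{} ")
                pending_key = None
            continue
        if "=" not in line:
            continue  # skip blank lines and anything that is not a field
        key, value = (part.strip() for part in line.split("=", 1))
        if value.startswith("{") and "}" not in value:
            pending_key, pending_value = key.lower(), value
        else:
            fields[key.lower()] = value.strip("{} ")
    return fields


# Invented header contents, for illustration only.
SAMPLE_HEADER = """ENVI
description = {
  Sample image}
samples = 100
lines = 50
bands = 3
interleave = bsq
"""
header = parse_envi_header(SAMPLE_HEADER)
```

Here header["samples"] and header["lines"] give the image width and height in pixels as text, which you can convert with int() as needed.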

Pyramid Files

Pyramid files are used to improve performance and speed up the loading of raster datasets. They are reduced-resolution (spatial) versions of the original raster dataset and can contain many downsampled layers. Pyramids speed up the display of raster data by retrieving only the data at the resolution required for the display. With pyramids, a lower-resolution copy of the data displays quickly when drawing the entire dataset. As you zoom in, levels with finer resolutions are drawn; performance is maintained because you're drawing successively smaller areas. Pyramids are stored in a single file in the same folder as the source raster. There are three main types of pyramid files: .ovr, .rrd and .enp.

ENVI automatically builds pyramids for each image while loading the image into the display. ArcGIS will often ask whether or not you want to generate pyramid layers.

Note that ENVI can read both .ovr and .enp pyramids, but ArcGIS cannot read .enp files, so it will create a new .ovr pyramid file instead. If you are trying to save space, you can delete pyramid files (.ovr, .rrd and .enp), but they will usually be re-created when the raster is opened. Be careful not to delete any of the other associated files, or you risk corrupting your data.
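Before deleting pyramid files to save space, it helps to see how much space they actually occupy. This is a minimal sketch with hypothetical function names. It only reports files with the three pyramid extensions mentioned above and deliberately deletes nothing, so no other associated file can be removed by accident.

```python
from pathlib import Path

PYRAMID_EXTENSIONS = (".ovr", ".rrd", ".enp")


def find_pyramid_files(folder: Path) -> list[Path]:
    """Return all pyramid files in folder and its subfolders."""
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in PYRAMID_EXTENSIONS
    )


def total_size_bytes(paths: list[Path]) -> int:
    """Return the combined size of the given files in bytes."""
    return sum(p.stat().st_size for p in paths)
```

Calling total_size_bytes(find_pyramid_files(Path("project_data"))) gives the space that would be freed, at least until the rasters are next opened and the pyramids are rebuilt.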

Metadata

The term metadata generally refers to information that describes the contents of a data file. Metadata help a dataset be understood, re-used, and integrated with other datasets. The information described in a metadata record includes where the data were collected, who is responsible for the dataset, why the dataset was created, and how the data are organized. Metadata generally follow a standard format, making it easier to compare datasets and to transfer files electronically. Many datasets provided by government or commercial sources include metadata, often as a text file (.txt) or XML file delivered along with the data.
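When metadata arrive as an XML file, the key fields can be pulled out with a few lines of code. The sketch below assumes the record follows the FGDC CSDGM layout, which is one common standard and not necessarily the one your data provider uses; other standards such as ISO 19115 use different element names. The function name and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Element paths as laid out in an FGDC CSDGM record (assumed standard).
FIELD_PATHS = {
    "title": "idinfo/citation/citeinfo/title",
    "originator": "idinfo/citation/citeinfo/origin",
    "purpose": "idinfo/descript/purpose",
}


def summarize_metadata(xml_text: str) -> dict[str, Optional[str]]:
    """Return selected fields from a metadata record, or None if absent."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(path) for name, path in FIELD_PATHS.items()}


# Invented record, for illustration only.
SAMPLE_METADATA = """<metadata>
  <idinfo>
    <citation><citeinfo>
      <origin>Example Agency</origin>
      <title>Example Roads</title>
    </citeinfo></citation>
    <descript><purpose>Demonstration of metadata fields</purpose></descript>
  </idinfo>
</metadata>"""
summary = summarize_metadata(SAMPLE_METADATA)
```

The originator and purpose fields correspond to the "who is responsible" and "why the dataset was created" questions that a metadata record answers.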

The ENVI header file contains metadata for ENVI-format images (.dat files). It holds a variety of information that can sometimes be pulled directly from the metadata provided with the data, including the acquisition date, resolution, coordinate system, projection information and more. ENVI has built-in tools that allow you to edit the metadata stored in the ENVI header and to add information and fields to it.

Contact Info

Humboldt State University
1 Harpst Street Arcata, CA 95521
skh28@humboldt.edu

© Copyright 2020 HSU - All rights reserved.