Data Management
Data management is an important aspect of geospatial analysis and project management. Geospatial data files can be large, complex, and difficult to manage. Understanding the data structure and following data management best practices will improve your workflow and reduce headaches down the road. Organizing files with a clear, logical structure and labeling system not only lets others access your data but also makes it easier for you to find your own.
Best Practices
- Be consistent with your organizational structure and use a logical naming convention for files and folders.
- Use consistent file names and formats within a project. If using abbreviations in file or folder names, ensure that others are using the same abbreviations. Consider including a "readme" file along with the dataset that spells out any abbreviations or acronyms.
- Use underscores "_" or dashes "-" instead of spaces in file names, because many computer programs don't accept spaces.
- Consider using ArcCatalog to move geospatial data. This ensures that you move or copy all related files and don't end up with missing or corrupted data.
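The naming advice above can be applied in bulk with a few lines of Python. This is only a minimal sketch: the function name `safe_name` is hypothetical, and it handles a bare file name rather than a full path.

```python
import re


def safe_name(filename: str) -> str:
    """Return the file name with each run of whitespace replaced by one underscore."""
    return re.sub(r"\s+", "_", filename.strip())


print(safe_name("land cover 2020.shp"))  # land_cover_2020.shp
```

If you rename an existing shapefile this way, remember to rename every file in the set so that they keep a common prefix.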
When viewing data in ArcCatalog (or any ArcGIS application), you will see only one file representing the shapefile or raster; however, you can use Windows Explorer to view all the files associated with the data. When copying geospatial data, it is recommended that you do so in ArcCatalog or with a geoprocessing tool. If you do copy a dataset outside ArcGIS, be sure to copy all the files that make it up.
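If you do need to copy a dataset outside ArcGIS, a short script can reduce the chance of leaving a file behind. The sketch below assumes that every file belonging to the dataset sits in the same folder and starts with the same base name followed by a dot; the function name `copy_dataset` is hypothetical. It is not a substitute for ArcCatalog and does not handle geodatabases.

```python
import shutil
from pathlib import Path


def copy_dataset(source, dest_dir):
    """Copy `source` and every sibling file sharing its base name into `dest_dir`."""
    source = Path(source)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # "roads.shp" gives the prefix "roads.", which also matches "roads.shp.xml".
    prefix = source.stem + "."
    copied = []
    for sibling in sorted(source.parent.iterdir()):
        if sibling.is_file() and sibling.name.startswith(prefix):
            # copy2 preserves timestamps along with the file contents.
            copied.append(Path(shutil.copy2(sibling, dest_dir)))
    return copied
```

Note that prefix matching is deliberately broad: a separate dataset named roads.v2.shp would also be picked up when copying roads.shp, so check the returned list before relying on it.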
Data Files
Vector (Shapefiles)
The most commonly encountered vector data format is the ESRI shapefile. A shapefile stores the location, shape, and attributes of geographic features. It is made up of a set of related files; all files belonging to a shapefile must have the same prefix (name) and should be stored in the same folder. Below are common file extensions for shapefiles.
- .shp: The main file that stores the feature geometry; required.
- .shx: The index file that stores the index of the feature geometry; required.
- .dbf: The dBASE table that stores the attribute information of features; required.
- .prj: The file that stores the coordinate system information; not strictly required by the format, but strongly recommended, because without it the software cannot tell which coordinate system the data use.
- .sbn and .sbx: The files that store the spatial index of the features and speed up loading times; optional.
- .xml: Metadata for ArcGIS; stores information about the shapefile.
- .cpg: An optional file that specifies the code page identifying the character set to be used.
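A quick way to confirm that a shapefile arrived intact is to check for its component files. The sketch below is illustrative only, and the function name `missing_components` is hypothetical. The shapefile specification itself mandates only .shp, .shx, and .dbf; the .prj is checked separately here because a shapefile without it still opens but has no defined coordinate system.

```python
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")
RECOMMENDED = (".prj",)


def missing_components(shp_path):
    """Return (missing_required, missing_recommended) extensions for a shapefile."""
    shp_path = Path(shp_path)

    def absent(extensions):
        # with_suffix swaps ".shp" for each extension while keeping the base name.
        return [ext for ext in extensions if not shp_path.with_suffix(ext).exists()]

    return absent(REQUIRED), absent(RECOMMENDED)
```

The check is case-sensitive on file systems that are, so a file named ROADS.SHX would be reported as missing on Linux.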
Raster
Earlier in this section we reviewed a variety of raster formats. Many of these formats have associated files that hold information such as the coordinate system and statistics, much as shapefiles do. As with shapefiles, it is important to keep all associated files in the same folder and to copy all of them when moving data.
Auxiliary Files
An auxiliary (AUX or AUX.XML) file accompanies the raster and is stored in the same location. The auxiliary file stores any supplementary information that cannot be stored in the raster file itself. This can include:
- Color map
- Statistics, histogram, or table
- Pointer to the pyramid file
- Coordinate system, transformation, and projection information
World Files
Many rasters store the georeferencing information in the header of the image file. However, several image formats store this information in a separate ASCII world file. Where the georeferencing information is stored often depends on the capabilities of the software used to generate the files or on the user's preference. For example, a TIFF raster named raster.tif has an associated world file named raster.tfw.
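To make the idea of a world file concrete, the sketch below reads one and converts a pixel position to map coordinates. It assumes the common six-line layout, in which the values appear in the order A, D, B, E, C, F: A and E are the pixel sizes in x and y (E is normally negative), D and B are rotation terms, and C and F are the map coordinates of the center of the upper-left pixel. The function names and the sample numbers are hypothetical.

```python
from pathlib import Path


def read_world_file(path):
    """Return the six world file coefficients in file order: A, D, B, E, C, F."""
    values = [float(token) for token in Path(path).read_text().split()]
    if len(values) != 6:
        raise ValueError(f"expected 6 values, found {len(values)}")
    return tuple(values)


def pixel_to_map(coeffs, col, row):
    """Convert a pixel (column, row) to the map coordinates of that pixel's center."""
    a, d, b, e, c, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Made-up values: 30 m pixels, no rotation, upper-left pixel center at (500015, 4200985).
example = (30.0, 0.0, 0.0, -30.0, 500015.0, 4200985.0)
print(pixel_to_map(example, 10, 20))  # (500315.0, 4200385.0)
```

Note that a world file carries only this transformation; it does not say which coordinate system the numbers are in, which is why the other associated files still matter.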
Header Files
The ENVI header file (.hdr) contains information for ENVI-format images (.dat files). ENVI creates a new header file whenever you save an image to ENVI raster format. The header file uses the same name as the image file, with the file extension .hdr. Both ArcGIS and ENVI read the header file; without it, neither program can open an ENVI-format raster (.dat file).
Pyramid Files
Pyramid files are used to improve performance and speed up the loading of raster datasets. They are reduced-resolution versions of the original raster dataset and can contain many downsampled layers. Pyramids speed up the display of raster data by retrieving only the data at the resolution required for the display. With pyramids, a lower-resolution copy of the data displays quickly when drawing the entire dataset. As you zoom in, levels with finer resolutions are drawn; performance is maintained because you're drawing successively smaller areas. Pyramids are stored in a single file in the same folder as the source raster. There are three main types of pyramid files:
- Overview (.ovr): created by ArcGIS; read by both ENVI and ArcGIS
- Reduced resolution dataset (.rrd): created by ERDAS Imagine; also read by ArcGIS
- ENVI pyramid file (.enp): created by ENVI; read only by ENVI
ENVI automatically builds pyramids for each image when it loads the image into the display. ArcGIS will often ask whether you want to generate pyramid layers.
Note that ENVI can read both .ovr and .enp pyramids, but ArcGIS can’t read .enp files, so ArcGIS will create a new .ovr pyramid file for a raster that has only .enp pyramids. If you are trying to save space, you can delete pyramid files (.ovr, .rrd, and .enp), but they will usually be re-created when the raster is opened. Be careful not to delete any of the other associated files, or you risk corrupting your data.
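Before deleting pyramids to save space, it helps to see how much space they actually take. The sketch below only lists pyramid files and totals their size; it deliberately deletes nothing. The function names are hypothetical, and the search is limited to the three extensions named above.

```python
from pathlib import Path

PYRAMID_EXTENSIONS = {".ovr", ".rrd", ".enp"}


def find_pyramid_files(folder):
    """Return all pyramid files under `folder`, including subfolders, sorted by path."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in PYRAMID_EXTENSIONS
    )


def total_size_mb(paths):
    """Return the combined size of `paths` in megabytes (1 MB = 1,000,000 bytes)."""
    return sum(p.stat().st_size for p in paths) / 1_000_000
```

Reviewing the list by hand before removing anything keeps the other associated files, such as .hdr or .aux.xml, out of harm's way.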
Metadata
The term metadata generally refers to information that describes the contents of a data file. Metadata help a dataset be understood, re-used, and integrated with other datasets. The information described in a metadata record includes where the data were collected, who is responsible for the dataset, why the dataset was created, and how the data are organized. Metadata generally follow a standard format, making it easier to compare datasets and to transfer files electronically. Many datasets provided by government or commercial sources include metadata, often as a text file (.txt) or XML file alongside the data.
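When metadata arrive as XML, the key fields can be pulled out with Python's standard library. The sketch below assumes the record follows the FGDC CSDGM layout (a `metadata` root with an `idinfo` section), which is one common standard but is not named in the text above; other standards use different element names. The function name and the field selection are hypothetical.

```python
import xml.etree.ElementTree as ET

# Element paths relative to the <metadata> root, assuming an FGDC CSDGM record.
FIELDS = {
    "title": "idinfo/citation/citeinfo/title",
    "originator": "idinfo/citation/citeinfo/origin",
    "purpose": "idinfo/descript/purpose",
    "abstract": "idinfo/descript/abstract",
}


def summarize_metadata(xml_text):
    """Return a dict of selected metadata fields; missing fields become empty strings."""
    root = ET.fromstring(xml_text)
    return {name: (root.findtext(path) or "").strip() for name, path in FIELDS.items()}
```

A summary like this is a convenient starting point for the "readme" file suggested under Best Practices.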
As described under Header Files, the ENVI header file (.hdr) holds the metadata for ENVI-format images (.dat files). It contains a variety of information, some of which can be pulled directly from the metadata provided with the data, such as the acquisition date, resolution, coordinate system, and projection information. ENVI has built-in tools that let you edit the header and add further information and fields.
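Because the ENVI header is a plain text file, it can also be inspected outside ENVI. The sketch below assumes the usual layout: a first line reading ENVI, followed by `field = value` pairs, where a value wrapped in braces may run over several lines. It is a simplified reader with a hypothetical function name, not a replacement for ENVI's own tools, and it returns every value as a string.

```python
def parse_envi_header(text):
    """Parse ENVI header text into a dict of lowercase field names to string values."""
    fields = {}
    lines = iter(text.splitlines())
    for line in lines:
        # Skip the leading "ENVI" line, blank lines, and ";" comment lines.
        if "=" not in line or line.lstrip().startswith(";"):
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # A value that opens with "{" continues until the closing "}".
        while value.startswith("{") and not value.endswith("}"):
            continuation = next(lines, None)
            if continuation is None:
                break  # unterminated brace; keep what was read
            value += " " + continuation.strip()
        fields[key.strip().lower()] = value.strip("{} ")
    return fields


sample = """ENVI
description = {
  Example scene}
samples = 512
lines = 400
bands = 3
interleave = bsq
"""
header = parse_envi_header(sample)
print(header["samples"], header["lines"], header["bands"])  # 512 400 3
```

To read a real header, pass the function the contents of the .hdr file that shares its name with the image.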