Exploratory Data Analysis Tools

From DGInsider, Q3 1999

The Exploratory Data Analysis (EDA) Tools consist of five new graphs and an application for handling duplicate data points within a data set. The five graph types are: histogram, probability plot, quantile-quantile plot, probability-probability plot, and scatterplot. All graphs generated by the EDA program are fully customizable and can be incorporated into reports or maps via the Graphic Editor or Base and Contour Maps programs. They can be saved in the following formats: DGI plot file, annotation, cgm, DXF, hpgl2, and PostScript.
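The article does not say how the EDA application resolves duplicate data points. As a rough illustration of one common approach, the sketch below merges points that share the same location by averaging their values. The function name `merge_duplicates`, the `(x, y, z, value)` tuple layout, and the averaging rule are all assumptions made for this example, not a description of the EDA program.

```python
from collections import defaultdict
from statistics import mean


def merge_duplicates(points):
    """Collapse points that share an (x, y, z) location into one point.

    `points` is an iterable of (x, y, z, value) tuples (hypothetical layout).
    Averaging the duplicate values is an assumption; keeping the first value
    or the median would be equally valid policies.
    """
    groups = defaultdict(list)
    for x, y, z, value in points:
        groups[(x, y, z)].append(value)
    # One output point per distinct location, in first-seen order.
    return [(x, y, z, mean(vals)) for (x, y, z), vals in groups.items()]
```

Exact coordinate matching is used here for simplicity. Real survey data may need a distance tolerance instead.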

Figure 1
Figure 2

The EDA tools are invaluable for examining data set statistics and the relationships between variables. The graphs are especially useful for detecting and correcting errors. For example, outlying data points can be interactively selected and removed so that they do not influence the statistics. Data can also be sorted into subsets, which can then be saved into new files for further use. The EDA interface also provides direct links to the Graphic Editor, Plot Viewer, and 3D Viewer for ease of further data examination.

Prior to the actual building of a Geologic Structure Builder (GSB) model, there are many steps in the handling of the data where errors may have been introduced. For this reason, one of the first steps to take before building a model is to check the data for irregularities. This process can be easily performed using the EDA histogram tool and the 3D Viewer, as featured in this article.

A histogram of borehole gamma ray readings is displayed in Figure 1. These readings reflect the shale and sand contents of the geological formation. In Figure 1, two overlapping populations can be seen: the lower population relates to the portion of the formation that is predominantly sand, and the upper population relates to the predominantly shale portion of the formation. The two populations overlap because of the gradual transitions between the sand and shale depositional environments. At the lower tail (0 - 5) and the higher tail (95 - 105), spikes exist in what would otherwise be a continuous curve. These spikes indicate that something is wrong with the data collection in these regions.
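In the article the spikes are identified by eye. For readers who want a numerical first pass, the following minimal sketch bins the readings and flags any bin whose count is much larger than the average of its two neighbouring bins. The function names, the bin width, and the `ratio` threshold are assumptions chosen for illustration. They are not part of the EDA tools.

```python
from collections import Counter


def histogram(values, bin_width, start=0.0):
    """Count values per bin; bin i covers [start + i*w, start + (i+1)*w)."""
    counts = Counter()
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return counts


def find_spikes(counts, ratio=3.0):
    """Return indices of bins whose count exceeds `ratio` times the mean
    count of the two adjacent bins (hypothetical spike criterion)."""
    spikes = []
    for i in sorted(counts):
        neighbours = (counts.get(i - 1, 0) + counts.get(i + 1, 0)) / 2
        # Use a floor of 1 so sparse tails are not flagged on tiny counts.
        if counts[i] > ratio * max(neighbours, 1):
            spikes.append(i)
    return spikes
```

A flagged bin is only a candidate for inspection. As the article goes on to show, deciding whether the spike is an error still requires looking at where the samples sit in space.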

The spikes in the histogram can be examined closely by saving the bins and then displaying the data in the 3D Viewer. In Figure 2, the histogram has been isolated to the lower left portion of the data, and the bins that form the spikes have been selected.

Figure 3

The data within the colored and blank bins can then be saved to separate files, as shown in Figure 3.
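Outside the EDA interface, the same separation can be approximated in a few lines. The sketch below splits samples into those falling inside the selected bin ranges and the rest, then writes each group to its own CSV file. The record fields (`well`, `x`, `y`, `z`, `gamma`), the function names, and the CSV output are assumptions for this example. The article does not describe the file format the EDA tools write.

```python
import csv

FIELDS = ["well", "x", "y", "z", "gamma"]  # hypothetical record layout


def split_by_bins(samples, selected_ranges):
    """Partition samples by whether `gamma` falls in any selected
    half-open range [lo, hi). Returns (selected, remaining)."""
    selected, remaining = [], []
    for s in samples:
        in_selected = any(lo <= s["gamma"] < hi for lo, hi in selected_ranges)
        (selected if in_selected else remaining).append(s)
    return selected, remaining


def save_samples(path, samples):
    """Write samples to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(samples)
```

For example, `split_by_bins(samples, [(0, 5), (95, 105)])` would isolate the two tail regions discussed above, and each returned list can be passed to `save_samples` with a different file name.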

Once the spiked bin samples have been saved, they can be displayed in the 3D Viewer. In the left portion of Figure 4, the low values are distributed in small clusters, as would be expected based on the geology of this site. In the right portion of Figure 4, however, the points align to form the continuous path of a well, which would not be expected. Based on this observation, this well was determined to be incorrectly calibrated and was therefore removed from the data set. The spike in the data set at approximately 100 counts was also associated with a poorly measured well, and it too was removed from the data set.
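The diagnosis above was made visually in the 3D Viewer. A tabular cross-check, assuming each sample carries a well identifier, is to ask what share of the spike samples comes from each well: a spike dominated by a single well points to that well rather than to the geology. The field name `well`, the function names, and the 80% threshold below are assumptions for illustration only.

```python
from collections import Counter


def well_share(flagged_samples):
    """Return the fraction of flagged samples contributed by each well."""
    counts = Counter(s["well"] for s in flagged_samples)
    total = sum(counts.values())
    return {well: n / total for well, n in counts.items()}


def suspect_wells(flagged_samples, min_share=0.8):
    """List wells contributing at least `min_share` of the flagged samples
    (hypothetical threshold; tune it to the data set)."""
    shares = well_share(flagged_samples)
    return sorted(w for w, share in shares.items() if share >= min_share)
```

An empty result suggests the flagged values are spread across several wells, which matches the clustered pattern the article treats as geologically plausible.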

With the erroneous well measurements removed, the data were redisplayed in the histogram. As shown in Figure 5, the measurements now form smooth, continuous, overlapping populations. These data were then input to property modeling. If the errors had not been detected and removed, one portion of the formation would have been assigned too much sand, and another larger zone too much shale, which would have had ramifications for fluid flow modeling based on the geological interpretation.

Figure 4
Figure 5



 


© 1999-2007 Dynamic Graphics, Inc. All Rights Reserved. Legal Notices.

Last updated: March 22, 2007