This is the final assessment for a course titled: Geographic Information Systems. The task was to create a model using ArcGIS or R to solve a geospatial problem in a creative way.
ArcGIS 10.3 Advanced; ArcGIS 10.1/10.2 Advanced (limited functionality)
Analysis Tools, Data Management Tools, Geostatistical Analyst Tools
API (American Petroleum Institute) gravity is a measure of the heaviness or lightness of crude oil relative to water (US Energy Information Administration, 2015). The API gravity of an oil well is valuable information for many actors in the oil supply chain. The density of oil determines the methods by which it can be refined, the products into which it can be refined, the refineries which can take on the supply of oil, and the value of the oil (Speight, 2011). This is not to mention other logistical factors such as transportation of oil and economic viability of drilling and extraction.
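For readers unfamiliar with the measure, the relationship between API gravity and density can be written as a short function. This uses the standard definition of degrees API in terms of specific gravity at 60 °F; the formula is not taken from this model, and the function names are illustrative only.

```python
def api_gravity(specific_gravity: float) -> float:
    """Convert specific gravity (relative to water at 60 °F) to degrees API."""
    if specific_gravity <= 0:
        raise ValueError("specific gravity must be positive")
    return 141.5 / specific_gravity - 131.5


def specific_gravity_from_api(api: float) -> float:
    """Inverse conversion: degrees API back to specific gravity."""
    return 141.5 / (api + 131.5)
```

Under this definition, water (specific gravity 1.0) has an API gravity of 10, and lighter oils have higher values.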
This model uses the Near Tool in ArcGIS to interpolate API gravity information for wells which are missing this data. The Near Tool (Pro.arcgis.com, 2015), a fairly simple method for interpolation, is used in this case because the precision level of the data is fairly low (Akkala, Devabhaktuni and Kumar, 2010). API gravity is represented by a numerical range and an accompanying classification (e.g., Medium (24-32); see Fig. 1), essentially making it a qualitative (categorical) rather than quantitative measurement. The numerical ranges associated with each classification are subject to interpretation (Speight, 2011). Therefore, a more sophisticated interpolation method might provide misleadingly precise results.
The data input for this model is a CSV (Comma Separated Value) spreadsheet of unique oil wells with 16 associated data columns. Some of the wells have API gravity information, and some do not (Fig. 2). The essential data include Latitude, Longitude, API Gravity, and Play Designation. This tool is intended for use globally, but has been tested on a subset of data from the US state of Wyoming, a spreadsheet which includes approximately 70,000 rows. The model takes around 30 minutes to run using this subset.
The Pre-Model Test Tool allows the user to assess the potential accuracy of the model given the subset of data being processed. Data is likely to be broken down by state or province in order to make the run-time more reasonable. However, given that oil deposits are naturally occurring geological formations, they will not adhere to man-made areal units such as states. The Pre-Model Test Tool uses the Subset Features Tool (Pro.arcgis.com, 2015) to take a random 10% subset of the wells which have known API gravity information and uses that “training data” to interpolate the API gravity for the remaining 90% of the known points. At the end, the results are compared to the true measure and a “Success Rate” is provided (as a proportion of 1 where 1 = 100% predictive success). Based on this outcome, the user can determine whether or not to adjust the input data. (See Fig. 3).
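The hold-out test described above can be sketched in plain Python. This is not the ModelBuilder implementation: it replaces the Subset Features and Near tools with a seeded shuffle and a brute-force nearest-neighbour search, and it uses planar distance in decimal degrees as a simplifying assumption. The function names and the tuple layout are hypothetical.

```python
import math
import random


def nearest_value(point, training):
    """Return the API class of the training well closest to `point`."""
    lat, lon = point
    best = min(training, key=lambda w: math.hypot(w[0] - lat, w[1] - lon))
    return best[2]


def success_rate(known_wells, train_fraction=0.10, seed=0):
    """Estimate predictive success as a proportion of 1.

    known_wells: list of (lat, lon, api_class) tuples with known API gravity.
    A random `train_fraction` of the wells is used as training data and the
    rest are predicted from their nearest training well.
    """
    rng = random.Random(seed)
    wells = list(known_wells)
    rng.shuffle(wells)
    n_train = max(1, round(len(wells) * train_fraction))
    training, held_out = wells[:n_train], wells[n_train:]
    if not held_out:
        raise ValueError("need at least two wells with known API gravity")
    hits = sum(1 for lat, lon, api in held_out
               if nearest_value((lat, lon), training) == api)
    return hits / len(held_out)
```

A result close to 1 suggests that nearby wells in the chosen subset tend to share an API gravity class; a low result suggests the subset should be reconsidered.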
The Wells Model Tool works by first creating a layer of well points from the Latitude and Longitude columns in the spreadsheet. The model then splits the wells into two layers: those with known API gravity (APIyesData) and those with unknown API gravity (APInoData). For each well point without data, the Near Tool finds the nearest well point in the APIyesData layer and records that point's NEAR_FID (ObjectID), NEAR_DIST (distance between the two points), NEAR_X (longitude), and NEAR_Y (latitude). When using the 10.3 version of ArcGIS, the Method parameter is set to Geodesic and the NEAR_DIST field is measured in meters (Geodesic features and measurements in ArcGIS, 2010).
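The split and nearest-well search can be illustrated outside ArcGIS with a minimal sketch. Several assumptions apply: rows are plain dictionaries, the ObjectID is stored under a hypothetical "OBJECTID" key, and distance is computed with the haversine formula on a sphere, which only approximates the ellipsoidal geodesic distance that the Near Tool reports. The search is brute force and is meant for illustration, not for 70,000 rows.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def split_wells(rows):
    """Separate rows into (api_yes, api_no) using the "Insuff Data" marker."""
    api_yes = [r for r in rows if r["API"] != "Insuff Data"]
    api_no = [r for r in rows if r["API"] == "Insuff Data"]
    return api_yes, api_no


def near(api_no, api_yes):
    """Attach NEAR_FID, NEAR_DIST, NEAR_X and NEAR_Y to each well without data."""
    results = []
    for well in api_no:
        dist, src = min(
            ((haversine_m(well["Latitude"], well["Longitude"],
                          s["Latitude"], s["Longitude"]), s) for s in api_yes),
            key=lambda pair: pair[0],
        )
        results.append({**well, "NEAR_FID": src["OBJECTID"], "NEAR_DIST": dist,
                        "NEAR_X": src["Longitude"], "NEAR_Y": src["Latitude"]})
    return results
```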
A join is then performed between the APInoData layer and the API and Play_Designation columns of the APIyesData layer, matching NEAR_FID to ObjectID. This provides an API gravity value for the points which did not have that information originally. The APIyesData points are then appended to create a complete layer. The field API_combined is then populated with the known and interpolated API gravity values for all wells.
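The join-and-append step amounts to a dictionary lookup followed by a concatenation. The sketch below assumes each row is a dictionary, that the ObjectID is stored under a hypothetical "OBJECTID" key, and that the joined play designation is written to a hypothetical "Source_Play_Designation" field; the actual joined field names in the model may differ.

```python
def join_and_append(api_no_near, api_yes):
    """Fill unknown wells from their nearest source well, then append known wells."""
    # Index the source wells by ObjectID so NEAR_FID can be looked up directly.
    by_id = {src["OBJECTID"]: src for src in api_yes}
    combined = []
    for well in api_no_near:
        src = by_id[well["NEAR_FID"]]
        combined.append({**well,
                         "API_combined": src["API"],
                         "Source_Play_Designation": src["Play_Designation"]})
    # Known wells keep their own measured value in the combined field.
    for src in api_yes:
        combined.append({**src, "API_combined": src["API"]})
    return combined
```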
The next section of the model provides statistical analysis of the results in order to flag wells for post-processing quality assurance. The Z-score for each well is calculated based on the NEAR_DIST value so that outliers can be further analyzed. It is important to note that in this case, the lowest Z-score will actually indicate the highest level of potential accuracy, since that will indicate a well which is a distance of 0 km from its source well. It could be useful to view any point with a Z-score above zero (the mean) as a potential outlier. A field called “FLAG” is calculated to flag any wells that are more than 8 km away from the nearest well with known API gravity information and have a different Play Designation than the source well. The measure of 8 km is an estimate of the distance that separates wells that are unlikely to have a similar API gravity. Wells which are flagged by this algorithm should be checked to ensure the interpolated value has a chance of being accurate. If using an earlier version of ArcGIS, the Geodesic Method in the Near Tool is not an option and the unit of measurement will be decimal degrees, making the “FLAG” calculation void. In the 10.1/10.2 version, the “FLAG” calculation is based on the Z-score instead (ZSCORE > 3).
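The quality-assurance logic can be sketched as follows. The 8 km threshold and the ZSCORE > 3 fallback come from the description above; everything else is an assumption, including the use of the population standard deviation, the hypothetical "Source_Play_Designation" field holding the source well's play, and the restriction to wells that actually have a NEAR_DIST value.

```python
import statistics


def add_zscore_and_flag(wells, max_dist_m=8000.0, geodesic=True):
    """Add ZSCORE and FLAG to each interpolated well.

    wells: dictionaries with NEAR_DIST, Play_Designation and
    Source_Play_Designation. With geodesic=True, NEAR_DIST is in meters and
    the 8 km rule applies; with geodesic=False (ArcGIS 10.1/10.2), the flag
    falls back to ZSCORE > 3.
    """
    dists = [w["NEAR_DIST"] for w in wells]
    mean = statistics.fmean(dists)
    std = statistics.pstdev(dists)  # assumption: population standard deviation
    out = []
    for w in wells:
        z = (w["NEAR_DIST"] - mean) / std if std > 0 else 0.0
        if geodesic:
            flag = int(w["NEAR_DIST"] > max_dist_m
                       and w["Play_Designation"] != w["Source_Play_Designation"])
        else:
            flag = int(z > 3)
        out.append({**w, "ZSCORE": z, "FLAG": flag})
    return out
```

A well is flagged under the geodesic rule only when both conditions hold, so a distant well in the same play as its source is not flagged.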
The processed layer is then exported into a CSV file for further analysis by the user. (See Fig. 4).
In order for the model to function properly, the column names should be formatted to eliminate spaces and special characters. The following columns should be formatted exactly as follows: Latitude, Longitude, Play_Designation, and API. In the API column, all wells with insufficient data should read “Insuff Data” (this is the default).
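Before running the model, the header row can be checked with a short script. This is a hypothetical helper, not part of the toolbox: it replaces runs of spaces and special characters with underscores and reports which of the four required columns are still missing after cleaning.

```python
import csv
import io
import re

REQUIRED = ("Latitude", "Longitude", "Play_Designation", "API")


def clean_name(name):
    """Replace runs of spaces/special characters with a single underscore."""
    return re.sub(r"[^0-9A-Za-z]+", "_", name.strip()).strip("_")


def check_header(csv_text):
    """Return (cleaned_header, missing_required_columns) for a CSV string."""
    header = next(csv.reader(io.StringIO(csv_text)))
    cleaned = [clean_name(h) for h in header]
    missing = [c for c in REQUIRED if c not in cleaned]
    return cleaned, missing
```

For example, a column named "Play Designation" would be cleaned to "Play_Designation", while a column named "Lat" would leave "Latitude" in the list of missing columns and must be renamed by hand.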
The Pre-Model Test Tool should be run in Edit mode. To do this, right-click on the Pre_Model_Test Tool and choose “Edit” from the menu. Then from the Model drop-down, choose “Run Entire Model.” At the end, hover the mouse over the “Output Success Rate” oval and the decimal will display as a pop-up.
To run the main WellsModel, simply double-click on the icon. A dialogue box will appear asking for the CSV input. Enter the path of the CSV file to be processed. Click OK, and the model will run. It may be a good idea to check the Near Tool in Edit mode to ensure that the Geodesic option is selected under “Method” (version 10.3 only), as there have been issues with this resetting to Planar in the past.
The main output of the model is a CSV containing all of the information from the original file, plus new columns API_Combined, ZSCORE, and FLAG. The other output is a File Geodatabase Feature Class containing the same information which can be used to visualize the output (See Fig. 5 and Fig. 6). This feature class is stored in the Wells_Data.gdb geodatabase. It is important to note that the model will write over any versions of these files that already exist, so they should be immediately renamed and/or saved in another location when the model finishes.
There are several obvious limitations to this tool, the first being that many factors contribute to API gravity apart from distance to other wells of the same API gravity. Well depth, temperature, age, rock type, and other geological factors determine the density of oil (Speight, 2011). In this case, most of these other factors are unknown, and thus distance is the next best option for interpolation. If this information were known, the model could be adapted to take it into account.
Another issue with this model is the Modifiable Areal Unit Problem (MAUP) (Wong, 2009). Wells along the border of the state line of the sample subset of data are forced to draw their API gravity data from source wells within the state when it is possible that there is a closer option just on the other side of the man-made border (Fig. 7). The Pre-Model Test Tool is meant to help the user identify when the chosen areal unit is inappropriate and adjust their data subset, but unless the user visualizes the data, they might not notice the MAUP.
The last limitation is scalability, which is related to the MAUP. If it were possible to run the model on every well in the world at once, there would be no arbitrary boundaries, and therefore no MAUP. However, the software would not be able to handle this amount of data at one time, and even the Wyoming subset of approximately 70,000 rows takes around 30 minutes to run. Further work on the model may help to create indexes that speed up some parts of the process.