w4h package

This is the wells4hydrogeology package.

It contains the functions needed to convert raw well descriptions into usable (hydro)geologic data.

w4h.add_control_points(df_without_control, df_control=None, xcol='LONGITUDE', ycol='LATITUDE', zcol='ELEV_FT', controlpoints_crs='EPSG:4269', output_crs='EPSG:5070', description_col='FORMATION', interp_col='INTERPRETATION', target_col='TARGET', verbose=False, log=False, **kwargs)[source]

Function to add control points, primarily to aid in interpolation. This may be useful when conditions are known but do not exist in the input well database.

Parameters:
df_without_controlpandas.DataFrame

Dataframe with current working data

df_controlstr, pathlib.PurePath, or pandas.DataFrame

Control points, either as a pandas dataframe or as a path to a file readable by pandas.read_csv()

well_keystr, optional

The column containing the “key” (unique identifier) for each well, by default ‘API_NUMBER’

xcolstr, optional

The column in df_control containing the x coordinates for each control point, by default ‘LONGITUDE’

ycolstr, optional

The column in df_control containing the y coordinates for each control point, by default ‘LATITUDE’

zcolstr, optional

The column in df_control containing the z coordinates for each control point, by default ‘ELEV_FT’

controlpoints_crsstr, optional

The coordinate reference system (CRS) of the control points in df_control, by default ‘EPSG:4269’

output_crsstr, optional

The output coordinate system, by default ‘EPSG:5070’

description_colstr, optional

The column in df_control with the description (if this is used), by default ‘FORMATION’

interp_colstr, optional

The column in df_control with the interpretation (if this is used), by default ‘INTERPRETATION’

target_colstr, optional

The column in df_control with the target code (if this is used), by default ‘TARGET’

verbosebool, optional

Whether to print information to terminal, by default False

logbool, optional

Whether to log information in log file, by default False

**kwargs

Keyword arguments of pandas.concat() or pandas.read_csv() that will be passed to that function, except for objs, which is always df_without_control and df_control

Returns:
pandas.DataFrame

Pandas DataFrame with original data and control points formatted the same way and concatenated together
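
As a rough illustration of the concatenation step described above, the sketch below stacks a small table of control points under the working data with pandas.concat(). It does not call w4h.add_control_points(); the sample values and the `combined` variable are invented for the example, and the CRS conversion is left out entirely.

```python
import pandas as pd

# Working well data (values are invented for illustration)
df_without_control = pd.DataFrame({
    "API_NUMBER": [1001, 1002],
    "LONGITUDE": [-88.20, -88.25],
    "LATITUDE": [40.10, 40.12],
    "ELEV_FT": [720.0, 735.0],
    "FORMATION": ["sand and gravel", "blue clay"],
})

# Control points where conditions are known but no well record exists
df_control = pd.DataFrame({
    "LONGITUDE": [-88.22],
    "LATITUDE": [40.11],
    "ELEV_FT": [728.0],
    "FORMATION": ["till"],
    "INTERPRETATION": ["CLAY"],
    "TARGET": [0],
})

# Columns missing from one table are filled with NaN in the result
combined = pd.concat([df_without_control, df_control], ignore_index=True)
print(combined[["LONGITUDE", "LATITUDE", "FORMATION", "TARGET"]])
```

Because the two tables do not share every column, the rows from the working data hold NaN in INTERPRETATION and TARGET until they are classified.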

w4h.align_rasters(grids_unaligned=None, model_grid=None, no_data_val_grid=0, verbose=False, log=False)[source]

Reprojects one or more rasters to the model grid and aligns their pixels.

Parameters:
grids_unalignedlist or xarray.DataArray

Contains a list of grids or one unaligned grid

model_gridxarray.DataArray

Contains model grid

no_data_val_gridint, default=0

Sets value of no data pixels

logbool, default = False

Whether to log results to log file, by default False

Returns:
alignedGridslist or xarray.DataArray

Contains aligned grids

w4h.clip_gdf2study_area(study_area, gdf, log=False, verbose=False)[source]

Clips a geodataframe to include only the features within the study area.

Parameters:
study_areageopandas.GeoDataFrame

Inputs study area polygon

gdfgeopandas.GeoDataFrame

Inputs point data

logbool, default = False

Whether to log results to log file, by default False

Returns:
gdfClipgeopandas.GeoDataFrame

Contains only points within the study area

w4h.combine_dataset(layer_dataset, surface_elev, bedrock_elev, layer_thick, log=False)[source]

Function to combine xarray Datasets or DataArrays into a single xr.Dataset. Useful for adding the surface, bedrock, layer thickness, and layer datasets all into one variable (for pickling, for example).

Parameters:
layer_datasetxr.DataArray

DataArray containing all the interpolated layer information.

surface_elevxr.DataArray

DataArray containing surface elevation data

bedrock_elevxr.DataArray

DataArray containing bedrock elevation data

layer_thickxr.DataArray

DataArray containing the layer thickness at each point in the model grid

logbool, default = False

Whether to log inputs and outputs to log file.

Returns:
xr.Dataset

Dataset with all input arrays set to different variables within the dataset.

w4h.coords2geometry(df_no_geometry, xcol='LONGITUDE', ycol='LATITUDE', zcol='ELEV_FT', input_coords_crs='EPSG:4269', output_crs='EPSG:5070', use_z=False, wkt_col='WKT', geometry_source='coords', verbose=False, log=False)[source]

Adds geometry to points with xy coordinates in the specified coordinate reference system.

Parameters:
df_no_geometrypandas.DataFrame

a Pandas dataframe containing points

xcolstr, default=’LONGITUDE’

Name of column holding x coordinate data in df_no_geometry

ycolstr, default=’LATITUDE’

Name of column holding y coordinate data in df_no_geometry

zcolstr, default=’ELEV_FT’

Name of column holding z coordinate data in df_no_geometry

input_coords_crsstr, default=’EPSG:4269’

Name of crs used for geometry

use_zbool, default=False

Whether to use z column in calculation

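geometry_sourcestr {‘coords’, ‘wkt’, ‘geometry’}

Source of the geometry information: the coordinate columns (‘coords’), the column named by wkt_col (‘wkt’), or an existing geometry column (‘geometry’)

logbool, default = False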

Whether to log results to log file, by default False

Returns:
gdfgeopandas.GeoDataFrame

Geopandas dataframe with points and their geometry values

w4h.define_dtypes(undefined_df, datatypes=None, verbose=False, log=False)[source]

Function to define datatypes of a dataframe, especially with file-indicated dtypes

Parameters:
undefined_dfpd.DataFrame

Pandas dataframe with columns whose datatypes need to be (re)defined

datatypesdict, str, pathlib.PurePath() object, or None, default = None

Dictionary containing datatypes, to be used in pandas.DataFrame.astype() function. If None, will read from file indicated by dtype_file (which must be defined, along with dtype_dir), by default None

logbool, default = False

Whether to log inputs and outputs to log file.

Returns:
dfoutpandas.DataFrame

Pandas dataframe containing redefined columns
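
The datatypes dictionary described above is the kind of mapping that pandas.DataFrame.astype() accepts. The sketch below shows that step on its own, without calling w4h.define_dtypes(); the column names and the dictionary contents are assumptions chosen for illustration.

```python
import pandas as pd

# Raw text input often arrives with every column stored as a string
undefined_df = pd.DataFrame({
    "API_NUMBER": ["1001", "1002"],
    "TOP": ["0", "12.5"],
    "FORMATION": ["topsoil", "gravel"],
})

# Hypothetical datatypes dictionary, in the form DataFrame.astype() accepts.
# Columns not listed here keep their current dtype.
datatypes = {"API_NUMBER": "int64", "TOP": "float64"}

dfout = undefined_df.astype(datatypes)
print(dfout.dtypes)
```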

w4h.depth_define(df, top_col='TOP', thresh=550.0, parallel_processing=False, verbose=False, log=False)[source]

Function to define all intervals lower than thresh as bedrock

Parameters:
dfpandas.DataFrame

Dataframe to classify

top_colstr, default = ‘TOP’

Name of column that contains the depth information, likely of the top of the well interval, by default ‘TOP’

threshfloat, default = 550.0

Depth (in units used in df[‘top_col’]) below which all intervals will be classified as bedrock, by default 550.0.

verbosebool, default = False

Whether to print results, by default False

logbool, default = False

Whether to log results to log file

Returns:
dfpandas.DataFrame

Dataframe containing intervals classified as bedrock due to depth
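
A minimal sketch of the depth threshold idea, assuming that "lower than thresh" means the top of the interval is strictly deeper than the threshold. The DEPTH_BEDROCK column name is hypothetical and used only here; the sketch does not reproduce how w4h.depth_define() records the classification.

```python
import pandas as pd

df = pd.DataFrame({
    "API_NUMBER": [1001, 1001, 1002],
    "TOP": [0.0, 560.0, 549.0],
})
thresh = 550.0

# Flag intervals whose top lies deeper than the threshold.
# "DEPTH_BEDROCK" is a hypothetical column name used only in this sketch.
df["DEPTH_BEDROCK"] = df["TOP"] > thresh
print(df)
```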

w4h.export_dataframe(df, out_dir, filename, date_stamp=True, log=False)[source]

Function to export dataframes

Parameters:
dfpandas dataframe, or list of pandas dataframes

Data frame or list of dataframes to be exported

out_dirstring or pathlib.Path object

Directory to which to export dataframe object(s) as .csv

filenamestr or list of strings

Filename(s) of output files

date_stampbool, default=True

Whether to include a datestamp in the filename. If true, file ends with _yyyy-mm-dd.csv of current date, by default True.

logbool, default = False

Whether to log inputs and outputs to log file.
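
The date stamp naming described for date_stamp can be sketched with the standard library alone. The helper name stamped_csv_path and the example directory are assumptions for illustration and are not part of w4h.

```python
import datetime
import pathlib

def stamped_csv_path(out_dir, filename, date_stamp=True):
    """Build an output path, optionally ending in _yyyy-mm-dd.csv."""
    stem = pathlib.Path(filename).stem  # drop any extension the caller supplied
    if date_stamp:
        stem = f"{stem}_{datetime.date.today().isoformat()}"
    return pathlib.Path(out_dir) / f"{stem}.csv"

out_path = stamped_csv_path("exports", "classified_wells")
print(out_path)
```

A dataframe could then be written with df.to_csv(out_path) once the output directory exists.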

w4h.export_grids(grid_data, out_path, file_id='', filetype='tif', variable_sep=True, date_stamp=True, verbose=False, log=False)[source]

Function to export grids to files.

Parameters:
grid_dataxarray DataArray or xarray Dataset

Dataset or dataarray to be exported

out_pathstr or pathlib.Path object

Output location for data export. If variable_sep=True, this should be a directory. Otherwise, this should also include the filename. The file extension should not be included here.

file_idstr, optional

If specified, will add this after ‘LayerXX’ or ‘AllLayers’ in the filename, just before datestamp, if used. Example filename for file_id=’Coarse’: Layer1_Coarse_2023-04-18.tif.

filetypestr, optional

Output filetype. Can be either pickle or any file extension supported by rioxarray.rio.to_raster(), with or without the leading period, by default ‘tif’

variable_sepbool, optional

If grid_data is an xarray Dataset, this will export each variable in the dataset as a separate file, including the variable name in the filename, by default True

date_stampbool, optional

Whether to include a date stamp in the file name, by default True

logbool, default = False

Whether to log inputs and outputs to log file.

w4h.export_undefined(df, outdir)[source]

Function to export terms that still need to be defined.

Parameters:
dfpandas.DataFrame

Dataframe containing at least some unclassified data

outdirstr or pathlib.Path

Directory to save file. Filename will be generated automatically based on today’s date.

Returns:
stillNeededDFpandas.DataFrame

Dataframe containing only unclassified terms, and the number of times they occur
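
The idea of tallying terms that still need a definition can be shown in plain Python. This is only a sketch under stated assumptions: the rows, the use of None to mark an unclassified row, and the still_needed name are invented, and w4h itself works on a pandas dataframe rather than a list of tuples.

```python
from collections import Counter

# (description, class flag) pairs; None marks a row that is still unclassified
rows = [
    ("sand", 1),
    ("hardpan", None),
    ("blue clay", 1),
    ("hardpan", None),
    ("muck", None),
]

# Tally each unclassified description, most frequent first
still_needed = Counter(desc for desc, flag in rows if flag is None)
for term, count in still_needed.most_common():
    print(term, count)
```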

w4h.file_setup(well_data, metadata=None, data_filename='*ISGS_DOWNHOLE_DATA*.txt', metadata_filename='*ISGS_HEADER*.txt', log_dir=None, verbose=False, log=False)[source]

Function to setup files, assuming data, metadata, and elevation/location are in separate files (there should be one “key”/identifying column consistent across all files to join/merge them later)

This function may not be useful if files are organized differently than this structure. If that is the case, it is recommended to use the get_most_recent() function for each individual file if needed. It may also be of use to simply skip this function altogether and directly define each filepath in a manner that can be used by pandas.read_csv()

Parameters:
well_datastr or pathlib.Path object

Str or pathlib.Path to directory containing input files

metadatastr or pathlib.Path object, optional

Str or pathlib.Path to directory containing input metadata files, by default None

data_filenamestr, optional

Pattern used by pathlib.glob() to get the most recent data file, by default ‘*ISGS_DOWNHOLE_DATA*.txt’

metadata_filenamestr, optional

Pattern used by pathlib.glob() to get the most recent metadata file, by default ‘*ISGS_HEADER*.txt’

log_dirstr or pathlib.PurePath() or None, default=None

Directory to place log file in. This is not read directly, but is used indirectly by w4h.logger_function()

verbosebool, default = False

Whether to print the names of the files to terminal, by default False

logbool, default = False

Whether to log inputs and outputs to log file.

Returns:
tuple

Tuple with paths to (well_data, metadata)

w4h.fill_unclassified(df, classification_col='CLASS_FLAG')[source]

Fills unclassified rows in ‘CLASS_FLAG’ column with np.nan

Parameters:
dfpandas.DataFrame

Dataframe on which to perform operation

Returns:
dfpandas.DataFrame

Dataframe on which operation has been performed

w4h.get_current_date()[source]

Gets the current date to help with finding the most recent file

Parameters:

None

Returns:
dateSuffixstr

String to use for naming output files

w4h.get_drift_thick(surface_elev=None, bedrock_elev=None, layers=9, plot=False, verbose=False, log=False)[source]

Finds the distance from surface_elev to bedrock_elev and then divides by number of layers to get layer thickness.

Parameters:
surface_elevrioxarray.DataArray

array holding surface elevation

bedrock_elevrioxarray.DataArray

array holding bedrock elevation

layersint, default=9

number of layers needed to calculate thickness for

plotbool, default=False

tells function to either plot the data or not

Returns:
driftThickrioxarray.DataArray

Data array containing depth to bedrock at each point

layerThickrioxarray.DataArray

Contains data array with layer thickness at each point
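
The arithmetic behind the two return values is simple enough to check by hand. The sketch below uses plain Python lists in place of the raster arrays; the elevation values are invented, and the variable names only mirror the parameter names above.

```python
# Elevations in feet at three grid cells (invented values)
surface_elev = [700.0, 710.0, 695.0]
bedrock_elev = [610.0, 665.0, 695.0]
layers = 9

# Drift thickness is the vertical distance from the surface to bedrock
drift_thick = [s - b for s, b in zip(surface_elev, bedrock_elev)]

# Each cell's drift is split into equally thick layers
layer_thick = [d / layers for d in drift_thick]

print(drift_thick)
print(layer_thick)
```

Where bedrock is at the surface (the third cell), both the drift thickness and the layer thickness are zero.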

w4h.get_layer_depths(df_with_depths, surface_elev_col='SURFACE_ELEV', layer_thick_col='LAYER_THICK', layers=9, log=False)[source]

Function to calculate depths and elevations of each model layer at each well based on surface elevation, bedrock elevation, and number of layers/layer thickness

Parameters:
df_with_depthspandas.DataFrame

Dataframe containing well metadata

layersint, default=9

Number of layers. This should correlate with get_drift_thick() input parameter, if drift thickness was calculated using that function, by default 9.

logbool, default = False

Whether to log inputs and outputs to log file.

Returns:
pandas.DataFrame

Dataframe containing new columns for depth to layers and elevation of layers.
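
For a single well, the depth and elevation of each layer follow directly from the surface elevation and the layer thickness. The sketch below assumes that the layer bottoms are what gets recorded, which this reference does not confirm; the variable names and values are illustrative only.

```python
surface_elev = 700.0   # feet
layer_thick = 10.0     # feet, e.g. 90 ft of drift split into 9 layers
layers = 9

# Depth to the bottom of each layer, counted downward from the surface
layer_depths = [layer_thick * n for n in range(1, layers + 1)]

# Matching elevations of the layer bottoms
layer_elevs = [surface_elev - depth for depth in layer_depths]

print(layer_depths[0], layer_depths[-1])
print(layer_elevs[0], layer_elevs[-1])
```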

w4h.get_most_recent(dir=WindowsPath('c:/Users/balikian/LocalData/CodesScripts/Github/wells4hydrogeology/w4h/resources'), glob_pattern='*', verbose=False)[source]

Function to find the most recent file with the indicated pattern, using pathlib.glob function.

Parameters:
dirstr or pathlib.Path object, optional

Directory in which to find the most recent file, by default str(repoDir)+’/resources’

glob_patternstr, optional

String used by the pathlib.glob() function/method for searching, by default ‘*’

Returns:
pathlib.Path object

Pathlib Path object of the most recent file fitting the glob pattern indicated in the glob_pattern parameter.
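
One way to find the newest file matching a glob pattern is sketched below. It assumes that "most recent" means the latest modification time, which may differ from how w4h.get_most_recent() decides; the helper name most_recent_file and the file names are made up for the example.

```python
import os
import pathlib
import tempfile

def most_recent_file(directory, glob_pattern="*"):
    """Return the newest file in `directory` matching `glob_pattern`."""
    candidates = [p for p in pathlib.Path(directory).glob(glob_pattern) if p.is_file()]
    if not candidates:
        raise FileNotFoundError(f"No files match {glob_pattern!r} in {directory}")
    # Assumption: "most recent" means the latest modification time
    return max(candidates, key=lambda p: p.stat().st_mtime)

with tempfile.TemporaryDirectory() as tmp:
    older = pathlib.Path(tmp) / "ISGS_HEADER_2023-01-01.txt"
    newer = pathlib.Path(tmp) / "ISGS_HEADER_2023-06-01.txt"
    older.write_text("old")
    newer.write_text("new")
    # Set modification times explicitly so the example is deterministic
    os.utime(older, (1_000_000_000, 1_000_000_000))
    os.utime(newer, (1_700_000_000, 1_700_000_000))
    newest_name = most_recent_file(tmp, "*ISGS_HEADER*.txt").name

print(newest_name)
```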

w4h.get_resources(resource_type='filepaths', scope='local', verbose=False)[source]

Function to get filepaths for resources included with package

Parameters:
resource_typestr, {‘filepaths’, ‘data’}

If filepaths, will return dictionary with filepaths to sample data. If data, returns dictionary with data objects.

scopestr, {‘local’, ‘statewide’}

If ‘local’, will read in sample data for a local (roughly county-sized) project. If ‘statewide’, will read in sample data for a statewide project (Illinois)

verbosebool, optional

Whether to print results to terminal, by default False

Returns:
resources_dictdict

Dictionary containing key, value pairs with filepaths to resources that may be of interest.

w4h.get_search_terms(spec_path='C:\\Users\\balikian\\LocalData\\CodesScripts\\Github\\wells4hydrogeology\\docs/resources/', spec_glob_pattern='*SearchTerms-Specific*', start_path=None, start_glob_pattern='*SearchTerms-Start*', wildcard_path=None, wildcard_glob_pattern='*SearchTerms-Wildcard', verbose=False, log=False)[source]

Read in dictionary files for downhole data

Parameters:
spec_pathstr or pathlib.Path, optional

Directory where the file containing the specific search terms is located, by default str(repoDir)+’/resources/’

spec_glob_patternstr, optional

Search string used by pathlib.glob() to find the most recent file of interest, uses get_most_recent() function, by default ‘*SearchTerms-Specific*’

start_pathstr or None, optional

Directory where the file containing the start search terms is located, by default None

start_glob_patternstr, optional

Search string used by pathlib.glob() to find the most recent file of interest, uses get_most_recent() function, by default ‘*SearchTerms-Start*’

wildcard_pathstr or pathlib.Path, default = None

Directory where the file containing the wildcard search terms is located, by default None

wildcard_glob_patternstr, default = ‘*SearchTerms-Wildcard’

Search string used by pathlib.glob() to find the most recent file of interest, uses get_most_recent() function, by default ‘*SearchTerms-Wildcard’

logbool, default = False

Whether to log inputs and outputs to log file.

Returns:
(specTermsPath, startTermsPath, wilcardTermsPath)tuple

Tuple containing the pandas dataframes with specific search terms, with start search terms, and with wildcard search terms

w4h.get_unique_wells(df, wellid_col='API_NUMBER', verbose=False, log=False)[source]

Gets unique wells as a dataframe based on a given column name.

Parameters:
dfpandas.DataFrame

Dataframe containing all wells and/or well intervals of interest

wellid_colstr, default=”API_NUMBER”

Name of column in df containing a unique identifier for each well, by default ‘API_NUMBER’. .unique() will be run on this column to get the unique values.

logbool, default = False

Whether to log results to log file

Returns:
wellsDFpandas.DataFrame

DataFrame containing only the unique well IDs

w4h.grid2study_area(study_area, grid, output_crs='EPSG:5070', verbose=False, log=False)[source]

Clips grid to study area.

Parameters:
study_areageopandas.GeoDataFrame

inputs study area polygon

gridxarray.DataArray

inputs grid array

output_crsstr, default=’EPSG:5070’

inputs the coordinate reference system for the study area

logbool, default = False

Whether to log results to log file, by default False

Returns:
gridxarray.DataArray

returns xarray containing grid clipped only to area within study area

w4h.layer_interp(points, model_grid, layers=None, interp_kind='nearest', surface_grid=None, bedrock_grid=None, layer_thick_grid=None, drift_thick_grid=None, return_type='dataset', export_dir=None, target_col='TARG_THICK_PER', layer_col='LAYER', xcol=None, ycol=None, xcoord='x', ycoord='y', log=False, verbose=False, **kwargs)[source]

Function to interpolate results, going from points to model_grid data. Uses scipy.interpolate module.

Parameters:
pointslist

List containing pandas dataframes or geopandas geodataframes containing the point data. Should be resDF_list output from layer_target_thick().

model_gridxr.DataArray or xr.Dataset

Xarray DataArray or DataSet with the coordinates/spatial reference of the output model_grid to interpolate to

layersint, default=None

Number of layers for interpolation. If None, uses the length of the points list to determine number of layers. By default None.

interp_kindstr, {‘nearest’, ‘interp2d’, ‘linear’, ‘cloughtocher’, ‘radial basis function’}

Type of interpolation to use, by default ‘nearest’. The options correspond to the N-D scattered interpolators in scipy.interpolate (also shown in the “kind” column of the N-D scattered section of the table here: https://docs.scipy.org/doc/scipy/tutorial/interpolate.html).

return_typestr, {‘dataset’, ‘dataarray’}

Type of xarray object to return, either xr.DataArray or xr.Dataset, by default ‘dataset’.

export_dirstr or pathlib.Path, default=None

Export directory for interpolated grids, using w4h.export_grids(). If None, does not export, by default None.

target_colstr, default = ‘TARG_THICK_PER’

Name of column in points containing data to be interpolated, by default ‘TARG_THICK_PER’.

layer_colstr, default = ‘LAYER’

Name of column containing layer number. Not currently used, by default ‘LAYER’

xcolstr, default = None

Name of column containing x coordinates. If None, will look for ‘geometry’ column, as in a geopandas.GeoDataframe. By default None

ycolstr, default = None

Name of column containing y coordinates. If None, will look for ‘geometry’ column, as in a geopandas.GeoDataframe. By default None

xcoordstr, default=’x’

Name of x coordinate in model_grid, used to extract x values of model_grid, by default ‘x’

ycoordstr, default=’y’

Name of y coordinate in model_grid, used to extract y values of model_grid, by default ‘y’

logbool, default = False

Whether to log inputs and outputs to log file.

**kwargs

Keyword arguments to be read directly into whichever scipy.interpolate function is designated by the interp_kind parameter.

Returns:
interp_dataxr.DataArray or xr.Dataset, depending on return_type

By default (return_type=’dataset’), returns an xr.Dataset with each layer as a separate variable. Can also specify return_type=’dataarray’ to return an xr.DataArray object with the layers added as a new dimension called Layer.

w4h.layer_target_thick(df, layers=9, return_all=False, export_dir=None, outfile_prefix=None, depth_top_col='TOP', depth_bot_col='BOTTOM', log=False)[source]

Function to calculate thickness of target material in each layer at each well point

Parameters:
dfgeopandas.GeoDataFrame

Geodataframe containing classified data, surface elevation, bedrock elevation, layer depths, geometry.

layersint, default=9

Number of layers in model, by default 9

return_allbool, default=False

If True, return list of original geodataframes with an extra column added for the target thickness in each layer. If False, return list of geopandas.GeoDataFrames with only essential information for each layer.

export_dirstr or pathlib.Path, default=None

If str or pathlib.Path, should be directory to which to export dataframes built in function.

outfile_prefixstr, default=None

Only used if export_dir is set. Will be used at the start of the exported filenames

depth_top_colstr, default=’TOP’

Name of column containing data for depth to top of described well intervals

depth_bot_colstr, default=’BOTTOM’

Name of column containing data for depth to bottom of described well intervals

logbool, default = False

Whether to log inputs and outputs to log file.

Returns:
res_df or resgeopandas.geodataframe

Geopandas geodataframe containing only important information needed for next stage of analysis.
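
The core calculation, the thickness of target material that falls inside one model layer at one well, can be sketched as an interval overlap. The function name, the tuple layout, and the sample intervals below are assumptions for illustration and do not mirror the internals of w4h.layer_target_thick().

```python
def target_thickness_in_layer(intervals, layer_top, layer_bot):
    """Sum the thickness of target intervals that fall inside one layer.

    intervals: iterable of (top_depth, bottom_depth, is_target) tuples.
    """
    total = 0.0
    for top, bottom, is_target in intervals:
        if not is_target:
            continue
        # Overlap between the described interval and the layer
        overlap = min(bottom, layer_bot) - max(top, layer_top)
        if overlap > 0:
            total += overlap
    return total

# One well: depths in feet, with target (e.g. coarse) intervals flagged True
well_intervals = [
    (0.0, 8.0, False),
    (8.0, 15.0, True),
    (15.0, 30.0, False),
    (30.0, 34.0, True),
]

thick_layer1 = target_thickness_in_layer(well_intervals, 0.0, 10.0)
thick_layer2 = target_thickness_in_layer(well_intervals, 10.0, 20.0)
print(thick_layer1, thick_layer2)
```

Dividing each result by the layer thickness gives the fraction of the layer made up of target material; the default target_col name ‘TARG_THICK_PER’ in layer_interp() suggests a quantity of this kind is what gets interpolated, though this reference does not spell that out.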

w4h.logger_function(logtocommence, parameters, func_name)[source]

Function to log other functions, to be called from within other functions

Parameters:
logtocommencebool

Whether to perform logging steps

parametersdict

Dictionary containing parameters and their values, from function

func_namestr

Name of function within which this is called

w4h.merge_lithologies(well_data_df, targinterps_df, interp_col='INTERPRETATION', target_col='TARGET', target_class='bool')[source]

Function to merge lithologies and target booleans based on classifications

Parameters:
well_data_dfpandas.DataFrame

Dataframe containing classified well data

targinterps_dfpandas.DataFrame

Dataframe containing lithologies and their target interpretations, depending on what the target is for this analysis (often, coarse materials=1, fine=0)

target_colstr, default = ‘TARGET’

Name of column in targinterps_df containing the target interpretations

target_classstr, default = ‘bool’

Whether the input column is using boolean values as its target indicator

Returns:
df_targpandas.DataFrame

Dataframe containing merged lithologies/targets

w4h.merge_metadata(data_df, header_df, data_cols=None, header_cols=None, auto_pick_cols=False, drop_duplicate_cols=True, log=False, verbose=False, **kwargs)[source]

Function to merge tables, intended for merging metadata table with data table

Parameters:
data_dfpandas.DataFrame

“Left” dataframe, intended for this purpose to be dataframe with main data, but can be anything

header_dfpandas.DataFrame

“Right” dataframe, intended for this purpose to be dataframe with metadata, but can be anything

data_colslist, optional

List of strings of column names, for columns to be included after join from “left” table (data table). If None, all columns are kept, by default None

header_colslist, optional

List of strings of columns names, for columns to be included in merged table after merge from “right” table (metadata). If None, all columns are kept, by default None

auto_pick_colsbool, default = False

Whether to autopick the columns from the metadata table. If True, the following column names are kept: [‘API_NUMBER’, ‘LATITUDE’, ‘LONGITUDE’, ‘BEDROCK_ELEV’, ‘SURFACE_ELEV’, ‘BEDROCK_DEPTH’, ‘LAYER_THICK’], by default False

drop_duplicate_colsbool, optional

If True, drops duplicate columns from the tables so that columns do not get renamed upon merge, by default True

logbool, default = False

Whether to log inputs and outputs to log file.

**kwargs

kwargs that are passed directly to pd.merge(). By default, the ‘on’ and ‘how’ parameters are defined as on=’API_NUMBER’ and how=’inner’

Returns:
mergedTablepandas.DataFrame

Merged dataframe
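
The merge described above can be approximated with pd.merge() directly, using the documented defaults on=’API_NUMBER’ and how=’inner’. In this sketch the duplicate columns are dropped from the right-hand (metadata) table, which is an assumption about what drop_duplicate_cols does; the sample data are invented.

```python
import pandas as pd

data_df = pd.DataFrame({
    "API_NUMBER": [1001, 1001, 1002, 1003],
    "FORMATION": ["topsoil", "gravel", "clay", "sand"],
    "COUNTY_CODE": [19, 19, 19, 19],
})
header_df = pd.DataFrame({
    "API_NUMBER": [1001, 1002],
    "LATITUDE": [40.10, 40.12],
    "LONGITUDE": [-88.20, -88.25],
    "COUNTY_CODE": [19, 19],
})

# Drop columns that appear in both tables (other than the key) from the
# right-hand table so the merge does not rename them with _x/_y suffixes
duplicates = [c for c in header_df.columns if c in data_df.columns and c != "API_NUMBER"]
merged = pd.merge(data_df, header_df.drop(columns=duplicates), on="API_NUMBER", how="inner")
print(merged)
```

With an inner join, well 1003 is dropped because it has no matching row in the metadata table.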

w4h.plot_cross_section(dataset, profile=None, profile_direction=None, xcoord='x', ycoord='y', mapped_variable='Depth_to_Bedrock', cross_section_variable='Model_Layers', surface_elevation_variable='Surface_Elevation', bedrock_elevation_variable='Bedrock_Elevation', layer_elevation_coordinate='layer_elevs', show_layers=True, return_profile_dicts=False, elev_unit='feet', convert_elevation_to=None, title=None, verbose=False, **kwargs)[source]

Function to plot cross section profiles for datasets with properly configured coordinates and variables. This is intended to work “out of the box” with the xarray.Datasets output from w4h.run()

Parameters:
datasetxarray.Dataset

The xarray.Dataset with the proper data variables. Works “out of the box” with outputs from w4h.run().

profileNone, shapely.Linestring, list of coordinates, or geopandas.GeoDataFrame, optional

The profile(s) for which to create the cross sections. If None, by default creates one X profile and one Y profile in the middle of each dimension, by default None

profile_directionlist of str, optional

List of strings (list is same length as profile) indicating the direction to use for the profile. If None, will be [‘WE’, ‘SN’] to fit with profile=None defaults, by default None

xcoordstr, optional

Name of x coordinate, by default ‘x’

ycoordstr, optional

Name of y coordinate, by default ‘y’

mapped_variablestr, optional

Name of variable to show in main map, by default ‘Depth_to_Bedrock’

cross_section_variablestr, optional

Name of variable to use for cross section profiles, by default ‘Model_Layers’

surface_elevation_variablestr, optional

Variable to use for the surface elevation, by default ‘Surface_Elevation’

bedrock_elevation_variablestr, optional

Variable to use for the bedrock elevation, by default ‘Bedrock_Elevation’

layer_elevation_coordinatestr, optional

Coordinate name to use for the layer elevations. This should be a non-indexed coordinate with the shape of the x, y, and layer coordinates, by default ‘layer_elevs’

show_layersbool, optional

Whether to plot the layer boundaries on the cross section, by default True

return_profile_dictsbool, optional

Whether to return the profile dictionaries, rather than the matplotlib.Figure, by default False

elev_unitstr, optional

Unit of elevation for the elevation data, by default ‘feet’

convert_elevation_tostr, optional

If None (default), does not convert elevation. Otherwise, will convert elevation to specified unit. Only conversion between ‘ft’ and ‘meters’ supported.

titlestr, optional

Title to use for the output figure. If None, will be derived from variable names, by default None

verbosebool, optional

Whether to print information about process to terminal, by default False

Returns:
matplotlib.Figure

Matplotlib.Figure instance is returned, unless return_profile_dicts is True. If return_profile_dicts=True, then a list of dicts with information about the profiles is returned.

w4h.read_dict(file, keytype='np')[source]

Function to read a text file with a dictionary in it into a python dictionary

Parameters:
filestr or pathlib.Path object

Filepath to the file of interest containing the dictionary text

keytypestr, optional

String indicating the datatypes used in the text, currently only ‘np’ is implemented, by default ‘np’

Returns:
dict

Dictionary translated from text file.

w4h.read_dictionary_terms(dict_file=None, id_col='ID', search_col='DESCRIPTION', definition_col='LITHOLOGY', class_flag_col='CLASS_FLAG', dictionary_type=None, class_flag=6, rem_extra_cols=True, verbose=False, log=False)[source]

Function to read dictionary terms from file into pandas dataframe

Parameters:
dict_filestr or pathlib.Path object, or list of these

File or list of files to be read

search_colstr, default = ‘DESCRIPTION’

Name of column containing search terms (geologic formations)

definition_colstr, default = ‘LITHOLOGY’

Name of column containing interpretations of search terms (lithologies)

dictionary_typestr or None, {None, ‘exact’, ‘start’, ‘wildcard’}

Indicator of which kind of dictionary terms to read in, by default None.

  • If None, uses the name of the file to try to determine the type. If it cannot, it will default to using the classification flag from class_flag

  • If ‘exact’, will be used to search for exact matches to geologic descriptions

  • If ‘start’, will be used as with the .startswith() string method to find inexact matches to geologic descriptions

  • If ‘wildcard’, will be used to find any matching substring for inexact geologic matches

class_flagint, default = 6

Classification flag to be used if dictionary_type is None and cannot be otherwise determined, by default 6

rem_extra_colsbool, default = True

Whether to remove the extra columns from the input file after it is read in as a pandas dataframe, by default True

logbool, default = False

Whether to log inputs and outputs to log file.

Returns:
dict_termspandas.DataFrame

Pandas dataframe with formatting ready to be used in the classification steps of this package
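
The three dictionary types differ only in how a search term is compared with a geologic description. A minimal sketch of those three comparisons follows; the function name match_description and the case-insensitive comparison are assumptions made for the example, not a description of how the package performs classification.

```python
def match_description(description, term, dictionary_type):
    """Return True if `description` matches `term` under the given rule."""
    # Assumption: matching is case-insensitive
    description = description.upper()
    term = term.upper()
    if dictionary_type == "exact":
        return description == term
    if dictionary_type == "start":
        return description.startswith(term)
    if dictionary_type == "wildcard":
        return term in description
    raise ValueError(f"Unknown dictionary_type: {dictionary_type!r}")

desc = "Sandy clay with gravel"
exact_hit = match_description(desc, "sandy clay with gravel", "exact")
start_hit = match_description(desc, "sandy", "start")
wildcard_hit = match_description(desc, "gravel", "wildcard")
start_miss = match_description(desc, "gravel", "start")
print(exact_hit, start_hit, wildcard_hit, start_miss)
```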

w4h.read_grid(grid_path=None, grid_type='model', no_data_val_grid=0, use_service=False, study_area=None, grid_crs=None, output_crs='EPSG:5070', verbose=False, log=False, **kwargs)[source]

Reads in grid

Parameters:
grid_pathstr or pathlib.Path, default=None

Path to a grid file

grid_typestr, default=’model’

Sets what type of grid to load in

no_data_val_gridint, default=0

Sets the no data value of the grid

use_servicestr or bool, default=False

Sets which service the function uses

study_areageopandas.GeoDataFrame, default=None

Dataframe containing study area polygon

grid_crsstr, default=None

Sets crs to use if clipping to study area

logbool, default = False

Whether to log results to log file, by default False

Returns:
gridINxarray.DataArray

Returns grid

w4h.read_lithologies(lith_file=None, interp_col='LITHOLOGY', target_col='CODE', use_cols=None, verbose=False, log=False)[source]

Function to read lithology file into pandas dataframe

Parameters:
lith_filestr or pathlib.Path object, default = None

Filename of lithology file. If None, default is contained within repository, by default None

interp_colstr, default = ‘LITHOLOGY’

Column used to match interpretations

target_colstr, default = ‘CODE’

Column to be used as target code

use_colslist, default = None

Which columns to use when reading in dataframe. If None, defaults to [‘LITHOLOGY’, ‘CODE’].

logbool, default = False

Whether to log inputs and outputs to log file.

Returns:
pandas.DataFrame

Pandas dataframe with lithology information

w4h.read_model_grid(model_grid_path, study_area=None, no_data_val_grid=0, read_grid=True, node_byspace=True, grid_crs=None, output_crs='EPSG:5070', verbose=False, log=False)[source]

Reads in model grid to xarray data array

Parameters:
model_grid_pathstr

Path to model grid file

study_areageopandas.GeoDataFrame, default=None

Dataframe containing study area polygon

no_data_val_gridint, default=0

value assigned to areas with no data

read_gridbool, default=True

Whether to read the grid from a file (True) or create the grid (False)

node_byspacebool, default=True

Denotes how to create grid

output_crsstr, default=’EPSG:5070’

Inputs study area crs

grid_crsstr, default=None

Inputs grid crs

logbool, default = False

Whether to log results to log file, by default False

Returns:
modelGridxarray.DataArray

Data array containing model grid

w4h.read_raw_csv(data_filepath, metadata_filepath, data_cols=None, metadata_cols=None, xcol='LONGITUDE', ycol='LATITUDE', well_key='API_NUMBER', encoding='latin-1', verbose=False, log=False, **read_csv_kwargs)[source]

Easy function to read raw .txt files output from, for example, an Access database

Parameters:
data_filepathstr

Filename of the file containing data, including the extension.

metadata_filepathstr

Filename of the file containing metadata, including the extension.

data_colslist, default = None

List with strings with names of columns from txt file to keep after reading. If None, [“API_NUMBER”,”TABLE_NAME”,”FORMATION”,”THICKNESS”,”TOP”,”BOTTOM”], by default None.

metadata_colslist, default = None

List with strings with names of columns from txt file to keep after reading. If None, [‘API_NUMBER’,”TOTAL_DEPTH”,”SECTION”,”TWP”,”TDIR”,”RNG”,”RDIR”,”MERIDIAN”,”QUARTERS”,”ELEVATION”,”ELEVREF”,”COUNTY_CODE”,”LATITUDE”,”LONGITUDE”,”ELEVSOURCE”], by default None

xcolstr, default = ‘LONGITUDE’

Name of column in metadata file indicating the x-location of the well, by default ‘LONGITUDE’

ycolstr, default = ‘LATITUDE’

Name of the column in metadata file indicating the y-location of the well, by default ‘LATITUDE’

well_keystr, default = ‘API_NUMBER’

Name of the column with the key/identifier that will be used to merge data later, by default ‘API_NUMBER’

encodingstr, default = ‘latin-1’

Encoding of the data in the input files, by default ‘latin-1’

verbosebool, default = False

Whether to print the number of rows in the input columns, by default False

logbool, default = False

Whether to log inputs and outputs to log file.

**read_csv_kwargs

**kwargs that get passed to pd.read_csv()

Returns:
(pandas.DataFrame, pandas.DataFrame/None)

Tuple/list with two pandas dataframes: (well_data, metadata). metadata is None if only well_data is used.
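The reading step described above can be approximated with plain pandas. The following is a minimal sketch, not the package's implementation: the file contents are made up for illustration, and it assumes that data_cols and metadata_cols behave like the usecols argument of pandas.read_csv().

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two raw text exports (contents invented for illustration)
data_bytes = (
    "API_NUMBER,TABLE_NAME,FORMATION,THICKNESS,TOP,BOTTOM,EXTRA\n"
    "120010000100,DOWNHOLE,sand and gravel,10,0,10,x\n"
    "120010000100,DOWNHOLE,clay 5° slope,5,10,15,y\n"
).encode("latin-1")
metadata_bytes = (
    "API_NUMBER,LATITUDE,LONGITUDE,TOTAL_DEPTH\n"
    "120010000100,40.1,-88.2,15\n"
).encode("latin-1")

data_cols = ["API_NUMBER", "TABLE_NAME", "FORMATION", "THICKNESS", "TOP", "BOTTOM"]
metadata_cols = ["API_NUMBER", "LATITUDE", "LONGITUDE", "TOTAL_DEPTH"]

# Keep only the requested columns and decode the bytes as latin-1
well_data = pd.read_csv(io.BytesIO(data_bytes), usecols=data_cols, encoding="latin-1")
metadata = pd.read_csv(io.BytesIO(metadata_bytes), usecols=metadata_cols, encoding="latin-1")

print(well_data.shape, metadata.shape)
```

The EXTRA column is dropped because it is not listed in data_cols, and the degree sign survives because the bytes are decoded with the same encoding they were written in.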

w4h.read_study_area(study_area=None, study_area_crs=None, output_crs='EPSG:5070', buffer=None, return_original=False, log=False, verbose=False, **read_file_kwargs)[source]

Read study area geospatial file into geopandas

Parameters:
study_areastr, pathlib.Path, geopandas.GeoDataFrame, or shapely.Geometry

Filepath to any geospatial file readable by geopandas. Polygon is best, but may work with other types if extent is correct. If shapely.Geometry, the crs should also be specified using a valid input to gpd.GeoDataFrame(crs=<crs>).

study_area_crsstr, tuple, dict, optional

Not needed unless CRS must be read in manually (e.g., with a shapely.Geometry). CRS designation readable by geopandas/pyproj.

output_crsstr, tuple, dict, optional

CRS to transform study_area to before returning. CRS designation should be readable by geopandas/pyproj. By default, ‘EPSG:5070’.

bufferNone or numeric, default=None

If None, no buffer created. If a numeric value is given (float or int, for example), a buffer will be created at that distance in the unit of the study_area_crs.

return_originalbool, default=False

Whether to return the (reprojected) study area as well as the (reprojected) buffered study area. Study area is only used for clipping data, so usually return_original=False is sufficient.

logbool, default = False

Whether to log results to log file, by default False

verbosebool, default=False

Whether to print status and results to terminal

Returns:
studyAreaINgeopandas dataframe

Geopandas dataframe with polygon geometry.

w4h.read_wcs(study_area, wcs_url='https://data.isgs.illinois.edu/arcgis/services/Elevation/IL_Statewide_Lidar_DEM_WGS/ImageServer/WCSServer?request=GetCapabilities&service=WCS', res_x=30, res_y=30, verbose=False, log=False, **kwargs)[source]

Reads a WebCoverageService from a url and returns a rioxarray dataset containing it.

Parameters:
study_areageopandas.GeoDataFrame

Dataframe containing study area polygon

wcs_urlstr, default=lidarURL

Represents the url for the WCS

res_xint, default=30

Sets resolution for x axis

res_yint, default=30

Sets resolution for y axis

logbool, default = False

Whether to log results to log file, by default False

**kwargs

Returns:
wcsData_rxrxarray.DataArray

An xarray DataArray holding the image from the WebCoverageService

w4h.read_wms(study_area, layer_name='IL_Statewide_Lidar_DEM_WGS:None', wms_url='https://data.isgs.illinois.edu/arcgis/services/Elevation/IL_Statewide_Lidar_DEM_WGS/ImageServer/WCSServer?request=GetCapabilities&service=WCS', srs='EPSG:3857', clip_to_studyarea=True, bbox=[-9889002.6155, 5134541.069716, -9737541.607038, 5239029.6274], res_x=30, res_y=30, size_x=512, size_y=512, format='image/tiff', verbose=False, log=False, **kwargs)[source]

Reads a WebMapService from a url and returns a rioxarray dataset containing it.

Parameters:
study_areageopandas.GeoDataFrame

Dataframe containing study area polygon

layer_namestr, default=’IL_Statewide_Lidar_DEM_WGS:None’

Represents the layer name in the WMS

wms_urlstr, default=lidarURL

Represents the url for the WMS

srsstr, default=’EPSG:3857’

Sets the srs

clip_to_studyareabool, default=True

Whether to clip to study area or not

res_xint, default=30

Sets resolution for x axis

res_yint, default=30

Sets resolution for y axis

size_xint, default=512

Sets width of result

size_yint, default=512

Sets height of result

logbool, default = False

Whether to log results to log file, by default False

Returns:
wmsData_rxrxarray.DataArray

Holds the image from the WebMapService

w4h.read_xyz(xyzpath, datatypes=None, verbose=False, log=False)[source]

Function to read file containing xyz data (elevation/location)

Parameters:
xyzpathstr or pathlib.Path

Filepath of the xyz file, including extension

datatypesdict, default = None

Dictionary containing the datatypes for the columns in the xyz file. If None, uses {‘ID’: np.uint32, ‘API_NUMBER’: np.uint64, ‘LATITUDE’: np.float64, ‘LONGITUDE’: np.float64, ‘ELEV_FT’: np.float64}, by default None

verbosebool, default = False

Whether to print the number of xyz records to the terminal, by default False

logbool, default = False

Whether to log inputs and outputs to log file.

Returns:
pandas.DataFrame

Pandas dataframe containing the elevation and location data
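A datatypes dictionary of this shape can be passed straight to the dtype argument of pandas.read_csv(). The sketch below uses an invented one-row file and only illustrates the idea. It is an assumption that read_xyz() applies the dictionary in this way.

```python
import io
import numpy as np
import pandas as pd

# Same structure as the default datatypes dictionary listed above
datatypes = {
    "ID": np.uint32,
    "API_NUMBER": np.uint64,
    "LATITUDE": np.float64,
    "LONGITUDE": np.float64,
    "ELEV_FT": np.float64,
}

# Hypothetical xyz file contents
xyz_txt = io.StringIO(
    "ID,API_NUMBER,LATITUDE,LONGITUDE,ELEV_FT\n"
    "1,120010000100,40.1,-88.2,712.5\n"
)

xyz = pd.read_csv(xyz_txt, dtype=datatypes)
print(xyz.dtypes)
```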

w4h.remerge_data(classifieddf, searchdf, parallel_processing=False)[source]

Function to merge newly-classified (or not) and previously classified data

Parameters:
classifieddfpandas.DataFrame

Dataframe that had already been classified previously

searchdfpandas.DataFrame

Dataframe with new classifications

Returns:
remergeDFpandas.DataFrame

Dataframe containing all the data, merged back together
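Conceptually, remerging is a row-wise concatenation of the two dataframes. The sketch below shows that idea with pandas.concat(). The sample rows are invented, and both the use of concat and the final sort by index are assumptions about how the merge could be done, not a description of the package's code.

```python
import pandas as pd

# Rows classified in an earlier step (hypothetical data)
classified = pd.DataFrame({"FORMATION": ["clay"], "CLASS_FLAG": [1]}, index=[0])

# Rows that went through a later search; one is still unclassified
searched = pd.DataFrame(
    {"FORMATION": ["sand", "unknown fill"], "CLASS_FLAG": [4, None]}, index=[1, 2]
)

# Stack the two tables and restore the original row order
remerged = pd.concat([classified, searched]).sort_index()
print(remerged)
```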

w4h.remove_bad_depth(df_with_depth, top_col='TOP', bottom_col='BOTTOM', depth_type='depth', verbose=False, log=False)[source]

Function to remove all records in the dataframe with well interpretations where the depth information is bad (i.e., where the bottom of the record is nearer to the surface than the top)

Parameters:
df_with_depthpandas.DataFrame

Pandas dataframe containing the well records and descriptions for each interval

top_colstr, default=’TOP’

The name of the column containing the depth or elevation for the top of the interval, by default ‘TOP’

bottom_colstr, default=’BOTTOM’

The name of the column containing the depth or elevation for the bottom of each interval, by default ‘BOTTOM’

depth_typestr, {‘depth’, ‘elevation’}

Whether the table is organized by depth or elevation. If depth, the top column will have smaller values than the bottom column. If elevation, the top column will have higher values than the bottom column, by default ‘depth’

verbosebool, default = False

Whether to print results to the terminal, by default False

logbool, default = False

Whether to log results to log file, by default False

Returns:
pandas.DataFrame

Pandas dataframe with the records removed where the top is indicated to be below the bottom.
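The depth_type rule can be expressed as a single boolean filter. The helper below, drop_bad_depth, is a hypothetical name used only for this sketch. It assumes that intervals with equal top and bottom values are kept, which the documentation does not state.

```python
import pandas as pd

def drop_bad_depth(df, top_col="TOP", bottom_col="BOTTOM", depth_type="depth"):
    """Keep only intervals whose top is above (or equal to) their bottom."""
    if depth_type == "depth":
        # Depths increase downward, so the top value should be the smaller one
        good = df[top_col] <= df[bottom_col]
    elif depth_type == "elevation":
        # Elevations decrease downward, so the top value should be the larger one
        good = df[top_col] >= df[bottom_col]
    else:
        raise ValueError("depth_type must be 'depth' or 'elevation'")
    return df[good].copy()

depth_df = pd.DataFrame({"TOP": [0, 10, 30], "BOTTOM": [10, 20, 25]})
elev_df = pd.DataFrame({"TOP": [700, 690, 650], "BOTTOM": [690, 680, 660]})

kept_depth = drop_bad_depth(depth_df)
kept_elev = drop_bad_depth(elev_df, depth_type="elevation")
print(len(kept_depth), len(kept_elev))
```

In both sample tables the third interval has its top below its bottom and is dropped.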

w4h.remove_no_depth(df_with_depth, top_col='TOP', bottom_col='BOTTOM', no_data_val_table='', verbose=False, log=False)[source]

Function to remove well intervals with no depth information

Parameters:
df_with_depthpandas.DataFrame

Dataframe containing well descriptions

top_colstr, optional

Name of column containing information on the top of the well intervals, by default ‘TOP’

bottom_colstr, optional

Name of column containing information on the bottom of the well intervals, by default ‘BOTTOM’

no_data_val_tableany, optional

No data value in the input data, used by this function to indicate that depth data is not there, to be replaced by np.nan, by default ‘’

verbosebool, optional

Whether to print results to console, by default False

logbool, default = False

Whether to log results to log file, by default False

Returns:
df_with_depthpandas.DataFrame

Dataframe with depths dropped
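The replace-then-drop pattern described for no_data_val_table looks roughly like the following in pandas. This is a sketch with invented rows. It assumes a record is removed when either the top or the bottom is missing.

```python
import numpy as np
import pandas as pd

no_data_val_table = ""
depth_cols = ["TOP", "BOTTOM"]

# Hypothetical well intervals; two of them lack a depth value
df = pd.DataFrame(
    {"API_NUMBER": [1, 1, 2], "TOP": [0, "", 5], "BOTTOM": [10, 20, ""]}
)

cleaned = df.copy()
# Turn the no-data marker into np.nan so pandas recognizes it as missing
cleaned[depth_cols] = cleaned[depth_cols].replace(no_data_val_table, np.nan)
# Drop intervals that are missing a top or a bottom
cleaned = cleaned.dropna(subset=depth_cols)
print(cleaned)
```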

w4h.remove_no_description(df_with_descriptions, description_col='FORMATION', no_data_val_table='', verbose=False, log=False)[source]

Function that removes all records in the dataframe containing the well descriptions where no description is given.

Parameters:
df_with_descriptionspandas.DataFrame

Pandas dataframe containing the well records with their individual descriptions

description_colstr, optional

Name of the column containing the geologic description of each interval, by default ‘FORMATION’

no_data_val_tablestr, optional

The value expected if the column is empty or there is no data. These will be replaced by np.nan before being removed, by default ‘’

verbosebool, optional

Whether to print the results of this step to the terminal, by default False

logbool, default = False

Whether to log results to log file, by default False

Returns:
pandas.DataFrame

Pandas dataframe with records with no description removed.

w4h.remove_no_topo(df_with_topo, zcol='SURFACE_ELEV', no_data_val_table='', verbose=False, log=False)[source]

Function to remove wells that do not have topography data (needed for layer selection later).

This function is intended to be run on the metadata table after an attempt has been made to add elevations.

Parameters:
df_with_topopandas.DataFrame

Pandas dataframe containing elevation information.

zcolstr

Name of elevation column

no_data_val_tableany

Value in dataset that indicates no data is present (replaced with np.nan)

verbosebool, optional

Whether to print outputs, by default False

logbool, default = False

Whether to log results to log file, by default False

Returns:
pandas.DataFrame

Pandas dataframe with intervals with no topography removed.

w4h.remove_nonlocated(df_with_locations, xcol='LONGITUDE', ycol='LATITUDE', no_data_val_table='', verbose=False, log=False)[source]

Function to remove wells and well intervals where there is no location information

Parameters:
df_with_locationspandas.DataFrame

Pandas dataframe containing well descriptions

metadata_DFpandas.DataFrame

Pandas dataframe containing metadata, including well locations (e.g., Latitude/Longitude)

logbool, default = False

Whether to log results to log file, by default False

Returns:
df_with_locationspandas.DataFrame

Pandas dataframe containing only data with location information

w4h.run(well_data, surf_elev_grid, bedrock_elev_grid, model_grid=None, metadata=None, keep_all_cols=True, layers=9, description_col='FORMATION', top_col='TOP', bottom_col='BOTTOM', depth_type='depth', study_area=None, xcol='LONGITUDE', ycol='LATITUDE', zcol='SURFACE_ELEV', well_id_col='API_NUMBER', lith_dict=None, lith_dict_start=None, lith_dict_wildcard=None, target_dict=None, target_name='', include_elevation_grids=True, include_elevation_coordinates=True, export_dir=None, verbose=False, log=False, **kw_params)[source]

w4h.run() is a function that runs the intended workflow of the wells4hydrogeology (w4h) package. This means that it runs several constituent functions. The workflow that this follows is provided in the package wiki. It accepts the parameters of the constituent functions. To see a list of these functions and parameters, use help(w4h.run).

The functions used in w4h.run() are listed below, along with their parameters and the default values for those parameters. See the documentation for each of the individual functions for more information on a specific parameter:

file_setup

well_data | default = ‘<no default>’

metadata | default = None

data_filename | default = ‘ISGS_DOWNHOLE_DATA.txt’

metadata_filename | default = ‘ISGS_HEADER.txt’

log_dir | default = None

verbose | default = False

log | default = False

read_raw_csv

data_filepath | default = ‘<output of previous function>’

metadata_filepath | default = ‘<output of previous function>’

data_cols | default = None

metadata_cols | default = None

xcol | default = ‘LONGITUDE’

ycol | default = ‘LATITUDE’

well_key | default = ‘API_NUMBER’

encoding | default = ‘latin-1’

verbose | default = False

log | default = False

read_csv_kwargs | default = {}

define_dtypes

undefined_df | default = ‘<output of previous function>’

datatypes | default = None

verbose | default = False

log | default = False

merge_metadata

data_df | default = ‘<output of previous function>’

header_df | default = ‘<output of previous function>’

data_cols | default = None

header_cols | default = None

auto_pick_cols | default = False

drop_duplicate_cols | default = True

log | default = False

verbose | default = False

kwargs | default = {}

coords2geometry

df_no_geometry | default = ‘<output of previous function>’

xcol | default = ‘LONGITUDE’

ycol | default = ‘LATITUDE’

zcol | default = ‘ELEV_FT’

input_coords_crs | default = ‘EPSG:4269’

output_crs | default = ‘EPSG:5070’

use_z | default = False

wkt_col | default = ‘WKT’

geometry_source | default = ‘coords’

verbose | default = False

log | default = False

read_study_area

study_area | default = None

study_area_crs | default = None

output_crs | default = ‘EPSG:5070’

buffer | default = None

return_original | default = False

log | default = False

verbose | default = False

read_file_kwargs | default = {}

clip_gdf2study_area

study_area | default = ‘<output of previous function>’

gdf | default = ‘<output of previous function>’

log | default = False

verbose | default = False

read_grid

grid_path | default = None

grid_type | default = ‘model’

no_data_val_grid | default = 0

use_service | default = False

study_area | default = None

grid_crs | default = None

output_crs | default = ‘EPSG:5070’

verbose | default = False

log | default = False

kwargs | default = {}

add_control_points

df_without_control | default = ‘<output of previous function>’

df_control | default = None

xcol | default = ‘LONGITUDE’

ycol | default = ‘LATITUDE’

zcol | default = ‘ELEV_FT’

controlpoints_crs | default = ‘EPSG:4269’

output_crs | default = ‘EPSG:5070’

description_col | default = ‘FORMATION’

interp_col | default = ‘INTERPRETATION’

target_col | default = ‘TARGET’

verbose | default = False

log | default = False

kwargs | default = {}

remove_nonlocated

df_with_locations | default = ‘<output of previous function>’

xcol | default = ‘LONGITUDE’

ycol | default = ‘LATITUDE’

no_data_val_table | default = ‘’

verbose | default = False

log | default = False

remove_no_topo

df_with_topo | default = ‘<output of previous function>’

zcol | default = ‘SURFACE_ELEV’

no_data_val_table | default = ‘’

verbose | default = False

log | default = False

remove_no_depth

df_with_depth | default = ‘<output of previous function>’

top_col | default = ‘TOP’

bottom_col | default = ‘BOTTOM’

no_data_val_table | default = ‘’

verbose | default = False

log | default = False

remove_bad_depth

df_with_depth | default = ‘<output of previous function>’

top_col | default = ‘TOP’

bottom_col | default = ‘BOTTOM’

depth_type | default = ‘depth’

verbose | default = False

log | default = False

remove_no_description

df_with_descriptions | default = ‘<output of previous function>’

description_col | default = ‘FORMATION’

no_data_val_table | default = ‘’

verbose | default = False

log | default = False

get_search_terms

spec_path | default = ‘C:UsersbalikianLocalDataCodesScriptsGithubwells4hydrogeologydocs/resources/’

spec_glob_pattern | default = ‘*SearchTerms-Specific*’

start_path | default = None

start_glob_pattern | default = ‘*SearchTerms-Start*’

wildcard_path | default = None

wildcard_glob_pattern | default = ‘*SearchTerms-Wildcard’

verbose | default = False

log | default = False

read_dictionary_terms

dict_file | default = None

id_col | default = ‘ID’

search_col | default = ‘DESCRIPTION’

definition_col | default = ‘LITHOLOGY’

class_flag_col | default = ‘CLASS_FLAG’

dictionary_type | default = None

class_flag | default = 6

rem_extra_cols | default = True

verbose | default = False

log | default = False

specific_define

df | default = ‘<output of previous function>’

terms_df | default = ‘<output of previous function>’

description_col | default = ‘FORMATION’

terms_col | default = ‘DESCRIPTION’

parallel_processing | default = False

verbose | default = False

log | default = False

split_defined

df | default = ‘<output of previous function>’

classification_col | default = ‘CLASS_FLAG’

verbose | default = False

log | default = False

start_define

df | default = ‘<output of previous function>’

terms_df | default = ‘<output of previous function>’

description_col | default = ‘FORMATION’

terms_col | default = ‘DESCRIPTION’

parallel_processing | default = False

verbose | default = False

log | default = False

wildcard_define

df | default = ‘<output of previous function>’

terms_df | default = ‘<output of previous function>’

description_col | default = ‘FORMATION’

terms_col | default = ‘DESCRIPTION’

verbose | default = False

log | default = False

remerge_data

classifieddf | default = ‘<output of previous function>’

searchdf | default = ‘<output of previous function>’

parallel_processing | default = False

fill_unclassified

df | default = ‘<output of previous function>’

classification_col | default = ‘CLASS_FLAG’

read_lithologies

lith_file | default = None

interp_col | default = ‘LITHOLOGY’

target_col | default = ‘CODE’

use_cols | default = None

verbose | default = False

log | default = False

merge_lithologies

well_data_df | default = ‘<output of previous function>’

targinterps_df | default = ‘<output of previous function>’

interp_col | default = ‘INTERPRETATION’

target_col | default = ‘TARGET’

target_class | default = ‘bool’

align_rasters

grids_unaligned | default = None

model_grid | default = None

no_data_val_grid | default = 0

verbose | default = False

log | default = False

get_drift_thick

surface_elev | default = None

bedrock_elev | default = None

layers | default = 9

plot | default = False

verbose | default = False

log | default = False

sample_raster_points

raster | default = None

points_df | default = None

well_id_col | default = ‘API_NUMBER’

xcol | default = ‘LONGITUDE’

ycol | default = ‘LATITUDE’

new_col | default = ‘SAMPLED’

verbose | default = False

log | default = False

get_layer_depths

df_with_depths | default = ‘<output of previous function>’

surface_elev_col | default = ‘SURFACE_ELEV’

layer_thick_col | default = ‘LAYER_THICK’

layers | default = 9

log | default = False

layer_target_thick

df | default = ‘<output of previous function>’

layers | default = 9

return_all | default = False

export_dir | default = None

outfile_prefix | default = None

depth_top_col | default = ‘TOP’

depth_bot_col | default = ‘BOTTOM’

log | default = False

layer_interp

points | default = ‘<no default>’

model_grid | default = ‘<no default>’

layers | default = None

interp_kind | default = ‘nearest’

surface_grid | default = None

bedrock_grid | default = None

layer_thick_grid | default = None

drift_thick_grid | default = None

return_type | default = ‘dataset’

export_dir | default = None

target_col | default = ‘TARG_THICK_PER’

layer_col | default = ‘LAYER’

xcol | default = None

ycol | default = None

xcoord | default = ‘x’

ycoord | default = ‘y’

log | default = False

verbose | default = False

kwargs | default = {}

export_grids

grid_data | default = ‘<no default>’

out_path | default = ‘<no default>’

file_id | default = ‘’

filetype | default = ‘tif’

variable_sep | default = True

date_stamp | default = True

verbose | default = False

log | default = False
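Because w4h.run() accepts the parameters of its constituent functions through **kw_params, each keyword has to reach the function or functions that declare it. One generic way to do this in Python is to filter the keywords against each function's signature. The sketch below uses two hypothetical stand-in functions (read_step and clean_step) and a hypothetical helper (route_kwargs). It illustrates the pattern only and does not describe how w4h.run() is actually implemented.

```python
import inspect

def read_step(path, encoding="latin-1", verbose=False):
    # Hypothetical stand-in for a reading function
    return {"path": path, "encoding": encoding, "verbose": verbose}

def clean_step(records, top_col="TOP", bottom_col="BOTTOM", verbose=False):
    # Hypothetical stand-in for a cleaning function
    return {"records": records, "top_col": top_col,
            "bottom_col": bottom_col, "verbose": verbose}

def route_kwargs(func, all_kwargs):
    """Return only the keyword arguments that func's signature accepts."""
    accepted = inspect.signature(func).parameters
    return {key: value for key, value in all_kwargs.items() if key in accepted}

def run_workflow(path, **kw_params):
    # Each step receives only the keywords it declares; shared ones (verbose) go to both
    raw = read_step(path, **route_kwargs(read_step, kw_params))
    return clean_step(raw, **route_kwargs(clean_step, kw_params))

result = run_workflow("wells.txt", encoding="utf-8", top_col="DEPTH_TOP", verbose=True)
print(result)
```

With this pattern a shared keyword such as verbose is applied to every step that declares it, while any parameter that is not supplied falls back to the default listed for that function.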

w4h.sample_raster_points(raster=None, points_df=None, well_id_col='API_NUMBER', xcol='LONGITUDE', ycol='LATITUDE', new_col='SAMPLED', verbose=False, log=False)[source]

Sample raster values to points from geopandas geodataframe.

Parameters:
rasterrioxarray data array

Raster containing values to be sampled.

points_dfgeopandas.geodataframe

Geopandas dataframe with geometry column containing point values to sample.

well_id_colstr, default=”API_NUMBER”

Column that uniquely identifies each well so multiple sampling points are not taken per well

xcolstr, default=’LONGITUDE’

Column containing name for x-column, by default ‘LONGITUDE.’ This is used to output (potentially) reprojected point coordinates so as not to overwrite the original.

ycolstr, default=’LATITUDE’

Column containing name for y-column, by default ‘LATITUDE.’ This is used to output (potentially) reprojected point coordinates so as not to overwrite the original.

new_colstr, default=’SAMPLED’

Name of the new column containing points sampled from the raster, by default ‘SAMPLED’.

verbosebool, default=False

Whether to send to print() information about progress of function, by default False.

logbool, default = False

Whether to log results to log file, by default False

Returns:
points_dfgeopandas.geodataframe

Same as points_df, but with sampled values and potentially with reprojected coordinates.

w4h.sort_dataframe(df, sort_cols=['API_NUMBER', 'TOP'], remove_nans=True)[source]

Function to sort dataframe by one or more columns.

Parameters:
dfpandas.DataFrame

Dataframe to be sorted

sort_colsstr or list of str, default = [‘API_NUMBER’,’TOP’]

Name(s) of columns by which to sort dataframe, by default [‘API_NUMBER’,’TOP’]

remove_nansbool, default = True

Whether or not to remove nans in the process, by default True

Returns:
df_sortedpandas.DataFrame

Sorted dataframe
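An equivalent operation in plain pandas is a dropna() followed by sort_values(). In the sketch below the sample rows are invented. It assumes that remove_nans drops rows with missing values in the sort columns, and the index reset is an addition for readability.

```python
import pandas as pd

sort_cols = ["API_NUMBER", "TOP"]

# Hypothetical intervals, out of order and with one missing well identifier
df = pd.DataFrame({"API_NUMBER": [2, 1, 1, None], "TOP": [0, 10, 0, 5]})

df_sorted = (
    df.dropna(subset=sort_cols)      # remove rows with NaN in the sort columns
      .sort_values(by=sort_cols)     # sort by well, then by top of interval
      .reset_index(drop=True)
)
print(df_sorted)
```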

w4h.specific_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', parallel_processing=False, verbose=False, log=False)[source]

Function to classify terms that have been specifically defined in the terms_df.

Parameters:
dfpandas.DataFrame

Input dataframe with unclassified well descriptions.

terms_dfpandas.DataFrame

Dataframe containing the classifications

description_colstr, default=’FORMATION’

Column name in df containing the well descriptions, by default ‘FORMATION’.

terms_colstr, default=’DESCRIPTION’

Column name in terms_df containing the classified descriptions, by default ‘DESCRIPTION’.

verbosebool, default=False

Whether to print out results, by default False.

Returns:
df_Interpspandas.DataFrame

Dataframe containing the well descriptions and their matched classifications.
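Matching specifically defined terms amounts to an exact-match join between the description column and the terms column. The sketch below shows this with a left merge. The sample rows are invented, the LITHOLOGY and CLASS_FLAG column names follow the read_dictionary_terms defaults listed above, and it is an assumption that the match is exact and case-sensitive.

```python
import pandas as pd

# Hypothetical well descriptions
df = pd.DataFrame(
    {"API_NUMBER": [1, 1, 2], "FORMATION": ["clay", "blue clay with sand", "gravel"]}
)

# Hypothetical dictionary of specifically defined terms
terms_df = pd.DataFrame(
    {"DESCRIPTION": ["clay", "gravel"], "LITHOLOGY": ["CLAY", "GRAVEL"], "CLASS_FLAG": [1, 1]}
)

# Left join: every description is kept, and only exact matches receive a classification
df_interps = df.merge(terms_df, how="left", left_on="FORMATION", right_on="DESCRIPTION")
print(df_interps[["FORMATION", "LITHOLOGY", "CLASS_FLAG"]])
```

The second description contains the word clay but is not an exact match, so it stays unclassified for the later start and wildcard searches.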

w4h.split_defined(df, classification_col='CLASS_FLAG', verbose=False, log=False)[source]

Function to split dataframe with well descriptions into two dataframes based on whether a row has been classified.

Parameters:
dfpandas.DataFrame

Dataframe containing all the well descriptions

classification_colstr, default = ‘CLASS_FLAG’

Name of column containing the classification flag, by default ‘CLASS_FLAG’

verbosebool, default = False

Whether to print results, by default False

logbool, default = False

Whether to log results to log file

Returns:
Two-item tuple of pandas.DataFrame

tuple[0] is dataframe containing classified data, tuple[1] is dataframe containing unclassified data.
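A split of this kind can be written as two complementary boolean masks. The sketch below assumes that an unclassified row is one whose classification flag is missing (NaN). The documentation does not say how unclassified rows are marked, so treat that as an assumption.

```python
import pandas as pd

# Hypothetical descriptions; the second row has not been classified yet
df = pd.DataFrame(
    {"FORMATION": ["clay", "unknown fill", "gravel"], "CLASS_FLAG": [1, None, 1]}
)

has_flag = df["CLASS_FLAG"].notna()
classified = df[has_flag]       # corresponds to tuple[0]
unclassified = df[~has_flag]    # corresponds to tuple[1]
print(len(classified), len(unclassified))
```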

w4h.start_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', parallel_processing=False, verbose=False, log=False)[source]

Function to classify descriptions according to starting substring.

Parameters:
dfpandas.DataFrame

Dataframe containing all the well descriptions

terms_dfpandas.DataFrame

Dataframe containing all the startswith substrings to use for searching

description_colstr, default = ‘FORMATION’

Name of column in df containing descriptions, by default ‘FORMATION’

terms_colstr, default = ‘DESCRIPTION’

Name of column in terms_df containing startswith substring to match with description_col, by default ‘DESCRIPTION’

verbosebool, default = False

Whether to print out results, by default False

logbool, default = False

Whether to log results to log file

Returns:
dfpandas.DataFrame

Dataframe containing the original data and new classifications
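Classification by starting substring can be sketched with the pandas string method str.startswith(). The rows and the LITHOLOGY output column below are invented for illustration. It is also an assumption that a row already classified by an earlier term is not overwritten.

```python
import pandas as pd

# Hypothetical descriptions and starting substrings
df = pd.DataFrame({"FORMATION": ["sandy clay", "clay, blue", "gravel and sand"]})
terms_df = pd.DataFrame({"DESCRIPTION": ["clay", "gravel"], "LITHOLOGY": ["CLAY", "GRAVEL"]})

df["LITHOLOGY"] = None  # nothing classified yet
for term, lithology in zip(terms_df["DESCRIPTION"], terms_df["LITHOLOGY"]):
    # Match only descriptions that begin with the term and are still unclassified
    matches = df["FORMATION"].str.startswith(term) & df["LITHOLOGY"].isna()
    df.loc[matches, "LITHOLOGY"] = lithology

print(df)
```

The description sandy clay contains the term clay but does not start with it, so this step leaves it unclassified.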

w4h.verbose_print(func, local_variables, exclude_params=[])[source]

w4h.wildcard_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', verbose=False, log=False)[source]

Function to classify descriptions according to any substring.

Parameters:
dfpandas.DataFrame

Dataframe containing all the well descriptions

terms_dfpandas.DataFrame

Dataframe containing all the substrings to use for searching

description_colstr, default = ‘FORMATION’

Name of column in df containing descriptions, by default ‘FORMATION’

terms_colstr, default = ‘DESCRIPTION’

Name of column in terms_df containing the substring to match with description_col, by default ‘DESCRIPTION’

verbosebool, default = False

Whether to print out results, by default False

logbool, default = False

Whether to log results to log file

Returns:
dfpandas.DataFrame

Dataframe containing the original data and new classifications
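Matching on any substring differs from the starting-substring search only in where the term may occur. The plain-Python sketch below makes the rule explicit for a single description. The function name wildcard_classify, the sample terms, the case-insensitive comparison, and the first-match-wins order are all assumptions made for illustration.

```python
def wildcard_classify(description, terms):
    """Return the label of the first term found anywhere in description, else None."""
    text = description.lower()
    for substring, label in terms:
        if substring.lower() in text:
            return label
    return None

# Hypothetical (substring, classification) pairs
terms = [("clay", "CLAY"), ("shale", "SHALE")]

descriptions = ["Sandy clay", "blue SHALE", "topsoil"]
labels = [wildcard_classify(text, terms) for text in descriptions]
print(labels)
```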

w4h.xyz_metadata_merge(xyz, metadata, verbose=False, log=False)[source]

Add elevation to header data file.

Parameters:
xyzpandas.DataFrame

Contains elevation for the points

metadatapandas.DataFrame

Header data file

logbool, default = False

Whether to log results to log file, by default False

Returns:
headerXYZDatapandas.DataFrame

Header dataset merged to get elevation values
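Adding elevations to the header table is a key-based join. The sketch below uses pandas merge with invented rows. The join key (API_NUMBER) and the left join that keeps wells without an elevation are assumptions, since the documentation does not state either.

```python
import pandas as pd

# Hypothetical elevation table and header (metadata) table
xyz = pd.DataFrame({"API_NUMBER": [1, 2], "ELEV_FT": [700.0, 650.5]})
metadata = pd.DataFrame({"API_NUMBER": [1, 2, 3], "COUNTY_CODE": [19, 19, 113]})

# Keep every header row; wells without a matching xyz record get NaN elevation
header_xyz = metadata.merge(xyz, how="left", on="API_NUMBER")
print(header_xyz)
```

Wells that end up with a NaN elevation after a join like this are the kind of record that remove_no_topo() is meant to filter out.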

Submodules