w4h.clean module

The clean module contains functions for cleaning the data (i.e., removing records that should not be used in further analysis).

w4h.clean.remove_bad_depth(df_with_depth, top_col='TOP', bottom_col='BOTTOM', depth_type='depth', verbose=False, log=False)

Function to remove all records in the dataframe with well interpretations where the depth information is bad (i.e., where the bottom of the record is nearer to the surface than the top).

Parameters:
df_with_depth : pandas.DataFrame

Pandas dataframe containing the well records and descriptions for each interval

top_col : str, default=‘TOP’

The name of the column containing the depth or elevation for the top of the interval, by default ‘TOP’

bottom_col : str, default=‘BOTTOM’

The name of the column containing the depth or elevation for the bottom of each interval, by default ‘BOTTOM’

depth_type : str, {‘depth’, ‘elevation’}, default=‘depth’

Whether the table is organized by depth or elevation, by default ‘depth’. If ‘depth’, the top column will have smaller values than the bottom column. If ‘elevation’, the top column will have higher values than the bottom column.

verbose : bool, default=False

Whether to print results to the terminal, by default False

log : bool, default=False

Whether to log results to log file, by default False

Returns:
pandas.DataFrame

Pandas dataframe with the records removed where the top is indicated to be below the bottom.
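The depth check described above can be sketched in plain pandas. This is only an illustration of the documented behavior, not the w4h implementation: the helper name drop_inverted_intervals is hypothetical, and keeping zero-thickness intervals is an assumption.

```python
import pandas as pd


def drop_inverted_intervals(df, top_col="TOP", bottom_col="BOTTOM", depth_type="depth"):
    """Hypothetical stand-in: keep rows whose top is not below their bottom."""
    if depth_type == "depth":
        # Depth increases downward, so a valid top is <= its bottom.
        keep = df[top_col] <= df[bottom_col]
    elif depth_type == "elevation":
        # Elevation decreases downward, so a valid top is >= its bottom.
        keep = df[top_col] >= df[bottom_col]
    else:
        raise ValueError("depth_type must be 'depth' or 'elevation'")
    return df[keep].reset_index(drop=True)


intervals = pd.DataFrame({
    "TOP": [0.0, 10.0, 30.0],
    "BOTTOM": [10.0, 5.0, 45.0],  # second row: bottom is shallower than top
    "FORMATION": ["topsoil", "clay", "sand"],
})
cleaned = drop_inverted_intervals(intervals)
```

Here the second record would be dropped because its bottom (5.0) is shallower than its top (10.0). With depth_type="elevation" the comparison is reversed.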

w4h.clean.remove_no_depth(df_with_depth, top_col='TOP', bottom_col='BOTTOM', no_data_val_table='', verbose=False, log=False)

Function to remove well intervals with no depth information

Parameters:
df_with_depth : pandas.DataFrame

Dataframe containing well descriptions

top_col : str, optional

Name of column containing information on the top of the well intervals, by default ‘TOP’

bottom_col : str, optional

Name of column containing information on the bottom of the well intervals, by default ‘BOTTOM’

no_data_val_table : any, optional

Value in the input data that indicates missing depth information. It is replaced by np.nan before the affected records are removed, by default ‘’

verbose : bool, optional

Whether to print results to console, by default False

log : bool, default=False

Whether to log results to log file, by default False

Returns:
df_with_depth : pandas.DataFrame

Dataframe with the intervals lacking depth information removed
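The replace-then-drop pattern described for no_data_val_table can be sketched as follows. This is a minimal illustration, not the library's code: drop_missing_depths is a hypothetical helper, and dropping a record when either the top or the bottom is missing is an assumption.

```python
import numpy as np
import pandas as pd


def drop_missing_depths(df, top_col="TOP", bottom_col="BOTTOM", no_data_val=""):
    """Hypothetical stand-in: replace the no-data value with NaN, then drop."""
    out = df.copy()
    cols = [top_col, bottom_col]
    # Turn the no-data marker into NaN so pandas treats it as missing.
    out[cols] = out[cols].replace(no_data_val, np.nan)
    # Assumption: a record is dropped if either depth is missing.
    return out.dropna(subset=cols).reset_index(drop=True)


records = pd.DataFrame({
    "TOP": [0, "", 20],
    "BOTTOM": [10, 20, ""],
    "FORMATION": ["topsoil", "clay", "sand"],
})
with_depths = drop_missing_depths(records)
```

Only the first record has both a top and a bottom, so it is the only one kept in this example.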

w4h.clean.remove_no_description(df_with_descriptions, description_col='FORMATION', no_data_val_table='', verbose=False, log=False)

Function that removes all records in the dataframe containing the well descriptions where no description is given.

Parameters:
df_with_descriptions : pandas.DataFrame

Pandas dataframe containing the well records with their individual descriptions

description_col : str, optional

Name of the column containing the geologic description of each interval, by default ‘FORMATION’

no_data_val_table : str, optional

The value expected if the column is empty or there is no data. Occurrences of this value are replaced by np.nan before the records are removed, by default ‘’

verbose : bool, optional

Whether to print the results of this step to the terminal, by default False

log : bool, default=False

Whether to log results to log file, by default False

Returns:
pandas.DataFrame

Pandas dataframe with records with no description removed.
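When a table marks missing descriptions with something other than an empty string, the same idea applies with a different marker. The sketch below uses plain pandas on made-up data. The "NA" marker is a hypothetical choice for this example, and the code illustrates the documented behavior rather than reproducing the w4h implementation.

```python
import numpy as np
import pandas as pd

descriptions = pd.DataFrame({
    "TOP": [0, 10, 20],
    "BOTTOM": [10, 20, 35],
    "FORMATION": ["topsoil", "NA", "gray clay"],
})

# Hypothetical no-data marker for this table; the documented default is ''.
no_data_val = "NA"

# Replace the marker with NaN, then drop rows lacking a description.
described = (
    descriptions
    .assign(FORMATION=descriptions["FORMATION"].replace(no_data_val, np.nan))
    .dropna(subset=["FORMATION"])
    .reset_index(drop=True)
)
```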

w4h.clean.remove_no_topo(df_with_topo, zcol='SURFACE_ELEV', no_data_val_table='', verbose=False, log=False)

Function to remove wells that do not have topography data (needed for layer selection later).

This function is intended to be run on the metadata table after an attempt has been made to add elevations.

Parameters:
df_with_topo : pandas.DataFrame

Pandas dataframe containing elevation information.

zcol : str, default=‘SURFACE_ELEV’

Name of elevation column

no_data_val_table : any

Value in dataset that indicates no data is present (replaced with np.nan)

verbose : bool, optional

Whether to print outputs, by default False

log : bool, default=False

Whether to log results to log file, by default False

Returns:
pandas.DataFrame

Pandas dataframe with intervals with no topography removed.

w4h.clean.remove_nonlocated(df_with_locations, xcol='LONGITUDE', ycol='LATITUDE', no_data_val_table='', verbose=False, log=False)

Function to remove wells and well intervals where there is no location information

Parameters:
df_with_locations : pandas.DataFrame

Pandas dataframe containing well descriptions

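xcol : str, default=‘LONGITUDE’

Name of the column containing the x coordinate (e.g., longitude) of each well, by default ‘LONGITUDE’

ycol : str, default=‘LATITUDE’

Name of the column containing the y coordinate (e.g., latitude) of each well, by default ‘LATITUDE’

no_data_val_table : any, optional

Value in the dataset that indicates no data is present (replaced with np.nan), by default ‘’

verbose : bool, default=False

Whether to print results to the terminal, by default False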

log : bool, default=False

Whether to log results to log file, by default False

Returns:
df_with_locations : pandas.DataFrame

Pandas dataframe containing only data with location information