w4h.classify module

The classify module contains functions for classifying geologic intervals into a preset set of geologic interpretations.

w4h.classify.depth_define(df, top_col='TOP', thresh=550.0, parallel_processing=False, verbose=False, log=False)

Function to classify all intervals deeper than thresh as bedrock

Parameters:
df : pandas.DataFrame

Dataframe to classify

top_col : str, default = ‘TOP’

Name of column that contains the depth information, likely of the top of the well interval, by default ‘TOP’

thresh : float, default = 550.0

Depth (in the units used in df[top_col]) below which all intervals will be classified as bedrock, by default 550.0.

verbose : bool, default = False

Whether to print results, by default False

log : bool, default = False

Whether to log results to log file

Returns:
df : pandas.DataFrame

Dataframe containing intervals classified as bedrock due to depth
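
The depth rule described above can be sketched with plain pandas. This is an illustration of the idea only, not the w4h implementation: the output column name BEDROCK_FLAG is hypothetical, and whether the comparison against thresh is strict is an assumption.

```python
import pandas as pd

# Hypothetical interval table; depths use the same units as thresh.
df = pd.DataFrame({
    "API_NUMBER": [1, 1, 2],
    "TOP": [10.0, 600.0, 551.0],
    "FORMATION": ["sand", "limestone", "shale"],
})

thresh = 550.0

# Intervals whose top lies deeper than the threshold are treated as bedrock.
# "BEDROCK_FLAG" is an illustrative column name, not one documented by w4h.
df["BEDROCK_FLAG"] = df["TOP"] > thresh
```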

w4h.classify.export_undefined(df, outdir)

Function to export terms that still need to be defined.

Parameters:
df : pandas.DataFrame

Dataframe containing at least some unclassified data

outdir : str or pathlib.Path

Directory in which to save the file. The filename is generated automatically from today’s date.

Returns:
stillNeededDF : pandas.DataFrame

Dataframe containing only unclassified terms, and the number of times they occur
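
A rough pandas sketch of this kind of export follows. It assumes that unclassified rows carry NaN in a CLASS_FLAG column and that the terms live in a FORMATION column; the file-name pattern and the COUNT column name are invented for illustration, since the actual ones are not documented here.

```python
import datetime
import pathlib

import pandas as pd

df = pd.DataFrame({
    "FORMATION": ["sand", "blue clay", "blue clay", "gravel"],
    "CLASS_FLAG": [1, None, None, None],  # assumption: NaN marks unclassified rows
})

# Keep only unclassified rows, then count how often each term occurs.
undefined = df[df["CLASS_FLAG"].isna()]
still_needed = (
    undefined["FORMATION"]
    .value_counts()
    .rename_axis("FORMATION")
    .reset_index(name="COUNT")
)

# Date-stamped file name (hypothetical pattern).
outdir = pathlib.Path(".")
outfile = outdir / f"undefined_{datetime.date.today():%Y-%m-%d}.csv"
# still_needed.to_csv(outfile, index=False)  # uncomment to write the file
```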

w4h.classify.fill_unclassified(df, classification_col='CLASS_FLAG')

Fills unclassified rows in the classification_col column (by default ‘CLASS_FLAG’) with np.nan

Parameters:
df : pandas.DataFrame

Dataframe on which to perform operation

Returns:
df : pandas.DataFrame

Dataframe on which operation has been performed

w4h.classify.get_unique_wells(df, wellid_col='API_NUMBER', verbose=False, log=False)

Gets unique wells as a dataframe based on a given column name.

Parameters:
df : pandas.DataFrame

Dataframe containing all wells and/or well intervals of interest

wellid_col : str, default = ‘API_NUMBER’

Name of column in df containing a unique identifier for each well, by default ‘API_NUMBER’. .unique() will be run on this column to get the unique values.

log : bool, default = False

Whether to log results to log file

Returns:
wellsDF : pandas.DataFrame

DataFrame containing only the unique well IDs
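
Since the description mentions running .unique() on the well ID column, the operation can be approximated as below. This is a sketch of the idea with made-up data, not the library's code.

```python
import pandas as pd

# Several intervals per well; the IDs are made up for illustration.
df = pd.DataFrame({
    "API_NUMBER": [101, 101, 102, 103, 103],
    "TOP": [0, 20, 0, 0, 15],
})

wellid_col = "API_NUMBER"

# .unique() keeps the first occurrence of each ID, in order of appearance.
wells_df = pd.DataFrame({wellid_col: df[wellid_col].unique()})
```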

w4h.classify.merge_lithologies(well_data_df, targinterps_df, interp_col='INTERPRETATION', target_col='TARGET', target_class='bool')

Function to merge lithologies and target booleans based on classifications

Parameters:
well_data_df : pandas.DataFrame

Dataframe containing classified well data

targinterps_df : pandas.DataFrame

Dataframe containing lithologies and their target interpretations, depending on what the target is for this analysis (often, coarse materials=1, fine=0)

target_col : str, default = ‘TARGET’

Name of column in targinterps_df containing the target interpretations

target_class : str, default = ‘bool’

Whether the input column is using boolean values as its target indicator

Returns:
df_targ : pandas.DataFrame

Dataframe containing merged lithologies/targets
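
One plausible reading of this merge is a left join on the interpretation column, sketched here with pandas. It assumes that both dataframes share an INTERPRETATION column and that every interpretation has a target; neither assumption is confirmed by the documentation above.

```python
import pandas as pd

well_data_df = pd.DataFrame({
    "API_NUMBER": [1, 1, 2],
    "INTERPRETATION": ["SAND", "CLAY", "GRAVEL"],
})
targinterps_df = pd.DataFrame({
    "INTERPRETATION": ["SAND", "GRAVEL", "CLAY"],
    "TARGET": [1, 1, 0],  # coarse materials = 1, fine = 0
})

# Left join keeps every well interval and attaches its target value.
df_targ = well_data_df.merge(targinterps_df, how="left", on="INTERPRETATION")

# Cast to boolean; unmatched rows (NaN) would need handling before this cast.
df_targ["TARGET"] = df_targ["TARGET"].astype(bool)
```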

w4h.classify.remerge_data(classifieddf, searchdf, parallel_processing=False)

Function to merge the newly searched data (whether or not it was classified) with the previously classified data

Parameters:
classifieddf : pandas.DataFrame

Dataframe that had already been classified previously

searchdf : pandas.DataFrame

Dataframe with new classifications

Returns:
remergeDF : pandas.DataFrame

Dataframe containing all the data, merged back together
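
Recombining the two pieces can be pictured as a concatenation that restores the original row order. The sketch below assumes the two dataframes keep their original, non-overlapping index labels; how the library itself combines them is not documented here.

```python
import pandas as pd

# Rows classified in an earlier pass.
classifieddf = pd.DataFrame(
    {"FORMATION": ["sand"], "CLASS_FLAG": [1.0]}, index=[0]
)
# Rows that went through another search; some may still be unclassified.
searchdf = pd.DataFrame(
    {"FORMATION": ["clay", "muck"], "CLASS_FLAG": [3.0, None]}, index=[1, 2]
)

# Stack both pieces and restore the original row order via the index.
remerge_df = pd.concat([classifieddf, searchdf]).sort_index()
```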

w4h.classify.sort_dataframe(df, sort_cols=['API_NUMBER', 'TOP'], remove_nans=True)

Function to sort dataframe by one or more columns.

Parameters:
df : pandas.DataFrame

Dataframe to be sorted

sort_cols : str or list of str, default = [‘API_NUMBER’, ‘TOP’]

Name(s) of columns by which to sort dataframe, by default [‘API_NUMBER’, ‘TOP’]

remove_nans : bool, default = True

Whether or not to remove NaNs in the process, by default True

Returns:
df_sorted : pandas.DataFrame

Sorted dataframe
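
A pandas sketch of the sort follows. It assumes that "removing NaNs" means dropping rows with missing values in the sort columns; the documentation above does not say which columns are checked.

```python
import pandas as pd

df = pd.DataFrame({
    "API_NUMBER": [2, 1, 1, None],
    "TOP": [0.0, 20.0, 5.0, 3.0],
})
sort_cols = ["API_NUMBER", "TOP"]

# Drop rows missing a sort key (assumed behavior), then sort by well and depth.
df_sorted = (
    df.dropna(subset=sort_cols)
    .sort_values(by=sort_cols)
    .reset_index(drop=True)
)
```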

w4h.classify.specific_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', parallel_processing=False, verbose=False, log=False)

Function to classify terms that have been specifically defined in the terms_df.

Parameters:
df : pandas.DataFrame

Input dataframe with unclassified well descriptions.

terms_df : pandas.DataFrame

Dataframe containing the classifications

description_col : str, default = ‘FORMATION’

Column name in df containing the well descriptions, by default ‘FORMATION’.

terms_col : str, default = ‘DESCRIPTION’

Column name in terms_df containing the classified descriptions, by default ‘DESCRIPTION’.

verbose : bool, default = False

Whether to print results, by default False.

Returns:
df_Interps : pandas.DataFrame

Dataframe containing the well descriptions and their matched classifications.
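
Exact-match classification of this kind can be illustrated as a left join between the description column and the terms column. The INTERPRETATION column in terms_df is a hypothetical name for the classification, and this is a sketch of the idea rather than the library's code.

```python
import pandas as pd

df = pd.DataFrame({"FORMATION": ["sand", "blue clay", "topsoil"]})
terms_df = pd.DataFrame({
    "DESCRIPTION": ["sand", "topsoil"],
    "INTERPRETATION": ["SAND", "SOIL"],  # hypothetical classification column
})

# Descriptions that exactly match a defined term pick up its classification;
# everything else is left as NaN for later matching steps.
df_interps = df.merge(
    terms_df, how="left", left_on="FORMATION", right_on="DESCRIPTION"
)
```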

w4h.classify.split_defined(df, classification_col='CLASS_FLAG', verbose=False, log=False)

Function to split dataframe with well descriptions into two dataframes based on whether a row has been classified.

Parameters:
df : pandas.DataFrame

Dataframe containing all the well descriptions

classification_col : str, default = ‘CLASS_FLAG’

Name of column containing the classification flag, by default ‘CLASS_FLAG’

verbose : bool, default = False

Whether to print results, by default False

log : bool, default = False

Whether to log results to log file

Returns:
Two-item tuple of pandas.DataFrame

tuple[0] is dataframe containing classified data, tuple[1] is dataframe containing unclassified data.
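
The split can be sketched with a boolean mask. The example assumes that an unclassified row is one whose CLASS_FLAG is NaN; the flag values the library actually uses are not documented here.

```python
import pandas as pd

df = pd.DataFrame({
    "FORMATION": ["sand", "blue clay", "topsoil"],
    "CLASS_FLAG": [1.0, None, 1.0],  # assumption: NaN means not yet classified
})

is_classified = df["CLASS_FLAG"].notna()

# Same ordering as the documented return value: classified first, then unclassified.
classified_df, unclassified_df = df[is_classified], df[~is_classified]
```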

w4h.classify.start_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', parallel_processing=False, verbose=False, log=False)

Function to classify descriptions according to starting substring.

Parameters:
df : pandas.DataFrame

Dataframe containing all the well descriptions

terms_df : pandas.DataFrame

Dataframe containing all the startswith substrings to use for searching

description_col : str, default = ‘FORMATION’

Name of column in df containing descriptions, by default ‘FORMATION’

terms_col : str, default = ‘DESCRIPTION’

Name of column in terms_df containing the startswith substring to match with description_col, by default ‘DESCRIPTION’

verbose : bool, default = False

Whether to print results, by default False

log : bool, default = False

Whether to log results to log file

Returns:
df : pandas.DataFrame

Dataframe containing the original data and new classifications
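
Prefix matching of this kind can be sketched with pandas string methods. The INTERPRETATION column and the first-match-wins rule are assumptions made for the example, not documented behavior.

```python
import pandas as pd

df = pd.DataFrame({"FORMATION": ["sandy clay", "clay, blue", "gravel and sand"]})
terms_df = pd.DataFrame({
    "DESCRIPTION": ["sand", "clay"],
    "INTERPRETATION": ["SAND", "CLAY"],  # hypothetical classification column
})

df["INTERPRETATION"] = None
for term, interp in zip(terms_df["DESCRIPTION"], terms_df["INTERPRETATION"]):
    # Classify only rows that start with the term and are still unclassified.
    hits = df["FORMATION"].str.startswith(term) & df["INTERPRETATION"].isna()
    df.loc[hits, "INTERPRETATION"] = interp
```

Here "gravel and sand" stays unclassified because neither term appears at the start of the description.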

w4h.classify.wildcard_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', verbose=False, log=False)

Function to classify descriptions according to a substring found anywhere in the description.

Parameters:
df : pandas.DataFrame

Dataframe containing all the well descriptions

terms_df : pandas.DataFrame

Dataframe containing all the wildcard substrings to use for searching

description_col : str, default = ‘FORMATION’

Name of column in df containing descriptions, by default ‘FORMATION’

terms_col : str, default = ‘DESCRIPTION’

Name of column in terms_df containing the wildcard substring to match with description_col, by default ‘DESCRIPTION’

verbose : bool, default = False

Whether to print results, by default False

log : bool, default = False

Whether to log results to log file

Returns:
df : pandas.DataFrame

Dataframe containing the original data and new classifications
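
Matching a substring anywhere in the description can be sketched the same way with str.contains. As before, the INTERPRETATION column and the first-match-wins rule are assumptions for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"FORMATION": ["sandy clay", "clay, blue", "gravel and sand"]})
terms_df = pd.DataFrame({
    "DESCRIPTION": ["gravel", "clay"],
    "INTERPRETATION": ["GRAVEL", "CLAY"],  # hypothetical classification column
})

df["INTERPRETATION"] = None
for term, interp in zip(terms_df["DESCRIPTION"], terms_df["INTERPRETATION"]):
    # regex=False treats the term as a literal substring, matched anywhere.
    hits = (
        df["FORMATION"].str.contains(term, regex=False)
        & df["INTERPRETATION"].isna()
    )
    df.loc[hits, "INTERPRETATION"] = interp
```

Unlike the prefix match, "sandy clay" is now classified because "clay" occurs inside the description.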