w4h.classify module

The classify module contains functions for classifying geological intervals into a preset set of geologic interpretations.

w4h.classify.depth_define(df, top_col='TOP', thresh=550.0, parallel_processing=False, verbose=False, log=False)

Function to classify all intervals deeper than thresh as bedrock.

Parameters:

df (pandas.DataFrame): Dataframe to classify
top_col (str, default = 'TOP'): Name of the column that contains the depth information, likely the top of the well interval
thresh (float, default = 550.0): Depth (in the units used in df[top_col]) below which all intervals will be classified as bedrock
verbose (bool, default = False): Whether to print results
log (bool, default = False): Whether to log results to log file

Returns:

df (pandas.DataFrame): Dataframe containing intervals classified as bedrock due to depth
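
The depth rule described above can be illustrated with plain pandas. This is a minimal sketch, not the w4h implementation: the helper name depth_define_sketch, the 'BEDROCK' label, the choice to write into an 'INTERPRETATION' column, and the strict "greater than" comparison are all assumptions made for illustration.

```python
import pandas as pd


def depth_define_sketch(df, top_col="TOP", thresh=550.0):
    """Hypothetical helper: label intervals whose top is deeper than thresh."""
    out = df.copy()
    if "INTERPRETATION" not in out.columns:
        out["INTERPRETATION"] = None  # start with every interval unclassified
    deep = out[top_col] > thresh      # assumed: strictly deeper than thresh
    out.loc[deep, "INTERPRETATION"] = "BEDROCK"  # assumed label
    return out


intervals = pd.DataFrame(
    {"API_NUMBER": [1, 1, 2], "TOP": [10.0, 600.0, 550.0]}
)
classified = depth_define_sketch(intervals)
print(classified)
```

Only the interval with TOP = 600.0 is labeled here; the interval exactly at the threshold is left unclassified under the strict comparison assumed in this sketch.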

w4h.classify.export_undefined(df, outdir)

Function to export terms that still need to be defined.

Parameters:

df (pandas.DataFrame): Dataframe containing at least some unclassified data
outdir (str or pathlib.Path): Directory to save file. Filename will be generated automatically based on today's date.

Returns:

stillNeededDF (pandas.DataFrame): Dataframe containing only unclassified terms, and the number of times they occur
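
Counting the unclassified terms and writing them to a dated file might look like the sketch below. It is an illustration in plain pandas, not the library's code: the helper name, the 'FORMATION' and 'CLASS_FLAG' column choices, the 'COUNT' column, and the file name pattern are assumptions.

```python
import datetime
import tempfile
from pathlib import Path

import pandas as pd


def export_undefined_sketch(df, outdir, description_col="FORMATION",
                            classification_col="CLASS_FLAG"):
    """Hypothetical helper: count unclassified descriptions and save them."""
    unclassified = df[df[classification_col].isna()]
    counts = (
        unclassified[description_col]
        .value_counts()                  # most frequent terms first
        .rename_axis(description_col)
        .reset_index(name="COUNT")
    )
    # Assumed file name pattern; the real name is generated by w4h.
    out_path = Path(outdir) / f"undefined_{datetime.date.today().isoformat()}.csv"
    counts.to_csv(out_path, index=False)
    return counts


records = pd.DataFrame(
    {
        "FORMATION": ["till", "till", "muck", "sand"],
        "CLASS_FLAG": [float("nan"), float("nan"), float("nan"), 1.0],
    }
)
with tempfile.TemporaryDirectory() as tmp:
    still_needed = export_undefined_sketch(records, tmp)
    written_files = [p.name for p in Path(tmp).iterdir()]
print(still_needed)
```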

w4h.classify.fill_unclassified(df, classification_col='CLASS_FLAG')

Fills unclassified rows in the classification column ('CLASS_FLAG' by default) with np.nan.

Parameters:

df (pandas.DataFrame): Dataframe on which to perform operation

Returns:

df (pandas.DataFrame): Dataframe on which operation has been performed

w4h.classify.get_unique_wells(df, wellid_col='API_NUMBER', verbose=False, log=False)

Gets unique wells as a dataframe based on a given column name.

Parameters:

df (pandas.DataFrame): Dataframe containing all wells and/or well intervals of interest
wellid_col (str, default = 'API_NUMBER'): Name of the column in df containing a unique identifier for each well. .unique() will be run on this column to get the unique values.
log (bool, default = False): Whether to log results to log file

Returns:

wellsDF (pandas.DataFrame): Dataframe containing only the unique well IDs
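
As the parameter description notes, the unique wells come from running .unique() on the ID column. A minimal pandas sketch of that step follows; the helper name unique_wells_sketch is hypothetical and the real function may return additional columns.

```python
import pandas as pd


def unique_wells_sketch(df, wellid_col="API_NUMBER"):
    """Hypothetical helper: one row per distinct well ID, in order of appearance."""
    return pd.DataFrame({wellid_col: df[wellid_col].unique()})


intervals = pd.DataFrame(
    {"API_NUMBER": [101, 101, 102, 101], "TOP": [0.0, 12.0, 0.0, 30.0]}
)
wells = unique_wells_sketch(intervals)
print(wells)
```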

w4h.classify.merge_lithologies(well_data_df, targinterps_df, interp_col='INTERPRETATION', target_col='TARGET', target_class='bool')

Function to merge lithologies and target booleans based on classifications

Parameters:

well_data_df (pandas.DataFrame): Dataframe containing classified well data
targinterps_df (pandas.DataFrame): Dataframe containing lithologies and their target interpretations, depending on what the target is for this analysis (often, coarse materials = 1, fine = 0)
target_col (str, default = 'TARGET'): Name of the column in targinterps_df containing the target interpretations
target_class (str, default = 'bool'): Whether the input column is using boolean values as its target indicator

Returns:

df_targ (pandas.DataFrame): Dataframe containing merged lithologies/targets
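
Attaching a target value to each classified interval amounts to a join on the interpretation column. The sketch below shows that idea with pandas.DataFrame.merge; it assumes a left join on 'INTERPRETATION' and uses made-up lithology labels, and it is not taken from the w4h source.

```python
import pandas as pd

# Classified intervals (illustrative values).
well_data = pd.DataFrame(
    {
        "API_NUMBER": [101, 101, 102],
        "INTERPRETATION": ["SAND", "CLAY", "GRAVEL"],
    }
)
# Lookup table: coarse materials = 1, fine = 0.
targets = pd.DataFrame(
    {"INTERPRETATION": ["SAND", "GRAVEL", "CLAY"], "TARGET": [1, 1, 0]}
)

# A left join keeps every interval, even one with no matching lithology.
merged = well_data.merge(
    targets[["INTERPRETATION", "TARGET"]], on="INTERPRETATION", how="left"
)
print(merged)
```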

w4h.classify.remerge_data(classifieddf, searchdf, parallel_processing=False)

Function to merge newly classified (or still unclassified) data back together with previously classified data.

Parameters:

classifieddf (pandas.DataFrame): Dataframe that had already been classified previously
searchdf (pandas.DataFrame): Dataframe with new classifications

Returns:

remergeDF (pandas.DataFrame): Dataframe containing all the data, merged back together

w4h.classify.sort_dataframe(df, sort_cols=['API_NUMBER', 'TOP'], remove_nans=True)

Function to sort dataframe by one or more columns.

Parameters:

df (pandas.DataFrame): Dataframe to be sorted
sort_cols (str or list of str, default = ['API_NUMBER', 'TOP']): Name(s) of the column(s) by which to sort the dataframe
remove_nans (bool, default = True): Whether to remove NaNs in the process

Returns:

df_sorted (pandas.DataFrame): Sorted dataframe
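
A sort of this kind can be sketched with pandas sort_values. The helper below is hypothetical, and it assumes that remove_nans drops rows with NaN in any of the sort columns; the documentation above does not say which columns the real function checks.

```python
import pandas as pd


def sort_dataframe_sketch(df, sort_cols=("API_NUMBER", "TOP"), remove_nans=True):
    """Hypothetical helper: sort by one or more columns, optionally dropping NaNs."""
    cols = [sort_cols] if isinstance(sort_cols, str) else list(sort_cols)
    # Assumption: only NaNs in the sort columns cause a row to be dropped.
    out = df.dropna(subset=cols) if remove_nans else df
    return out.sort_values(by=cols).reset_index(drop=True)


wells = pd.DataFrame(
    {"API_NUMBER": [2, 1, 1, 1], "TOP": [0.0, 20.0, 5.0, float("nan")]}
)
df_sorted = sort_dataframe_sketch(wells)
print(df_sorted)
```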

w4h.classify.specific_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', parallel_processing=False, verbose=False, log=False)

Function to classify terms that have been specifically defined in the terms_df.

Parameters:

df (pandas.DataFrame): Input dataframe with unclassified well descriptions
terms_df (pandas.DataFrame): Dataframe containing the classifications
description_col (str, default = 'FORMATION'): Column name in df containing the well descriptions
terms_col (str, default = 'DESCRIPTION'): Column name in terms_df containing the classified descriptions
verbose (bool, default = False): Whether to print results

Returns:

df_Interps (pandas.DataFrame): Dataframe containing the well descriptions and their matched classifications
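
Matching descriptions against specifically defined terms can be pictured as an exact-match join between description_col and terms_col. The following is a sketch under that assumption, with illustrative data and an assumed 'INTERPRETATION' column in terms_df; the real function may normalize text or handle duplicates differently.

```python
import pandas as pd

descriptions = pd.DataFrame({"FORMATION": ["sand", "blue clay", "topsoil"]})
terms = pd.DataFrame(
    {"DESCRIPTION": ["sand", "topsoil"], "INTERPRETATION": ["SAND", "SOIL"]}
)

# Exact string match: rows without a defined term keep a missing interpretation.
matched = descriptions.merge(
    terms, left_on="FORMATION", right_on="DESCRIPTION", how="left"
)
print(matched[["FORMATION", "INTERPRETATION"]])
```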

w4h.classify.split_defined(df, classification_col='CLASS_FLAG', verbose=False, log=False)

Function to split dataframe with well descriptions into two dataframes based on whether a row has been classified.

Parameters:

df (pandas.DataFrame): Dataframe containing all the well descriptions
classification_col (str, default = 'CLASS_FLAG'): Name of the column containing the classification flag
verbose (bool, default = False): Whether to print results
log (bool, default = False): Whether to log results to log file

Returns:

Two-item tuple of pandas.DataFrame: the first item is the dataframe containing classified data; the second is the dataframe containing unclassified data.
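
The split can be sketched as a boolean mask on the classification column. This example assumes that unclassified rows carry NaN in 'CLASS_FLAG' (consistent with fill_unclassified above); the helper name is hypothetical.

```python
import pandas as pd


def split_defined_sketch(df, classification_col="CLASS_FLAG"):
    """Hypothetical helper: return (classified, unclassified) dataframes."""
    is_classified = df[classification_col].notna()
    return df[is_classified].copy(), df[~is_classified].copy()


records = pd.DataFrame(
    {
        "FORMATION": ["sand", "mystery fill", "clay"],
        "CLASS_FLAG": [1.0, float("nan"), 3.0],
    }
)
classified, unclassified = split_defined_sketch(records)
print(len(classified), len(unclassified))
```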

w4h.classify.start_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', parallel_processing=False, verbose=False, log=False)

Function to classify descriptions according to starting substring.

Parameters:

df (pandas.DataFrame): Dataframe containing all the well descriptions
terms_df (pandas.DataFrame): Dataframe containing all the startswith substrings to use for searching
description_col (str, default = 'FORMATION'): Name of the column in df containing descriptions
terms_col (str, default = 'DESCRIPTION'): Name of the column in terms_df containing the startswith substring to match against description_col
verbose (bool, default = False): Whether to print results
log (bool, default = False): Whether to log results to log file

Returns:

df (pandas.DataFrame): Dataframe containing the original data and new classifications
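
Classification by starting substring can be sketched with pandas' Series.str.startswith. The helper below is an illustration only: its name, the 'INTERPRETATION' output column, and the "first matching term wins" rule are assumptions rather than documented w4h behavior.

```python
import pandas as pd


def start_define_sketch(df, terms_df, description_col="FORMATION",
                        terms_col="DESCRIPTION", interp_col="INTERPRETATION"):
    """Hypothetical helper: classify rows whose description starts with a term."""
    out = df.copy()
    if interp_col not in out.columns:
        out[interp_col] = None
    for term, interp in zip(terms_df[terms_col], terms_df[interp_col]):
        starts = out[description_col].str.startswith(term, na=False)
        # Assumption: rows that are already classified are left untouched.
        hits = starts & out[interp_col].isna()
        out.loc[hits, interp_col] = interp
    return out


descriptions = pd.DataFrame(
    {"FORMATION": ["sandy loam", "clay, blue", "fine sand"]}
)
terms = pd.DataFrame(
    {"DESCRIPTION": ["sand", "clay"], "INTERPRETATION": ["SAND", "CLAY"]}
)
result = start_define_sketch(descriptions, terms)
print(result)
```

Note that "fine sand" stays unclassified here because "sand" does not appear at the start of the description.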

w4h.classify.wildcard_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', verbose=False, log=False)

Function to classify descriptions according to any substring.

Parameters:

df (pandas.DataFrame): Dataframe containing all the well descriptions
terms_df (pandas.DataFrame): Dataframe containing all the substrings to use for searching
description_col (str, default = 'FORMATION'): Name of the column in df containing descriptions
terms_col (str, default = 'DESCRIPTION'): Name of the column in terms_df containing the substring to match against description_col
verbose (bool, default = False): Whether to print results
log (bool, default = False): Whether to log results to log file

Returns:

df (pandas.DataFrame): Dataframe containing the original data and new classifications
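
The difference between a starting-substring match (as in start_define) and a match on any substring can be seen directly with pandas string methods. This is a sketch of the matching rule only, with illustrative descriptions; it assumes literal (non-regex) matching, which the documentation above does not specify.

```python
import pandas as pd

descriptions = pd.Series(["sandy loam", "clay, blue", "fine sand", None])

# Match only at the beginning of the description.
starts_with = descriptions.str.startswith("sand", na=False)

# Match anywhere in the description; regex=False treats the term literally.
anywhere = descriptions.str.contains("sand", regex=False, na=False)

print(pd.DataFrame({"description": descriptions,
                    "starts_with": starts_with,
                    "anywhere": anywhere}))
```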