w4h.classify module
The classify module contains functions for assigning geologic intervals to a preset set of geologic interpretations.
- w4h.classify.depth_define(df, top_col='TOP', thresh=550.0, parallel_processing=False, verbose=False, log=False)
Classifies all intervals deeper than thresh as bedrock.
- Parameters:
- df : pandas.DataFrame
Dataframe to classify
- top_col : str, default = 'TOP'
Name of the column containing depth information, likely the top of each well interval.
- thresh : float, default = 550.0
Depth (in the units used in df[top_col]) below which all intervals are classified as bedrock.
- verbose : bool, default = False
Whether to print results.
- log : bool, default = False
Whether to log results to the log file.
- Returns:
- df : pandas.DataFrame
Dataframe containing intervals classified as bedrock due to depth
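The depth rule described above can be sketched in plain pandas. This is only an illustration of the idea, not the w4h implementation: the sample data, the strict greater-than comparison, and the `BEDROCK_FLAG` column name are all assumptions.

```python
import pandas as pd

# Hypothetical interval table; column names mirror the documented defaults.
df = pd.DataFrame({
    "API_NUMBER": [1, 1, 2],
    "TOP": [10.0, 600.0, 550.0],
})

thresh = 550.0

# Assumption: "below thresh" means a TOP value strictly greater than thresh.
deep = df["TOP"] > thresh

# BEDROCK_FLAG is an illustrative column name, not necessarily what w4h writes.
df["BEDROCK_FLAG"] = deep
```

Only the interval whose top lies deeper than the threshold is flagged; an interval starting exactly at the threshold is left alone under the strict comparison assumed here.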
- w4h.classify.export_undefined(df, outdir)
Exports terms that still need to be defined.
- Parameters:
- df : pandas.DataFrame
Dataframe containing at least some unclassified data
- outdir : str or pathlib.Path
Directory in which to save the file. The filename is generated automatically from today's date.
- Returns:
- stillNeededDF : pandas.DataFrame
Dataframe containing only unclassified terms and the number of times each occurs
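A rough pandas sketch of the same idea follows: count the unclassified descriptions and write them to a date-stamped file. The flag value `0` for "unclassified", the `COUNT` column, and the filename pattern are assumptions made for illustration; w4h may use different conventions.

```python
import tempfile
from datetime import date
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    "FORMATION": ["sand", "blue stuff", "blue stuff", "clay"],
    "CLASS_FLAG": [1, 0, 0, 1],
})

# Assumption: a CLASS_FLAG of 0 marks an unclassified row.
unclassified = df[df["CLASS_FLAG"] == 0]

# One row per unclassified term, with the number of times it occurs.
still_needed = unclassified.groupby("FORMATION").size().reset_index(name="COUNT")

# Hypothetical filename pattern based on today's date.
outdir = Path(tempfile.mkdtemp())
out_path = outdir / f"undefined_{date.today():%Y-%m-%d}.csv"
still_needed.to_csv(out_path, index=False)
```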
- w4h.classify.fill_unclassified(df, classification_col='CLASS_FLAG')
Fills unclassified rows in the classification_col column ('CLASS_FLAG' by default) with np.nan.
- Parameters:
- df : pandas.DataFrame
Dataframe on which to perform the operation
- Returns:
- df : pandas.DataFrame
Dataframe on which the operation has been performed
- w4h.classify.get_unique_wells(df, wellid_col='API_NUMBER', verbose=False, log=False)
Gets unique wells as a dataframe based on a given column name.
- Parameters:
- df : pandas.DataFrame
Dataframe containing all wells and/or well intervals of interest
- wellid_col : str, default = 'API_NUMBER'
Name of the column in df containing a unique identifier for each well. .unique() is run on this column to get the unique values.
- log : bool, default = False
Whether to log results to the log file.
- Returns:
- wellsDF : pandas.DataFrame
Dataframe containing only the unique well IDs
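The `.unique()` step mentioned above looks roughly like this in plain pandas. The sample data is invented, and the shape of the returned frame is an assumption rather than the library's guaranteed output.

```python
import pandas as pd

# Three intervals belonging to two wells.
df = pd.DataFrame({"API_NUMBER": [101, 101, 202], "TOP": [0, 20, 0]})

# Collect each well ID once, in order of first appearance.
wells = pd.DataFrame({"API_NUMBER": df["API_NUMBER"].unique()})
```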
- w4h.classify.merge_lithologies(well_data_df, targinterps_df, interp_col='INTERPRETATION', target_col='TARGET', target_class='bool')
Merges lithologies and target booleans based on classifications.
- Parameters:
- well_data_df : pandas.DataFrame
Dataframe containing classified well data
- targinterps_df : pandas.DataFrame
Dataframe containing lithologies and their target interpretations, depending on the target of the analysis (often coarse materials = 1, fine = 0)
- target_col : str, default = 'TARGET'
Name of the column in targinterps_df containing the target interpretations
- target_class, default = 'bool'
Whether the input column uses boolean values as its target indicator
- Returns:
- df_targ : pandas.DataFrame
Dataframe containing merged lithologies/targets
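Conceptually, this is a lookup join from interpretations to target values. The sketch below uses `DataFrame.merge` directly and assumes both frames share the `INTERPRETATION` column; the data and the left-join choice are illustrative, not taken from the w4h source.

```python
import pandas as pd

well_data = pd.DataFrame({
    "API_NUMBER": [1, 1],
    "INTERPRETATION": ["sand", "clay"],
})

# Lookup table: coarse material is the target (1), fine material is not (0).
targets = pd.DataFrame({
    "INTERPRETATION": ["sand", "clay"],
    "TARGET": [1, 0],
})

# A left join keeps every well interval, matched or not.
merged = well_data.merge(targets, on="INTERPRETATION", how="left")
```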
- w4h.classify.remerge_data(classifieddf, searchdf, parallel_processing=False)
Merges newly searched data (whether or not it was classified) with previously classified data.
- Parameters:
- classifieddf : pandas.DataFrame
Dataframe that was already classified
- searchdf : pandas.DataFrame
Dataframe with new classifications
- Returns:
- remergeDF : pandas.DataFrame
Dataframe containing all the data, merged back together
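One simple way to picture the remerge is row-wise concatenation of the two pieces, restoring the original row order. This is a sketch under that assumption; the actual function may combine the frames differently, and the flag values shown are invented.

```python
import pandas as pd

# Rows classified in an earlier pass (original index preserved).
classified = pd.DataFrame({"FORMATION": ["sand"], "CLASS_FLAG": [1]}, index=[0])

# Rows that went through a later search; one of them is still unclassified.
searched = pd.DataFrame(
    {"FORMATION": ["clay", "mystery"], "CLASS_FLAG": [4, 0]},
    index=[1, 2],
)

# Stack the pieces and restore the original row order.
remerged = pd.concat([classified, searched]).sort_index()
```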
- w4h.classify.sort_dataframe(df, sort_cols=['API_NUMBER', 'TOP'], remove_nans=True)
Sorts a dataframe by one or more columns.
- Parameters:
- df : pandas.DataFrame
Dataframe to be sorted
- sort_cols : str or list of str, default = ['API_NUMBER', 'TOP']
Name(s) of the column(s) by which to sort the dataframe.
- remove_nans : bool, default = True
Whether to remove NaNs in the process.
- Returns:
- df_sorted : pandas.DataFrame
Sorted dataframe
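A comparable operation in plain pandas is shown below. It assumes that removing NaNs means dropping rows with a missing value in any of the sort columns, which may not match the library's behavior exactly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "API_NUMBER": [2, 1, 1, 1],
    "TOP": [0.0, 20.0, np.nan, 5.0],
})
sort_cols = ["API_NUMBER", "TOP"]

# Assumption: rows with NaN in a sort column are dropped before sorting.
df_sorted = (
    df.dropna(subset=sort_cols)
      .sort_values(by=sort_cols)
      .reset_index(drop=True)
)
```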
- w4h.classify.specific_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', parallel_processing=False, verbose=False, log=False)
Classifies terms that have been specifically defined in terms_df.
- Parameters:
- df : pandas.DataFrame
Input dataframe with unclassified well descriptions.
- terms_df : pandas.DataFrame
Dataframe containing the classifications
- description_col : str, default = 'FORMATION'
Column name in df containing the well descriptions.
- terms_col : str, default = 'DESCRIPTION'
Column name in terms_df containing the classified descriptions.
- verbose : bool, default = False
Whether to print results.
- Returns:
- df_Interps : pandas.DataFrame
Dataframe containing the well descriptions and their matched classifications.
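Exact-term matching of this kind can be pictured as a left join between the description column and the terms column. The sketch below is an assumption about the general approach, with invented data and a hypothetical `INTERPRETATION` column in the terms table.

```python
import pandas as pd

df = pd.DataFrame({"FORMATION": ["sand", "clay", "blue stuff"]})

# Hypothetical terms table: exact descriptions and their interpretations.
terms_df = pd.DataFrame({
    "DESCRIPTION": ["sand", "clay"],
    "INTERPRETATION": ["coarse", "fine"],
})

# Descriptions without an exact match get NaN in the joined columns.
matched = df.merge(
    terms_df, left_on="FORMATION", right_on="DESCRIPTION", how="left"
)
```

Descriptions that find no exact match remain unclassified and are candidates for the prefix and substring searches.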
- w4h.classify.split_defined(df, classification_col='CLASS_FLAG', verbose=False, log=False)
Splits a dataframe of well descriptions into two dataframes based on whether each row has been classified.
- Parameters:
- df : pandas.DataFrame
Dataframe containing all the well descriptions
- classification_col : str, default = 'CLASS_FLAG'
Name of the column containing the classification flag.
- verbose : bool, default = False
Whether to print results.
- log : bool, default = False
Whether to log results to the log file.
- Returns:
- Two-item tuple of pandas.DataFrame
tuple[0] is the dataframe containing classified data; tuple[1] is the dataframe containing unclassified data.
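The split can be sketched with a boolean mask. What counts as "unclassified" is an assumption here (a flag of 0 or a missing flag); the library's own convention may differ.

```python
import pandas as pd

df = pd.DataFrame({
    "FORMATION": ["sand", "blue stuff", "clay"],
    "CLASS_FLAG": [1, 0, 4],
})

# Assumption: a row is classified when its flag is present and nonzero.
is_classified = df["CLASS_FLAG"].notna() & (df["CLASS_FLAG"] != 0)

classified_df, unclassified_df = df[is_classified], df[~is_classified]
```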
- w4h.classify.start_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', parallel_processing=False, verbose=False, log=False)
Classifies descriptions according to their starting substring.
- Parameters:
- df : pandas.DataFrame
Dataframe containing all the well descriptions
- terms_df : pandas.DataFrame
Dataframe containing all the startswith substrings to use for searching
- description_col : str, default = 'FORMATION'
Name of the column in df containing descriptions.
- terms_col : str, default = 'DESCRIPTION'
Name of the column in terms_df containing the startswith substrings to match against description_col.
- verbose : bool, default = False
Whether to print results.
- log : bool, default = False
Whether to log results to the log file.
- Returns:
- df : pandas.DataFrame
Dataframe containing the original data and new classifications
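Prefix matching of this sort can be illustrated with `Series.str.startswith`. The loop below is a minimal sketch, not the w4h implementation: the data, the `INTERPRETATION` column, and the first-match-wins rule are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"FORMATION": ["sandy gravel", "clay with sand", "shale"]})

# Hypothetical terms table of prefixes and their interpretations.
terms_df = pd.DataFrame({
    "DESCRIPTION": ["sand", "clay"],
    "INTERPRETATION": ["coarse", "fine"],
})

df["INTERPRETATION"] = None
for term, interp in zip(terms_df["DESCRIPTION"], terms_df["INTERPRETATION"]):
    # Only rows that start with the term and are not yet classified.
    hits = df["FORMATION"].str.startswith(term) & df["INTERPRETATION"].isna()
    df.loc[hits, "INTERPRETATION"] = interp
```

Note that "clay with sand" is matched by the "clay" prefix only; the word "sand" later in the description is ignored by a prefix search.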
- w4h.classify.wildcard_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', verbose=False, log=False)
Classifies descriptions according to any substring they contain.
- Parameters:
- df : pandas.DataFrame
Dataframe containing all the well descriptions
- terms_df : pandas.DataFrame
Dataframe containing all the substrings to use for searching
- description_col : str, default = 'FORMATION'
Name of the column in df containing descriptions.
- terms_col : str, default = 'DESCRIPTION'
Name of the column in terms_df containing the substrings to match against description_col.
- verbose : bool, default = False
Whether to print results.
- log : bool, default = False
Whether to log results to the log file.
- Returns:
- df : pandas.DataFrame
Dataframe containing the original data and new classifications
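Substring ("wildcard") matching differs from prefix matching only in where the term may appear. The sketch below uses `Series.str.contains` with `regex=False` for literal matching; the data, the `INTERPRETATION` column, and the first-match-wins rule are assumptions, not the library's documented behavior.

```python
import pandas as pd

df = pd.DataFrame({"FORMATION": ["brown sandy gravel", "soft clay", "shale"]})

# Hypothetical terms table of substrings and their interpretations.
terms_df = pd.DataFrame({
    "DESCRIPTION": ["sand", "clay"],
    "INTERPRETATION": ["coarse", "fine"],
})

df["INTERPRETATION"] = None
for term, interp in zip(terms_df["DESCRIPTION"], terms_df["INTERPRETATION"]):
    # regex=False treats the term as literal text anywhere in the description.
    hits = (
        df["FORMATION"].str.contains(term, regex=False)
        & df["INTERPRETATION"].isna()
    )
    df.loc[hits, "INTERPRETATION"] = interp
```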