w4h package
This is the wells4hydrogeology package.
It contains the functions needed to convert raw well descriptions into usable (hydro)geologic data.
- w4h.add_control_points(df_without_control, df_control=None, xcol='LONGITUDE', ycol='LATITUDE', zcol='ELEV_FT', controlpoints_crs='EPSG:4269', output_crs='EPSG:5070', description_col='FORMATION', interp_col='INTERPRETATION', target_col='TARGET', verbose=False, log=False, **kwargs)[source]¶
Function to add control points, primarily to aid in interpolation. This may be useful when conditions are known but do not exist in the input well database.
- Parameters:
- df_without_controlpandas.DataFrame
Dataframe with current working data
- df_controlstr, pathlib.PurePath, or pandas.DataFrame
Pandas dataframe with control points, or a path to a file containing them
- well_keystr, optional
The column containing the “key” (unique identifier) for each well, by default ‘API_NUMBER’
- xcolstr, optional
The column in df_control containing the x coordinates for each control point, by default ‘LONGITUDE’
- ycolstr, optional
The column in df_control containing the y coordinates for each control point, by default ‘LATITUDE’
- zcolstr, optional
The column in df_control containing the z coordinates for each control point, by default ‘ELEV_FT’
- controlpoints_crsstr, optional
The coordinate reference system (CRS) of the control points, by default ‘EPSG:4269’
- output_crsstr, optional
The output coordinate system, by default ‘EPSG:5070’
- description_colstr, optional
The column in df_control with the description (if this is used), by default ‘FORMATION’
- interp_colstr, optional
The column in df_control with the interpretation (if this is used), by default ‘INTERPRETATION’
- target_colstr, optional
The column in df_control with the target code (if this is used), by default ‘TARGET’
- verbosebool, optional
Whether to print information to terminal, by default False
- logbool, optional
Whether to log information in log file, by default False
- **kwargs
Keyword arguments of pandas.concat() or pandas.read_csv() that will be passed to that function, except for objs, which are df_without_control and df_control
- Returns:
- pandas.DataFrame
Pandas DataFrame with original data and control points formatted the same way and concatenated together
- w4h.align_rasters(grids_unaligned=None, model_grid=None, no_data_val_grid=0, verbose=False, log=False)[source]¶
Reprojects two rasters and aligns their pixels
- Parameters:
- grids_unalignedlist or xarray.DataArray
Contains a list of grids or one unaligned grid
- model_gridxarray.DataArray
Contains model grid
- no_data_val_gridint, default=0
Sets value of no data pixels
- logbool, default = False
Whether to log results to log file, by default False
- Returns:
- alignedGridslist or xarray.DataArray
Contains aligned grids
- w4h.clip_gdf2study_area(study_area, gdf, log=False, verbose=False)[source]¶
Clips a geodataframe so that it includes only features within the study area.
- Parameters:
- study_areageopandas.GeoDataFrame
Inputs study area polygon
- gdfgeopandas.GeoDataFrame
Inputs point data
- logbool, default = False
Whether to log results to log file, by default False
- Returns:
- gdfClipgeopandas.GeoDataFrame
Contains only points within the study area
- w4h.combine_dataset(layer_dataset, surface_elev, bedrock_elev, layer_thick, log=False)[source]¶
Function to combine xarray Datasets or DataArrays into a single xr.Dataset. Useful for gathering the surface, bedrock, layer thickness, and layer datasets into one variable (for pickling, for example).
- Parameters:
- layer_datasetxr.DataArray
DataArray containing all the interpolated layer information.
- surface_elevxr.DataArray
DataArray containing surface elevation data
- bedrock_elevxr.DataArray
DataArray containing bedrock elevation data
- layer_thickxr.DataArray
DataArray containing the layer thickness at each point in the model grid
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- xr.Dataset
Dataset with all input arrays set to different variables within the dataset.
- w4h.coords2geometry(df_no_geometry, xcol='LONGITUDE', ycol='LATITUDE', zcol='ELEV_FT', input_coords_crs='EPSG:4269', output_crs='EPSG:5070', use_z=False, wkt_col='WKT', geometry_source='coords', verbose=False, log=False)[source]¶
Adds geometry to points with xy coordinates in the specified coordinate reference system.
- Parameters:
- df_no_geometrypandas.DataFrame
A pandas dataframe containing points
- xcolstr, default=’LONGITUDE’
Name of column holding x coordinate data in df_no_geometry
- ycolstr, default=’LATITUDE’
Name of column holding y coordinate data in df_no_geometry
- zcolstr, default=’ELEV_FT’
Name of column holding z coordinate data in df_no_geometry
- input_coords_crsstr, default=’EPSG:4269’
Coordinate reference system (CRS) of the input coordinates
- use_zbool, default=False
Whether to use z column in calculation
- geometry_sourcestr {‘coords’, ‘wkt’, ‘geometry’}
- logbool, default = False
Whether to log results to log file, by default False
- Returns:
- gdfgeopandas.GeoDataFrame
Geopandas dataframe with points and their geometry values
- w4h.define_dtypes(undefined_df, datatypes=None, verbose=False, log=False)[source]¶
Function to define datatypes of a dataframe, especially with file-indicated dtypes
- Parameters:
- undefined_dfpd.DataFrame
Pandas dataframe with columns whose datatypes need to be (re)defined
- datatypesdict, str, pathlib.PurePath() object, or None, default = None
Dictionary containing datatypes, to be used in pandas.DataFrame.astype() function. If None, will read from file indicated by dtype_file (which must be defined, along with dtype_dir), by default None
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- dfoutpandas.DataFrame
Pandas dataframe containing redefined columns
- w4h.depth_define(df, top_col='TOP', thresh=550.0, parallel_processing=False, verbose=False, log=False)[source]¶
Function to define all intervals lower than thresh as bedrock
- Parameters:
- dfpandas.DataFrame
Dataframe to classify
- top_colstr, default = ‘TOP’
Name of column that contains the depth information, likely of the top of the well interval, by default ‘TOP’
- threshfloat, default = 550.0
Depth (in units used in df[top_col]) below which all intervals will be classified as bedrock, by default 550.0.
- verbosebool, default = False
Whether to print results, by default False
- logbool, default = False
Whether to log results to log file
- Returns:
- dfpandas.DataFrame
Dataframe containing intervals classified as bedrock due to depth
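The depth-threshold idea can be sketched with plain Python dictionaries instead of a pandas DataFrame. This is only an illustration under stated assumptions: the helper name `flag_bedrock_by_depth` and the `BEDROCK_FLAG` key are hypothetical and not part of w4h, and the strict "greater than" comparison is an assumption about how "below thresh" is interpreted.

```python
def flag_bedrock_by_depth(intervals, top_col="TOP", thresh=550.0):
    """Return copies of the interval records with a hypothetical BEDROCK_FLAG key.

    An interval is flagged when the depth to its top is greater than `thresh`,
    i.e. the interval starts below the threshold depth (assumed comparison).
    """
    flagged = []
    for row in intervals:
        new_row = dict(row)  # copy so the input records are left unchanged
        new_row["BEDROCK_FLAG"] = row[top_col] > thresh
        flagged.append(new_row)
    return flagged


# Three intervals from one hypothetical well, with depths to the interval tops
intervals = [
    {"API_NUMBER": 1, "TOP": 0.0},
    {"API_NUMBER": 1, "TOP": 120.0},
    {"API_NUMBER": 1, "TOP": 600.0},
]
result = flag_bedrock_by_depth(intervals)
print([row["BEDROCK_FLAG"] for row in result])
```

Only the deepest interval starts below the 550.0 threshold, so it is the only one flagged.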
- w4h.export_dataframe(df, out_dir, filename, date_stamp=True, log=False)[source]¶
Function to export dataframes
- Parameters:
- dfpandas dataframe, or list of pandas dataframes
Data frame or list of dataframes to be exported
- out_dirstring or pathlib.Path object
Directory to which to export dataframe object(s) as .csv
- filenamestr or list of strings
Filename(s) of output files
- date_stampbool, default=True
Whether to include a datestamp in the filename. If true, file ends with _yyyy-mm-dd.csv of current date, by default True.
- logbool, default = False
Whether to log inputs and outputs to log file.
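As an illustration of the date_stamp behavior described above, the following sketch builds an output path ending in _yyyy-mm-dd.csv. The helper `stamped_csv_path` and its `today` argument are hypothetical and exist only to make the example reproducible; the actual naming logic inside export_dataframe() may differ.

```python
import datetime
import pathlib


def stamped_csv_path(out_dir, filename, date_stamp=True, today=None):
    """Build an output path, optionally ending in _yyyy-mm-dd.csv."""
    today = today or datetime.date.today()
    stem = pathlib.Path(filename).stem  # drop any extension the caller supplied
    if date_stamp:
        stem = f"{stem}_{today.isoformat()}"  # isoformat() gives yyyy-mm-dd
    return pathlib.Path(out_dir) / f"{stem}.csv"


# A fixed date is passed here only so the result is reproducible
example = stamped_csv_path("out", "wells", today=datetime.date(2023, 4, 18))
print(example.name)
```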
- w4h.export_grids(grid_data, out_path, file_id='', filetype='tif', variable_sep=True, date_stamp=True, verbose=False, log=False)[source]¶
Function to export grids to files.
- Parameters:
- grid_dataxarray DataArray or xarray Dataset
Dataset or dataarray to be exported
- out_pathstr or pathlib.Path object
Output location for data export. If variable_sep=True, this should be a directory. Otherwise, this should also include the filename. The file extension should not be included here.
- file_idstr, optional
If specified, will add this after ‘LayerXX’ or ‘AllLayers’ in the filename, just before datestamp, if used. Example filename for file_id=’Coarse’: Layer1_Coarse_2023-04-18.tif.
- filetypestr, optional
Output filetype. Can be either pickle or any file extension supported by rioxarray.rio.to_raster(), with or without the leading period, by default ‘tif’
- variable_sepbool, optional
If grid_data is an xarray Dataset, this will export each variable in the dataset as a separate file, including the variable name in the filename, by default True
- date_stampbool, optional
Whether to include a date stamp in the file name., by default True
- logbool, default = False
Whether to log inputs and outputs to log file.
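The filename pattern given for file_id (Layer1_Coarse_2023-04-18.tif) can be reproduced with a short sketch. The helper `grid_filename` and its arguments are hypothetical; this shows one plausible way the pieces are joined and is not the code used by export_grids().

```python
def grid_filename(layer_label, file_id="", filetype="tif", date_str=None):
    """Join the layer label, optional file_id, and optional date stamp."""
    parts = [layer_label]
    if file_id:
        parts.append(file_id)  # placed after 'LayerXX' or 'AllLayers'
    if date_str:
        parts.append(date_str)  # the date stamp comes last, just before the extension
    extension = filetype.lstrip(".")  # accept both 'tif' and '.tif'
    return "_".join(parts) + "." + extension


print(grid_filename("Layer1", file_id="Coarse", date_str="2023-04-18"))
```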
- w4h.export_undefined(df, outdir)[source]¶
Function to export terms that still need to be defined.
- Parameters:
- dfpandas.DataFrame
Dataframe containing at least some unclassified data
- outdirstr or pathlib.Path
Directory to save file. Filename will be generated automatically based on today’s date.
- Returns:
- stillNeededDFpandas.DataFrame
Dataframe containing only unclassified terms, and the number of times they occur
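Counting the terms that still need a definition can be sketched with the standard library. This assumes, for illustration only, that unclassified rows carry None in a CLASS_FLAG field and that the description text is in a FORMATION field; the helper `count_unclassified` is hypothetical and is not the implementation of export_undefined().

```python
from collections import Counter


def count_unclassified(records, description_col="FORMATION", class_col="CLASS_FLAG"):
    """Return (term, count) pairs for rows with no classification, most common first."""
    counts = Counter(
        row[description_col] for row in records if row.get(class_col) is None
    )
    return counts.most_common()


records = [
    {"FORMATION": "blue clay", "CLASS_FLAG": 1},
    {"FORMATION": "hardpan", "CLASS_FLAG": None},
    {"FORMATION": "hardpan", "CLASS_FLAG": None},
    {"FORMATION": "muck", "CLASS_FLAG": None},
]
still_needed = count_unclassified(records)
print(still_needed)
```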
- w4h.file_setup(well_data, metadata=None, data_filename='*ISGS_DOWNHOLE_DATA*.txt', metadata_filename='*ISGS_HEADER*.txt', log_dir=None, verbose=False, log=False)[source]¶
Function to setup files, assuming data, metadata, and elevation/location are in separate files (there should be one “key”/identifying column consistent across all files to join/merge them later)
This function may not be useful if files are organized differently than this structure. If that is the case, it is recommended to use the get_most_recent() function for each individual file if needed. It may also be of use to simply skip this function altogether and directly define each filepath in a manner that can be used by pandas.read_csv()
- Parameters:
- well_datastr or pathlib.Path object
Str or pathlib.Path to directory containing input files
- metadatastr or pathlib.Path object, optional
Str or pathlib.Path to directory containing input metadata files, by default None
- data_filenamestr, optional
Pattern used by pathlib.glob() to get the most recent data file, by default ‘*ISGS_DOWNHOLE_DATA*.txt’
- metadata_filenamestr, optional
Pattern used by pathlib.glob() to get the most recent metadata file, by default ‘*ISGS_HEADER*.txt’
- log_dirstr or pathlib.PurePath() or None, default=None
Directory to place log file in. This is not read directly, but is used indirectly by w4h.logger_function()
- verbosebool, default = False
Whether to print name of files to terminal, by default False
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- tuple
Tuple with paths to (well_data, metadata)
- w4h.fill_unclassified(df, classification_col='CLASS_FLAG')[source]¶
Fills unclassified rows in ‘CLASS_FLAG’ column with np.nan
- Parameters:
- dfpandas.DataFrame
Dataframe on which to perform operation
- Returns:
- dfpandas.DataFrame
Dataframe on which operation has been performed
- w4h.get_current_date()[source]¶
Gets the current date to help with finding the most recent file
- Parameters:
None
- Returns:
- dateSuffixstr
String to use for naming output files
- w4h.get_drift_thick(surface_elev=None, bedrock_elev=None, layers=9, plot=False, verbose=False, log=False)[source]¶
Finds the distance from surface_elev to bedrock_elev and then divides by number of layers to get layer thickness.
- Parameters:
- surface_elevrioxarray.DataArray
array holding surface elevation
- bedrock_elevrioxarray.DataArray
array holding bedrock elevation
- layersint, default=9
number of layers needed to calculate thickness for
- plotbool, default=False
tells function to either plot the data or not
- Returns:
- driftThickrioxarray.DataArray
Data array containing depth to bedrock at each point
- layerThickrioxarray.DataArray
Data array containing layer thickness at each point
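The arithmetic described above (surface elevation minus bedrock elevation, divided by the number of layers) can be sketched on small nested lists. The helper `drift_and_layer_thickness` is hypothetical and uses plain lists as a stand-in for the rioxarray DataArrays that get_drift_thick() works with.

```python
def drift_and_layer_thickness(surface_elev, bedrock_elev, layers=9):
    """Return (drift_thick, layer_thick) for two equally shaped 2D grids."""
    # Drift thickness is the distance from the surface down to bedrock
    drift_thick = [
        [s - b for s, b in zip(surf_row, bed_row)]
        for surf_row, bed_row in zip(surface_elev, bedrock_elev)
    ]
    # Each model layer gets an equal share of the drift thickness
    layer_thick = [[d / layers for d in row] for row in drift_thick]
    return drift_thick, layer_thick


surface = [[700.0, 710.0], [690.0, 705.0]]
bedrock = [[610.0, 620.0], [645.0, 660.0]]
drift, layer = drift_and_layer_thickness(surface, bedrock)
print(drift)
print(layer)
```

With 9 layers, a cell with 90 units of drift yields layers 10 units thick, and a cell with 45 units of drift yields layers 5 units thick.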
- w4h.get_layer_depths(df_with_depths, surface_elev_col='SURFACE_ELEV', layer_thick_col='LAYER_THICK', layers=9, log=False)[source]¶
Function to calculate depths and elevations of each model layer at each well based on surface elevation, bedrock elevation, and number of layers/layer thickness
- Parameters:
- df_with_depthspandas.DataFrame
Dataframe containing well metadata
- layersint, default=9
Number of layers. This should correlate with get_drift_thick() input parameter, if drift thickness was calculated using that function, by default 9.
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- pandas.Dataframe
Dataframe containing new columns for depth to layers and elevation of layers.
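A minimal sketch of how layer depths and elevations follow from a surface elevation and a uniform layer thickness at one well. The helper `layer_depths_and_elevations` is hypothetical, and measuring to the top of each layer is an assumption; the reference above does not state which layer boundary get_layer_depths() records or what the new columns are called.

```python
def layer_depths_and_elevations(surface_elev, layer_thick, layers=9):
    """Return a list of (depth, elevation) pairs for the top of each layer."""
    tops = []
    for i in range(layers):
        depth = i * layer_thick  # depth below the surface to the top of layer i + 1
        tops.append((depth, surface_elev - depth))
    return tops


tops = layer_depths_and_elevations(700.0, 10.0, layers=3)
print(tops)
```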
- w4h.get_most_recent(dir=WindowsPath('c:/Users/balikian/LocalData/CodesScripts/Github/wells4hydrogeology/w4h/resources'), glob_pattern='*', verbose=False)[source]¶
Function to find the most recent file with the indicated pattern, using pathlib.glob function.
- Parameters:
- dirstr or pathlib.Path object, optional
Directory in which to find the most recent file, by default str(repoDir)+’/resources’
- glob_patternstr, optional
String used by the pathlib.glob() function/method for searching, by default ‘*’
- Returns:
- pathlib.Path object
Pathlib Path object of the most recent file fitting the glob pattern indicated in the glob_pattern parameter.
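Finding the most recent file that matches a glob pattern can be sketched with pathlib. This sketch assumes "most recent" means the latest modification time; get_most_recent() may instead rely on dates in the filenames, which this reference does not specify. The helper `most_recent_file` and the file names are hypothetical.

```python
import os
import pathlib
import tempfile


def most_recent_file(directory, glob_pattern="*"):
    """Return the matching file with the latest modification time."""
    candidates = [p for p in pathlib.Path(directory).glob(glob_pattern) if p.is_file()]
    if not candidates:
        raise FileNotFoundError(f"No files match {glob_pattern!r} in {directory}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    old = pathlib.Path(tmp) / "ISGS_HEADER_2022.txt"
    new = pathlib.Path(tmp) / "ISGS_HEADER_2023.txt"
    other = pathlib.Path(tmp) / "notes.txt"
    for path in (old, new, other):
        path.write_text("placeholder")
    # Set explicit modification times so the ordering is unambiguous
    os.utime(old, (1_000_000_000, 1_000_000_000))
    os.utime(new, (1_600_000_000, 1_600_000_000))
    newest_name = most_recent_file(tmp, "*ISGS_HEADER*.txt").name

print(newest_name)
```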
- w4h.get_resources(resource_type='filepaths', scope='local', verbose=False)[source]¶
Function to get filepaths for resources included with package
- Parameters:
- resource_typestr, {‘filepaths’, ‘data’}
If filepaths, will return dictionary with filepaths to sample data. If data, returns dictionary with data objects.
- scopestr, {‘local’, ‘statewide’}
If ‘local’, will read in sample data for a local (roughly county-sized) project. If ‘statewide’, will read in sample data for a statewide project (Illinois)
- verbosebool, optional
Whether to print results to terminal, by default False
- Returns:
- resources_dictdict
Dictionary containing key, value pairs with filepaths to resources that may be of interest.
- w4h.get_search_terms(spec_path='C:\\Users\\balikian\\LocalData\\CodesScripts\\Github\\wells4hydrogeology\\docs/resources/', spec_glob_pattern='*SearchTerms-Specific*', start_path=None, start_glob_pattern='*SearchTerms-Start*', wildcard_path=None, wildcard_glob_pattern='*SearchTerms-Wildcard', verbose=False, log=False)[source]¶
Read in dictionary files for downhole data
- Parameters:
- spec_pathstr or pathlib.Path, optional
Directory where the file containing the specific search terms is located, by default str(repoDir)+’/resources/’
- spec_glob_patternstr, optional
Search string used by pathlib.glob() to find the most recent file of interest, uses get_most_recent() function, by default ‘*SearchTerms-Specific*’
- start_pathstr or None, optional
Directory where the file containing the start search terms is located, by default None
- start_glob_patternstr, optional
Search string used by pathlib.glob() to find the most recent file of interest, uses get_most_recent() function, by default ‘*SearchTerms-Start*’
- wildcard_pathstr or pathlib.Path, default = None
Directory where the file containing the wildcard search terms is located, by default None
- wildcard_glob_patternstr, default = ‘*SearchTerms-Wildcard’
Search string used by pathlib.glob() to find the most recent file of interest, uses get_most_recent() function, by default ‘*SearchTerms-Wildcard’
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- (specTermsPath, startTermsPath, wilcardTermsPath)tuple
Tuple containing the pandas dataframes with specific search terms, with start search terms, and with wildcard search terms
- w4h.get_unique_wells(df, wellid_col='API_NUMBER', verbose=False, log=False)[source]¶
Gets unique wells as a dataframe based on a given column name.
- Parameters:
- dfpandas.DataFrame
Dataframe containing all wells and/or well intervals of interest
- wellid_colstr, default=”API_NUMBER”
Name of column in df containing a unique identifier for each well, by default ‘API_NUMBER’. .unique() will be run on this column to get the unique values.
- logbool, default = False
Whether to log results to log file
- Returns:
- wellsDFpandas.DataFrame
DataFrame containing only the unique well IDs
- w4h.grid2study_area(study_area, grid, output_crs='EPSG:5070', verbose=False, log=False)[source]¶
Clips grid to study area.
- Parameters:
- study_areageopandas.GeoDataFrame
inputs study area polygon
- gridxarray.DataArray
inputs grid array
- output_crsstr, default=’EPSG:5070’
inputs the coordinate reference system for the study area
- logbool, default = False
Whether to log results to log file, by default False
- Returns:
- gridxarray.DataArray
returns xarray containing grid clipped only to area within study area
- w4h.layer_interp(points, model_grid, layers=None, interp_kind='nearest', surface_grid=None, bedrock_grid=None, layer_thick_grid=None, drift_thick_grid=None, return_type='dataset', export_dir=None, target_col='TARG_THICK_PER', layer_col='LAYER', xcol=None, ycol=None, xcoord='x', ycoord='y', log=False, verbose=False, **kwargs)[source]¶
Function to interpolate results, going from points to model_grid data. Uses scipy.interpolate module.
- Parameters:
- pointslist
List containing pandas dataframes or geopandas geodataframes containing the point data. Should be resDF_list output from layer_target_thick().
- model_gridxr.DataArray or xr.Dataset
Xarray DataArray or DataSet with the coordinates/spatial reference of the output model_grid to interpolate to
- layersint, default=None
Number of layers for interpolation. If None, uses the length of the points list to determine number of layers. By default None.
- interp_kindstr, {‘nearest’, ‘interp2d’,’linear’, ‘cloughtocher’, ‘radial basis function’}
Type of interpolation to use. See scipy.interpolate N-D scattered. Values can be any of the following (also shown in “kind” column of N-D scattered section of table here: https://docs.scipy.org/doc/scipy/tutorial/interpolate.html). By default ‘nearest’
- return_typestr, {‘dataset’, ‘dataarray’}
Type of xarray object to return, either xr.DataArray or xr.Dataset, by default ‘dataset’.
- export_dirstr or pathlib.Path, default=None
Export directory for interpolated grids, using w4h.export_grids(). If None, does not export, by default None.
- target_colstr, default = ‘TARG_THICK_PER’
Name of column in points containing data to be interpolated, by default ‘TARG_THICK_PER’.
- layer_colstr, default = ‘LAYER’
Name of column containing layer number. Not currently used, by default ‘LAYER’
- xcolstr, default = None
Name of column containing x coordinates. If None, will look for ‘geometry’ column, as in a geopandas.GeoDataframe. By default None
- ycolstr, default = None
Name of column containing y coordinates. If None, will look for ‘geometry’ column, as in a geopandas.GeoDataframe. By default None
- xcoordstr, default=’x’
Name of x coordinate in model_grid, used to extract x values of model_grid, by default ‘x’
- ycoordstr, default=’y’
Name of y coordinate in model_grid, used to extract y values of model_grid, by default ‘y’
- logbool, default = False
Whether to log inputs and outputs to log file.
- **kwargs
Keyword arguments to be read directly into whichever scipy.interpolate function is designated by the interp_kind parameter.
- Returns:
- interp_dataxr.DataArray or xr.Dataset, depending on return_type
Returns an xr.Dataset with each layer as a separate variable when return_type=’dataset’ (the default). If return_type=’dataarray’, returns an xr.DataArray with the layers added as a new dimension called Layer.
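To illustrate what going from points to grid data means for interp_kind=’nearest’, the sketch below assigns each grid node the value of its closest point. It is a standard-library stand-in written for this example: layer_interp() itself uses the scipy.interpolate module, and the helper `nearest_neighbor_grid` and the sample values are hypothetical.

```python
import math


def nearest_neighbor_grid(points, x_coords, y_coords):
    """Interpolate (x, y, value) points onto a grid; rows follow y, columns follow x."""
    grid = []
    for y in y_coords:
        row = []
        for x in x_coords:
            # Pick the point with the smallest straight-line distance to this node
            nearest = min(points, key=lambda p: math.hypot(p[0] - x, p[1] - y))
            row.append(nearest[2])
        grid.append(row)
    return grid


# Three hypothetical wells, each with a target value for one layer
wells = [(0.0, 0.0, 0.2), (10.0, 0.0, 0.8), (0.0, 10.0, 0.5)]
grid = nearest_neighbor_grid(wells, x_coords=[1.0, 9.0], y_coords=[1.0, 8.0])
print(grid)
```

Repeating this once per layer gives one grid per entry in the points list, which matches the one-dataframe-per-layer structure described for the points parameter.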
- w4h.layer_target_thick(df, layers=9, return_all=False, export_dir=None, outfile_prefix=None, depth_top_col='TOP', depth_bot_col='BOTTOM', log=False)[source]¶
Function to calculate thickness of target material in each layer at each well point
- Parameters:
- dfgeopandas.GeoDataFrame
Geodataframe containing classified data, surface elevation, bedrock elevation, layer depths, geometry.
- layersint, default=9
Number of layers in model, by default 9
- return_allbool, default=False
If True, return list of original geodataframes with extra column added for target thickness for each layer. If False, return list of geopandas.GeoDataFrames with only essential information for each layer.
- export_dirstr or pathlib.Path, default=None
If str or pathlib.Path, should be directory to which to export dataframes built in function.
- outfile_prefixstr, default=None
Only used if export_dir is set. Will be used at the start of the exported filenames
- depth_top_colstr, default=’TOP’
Name of column containing data for depth to top of described well intervals
- depth_bot_colstr, default=’BOTTOM’
Name of column containing data for depth to bottom of described well intervals
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- res_df or resgeopandas.geodataframe
Geopandas geodataframe containing only important information needed for next stage of analysis.
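The thickness of target material within one layer at one well can be sketched as the summed overlap between the layer and the well intervals classified as target. The helper `target_thickness_in_layer` and the tuple layout are hypothetical; this shows the overlap idea only and is not the implementation of layer_target_thick().

```python
def target_thickness_in_layer(intervals, layer_top, layer_bottom):
    """Sum the overlap between target intervals and one layer.

    intervals: iterable of (top_depth, bottom_depth, is_target) tuples.
    layer_top, layer_bottom: depths bounding the layer, in the same units.
    """
    total = 0.0
    for top, bottom, is_target in intervals:
        if not is_target:
            continue  # only target material contributes
        overlap = min(bottom, layer_bottom) - max(top, layer_top)
        if overlap > 0:
            total += overlap
    return total


# Depths to the top and bottom of three described intervals at one well
intervals = [(0.0, 5.0, True), (5.0, 18.0, False), (18.0, 30.0, True)]
thick = target_thickness_in_layer(intervals, layer_top=10.0, layer_bottom=20.0)
print(thick)
```

Only the part of the third interval between depths 18 and 20 falls inside the layer, so 2.0 units of target material are counted.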
- w4h.logger_function(logtocommence, parameters, func_name)[source]¶
Function to log other functions, to be called from within other functions
- Parameters:
- logtocommencebool
Whether to perform logging steps
- parametersdict
Dictionary containing parameters and their values, from function
- func_namestr
Name of function within which this is called
- w4h.merge_lithologies(well_data_df, targinterps_df, interp_col='INTERPRETATION', target_col='TARGET', target_class='bool')[source]¶
Function to merge lithologies and target booleans based on classifications
- Parameters:
- well_data_dfpandas.DataFrame
Dataframe containing classified well data
- targinterps_dfpandas.DataFrame
Dataframe containing lithologies and their target interpretations, depending on what the target is for this analysis (often, coarse materials=1, fine=0)
- target_colstr, default = ‘TARGET’
Name of column in targinterps_df containing the target interpretations
- target_classstr, default = ‘bool’
Whether the input column is using boolean values as its target indicator
- Returns:
- df_targpandas.DataFrame
Dataframe containing merged lithologies/targets
- w4h.merge_metadata(data_df, header_df, data_cols=None, header_cols=None, auto_pick_cols=False, drop_duplicate_cols=True, log=False, verbose=False, **kwargs)[source]¶
Function to merge tables, intended for merging metadata table with data table
- Parameters:
- data_dfpandas.DataFrame
“Left” dataframe, intended for this purpose to be dataframe with main data, but can be anything
- header_dfpandas.DataFrame
“Right” dataframe, intended for this purpose to be dataframe with metadata, but can be anything
- data_colslist, optional
List of strings of column names, for columns to be included after join from “left” table (data table). If None, all columns are kept, by default None
- header_colslist, optional
List of strings of columns names, for columns to be included in merged table after merge from “right” table (metadata). If None, all columns are kept, by default None
- auto_pick_colsbool, default = False
Whether to autopick the columns from the metadata table. If True, the following column names are kept:[‘API_NUMBER’, ‘LATITUDE’, ‘LONGITUDE’, ‘BEDROCK_ELEV’, ‘SURFACE_ELEV’, ‘BEDROCK_DEPTH’, ‘LAYER_THICK’], by default False
- drop_duplicate_colsbool, optional
If True, drops duplicate columns from the tables so that columns do not get renamed upon merge, by default True
- logbool, default = False
Whether to log inputs and outputs to log file.
- **kwargs
kwargs that are passed directly to pd.merge(). By default, the ‘on’ and ‘how’ parameters are defined as on=’API_NUMBER’ and how=’inner’
- Returns:
- mergedTablepandas.DataFrame
Merged dataframe
- w4h.plot_cross_section(dataset, profile=None, profile_direction=None, xcoord='x', ycoord='y', mapped_variable='Depth_to_Bedrock', cross_section_variable='Model_Layers', surface_elevation_variable='Surface_Elevation', bedrock_elevation_variable='Bedrock_Elevation', layer_elevation_coordinate='layer_elevs', show_layers=True, return_profile_dicts=False, elev_unit='feet', convert_elevation_to=None, title=None, verbose=False, **kwargs)[source]¶
Function to plot cross section profiles for datasets with properly configured coordinates and variables. This is intended to work “out of the box” with the xarray.Datasets output from w4h.run()
- Parameters:
- datasetxarray.Dataset
The xarray.Dataset with the proper data variables. Works “out of the box” with outputs from w4h.run().
- profileNone, shapely.Linestring, list of coordinates, or geopandas.GeoDataFrame, optional
The profile(s) for which to create the cross sections. If None, by default creates one X profile and one Y profile in the middle of each dimension, by default None
- profile_directionlist of str, optional
List of strings (list is same length as profile) indicating the direction to use for the profile. If None, will be [‘WE’, ‘SN’] to fit with profile=None defaults, by default None
- xcoordstr, optional
Name of x coordinate, by default ‘x’
- ycoordstr, optional
Name of y coordinate, by default ‘y’
- mapped_variablestr, optional
Name of variable to show in main map, by default ‘Depth_to_Bedrock’
- cross_section_variablestr, optional
Name of variable to use for cross section profiles, by default ‘Model_Layers’
- surface_elevation_variablestr, optional
Variable to use for the surface elevation, by default ‘Surface_Elevation’
- bedrock_elevation_variablestr, optional
Variable to use for the bedrock elevation, by default ‘Bedrock_Elevation’
- layer_elevation_coordinatestr, optional
Coordinate name to use for the layer elevations. This should be a non-indexed coordinate with the shape of the x, y, and layer coordinates, by default ‘layer_elevs’
- show_layersbool, optional
Whether to plot the layer boundaries on the cross section, by default True
- return_profile_dictsbool, optional
Whether to return the profile dictionaries, rather than the matplotlib.Figure, by default False
- elev_unitstr, optional
Unit of elevation for the elevation data, by default ‘feet’
- convert_elevation_tostr, optional
If None (default), does not convert elevation. Otherwise, will convert elevation to specified unit. Only conversion between ‘ft’ and ‘meters’ supported.
- titlestr, optional
Title to use for the output figure. If None, will be derived from variable names, by default None
- verbosebool, optional
Whether to print information about process to terminal, by default False
- Returns:
- matplotlib.Figure
Matplotlib.Figure instance is returned, unless return_profile_dicts is True. If return_profile_dicts=True, then a list of dicts with information about the profiles is returned.
- w4h.read_dict(file, keytype='np')[source]¶
Function to read a text file with a dictionary in it into a python dictionary
- Parameters:
- filestr or pathlib.Path object
Filepath to the file of interest containing the dictionary text
- keytypestr, optional
String indicating the datatypes used in the text, currently only ‘np’ is implemented, by default ‘np’
- Returns:
- dict
Dictionary translated from text file.
- w4h.read_dictionary_terms(dict_file=None, id_col='ID', search_col='DESCRIPTION', definition_col='LITHOLOGY', class_flag_col='CLASS_FLAG', dictionary_type=None, class_flag=6, rem_extra_cols=True, verbose=False, log=False)[source]¶
Function to read dictionary terms from file into pandas dataframe
- Parameters:
- dict_filestr or pathlib.Path object, or list of these
File or list of files to be read
- search_colstr, default = ‘DESCRIPTION’
Name of column containing search terms (geologic formations)
- definition_colstr, default = ‘LITHOLOGY’
Name of column containing interpretations of search terms (lithologies)
- dictionary_typestr or None, {None, ‘exact’, ‘start’, ‘wildcard’}
Indicator of which kind of dictionary terms are to be read in: None, ‘exact’, ‘start’, or ‘wildcard’, by default None.
If None, uses the name of the file to try to determine the type. If it cannot, it defaults to using the classification flag from class_flag.
If ‘exact’, terms will be used to search for exact matches to geologic descriptions.
If ‘start’, terms will be used as with the .startswith() string method to find inexact matches to geologic descriptions.
If ‘wildcard’, terms will be used to find any matching substring for inexact geologic matches.
- class_flagint, default = 6
Classification flag to be used if dictionary_type is None and cannot be otherwise determined, by default 6
- rem_extra_colsbool, default = True
Whether to remove the extra columns from the input file after it is read in as a pandas dataframe, by default True
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- dict_termspandas.DataFrame
Pandas dataframe with formatting ready to be used in the classification steps of this package
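The three dictionary types differ in how a search term is compared with a geologic description. The sketch below shows those comparisons with plain string methods. The helper `description_matches` is hypothetical, and the case-insensitive comparison is an assumption made for this example, not documented w4h behavior.

```python
def description_matches(description, term, dictionary_type):
    """Compare one description with one search term under the given dictionary type."""
    text = description.strip().upper()  # case-insensitivity is assumed here
    term = term.strip().upper()
    if dictionary_type == "exact":
        return text == term  # the whole description must equal the term
    if dictionary_type == "start":
        return text.startswith(term)  # the description must begin with the term
    if dictionary_type == "wildcard":
        return term in text  # the term may appear anywhere in the description
    raise ValueError(f"Unknown dictionary_type: {dictionary_type!r}")


print(description_matches("sand and gravel", "sand", "exact"))
print(description_matches("sand and gravel", "sand", "start"))
print(description_matches("fine silty sand", "sand", "wildcard"))
```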
- w4h.read_grid(grid_path=None, grid_type='model', no_data_val_grid=0, use_service=False, study_area=None, grid_crs=None, output_crs='EPSG:5070', verbose=False, log=False, **kwargs)[source]¶
Reads in grid
- Parameters:
- grid_pathstr or pathlib.Path, default=None
Path to a grid file
- grid_typestr, default=’model’
Sets what type of grid to load in
- no_data_val_gridint, default=0
Sets the no data value of the grid
- use_servicebool or str, default=False
Sets which service the function uses
- study_areageopandas.GeoDataFrame, default=None
Dataframe containing study area polygon
- grid_crsstr, default=None
Sets crs to use if clipping to study area
- logbool, default = False
Whether to log results to log file, by default False
- Returns:
- gridINxarray.DataArray
Returns grid
- w4h.read_lithologies(lith_file=None, interp_col='LITHOLOGY', target_col='CODE', use_cols=None, verbose=False, log=False)[source]¶
Function to read lithology file into pandas dataframe
- Parameters:
- lith_filestr or pathlib.Path object, default = None
Filename of lithology file. If None, default is contained within repository, by default None
- interp_colstr, default = ‘LITHOLOGY’
Column used to match interpretations
- target_colstr, default = ‘CODE’
Column to be used as target code
- use_colslist, default = None
Which columns to use when reading in dataframe. If None, defaults to [‘LITHOLOGY’, ‘CODE’].
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- pandas.DataFrame
Pandas dataframe with lithology information
- w4h.read_model_grid(model_grid_path, study_area=None, no_data_val_grid=0, read_grid=True, node_byspace=True, grid_crs=None, output_crs='EPSG:5070', verbose=False, log=False)[source]¶
Reads in model grid to xarray data array
- Parameters:
- model_grid_pathstr
Path to model grid file
- study_areageopandas.GeoDataFrame, default=None
Dataframe containing study area polygon
- no_data_val_gridint, default=0
value assigned to areas with no data
- read_gridbool, default=True
Whether to read the grid (True) or create the grid (False)
- node_byspacebool, default=True
Denotes how to create grid
- output_crsstr, default=’EPSG:5070’
Inputs study area crs
- grid_crsstr, default=None
Inputs grid crs
- logbool, default = False
Whether to log results to log file, by default False
- Returns:
- modelGridxarray.DataArray
Data array containing model grid
- w4h.read_raw_csv(data_filepath, metadata_filepath, data_cols=None, metadata_cols=None, xcol='LONGITUDE', ycol='LATITUDE', well_key='API_NUMBER', encoding='latin-1', verbose=False, log=False, **read_csv_kwargs)[source]¶
Easy function to read raw .txt files output from, for example, an Access database
- Parameters:
- data_filepathstr
Filename of the file containing data, including the extension.
- metadata_filepathstr
Filename of the file containing metadata, including the extension.
- data_colslist, default = None
List with strings with names of columns from txt file to keep after reading. If None, [“API_NUMBER”,”TABLE_NAME”,”FORMATION”,”THICKNESS”,”TOP”,”BOTTOM”], by default None.
- metadata_colslist, default = None
List with strings with names of columns from txt file to keep after reading. If None, [‘API_NUMBER’,”TOTAL_DEPTH”,”SECTION”,”TWP”,”TDIR”,”RNG”,”RDIR”,”MERIDIAN”,”QUARTERS”,”ELEVATION”,”ELEVREF”,”COUNTY_CODE”,”LATITUDE”,”LONGITUDE”,”ELEVSOURCE”], by default None
- xcolstr, default = ‘LONGITUDE’
Name of column in metadata file indicating the x-location of the well, by default ‘LONGITUDE’
- ycolstr, default = ‘LATITUDE’
Name of the column in metadata file indicating the y-location of the well, by default ‘LATITUDE’
- well_keystr, default = ‘API_NUMBER’
Name of the column with the key/identifier that will be used to merge data later, by default ‘API_NUMBER’
- encodingstr, default = ‘latin-1’
Encoding of the data in the input files, by default ‘latin-1’
- verbosebool, default = False
Whether to print the number of rows in the input columns, by default False
- logbool, default = False
Whether to log inputs and outputs to log file.
- **read_csv_kwargs
**kwargs that get passed to pd.read_csv()
- Returns:
- (pandas.DataFrame, pandas.DataFrame/None)
Tuple/list with two pandas dataframes: (well_data, metadata). metadata is None if only well_data is used.
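As a rough illustration of the data_cols and encoding parameters, the sketch below reads a small in-memory table with pandas.read_csv() and keeps only the listed columns. It is not the w4h implementation; the sample rows, the extra NOTES column, and the comma delimiter are made up for the example.

```python
import io

import pandas as pd

# Hypothetical stand-in for a raw downhole text file (values are made up)
raw_bytes = (
    "API_NUMBER,TABLE_NAME,FORMATION,THICKNESS,TOP,BOTTOM,NOTES\n"
    "120010000100,LOG,sand and gravel,10,0,10,a\n"
    "120010000100,LOG,blue clay,15,10,25,b\n"
).encode("latin-1")

# Columns to keep, matching the documented default for data_cols
data_cols = ["API_NUMBER", "TABLE_NAME", "FORMATION", "THICKNESS", "TOP", "BOTTOM"]

# usecols drops every column not listed; encoding matches the documented default
well_data = pd.read_csv(io.BytesIO(raw_bytes), usecols=data_cols, encoding="latin-1")
```

Any other pandas.read_csv() keyword (for example sep or dtype) could be supplied the same way, which is what **read_csv_kwargs is documented to allow.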
- w4h.read_study_area(study_area=None, study_area_crs=None, output_crs='EPSG:5070', buffer=None, return_original=False, log=False, verbose=False, **read_file_kwargs)[source]¶
Read study area geospatial file into geopandas
- Parameters:
- study_areastr, pathlib.Path, geopandas.GeoDataFrame, or shapely.Geometry
Filepath to any geospatial file readable by geopandas. Polygon is best, but may work with other types if extent is correct. If shapely.Geometry, the crs should also be specified using a valid input to gpd.GeoDataFrame(crs=<crs>).
- study_area_crsstr, tuple, dict, optional
Not needed unless CRS must be read in manually (e.g., with a shapely.Geometry). CRS designation readable by geopandas/pyproj.
- output_crsstr, tuple, dict, optional
CRS to transform study_area to before returning. CRS designation should be readable by geopandas/pyproj. By default, ‘EPSG:5070’.
- bufferNone or numeric, default=None
If None, no buffer created. If a numeric value is given (float or int, for example), a buffer will be created at that distance in the unit of the study_area_crs.
- return_originalbool, default=False
Whether to return the (reprojected) study area as well as the (reprojected) buffered study area. Study area is only used for clipping data, so usually return_original=False is sufficient.
- logbool, default = False
Whether to log results to log file, by default False
- verbosebool, default=False
Whether to print status and results to terminal
- Returns:
- studyAreaINgeopandas dataframe
Geopandas dataframe with polygon geometry.
- w4h.read_wcs(study_area, wcs_url='https://data.isgs.illinois.edu/arcgis/services/Elevation/IL_Statewide_Lidar_DEM_WGS/ImageServer/WCSServer?request=GetCapabilities&service=WCS', res_x=30, res_y=30, verbose=False, log=False, **kwargs)[source]¶
Reads a WebCoverageService from a url and returns a rioxarray dataset containing it.
- Parameters:
- study_areageopandas.GeoDataFrame
Dataframe containing study area polygon
- wcs_urlstr, default=lidarURL
Represents the url for the WCS
- res_xint, default=30
Sets resolution for x axis
- res_yint, default=30
Sets resolution for y axis
- logbool, default = False
Whether to log results to log file, by default False
- **kwargs
- Returns:
- wcsData_rxrxarray.DataArray
An xarray DataArray holding the image from the WebCoverageService
- w4h.read_wms(study_area, layer_name='IL_Statewide_Lidar_DEM_WGS:None', wms_url='https://data.isgs.illinois.edu/arcgis/services/Elevation/IL_Statewide_Lidar_DEM_WGS/ImageServer/WCSServer?request=GetCapabilities&service=WCS', srs='EPSG:3857', clip_to_studyarea=True, bbox=[-9889002.6155, 5134541.069716, -9737541.607038, 5239029.6274], res_x=30, res_y=30, size_x=512, size_y=512, format='image/tiff', verbose=False, log=False, **kwargs)[source]¶
Reads a WebMapService from a url and returns a rioxarray dataset containing it.
- Parameters:
- study_areageopandas.GeoDataFrame
Dataframe containing study area polygon
- layer_namestr, default=’IL_Statewide_Lidar_DEM_WGS:None’
Represents the layer name in the WMS
- wms_urlstr, default=lidarURL
Represents the url for the WMS
- srsstr, default=’EPSG:3857’
Sets the srs
- clip_to_studyareabool, default=True
Whether to clip to study area or not
- res_xint, default=30
Sets resolution for x axis
- res_yint, default=30
Sets resolution for y axis
- size_xint, default=512
Sets width of result
- size_yint, default=512
Sets height of result
- logbool, default = False
Whether to log results to log file, by default False
- Returns:
- wmsData_rxrxarray.DataArray
Holds the image from the WebMapService
- w4h.read_xyz(xyzpath, datatypes=None, verbose=False, log=False)[source]¶
Function to read file containing xyz data (elevation/location)
- Parameters:
- xyzpathstr or pathlib.Path
Filepath of the xyz file, including extension
- datatypesdict, default = None
Dictionary containing the datatypes for the columns in the xyz file. If None, {‘ID’:np.uint32,’API_NUMBER’:np.uint64,’LATITUDE’:np.float64,’LONGITUDE’:np.float64,’ELEV_FT’:np.float64}, by default None
- verbosebool, default = False
Whether to print the number of xyz records to the terminal, by default False
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- pandas.DataFrame
Pandas dataframe containing the elevation and location data
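The datatypes dictionary can be pictured as a dtype mapping handed to pandas.read_csv(). The following is a minimal sketch under that assumption, using a made-up one-row table; how w4h applies the dictionary internally is not shown in this documentation.

```python
import io

import numpy as np
import pandas as pd

# The documented default mapping of column names to numpy types
datatypes = {
    "ID": np.uint32,
    "API_NUMBER": np.uint64,
    "LATITUDE": np.float64,
    "LONGITUDE": np.float64,
    "ELEV_FT": np.float64,
}

# Hypothetical xyz content (values are made up)
xyz_text = "ID,API_NUMBER,LATITUDE,LONGITUDE,ELEV_FT\n1,120010000100,40.1,-88.2,712.5\n"

# dtype= makes pandas parse each column straight into the requested type
xyz = pd.read_csv(io.StringIO(xyz_text), dtype=datatypes)
```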
- w4h.remerge_data(classifieddf, searchdf, parallel_processing=False)[source]¶
Function to merge newly-classified (or not) and previously classified data
- Parameters:
- classifieddfpandas.DataFrame
Dataframe that had already been classified previously
- searchdfpandas.DataFrame
Dataframe with new classifications
- Returns:
- remergeDFpandas.DataFrame
Dataframe containing all the data, merged back together
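One way to picture the remerge step is a row-wise concatenation that restores the original row order. The sketch below assumes both dataframes still carry the index of the table they were split from; that assumption, and the sample rows, are illustrative rather than taken from the w4h source.

```python
import pandas as pd

# Rows classified in an earlier step (index preserved from the original table)
classifieddf = pd.DataFrame(
    {"FORMATION": ["sand", "limestone"], "CLASS_FLAG": [1, 1]}, index=[0, 2]
)
# Rows classified in the most recent step
searchdf = pd.DataFrame({"FORMATION": ["blue clay"], "CLASS_FLAG": [4]}, index=[1])

# Stack the two tables, then sort on the shared index to recover the original order
remerged = pd.concat([classifieddf, searchdf]).sort_index()
```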
- w4h.remove_bad_depth(df_with_depth, top_col='TOP', bottom_col='BOTTOM', depth_type='depth', verbose=False, log=False)[source]¶
Function to remove all records in the dataframe with well interpretations where the depth information is bad (i.e., where the bottom of the record is nearer to the surface than the top)
- Parameters:
- df_with_depthpandas.DataFrame
Pandas dataframe containing the well records and descriptions for each interval
- top_colstr, default=’TOP’
The name of the column containing the depth or elevation for the top of the interval, by default ‘TOP’
- bottom_colstr, default=’BOTTOM’
The name of the column containing the depth or elevation for the bottom of each interval, by default ‘BOTTOM’
- depth_typestr, {‘depth’, ‘elevation’}
Whether the table is organized by depth or elevation. If depth, the top column will have smaller values than the bottom column. If elevation, the top column will have higher values than the bottom column, by default ‘depth’
- verbosebool, default = False
Whether to print results to the terminal, by default False
- logbool, default = False
Whether to log results to log file, by default False
- Returns:
- pandas.DataFrame
Pandas dataframe with the records removed where the top is indicated to be below the bottom.
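The depth_type rule described above can be sketched as a boolean filter. This is an illustration only, not the package's code: drop_bad_depth is a hypothetical helper, and keeping zero-thickness intervals (top equal to bottom) is an assumption.

```python
import pandas as pd


def drop_bad_depth(df, top_col="TOP", bottom_col="BOTTOM", depth_type="depth"):
    """Keep only intervals whose top lies at or above their bottom (hypothetical helper)."""
    if depth_type == "depth":
        # Depths increase downward, so a valid top is no larger than the bottom
        good = df[top_col] <= df[bottom_col]
    elif depth_type == "elevation":
        # Elevations decrease downward, so a valid top is no smaller than the bottom
        good = df[top_col] >= df[bottom_col]
    else:
        raise ValueError("depth_type must be 'depth' or 'elevation'")
    return df[good].reset_index(drop=True)


intervals = pd.DataFrame({"TOP": [0, 10, 30], "BOTTOM": [10, 25, 20]})
kept = drop_bad_depth(intervals)  # the third interval (30 to 20) is dropped
```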
- w4h.remove_no_depth(df_with_depth, top_col='TOP', bottom_col='BOTTOM', no_data_val_table='', verbose=False, log=False)[source]¶
Function to remove well intervals with no depth information
- Parameters:
- df_with_depthpandas.DataFrame
Dataframe containing well descriptions
- top_colstr, optional
Name of column containing information on the top of the well intervals, by default ‘TOP’
- bottom_colstr, optional
Name of column containing information on the bottom of the well intervals, by default ‘BOTTOM’
- no_data_val_tableany, optional
No data value in the input data, used by this function to indicate that depth data is not there, to be replaced by np.nan, by default ‘’
- verbosebool, optional
Whether to print results to console, by default False
- logbool, default = False
Whether to log results to log file, by default False
- Returns:
- df_with_depthpandas.DataFrame
Dataframe with depths dropped
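The description of no_data_val_table suggests a two-step pattern: swap the no-data value for np.nan, then drop rows where either depth column is missing. A minimal pandas sketch of that pattern follows; the sample values are made up, and the real function may differ in detail.

```python
import numpy as np
import pandas as pd

# Depths as they might arrive from a text export, with '' marking missing values
df_with_depth = pd.DataFrame({"TOP": ["0", "", "10"], "BOTTOM": ["10", "20", ""]})

no_data_val_table = ""
# Step 1: turn the no-data marker into a true missing value
marked = df_with_depth.replace(no_data_val_table, np.nan)
# Step 2: drop any interval lacking a top or a bottom
cleaned = marked.dropna(subset=["TOP", "BOTTOM"])
```

The same pattern would apply to a description, elevation, or coordinate column by changing the subset list.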
- w4h.remove_no_description(df_with_descriptions, description_col='FORMATION', no_data_val_table='', verbose=False, log=False)[source]¶
Function that removes all records in the dataframe containing the well descriptions where no description is given.
- Parameters:
- df_with_descriptionspandas.DataFrame
Pandas dataframe containing the well records with their individual descriptions
- description_colstr, optional
Name of the column containing the geologic description of each interval, by default ‘FORMATION’
- no_data_val_tablestr, optional
The value expected if the column is empty or there is no data. These will be replaced by np.nan before being removed, by default ‘’
- verbosebool, optional
Whether to print the results of this step to the terminal, by default False
- logbool, default = False
Whether to log results to log file, by default False
- Returns:
- pandas.DataFrame
Pandas dataframe with records with no description removed.
- w4h.remove_no_topo(df_with_topo, zcol='SURFACE_ELEV', no_data_val_table='', verbose=False, log=False)[source]¶
Function to remove wells that do not have topography data (needed for layer selection later).
This function is intended to be run on the metadata table after an attempt has been made to add elevations.
- Parameters:
- df_with_topopandas.DataFrame
Pandas dataframe containing elevation information.
- zcolstr
Name of elevation column
- no_data_val_tableany
Value in dataset that indicates no data is present (replaced with np.nan)
- verbosebool, optional
Whether to print outputs, by default False
- logbool, default = False
Whether to log results to log file, by default False
- Returns:
- pandas.DataFrame
Pandas dataframe with intervals with no topography removed.
- w4h.remove_nonlocated(df_with_locations, xcol='LONGITUDE', ycol='LATITUDE', no_data_val_table='', verbose=False, log=False)[source]¶
Function to remove wells and well intervals where there is no location information
- Parameters:
- df_with_locationspandas.DataFrame
Pandas dataframe containing well descriptions
- metadata_DFpandas.DataFrame
Pandas dataframe containing metadata, including well locations (e.g., Latitude/Longitude)
- logbool, default = False
Whether to log results to log file, by default False
- Returns:
- df_with_locationspandas.DataFrame
Pandas dataframe containing only data with location information
- w4h.run(well_data, surf_elev_grid, bedrock_elev_grid, model_grid=None, metadata=None, keep_all_cols=True, layers=9, description_col='FORMATION', top_col='TOP', bottom_col='BOTTOM', depth_type='depth', study_area=None, xcol='LONGITUDE', ycol='LATITUDE', zcol='SURFACE_ELEV', well_id_col='API_NUMBER', lith_dict=None, lith_dict_start=None, lith_dict_wildcard=None, target_dict=None, target_name='', include_elevation_grids=True, include_elevation_coordinates=True, export_dir=None, verbose=False, log=False, **kw_params)[source]¶
w4h.run() is a function that runs the intended workflow of the wells4hydrogeology (w4h) package. This means that it runs several constituent functions. The workflow that this follows is provided in the package wiki. It accepts the parameters of the constituent functions. To see a list of these functions and parameters, use help(w4h.run).
The functions used in w4h.run() are listed below, along with their parameters and the default values for those parameters. See the documentation for each of the individual functions for more information on a specific parameter:
file_setup
well_data | default = ‘<no default>’
metadata | default = None
data_filename | default = ‘ISGS_DOWNHOLE_DATA.txt’
metadata_filename | default = ‘ISGS_HEADER.txt’
log_dir | default = None
verbose | default = False
log | default = False
read_raw_csv
data_filepath | default = ‘<output of previous function>’
metadata_filepath | default = ‘<output of previous function>’
data_cols | default = None
metadata_cols | default = None
xcol | default = ‘LONGITUDE’
ycol | default = ‘LATITUDE’
well_key | default = ‘API_NUMBER’
encoding | default = ‘latin-1’
verbose | default = False
log | default = False
read_csv_kwargs | default = {}
define_dtypes
undefined_df | default = ‘<output of previous function>’
datatypes | default = None
verbose | default = False
log | default = False
merge_metadata
data_df | default = ‘<output of previous function>’
header_df | default = ‘<output of previous function>’
data_cols | default = None
header_cols | default = None
auto_pick_cols | default = False
drop_duplicate_cols | default = True
log | default = False
verbose | default = False
kwargs | default = {}
coords2geometry
df_no_geometry | default = ‘<output of previous function>’
xcol | default = ‘LONGITUDE’
ycol | default = ‘LATITUDE’
zcol | default = ‘ELEV_FT’
input_coords_crs | default = ‘EPSG:4269’
output_crs | default = ‘EPSG:5070’
use_z | default = False
wkt_col | default = ‘WKT’
geometry_source | default = ‘coords’
verbose | default = False
log | default = False
read_study_area
study_area | default = None
study_area_crs | default = None
output_crs | default = ‘EPSG:5070’
buffer | default = None
return_original | default = False
log | default = False
verbose | default = False
read_file_kwargs | default = {}
clip_gdf2study_area
study_area | default = ‘<output of previous function>’
gdf | default = ‘<output of previous function>’
log | default = False
verbose | default = False
read_grid
grid_path | default = None
grid_type | default = ‘model’
no_data_val_grid | default = 0
use_service | default = False
study_area | default = None
grid_crs | default = None
output_crs | default = ‘EPSG:5070’
verbose | default = False
log | default = False
kwargs | default = {}
add_control_points
df_without_control | default = ‘<output of previous function>’
df_control | default = None
xcol | default = ‘LONGITUDE’
ycol | default = ‘LATITUDE’
zcol | default = ‘ELEV_FT’
controlpoints_crs | default = ‘EPSG:4269’
output_crs | default = ‘EPSG:5070’
description_col | default = ‘FORMATION’
interp_col | default = ‘INTERPRETATION’
target_col | default = ‘TARGET’
verbose | default = False
log | default = False
kwargs | default = {}
remove_nonlocated
df_with_locations | default = ‘<output of previous function>’
xcol | default = ‘LONGITUDE’
ycol | default = ‘LATITUDE’
no_data_val_table | default = ‘’
verbose | default = False
log | default = False
remove_no_topo
df_with_topo | default = ‘<output of previous function>’
zcol | default = ‘SURFACE_ELEV’
no_data_val_table | default = ‘’
verbose | default = False
log | default = False
remove_no_depth
df_with_depth | default = ‘<output of previous function>’
top_col | default = ‘TOP’
bottom_col | default = ‘BOTTOM’
no_data_val_table | default = ‘’
verbose | default = False
log | default = False
remove_bad_depth
df_with_depth | default = ‘<output of previous function>’
top_col | default = ‘TOP’
bottom_col | default = ‘BOTTOM’
depth_type | default = ‘depth’
verbose | default = False
log | default = False
remove_no_description
df_with_descriptions | default = ‘<output of previous function>’
description_col | default = ‘FORMATION’
no_data_val_table | default = ‘’
verbose | default = False
log | default = False
get_search_terms
spec_path | default = ‘C:UsersbalikianLocalDataCodesScriptsGithubwells4hydrogeologydocs/resources/’
spec_glob_pattern | default = ‘SearchTerms-Specific’
start_path | default = None
start_glob_pattern | default = ‘SearchTerms-Start’
wildcard_path | default = None
wildcard_glob_pattern | default = ‘*SearchTerms-Wildcard’
verbose | default = False
log | default = False
read_dictionary_terms
dict_file | default = None
id_col | default = ‘ID’
search_col | default = ‘DESCRIPTION’
definition_col | default = ‘LITHOLOGY’
class_flag_col | default = ‘CLASS_FLAG’
dictionary_type | default = None
class_flag | default = 6
rem_extra_cols | default = True
verbose | default = False
log | default = False
specific_define
df | default = ‘<output of previous function>’
terms_df | default = ‘<output of previous function>’
description_col | default = ‘FORMATION’
terms_col | default = ‘DESCRIPTION’
parallel_processing | default = False
verbose | default = False
log | default = False
split_defined
df | default = ‘<output of previous function>’
classification_col | default = ‘CLASS_FLAG’
verbose | default = False
log | default = False
start_define
df | default = ‘<output of previous function>’
terms_df | default = ‘<output of previous function>’
description_col | default = ‘FORMATION’
terms_col | default = ‘DESCRIPTION’
parallel_processing | default = False
verbose | default = False
log | default = False
wildcard_define
df | default = ‘<output of previous function>’
terms_df | default = ‘<output of previous function>’
description_col | default = ‘FORMATION’
terms_col | default = ‘DESCRIPTION’
verbose | default = False
log | default = False
remerge_data
classifieddf | default = ‘<output of previous function>’
searchdf | default = ‘<output of previous function>’
parallel_processing | default = False
fill_unclassified
df | default = ‘<output of previous function>’
classification_col | default = ‘CLASS_FLAG’
read_lithologies
lith_file | default = None
interp_col | default = ‘LITHOLOGY’
target_col | default = ‘CODE’
use_cols | default = None
verbose | default = False
log | default = False
merge_lithologies
well_data_df | default = ‘<output of previous function>’
targinterps_df | default = ‘<output of previous function>’
interp_col | default = ‘INTERPRETATION’
target_col | default = ‘TARGET’
target_class | default = ‘bool’
align_rasters
grids_unaligned | default = None
model_grid | default = None
no_data_val_grid | default = 0
verbose | default = False
log | default = False
get_drift_thick
surface_elev | default = None
bedrock_elev | default = None
layers | default = 9
plot | default = False
verbose | default = False
log | default = False
sample_raster_points
raster | default = None
points_df | default = None
well_id_col | default = ‘API_NUMBER’
xcol | default = ‘LONGITUDE’
ycol | default = ‘LATITUDE’
new_col | default = ‘SAMPLED’
verbose | default = False
log | default = False
get_layer_depths
df_with_depths | default = ‘<output of previous function>’
surface_elev_col | default = ‘SURFACE_ELEV’
layer_thick_col | default = ‘LAYER_THICK’
layers | default = 9
log | default = False
layer_target_thick
df | default = ‘<output of previous function>’
layers | default = 9
return_all | default = False
export_dir | default = None
outfile_prefix | default = None
depth_top_col | default = ‘TOP’
depth_bot_col | default = ‘BOTTOM’
log | default = False
layer_interp
points | default = ‘<no default>’
model_grid | default = ‘<no default>’
layers | default = None
interp_kind | default = ‘nearest’
surface_grid | default = None
bedrock_grid | default = None
layer_thick_grid | default = None
drift_thick_grid | default = None
return_type | default = ‘dataset’
export_dir | default = None
target_col | default = ‘TARG_THICK_PER’
layer_col | default = ‘LAYER’
xcol | default = None
ycol | default = None
xcoord | default = ‘x’
ycoord | default = ‘y’
log | default = False
verbose | default = False
kwargs | default = {}
export_grids
grid_data | default = ‘<no default>’
out_path | default = ‘<no default>’
file_id | default = ‘’
filetype | default = ‘tif’
variable_sep | default = True
date_stamp | default = True
verbose | default = False
log | default = False
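The statement that w4h.run() accepts the parameters of its constituent functions can be pictured as keyword routing: each step receives only the keyword arguments its signature names. The sketch below shows that general pattern with inspect.signature; step_a, step_b, and run_workflow are hypothetical stand-ins, and this is not a description of how w4h.run() is actually written.

```python
import inspect


def step_a(x, scale=1, verbose=False):
    """Hypothetical first step of a workflow."""
    return x * scale


def step_b(x, offset=0, verbose=False):
    """Hypothetical second step of a workflow."""
    return x + offset


def run_workflow(x, **kw_params):
    """Chain the steps, passing each one only the keywords it accepts."""
    for func in (step_a, step_b):
        accepted = inspect.signature(func).parameters
        kwargs = {k: v for k, v in kw_params.items() if k in accepted}
        x = func(x, **kwargs)
    return x


# scale goes to step_a, offset to step_b, verbose to both
result = run_workflow(2, scale=3, offset=1, verbose=True)
```

A shared keyword such as verbose or log reaches every step that declares it, which is consistent with those names repeating throughout the list above.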
- w4h.sample_raster_points(raster=None, points_df=None, well_id_col='API_NUMBER', xcol='LONGITUDE', ycol='LATITUDE', new_col='SAMPLED', verbose=False, log=False)[source]¶
Sample raster values to points from geopandas geodataframe.
- Parameters:
- rasterrioxarray data array
Raster containing values to be sampled.
- points_dfgeopandas.geodataframe
Geopandas dataframe with geometry column containing point values to sample.
- well_id_colstr, default=”API_NUMBER”
Column that uniquely identifies each well so multiple sampling points are not taken per well
- xcolstr, default=’LONGITUDE’
Column containing name for x-column, by default ‘LONGITUDE.’ This is used to output (potentially) reprojected point coordinates so as not to overwrite the original.
- ycolstr, default=’LATITUDE’
Column containing name for y-column, by default ‘LATITUDE.’ This is used to output (potentially) reprojected point coordinates so as not to overwrite the original.
- new_colstr, default=’SAMPLED’
Name of the new column containing points sampled from the raster, by default ‘SAMPLED’.
- verbosebool, default=False
Whether to send to print() information about progress of function, by default False.
- logbool, default = False
Whether to log results to log file, by default False
- Returns:
- points_dfgeopandas.geodataframe
Same as points_df, but with sampled values and potentially with reprojected coordinates.
- w4h.sort_dataframe(df, sort_cols=['API_NUMBER', 'TOP'], remove_nans=True)[source]¶
Function to sort dataframe by one or more columns.
- Parameters:
- dfpandas.DataFrame
Dataframe to be sorted
- sort_colsstr or list of str, default = [‘API_NUMBER’,’TOP’]
Name(s) of columns by which to sort dataframe, by default [‘API_NUMBER’,’TOP’]
- remove_nansbool, default = True
Whether or not to remove nans in the process, by default True
- Returns:
- df_sortedpandas.DataFrame
Sorted dataframe
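A sort with the documented parameters might look like the sketch below. sort_rows is a hypothetical helper, and treating remove_nans as "drop rows with missing values in the sort columns" is an assumption; the package may instead drop rows with missing values in any column.

```python
import pandas as pd


def sort_rows(df, sort_cols=("API_NUMBER", "TOP"), remove_nans=True):
    """Sort by one or more columns, optionally dropping rows missing a sort key."""
    cols = [sort_cols] if isinstance(sort_cols, str) else list(sort_cols)
    if remove_nans:
        df = df.dropna(subset=cols)
    return df.sort_values(by=cols).reset_index(drop=True)


wells = pd.DataFrame({"API_NUMBER": [2, 1, 1, None], "TOP": [0, 10, 0, 5]})
sorted_df = sort_rows(wells)  # ordered by well, then by top of interval
```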
- w4h.specific_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', parallel_processing=False, verbose=False, log=False)[source]¶
Function to classify terms that have been specifically defined in the terms_df.
- Parameters:
- dfpandas.DataFrame
Input dataframe with unclassified well descriptions.
- terms_dfpandas.DataFrame
Dataframe containing the classifications
- description_colstr, default=’FORMATION’
Column name in df containing the well descriptions, by default ‘FORMATION’.
- terms_colstr, default=’DESCRIPTION’
Column name in terms_df containing the classified descriptions, by default ‘DESCRIPTION’.
- verbosebool, default=False
Whether to print results, by default False.
- Returns:
- df_Interpspandas.DataFrame
Dataframe containing the well descriptions and their matched classifications.
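Matching specifically defined terms amounts to an exact-match lookup of description_col against terms_col. A left merge is one simple way to sketch this; the LITHOLOGY column and the sample rows are illustrative, and the package may normalise case or whitespace in ways not shown here.

```python
import pandas as pd

# Unclassified well descriptions
df = pd.DataFrame({"FORMATION": ["sand", "blue clay", "unknown stuff"]})
# Terms with a known classification
terms_df = pd.DataFrame(
    {"DESCRIPTION": ["sand", "blue clay"], "LITHOLOGY": ["SAND", "CLAY"]}
)

# Left merge keeps every description; unmatched rows get NaN in LITHOLOGY
df_interps = df.merge(
    terms_df, how="left", left_on="FORMATION", right_on="DESCRIPTION"
)
```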
- w4h.split_defined(df, classification_col='CLASS_FLAG', verbose=False, log=False)[source]¶
Function to split dataframe with well descriptions into two dataframes based on whether a row has been classified.
- Parameters:
- dfpandas.DataFrame
Dataframe containing all the well descriptions
- classification_colstr, default = ‘CLASS_FLAG’
Name of column containing the classification flag, by default ‘CLASS_FLAG’
- verbosebool, default = False
Whether to print results, by default False
- logbool, default = False
Whether to log results to log file
- Returns:
- Two-item tuple of pandas.DataFrame
tuple[0] is dataframe containing classified data, tuple[1] is dataframe containing unclassified data.
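The split can be sketched with two boolean masks on the classification column. The sketch assumes unclassified rows carry a missing value (NaN) in CLASS_FLAG; the actual marker used by w4h is not stated here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "FORMATION": ["sand", "mystery layer", "blue clay"],
        "CLASS_FLAG": [1, np.nan, 4],
    }
)

# Rows with a flag are treated as classified, rows without as unclassified
classified = df[df["CLASS_FLAG"].notna()]
unclassified = df[df["CLASS_FLAG"].isna()]
```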
- w4h.start_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', parallel_processing=False, verbose=False, log=False)[source]¶
Function to classify descriptions according to starting substring.
- Parameters:
- dfpandas.DataFrame
Dataframe containing all the well descriptions
- terms_dfpandas.DataFrame
Dataframe containing all the startswith substrings to use for searching
- description_colstr, default = ‘FORMATION’
Name of column in df containing descriptions, by default ‘FORMATION’
- terms_colstr, default = ‘DESCRIPTION’
Name of column in terms_df containing startswith substring to match with description_col, by default ‘DESCRIPTION’
- verbosebool, default = False
Whether to print out results, by default False
- logbool, default = False
Whether to log results to log file
- Returns:
- dfpandas.DataFrame
Dataframe containing the original data and new classifications
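Classification by starting substring can be illustrated with pandas string methods. In this sketch each term labels the still-unlabelled descriptions that begin with it; the LITHOLOGY column, the first-match-wins rule, and the sample rows are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"FORMATION": ["sandy gravel", "clay, blue", "fine sand"]})
terms_df = pd.DataFrame({"DESCRIPTION": ["sand", "clay"], "LITHOLOGY": ["SAND", "CLAY"]})

df["LITHOLOGY"] = None
for term, lith in zip(terms_df["DESCRIPTION"], terms_df["LITHOLOGY"]):
    # Only label rows that start with the term and have no label yet
    hit = df["FORMATION"].str.startswith(term, na=False) & df["LITHOLOGY"].isna()
    df.loc[hit, "LITHOLOGY"] = lith
```

Here "fine sand" stays unclassified because the term appears in the description but not at its start.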
- w4h.wildcard_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', verbose=False, log=False)[source]¶
Function to classify descriptions according to any substring.
- Parameters:
- dfpandas.DataFrame
Dataframe containing all the well descriptions
- terms_dfpandas.DataFrame
Dataframe containing all the substrings to use for searching
- description_colstr, default = ‘FORMATION’
Name of column in df containing descriptions, by default ‘FORMATION’
- terms_colstr, default = ‘DESCRIPTION’
Name of column in terms_df containing the substring to match with description_col, by default ‘DESCRIPTION’
- verbosebool, default = False
Whether to print out results, by default False
- logbool, default = False
Whether to log results to log file
- Returns:
- dfpandas.DataFrame
Dataframe containing the original data and new classifications
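Matching on any substring differs from the startswith case only in where the term may appear. A minimal sketch with Series.str.contains follows, using literal (non-regex) matching; the LITHOLOGY column, the first-match-wins rule, and the sample rows are again assumptions.

```python
import pandas as pd

df = pd.DataFrame({"FORMATION": ["fine sand with silt", "stiff blue clay", "no returns"]})
terms_df = pd.DataFrame({"DESCRIPTION": ["sand", "clay"], "LITHOLOGY": ["SAND", "CLAY"]})

df["LITHOLOGY"] = None
for term, lith in zip(terms_df["DESCRIPTION"], terms_df["LITHOLOGY"]):
    # regex=False treats the term as plain text, wherever it occurs in the description
    hit = df["FORMATION"].str.contains(term, regex=False, na=False) & df["LITHOLOGY"].isna()
    df.loc[hit, "LITHOLOGY"] = lith
```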
- w4h.xyz_metadata_merge(xyz, metadata, verbose=False, log=False)[source]¶
Add elevation to header data file.
- Parameters:
- xyzpandas.DataFrame
Contains elevation for the points
- metadatapandas.DataFrame
Header data file
- logbool, default = False
Whether to log results to log file, by default False
- Returns:
- headerXYZDatapandas.DataFrame
Header dataset merged to get elevation values
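Adding elevation to the header table can be pictured as a key-based merge. The sketch below assumes API_NUMBER is the shared key and that a left join is used so no header rows are lost; both are assumptions, and the sample values are made up.

```python
import pandas as pd

# Elevations for some wells
xyz = pd.DataFrame({"API_NUMBER": [1, 2], "ELEV_FT": [712.5, 698.0]})
# Header (metadata) table for all wells
metadata = pd.DataFrame({"API_NUMBER": [1, 2, 3], "TOTAL_DEPTH": [100, 85, 40]})

# Left join keeps every header row; wells without xyz data get NaN elevation
header_xyz = metadata.merge(xyz, how="left", on="API_NUMBER")
```

Wells left with a missing elevation after such a merge are the kind of records a function like remove_no_topo is described as filtering out.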