w4h.read module
The Read module contains functions primarily for reading input data from data files, as well as support functions for that task.
- w4h.read.add_control_points(df_without_control, df_control=None, xcol='LONGITUDE', ycol='LATITUDE', zcol='ELEV_FT', controlpoints_crs='EPSG:4269', output_crs='EPSG:5070', description_col='FORMATION', interp_col='INTERPRETATION', target_col='TARGET', verbose=False, log=False, **kwargs)
Function to add control points, primarily to aid in interpolation. This may be useful when conditions are known but are not represented in the input well database.
- Parameters:
- df_without_controlpandas.DataFrame
Dataframe with current working data
- df_controlstr, pathlib.PurePath, or pandas.DataFrame
Pandas dataframe with control points, or a path to a file containing them
- well_keystr, optional
The column containing the “key” (unique identifier) for each well, by default ‘API_NUMBER’
- xcolstr, optional
The column in df_control containing the x coordinates for each control point, by default ‘LONGITUDE’
- ycolstr, optional
The column in df_control containing the y coordinates for each control point, by default ‘LATITUDE’
- zcolstr, optional
The column in df_control containing the z coordinates for each control point, by default ‘ELEV_FT’
- controlpoints_crsstr, optional
The coordinate reference system (CRS) of the control points, by default ‘EPSG:4269’
- output_crsstr, optional
The output coordinate system, by default ‘EPSG:5070’
- description_colstr, optional
The column in df_control with the description (if this is used), by default ‘FORMATION’
- interp_colstr, optional
The column in df_control with the interpretation (if this is used), by default ‘INTERPRETATION’
- target_colstr, optional
The column in df_control with the target code (if this is used), by default ‘TARGET’
- verbosebool, optional
Whether to print information to terminal, by default False
- logbool, optional
Whether to log information in log file, by default False
- **kwargs
Keyword arguments of pandas.concat() or pandas.read_csv() that will be passed to that function, except for objs, which is built from df_without_control and df_control
- Returns:
- pandas.DataFrame
Pandas DataFrame with original data and control points formatted the same way and concatenated together
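The concatenation step described above can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the w4h implementation: the column names follow the defaults in the signature, the sample values are invented, and no reprojection between controlpoints_crs and output_crs is attempted.

```python
import pandas as pd

# Hypothetical working data (values invented for illustration)
wells = pd.DataFrame({
    "LONGITUDE": [-88.2, -88.3],
    "LATITUDE": [40.1, 40.2],
    "ELEV_FT": [710.0, 695.0],
    "FORMATION": ["sand and gravel", "clay"],
})

# Hypothetical control points that use the same column names
control = pd.DataFrame({
    "LONGITUDE": [-88.25],
    "LATITUDE": [40.15],
    "ELEV_FT": [700.0],
    "FORMATION": ["bedrock"],
})

# Stack the two tables; ignore_index gives the result a clean 0..n-1 index
combined = pd.concat([wells, control], ignore_index=True)
print(combined)
```

Keeping the control points in the same column layout as the well data is what lets a single concatenation produce a table that later steps can treat uniformly.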
- w4h.read.define_dtypes(undefined_df, datatypes=None, verbose=False, log=False)
Function to define datatypes of a dataframe, especially with file-indicated dtypes
- Parameters:
- undefined_dfpd.DataFrame
Pandas dataframe with columns whose datatypes need to be (re)defined
- datatypesdict, str, pathlib.PurePath() object, or None, default = None
Dictionary containing datatypes, to be used in pandas.DataFrame.astype() function. If None, will read from file indicated by dtype_file (which must be defined, along with dtype_dir), by default None
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- dfoutpandas.DataFrame
Pandas dataframe containing redefined columns
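The datatypes dictionary is described as input to pandas.DataFrame.astype(). A minimal sketch of that call, assuming a small invented table in which every column was read as text (the column names and values are illustrative only, and w4h itself is not called):

```python
import pandas as pd

# Hypothetical raw table in which every column was read as text
raw = pd.DataFrame({
    "API_NUMBER": ["120190000100", "120190000200"],
    "TOP": ["0", "12.5"],
    "BOTTOM": ["12.5", "40"],
})

# Mapping of column name -> dtype, in the form accepted by DataFrame.astype()
datatypes = {"API_NUMBER": "int64", "TOP": "float64", "BOTTOM": "float64"}

typed = raw.astype(datatypes)
print(typed.dtypes)
```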
- w4h.read.file_setup(well_data, metadata=None, data_filename='*ISGS_DOWNHOLE_DATA*.txt', metadata_filename='*ISGS_HEADER*.txt', log_dir=None, verbose=False, log=False)
Function to set up files, assuming data, metadata, and elevation/location are in separate files (there should be one “key”/identifying column consistent across all files to join/merge them later).
This function may not be useful if files are organized differently than this structure. If that is the case, it is recommended to use the get_most_recent() function for each individual file if needed. It may also be simpler to skip this function altogether and directly define each filepath in a manner that can be used by pandas.read_csv().
- Parameters:
- well_datastr or pathlib.Path object
Str or pathlib.Path to directory containing input files, by default str(repoDir)+’/resources’
- metadatastr or pathlib.Path object, optional
Str or pathlib.Path to directory containing input metadata files, by default str(repoDir)+’/resources’
- data_filenamestr, optional
Pattern used by pathlib.glob() to get the most recent data file, by default ‘*ISGS_DOWNHOLE_DATA*.txt’
- metadata_filenamestr, optional
Pattern used by pathlib.glob() to get the most recent metadata file, by default ‘*ISGS_HEADER*.txt’
- log_dirstr or pathlib.PurePath() or None, default=None
Directory to place log file in. This is not read directly, but is used indirectly by w4h.logger_function()
- verbosebool, default = False
Whether to print name of files to terminal, by default False
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- tuple
Tuple with paths to (well_data, metadata)
- w4h.read.get_current_date()
Gets the current date to help with finding the most recent file
- Parameters:
None
- Returns:
- dateSuffixstr
String to use for naming output files
- w4h.read.get_most_recent(dir=WindowsPath('c:/Users/balikian/LocalData/CodesScripts/Github/wells4hydrogeology/w4h/resources'), glob_pattern='*', verbose=False)
Function to find the most recent file with the indicated pattern, using pathlib.glob function.
- Parameters:
- dirstr or pathlib.Path object, optional
Directory in which to find the most recent file, by default str(repoDir)+’/resources’
- glob_patternstr, optional
String used by the pathlib.glob() function/method for searching, by default ‘*’
- Returns:
- pathlib.Path object
Pathlib Path object of the most recent file fitting the glob pattern indicated in the glob_pattern parameter.
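The idea of picking the newest file that matches a glob pattern can be sketched with the standard library alone. This is an illustration, not the w4h implementation: most_recent_file is a hypothetical helper, and "most recent" is assumed here to mean the latest modification time, which may differ from the rule get_most_recent() actually applies.

```python
import os
import tempfile
from pathlib import Path


def most_recent_file(directory, glob_pattern="*"):
    """Return the newest file in `directory` matching `glob_pattern`.

    Hypothetical helper: "most recent" is assumed to mean the latest
    modification time.
    """
    candidates = [p for p in Path(directory).glob(glob_pattern) if p.is_file()]
    if not candidates:
        raise FileNotFoundError(f"No files match {glob_pattern!r} in {directory}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Build a throwaway directory with two dated files and one unrelated file
with tempfile.TemporaryDirectory() as tmp:
    older = Path(tmp) / "ISGS_HEADER_2023-01-01.txt"
    newer = Path(tmp) / "ISGS_HEADER_2024-01-01.txt"
    other = Path(tmp) / "notes.txt"
    for offset, path in enumerate([older, newer, other]):
        path.write_text("placeholder")
        # Set explicit modification times so the ordering is deterministic
        os.utime(path, (1_700_000_000 + offset, 1_700_000_000 + offset))

    newest_name = most_recent_file(tmp, "*ISGS_HEADER*.txt").name

print(newest_name)
```

The unrelated file has the latest timestamp but is excluded by the pattern, so the glob filter and the recency rule act independently.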
- w4h.read.get_search_terms(spec_path='C:\\Users\\balikian\\LocalData\\CodesScripts\\Github\\wells4hydrogeology\\docs/resources/', spec_glob_pattern='*SearchTerms-Specific*', start_path=None, start_glob_pattern='*SearchTerms-Start*', wildcard_path=None, wildcard_glob_pattern='*SearchTerms-Wildcard', verbose=False, log=False)
Read in dictionary files for downhole data
- Parameters:
- spec_pathstr or pathlib.Path, optional
Directory where the file containing the specific search terms is located, by default str(repoDir)+’/resources/’
- spec_glob_patternstr, optional
Search string used by pathlib.glob() to find the most recent file of interest, uses get_most_recent() function, by default ‘*SearchTerms-Specific*’
- start_pathstr or None, optional
Directory where the file containing the start search terms is located, by default None
- start_glob_patternstr, optional
Search string used by pathlib.glob() to find the most recent file of interest, uses get_most_recent() function, by default ‘*SearchTerms-Start*’
- wildcard_pathstr or pathlib.Path, default = None
Directory where the file containing the wildcard search terms is located, by default None
- wildcard_glob_patternstr, default = ‘*SearchTerms-Wildcard’
Search string used by pathlib.glob() to find the most recent file of interest, uses get_most_recent() function, by default ‘*SearchTerms-Wildcard’
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- (specTermsPath, startTermsPath, wildcardTermsPath)tuple
Tuple containing the pandas dataframes with specific search terms, with start search terms, and with wildcard search terms
- w4h.read.read_dict(file, keytype='np')
Function to read a text file with a dictionary in it into a python dictionary
- Parameters:
- filestr or pathlib.Path object
Filepath to the file of interest containing the dictionary text
- keytypestr, optional
String indicating the datatypes used in the text, currently only ‘np’ is implemented, by default ‘np’
- Returns:
- dict
Dictionary translated from text file.
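Reading a dictionary from a text file can be sketched as follows. This is a minimal illustration under stated assumptions: read_dict_literal is a hypothetical helper, the file is assumed to hold a Python dict literal with plain string values, and the keytype='np' handling of read_dict() is not reproduced.

```python
import ast
import tempfile
from pathlib import Path


def read_dict_literal(file):
    """Parse a text file holding a Python dict literal (hypothetical helper)."""
    text = Path(file).read_text(encoding="utf-8")
    # literal_eval only evaluates literals, so it never runs arbitrary code
    parsed = ast.literal_eval(text)
    if not isinstance(parsed, dict):
        raise ValueError("File does not contain a dictionary literal")
    return parsed


with tempfile.TemporaryDirectory() as tmp:
    dict_path = Path(tmp) / "dtypes.txt"
    dict_path.write_text(
        "{'API_NUMBER': 'uint64', 'LATITUDE': 'float64'}", encoding="utf-8"
    )
    dtypes = read_dict_literal(dict_path)

print(dtypes)
```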
- w4h.read.read_dictionary_terms(dict_file=None, id_col='ID', search_col='DESCRIPTION', definition_col='LITHOLOGY', class_flag_col='CLASS_FLAG', dictionary_type=None, class_flag=6, rem_extra_cols=True, verbose=False, log=False)
Function to read dictionary terms from file into pandas dataframe
- Parameters:
- dict_filestr or pathlib.Path object, or list of these
File or list of files to be read
- search_colstr, default = ‘DESCRIPTION’
Name of column containing search terms (geologic formations)
- definition_colstr, default = ‘LITHOLOGY’
Name of column containing interpretations of search terms (lithologies)
- dictionary_typestr or None, {None, ‘exact’, ‘start’, ‘wildcard’}
Indicator of which kind of dictionary terms to read in: None, ‘exact’, ‘start’, or ‘wildcard’, by default None.
If None, uses the name of the file to try to determine the type. If it cannot, it defaults to using the classification flag from class_flag.
If ‘exact’, terms will be used to search for exact matches to geologic descriptions.
If ‘start’, terms will be used with the .startswith() string method to find inexact matches to geologic descriptions.
If ‘wildcard’, terms will be used to find any matching substring for inexact geologic matches.
- class_flagint, default = 6
Classification flag to be used if dictionary_type is None and cannot be otherwise determined, by default 6
- rem_extra_colsbool, default = True
Whether to remove the extra columns from the input file after it is read in as a pandas dataframe, by default True
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- dict_termspandas.DataFrame
Pandas dataframe with formatting ready to be used in the classification steps of this package
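The three dictionary types differ only in how a search term is compared with a geologic description. A minimal sketch of the three comparisons, using a hypothetical matches() helper; the upper-casing and whitespace stripping are assumptions made for the illustration, not documented w4h behavior:

```python
def matches(description, term, dictionary_type):
    """Apply one of the three match modes to a single description.

    Hypothetical helper; case and surrounding whitespace are normalized
    here as an assumption.
    """
    description = description.strip().upper()
    term = term.strip().upper()
    if dictionary_type == "exact":
        return description == term
    if dictionary_type == "start":
        return description.startswith(term)
    if dictionary_type == "wildcard":
        return term in description
    raise ValueError(f"Unknown dictionary_type: {dictionary_type!r}")


description = "sandy clay with gravel"
results = {
    "exact": matches(description, "sandy clay", "exact"),
    "start": matches(description, "sandy clay", "start"),
    "wildcard": matches(description, "gravel", "wildcard"),
}
print(results)
```

The same term can fail as an exact match yet succeed as a start match, which is why the dictionary type has to be known before the terms are applied.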
- w4h.read.read_lithologies(lith_file=None, interp_col='LITHOLOGY', target_col='CODE', use_cols=None, verbose=False, log=False)
Function to read lithology file into pandas dataframe
- Parameters:
- lith_filestr or pathlib.Path object, default = None
Filename of lithology file. If None, default is contained within repository, by default None
- interp_colstr, default = ‘LITHOLOGY’
Column to be used to match interpretations
- target_colstr, default = ‘CODE’
Column to be used as target code
- use_colslist, default = None
Which columns to use when reading in dataframe. If None, defaults to [‘LITHOLOGY’, ‘CODE’].
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- pandas.DataFrame
Pandas dataframe with lithology information
- w4h.read.read_raw_csv(data_filepath, metadata_filepath, data_cols=None, metadata_cols=None, xcol='LONGITUDE', ycol='LATITUDE', well_key='API_NUMBER', encoding='latin-1', verbose=False, log=False, **read_csv_kwargs)
Convenience function to read raw .txt files output from, for example, an Access database
- Parameters:
- data_filepathstr
Filename of the file containing data, including the extension.
- metadata_filepathstr
Filename of the file containing metadata, including the extension.
- data_colslist, default = None
List of column names from the txt file to keep after reading. If None, ["API_NUMBER", "TABLE_NAME", "FORMATION", "THICKNESS", "TOP", "BOTTOM"], by default None.
- metadata_colslist, default = None
List of column names from the txt file to keep after reading. If None, ["API_NUMBER", "TOTAL_DEPTH", "SECTION", "TWP", "TDIR", "RNG", "RDIR", "MERIDIAN", "QUARTERS", "ELEVATION", "ELEVREF", "COUNTY_CODE", "LATITUDE", "LONGITUDE", "ELEVSOURCE"], by default None
- xcolstr, default = ‘LONGITUDE’
Name of column in metadata file indicating the x-location of the well, by default ‘LONGITUDE’
- ycolstr, default = ‘LATITUDE’
Name of the column in metadata file indicating the y-location of the well, by default ‘LATITUDE’
- well_keystr, default = ‘API_NUMBER’
Name of the column with the key/identifier that will be used to merge data later, by default ‘API_NUMBER’
- encodingstr, default = ‘latin-1’
Encoding of the data in the input files, by default ‘latin-1’
- verbosebool, default = False
Whether to print the number of rows in the input columns, by default False
- logbool, default = False
Whether to log inputs and outputs to log file.
- **read_csv_kwargs
**kwargs that get passed to pd.read_csv()
- Returns:
- (pandas.DataFrame, pandas.DataFrame/None)
Tuple/list with two pandas dataframes: (well_data, metadata). metadata is None if only well_data is used.
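The column selection and encoding parameters map onto arguments of pandas.read_csv(). A minimal sketch of that underlying call, assuming a small invented comma-delimited file written to a temporary directory (the file name, delimiter, and values are illustrative, and w4h itself is not called):

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "downhole.txt"
    # The fraction character is a single latin-1 byte that is not valid UTF-8
    data_path.write_text(
        "API_NUMBER,FORMATION,TOP,BOTTOM,COMMENT\n"
        "120190000100,clay w/ 1½ ft sand,0,12.5,ignored\n",
        encoding="latin-1",
    )
    # usecols keeps only the listed columns; encoding matches how the file was written
    data = pd.read_csv(
        data_path,
        usecols=["API_NUMBER", "FORMATION", "TOP", "BOTTOM"],
        encoding="latin-1",
    )

print(data)
```

Reading with the wrong encoding would either raise a decoding error or silently corrupt characters such as the fraction above, which is presumably why the encoding is exposed as a parameter.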
- w4h.read.read_xyz(xyzpath, datatypes=None, verbose=False, log=False)
Function to read file containing xyz data (elevation/location)
- Parameters:
- xyzpathstr or pathlib.Path
Filepath of the xyz file, including extension
- datatypesdict, default = None
Dictionary containing the datatypes for the columns in the xyz file. If None, {'ID': np.uint32, 'API_NUMBER': np.uint64, 'LATITUDE': np.float64, 'LONGITUDE': np.float64, 'ELEV_FT': np.float64}, by default None
- verbosebool, default = False
Whether to print the number of xyz records to the terminal, by default False
- logbool, default = False
Whether to log inputs and outputs to log file.
- Returns:
- pandas.DataFrame
Pandas dataframe containing the elevation and location data
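The default datatypes dictionary listed above can be applied while reading by passing it to the dtype argument of pandas.read_csv(). A minimal sketch, assuming a comma-delimited xyz file with invented values held in memory; whether read_xyz() applies the types during or after reading is not stated here, so this shows one possible approach:

```python
import io

import numpy as np
import pandas as pd

# The default mapping given in the datatypes parameter description
datatypes = {
    "ID": np.uint32,
    "API_NUMBER": np.uint64,
    "LATITUDE": np.float64,
    "LONGITUDE": np.float64,
    "ELEV_FT": np.float64,
}

# Hypothetical xyz content (values invented for illustration)
xyz_text = io.StringIO(
    "ID,API_NUMBER,LATITUDE,LONGITUDE,ELEV_FT\n"
    "1,120190000100,40.1,-88.2,710\n"
)

xyz = pd.read_csv(xyz_text, dtype=datatypes)
print(xyz.dtypes)
```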