dataflow is a CLI script running on the gl-calcs Linux server that hosts the
InfluxDB time series database. The script scans folders for files and tries
to assign a filetype to each found file. If a filetype was successfully
assigned to a specific file, dataflow uploads the data of the respective
file using the settings for the respectively assigned filetype.
dataflow scans folders for files. Then, dataflow uses the POET script dbc-influxdb for:
- reading found files
- scanning found files for variables
- uploading found data to the database
dataflow configurations, including the different filetypes, are given in the configs folder.
Configurations for accessing the database are not included in the configs folder for security reasons.
dataflow uses poetry for dependency management.
Filetypes are defined in the configs, see here: Filetypes
gl-calcsis a Linux computer running CentOS 7- Source archive is built via
poetrywithpoetry build.- Example:
dataflow-0.3.0.tar.gz
- Example:
- The resulting
.tar.gzfile is uploaded to the servergl-calcs. - On the server, the script is installed using
pipx, e.g.,pipx install /path/to/file/dataflow-0.3.0.tar.gz. - This also installs the script
dbc-influxdbfor uploading data to the database. - The script can also be installed directly from source to install a specific version
with
pipx install https://github.com/holukas/dataflow/archive/refs/tags/v0.10.3.tar.gz. This example would install script v0.10.3.
Accessed using the help argument with python .\main.py -h.
usage: main.py [-h] [-y YEAR] [-m MONTH] [-l FILELIMIT] [-n NEWESTFILES] site datatype access filegroup dirconf
dataflow
positional arguments:
site Site abbreviation, e.g. ch-dav, ch-lae
datatype Data type: 'raw' for raw data, 'processed' for processed data
access Access to data via 'server' address (e.g. outside gl-calcs) or 'mount' path (e.g. on gl-calcs)
filegroup Data group, e.g. '10_meteo'
dirconf Path to folder with configuration settings
optional arguments:
-h, --help show this help message and exit
-y YEAR, --year YEAR Year (default: None)
-m MONTH, --month MONTH
Month (default: None)
-l FILELIMIT, --filelimit FILELIMIT
File limit, 0 corresponds to no limit. (default: 0)
-n NEWESTFILES, --newestfiles NEWESTFILES
Consider newest files only, 0 means keep all files, e.g. 3 means keep 3 newest files. Is applied after FILELIMIT was considered. (default: 0)
With the dataflow script installed via pipx (see above) it can be called with
dataflow ch-aws raw mount 10_meteo /home/holukas/source_code/configs -y 2023 -n 10
dataflowuses the script installed withpipxch-awsis the siterawis the datatype, in this case we want to upload raw datamountmeans we are using the mounted server locations defined in theconfigs10_meteois the filegroup, basically this is the subfolder we use to store this kind of data on the raw data server./home/holukas/source_code/configsis the location of the config files, in this case we are using the location on the Linux computer.-y 2023means that only data for the year 2023 are considered (i.e., searched and uploaded to the database)-n 10means that of all files found, only the newest 10 files are considered
This command can easily be used to automate execution e.g. via cronjobs.
Alternatively the script can be called directly using the local Python version and source code:
python .\main.py ch-aws raw mount 10_meteo /home/holukas/source_code/configs -y 2023 -n 10
This example executes the script on a Windows computer using the CLI.
python .\main.py ch-aws raw server 10_meteo "F:\Sync\luhk_work\20 - CODING\22 - POET\configs" -y 2023 -n 1
pythonis the used Python version, e.g. in acondaenvironmentmain.pyis the entry point for the scriptch-awsis the siterawis the datatype, in this case we want to upload raw dataservermeans we are using the network addresses such as\\serverxyz.ethz.ch\archive\FluxData10_meteois the filegroup, basically this is the subfolder we use to store this kind of data on the raw data server."F:\Sync\luhk_work\20 - CODING\22 - POET\configs"is the location of the config files, in this case we are using a local Windows folder.-y 2023means that only data for the year 2023 are considered (i.e., searched and uploaded to the database)-n 10means that of all files found, only the newest 10 files are considered