This is a simple Bash+Python utility that automates uploading data from CSV files into tables in a relational database (DB). When deployed, it lurks in the background, waiting for a trigger event that signals the availability of a new batch of data files in a predesignated directory.
Once triggered, it "wakes up", resets the triggering event (to be reused next time), and takes stock of the new CSVs. It then kick-starts the DB server, creates one DB table per CSV file, and populates the tables with the relevant data.
Finally, it shuts down the DB server and checks for errors. If none occurred, it goes back to silent lurking. Otherwise, it sends you an email alert.
The utility has been tested on an Ubuntu 14.04 box running the following:
- Python 3.x with the yagmail module
- Upstart 1.12
- csvkit 0.9.1 (command line utilities)
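If any of these are missing, the two Python-side dependencies are installable from PyPI. A minimal sketch, assuming `pip3` is already on the box (the exact package manager and versions may differ on your machine):

```bash
# yagmail powers the email alert; csvkit provides the csvsql command
# used to create and populate the DB tables. Versions are illustrative.
sudo pip3 install yagmail
sudo pip3 install csvkit==0.9.1
```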
- After cloning the repo to your machine, copy `watcher.conf` to `/etc/init/`. This is the file that enables the utility to lurk as a daemon in the background. It sets a number of configurations that you may want to customize (a hypothetical sketch of the file follows this list):
  - Line 1 lists the triggering event. By default, the utility will "wake to action" when a dummy text file, "success.txt", is created in the same directory where the CSVs are created/placed. This dummy file is generated by a sister utility named "Automated-Data-Transfer" after it is done placing new CSVs in that directory. If you want to use a different triggering event, consult the Upstart Cookbook.
  - On Line 2, replace my username, "tamer", with your Linux/database username.
  - On Line 4, replace "/home/tamer/recipient/" with the path of the directory where you'll place your new CSVs and the dummy trigger file.
  - On Line 6, replace "/home/tamer/recipient/data_store.sh" with the path of the bash script that will be run when the triggering event occurs.
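For orientation, here is a hypothetical sketch of what `watcher.conf` could look like, with the line numbering used above. It assumes Upstart's file bridge (`upstart-file-bridge`) supplies the file-creation event; the actual file shipped in the repo may differ:

```
start on file FILE=/home/tamer/recipient/success.txt EVENT=create
setuid tamer

chdir /home/tamer/recipient/

exec /home/tamer/recipient/data_store.sh
```

Lines 1, 2, 4, and 6 are the customization points listed above.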
- Copy `email_alert.py` and `data_store.sh` to the directory of the CSVs (`/home/tamer/recipient/` in my case).
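Assuming you are working from inside the cloned repo, the two copy steps above might look like this (paths are illustrative):

```bash
sudo cp watcher.conf /etc/init/                          # step 1 above
cp email_alert.py data_store.sh /home/tamer/recipient/   # step 2 above
```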
- Open `data_store.sh`, customize the following, then save and close (a hedged sketch of the script's logic follows this list):
  - On Line 2, replace "/home/tamer/recipient/" with the path of the directory where the CSVs and the dummy trigger file will be available.
  - If you are not using the PostgreSQL DB, replace "postgresql" with the name of your DB server on Lines 6, 11, and 15.
  - On Line 11, replace "experiment" with the name of the database you want to store the data in. Make sure that the current Linux/DB username has permission to access and create new tables in this database.
  - On Line 19, replace "/home/tamer/recipient/email_alert.py" with the path of the Python email-alert script.
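To make the line references concrete, here is a hedged sketch of the kind of logic `data_store.sh` implements. The table naming, connection string, and error handling are my assumptions, not the verbatim script from the repo:

```bash
#!/bin/bash
cd /home/tamer/recipient/        # Line 2: the watched csv directory
rm -f success.txt                # reset the trigger for next time

sudo service postgresql start    # Line 6: kick-start the DB server
errors=0

for f in *.csv; do
    # One table per csv: csvsql (from csvkit) infers the schema and inserts
    # the rows. Line 11: "experiment" is the target database.
    csvsql --db postgresql:///experiment --insert --tables "${f%.csv}" "$f" || errors=1
done

sudo service postgresql stop     # Line 15: shut the DB server down

if [ "$errors" -ne 0 ]; then
    python3 /home/tamer/recipient/email_alert.py   # Line 19: email alert
fi
```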
- Open `email_alert.py` and provide the credentials of the Gmail account you'll use to send the email alert, as well as the recipient's email address (a minimal yagmail sketch follows).
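The edit boils down to filling in a yagmail call along these lines (account names and message text are placeholders, not the repo's actual script):

```python
import yagmail

# Placeholder credentials -- swap in the sending gmail account you set up.
yag = yagmail.SMTP("your.sender@gmail.com", "your-app-password")
yag.send(
    to="recipient@example.com",
    subject="csv upload failed",
    contents="An error occurred while loading the latest batch of CSVs.",
)
```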
- Finally, make sure that the watcher service is up and lurking in the background! Type `sudo service watcher status` on the command line, and replace `status` with `start` if you find that watcher is stopped or not running.
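For example (typical Upstart output shown as comments; your process ID will differ):

```bash
sudo service watcher status   # "watcher stop/waiting" means it is not running
sudo service watcher start    # "watcher start/running, process 1234"
```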
# Whom Should You Blame?
Well, no one but yourself! I offer no warranty, implied or explicit, for the code in any of my repositories. Use it at your own risk and discretion. I accept no liability whatsoever as a result of using it, or using a modified version of it!
Tamer Soliman, the author of this repository, has been immersed in data collection and statistical modeling since 2002. He holds a PhD in quantitative experimental psychology and designs experiments to understand and model human cognition, decision making, socio-cultural behavior, attitudes, and actions. He develops data-centered utilities and applications that support his data-science and machine-learning projects. While he approaches his projects with the mindset of a skeptic homo academicus, he understands the concept of "deadlines", and loves making money just like all other Homo sapiens!