TamerSoliman/Automated_Data_Storing

This is a stand-alone Ubuntu utility that automates the uploading of data from multiple CSVs to a relational database and sends an email alert in case of failure. It is auto-triggered when data files become available. It is integrated into my composite project, Data Analysis Pipeline.


Tell me the Story!

What is it & what does it do?

This is a simple Bash+Python utility designed to automate the uploading of data from CSV files into tables in a relational database (DB). When deployed, it lurks in the background, waiting for a trigger event that signals the availability of a new batch of data files in a predesignated directory.

Once triggered, it "wakes up", resets the triggering event (so it can fire again next time), and takes stock of the new CSVs. It then kick-starts the DB server, creates one DB table per CSV file, and populates the tables with the relevant data.
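
For a concrete sense of that step: csvkit's csvsql, one of the requirements listed below, can infer a table schema from a CSV's header, create the table, and load the rows in a single command. A minimal illustration, assuming a local PostgreSQL database named "experiment" (the default used later in the setup); the file name here is just a placeholder:

```bash
# One table per CSV: csvsql infers the schema from the header, creates the
# table, and inserts the rows ("measurements.csv" is a placeholder name)
csvsql --db postgresql:///experiment --insert measurements.csv
```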

Finally, it shuts down the DB server and checks for errors. If none occurred, it goes back to silent lurking; otherwise, it sends you an email alert.

Get me Fixed!

How can I deploy it?

  1. The utility was tested on an Ubuntu 14.04 box running the following:

    • Python 3.x with the yagmail module
    • Upstart 1.12
    • csvkit 0.9.1 (command line utilities)
  2. After cloning the repo to your machine, copy watcher.conf to "/etc/init/". This is the file that enables the utility to lurk as a daemon in the background. It sets a number of configurations that you may want to customize; an illustrative sketch of such a job file appears after this list:

    • Line 1 lists the triggering event. By default, the utility will "wake to action" when a dummy text file, "success.txt", is created in the same directory where the CSVs are created/placed. This dummy file is generated by a sister utility named "Automated-Data-Transfer" after it has finished placing new CSVs in that directory. If you want to use a different triggering event, consult the Upstart Cookbook.
    • On Line 2, replace my username, "tamer", with your Linux/database username.
    • On Line 4, replace "/home/tamer/recipient/" with the path of the directory where you'll place your new CSVs and the dummy trigger file.
    • On Line 6, replace "/home/tamer/recipient/data_store.sh" with the path of the Bash script that will be run when the triggering event occurs.
  3. Copy email_alert.py and data_store.sh to the directory of the CSVs ("/home/tamer/recipient/" in my case).

  4. Open data_store.sh, customize the following, then save and close (a sketch of the script's core flow appears after this list):

    • On Line 2, replace "/home/tamer/recipient/" with the path of the directory where the CSVs and dummy trigger file will be available.
    • If you are not using PostgreSQL, replace "postgresql" with the name of your DB server on the following lines: 6, 11, and 15.
    • On Line 11, replace "experiment" with the name of the database you want to store the data in. Make sure that the current Linux/DB username has permission to access this database and to create new tables in it.
    • On Line 19, replace "/home/tamer/recipient/email_alert.py" with the path of the Python email-alert script.
  5. Open email_alert.py and provide the credentials of the Gmail account you'll use to send the email alert, as well as the recipient's email address (a minimal yagmail sketch appears after this list).

  6. Finally, make sure that the watcher service is up and lurking in the background! Type "sudo service watcher status" on the command line, and replace "status" with "start" if you find that watcher is stopped or not running.
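
To make step 2 concrete, here is a minimal sketch of what an Upstart job file like watcher.conf can look like. It is illustrative rather than a copy of the repo's file, and the file-creation trigger assumes Upstart's file bridge (upstart-file-bridge), which ships with Upstart 1.12 on Ubuntu 14.04:

```
# Sketch of /etc/init/watcher.conf -- illustrative, not the repo's exact file
# Trigger: fire when the dummy file success.txt is created (Line 1)
start on file FILE=/home/tamer/recipient/success.txt EVENT=create

# Run as your Linux/database user rather than root (Line 2)
setuid tamer

# The watched directory where the CSVs and trigger file land (Line 4)
chdir /home/tamer/recipient/

# The Bash script to execute when the trigger fires (Line 6)
exec /home/tamer/recipient/data_store.sh
```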
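
Similarly, the core flow of data_store.sh, with the customization points from step 4 marked in comments, looks roughly like this. It is a sketch of the flow described above under the default paths and names, not the repo's exact script:

```bash
#!/bin/bash
# Sketch of the data_store.sh flow -- illustrative, not the repo's exact script
DIR=/home/tamer/recipient/            # the watched directory (Line 2)
cd "$DIR"
rm -f success.txt                     # reset the trigger for next time

sudo service postgresql start         # start the DB server (Line 6)

status=0
for f in "$DIR"*.csv; do
    # one table per CSV, created and populated by csvkit (Line 11)
    csvsql --db postgresql:///experiment --insert "$f" || status=1
done

sudo service postgresql stop          # shut the DB server down (Line 15)

if [ "$status" -ne 0 ]; then
    python3 email_alert.py            # email alert on failure (Line 19)
fi
```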
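
Finally, for step 5, the email alert can be as small as a single yagmail call. A minimal sketch, with placeholder credentials and recipient that you must replace with your own:

```python
# Sketch of email_alert.py -- illustrative; the values below are placeholders
import yagmail

SENDER = "your.account@gmail.com"     # Gmail account that sends the alert
PASSWORD = "your-app-password"        # or store it once via yagmail.register()
RECIPIENT = "you@example.com"         # where the alert should go

yag = yagmail.SMTP(SENDER, PASSWORD)
yag.send(
    to=RECIPIENT,
    subject="Automated_Data_Storing: upload failed",
    contents="The automated CSV upload reported an error; please investigate.",
)
```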

Whom Should You Blame?

Well, no one but yourself! I offer no warranty, implied or explicit, for the code in any of my repositories. Use it at your own risk and discretion. I accept no liability whatsoever arising from the use of it, or of a modified version of it!

Tamer Soliman, the author of this repository, has been immersed in data collection and statistical modeling since 2002. He holds a PhD in quantitative experimental psychology, a field in which he designs experiments to understand and model human cognition, decision making, socio-cultural behavior, attitudes, and actions. He develops data-centered utilities and applications that support his data-science and machine-learning projects. While he approaches his projects with the mindset of a skeptical homo academicus, he understands the concept of "deadlines" and loves making money just like any other homo sapiens!
