Proof of concept for ingesting Twitter streams into Google BigQuery via Google Dataflow
Project layout:
- `beam-jobs/` - Python Beam jobs for loading tweets into BigQuery and for generating a list of trending topics
- `ingest-twitter/` - scripts to ingest tweet streams
- `sample-data/` - some already collected tweets
- `terraform/` - infrastructure setup

Usage:
- Set up the infrastructure as described in the `terraform/` folder.
- Get Twitter stream data with the scripts in the `ingest-twitter/` folder, or use the supplied sample data.
- Copy the Twitter data to the created Google Cloud Storage bucket.
- Install the Python dependencies: `pip install -r requirements.txt`
- In `beam-jobs/`, adjust the parameters in `run-import-on-gcp.sh` and run it.
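
To get a feel for what the import and trending-topics jobs have to do with each tweet, here is a minimal plain-Python sketch. It is not the code in `beam-jobs/`. It assumes the collected data is newline-delimited JSON with the Twitter v1.1 field names `id_str`, `created_at`, `text` and `entities.hashtags`, which may not match the sample data exactly. The function names `tweet_to_row` and `top_hashtags` and the output column names are made up for illustration.

```python
import json
from collections import Counter


def tweet_to_row(line):
    """Turn one JSON-encoded tweet into a flat dict, or None if the line is unusable."""
    try:
        tweet = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(tweet, dict) or "id_str" not in tweet:
        # Assumption: lines without a top-level id_str (for example delete
        # notices in a stream) are not tweets and can be skipped.
        return None
    hashtags = (tweet.get("entities") or {}).get("hashtags") or []
    return {
        "id": tweet["id_str"],
        "created_at": tweet.get("created_at"),
        "text": tweet.get("text", ""),
        # Lower-case so "#BigQuery" and "#bigquery" count as the same topic.
        "hashtags": [tag["text"].lower() for tag in hashtags],
    }


def top_hashtags(rows, n=10):
    """Count hashtags across all rows and return the n most common (tag, count) pairs."""
    counts = Counter(tag for row in rows for tag in row["hashtags"])
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from sample-data/.
    sample = [
        '{"id_str": "1", "text": "hello #BigQuery", '
        '"entities": {"hashtags": [{"text": "BigQuery"}]}}',
        '{"id_str": "2", "text": "#bigquery and #Dataflow", '
        '"entities": {"hashtags": [{"text": "bigquery"}, {"text": "Dataflow"}]}}',
        '{"delete": {"status": {"id_str": "3"}}}',
        "not json",
    ]
    rows = [row for row in map(tweet_to_row, sample) if row is not None]
    print(top_hashtags(rows, n=2))
```

In a Beam job, a function like `tweet_to_row` could be applied per element with `beam.Map` before writing with `beam.io.WriteToBigQuery`; whether the jobs in this repository are structured that way is not stated here.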