This repository contains the code for the application presented at flxw.de/code-repository-mining. It is a Python 3 client-server application. Please follow the subsections below to set up the individual parts.
The directory contents are as follows:
- client contains the code that runs client-side
- server contains the code running server-side, providing an API
- data contains the scripts needed for creating the necessary tables and views
- docs contains the project website, several artifacts and raw Jupyter notebooks
- Change directories to the client/ folder and install the dependencies:
pip install -r requirements.txt
Then scan your system for vulnerable packages via ./checksystem.py. Currently, only the apt and pacman
package managers are supported, which covers most Debian- or Arch-based Linux distributions.
To simulate the results that this application could potentially give, run ./checksystem.py --test.
It will show results for the openssl package affected by Heartbleed.
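To illustrate what supporting apt and pacman involves, here is a minimal sketch of how a client might detect the package manager and list the installed packages. It is not taken from checksystem.py: the function names are hypothetical, and the choice of `dpkg-query -W` and `pacman -Q` as listing commands is an assumption about how such a scan could be done.

```python
import shutil
import subprocess

# Standard listing commands that print one "name version" pair per line.
# Assumption: checksystem.py may collect its package list differently.
LIST_COMMANDS = {
    "apt": ["dpkg-query", "-W", "--showformat=${Package} ${Version}\\n"],
    "pacman": ["pacman", "-Q"],
}


def detect_package_manager(which=shutil.which):
    """Return "apt" or "pacman", or None if neither tool is on the PATH."""
    if which("dpkg-query"):
        return "apt"
    if which("pacman"):
        return "pacman"
    return None


def parse_package_list(output):
    """Turn "name version" lines into a {name: version} dictionary."""
    packages = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:  # skip empty or malformed lines
            packages[parts[0]] = parts[1].strip()
    return packages


def installed_packages():
    """List installed packages using whichever supported manager is present."""
    manager = detect_package_manager()
    if manager is None:
        raise RuntimeError("only apt and pacman based systems are supported")
    result = subprocess.run(
        LIST_COMMANDS[manager], capture_output=True, text=True, check=True
    )
    return parse_package_list(result.stdout)
```

Passing `which` as a parameter keeps the detection testable on machines that have neither package manager installed.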
This setup assumes access to the GHTorrent database dump at the HPI chair for software architecture. Furthermore, a MongoDB instance needs to be running and you need to have access to it. The data procurement and setup is time-consuming:
- Change directories to data/
- Copy config.py.smpl to config.py and edit it so it works for your installation
- Copy config.py to server/ as well
- Run create-cve-search-view.sql. Wait for completion.
- Install scrapy via pip install scrapy
- Download my TweetScraper fork
- Configure the TweetScraper via its settings.py to reflect your PostgreSQL settings and have the TweetScraper use it
- Run ./crawl-cve-tweets-from-github-subset from inside the TweetScraper project directory. You can go ahead with the next step while the crawler is doing its thing.
- Download and set up cve-search. Wait for completion here.
- Run mine-cve-search-into-postgres.py. Wait for completion.
- Run create-reference-url-extraction-view.sql, create-tweet-extracted-views.sql, create-cwe-nist-reference-ranking.sql and create-twitter-user-ranking.sql, in that order.
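Because the four view scripts of the last step must run in a fixed order, a small helper can make that order explicit. The following is only a sketch: it assumes the scripts are executed with the `psql` command-line client, and the helper names as well as the database name in the usage comment are hypothetical.

```python
import subprocess

# The four scripts from the last step, in the prescribed order.
VIEW_SCRIPTS = [
    "create-reference-url-extraction-view.sql",
    "create-tweet-extracted-views.sql",
    "create-cwe-nist-reference-ranking.sql",
    "create-twitter-user-ranking.sql",
]


def build_psql_commands(database, scripts=VIEW_SCRIPTS):
    """Return one psql invocation per script, preserving the given order."""
    return [
        # ON_ERROR_STOP makes psql exit with an error code on the first failure.
        ["psql", "-v", "ON_ERROR_STOP=1", "-d", database, "-f", script]
        for script in scripts
    ]


def run_in_order(database, scripts=VIEW_SCRIPTS):
    """Run the scripts one after another, stopping at the first failure."""
    for command in build_psql_commands(database, scripts):
        # subprocess.run blocks until the script has completed.
        subprocess.run(command, check=True)


# Usage, with a hypothetical database name:
# run_in_order("ghtorrent")
```

Connection details such as host and user are left to the usual psql environment variables here; adapt them to whatever you put into config.py.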
The API server setup is straightforward and can be summarized in three commands:
cd server
pip install -r requirements.txt
hug -f api.py
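Once the server is running, it can be queried over HTTP. The sketch below shows one way to do that from Python using only the standard library. It assumes that hug's development server listens on its default port 8000 and answers with JSON; the endpoint name and parameter in the usage comment are hypothetical placeholders, so check api.py for the routes that actually exist.

```python
import json
import urllib.parse
import urllib.request


def build_url(base_url, endpoint, params):
    """Join base URL, endpoint path and query parameters into one URL."""
    query = urllib.parse.urlencode(params)
    url = base_url.rstrip("/") + "/" + endpoint.lstrip("/")
    return url + "?" + query if query else url


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with a hypothetical endpoint and parameter:
# url = build_url("http://localhost:8000", "search", {"package": "openssl"})
# print(fetch_json(url))
```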