Home
FOSSology generates a large set of data, which is exported to the InfluxDB time-series database and visualized with Grafana. I wrote a fossdash-publisher script that collects useful data from the FOSSology DB (Postgres) and pushes it to InfluxDB, and developed a visualization dashboard in Grafana with InfluxDB as the input data source.
This project is divided into two parts:
- Generating meaningful data from the fossology DB and publishing it to InfluxDB (a time-series database) using the fossdash-publisher script.
- Querying the InfluxDB data in Grafana and showing it in meaningful charts and graphs.
- Install fossology. More information about fossology is available here: https://github.com/fossology/fossology/wiki
- Install the fossdash dependencies by running the script `install/fossdash/fossdash_dep_install.sh`
- Configure fossdash from the fossology sysconfig UI page by going to `Admin->Fossdash`.
- `Enable/Disable` fossdash from the fossology sysconfig UI page above.
- Set the InfluxDB server URL (`FossDash Endpoint URL`): all data metrics are pushed to the specified InfluxDB URL.
  - The URL consists of the InfluxDB URL plus the database name.
  - If running a local installation of fossology: `http://localhost:8086/write?db=fossology_db`
  - If running a docker instance of fossology: `http://influxdb:8086/write?db=fossology_db`
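As a rough illustration of how metrics reach the endpoint URL configured above, the sketch below prepares an HTTP POST whose body is a list of InfluxDB line-protocol records. The function name `build_write_request` is hypothetical and is not taken from the fossdash publisher script; only the URL shape comes from this page.

```python
import urllib.request
from typing import Iterable


def build_write_request(endpoint_url: str, lines: Iterable[str]) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for an InfluxDB 1.x /write endpoint.

    endpoint_url is the configured FossDash Endpoint URL, for example
    http://localhost:8086/write?db=fossology_db
    lines are line-protocol records, one metric point per string.
    """
    # InfluxDB expects one point per line in the request body.
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(endpoint_url, data=body, method="POST")
```

Passing the returned request to `urllib.request.urlopen(req, timeout=10)` would send it; building the request separately keeps the sketch easy to inspect without a running InfluxDB.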
- `Fossology_instance_name`: Set the fossology instance name.
  - If you leave this field empty, an autogenerated UUID is used by default (e.g. 569ef786-4182-4b8d-bbf4-bbe055cfc3f3).
  - Otherwise, you can configure your own unique fossology instance name.
- `Cron job Configuration`: the fossdash-publisher script is triggered at the interval specified by the cron schedule.
  - Every minute: `* * * * *`
  - Every day at 1 am: `0 1 * * *`
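A cron schedule like the ones above has five fields (minute, hour, day of month, month, day of week). The following is a minimal sketch of how such an expression could be checked before it is saved. It is an illustration only, not the validation used in fossology, and it covers only a numeric subset of cron syntax (`*`, numbers, ranges, lists and `/step`), not month or weekday names.

```python
import re

# Allowed (min, max) per field: minute, hour, day of month, month, day of week.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]

# One list element: "*" or "N" or "N-M", optionally followed by "/step".
_PART = re.compile(r"^(\*|\d+(?:-\d+)?)(?:/(\d+))?$")


def is_valid_cron(expr: str) -> bool:
    """Return True if expr is a five-field cron schedule in the supported subset."""
    fields = expr.split()
    if len(fields) != 5:
        return False
    for field, (low, high) in zip(fields, FIELD_RANGES):
        for part in field.split(","):
            match = _PART.match(part)
            if match is None:
                return False
            base, step = match.group(1), match.group(2)
            if step is not None and int(step) == 0:
                return False  # a step of zero never advances
            if base != "*":
                bounds = [int(x) for x in base.split("-")]
                if any(b < low or b > high for b in bounds):
                    return False
                if len(bounds) == 2 and bounds[0] > bounds[1]:
                    return False
    return True
```

For example, `is_valid_cron("0 1 * * *")` accepts the daily 1 am schedule, while `is_valid_cron("60 * * * *")` is rejected because minutes only run from 0 to 59.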
- `Fossdash reported files cleaning`: number of days for which successfully pushed metrics files are archived. Older files are deleted to save disk space.
  - Leave empty to disable cleanup.
  - Set it to zero to delete all reported files.
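The retention rule above (empty disables cleanup, zero deletes everything, N keeps the last N days) can be sketched as follows. This is an illustration under assumptions, not the project's code: the function name is hypothetical, and it compares file modification times, whereas the weekly report on this page mentions the `find` command with ctime.

```python
import os
import time
from typing import List, Optional


def clean_reported_files(directory: str, days: Optional[int],
                         now: Optional[float] = None) -> List[str]:
    """Delete regular files in directory that are at least `days` days old.

    days=None mirrors an empty setting (cleanup disabled);
    days=0 removes every reported file.
    Returns the sorted names of the removed files.
    """
    if days is None:
        return []
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds per day
    # Collect candidates first, then delete, so the directory is not
    # modified while it is being scanned.
    with os.scandir(directory) as entries:
        old = [e.path for e in entries
               if e.is_file() and e.stat().st_mtime <= cutoff]
    for path in old:
        os.remove(path)
    return sorted(os.path.basename(p) for p in old)
```

Only the top level of the directory is scanned, which roughly corresponds to limiting `find` with a depth of one.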
- `Auth_type for InfluxDB`: choose either username_password-based authentication or token_based authentication to push data to InfluxDB.
  - `username_password`: ask the InfluxDB admin to create a username and password for you with access to the database `fossology_db`.
    - To test username_password auth: `curl -XPOST "localhost:8086/query?db=fossology_db&u=admin&p=admin" --data-urlencode 'q=show MEASUREMENTS'`
  - `token_based`: generate a JWT token from your InfluxDB username, the shared secret key of InfluxDB (from the InfluxDB config file), and an expiration timestamp.
    - Steps to generate an InfluxDB token:
      - Read the doc here: https://docs.influxdata.com/influxdb/v1.8/administration/authentication_and_authorization/#authenticate-using-jwt-tokens
      - Find or define the `shared-secret` value in the config file `influxdb-etc/influxdb.conf`
      - Create a Unix timestamp: either use `date "+%s" -d "2020/12/22"` or https://www.unixtimestamp.com/
      - Generate the token here: https://jwt.io/
        - In the data block, write the following, replacing admin and the timestamp: `{ "username": "admin", "exp": 1607731200 }`
        - Paste the shared secret in the VERIFY SIGNATURE block
        - Copy the token from the left box
      - Test the token with: `curl -XPOST "localhost:8086/query?db=fossology_db" -H 'Authorization: <token>' --data-urlencode 'q=show MEASUREMENTS'`
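As an alternative to pasting the shared secret into a website, the token steps above can be reproduced locally. The sketch below builds an HS256-signed JWT with only the Python standard library; the function name `make_influx_jwt` is hypothetical, and the payload fields (`username`, `exp`) are the ones shown in the steps above.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_influx_jwt(username: str, shared_secret: str, exp: int) -> str:
    """Build a JWT signed with HMAC-SHA256 (HS256).

    username: the InfluxDB user the token authenticates as
    shared_secret: the shared-secret value from the InfluxDB config file
    exp: expiration time as a Unix timestamp in seconds
    """
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"username": username, "exp": exp}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(shared_secret.encode("utf-8"),
                         signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)
```

For example, `make_influx_jwt("admin", "my-shared-secret", 1607731200)` returns a three-part token string, where `my-shared-secret` is a placeholder. The InfluxDB 1.8 documentation linked above shows the token being sent in a header of the form `Authorization: Bearer <token>`.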
Fossdash metric-reporting config
: you can modify(inlude/exclude) fossdash metrics display in dashboard by using this configuration.- If you leave empty: It used the default metric config file.
install/fossdash/fossdash_metrics.yml
- If you want to add a new metric field in the dashboard
- Add new metric name in the
QUERIES_NAME
list;-
QUERIES_NAME: [ ... , "number_of_users" ]
-
- Add same query name and its related query to fetch the data from postgres (fossologyDB)
-
QUERY: ... ... number_of_users: "SELECT count(u.*) AS users FROM users u;"
-
- Add new metric name in the
- If you leave empty: It used the default metric config file.
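Because every name in `QUERIES_NAME` needs a matching entry under `QUERY`, a small consistency check can catch a forgotten query before the publisher runs. The sketch below assumes the YAML file has already been parsed into a Python dictionary (for instance by a YAML library); the function name is hypothetical and only the two keys shown above are taken from this page.

```python
from typing import Dict, List


def find_missing_queries(config: Dict) -> List[str]:
    """Return metric names listed in QUERIES_NAME that have no SQL under QUERY."""
    names = config.get("QUERIES_NAME", [])
    queries = config.get("QUERY", {})
    # A name counts as missing if it is absent or mapped to an empty query.
    return [name for name in names if not queries.get(name)]
```

An empty result means every listed metric has a query to run against Postgres.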
- Clone this repository: https://github.com/Orange-OpenSource/fossdash
- Configure the environment variables for Grafana and InfluxDB:
  - SERVICE_URL_ROOT=`http://localhost`
  - INFLUXDB_ADMIN_USER=`admin`
  - INFLUXDB_ADMIN_PASSWORD=`admin`
  - etc...
- Once you are done with the configuration, you can run this as a docker instance: `docker-compose up -d`
- Go to `http://localhost:8081/grafana`
  - There are two dashboards:
    - `Instances-Specific_FOSSY_DASH`: choose a fossology instance from the drop-down to get all stats related to the selected instance.
    - `FOSSY_DASH`: a generic dashboard that gives stats about all fossology instances.
  - You can change the time range to get statistics between two points in time, using `Absolute time range` at the top right corner.
- Reported and unreported metrics files are stored in `/srv/fossology/repository/fossdash`
- The fossdash log is in the file `/srv/fossology/repository/fossdash/fossdash.log`
- Fossdash metrics config: `/srv/fossology/repository/fossdash/fossdash_metrics.yml`
- Added fossdash configuration.
  - `src/lib/php/fossdash-config.php`
  - `src/www/ui/admin-fossdash-config.php`
- Changes in the fossology sysconfig UI to add Bootstrap classes.
  - `src/www/ui/admin-config.php`
  - `src/lib/php/common-sysconfig.php`
- Changes in the respective `Makefile`.
- Wrote a `fossdash-publisher` script in Python to get the latest data from Postgres, convert it to the InfluxDB standard format, and push it to InfluxDB.
  - `install/fossdash/fossdash-publish.py.in`
  - Metric-reporting config for fossdash: `install/fossdash/fossdash_metrics.yml`
- Developed dashboards in Grafana to show these data metrics in a meaningful way.
  - General dashboard: shows combined information for all fossology instances.
  - Instance-specific dashboard: shows only instance-specific data metrics. You can choose any specific instance name from the dropdown.
- Check the cron job entry using `crontab -l` or `/var/spool/cron/crontabs/`
- GitHub Repo (Publisher script): https://github.com/fossology/fossology/
- GitHub Repo (InfluxDB and Grafana Dashboard): https://github.com/Orange-OpenSource/fossdash
- Student: Darshan Kansagara (darshank15)
- Mentor(s): Gaurav Mishra (@GMishx), Shaheem Azmal (@shaheemazmalmmd), Sandipbhuyan (@sandipbhuyan)
- Clone the fossology repo and set it up to run locally.
- Understand the terminology used by the fossology project by going through the documentation and reading the fossology wiki pages.
- Learn more about Prometheus and InfluxDB as real-time data sources, and about Grafana.
- Created two demo architectures for the dashboard:
  - Using Prometheus as a data source (pull-based architecture)
  - Using InfluxDB as a data source (push-based architecture)
- Created a dashboard for each as a proof of concept of our idea.
- Created a document on the basic terminology of Prometheus and Grafana for beginners.
- Link to commit: https://github.com/darshank15/GSoC_2020_FOSSOlogy/commit/121695cc6569b9f0a042d6b88f8bd8fc287633a7
- Discuss the project and its goals with the mentor to understand them more clearly.
- Understand bash scripting, as it is used in many fossology scripts.
- Look into some docker commands and learn docker-compose.
- Take a look at the fossology branch for fossdash, `dev/fossdash-exporter`
- Run it locally to see the data generated by the Python script "fossdash-publish.py"
- Solve an issue by modifying the above Python file
- Link to Issue: https://github.com/darshank15/GSoC_2020_FOSSOlogy/issues/1
Initially, the Python script generated data in the format below:
agents_count.copyright,instance=c7fe15ee-5c9c-4687-91d8-b1ba840e6b00 value=1 1591286501000000000
agents_count.ecc,instance=c7fe15ee-5c9c-4687-91d8-b1ba840e6b00 value=1 1591286501000000000
We want to modify the Python script to emit the data as below, with a new tag for the type of agent_count:
agents_count,instance=c7fe15ee-5c9c-4687-91d8-b1ba840e6b00,type=copyright value=1 1591246503000000000
agents_count,instance=c7fe15ee-5c9c-4687-91d8-b1ba840e6b00,type=ecc value=1 1591246503000000000
This way we can group by INSTANCE as well as by TYPE.
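The change from a dotted measurement name to a separate `type` tag can be sketched as a small transformation on a line-protocol record. This is only an illustration of the format change shown above, not the publisher's actual code; the function name is hypothetical, and the sketch assumes the measurement and tags contain no escaped spaces or commas.

```python
def split_type_into_tag(line: str) -> str:
    """Turn 'name.type,tags fields ts' into 'name,tags,type=type fields ts'."""
    # The first space separates "measurement,tags" from the fields and timestamp.
    key, rest = line.split(" ", 1)
    measurement, _, tags = key.partition(",")
    name, dot, agent_type = measurement.partition(".")
    if not dot:
        return line  # nothing to split, leave the record unchanged
    new_key = name
    if tags:
        new_key += "," + tags
    new_key += ",type=" + agent_type
    return new_key + " " + rest
```

With the tag in place, an InfluxDB query can group by `type` within the single `agents_count` measurement instead of needing one measurement per agent.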
- Made a simple dashboard for the above newly generated data for the showcase.
- Look into an issue for configuring fossdash to work on both Docker containers as well as in source code.
- the issue is here: https://github.com/darshank15/GSoC_2020_FOSSOlogy/issues/2
- Look into Makefile for configuration
- Changed the code in `common-sysconfig.php` to add an input box in the UI to configure the FossDash URL and store it in the database table sysconfig.
- Wrote a script `run_me.py`, which is triggered to read the updated data from the database and modify the fossdash.conf file.
- I also started working on code to include VERSION info in InfluxDB.
- The issue is here: https://github.com/darshank15/GSoC_2020_FOSSOlogy/issues/3
- Currently, all version and build info is fetched from the VERSION file. Later on, we can get this data from the database table.
- Continue on the Configuration of fossdash.
- Continue on including VERSION and build info into influxDB.
- Created a chart panel for it in Grafana.
- Link to new commits: https://github.com/darshank15/fossology/commit/9d741682df78aa728e704a91af2a1a772eeac1f7
- The new dashboard looks as below.
- Started a wiki page for the work done till now.
- Included introduction, architecture, and codebase details of fossdash.
- Included weekly report details.
- Currently, we have two scripts, `fossdash-publish.py` (Python script) and `fossdash-publish.run` (bash script), plus the `fossdash.conf` file.
- We can merge these two scripts into a single script, which is easier to maintain and execute for fossdash.
- Converted and rewrote the bash code of fossdash-publish.run into the Python file fossdash-publish.py
- Segregated the VERSION content into three parts: version, total commits since the last release, and latest commit.
- Link to Issue: https://github.com/darshank15/GSoC_2020_FOSSOlogy/issues/5
- Link to commit: https://github.com/Orange-OpenSource/fossology/pull/1/commits/7226d27becd544d083d1a5d1feeaa0f3a088ae92
- New Dashboard looks as follows.
- Rename UUID on a Fossology instance
- Link to Issue: https://github.com/darshank15/GSoC_2020_FOSSOlogy/issues/6
- Done with the first GSoC evaluation.
- Task-1: Cleaning old fossdash reported files.
  - As of now, we keep all reported files after sending the data to InfluxDB. Since the fossdash script may run every day, the number of reported files on local disk grows and consumes more disk space over time.
  - Implemented this functionality using the `find` command (ctime, maxdepth) to find and delete older reported files to save disk space.
- Task-2: Cron job configuration to schedule an interval for the fossdash script.
  - From the configuration, the user can change the cron job schedule interval for fossdash.
  - Done using the `crontab` command to update the cron job interval.
- Task-3: Implemented an Enable/Disable button to control the functionality of fossdash.
- Task-3 Implemented Enable/Disable button to control the functionality of the fossdash.
- Link to Issue: https://github.com/darshank15/GSoC_2020_FOSSOlogy/issues/7
- Fixed the issue: Wrong permissions for fossdash folder
- Fixed the issue: Enabling Fossdash overrides existing CRON Setting
- Fixed the issue: Change metrics data filename to include a timestamp
- Until now, when Instance_name was changed, the script fetched only the old reported files and replaced the old name with the new one. Now I have included code to fetch all unreported files as well and replace the fossology instance name in them.
- Removed the static path used to run the fossdash script from PHP code, using the GLOBALS variable instead: $GLOBALS['SysConf']['DIRECTORIES']['LIBEXECDIR'];
- On disabling fossdash, we grep for the fossdash cron entry and remove that single entry rather than running `crontab -r`
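The idea behind the last fix, dropping only the fossdash line instead of wiping the whole crontab, can be sketched on the crontab text itself. The function names, the marker string, and the paths in the usage note are hypothetical and not taken from the fossology code; the real implementation works through the `crontab` command and grep.

```python
def remove_cron_entries(crontab_text: str, marker: str) -> str:
    """Drop only the lines containing marker, keeping all other cron entries."""
    kept = [line for line in crontab_text.splitlines() if marker not in line]
    return "\n".join(kept) + ("\n" if kept else "")


def set_cron_entry(crontab_text: str, schedule: str, command: str, marker: str) -> str:
    """Replace any existing marked entry with a single new one.

    The marker must appear in command so the entry can be found again later.
    """
    base = remove_cron_entries(crontab_text, marker)
    return base + f"{schedule} {command}\n"
```

For instance, calling `set_cron_entry(existing, "0 1 * * *", "/opt/fossdash-publish.py", "fossdash-publish")` leaves unrelated jobs in `existing` untouched and keeps exactly one fossdash line, which is also what avoids overriding an existing cron setting when fossdash is enabled.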
- Added influxDB authentication by admin username and password.
- Link to Issue: https://github.com/darshank15/GSoC_2020_FOSSOlogy/issues/12
- Part 1: Enabled authentication in InfluxDB. Commit: Orange-OpenSource/fossdash@beff4fc
- Part 2: In the publisher script, added an authentication header to authenticate the request that sends the data. Commit: Orange-OpenSource/fossology@3b28e4a
- Create generic Instance-oriented dashboard
- Link to Issue: https://github.com/darshank15/GSoC_2020_FOSSOlogy/issues/14
- Added an INSTANCE type in every metric field. So that we can perform filtering based on Instance ID.
- Created Instance specific Dashboard.
- Added token-based authentication for InfluxDB.
- The token (JWT) is generated from the shared secret key, the username, and the expiration time.
- Link to Commit: https://github.com/Orange-OpenSource/fossology/commit/0f81a157edcab002c99f5e0206fd8d4496ed8784
- An existing user cannot be overwritten with a different password, so added a feature to handle the case where only the password changes and the username remains the same.
- Created an instance-specific dashboard with a dropdown menu to select a particular instance.
- See the image here: https://github.com/darshank15/GSoC_2020_FOSSOlogy/blob/master/dashboard-images/4_instance_specific_dashboard_images_21_July_2020.png
- Added an error message when a wrong input value is entered.
- Link to Issue: https://github.com/darshank15/GSoC_2020_FOSSOlogy/issues/15
- Added a drop-down for the two kinds of InfluxDB authentication, showing or hiding the auth fields below it accordingly.
- Link to Issue: https://github.com/darshank15/GSoC_2020_FOSSOlogy/issues/20
- Earlier, some cron configurations were ignored. Corrected the cron validation.
- Updated the wiki page about how to get started with fossdash.
- Refactored the code to convert the fossdash script from Python 2 to Python 3.
- Fixed a small bug: on page reload, showHide was not working.
- Rebased and resolved merge conflicts to get the latest changes from the master branch.
- Wrote a script for local installation of Grafana and InfluxDB.
- Link to Issue: https://github.com/darshank15/GSoC_2020_FOSSOlogy/issues/18
- Refactoring the fossdash-publisher script to remove the metric queries from the code and move them into configuration.