HELLODATA-1881 - sftpgo
Slawomir Wieczorek committed Jan 22, 2025
1 parent 182970b commit 224f526
Showing 176 changed files with 54,037 additions and 54 deletions.
Binary file added docs/docs/images/20250121155015.png
138 changes: 85 additions & 53 deletions docs/docs/manuals/user-manual.md
@@ -2,45 +2,54 @@

## Goal

This user manual should enable you to use the HelloDATA platform and illustrate the features of the product and how to use them.
This user manual should enable you to use the HelloDATA platform and illustrate the features of the product and how to
use them.

→ More about the Platform and its architecture you can find on [Architecture & Concepts](../architecture/architecture.md).
→ More about the Platform and its architecture you can find
on [Architecture & Concepts](../architecture/architecture.md).

## Navigation

### Portal

The entry page of HelloDATA is the [Web Portal](../architecture/data-stack.md#control-pane-portal).

1. Navigation to jump to the different capabilities of HelloDATA
2. Extended status information about
    1. data pipelines, containers, performance and security
    2. documentation and subscriptions
3. User and profile information of the logged-in user.
4. Overview of your dashboards

![](../images/1068204566.png)

#### Business & Data Domain
As explained in [Domain View](../architecture/architecture.md#domain-view), a key feature is to create business domains with n data domains. If you have access to more than one data domain, you can switch between them using the `drop-down` at the top.

As explained in [Domain View](../architecture/architecture.md#domain-view), a key feature is to create business domains
with n data domains. If you have access to more than one data domain, you can switch between them using the `drop-down`
at the top.

![](../images/Pasted%20image%2020231130145958.png)

### Dashboards

The most important navigation button is the dashboards link. If you hover over it, you'll see three options to choose from.
The most important navigation button is the dashboards link. If you hover over it, you'll see three options to choose
from.

You can either click the dashboard list in the hover menu (2) to see the list of dashboards with thumbnails, or directly choose your dashboard (3).
You can either click the dashboard list in the hover menu (2) to see the list of dashboards with thumbnails, or directly
choose your dashboard (3).

![](../images/1068204575.png)

### Data-Lineage

To see the data lineage (dependencies of your data tables), use the second menu option. Again, you choose the list or click directly on "data lineage" (2).
To see the data lineage (dependencies of your data tables), use the second menu option. Again, you choose the list
or click directly on "data lineage" (2).

Button 2 will bring you to the project site, where you choose your project and load the lineage.
![](../images/1068204578.png)

Once loaded, you see all sources (1) and dbt Projects (2). On the detail page, you can see all the beautiful and helpful documentation such as:
Once loaded, you see all sources (1) and dbt Projects (2). On the detail page, you can see all the beautiful and helpful
documentation such as:

- table name (3)
- columns and data types (4)
@@ -61,58 +70,67 @@ This view lets you access the universal data mart (udm) layer:

![](../images/Pasted%20image%2020231130155512.png)

These are cleaned and modeled data mart tables. Data marts are the tables that have been joined and cleaned from the source tables. This is effectively the last layer of HelloDATA BE, and the one the dashboards access. Dashboards should not access any earlier layer (landing zone, data storage, or data processing).
These are cleaned and modeled data mart tables. Data marts are the tables that have been joined and cleaned from the
source tables. This is effectively the last layer of HelloDATA BE, and the one the dashboards access. Dashboards
should not access any earlier layer (landing zone, data storage, or data processing).

We use CloudBeaver for this, same as the DWH Viewer later.
![](../images/Pasted%20image%2020231130155752.png)


### Data Engineering

#### DWH Viewer

This is essentially a database access layer where you see all your tables, and you can write SQL queries based on your access roles with a provided tool ([CloudBeaver](https://github.com/dbeaver/cloudbeaver)).
This is essentially a database access layer where you see all your tables, and you can write SQL queries based on your
access roles with a provided tool ([CloudBeaver](https://github.com/dbeaver/cloudbeaver)).

##### Create new SQL Query

![](../images/Pasted%20image%2020231130154714.png)

##### Choose Connection and stored queries
You can choose pre-defined connections and query your data warehouse. You can also store queries that other users can see and use as well. Run your queries with (1).

![](../images/Pasted%20image%2020231130154943.png)
You can choose pre-defined connections and query your data warehouse. You can also store queries that other users can
see and use as well. Run your queries with (1).

![](../images/Pasted%20image%2020231130154943.png)

##### Settings and Powerful features

You can configure many settings, such as user status.

![](../images/Pasted%20image%2020231130154849.png)
Please find all settings and features in the [CloudBeaver Documentation](https://dbeaver.com/docs/cloudbeaver/).

#### Orchestration

The orchestrator is your task manager. You tell [Airflow](https://wiki.bedag.ch/pages/viewpage.action?pageId=1040683176#HDTechArchitecture&Concepts-TaskOrchestration-Airflow), our orchestrator, in which order the tasks will run. This is usually done ahead of time, and in the portal, you can see the latest runs and their status (successful, failed, etc.).
The orchestrator is your task manager. You
tell [Airflow](https://wiki.bedag.ch/pages/viewpage.action?pageId=1040683176#HDTechArchitecture&Concepts-TaskOrchestration-Airflow),
our orchestrator, in which order the tasks will run. This is usually done ahead of time, and in the portal, you can see
the latest runs and their status (successful, failed, etc.).
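
The idea of telling an orchestrator in which order tasks run can be sketched without Airflow itself. The following is a minimal illustration that uses only Python's standard library `graphlib`; it is not Airflow's API, and the task names (`extract`, `load`, `transform`, `refresh_dashboard`) are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "load": {"extract"},
    "transform": {"load"},
    "refresh_dashboard": {"transform"},
}


def run_order(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

In Airflow, comparable dependencies are declared inside a DAG definition, and the scheduler derives the run order from them.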

- You can navigate to DAGs (2) and see all the details (3) with the DAG name, owner, runs, schedules, next run and recent.
- You can navigate to DAGs (2) and see all the details (3) with the DAG name, owner, runs, schedules, next run and
recent.
- You can also dive deeper into Datasets, Security, Admin or similar (4)
- Airflow offers lots of different visualization modes, e.g. the Graph view (6), that allows you to see each step of this task.
- Airflow offers lots of different visualization modes, e.g. the Graph view (6), that allows you to see each step of
this task.
- As you can see, you can choose calendar, task duration, Gantt, etc.

![](../images/1068204596.png)
![](../images/1068204607.png)

#### Jupyter Notebooks (Jupyter Hub)

If you have one of the roles of `HELLODATA_ADMIN`, `BUSINESS_DOMAIN_ADMIN`, or `DATA_DOMAIN_ADMIN`, you can access Jupyter Hub and its notebooks with:
If you have one of the roles of `HELLODATA_ADMIN`, `BUSINESS_DOMAIN_ADMIN`, or `DATA_DOMAIN_ADMIN`, you can access
Jupyter Hub and its notebooks with:

![](../images/Pasted%20image%2020240828153754.png)


That opens up Jupyter Hub where you choose the base image you want to start with. E.g. you choose Data Science to do ML workloads, or R if you solely want to work with R. This could look like this:
That opens up Jupyter Hub where you choose the base image you want to start with. E.g. you choose Data Science to do ML
workloads, or R if you solely want to work with R. This could look like this:

![](../images/Pasted%20image%2020240828153902.png)


Afterwards, you can start creating notebooks with `File -> New -> Notebook`:

![](../images/Pasted%20image%2020240828155058.png)
@@ -125,9 +143,12 @@ Afterwards, you can start running commands like you do in Jupyter Notebooks.
See the [official documentation](https://docs.jupyter.org/) for help or functions.

##### Connect to HD Postgres DB

By default, a connection to your own Postgres DB can be made.

The default session time is currently 24 hours and can be changed with the environment variable `HELLODATA_JUPYTERHUB_TEMP_USER_PASSWORD_VALID_IN_DAYS`.
The default session time is currently 24 hours and can be changed with the environment variable
`HELLODATA_JUPYTERHUB_TEMP_USER_PASSWORD_VALID_IN_DAYS`.

###### How to connect to the database

This is how to get a db-connection:
@@ -138,9 +159,11 @@ connection = connect() # use function, it fetches the temp user creds and establ
```

`connection` can then be used to read from Postgres.
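
Assuming the object returned by `connect()` follows the Python DB-API 2.0 interface (as `psycopg2` connections do), reading from it could look like the sketch below. The helper name `run_query` is hypothetical, and an in-memory SQLite connection stands in for the HelloDATA Postgres connection so that the snippet runs anywhere.

```python
import sqlite3


def run_query(connection, sql):
    """Run one statement on a DB-API 2.0 connection and return all rows."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        # Always release the cursor, even if the query fails.
        cursor.close()


# Stand-in connection (assumption): in the notebook you would pass the
# `connection` obtained from connect() instead of this SQLite one.
demo_connection = sqlite3.connect(":memory:")
rows = run_query(demo_connection, "SELECT 1 + 1")
demo_connection.close()
print(rows)
```

Keeping the query logic in a small function makes it easy to swap the stand-in for the real connection later.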

###### Example

This is a more extensive example of querying the Postgres database. Think of `SELECT version();` as a placeholder for your own query or logic.
This is a more extensive example of querying the Postgres database. Think of `SELECT version();` as a placeholder for
your own query or logic.

```python
import sys
@@ -182,6 +205,7 @@ Here you manage the portal configurations such as user, roles, announcements, FA
#### Benutzerverwaltung / User Management

##### Adding user

First type your email and hit enter. Then choose the drop down and click on it.
![](../images/Pasted%20image%2020231130151446.png)

@@ -193,6 +217,7 @@ You should see something like this:
![](../images/Pasted%20image%2020231130151610.png)

##### Changing Permissions

1. Search for the user whose permissions you want to grant or change
2. Scroll to the right
3. Click the green edit icon
@@ -208,20 +233,22 @@ And/or give access to specific data domains:
![](../images/Pasted%20image%2020231130151816.png)

See more in [role-authorization-concept](role-authorization-concept.md).

#### Portal Rollenverwaltung / Portal Role Management

In this portal role management, you can see all the roles that exist.

!!! warning

    Creating new roles is not supported, even though a "Rolle erstellen" button exists. All roles are predefined and hard-coded.


![](../images/Pasted%20image%2020231130152628.png)


##### Creating a new role

See how to create a new role below:
![](../images/Pasted%20image%2020231130152819.png)

#### Ankündigung / Announcement

You can simply create an announcement that goes to all users by `Ankündigung erstellen`:
@@ -236,9 +263,10 @@ You'll see a success if everything went well:
And this is how it looks to the users. It will appear until the user clicks the cross to close it.
![](../images/Pasted%20image%2020231130153220.png)


#### FAQ
The FAQ works the same as the announcements above. They are shown on the starting dashboard, but you can set the granularity of a data domain:

The FAQ works the same as the announcements above. They are shown on the starting dashboard, but you can set the
granularity of a data domain:

![](../images/Pasted%20image%2020231130153427.png)

@@ -247,66 +275,70 @@ And this is how it looks:

#### Dokumentationsmanagement / Documentation Management

Lastly, you can document the system with documentation management. Here you have one document in which you can document everything in detail, and everyone can write to it. It will appear on the dashboard as well:
Lastly, you can document the system with documentation management. Here you have one document in which you can document
everything in detail, and everyone can write to it. It will appear on the dashboard as well:

![](../images/Pasted%20image%2020231130153801.png)


### Monitoring

We provide two different ways of monitoring: 
We provide two different ways of monitoring:

- Status
- Workspaces

![](../images/1068204614.png)

#### Status
It shows you detailed information on the instances of HelloDATA, e.g. the state of the Portal and whether monitoring is running.

It shows you detailed information on the instances of HelloDATA, e.g. the state of the Portal and whether monitoring is
running.
![](../images/1068204616.png)

#### Data Domains

In the monitoring of your data domains, you see each system and the link to the native application. You can easily and quickly observe permissions, roles and users by different subsystems (1). Click the one you want, and you can choose different levels (2) for each, and see its permissions (3).
In the monitoring of your data domains, you see each system and the link to the native application. You can easily and
quickly observe permissions, roles and users by different subsystems (1). Click the one you want, and you can choose
different levels (2) for each, and see its permissions (3).

![](../images/1068204622.png)

![](../images/1068204620.png)

By clicking on the blue underlined `DBT Docs`, you are taken to the native dbt docs. The same is true if you click on an Airflow or Superset instance.
By clicking on the blue underlined `DBT Docs`, you are taken to the native dbt docs. The same is true if you click
on an Airflow or Superset instance.

### DevTools

DevTools are additional tools HelloDATA provides out of the box, e.g. to send mail (Mailbox) or browse files (FileBrowser).
DevTools are additional tools HelloDATA provides out of the box, e.g. to send mail (Mailbox) or browse files
(FileBrowser).

![](../images/1068204623.png)

#### Mailbox

You can check in Mailbox (we use [MailHog](https://github.com/mailhog/MailHog)) which emails have been sent or which accounts have been updated.
You can check in Mailbox (we use [MailHog](https://github.com/mailhog/MailHog)) which emails have been sent or which
accounts have been updated.

![](../images/1068204627.png)

#### FileBrowser

Here you can browse all the documentation or code from the git repos as in a file browser. We use [FileBrowser](https://github.com/filebrowser/filebrowser) here. Please use with care, as some of the folders are system relevant.

!!! note "Log in"

    Make sure you have the login credentials to log in. Your administrator should be able to provide these to you.

Here you can browse all the documentation or code from the git repos as in a file browser. We
use [SFTPGo](https://github.com/drakkan/sftpgo) here. Please use with care, as some of the folders are system relevant.

![](../images/1068204628.png)
![](../images/20250121155015.png)

## More: Know-How


- More help for Superset
- [Superset Documentation](https://superset.apache.org/docs/intro/)
- [Superset Documentation](https://superset.apache.org/docs/intro/)
- More help for dbt:
- [dbt Documentation](https://docs.getdbt.com/docs/collaborate/documentation)
- [dbt Developer Hub](https://docs.getdbt.com/)
- More about Airflow
- [Airflow Documentation](https://airflow.apache.org/docs/)


Find further important references, know-how, and best practices on [HelloDATA Know-How](https://confluence.bedag.ch/x/4wHXE).
- [dbt Documentation](https://docs.getdbt.com/docs/collaborate/documentation)
- [dbt Developer Hub](https://docs.getdbt.com/)
- More about Airflow
- [Airflow Documentation](https://airflow.apache.org/docs/)
- More about SFTPGo
- [SFTPGo Documentation](https://docs.sftpgo.com/)

Find further important references, know-how, and best practices
on [HelloDATA Know-How](https://confluence.bedag.ch/x/4wHXE).
37 changes: 37 additions & 0 deletions hello-data-sidecars/hello-data-sidecar-sftpgo/Dockerfile
@@ -0,0 +1,37 @@
#
# Copyright © 2024, Kanton Bern
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of the <organization> nor the
# names of its contributors may be used to endorse or promote products
# derived from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
# DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#

FROM eclipse-temurin:17.0.9_9-jre
LABEL MAINTAINER="HelloData Team"

# add 'app' user with uid 3012 and corresponding group
RUN useradd -d /app -m -s /bin/bash -u 3012 app -U
USER app
WORKDIR /app

COPY target/*.jar /app/app.jar
CMD ["java", "-jar", "/app/app.jar"]