Create data catalog (#127)
* Update dbt_project

* Create macro `get_models_with_tags`

* Add 'geolocalizacao' tags

* Add 'identificacao' tags

* Create `catalogo` models

* Add new function to `dev/utils.py`

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update CHANGELOG

* Update CHANGELOG

* Update model `operadoras.sql`

* Update changelog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
eng-rodrigocunha and pre-commit-ci[bot] authored Aug 2, 2024
1 parent 88cf689 commit f13d72d
Showing 23 changed files with 172 additions and 11 deletions.
5 changes: 4 additions & 1 deletion queries/dbt_project.yml
@@ -285,4 +285,7 @@ models:
+schema: validacao_dados_jae
staging:
+materialized: incremental
+schema: validacao_dados_jae_staging
+schema: validacao_dados_jae_staging
catalogo:
+materialized: view
+schema: catalogo
15 changes: 15 additions & 0 deletions queries/dev/utils.py
@@ -5,6 +5,8 @@
# from datetime import timedelta
from typing import Dict, List, Union

import requests

# import pandas as pd


@@ -62,3 +64,16 @@ def run_dbt_model(

print(f"\n>>> RUNNING: {run_command}\n")
os.system(run_command)


def fetch_dataset_sha(dataset_id: str):
"""Fetches the SHA of a branch from Github"""
url = "https://api.github.com/repos/prefeitura-rio/queries-rj-smtr"
url += f"/commits?queries-rj-smtr/rj_smtr/{dataset_id}"
response = requests.get(url)

if response.status_code != 200:
return None

dataset_version = response.json()[0]["sha"]
return {"version": dataset_version}
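The query string in `fetch_dataset_sha` above carries no parameter name. GitHub's list-commits endpoint filters by a `path` parameter, so a variant along the following lines may be closer to the intent. This is only a sketch: the `rj_smtr/<dataset_id>` directory layout, the helper names `build_commits_url` and `http_get_json`, and the injectable `get_json` argument are assumptions for illustration, not part of the commit. It uses the standard library instead of `requests` so that it runs on its own.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, List, Optional

REPO_API = "https://api.github.com/repos/prefeitura-rio/queries-rj-smtr"

JsonFetcher = Callable[[str], Optional[List[Dict[str, Any]]]]


def build_commits_url(dataset_id: str) -> str:
    """Build a list-commits URL filtered to one dataset directory."""
    # Assumption: each dataset lives under rj_smtr/<dataset_id> in the repository.
    query = urllib.parse.urlencode({"path": f"rj_smtr/{dataset_id}", "per_page": 1})
    return f"{REPO_API}/commits?{query}"


def http_get_json(url: str) -> Optional[List[Dict[str, Any]]]:
    """Return the decoded JSON body, or None on a network, HTTP or decoding error."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None


def fetch_dataset_sha(
    dataset_id: str, get_json: JsonFetcher = http_get_json
) -> Optional[Dict[str, str]]:
    """Return {"version": <sha>} for the newest commit touching the dataset, or None."""
    commits = get_json(build_commits_url(dataset_id))
    if not commits:
        # Covers request failures and paths with no commit history.
        return None
    return {"version": commits[0]["sha"]}
```

Passing the fetcher as an argument keeps the function testable without network access, for example `fetch_dataset_sha("gtfs", lambda url: [{"sha": "abc123"}])`.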
18 changes: 18 additions & 0 deletions queries/macros/get_models_with_tags.sql
@@ -0,0 +1,18 @@
/* https://discourse.getdbt.com/t/get-all-dbt-table-model-names-from-a-tag-inside-another-model/7703 (modified) */
{% macro get_models_with_tags(tags) %}

{% set models_with_tag = [] %}

{% for model in graph.nodes.values() | selectattr("resource_type", "equalto", "model") %}

{% for tag in tags %}
{% if tag in model.config.tags %}
{{ models_with_tag.append(model) }}
{% endif %}
{% endfor %}

{% endfor %}

{{ return(models_with_tag) }}

{% endmacro %}
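For readers less familiar with Jinja, the selection logic of the macro can be sketched in Python. The dictionary layout below only mimics the parts of dbt's `graph.nodes` that the macro reads (`resource_type` and `config.tags`), and the sample data is made up. One deliberate difference: the macro appends a model once per matching tag, so a model carrying both tags would appear twice, whereas this sketch returns each model at most once.

```python
from typing import Any, Dict, Iterable, List


def get_models_with_tags(
    nodes: Dict[str, Dict[str, Any]], tags: Iterable[str]
) -> List[Dict[str, Any]]:
    """Return the model nodes that carry at least one of the requested tags."""
    wanted = set(tags)
    selected = []
    for node in nodes.values():
        # Skip seeds, tests, snapshots and other non-model resources.
        if node.get("resource_type") != "model":
            continue
        node_tags = node.get("config", {}).get("tags", [])
        # Unlike the macro, add the node once even if several tags match.
        if wanted.intersection(node_tags):
            selected.append(node)
    return selected


# Hypothetical graph with three nodes, for illustration only.
sample_nodes = {
    "model.a": {"resource_type": "model", "alias": "a",
                "config": {"tags": ["geolocalizacao", "identificacao"]}},
    "model.b": {"resource_type": "model", "alias": "b", "config": {"tags": []}},
    "test.c": {"resource_type": "test", "alias": "c",
               "config": {"tags": ["geolocalizacao"]}},
}
matches = get_models_with_tags(sample_nodes, ["geolocalizacao", "identificacao"])
print([node["alias"] for node in matches])
```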
6 changes: 6 additions & 0 deletions queries/models/br_rj_riodejaneiro_bilhetagem/CHANGELOG.md
@@ -1,5 +1,11 @@
# Changelog - bilhetagem

## [2.1.4] - 2024-08-02

### Changed
- Add tag `geolocalizacao` to models `gps_validador_van.sql` and `gps_validador.sql` (https://github.com/prefeitura-rio/pipelines_rj_smtr/pull/127)
- Add tag `identificacao` to model `staging_cliente.sql` (https://github.com/prefeitura-rio/pipelines_rj_smtr/pull/127)

## [2.1.3] - 2024-07-18

### Added
@@ -6,6 +6,7 @@
"data_type":"date",
"granularity": "day"
},
tags=['geolocalizacao']
)
}}

@@ -6,6 +6,7 @@
"data_type":"date",
"granularity": "day"
},
tags=['geolocalizacao']
)
}}

@@ -1,6 +1,7 @@
{{
config(
alias='cliente',
tags=['identificacao']
)
}}

@@ -1,5 +1,10 @@
# Changelog - onibus_gps_zirix

## [1.0.3] - 2024-08-02

### Changed
- Add tag `geolocalizacao` to model `gps_sppo_zirix.sql` (https://github.com/prefeitura-rio/pipelines_rj_smtr/pull/127)

## [1.0.2] - 2024-07-02

### Added
@@ -6,7 +6,8 @@
'data_type':'date',
'granularity': 'day'
},
alias='gps_sppo'
alias='gps_sppo',
tags=['geolocalizacao']
)
}}
/*
6 changes: 6 additions & 0 deletions queries/models/br_rj_riodejaneiro_veiculos/CHANGELOG.md
@@ -0,0 +1,6 @@
# Changelog - br_rj_riodejaneiro_veiculos

## [1.0.1] - 2024-08-02

### Changed
- Add tag `geolocalizacao` to models `gps_brt.sql` and `gps_sppo.sql` (https://github.com/prefeitura-rio/pipelines_rj_smtr/pull/127)
3 changes: 2 additions & 1 deletion queries/models/br_rj_riodejaneiro_veiculos/gps_brt.sql
@@ -5,7 +5,8 @@
'field': 'data',
'data_type': 'date',
'granularity': 'day'
}
},
tags=['geolocalizacao']
)
}}
/*
3 changes: 2 additions & 1 deletion queries/models/br_rj_riodejaneiro_veiculos/gps_sppo.sql
@@ -5,7 +5,8 @@
'field':"data",
'data_type':'date',
'granularity': 'day'
}
},
tags=['geolocalizacao']
)
}}
/*
6 changes: 6 additions & 0 deletions queries/models/cadastro/CHANGELOG.md
@@ -1,5 +1,11 @@
# Changelog - cadastro

## [1.2.1] - 2024-08-02

### Changed
- Add tag `geolocalizacao` to model `servicos.sql` (https://github.com/prefeitura-rio/pipelines_rj_smtr/pull/127)
- Add tag `identificacao` to model `operadoras.sql` (https://github.com/prefeitura-rio/pipelines_rj_smtr/pull/127)

## [1.2.0] - 2024-07-17

### Added
3 changes: 2 additions & 1 deletion queries/models/cadastro/operadoras.sql
@@ -1,6 +1,7 @@
{{
config(
materialized="table"
materialized="table",
tags=["identificacao"]
)
}}

5 changes: 3 additions & 2 deletions queries/models/cadastro/servicos.sql
@@ -1,7 +1,8 @@
{{
config(
materialized='table'
)
materialized='table',
tags=['geolocalizacao']
),
}}

SELECT
20 changes: 20 additions & 0 deletions queries/models/catalogo/ed_metadado_coluna.sql
@@ -0,0 +1,20 @@
{% if execute %}
{% set models_with_tag = get_models_with_tags(["geolocalizacao", "identificacao"]) %}
{% do log("Models: \n", info=true) %}
{% for model in models_with_tag %}
{% do log(model.schema~"."~model.alias~"\n", info=true) %}
{% endfor %}
{% endif %}

SELECT
*
FROM
{{ ref("metadado_coluna") }}
WHERE
{% for model in models_with_tag %}
{% if not loop.first %}OR {% endif %}(dataset_id = "{{ model.schema }}"
AND table_id = "{{ model.alias }}")
{% endfor %}

OR (dataset_id = "br_rj_riodejaneiro_stpl_gps"
AND table_id = "registros")
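The `WHERE` clause that the Jinja loop above assembles can be illustrated with a small Python sketch. The function name `build_catalog_filter` and the sample inputs are hypothetical. The sketch also handles two cases that the template does not address explicitly: repeated models are emitted once, and an empty selection yields `FALSE` rather than a clause that begins with `OR`, which is what the template would render if no model matched.

```python
from typing import Dict, Iterable, List, Tuple


def build_catalog_filter(
    models: Iterable[Dict[str, str]],
    extra_tables: Iterable[Tuple[str, str]] = (),
) -> str:
    """Render an OR-chain of (dataset_id, table_id) predicates for a WHERE clause."""
    candidates = [(m["schema"], m["alias"]) for m in models] + list(extra_tables)
    pairs: List[Tuple[str, str]] = []
    for pair in candidates:
        # Keep first-seen order while dropping repeated tables.
        if pair not in pairs:
            pairs.append(pair)
    if not pairs:
        # Nothing selected: return a predicate that matches no rows.
        return "FALSE"
    clauses = [f'(dataset_id = "{d}" AND table_id = "{t}")' for d, t in pairs]
    return "\n  OR ".join(clauses)


# Hypothetical tagged models plus the hard-coded table from the model above.
tagged = [{"schema": "gtfs", "alias": "stops"}, {"schema": "gtfs", "alias": "stops"}]
print(build_catalog_filter(tagged, [("br_rj_riodejaneiro_stpl_gps", "registros")]))
```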
9 changes: 9 additions & 0 deletions queries/models/catalogo/metadado_coluna.sql
@@ -0,0 +1,9 @@
SELECT
table_catalog AS project_id,
table_schema AS dataset_id,
table_name AS table_id,
column_name,
data_type,
description
FROM
rj-smtr.`region-US`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
33 changes: 33 additions & 0 deletions queries/models/catalogo/schema.yml
@@ -0,0 +1,33 @@
version: 2

models:
- name: ed_metadado_coluna
description: "Catalog of geolocation and identification data from the SMTR data lake, intended for the Escritório de Dados (GP/ED)"
columns:
- name: project_id
description: "{{ doc('project_id') }}"
- name: dataset_id
description: "{{ doc('dataset_id') }}"
- name: table_id
description: "{{ doc('table_id') }}"
- name: column_name
description: "{{ doc('column_name') }}"
- name: data_type
description: "{{ doc('data_type') }}"
- name: description
description: "{{ doc('metadado_descricao') }}"
- name: metadado_coluna
description: "Data catalog of the SMTR data lake"
columns:
- name: project_id
description: "{{ doc('project_id') }}"
- name: dataset_id
description: "{{ doc('dataset_id') }}"
- name: table_id
description: "{{ doc('table_id') }}"
- name: column_name
description: "{{ doc('column_name') }}"
- name: data_type
description: "{{ doc('data_type') }}"
- name: description
description: "{{ doc('metadado_descricao') }}"
26 changes: 25 additions & 1 deletion queries/models/docs.md
@@ -1,3 +1,27 @@
{% docs consorcio %}
Consortium to which the service belongs
{% enddocs %}
{% enddocs %}

{% docs project_id %}
Project name (rj-smtr)
{% enddocs %}

{% docs dataset_id %}
Dataset name
{% enddocs %}

{% docs table_id %}
Table name
{% enddocs %}

{% docs column_name %}
Column name
{% enddocs %}

{% docs data_type %}
Data type of the column
{% enddocs %}

{% docs metadado_descricao %}
Column description
{% enddocs %}
5 changes: 5 additions & 0 deletions queries/models/gtfs/CHANGELOG.md
@@ -1,5 +1,10 @@
# Changelog - gtfs

## [1.1.8] - 2024-08-02

### Changed
- Add tag `geolocalizacao` to models `shapes_geom_gtfs.sql`, `shapes_gtfs.sql` and `stops_gtfs.sql` (https://github.com/prefeitura-rio/pipelines_rj_smtr/pull/127)

## [1.1.7] - 2024-07-23

### Added
3 changes: 2 additions & 1 deletion queries/models/gtfs/shapes_geom_gtfs.sql
@@ -3,7 +3,8 @@
'data_type' :'date',
'granularity': 'day' },
unique_key = ['shape_id', 'feed_start_date'],
alias = 'shapes_geom'
alias = 'shapes_geom',
tags=['geolocalizacao']
) }}

{% if execute and is_incremental() %}
3 changes: 2 additions & 1 deletion queries/models/gtfs/shapes_gtfs.sql
@@ -3,7 +3,8 @@
'data_type' :'date',
'granularity': 'day' },
unique_key = ['shape_id', 'shape_pt_sequence', 'feed_start_date'],
alias = 'shapes'
alias = 'shapes',
tags=['geolocalizacao']
)}}

{% if execute and is_incremental() %}
3 changes: 2 additions & 1 deletion queries/models/gtfs/stops_gtfs.sql
@@ -3,7 +3,8 @@
'data_type' :'date',
'granularity': 'day' },
unique_key = ['stop_id', 'feed_start_date'],
alias = 'stops'
alias = 'stops',
tags=['geolocalizacao']
)}}

{% if execute and is_incremental() %}