FEUP-PRI

This project explores the application of Information Processing and Retrieval techniques to the study of Portuguese monuments. We aim to develop an efficient system for collecting, organizing, and retrieving relevant data about historical landmarks across Portugal. This work contributes to the digital preservation of cultural heritage and supports the creation of user-friendly tools for educational and touristic purposes.

Milestone #1 - Data Preparation

The first milestone of this project focuses on data collection and processing. To collect the data, we first identified the websites to draw information from. We selected two sources, accessed through three specific pages: Rota do Românico; Wikipedia - List of National Monuments; and Wikipedia - Categoria: Imóveis de interesse público em Portugal (Properties of Public Interest in Portugal).

How the code works

As we explored the websites, we found that each one had a different HTML structure and that, in some cases, pages within the same website used different HTML structures for different monuments. To address this, we developed three distinct web scrapers: one for Rota do Românico, one for Wikipedia - List of National Monuments, and one for Wikipedia - Categoria: Imóveis de interesse público em Portugal. The link for each scraper leads to a detailed explanation of how the data collection process and pipeline were implemented for that source.
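The per-source design described above can be sketched with Python's standard `html.parser` module. This is a minimal illustration, not the project's actual scrapers: the source keys, the `(tag, class)` selectors, and the functions `extract_text`, `first_match` and `scrape_monument` are all hypothetical. Only the `firstHeading` class reflects Wikipedia's usual page markup, and even that is an assumption to verify against the live pages. Each source gets an ordered list of selectors, so a site whose monument pages use different layouts is handled by trying each known layout in turn.

```python
from html.parser import HTMLParser
from typing import Optional


class _TextCollector(HTMLParser):
    """Collects the text inside the first element matching a tag and optional CSS class."""

    def __init__(self, tag: str, css_class: Optional[str] = None):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while inside the matched element (counts nested same tags)
        self.done = False   # set once the first matching element has been closed
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth > 0:
            if tag == self.tag:
                self.depth += 1
            return
        if tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class is None or self.css_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0 and not self.done:
            self.parts.append(data)


def extract_text(html: str, tag: str, css_class: Optional[str] = None) -> Optional[str]:
    """Return the whitespace-normalised text of the first match, or None if absent or empty."""
    parser = _TextCollector(tag, css_class)
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text or None


# Hypothetical selectors for the monument name, tried in order for each source.
# A source whose pages use several layouts simply lists one selector per layout.
SELECTORS: dict[str, list[tuple[str, Optional[str]]]] = {
    "rota_do_romanico": [("h1", "monument-title"), ("h2", "title")],
    "wikipedia_national_monuments": [("h1", "firstHeading")],
    "wikipedia_public_interest": [("h1", "firstHeading")],
}


def first_match(html: str, selectors: list[tuple[str, Optional[str]]]) -> Optional[str]:
    """Try each (tag, class) selector in order and return the first non-empty text."""
    for tag, css_class in selectors:
        value = extract_text(html, tag, css_class)
        if value is not None:
            return value
    return None


def scrape_monument(source: str, html: str) -> dict:
    """Dispatch a page to its source-specific selectors and return a small record."""
    if source not in SELECTORS:
        raise ValueError(f"unknown source: {source}")
    return {"source": source, "name": first_match(html, SELECTORS[source])}
```

For example, `scrape_monument("rota_do_romanico", '<h2 class="title">Mosteiro de Pombeiro</h2>')` falls back to the second selector and returns `{"source": "rota_do_romanico", "name": "Mosteiro de Pombeiro"}`. Keeping the layouts as data rather than separate code paths means that supporting a newly discovered page layout only requires adding a selector to the relevant list.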
