Information Processing and Retrieval (PRI) Project (2024/2025): MSc in Informatics and Computing Engineering @ FEUP

kikoveiga/feup-pri

FEUP-PRI

    This project applies Information Processing and Retrieval techniques to the study of Portuguese monuments. We aim to build an efficient system for collecting, organizing, and retrieving data about historical landmarks across Portugal. The work contributes to the digital preservation of cultural heritage and supports user-friendly tools for education and tourism.

Milestone #1 - Data Preparation

    The first milestone of this project focuses on data collection and processing. We began by choosing the websites to collect from, and settled on two sources covering three pages: Rota do Românico; Wikipedia - List of National Monuments; and Wikipedia - Categoria: Imóveis de interesse público em Portugal (Category: Public Interest Real Estate in Portugal).

How the code works

    While exploring the websites, we found that each one has a different HTML structure and that, in some cases, pages for different monuments on the same website are structured differently. To handle this, we developed three distinct web scrapers: one for Rota do Românico; one for Wikipedia - List of National Monuments; and one for Wikipedia - List of Public Interest Real Estate. Each link leads to a detailed explanation of how the data collection process and pipeline were implemented for that source.
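
    The idea of one scraper per HTML structure, all producing the same kind of record, can be illustrated with a minimal sketch. This is not the project's actual code: the language (Python), the `Monument` record, the `TagTextCollector` helper, the source keys, and the tag chosen for each source are all assumptions made for illustration, and the sample HTML uses placeholder names rather than real scraped data.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class Monument:
    """Hypothetical common record that every scraper emits."""
    name: str
    source: str


class TagTextCollector(HTMLParser):
    """Collects the text found inside every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0    # greater than 0 while inside the target tag
        self.texts = []   # one string per top-level occurrence of the tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.texts.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.texts[-1] += data


def extract_names(html, tag, source):
    """Return one Monument per non-empty occurrence of `tag` in `html`."""
    collector = TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    return [Monument(text.strip(), source)
            for text in collector.texts if text.strip()]


# Assumed layouts: each source keeps monument names in a different tag.
SCRAPERS = {
    "rota-do-romanico": lambda html: extract_names(html, "h3", "rota-do-romanico"),
    "wikipedia-list": lambda html: extract_names(html, "td", "wikipedia-list"),
    "wikipedia-category": lambda html: extract_names(html, "li", "wikipedia-category"),
}


def scrape(source, html):
    """Dispatch to the scraper registered for `source`."""
    return SCRAPERS[source](html)


if __name__ == "__main__":
    # Placeholder markup, not real page content.
    sample = "<ul><li><a href='#'>Example Monument A</a></li><li> Example Monument B </li></ul>"
    for monument in scrape("wikipedia-category", sample):
        print(monument)
```

    Keeping the per-source differences behind a single `scrape` entry point means later pipeline steps only ever see `Monument` records, whichever page they came from. A real scraper would also need to fetch the pages and handle the per-monument layout differences mentioned above.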
