diff --git a/styles.css b/styles.css index 34c030b..6667cf4 100644 --- a/styles.css +++ b/styles.css @@ -107,6 +107,10 @@ text-decoration: underline #ff3cc7 3px; } +.callout { + border-left-color: #a5d7d2 !important; +} + /* Print styles */ @media print { body { diff --git a/submissions/405/index.qmd b/submissions/405/index.qmd index 01c44b8..4d9f10d 100644 --- a/submissions/405/index.qmd +++ b/submissions/405/index.qmd @@ -19,10 +19,20 @@ key-points: - Data-driven research offers significant opportunities for analyzing large volumes of web archived data to reveal trends in the complexity of the structure of the preserved websites and the development of content. - Implementation of data-driven research in web history is challenging due to issues like data incompleteness, biases, and the diversity of file formats, which require the development of innovative solutions and digital research infrastructures. - The research highlights challenges in obtaining comprehensive datasets from web archives, underscores the importance of assessing data quality, and indicates the need to address the heterogeneous nature of data preserved in web archives. -date: 07-24-2024 +date: 09-12-2024 +doi: 10.5281/zenodo.13904210 +other-links: + - text: Presentation Slides (PDF) + href: https://doi.org/10.5281/zenodo.13904210 bibliography: references.bib --- +::: {.callout-note appearance="simple" icon=false} + +For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/13904210/files/405_DigiHistCH24_HistoryOfMuseums_Slides.pdf). + +::: + ## Introduction Data-driven approaches bring extensive opportunities for research to analyze large volumes of data, and gain new knowledge and insights. This is considered especially beneficial for implementation in the humanities and social sciences [@weichselbraun2021]. Application of data-driven research methodologies in the field of history requires a sufficient source base, which should be accurate, transparently shaped and large enough for robust analysis [@braake2016]. diff --git a/submissions/427/index.qmd b/submissions/427/index.qmd index 7df15f1..e562ae5 100644 --- a/submissions/427/index.qmd +++ b/submissions/427/index.qmd @@ -25,10 +25,21 @@ abstract: | key-points: - Software development is increasingly important in digital humanities research projects, yet many struggle to implement modern engineering practices that enhance sustainability and speed up development. - Developing an XML schema for a scholarly edition project is challenging but can provide a solid foundation for the project when executed effectively. -date: 07-25-2024 +date: 09-13-2024 +date-modified: 11-15-2024 +doi: 10.5281/zenodo.14171339 +other-links: + - text: Presentation Slides (PDF) + href: https://doi.org/10.5281/zenodo.14171339 bibliography: references.bib --- +::: {.callout-note appearance="simple" icon=false} + +For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/14171339/files/427_DigiHistCH24_SolidGround_Slides.pdf). + +::: + ## Introduction ### General Problem Description diff --git a/submissions/428/index.qmd b/submissions/428/index.qmd index 54f10eb..abd1f2c 100644 --- a/submissions/428/index.qmd +++ b/submissions/428/index.qmd @@ -14,51 +14,72 @@ keywords: abstract: | This project on digital data management explores the use of XML structures, specifically the Text Encoding Initiative (TEI), to digitize historical statistical tables from Zurich's 1918 pandemic data. 
The goal was to make these health statistics tables reusable, interoperable, and machine-readable. Following the retro-digitization of statistical publications by Zurich's Central Library, the content was semi-automatically captured with OCR in Excel and converted to XML using TEI guidelines. - However, OCR software struggled to accurately capture table content, requiring manual data entry, which introduced potential errors. Ideally, OCR tools would allow for direct XML export from PDFs. The implementation of TEI for tables remains a challenge, as TEI is primarily focused on running text rather than tabular data, as noted by TEI pioneer Lou Burnard. + However, OCR software struggled to accurately capture table content, requiring manual data entry, which introduced potential errors. Ideally, OCR tools would allow for direct XML export from PDFs. The implementation of TEI for tables remains a challenge, as TEI is primarily focused on running text rather than tabular data, as noted by TEI pioneer Lou Burnard. - Despite these challenges, TEI data processing offers opportunities for conceptualizing tabular data structures and ensuring traceability of changes, especially in serial statistics. An example is a project using early-modern Basle account books, which were "upcycled" following TEI principles. Additionally, TEI's structured approach could help improve the accuracy of table text recognition in future projects. -date: 08-15-2024 + Despite these challenges, TEI data processing offers opportunities for conceptualizing tabular data structures and ensuring traceability of changes, especially in serial statistics. An example is a project using early-modern Basle account books, which were "upcycled" following TEI principles. Additionally, TEI's structured approach could help improve the accuracy of table text recognition in future projects. +date: 09-12-2024 +date-modified: last-modified +doi: 10.5281/zenodo.13903990 +other-links: + - text: Presentation Slides (PDF) + href: https://doi.org/10.5281/zenodo.13903990 bibliography: references.bib --- +::: {.callout-note appearance="simple" icon=false} + +For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/13903990/files/428_DigiHistCH24_TrickyTables_Slides.pdf). + +::: + ## Introduction In 2121, nothing is as it once was: a nasty virus is keeping the world on tenterhooks – and people trapped in their own four walls. In the depths of the metaverse, contemporaries are searching for data to compare the frightening death toll of the current killer virus with its predecessors during the Covid-19 pandemic and the «Spanish flu». There is an incredible amount of statistical material on the Covid-19 pandemic in particular, but annoyingly, this is only available in obscure data formats such as .xslx in the internet archives. They can still be opened with the usual text editors, but their structure is terribly confusing and unreadable with the latest statistical tools. If only those digital hillbillies in the 2020s had used a structured format that not only long-outdated machines but also people in the year 2121 could read... -Admittedly, very few epidemiologists, statisticians and federal officials are likely to have considered such future scenarios during the pandemic years. 
Quantitative social sciences and the humanities, including medical and economic history, but also memory institutions such as archives and libraries, should consciously consider how they can sustainably preserve the flood of digital data for future generations. Thus, the sustainable processing and storage of statistical printed data from the time of the First World War makes it possible to gain new insights into the so-called "Spanish flu" e. g. in the city of Zurich even today. The publications by the Statistical Office of the City of Zurich, which were previously only available in “analog” paper format, have been digitized by the Zentralbibliothek (Central Library, ZB) Zurich as part of Joël Floris' Willy Bretscher Fellowship 2022/2023 (@Floris2023). This project paper has been written in the context of this digitisation project, as issues regarding digital recording, processing, and storage of historical statistics have always occupied quantitative economic historians “for professional reasons”. + +Admittedly, very few epidemiologists, statisticians and federal officials are likely to have considered such future scenarios during the pandemic years. Quantitative social sciences and the humanities, including medical and economic history, but also memory institutions such as archives and libraries, should consciously consider how they can sustainably preserve the flood of digital data for future generations. Thus, the sustainable processing and storage of statistical printed data from the time of the First World War makes it possible to gain new insights into the so-called "Spanish flu" e. g. in the city of Zurich even today. The publications by the Statistical Office of the City of Zurich, which were previously only available in “analog” paper format, have been digitized by the Zentralbibliothek (Central Library, ZB) Zurich as part of Joël Floris' Willy Bretscher Fellowship 2022/2023 [@Floris2023]. This project paper has been written in the context of this digitisation project, as issues regarding digital recording, processing, and storage of historical statistics have always occupied quantitative economic historians “for professional reasons”. + The basic idea of this paper is to prepare tables with historical health statistics in a sustainable way so that they can be easily analysed using digital means. The aim was to capture the statistical publications retro-digitized by the ZB semi-automatically with OCR in Excel tables and to prepare them as XML documents according to the guidelines of the Text Encoding Initiative (TEI), a standardized vocabulary for text structures. To do this, it was first necessary to familiarise with TEI and its appropriate modules, and to apply them to a sample table in Excel. To be able to validate the Excel table manually transferred to XML, I then developed a schema based on the vocabularies of XML and TEI. This could then serve as the basis for an automated conversion of the Excel tables into TEI-compliant XML documents. Such clearly structured XML documents should ultimately be relatively easy to convert into formats that can be read into a wide variety of visualisation and statistical tools. ## Data description A table from the monthly reports of the Zurich Statistical Office serves as an example data set. The monthly reports were digitised as high-resolution pdfs with underlying Optical Character Recognition (OCR) based on Tesseract by the Central Library's Digitisation Centre (DigiZ) as part of the Willi Bretscher Fellowship project. 
They are available on the ZB’s Zurich Open Platform (ZOP, @SASZ1919), including detailed metadata information. They were published by the Statistical Office of the City of Zurich as a journal volume under this title between 1908 and 1919, and then as «Quarterly Reports» until 1923. The monthly reports each consist of a 27-page table section with individual footnotes, and conclude with a two-page explanatory section in continuous text. + For this study, the data selection is limited to a table for the year 1914 and the month of January (@SASZ1919). In connection with Joël Floris' project, which aims at obtaining quantitative information on Zurich's demographic development during the «Spanish flu» from the retro-digitisation project, it was obvious to focus on tables with causes of death. The corresponding table number 12 entitled «Die Gestorbenen (in der Wohnbev.) nach Todesursachen und Alter» («The Deceased (in the Resident Pop.) by Cause of Death and Age») can be found on page seven of the monthly report. It contains monthly data on causes of death, broken down by age group and gender, as well as comparative figures for the same month of the previous year. The content of this table is to be prepared below in the form of a standardized XML document with an associated schema that complies with the TEI guidelines. ## Methods for capturing historical tables in XML -The source of inspiration for this project paper was a pioneering research project originally based at the University of Basle. In the research project, the annual accounts of the city of Basle from 1535 to 1610 were digitally edited (@Burghartz2015). Technical implementation was carried out by the Center for Information Modeling at the University of Graz. Based on a digital text edition prepared in accordance with the TEI standard, the project manages to combine facsimile, web editing in HTML, and table editing via an RDF (Resource Description Framework ) and XSLT (eXtensible Stylesheet Language Transformations ) in an exemplary manner. The edition thus allows users to compile their own selection of booking data in a "data basket" for subsequent machine-readable analysis. In an accompanying article, project team member Georg Vogeler describes the first-time implementation of a numerical evaluation and how "even extensive holdings can be efficiently edited digitally" [@Vogeler2015]. However, as mentioned, the central basis for this is XML processing of the corresponding tabular information based on the TEI standard. -This project is based on the April 2022 version (4.4.0) of the TEI guidelines (@Burnard2022). They include a short chapter on the preparation of tables, formulas, graphics, and music. And even the introduction to Chapter 14 is being rather cautious with regard to TEI application for table formats, warning that layout and presentation details are more important in table formats than in running text, and that they are already covered more comprehensively by other standards and should be prepared accordingly in these notations. +The source of inspiration for this project paper was a pioneering research project originally based at the University of Basle. In the research project, the annual accounts of the city of Basle from 1535 to 1610 were digitally edited [@Burghartz2015]. Technical implementation was carried out by the Center for Information Modeling at the University of Graz. 
Based on a digital text edition prepared in accordance with the TEI standard, the project manages to combine facsimile, web editing in HTML, and table editing via an RDF (Resource Description Framework) and XSLT (eXtensible Stylesheet Language Transformations) in an exemplary manner. The edition thus allows users to compile their own selection of booking data in a "data basket" for subsequent machine-readable analysis. In an accompanying article, project team member Georg Vogeler describes the first-time implementation of a numerical evaluation and how "even extensive holdings can be efficiently edited digitally" [@Vogeler2015]. However, as mentioned, the central basis for this is XML processing of the corresponding tabular information based on the TEI standard. + +This project is based on the April 2022 version (4.4.0) of the TEI guidelines [@Burnard2022]. They include a short chapter on the preparation of tables, formulas, graphics, and music. Even the introduction to Chapter 14 is rather cautious with regard to TEI application for table formats, warning that layout and presentation details are more important in table formats than in running text, and that they are already covered more comprehensively by other standards and should be prepared accordingly in these notations. When I asked the TEI-L mailing list whether it made sense to prepare historical tables with the TEI table module, the answers were rather reserved. Only the Graz team remained optimistic that TEI could be used to process historical tables, albeit in combination with an RDF including a corresponding ontology. Via the TEI-L list, Christopher Pollin also provided GitHub links to the DEPCHA project, which is developing an ontology for annotating transactions in historical account books. ## Table structure in TEI-XML Basically, the TEI schema treats a table as a special text element consisting of row elements, which in turn contain cell elements. This basic structure was used to code Table 12 from 1914, which I transcribed manually as an Excel file. Because exact formatting including precise reproduction of the frame lines is very time-consuming, the frame lines in this project paper only served as structural information and are not included as topographical line elements as TEI demands. Long dashes, which correspond to zero values in the source, are interpreted as empty values in the TEI-XML. I used the resulting worksheet as the basis for the TEI-XML annotation, in which I also added some metadata. I then had to create an adapted local schema as well as a TEI header, before structuring the table’s text body. Suitable heading ("head") elements are the title of the table, the table number as a note and the «date» of the table. The first table row contains the column headings and is assigned the role attribute "label" accordingly. The third-last cell of each row contains the row total, which I have given the attribute "ana" for analysis and the value "#sum" for total, following the example of the Basle Edition. + The first cell of each row again names the cause of death and must therefore also be labelled with the role attribute "label". The second-last row shows the sum of the current monthly table, which is why it is given the "#sum" attribute for all respective cells. Finally, the last line shows the total for the previous year's month. It is therefore not only marked with the sum attribute, but also with a date in the label cell.
A potential confounding factor for later calculations is the row "including diarrhea", which further specifies diseases of the digestive organs but must not be included in the column total. Accordingly, it is provided with another analytical attribute called "#exsum". As each cell in the code represents a separate element, the »digitally upcycled table 12 in XML format ultimately extends over a good 550 lines of code, which I’m happy to share on request. ## Challenges and problems An initial problem already arose during the OCR-based digitisation. The Central Library (ZB)'s Tesseract-based OCR software, which specializes in continuous text, simply failed to capture the text in the tables. I therefore first had to transcribe the table by hand, which is error-prone. In principle, however, it is irrelevant in TEI in which format the original text was created. The potential for errors when transferring Excel data into the "original" XML is also high, especially if the table is complex and/or detailed. Ideally, i. e. with a clean OCR table, it ought to be possible to export OCR content in pdfs to XML. When speaking with the ZB’s DigiZ, they confirmed not being happy with OCR quality anymore, and are considering improvement with regard to precision. + Due to the extremely short instructions for table preparation in TEI, I underestimated the variety of different text components that TEI offers. The complexity of TEI is not clear from the rough overview of the individual chapters and their introductory descriptions. This only became clear while adjusting table 12 to TEI standards. By becoming accustomed to TEI, its limitations regarding table preparation also became more evident: It is fundamentally geared towards structuring continuous text rather than text forms, where the structure or layout also indicates meaning, as is the case with tables. + The conversion of the sample table into XML and the preparation of an associated TEI schema, which is reduced to the elements present in the sample document, yet remains valid with the TEI standard, proved to be time-consuming code work. Thus, both the sample XML and the local schema each comprise over 500 lines of code – and this basically for only a single – though complex – table with a few metadata. In addition, the extremely comprehensive and complex TEI schema on which my XML document is based is not suitable for implementation in Excel. As a result, I had to prepare an XML table schema that was as general as possible, which may be used to convert the Excel tables into XML in the future, thus reducing error potential of the XML conversion. ## Ideas for Project Expansion Because, as mentioned, the OCR output of the tables in this case is not usable, it should now be crucial for any digitisation project to achieve high-quality OCR of the retro-digitised tables. Table recognition is definitely an issue in economic history research, and there are several open source development tools around on Git-Repositories, which yet have to set a standard, however. + Ideally, the tables recognized in this way would then provide better text structures in the facsimile. With the module for the transcription of original sources, TEI offers extensive possibilities for linking text passages in the transcription with the corresponding passages in the facsimiles. Such links could ideally be used as training data for text recognition programs to improve their performance in the area of table recognition. 
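The automated Excel-to-TEI conversion envisaged above could start from a script along the following lines. This is only a minimal sketch, not a pipeline actually used in the project: the sample rows, the simplified column layout (with the total in the last rather than the third-last cell) and the helper function are invented for illustration; only the row/cell structure and the role="label", ana="#sum" and ana="#exsum" conventions described earlier are taken from the encoding of table 12.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)
DASH = "\u2014"  # the long dash that marks zero values in the printed source

# Invented excerpt standing in for the transcribed Excel worksheet of table 12.
rows = [
    ["Todesursache", "0-1 J.", "1-14 J.", "15-59 J.", "60+ J.", "Total"],
    ["Tuberkulose", "1", "2", "14", "6", "23"],
    ["Krankheiten der Verdauungsorgane", "9", "1", "2", "3", "15"],
    ["davon Diarrhoe", "7", DASH, DASH, DASH, "7"],  # "including ..." row, excluded from column totals
]

def add_cell(row_el, text, role=None, ana=None):
    """Append a TEI <cell>, mapping the long dash to an empty value."""
    cell = ET.SubElement(row_el, f"{{{TEI_NS}}}cell")
    if role:
        cell.set("role", role)
    if ana:
        cell.set("ana", ana)
    cell.text = "" if text == DASH else text

table = ET.Element(f"{{{TEI_NS}}}table")
for i, values in enumerate(rows):
    row_el = ET.SubElement(table, f"{{{TEI_NS}}}row", {"role": "label"} if i == 0 else {})
    for j, value in enumerate(values):
        role = "label" if (i == 0 or j == 0) else None
        ana = None
        if i > 0 and j == len(values) - 1:  # row total
            ana = "#exsum" if values[0].startswith("davon") else "#sum"
        add_cell(row_el, value, role=role, ana=ana)

print(ET.tostring(table, encoding="unicode"))
```

Validating the generated document against the reduced local schema would remain a separate step.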
Other TEI elements that lend structure to the table, such as the dividing lines and the long dashes for the empty values, could also serve as such structural recognition features. + Additional important TEI elements such as locations and gender would further increase the content of the TEI XML format. Detailed metadata, as e.g. provided by the retro-digitized version of the ZOP, can be easily integrated into the TEI header area "xenodata". Finally, in view of the complex structure of the tables, it is essential to understand and implement XSLT (eXtensible Stylesheet Language Transformation) for automated structuring, and as a basis for RDF used e.g. by the Graz team. ## Conclusion So far, tables seem to have had a shadowy existence within the Text Encoding Initiative (TEI) – or, as TEI pioneer Lou Burnard remarked in the TEI mailing list on behalf of my question whether TEI processing of tables made sense: "Tables are tricky". The main reason for this probably lies in the continuous text orientation of existing tools and users, who are also less interested in numerical formats. + In principle, however, preparation according to the TEI standard offers the opportunity to think conceptually about the function of tabularly structured data and to make changes, e.g. in serial sources such as statistical tables, comprehensible. The clearly structured text processing of TEI could provide a basis for improving the still rather poor quality of text recognition programs when recording tables. And a platform-independent, non-proprietary data structure such as XML would be almost indispensable for the sustainable long-term archiving of "digitally born" statistics, which have experienced a boom in recent years, and especially during the pandemic. After all, our descendants should also be able to access historical statistics during the next one. ## References diff --git a/submissions/429/index.qmd b/submissions/429/index.qmd index 93475cb..331e357 100644 --- a/submissions/429/index.qmd +++ b/submissions/429/index.qmd @@ -23,10 +23,21 @@ key-points: - The Techn’hom Time Machine project aims to offer a virtual reality reconstruction of a former spinning mill in the city of Belfort (France), with its machines and activities. - Students from the Belfort-Montbéliard University of Technology participate directly in the project by modeling buildings, machines, or by working on knowledge engineering. - Their reports make it possible to identify points that most marked them, namely the discovery of human sciences and their difficulties, as well as new technical and organizational skills learning. -date: 07-26-2024 +date: 09-13-2024 +date-modified: 11-15-2024 +doi: 10.5281/zenodo.14171328 +other-links: + - text: Presentation Slides (PDF) + href: https://doi.org/10.5281/zenodo.14171328 bibliography: references.bib --- +::: {.callout-note appearance="simple" icon=false} + +For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/14171328/files/429_DigiHistCH24_Technhom_Slides.pdf). + +::: + ## Introduction Part of the national Lab In Virtuo project (2021-2024), the Techn'hom Time Machine project, initiated in 2019 by the Belfort-Montbéliard University of Technology, aims to study and digitally restore the history of an industrial neighborhood, with teacher-researchers but also students as co-constructors [@Gasnier2014 ; @Gasnier2020, p. 293]. The project is thus located at the interface of pedagogy and research. 
The Techn'hom district was created after the Franco-Prussian War of 1870 with two companies from Alsace: the Société Alsacienne de Constructions Mécaniques, nowadays Alstom; and the Dollfus-Mieg et Compagnie (DMC) spinning mill, in operation from 1879 to 1959. The project aims to create a “Time Machine” of these industrial areas, beginning with the spinning mill. We seek to restore in four dimensions (including time) buildings, machines with their operation, but also document and model sociability and know-how, down to the gestures and feelings. The resulting “Sensory Realistic Intelligent Virtual Environment” should allow both researchers and general public to virtually discover places and “facts” taking place in the industry, but also interact with them or even make modifications. diff --git a/submissions/431/index.qmd b/submissions/431/index.qmd index 7df333f..5c24dce 100644 --- a/submissions/431/index.qmd +++ b/submissions/431/index.qmd @@ -25,7 +25,12 @@ key-points: - Key point 1 The Repertorium Academicum Germanicum (RAG) focuses on the knowledge influence of medieval scholars in pre-modern Europe, creating a comprehensive research database. - Key point 2 The RAG database, with data on 62,000 scholars, has advanced from manual to computer-aided and AI-assisted data collection and analysis. - Key point 3 Technological advancements, including the use of nodegoat, have enhanced data management, collaboration, and accessibility, integrating AI for improved historical data analysis. -date: 07-07-2024 +date: 09-12-2024 +date-modified: 11-15-2024 +doi: 10.5281/zenodo.14171301 +other-links: + - text: Post on Personal Blog + href: https://doi.org/10.58079/126xr bibliography: references.bib --- diff --git a/submissions/438/index.qmd b/submissions/438/index.qmd index fe4ee94..a78d39f 100644 --- a/submissions/438/index.qmd +++ b/submissions/438/index.qmd @@ -22,11 +22,23 @@ key-points: - The study of video game graphics integrates narrative and aesthetic aspects with interactive and functional elements, differing significantly from classical visual media. - The Framework for the Analysis of Visual Representation in Video Games (FAVR) provides a structured approach to analyze video game images through annotation, focusing on their formal, material, and functional aspects. - The initial implementation of the FAVR framework as a linked open ontology for tools like Tropy has proven valuable in formally analyzing video game images and comparing aspects such as dynamic versus static image space, facilitating further digital and computational research. -date: 07-19-2024 -date-modified: last-modified +date: 09-12-2024 +date-modified: 10-10-2024 +doi: 10.5281/zenodo.13904453 bibliography: references.bib +other-links: + - text: Presentation Slides (PDF) + href: https://doi.org/10.5281/zenodo.13904453 + - text: Transcript + href: transcript.html --- +::: {.callout-note appearance="simple" icon=false} + +For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/13904453/files/438_DigiHistCH24_VideoGameGraphics_Slides.pdf). + +::: + The 1980s marked the arrival of the home computer. Computing systems became affordable and were marketed to private consumers through state-supported programs and new economic opportunities [@haddonHomeComputerMaking1988; @williamsEarlyComputersEurope1976]. Early models, such as the ZX Spectrum[^1], Texas Instrument TI-99/4A[^2], or the Atari[^3], quickly became popular in Europe and opened the door for digital technology to enter the home. 
This period also marks the advent of homebrew video game culture and newly emerging creative programming practices [@swalwellHomebrewGamingBeginnings2021; @albertsHackingEuropeComputer2014]. As part of this process, these early programmers not only had to figure out how to develop video games but also were among the first to incorporate graphics into video games. This created fertile grounds for a new array of video game genres and helped popularize video games as a mainstream media. I’m researching graphics programming for video games from the 1980s and 1990s. The difference to other visual media lies in the amalgamation of computing and the expression of productive or creative intent by video game designers and developers. The specifics of video game graphics are deeply rooted in how human ideas must be translated into instructions that a computer understands. This necessitates a mediation between the computer's pure logic and a playing person's phenomenological experience. In other words, the video game image is a specific type of interface that needs to take care of a semiotic layer and offer functional affordances. I am interested in how early video game programmers worked with these interfaces, incorporating their own visual inspirations and attempting to work with the limited resources at hand. Besides critical source code analysis, I also extensively analyze formal aspects of video game images. For the latter, I depend on FAVR to properly describe and annotate images in datasets relevant to my inquiries. The framework explicitly deals with problems of analyzing video game graphics. It guides the annotation of images by their functional, material, and formal aspects and aids in analyzing narrativity and the rhetoric of aesthetic aspects [@arsenaultGameFAVRFramework2015]. diff --git a/submissions/438/presentation/A handful of pixels of blood - Slides.pdf b/submissions/438/presentation/A handful of pixels of blood - Slides.pdf deleted file mode 100644 index 1de883e..0000000 Binary files a/submissions/438/presentation/A handful of pixels of blood - Slides.pdf and /dev/null differ diff --git a/submissions/438/presentation/A handful of pixels of blood - Transcript.pdf b/submissions/438/presentation/A handful of pixels of blood - Transcript.pdf deleted file mode 100644 index a02fa91..0000000 Binary files a/submissions/438/presentation/A handful of pixels of blood - Transcript.pdf and /dev/null differ diff --git a/submissions/438/presentation/A handful of pixels of blood - Transcript.md b/submissions/438/transcript.md similarity index 96% rename from submissions/438/presentation/A handful of pixels of blood - Transcript.md rename to submissions/438/transcript.md index caf5551..bcabaf2 100644 --- a/submissions/438/presentation/A handful of pixels of blood - Transcript.md +++ b/submissions/438/transcript.md @@ -1,10 +1,19 @@ --- -created: 2024-09-06T11:02 -updated: 2024-09-09T13:04 +submission_id: 438_Transcript +title: A handful of pixels of blood – Decoding early video game graphics with the FAVR ontology +subtitle: Transcript +author: + - name: Adrian Demleitner + orcid: 0000-0001-9918-7300 + email: adrian.demleitner@hkb.bfh.ch + affiliations: + - University of the Arts Bern + - University of Bern +date: 2024-09-06T11:02 +date-modified: 2024-09-09T13:04 +doi: 10.5281/zenodo.13904453 --- -# A handful of pixels of blood - ## A Historical and Technological Perspective on Understanding Video Game Graphics Good afternoon, colleagues. 
Today, I'd like to share with you parts of my research on video game programming practices of the 1980s and 1990s, with a particular focus on graphics. This work is an integral part of my dissertation, where I'm exploring the technological foundations of video games as a popular medium. diff --git a/submissions/443/index.qmd b/submissions/443/index.qmd index b6d0636..55bda1d 100644 --- a/submissions/443/index.qmd +++ b/submissions/443/index.qmd @@ -32,8 +32,8 @@ keywords: - historical scholarship abstract: | The Impresso project pioneers the exploration of historical media content across temporal, linguistic, and geographical boundaries. In its initial phase (2017-2020), the project developed a scalable infrastructure for Swiss and Luxembourgish newspapers, featuring a powerful search interface. The second phase, beginning in 2023, expands the focus to connect media archives across languages and modalities, creating a Western European corpus of newspaper and broadcast collections for transnational research on historical media. In this presentation, we introduce Impresso 2 and discuss some of the specific challenges to connecting newspaper and radio. - -date: 04-09-2024 +date: 09-12-2024 +doi: 10.5281/zenodo.13907298 bibliography: references.bib --- diff --git a/submissions/444/index.qmd b/submissions/444/index.qmd index 4d4828e..a8c0b1c 100644 --- a/submissions/444/index.qmd +++ b/submissions/444/index.qmd @@ -24,10 +24,21 @@ key-points: - Key point 1 Hybrid thinking or multidisciplinary collaboration always takes much more time than one estimates, and it is useful to develop several complementary ways of working with data together in order to understand their local specificities. - Key point 2 Archival metadata is an untapped research resource for digital humanities, but its use requires close collaboration with cultural heritage organisations and practical knowledge of archival practices. - Key point 3 The user test survey of the portal with 19th-century letter metadata showed that building a committed test group is challenging and that 'traditional' humanists have difficulties in studying mass data. -date: 07-04-2024 +date: 09-12-2024 +date-modified: 11-15-2024 +doi: 10.5281/zenodo.14171306 +other-links: + - text: Presentation Slides (PDF) + href: https://doi.org/10.5281/zenodo.14171306 bibliography: references.bib --- +::: {.callout-note appearance="simple" icon=false} + +For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/14171306/files/444_DigiHistCH24_LetterMetadata_Slides.pdf). + +::: + ## Introduction This paper discusses data and practices related to an ongoing digital humanities consortium project *Constellations of Correspondence – Large and Small Networks of Epistolary Exchange in the Grand Duchy of Finland* (CoCo; Research Council of Finland, 2021–2025). The project aggregates, analyses and publishes 19th-century epistolary metadata from letter collections of Finnish cultural heritage (CH) organisations on a Linked Open Data service and as a semantic web portal (the ‘CoCo Portal’), and it consists of three research teams, bringing together computational and humanities expertise. We focus exclusively on metadata considering them to be part of the cultural heritage and a fruitful starting point for research, providing access i.e. to 19th-century epistolary culture and archival biases. 
The project started with a webropol survey addressed to over 100 CH organisations to get an overview of the preserved 19th-century letters and the Finnish public organisations willing to share their letter metadata with us. Currently the CoCo portal includes seven CH organisations and four online letter publications with the metadata of over 997.000 letters and with 95.000 actors (senders and recipients of letters). diff --git a/submissions/445/index.qmd b/submissions/445/index.qmd index f942243..e375263 100644 --- a/submissions/445/index.qmd +++ b/submissions/445/index.qmd @@ -22,9 +22,21 @@ key-points: - Online teaching modules on ATR are a desideratum currently, interested persons must familiarise themselves with the subject themselves at considerable time and effort. - ATR tools are in a constant state of flux, which is why teaching modules should explain the wider context and not specific buttons. - Working with historical documents today often takes place at the intersection between tried and tested analog methods and new digital approaches, which is why our teaching module takes these intersections into account. -date: 07-26-2024 +date: 09-12-2024 +date-modified: 11-15-2024 +doi: 10.5281/zenodo.14171285 +other-links: + - text: Presentation Slides (PDF) + href: https://doi.org/10.5281/zenodo.14171285 bibliography: references.bib --- + +::: {.callout-note appearance="simple" icon=false} + +For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/14171285/files/445_DigiHistCH24_AdFontes_Slides.pdf). + +::: + ## Introduction Scholars and interested laypeople who want to adequately deal with historical topics or generally extract information from differently structured historical documents need both knowledge of old scripts and methods for analysing complex layouts. Studies of written artefacts are only possible if they can be read at all – written in unfamiliar scripts such as Gothic Cursive, Humanist Minuscule or German Kurrent and sometimes with rather unconventional layouts. Until now, the relevant skills have been developed, for example, by the highly specialised field of palaeography. In the last few years, a shift in practice has taken place. With digital transcription tools based on deep learning models trained to read these old scripts and accompanying layouts on the rise, working with old documents or unusual layouts is becoming easier and quicker. However, using the corresponding software and platforms can still be intimidating. Users need to have a particular understanding of how to approach working with Automated Text Recognition (ATR) depending on their projects aims. This is why the Ad fontes platform [@noauthor_ad_2018] is currently developing an e-learning module that introduces students, researchers, and other interested users (e.g. citizen scientists) to ATR, its use cases, and best practices in general and more specifically into how exactly they can use ATR for their papers and projects. diff --git a/submissions/447/index.qmd b/submissions/447/index.qmd index df0b474..0af47ab 100644 --- a/submissions/447/index.qmd +++ b/submissions/447/index.qmd @@ -61,13 +61,15 @@ keywords: - Ontology - FAIR Data abstract: This article explores the significance of the Geovistory platform in the context of the growing Open Science movement within the Humanities, particularly its role in facilitating the production and reuse of FAIR data. 
As funding agencies increasingly mandate the publication of research data in compliance with FAIR principles, researchers face the dual challenge of mastering new methodologies in data management and adapting to a digital research landscape. Geovistory provides a comprehensive research environment specifically designed to meet the needs of historians and humanists, offering intuitive tools for managing research data, establishing a collaborative Knowledge Graph, and enhancing scholarly communication. By integrating semantic methodologies in the development of a modular ontology, Geovistory fosters interoperability among research projects, enabling scholars to draw on a rich pool of shared information while maintaining control over their data. Additionally, the platform addresses the inherent complexities of historical information, allowing for the coexistence of diverse interpretations and facilitating nuanced digital analyses. Despite its promising developments, the Digital Humanities ecosystem faces challenges related to funding and collaboration. The article concludes that sustained investment and strengthened partnerships among institutions are essential for ensuring the longevity and effectiveness of initiatives like Geovistory, ultimately enriching the field of Humanities research. -date: 07-26-2024 +date: 09-12-2024 +date-modified: 10-13-2024 bibliography: references.bib +doi: 10.5281/zenodo.13907394 --- ## Introduction -The movement of Open Science has grown in importance in the Humanities, advocating for better accessibility of scientific research, especially in the form of the publication of research data [@unesco2023]. This has led funding agencies like SNSF, ANR, and Horizon Europe to ask research projects to publish their research data and metadata along the FAIR principles in public repositories (see for instance [@anr2023; @ec2023; @snsf2024]. Such requirements are putting pressure on researchers, who need to learn and understand the principles and standards of FAIR data and its impact on research data, but also require them to acquire new methodologies and know-how, such as in data management and data science. +The movement of Open Science has grown in importance in the Humanities, advocating for better accessibility of scientific research, especially in the form of the publication of research data [@unesco2023]. This has led funding agencies like SNSF, ANR, and Horizon Europe to ask research projects to publish their research data and metadata along the FAIR principles in public repositories [see for instance @anr2023; @ec2023; @snsf2024]. Such requirements are putting pressure on researchers, who need to learn and understand the principles and standards of FAIR data and its impact on research data, but also require them to acquire new methodologies and know-how, such as in data management and data science. At the same time, this accessibility of an increasing volume of interoperable quality data and the new semantic methodologies might bring a change of paradigm in the Humanities by the way knowledge is produced [@beretta2023; @feugere2015]. The utilization of Linked Open Data (LOD) grants scholars access to large volumes of interoperable and high-quality datasets, at a scale analogue methods cannot reach, fundamentally altering their approach to information. This enables scholars to pose novel research questions, marking a departure from traditional modes of inquiry and facilitating a broader range of analytical perspectives within academic discourse. 
Moreover, drawing upon semantic methodologies rooted in ontology engineering, scholars can effectively document the intricate complexities inherent of social and historical phenomena, enabling a nuanced representation essential to the Social Sciences and Humanities domains within their databases. This meticulous documentation not only reflects a sophisticated understanding of multifaceted realities but also empowers researchers to deepen the digital analysis of rich corpora. diff --git a/submissions/450/index.qmd b/submissions/450/index.qmd index d0e77e4..daa33ea 100755 --- a/submissions/450/index.qmd +++ b/submissions/450/index.qmd @@ -20,10 +20,21 @@ key-points: - GIS are powerful spatial analysis tools for historical research. - GIS aid the development of hypotheses and the framing of arguments in historical research projects. - GIS offer a myriad of choices for data modeling and visualization which researchers should remain critical and conscious of. -date: 2024-07-26 +date: 09-12-2024 +date-modified: last-modified +doi: 10.5281/zenodo.13903914 +other-links: + - text: Presentation Slides (PDF) + href: https://doi.org/10.5281/zenodo.13903914 bibliography: references.bib --- +::: {.callout-note appearance="simple" icon=false} + +For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/13903914/files/450_DigiHistCH24_PublicUrbanGreenSpaces_Slides.pdf). + +::: + ## Introduction GIS (Geographic Information Systems) have become increasingly valuable in spatial history research since the mid-1990s, and is particularly useful for analyzing socio-spatial dynamics in historical contexts [@kemp_what_2009, p. 16; @gregory_historical_2007, p.1]. My PhD research applies GIS to examine and compare the development of public urban green spaces, namely public parks and playgrounds, in the port cities of Hamburg and Marseille, between post-WWII urban reconstruction and the First Oil Shock in 1973. The management and processing of data concerning green space evolution in GIS allow visualization of when and where parks were created, and how these reflect socio-spatial differentiations. This layering of information offers ways to evaluate historical data and construct arguments, while also helping communicate the project to a wider audience. To critically assess the application of GIS in historical research, I will use the SWOT (Strengths, Weaknesses, Opportunities, Threats) analysis framework. This popular business consultancy approach [@minsky_are_2021] serves here as a structure for systematic reflection on how digital methods can enhance historical research and where caution is needed. The goal is to provoke critical thinking about when using GIS genuinely support research beyond producing impressive visuals, and to explore the balance between close and distant reading of historical data. @@ -31,6 +42,7 @@ GIS (Geographic Information Systems) have become increasingly valuable in spatia ## Strengths GIS are composed of layers of data sets and are mainly used for mapping, georeferencing, and data analysis (e.g. spatial analysis). The data can be varied in what it represents but must be linked to spatial parameters for it to be positioned in a visualization for analysis of spatial information [@wheatley_spatial_2005, pp.1,8]. GIS layers for historical studies can be viewed as models of data sets. They represent specific topics in time and space simplistically, and spark reflection [@van_ruymbeke_modeliser_2021, p.8]. 
The screen shot of my QGIS workspace (@fig-1) shows the city of Marseille with parks marked in various stages of planning in the early 1970s. This was the time when longstanding Mayor Gaston Defferre launched the large-scale greening initiative Mille Points Verts pour Marseille [@g_envoi_1971]. Defferre and his team’s goal was to react to a growing ecological awareness and increase the number of green spaces for a more livable city [@prof_j_no_1971]. They also organized events to include and educate citizens and to garner their support for the upcoming elections [@noauthor_semaine_1971]. The majority of parks created within Mille Points Verts remain until today, with only a handful of additional parks added after the mid-1970s. This is visible when the layer with parks and gardens from a 2018 dataset provided by the government of Marseille is selected (@fig-2). The strength of GIS layering is evident when we apply distant reading techniques: skimming over the model we see a display of spatial relations of park distribution and location as well as their connection to time.\ + In order for this visualization to take shape, I produced and assembled data. Specifically, I selected data and went through the process of closely reading my historical sources, learning to understand them and to think through their meaning. In this way maps are social documents. By themselves, they do not reveal anything yet. But by superimposing visualizations, GIS can reveal thinking processes of the data curators and map creators [cf. for example @jones_mapping_2021]. ::: {#fig-1} @@ -43,16 +55,22 @@ In order for this visualization to take shape, I produced and assembled data. Sp ## Weaknesses The curation of data, although an important and empowering step for the historian and GIS researcher, also reveals the weaknesses of GIS: the mismatch between GIS requirements (in terms of data structuring and quality) and the imperfection of historical data. GIS software is created for geographers, not historians. Everything in GIS is structured data and therefore cannot handle ambiguities natural to historical sources. Sources in whatever form they are collected by the historian first must be organized, selected, tabularized, geocoded and/or georeferenced [@kemp_what_2009, p.16-17]. The caveat here is that a historian’s data is hardly ever complete. Missing records shape both what we can and cannot analyze – especially when working with GIS. In historical narration on text the researcher can explain gaps and postulate why this may be the case. GIS do not allow for gaps and thus we can only produce models and visualizations with the numerical evidence available.\ + The visualization presented here helps to model different states of an object: the park. As my data is not complete, discrepancies between the mapped data and the on-the-ground reality occur, especially since the planned parks had vague names sometimes only matching the name of an entire neighborhood. This raises the question of how to capture temporality. How can the aspect of time appear on a two-dimensional visualization? Rendering the time layer onto the spatial one demands creativity and an awareness that time is something constructed [Massey speaks of “implicit imaginations of time and space” @massey_for_2005, pp.22].\ + From the map making perspective, time significantly impacts the creation process. GIS work is time-consuming and labor-intensive. It involves meticulous manual searching, assembling, and layering of data. 
However, linking to the overarching topic of this conference, AI may offer new possibilities. Tools such as Transkribus allow users to apply machine learning to filter specific elements from document sets. LLMs can then process this information into CSV files for GIS software. While not yet revolutionary, as these tools evolve, AI could become useful in extracting numerical evidence from textual sources. For geocoding of places, AI would greatly aid efficiency and relieve the researcher of tedious manual work. However, at this point, LLMs such as Claude AI and ChatGPT still hallucinate considerably. ## Opportunities AI-assisted data extraction presents a gateway to think about opportunities. Researchers could focus more on experimenting with design and layering by automating time-consuming tasks. For example, mapping supports spatial thinking and perception by integrating the crucial 'where' element: Where are specific features located? How near or far is one place from another? Depending on what obstacles or facilitators are in place a park may be close in measured distance but far in terms of accessibility if there is no bridge or tunnel to e.g. cross a motorway, or water element. Therefore, how are different locations related? How can we perceive and understand distance?\ + The screenshot here shows a handful of ‘wheres’ (@fig-3). They reveal where the majority of parks are located, their proximity to neighborhoods, the types of surrounding communities, and their connections to amenities and public infrastructure. This approach enables comparisons across different scales. For instance, I can compare park distribution between Hamburg and Marseille and track their development over time.\ These questions prompted by the use of GIS direct both user and observer back to the original sources for close reading. A GIS model can spark interest in a topic and motivate the researcher to dig deeper on what these layers mean and how they were created. Ideally GIS should be used as a starting point for in depth analysis. In the case of Marseille and Hamburg, the development of public urban green spaces was what inspired me to look more closely at the historical circumstances. Hamburg, for instance, has a long history of creating expansive green areas with the support of private patrons. Marseille does not have a comparable patronage system. Instead, municipal expropriation rendered private villas and their gardens public.\ + GIS are a powerful tool that serve multiple functions in research [@wheatley_spatial_2005, p.8]. They “can play a role in generating ideas and hypotheses at the beginning of a project” and serve as valuable instruments for analysis and evaluation [@brewer_basic_2006, p.S36]. By modeling research hypotheses and findings, e.g. maps can be used to effectively communicate to diverse audiences – from the general public to specialized groups such as urban planners and municipal governments, relevant to my field of historical urban planning research.\ -A particularly compelling aspect of GIS is their ability to visually represent power relations (@fig-4). This feature bridges the gap between historical analysis and contemporary urban planning, making it an invaluable tool in understanding the evolution of urban spaces. The visualization of Marseille reveals that the majority of parks are located towards the center and south of the city and does not necessarily correspond to the population density. 
The south of Marseille is where villas abound and thus the upper and upper-middle class live. The majority of the HLM (housing at moderate rent) are located towards the north, where living conditions are condensed, and political representation is low. What is more, if I select the layers showing where most immigrants and workers live today, a lack of green spaces is visible (@fig-5) (@fig-6) (@fig-7).\ + +A particularly compelling aspect of GIS is their ability to visually represent power relations (@fig-4). This feature bridges the gap between historical analysis and contemporary urban planning, making it an invaluable tool in understanding the evolution of urban spaces. The visualization of Marseille reveals that the majority of parks are located towards the center and south of the city and does not necessarily correspond to the population density. The south of Marseille is where villas abound and thus the upper and upper-middle class live. The majority of the HLM (housing at moderate rent) are located towards the north, where living conditions are condensed, and political representation is low. What is more, if I select the layers showing where most immigrants and workers live today, a lack of green spaces is visible (@fig-5, @fig-6, @fig-7).\ + Connecting this once more to close reading of the sources: when Mille Points Verts was launched, planners scavenged locations for green space creation. The HLM neighborhoods were marked as unsuitable for participation in this program: People living in social housing would “misuse” the parks by playing soccer on them or walking across the grass [@noauthor_amenagement_1970]. This shows complexity of space perception and power imbalance [@van_ruymbeke_modeliser_2021, pp.7]. ::: {#fig-3} @@ -74,10 +92,12 @@ Connecting this once more to close reading of the sources: when Mille Points Ver ## Threats Yet all these opportunities are ambiguous and “entrusting machines with the memory of human activity can be frightening”. The last element of the SWOT analysis, threats, rounds off these reflections. Although it is crucial to encourage critical thinking through the mapping of, for example, political representation and wealth distribution of a city it also shows my personal convictions. I wish to demonstrate which voices where not heard in the planning of these spaces, which people were not considered when decisions were made. I am biased when I start with the premise that there is inequality. The map, objective as it may seem, never is. The book *How to Lie with Maps* provocatively shows the power of maps to create a strong, and perhaps deceiving, narrative:\ + *"Map users generally are a trusting lot: they understand the need to distort geometry and suppress features, and they believe the cartographer really does know where to draw the line, figuratively as well as literally. […] Yet cartographers are not licensed, and many mapmakers competent in commercial art or the use of computer workstations have never studied cartography. Map users seldom, if ever, question these authorities, and they often fail to appreciate the map's power as a tool of deliberate falsification or subtle propaganda"* [@monmonier_how_1996, pp.1].\ People working with GIS can have all kinds of skill levels and interests. I, for example, am not a GIS specialist and relatively new to using the tool. Still, I can easily manipulate my model to paint various pictures, if I wish to do so. 
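To make this concrete: even without touching the underlying data, the choice of classification scheme alone already paints different pictures. A small sketch with invented green-space values (not my data) shows how equal-interval and quantile classes sort the same neighbourhoods into quite different classes on a choropleth map:

```python
import pandas as pd

# Invented green-space provision (m2 per inhabitant) for eight neighbourhoods.
green = pd.Series([1.5, 2.0, 2.5, 3.0, 9.0, 10.0, 11.0, 30.0],
                  index=[f"quartier {i}" for i in range(1, 9)])

# Equal-interval classes: one outlier stretches the scale, most areas look alike.
equal_interval = pd.cut(green, bins=4)

# Quantile classes: the same values split into four equally filled classes.
quantiles = pd.qcut(green, q=4)

print(pd.DataFrame({"value": green,
                    "equal_interval": equal_interval,
                    "quantile": quantiles}))
```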
I can turn on different layers and focus on the number of immigrants per neighborhood, I can change the classification for the choropleth map and create entirely different impressions, or I can simply change the basemap and take away the context of terrain, transportation systems, etc. (cf. @fig-4). The quote speaks of an almost blind trust in maps, which shows once more that we must always be critical observers of the things we consume and historians should always want to be curios fact checkers.\ -A map is a series of decisions and it reflects the biography of both the maker and the observer. It is the responsibility of the historian working with GIS to be as transparent as possible regarding the choices made to display a historical development or state. It is the responsibility of the observer to use the map as a starting point for close reading, interpretation and analysis rather than the end point and a fact. We must remember “80% of GIS is about transforming, manipulating and managing spatial data”[@jones_lesson_2022]. + +A map is a series of decisions and it reflects the biography of both the maker and the observer. It is the responsibility of the historian working with GIS to be as transparent as possible regarding the choices made to display a historical development or state. It is the responsibility of the observer to use the map as a starting point for close reading, interpretation and analysis rather than the end point and a fact. We must remember “80% of GIS is about transforming, manipulating and managing spatial data” [@jones_lesson_2022]. ## Conclusion diff --git a/submissions/452/index.qmd b/submissions/452/index.qmd index f4aa5e1..de4aa87 100644 --- a/submissions/452/index.qmd +++ b/submissions/452/index.qmd @@ -26,16 +26,26 @@ abstract: | The Belpop project aims to reconstruct the demographic behavior of the population of a mushrooming working-class town during industrialization: Belfort. Belfort is a hapax in the French urban landscape of the 19^th^ century, as the demographic growth of its main working- class district far outstripped that of the most dynamic Parisian suburbs. The underlying hypothesis is that the massive Alsatian migration that followed the 1870-71 conflict, and the concomitant industrialization and militarization of the city, profoundly altered the demographic behavior of the people of Belfort. This makes Belfort an ideal place to study the sexualization of social relations in 19^th^-century Europe. These relationships will first be understood through the study of out-of-wedlock births, in their socio-cultural and bio-demographic dimensions. In the long term, this project will also enable to answer many other questions related to event history analysis, a method that is currently undergoing major development, thanks to artificial intelligence (AI), and which is profoundly modifying the questions raised by historical demography and social history. The contributions of deep learning make it possible to plan a complete analysis of Belfort's birth (ECN) and death (ECD) civil registers (1807-1919), thanks to HTR methods applied to these sources (two interdisciplinary computer science-history theses in progress). This project is part of the SOSI CNRS ObHisPop (Observatoire de l'Histoire de la Population française: grandes bases de données et IA), which federates seven laboratories and aims to share the advances of interdisciplinary research in terms of automating the constitution of databases in historical demography. 
Challenges also include linking (matching individual data) the ECN and ECD databases, and eventually the DMC database (DMC is the city's main employer of women)." -date: 07-22-2024 +date: 09-13-2024 +doi: 10.5281/zenodo.13904600 +other-links: + - text: Presentation Slides (PDF) + href: https://doi.org/10.5281/zenodo.13904600 --- +::: {.callout-note appearance="simple" icon=false} + +For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/13904600/files/452_DigiHistCH24_Belpop_Slides.pdf). + +::: + ## Extended Abstract The Belpop project aims to reconstruct the demographic behavior of the population of a mushrooming working-class town during industrialization: Belfort. Belfort is a hapax in the French urban landscape of the 19^th^ century, as the demographic growth of its main working- class district far outstripped that of the most dynamic Parisian suburbs. The underlying hypothesis is that the massive Alsatian migration following the 1870-71 conflict, along with concomitant industrialization and militarization of the city, profoundly altered the demographic behavior of the people of Belfort. - This makes Belfort an ideal place to study the sexualization of social relations in 19th-century Europe. These relationships will first be understood through the study of out-of-wedlock births, in their socio-cultural and bio-demographic dimensions. In line with our initial hypothesis, the random sampling of 1010 birth certificates to build a manual transcription database shows that the number of births outside marriage peaks in the century in the decade following the annexation of Alsace-Moselle to Germany and then remains at a high level. In fact, after 1870, Alsatians arrived in the city in droves, first to escape annexation by Germany, then to follow Alsatian employers who were relocating their mechanical and textile industries to Belfort: the documents needed for marriage were now found beyond the new border, while the presence of a large military population (the department had by far the highest number of single men at the beginning of the 20^th^ century) fueled legal and illegal prostitution and, in part, the number of births out of wedlock. In the long term, this project will also enable us to answer many other questions related to event history analysis, a method that is currently undergoing major development, thanks to artificial intelligence (AI), and which is profoundly modifying the questions raised by historical demography and social history. +This makes Belfort an ideal place to study the sexualization of social relations in 19th-century Europe. These relationships will first be understood through the study of out-of-wedlock births, in their socio-cultural and bio-demographic dimensions. In line with our initial hypothesis, the random sampling of 1010 birth certificates to build a manual transcription database shows that the number of births outside marriage peaks in the century in the decade following the annexation of Alsace-Moselle to Germany and then remains at a high level. 
In fact, after 1870, Alsatians arrived in the city in droves, first to escape annexation by Germany, then to follow Alsatian employers who were relocating their mechanical and textile industries to Belfort: the documents needed for marriage were now found beyond the new border, while the presence of a large military population (the department had by far the highest number of single men at the beginning of the 20^th^ century) fueled legal and illegal prostitution and, in part, the number of births out of wedlock. In the long term, this project will also enable us to answer many other questions related to event history analysis, a method that is currently undergoing major development, thanks to artificial intelligence (AI), and which is profoundly modifying the questions raised by historical demography and social history. - The contributions of deep learning make it possible to plan a complete analysis of Belfort's birth (ECN) and death (ECD) civil registers (1807-1919), thanks to HTR methods applied to these sources (two interdisciplinary computer science-history theses in progress). This project is part of the SOSI CNRS ObHisPop (Observatoire de l'Histoire de la Population française: grandes bases de données et IA), which federates seven laboratories and aims to share the advances of interdisciplinary research in terms of automating the constitution of databases in historical demography. Challenges also include linking (matching individual data) the ECN and ECD databases, and eventually the DMC database (DMC is the city's main employer of women). +The contributions of deep learning make it possible to plan a complete analysis of Belfort's birth (ECN) and death (ECD) civil registers (1807-1919), thanks to HTR methods applied to these sources (two interdisciplinary computer science-history theses in progress). This project is part of the SOSI CNRS ObHisPop (Observatoire de l'Histoire de la Population française: grandes bases de données et IA), which federates seven laboratories and aims to share the advances of interdisciplinary research in terms of automating the constitution of databases in historical demography. Challenges also include linking (matching individual data) the ECN and ECD databases, and eventually the DMC database (DMC is the city's main employer of women). The Belfort Civil Registers of Birth comprise 39,627 birth declarations inscribed in French in a hybrid format (printed and handwritten text). The declarations consist of four components (declaration number, declaration name, primary paragraph, marginal annotations) and provide information such as the child's name, parent's name, date of birth, and other details. The pages of the registers have been scanned at a resolution of 300 dpi each. @@ -45,9 +55,13 @@ Studying these invaluable resources is crucial for understanding the expansion o The development of these models requires a training dataset to address the challenges imposed by these historical documents, such as text style variation, skewness, and overlapping words and text lines. Two stages have been carried on to construct the training dataset. First, manual transcription of 1,010 declarations and 984 marginal annotations with a total of 21,939 text lines, 189,976 words, and 1,177,354 characters. This stage involves employing structure tags like XML tags to identify the characteristics of the declarations. 
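As a rough illustration of this first stage, a declaration and its four components could be tagged along the following lines; the element names, attributes and placeholder text in this sketch are invented for the example and do not reproduce the project's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure tags for one birth declaration; element and
# attribute names are illustrative only, not the project's schema.
decl = ET.Element("declaration", index="1")
ET.SubElement(decl, "declarationNumber").text = "..."
ET.SubElement(decl, "declarationName").text = "..."
paragraph = ET.SubElement(decl, "primaryParagraph")
ET.SubElement(paragraph, "textLine", n="1").text = "..."
ET.SubElement(decl, "marginalAnnotation").text = "..."

print(ET.tostring(decl, encoding="unicode"))
```
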
Second, an automatic text line detection method is utilized to extract the text lines within the primary paragraphs and the marginal annotation images in a polygon boundary to preserve the handwritten text. The method is developed based on analyzing the gaps between two consecutive text lines within the images. The detection process initially identifies the core of the text lines regardless of text skewness. Moreover, the process identifies the gaps based on the identified cores. The number of gaps between each pair of consecutive lines is determined based on a predefined parameter value. Each gap is analyzed by examining the density of black pixels within the central third of the gap region. If the black pixel density is low, a segment point is placed at the center. Otherwise, a histogram analysis is performed to identify minimum valleys, which are then used as potential segment points. Finally, all the localized segment points are connected to form the boundary of the text in a polygon shape. + The Intersection over Union (IoU), Detection Rate (DR), Recognition Accuracy (RA), and F-Measure (FM) metrics have been employed to provide a comprehensive evaluation of different performance aspects of the method, achieving accuracies of 97.5% (IoU), 99% (DR), 98% (RA), and 98.50% (FM) for the detection of the text lines within the primary paragraphs. Moreover, the marginal annotations exhibit accuracies of 93.1%, 96%, 94%, and 94.79% across the same metrics, respectively. + A structured data tool has been developed for correlating the extracted text line images with their corresponding transcriptions at both the paragraph and text line levels by generating .xml files. These files structure the information within the registers based on the reading order of the components within the document and assign a unique index number for each. Additionally, several essential properties are incorporated within each component block, including the component name, the coordinates within the image, and the corresponding transcribed text. The .xml file generation processes are ongoing to expand the structured declarations to enrich the dataset essential for training artificial intelligence models. Belfort Civil Registers of Death (ECD) are composed of 39,238 death declarations with 18,381 fully handwritten certificates and 20,857 hybrid certificates. This corpus spans from 1807 to 1919. ECDs have the same resolution (300 dpi) and the same structure as the Civil Registers of Birth (ECN). The information given by each declaration is somewhat different: the name, the age, the profession of the deceased, the place of death, and even the profession of the witness can be found. + Concerning ECDs, a different strategy was chosen for the text segmentation and the data extraction: the Document Attention Network (DAN). This recently published network is used to get rid of the pre-segmentation step, which is highly beneficial given the heterogeneity of our dataset. It was developed for the recognition of handwritten datasets such as READ 2016 and RIMES 2009. Moreover, this architecture can focus on relevant parts of the document, improving precision and identifying and extracting specific segments of interest. The choice was also made because this network is very efficient in handling large volumes of data while maintaining data integrity. -The DAN architecture is made of a Fully Convolutional Network (FCN) encoder to extract feature maps of the input image.
This type of network is the most popular approach for pixel-pixel document layout analysis because it maintains spatial hierarchies. Then, a transformer is used as a decoder to predict sequences of variable length. Indeed, the output of this network is a sequence of tokens describing characters of the French language or layout (beginning of paragraph or end of page for instance). These layout tokens or tags were made to structure the layout of a register double page and to unify the ECD and ECN datasets. The ECD training dataset was built by picking around four certificates each year of the full dataset. For the handwritten records (1807-1885) the first two declarations of the double page were annotated and the first four for the hybrid records (1886-1919). This led to annotating 460 declarations for the first period and 558 declarations for the second one to give a total of 1118 annotated death certificates. We are currently verifying these annotations to start the pre-training phase of the DAN in the coming months. + +The DAN architecture is made of a Fully Convolutional Network (FCN) encoder to extract feature maps of the input image. This type of network is the most popular approach for pixel-pixel document layout analysis because it maintains spatial hierarchies. Then, a transformer is used as a decoder to predict sequences of variable length. Indeed, the output of this network is a sequence of tokens describing characters of the French language or layout (beginning of paragraph or end of page for instance). These layout tokens or tags were made to structure the layout of a register double page and to unify the ECD and ECN datasets. The ECD training dataset was built by picking around four certificates each year of the full dataset. For the handwritten records (1807-1885) the first two declarations of the double page were annotated and the first four for the hybrid records (1886-1919). This led to annotating 460 declarations for the first period and 558 declarations for the second one to give a total of 1118 annotated death certificates. We are currently verifying these annotations to start the pre-training phase of the DAN in the coming months. \ No newline at end of file diff --git a/submissions/453/index.qmd b/submissions/453/index.qmd index 9448ac4..372765f 100644 --- a/submissions/453/index.qmd +++ b/submissions/453/index.qmd @@ -16,8 +16,9 @@ keywords: abstract: | Over the last few decades, we have witnessed a major transformation in the digital resources available, with significant implications for society, the economy and research. In the social sciences, and history in particular, we can observe the provision of ever larger amounts of open research data and a growing number of data journals, as well as the development of educational resources aimed at strengthening the digital skills of researchers. Knowledge graphs and Linked Open Data make an exponentially growing number of resources easily accessible and raise the question of a paradigm shift for historical research. But this will only happen if digital methods are integrated into the training of new generations of historians, not just as tools but as part of new approaches to knowledge production, as a growing number of scholars and projects are realising. 
I have been teaching a master's course in digital methods in history at the University of Lyon 3 for five years, and now for four years at the University of Neuchâtel, which currently offers teachings in digital methods in the master courses in Historical Sciences and in Regional Heritage and Digital Humanities. In this paper, I will present the structure of the threefold programme of my teaching: in the first semester, understanding the research cycle, setting up an information system and discovering the semantic web; in the second, learning data analysis and visualisation methods; in the third, applying the methods to one's own research agenda. I will also review the results obtained and provide some examples of completed Master's theses. -date: 07-26-2024 +date: 09-13-2024 bibliography: references.bib +doi: 10.5281/zenodo.13907693 --- ## Introduction diff --git a/submissions/454/index.qmd b/submissions/454/index.qmd index a83956b..9941840 100644 --- a/submissions/454/index.qmd +++ b/submissions/454/index.qmd @@ -29,7 +29,8 @@ key-points: - We vectorise cadastral plans using an automated approach - We study the dynamics of parcel persistence based on geometries - We investigate the social structure of ownership cross-matching comparing cadastral sources with population censuses -date: 07-24-2024 +date: 09-12-2024 +doi: 10.5281/zenodo.13904641 bibliography: references.bib --- diff --git a/submissions/455/index.qmd b/submissions/455/index.qmd index 432c9cc..cc88d5e 100644 --- a/submissions/455/index.qmd +++ b/submissions/455/index.qmd @@ -13,7 +13,8 @@ author: email: yi-tang.lin@hist.uzh.ch affiliations: - University of Zurich -date: 07-26-2024 +date: 09-13-2024 +doi: 10.5281/zenodo.13907962 --- ## Introduction @@ -24,7 +25,7 @@ It is composed of four parts: first, we outline the objective of the project; se ## Objective of the Project -The aim of the project was to analyze the worldwide scientific politics of Rockefeller philanthropy through the prism of individual fellowship programs. The historiography of American philanthropy is very extensive, but has most often focused on the substantial funding granted by foundations to institutions and research programs, leaving aside funding to individuals, which was far less spectacular. Yet support for individuals is at the heart of philanthropic philosophy, and in particular that of the Rockefeller Foundation, which consists in selecting specific people at specific times, in specific sectors of activity and in specific countries to achieve specific goals. Ultimately, the Rockefeller Foundation's aim with these individual fellowship programs was to contribute to the construction of a global elite of experts and researchers involved in modernizing the world along American patterns, in line with the messianic project developed by American elites from the end of the 19th century onwards. Foundations have been among the main driving forces behind this project, and funding science one of the major means used to achieve it. This is the background to the Rockefeller Foundation's fellowship policy developed from the outset. As early as 1913, the Foundation awarded individual fellowships in all its fields of activity (medicine, public health, natural sciences, social sciences, nursing and agriculture). Between 1914 and 1968, 14,650 individual grants were awarded to 13,633 individuals (some of whom received several grants). 
+The aim of the project was to analyze the worldwide scientific politics of Rockefeller philanthropy through the prism of individual fellowship programs. The historiography of American philanthropy is very extensive, but has most often focused on the substantial funding granted by foundations to institutions and research programs, leaving aside funding to individuals, which was far less spectacular. Yet support for individuals is at the heart of philanthropic philosophy, and in particular that of the Rockefeller Foundation, which consists in selecting specific people at specific times, in specific sectors of activity and in specific countries to achieve specific goals. Ultimately, the Rockefeller Foundation's aim with these individual fellowship programs was to contribute to the construction of a global elite of experts and researchers involved in modernizing the world along American patterns, in line with the messianic project developed by American elites from the end of the 19^th^ century onwards. Foundations have been among the main driving forces behind this project, and funding science one of the major means used to achieve it. This is the background to the Rockefeller Foundation's fellowship policy developed from the outset. As early as 1913, the Foundation awarded individual fellowships in all its fields of activity (medicine, public health, natural sciences, social sciences, nursing and agriculture). Between 1914 and 1968, 14,650 individual grants were awarded to 13,633 individuals (some of whom received several grants). ## Material Used diff --git a/submissions/456/index.qmd b/submissions/456/index.qmd index 60bfc9b..2cc1042 100644 --- a/submissions/456/index.qmd +++ b/submissions/456/index.qmd @@ -34,8 +34,9 @@ keywords: - census transcription abstract: This article discusses the evolution of digital approaches to historical data, particularly in the creation and use of transcripted corpora extracted from demographic, cadastral, and geographic sources. These datasets are crucial for quantitative analyses across various disciplines. The emergence of computational techniques, including manual annotation, machine learning, and OCR, has led to a significant influx of historical data, challenging traditional methods of analysis. This paper emphasizes the importance of maintaining data versioning and transparency, enabling corrections and refinements over time. It highlights projects like "Names of Lausanne" and "Parcels of Venice," which use advanced technologies to create searchable and analyzable datasets from historical censuses and cadastral records. These projects illustrate the critical role of versioning and proper data management in preserving the integrity and usability of historical data, allowing for continuous improvements and new historical insights. -date: 07-24-2024 +date: 09-13-2024 bibliography: references.bib +doi: 10.5281/zenodo.13907752 --- ## Introduction diff --git a/submissions/457/index.qmd b/submissions/457/index.qmd index fdd9e8a..a30f344 100644 --- a/submissions/457/index.qmd +++ b/submissions/457/index.qmd @@ -16,10 +16,20 @@ keywords: - theory of history abstract: | Digital corpora play an important, if not defining, role in digital history and may be considered as one of the most obvious differences to traditional history. Corpora are essential for the use of computational methods and thus for the construction of computational historical models. 
But beyond their technical necessity and their practical advantages, their epistemological impact is significant. While the traditional pre-digital corpus is often more of a potentiality, a mere "intellectual object," the objective of computational processing requires the corpus to be made explicit and thus turns it into a "material object." Far from being naturally given, corpora are constructed as models of a historical phenomenon and therefore have all the properties of models. Moreover, following Gaston Bachelard, I would argue that corpora actually construct the phenomenon they are supposed to represent; they should therefore be considered as phenomenotechnical devices. -date: 2024-08-14 +date: 09-12-2024 +doi: 10.5281/zenodo.13904530 +other-links: + - text: Presentation Manuscript (PDF) + href: https://doi.org/10.5281/zenodo.13904530 bibliography: references.bib --- +::: {.callout-note appearance="simple" icon=false} + +For this paper, a manuscript is available [on Zenodo (PDF)](https://zenodo.org/records/13904530/files/457_DigiHistCH24_ComputationalHistoriographicalModeling_Manuscript.pdf). + +::: + ## Introduction When we look for epistemological differences between "traditional" and digital history, *corpora*---stand out. Of course, historians have always created and studied collections of traces, in particular documents, but sometimes also other artifacts, and have built their narratives on the basis of these collections. This is a significant aspect of scholarship and in some sense constitutes the difference between historical and literary narratives: historical narratives are supposed to be grounded (in some way) in the historical facts represented by the respective corpus. diff --git a/submissions/458/index.qmd b/submissions/458/index.qmd index 6e5b04c..05845f1 100644 --- a/submissions/458/index.qmd +++ b/submissions/458/index.qmd @@ -31,7 +31,9 @@ key-points: - 'We strive to detect the evolution of political convictions in the writings of the founder of the Chinese Communist Party with AI tools by posing a question: Is this article in favor of communism or capitalism?' - 'LLaMA was used for interpreting Chinese text but provided an inadequate summary.' - 'ChatGPT offered a good analysis when the text contained one strong point of view, but encountered challenges when multiple perspectives were present in the same text.' -date: 09-12-2024 +date: 09-13-2024 +date-modified: 10-21-2024 +doi: 10.5281/zenodo.13907852 bibliography: references.bib --- diff --git a/submissions/459/index.qmd b/submissions/459/index.qmd index 888a895..a00ed03 100644 --- a/submissions/459/index.qmd +++ b/submissions/459/index.qmd @@ -20,9 +20,19 @@ keywords: - Experience Report abstract: | Libraries are finding their place in the field of data literacy and the opportunities as well as challenges of supporting students and researchers in the field of Digital Humanities. Key aspects of this development are research data management, repositories, libraries as suppliers of data sets, digitisation and more. Over the past few years, the library has undertaken steps to actively bring itself into teaching and facilitate the basics of working with digital sources. The talk shares three experience reports of such endeavours undertaken by subject librarians of the Digital Humanities Work Group (AG DH) at the University Library Basel (UB). 
-date: 07-26-2024 +date: 09-12-2024 +doi: 10.5281/zenodo.13904170 +other-links: + - text: Presentation Slides (PDF) + href: https://doi.org/10.5281/zenodo.13904170 --- +::: {.callout-note appearance="simple" icon=false} + +For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/13904170/files/459_DigiHistCH24_DataLiteracyLibraries_Slides.pdf). + +::: + ## Introduction More and more, libraries are becoming important institutions when it comes to teaching data literacy and the basics of Digital Humanities (DH) tools and methods, especially to undergraduates or other people new to the subject matter. The Digital Humanities Work Group (AG DH), consisting of a selection of subject librarians from the University Library Basel (UB), have developed various formats to introduce students to these topics and continue to build and expand upon the available teaching elements in order to assemble customised lesson or workshop packages as needed. The aim of this talk is to share our experiences with the planning and teaching of three different course formats. These classes and workshops play, on one hand, an important part of making the library's (historical) holdings and datasets visible and available for digital research and, on the other hand, they are means to engage with students and (early stage) researchers and imparting skills in the area of working with data at an easily accessible level. diff --git a/submissions/460/index.qmd b/submissions/460/index.qmd index b7ac925..62a0852 100644 --- a/submissions/460/index.qmd +++ b/submissions/460/index.qmd @@ -21,10 +21,21 @@ key-points: - This study explored the network of connections between 1,248 migrant glassworkers and their family members working in Estonia from the 16th-19th century, using Transkribus, OCR, and Gephi as the main tools. - The raw dataset was published via DataDOI, an Open Access repository managed by the University of Tartu library in accordance to FAIR principles. - The data shows that a key factor in building and maintaining the glass community was godparenting and marriages between the families. -date: 07-29-2024 +date: 09-12-2024 +date-modified: 11-15-2024 +doi: 10.5281/zenodo.14171320 +other-links: + - text: Presentation Slides (PDF) + href: https://doi.org/10.5281/zenodo.14171320 bibliography: references.bib --- +::: {.callout-note appearance="simple" icon=false} + +For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/14171320/files/460_DigiHistCH24_Glassworkers_Slides.pdf). + +::: + ## Introduction As part of the author’s PhD project, ‘Glass and its makers in Estonia, c. 1550–1950: an archaeological study,’ the genealogical data about 1,248 migrant glassworkers and their family members working in Estonia from the 16th–19th century were collected using archival records and newspapers. The goal was to use information about key life events to trace the life histories of the glassworkers and their families from childhood to old age to gain an understanding of the community and the industry through one of its most important aspects – the workforce. It was hoped that the data will also assist in identifying the locations and names of glassworks during the period under study. In this paper, the author reflects on the process of this documentary archaeology research. The data collection, storage, and visualisation process are described, followed by the results of the study which have been included in a doctoral dissertation [@mythesis] and a research article [@reppo2023d]. 
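One way to picture the visualisation step is a minimal sketch of how family ties could be assembled into a Gephi-readable network; the example ties and the output file name below are invented for illustration, not taken from the project's data or actual workflow.

```python
import networkx as nx

# Invented example ties; in the project the edges come from baptism and
# marriage entries linking glassworker families.
ties = [
    ("Glassworker A", "Glassworker B", "marriage"),
    ("Glassworker C", "Glassworker A", "godparent"),
    ("Glassworker C", "Glassworker B", "godparent"),
]

G = nx.Graph()
for person, relative, relation in ties:
    G.add_edge(person, relative, relation=relation)

# The GEXF file can be opened directly in Gephi for layout and analysis.
nx.write_gexf(G, "glassworker_network.gexf")
```
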
diff --git a/submissions/462/index.qmd b/submissions/462/index.qmd index fb8c168..8b30333 100644 --- a/submissions/462/index.qmd +++ b/submissions/462/index.qmd @@ -32,8 +32,10 @@ key-points: - The number of receivers of interest in a house description is likely a good indicator for the burden of interest on a property. - A nested annotation of entities can enable more complex research questions. - Awareness of strengths and weaknesses of the automated system is important when methodology is chosen. -date: 08-04-2024 +date: 09-13-2024 +date-modified: 10-13-2024 bibliography: references.bib +doi: 10.5281/zenodo.13907882 --- ## Introduction @@ -116,7 +118,7 @@ plt.show() ``` -We consider three ways to measure the burden of interest. First, we could use the absolute number of descriptions, but this number is unreliable, as there is a trend to not describe the interest anymore in later documents (see @fig-3). Second, we could try to use the monetary values mentioned in the descriptions. Here we encounter multiple problems: Numbers suffer from more HTR errors than other parts of the documents and are thus less reliable in general. Additionally a number of different currencies are in use, which would need conversion to a single value, as well as payments in kind. Finally our automated system cannot differentiate between different reasons for interest at the moment, so we wouldn’t know if a value is paid per year or only in case of an exchange of property ownership (“zu erschatz”). We settled using the number of beneficiaries to determine the burden of rent. Any entity found in a house description is classified as a beneficiary (we evaluated this to be true in 98% of cases based on 100 samples, with the true false positives being caused by errors in the named entity recognition process). Figure [-@fig-4] shows the absolute numbers of organizations and persons recognized as receivers of interest in house descriptions over time. +We consider three ways to measure the burden of interest. First, we could use the absolute number of descriptions, but this number is unreliable, as there is a trend to not describe the interest anymore in later documents (see @fig-3). Second, we could try to use the monetary values mentioned in the descriptions. Here we encounter multiple problems: Numbers suffer from more HTR errors than other parts of the documents and are thus less reliable in general. Additionally, a number of different currencies are in use, which would need conversion to a single value, as well as payments in kind. Finally, our automated system cannot differentiate between different reasons for interest at the moment, so we wouldn’t know if a value is paid per year or only in case of an exchange of property ownership (“zu erschatz”). We settled on using the number of beneficiaries to determine the burden of rent. Any entity found in a house description is classified as a beneficiary (we evaluated this to be true in 98% of cases based on 100 samples, with the false positives being caused by errors in the named entity recognition process). @fig-4 shows the absolute numbers of organizations and persons recognized as receivers of interest in house descriptions over time.
```{python} #| label: fig-4 diff --git a/submissions/464/index.qmd b/submissions/464/index.qmd index 8d1ebad..6d40952 100644 --- a/submissions/464/index.qmd +++ b/submissions/464/index.qmd @@ -20,8 +20,9 @@ key-points: - Institutions holding the same types of ancient heritage objects face similar challenges when it comes to the digitization of their collections. - Papyri are a suitable case study object for the purpose of this analysis, having developed many digital resources through originally independent initiatives. - These endeavors have generated a digital landscape in which institutions can implement varied approaches, from utilizing existing metadata standards, external services and controlled vocabularies to combining digitization resources and connecting to transdisciplinary aggregators. -date: 07-28-2024 +date: 09-13-2024 bibliography: references.bib +doi: 10.5281/zenodo.13907777 --- ## Introduction diff --git a/submissions/465/index.qmd b/submissions/465/index.qmd index b1f2e60..f5c6749 100644 --- a/submissions/465/index.qmd +++ b/submissions/465/index.qmd @@ -21,8 +21,9 @@ key-points: - Integrating Machine Learning output in historical research requires meticulous evaluation. - Factoids can provide a technique for the multifaceted representation of data points. - Digital History requires new hermeneutical tools suitable for digital data and workflows. -date: 07-23-2024 +date: 09-12-2024 bibliography: references.bib +doi: 10.5281/zenodo.13907672 --- ## Introduction diff --git a/submissions/468/index.qmd b/submissions/468/index.qmd index 457e923..73921c2 100644 --- a/submissions/468/index.qmd +++ b/submissions/468/index.qmd @@ -17,7 +17,10 @@ keywords: - Agricultural Films - Audiovisual Media - Film History -date: 08-16-2024 +date: 09-12-2024 +date-modified: 11-15-2024 +doi: 10.5281/zenodo.14171325 +bibliography: references.bib --- ## Introduction @@ -136,13 +139,13 @@ animals at work. Films thus bear witness, often unintentionally, to the fact tha often was not as it was portrayed or demanded in textbooks and magazines. However, films are more than mere images; they intervene in the context of their creation and use, -create a reality of their own and exert an influence on the viewer.[^2] This was often used deliberately, for example if there was a need for media control when innovations of a technical, economic, political, social or medical nature had an impact on society or the environment. Changes of all kinds, including the controversies that accompanied them, were therefore an important reason to produce commissioned films. The films had the function of adapting their audiences to new +create a reality of their own and exert an influence on the viewer [@bernhardt_visual_2013, 5]. This was often used deliberately, for example if there was a need for media control when innovations of a technical, economic, political, social or medical nature had an impact on society or the environment. Changes of all kinds, including the controversies that accompanied them, were therefore an important reason to produce commissioned films. The films had the function of adapting their audiences to new requirements, creating acceptance for the innovation and laying the foundation for further changes. In this respect, commissioned films contributed to the creation of a willingness to cooperate and -to consensus-building in modernisation processes.[^3] In the agricultural context, this function of +to consensus-building in modernisation processes [@zimmermann_dokumentarischer_2011, pp. 64, 69f.]. 
In the agricultural context, this function of films was used, for example, by the Eidgenössische Alkoholverwaltung EAV (Swiss Alcohol -Board)[^4] and the plant protection company Dr Rudolf Maag AG, which commissioned and -produced numerous films illustrating their activities and the use of their products.[^5] +Board, @auderset_rausch_2016; @wigger_saft_2022) and the plant protection company Dr Rudolf Maag AG, which commissioned and +produced numerous films illustrating their activities and the use of their products [@archiv_fur_agrargeschichte_playlist_nodate-1; @archiv_fur_agrargeschichte_playlist_nodate]. The dual function of audiovisual sources as images and as influencing media often cannot be adequately captured by written texts alone. This is why we conceptualise moving images also for @@ -155,7 +158,7 @@ will come up against limitations because much of what characterises moving image written down: the dynamics and (in the case of sound films) the interplay of image and sound in particular. It is, furthermore, often impossible to translate the content of the image into words, for example when it comes to the behaviour of (speechless) animals, human-animal interactions or -disappeared (agricultural) practices, for which there is no vocabulary in industrialised societies.[^6] +disappeared (agricultural) practices, for which there is no vocabulary in industrialised societies [@wigger_bewegende_2023]. To counter these difficulties, the format of the historical video essay lends itself as a supplement to written texts. A video essay in our series is understood as a montage of historical film and image @@ -164,7 +167,7 @@ material and visual carrier of the knowledge transfer and are contextualised and commentary. In addition to the communication function, video essays can also be used as an analytical tool. -![Fig. 4: The first video essay in the series Video Essays in Rural History focuses on the importance of working horses, cattle, dogs, mules and donkeys in agriculture and in the cities of the 19^th^ and 20^th^ centuries.[^7]](images/Figure4.jpg) +![Fig. 4: The first video essay in the series Video Essays in Rural History focuses on the importance of working horses, cattle, dogs, mules and donkeys in agriculture and in the cities of the 19^th^ and 20^th^ centuries [@wigger_working_2022].](images/Figure4.jpg) The ARH and ERHFA have launched the *[Video Essays in Rural History series](https://www.ruralfilms.eu/all_video_essays.html)*, in which five video essays from Switzerland, Belgium and Canada have been published to date. They address the importance of working animals, Swiss agronomists and farmers travelling to America in the early 20^th^ century, neighbourly @@ -181,24 +184,3 @@ video essay on working animals was clicked on 3,100 times in the first week afte example). [^1]: The film is available online in the ARH/ERHFA online portal: [ruralfilms.eu (16.08.2024)](https://ruralfilms.eu/filmdatabaseOnline/index.php?tablename=films&function=details&where_field=ID_films&where_value=203). - -[^2]: Bernhardt Markus, Visual History: Einführung in den Themenschwerpunkt, in: Zeitschrift für Geschichtsdidaktik, -12/1 (2013), p. 5–8, here: p. 5. - -[^3]: Zimmermann Yvonne, Dokumentarischer Film: Auftragsfilm und Gebrauchsfilm, in: Zimmermann Yvonne (Hg.), -Schaufenster Schweiz: Dokumentarische Gebrauchsfilme 1896-1964, Zürich 2011, p. 34–83, here: p. 64 & 69f. - -[^4]: Auderset Juri/Moser Peter, Rausch & Ordnung. 
Eine illustrierte Geschichte der Alkoholfrage, der schweizerischen -Alkoholpolitik und der Eidgenössischen Alkoholverwaltung (1887-2015), Bern 2016; Wigger Andreas, Saft statt -Schnaps. Das Filmschaffen der Eidgenössischen Alkoholverwaltung (EAV) von 1930 bis 1985, in: Geschichte im -Puls, Dossier 3: Ekstase (2022), [www.geschichteimpuls.ch (02.07.2024)](https://www.geschichteimpuls.ch/artikel/eav) - -[^5]: Playlist Eidgenössische Alkoholverwaltung (EAV), in: Archiv für Agrargeschichte, [YouTube Playlist (02.07.2024)](https://youtube.com/playlist?list=PLSdpgcFyXTnbny77UvXG2neenufUdK7gH); Playlist Dr. Rudolf -Maag AG, in: Archiv für Agrargeschichte, [YouTube Playlist (02.07.2024)](https://youtube.com/playlist?list=PLSdpgcFyXTnbQFfNleFhCKqqNhGcP3M4_). - -[^6]: Wigger Andreas, Bewegende Tiere auf bewegten Bildern. Filme als Quellen und Vermittlungsformat zur -Geschichte der arbeitenden Tiere in der Zeit der Massenmotorisierung (1950-1980), Videoessay zur Masterarbeit, -Fribourg 2023, [YouTube (25.06.2024)](https://youtu.be/_XVWdHNQxv8). - -[^7]: Moser Peter/Wigger Andreas, Working Animals. Hidden modernisers made visible, in: Video Essays in Rural -History, 1 (2022), [https://www.ruralfilms.eu/essays/videoessay_1_EN.html](https://www.ruralfilms.eu/essays/videoessay_1_EN.html) [16.08.2024]. diff --git a/submissions/468/references.bib b/submissions/468/references.bib new file mode 100644 index 0000000..f57ef64 --- /dev/null +++ b/submissions/468/references.bib @@ -0,0 +1,78 @@ + +@article{bernhardt_visual_2013, + title = {Visual {History}: {Einführung} in den {Themenschwerpunkt}}, + volume = {12}, + number = {1}, + journal = {Zeitschrift für Geschichtsdidaktik}, + author = {Bernhardt, Markus}, + year = {2013}, + pages = {5--8}, +} + +@incollection{zimmermann_dokumentarischer_2011, + address = {Zürich}, + title = {Dokumentarischer {Film}: {Auftragsfilm} und {Gebrauchsfilm}}, + isbn = {978-3-85791-605-2}, + booktitle = {Schaufenster {Schweiz}: {Dokumentarische} {Gebrauchsfilme} 1896-1964}, + publisher = {Limmat-Verlag}, + author = {Zimmermann, Yvonne}, + editor = {Zimmermann, Yvonne and Gertiser, Anita}, + year = {2011}, + pages = {34--83}, +} + +@book{auderset_rausch_2016, + address = {Bern}, + title = {Rausch \& {Ordnung}: eine illustrierte {Geschichte} der {Alkoholfrage}, der schweizerischen {Alkoholpolitik} und der {Eidgenössischen} {Alkoholverwaltung} (1887-2015)}, + isbn = {978-3-906211-10-7}, + shorttitle = {Rausch \& {Ordnung}}, + language = {ger}, + publisher = {Eidgenössische Alkoholverwaltung}, + author = {Auderset, Juri and Moser, Peter}, + collaborator = {{Eidgenössische Alkoholverwaltung}}, + year = {2016}, +} + +@misc{wigger_saft_2022, + title = {Saft statt {Schnaps}: {Das} {Filmschaffen} der {Eidgenössischen} {Alkoholverwaltung} ({EAV}) von 1930 bis 1985}, + url = {https://www.geschichteimpuls.ch/artikel/eav}, + language = {de-ch}, + urldate = {2024-07-02}, + journal = {Geschichte im Puls}, + author = {Wigger, Andreas}, + month = may, + year = {2022}, +} + +@misc{archiv_fur_agrargeschichte_playlist_nodate, + title = {Playlist {Eidgenössische} {Alkoholverwaltung}}, + url = {https://www.youtube.com/playlist?list=PLSdpgcFyXTnbny77UvXG2neenufUdK7gH}, + urldate = {2024-07-02}, + author = {{Archiv für Agrargeschichte}}, +} + +@misc{archiv_fur_agrargeschichte_playlist_nodate-1, + title = {Playlist {Dr}. 
{Rudolf} {Maag} {AG}}, + url = {https://www.youtube.com/playlist?list=PLSdpgcFyXTnbQFfNleFhCKqqNhGcP3M4_}, + urldate = {2024-07-02}, + author = {{Archiv für Agrargeschichte}}, +} + +@misc{wigger_bewegende_2023, + address = {Fribourg}, + title = {Bewegende {Tiere} auf bewegten {Bildern}. {Filme} als {Quellen} und {Vermittlungsformat} zur {Geschichte} der arbeitenden {Tiere} in der {Zeit} der {Massenmotorisierung} (1950-1980)}, + url = {https://www.youtube.com/watch?v=_XVWdHNQxv8}, + abstract = {Videoessay zur Masterarbeit}, + urldate = {2024-06-25}, + author = {Wigger, Andreas}, + year = {2023}, +} + +@misc{wigger_working_2022, + title = {Working {Animals}. {Hidden} modernisers made visible}, + url = {https://www.ruralfilms.eu/essays/videoessay_1_EN.html}, + urldate = {2024-08-16}, + journal = {Video Essays in Rural History}, + author = {Wigger, Andreas and Moser, Peter}, + year = {2022}, +} diff --git a/submissions/469/index.qmd b/submissions/469/index.qmd index d27f413..b4f8a66 100644 --- a/submissions/469/index.qmd +++ b/submissions/469/index.qmd @@ -8,25 +8,28 @@ author: email: edu.ildf@icloud.com affiliations: - University of Geneva +date: 09-13-2024 +date-modified: 11-06-2024 bibliography: Master_thesis_ILDF.bib +doi: 10.5281/zenodo.13907996 keywords: - Digital History - Structural Topic Modelling - HCPC - Rockefeller Foundation - Development -other-links: +code-links: - text: GitHub Repository href: https://github.com/ivanldf13/Master-thesis- + icon: file-code key-points: - The analysis demonstrated that the Foundation's concept of development evolved significantly, incorporating layers such as economic, social, cultural, and environmental aspects over time. - The STM efficiently captured the temporal dynamics undergone by the Foundation's development concept. - The study highlighted the critical importance of combining advanced digital methodologies with traditional hermeneutical analysis and bibliographical review to fully understand the nuanced development concepts and the shifting roles of political actors and self-help mentalities. - abstract: | This paper aims to analyse the evolution of the development concept throughout the Rockefeller Foundation’s first century of existence, utilising its annual reports. Drawing inspiration from Moretti & Pestre’s influential working paper – Banskpeak – our methodology consists of a two-fold approach. Firstly, we conducted a quantitative language analysis of the language employed in the Rockefeller Foundation’s annual reports. Here, using R we did a Structural Topic Modelling. Secondly, building upon the outcomes of this initial quantitative analysis, we delved into the activities and institutions in which the Foundation was involved to reconstruct its evolving development concept. This approach allowed us to observe how the meaning of development evolved, accumulating new connotations over time. - We started our analysis at the beginning of the 20th century because – even though the development concept was not formally coined until 1949 – the Foundation was already involved in development activities and institutions before that date. Furthermore, this actor had a set of ideas from the beginning of its activities that continued to influence its actions even after the formalisation of the development concept. In this sense, we explored the significance of the self-help ethic and market-oriented mentality in other spheres of development. 
+ We started our analysis at the beginning of the 20^th^ century because – even though the development concept was not formally coined until 1949 – the Foundation was already involved in development activities and institutions before that date. Furthermore, this actor had a set of ideas from the beginning of its activities that continued to influence its actions even after the formalisation of the development concept. In this sense, we explored the significance of the self-help ethic and market-oriented mentality in other spheres of development. Consequently, we demonstrated that self-help had consistently played a pivotal role in the Foundation’s development strategy since the Foundation’s inception. Furthermore, we scrutinised the roles ascribed by the Foundation to various actors in the development process. While the Foundation initially regarded the State as the primary actor in development, by the study period’s end, new participants such as private companies, communities, and individuals had become integral to this process. All the necessary data and scripts to reproduce this presentation can be found [here](https://github.com/ivanldf13/Master-thesis-). --- @@ -35,11 +38,11 @@ abstract: | In our presentation, we will explore how Digital Humanities tools can be used to analyse the concept of development from a historiographical perspective. We will begin with a brief introduction to the topic, followed by an overview of our primary sources. The core of our presentation will focus on the methodology, where we will justify our choice of Structural Topic Modelling over other techniques like Hierarchical Clustering on Principal Components. Finally, we will present the results of our analysis and some remarks. -The concept of development — and its practical implications — has been controversial since its inception, both in academia and the political arena. Created in the post-WWII period as a universal goal, it soon met opposition, especially in 'underdeveloped' countries that had little say in the development policies imposed on them. Consequently, the concept has undergone continuous redefinition.[@sachs_archaeology_2008] +The concept of development — and its practical implications — has been controversial since its inception, both in academia and the political arena. Created in the post-WWII period as a universal goal, it soon met opposition, especially in 'underdeveloped' countries that had little say in the development policies imposed on them. Consequently, the concept has undergone continuous redefinition [@sachs_archaeology_2008]. -From the outset, governmental and non-governmental actors have been involved in the development process. Among the non-state actors, philanthropic foundations are particularly significant. However, despite their importance, the way these foundations conceptualize development has received less academic attention than other aspects of their activities. This is true for the Rockefeller Foundation,[^2] a key player in international public health, [@birn_rockefeller_2013] global food and agriculture policies, [@smith_imaginaries_2009] the development of various academic disciplines, [@tournes_fondation_2007; @fisher_role_1983; @fisher_rockfeller_1999; @schneider_role_1999] and the configuration of the international order after WWII. [@tournes_rockefeller_2014] +From the outset, governmental and non-governmental actors have been involved in the development process. Among the non-state actors, philanthropic foundations are particularly significant. 
However, despite their importance, the way these foundations conceptualize development has received less academic attention than other aspects of their activities. This is true for the Rockefeller Foundation,[^2] a key player in international public health [@birn_rockefeller_2013], global food and agriculture policies [@smith_imaginaries_2009], the development of various academic disciplines [@tournes_fondation_2007; @fisher_role_1983; @fisher_rockfeller_1999; @schneider_role_1999] and the configuration of the international order after WWII [@tournes_rockefeller_2014]. -[^2]: From now on referred to as the Foundation +[^2]: From now on referred to as the Foundation. ## Primary sources @@ -47,32 +50,32 @@ We chose as primary sources the Foundation’s Annual Reports for two reasons. T The second reason is qualitative. The main objective of annual reports is to communicate the activities of the Foundation, its financial operations, its priorities, its vision of the issues it faces, and a self-assessment of its own actions in the past and those to be adopted in the future. Although the structure of the annual reports has changed over time, the content has remained stable. The Foundation presents with them a summary of its activities but also presents a narrative that seeks to communicate the reasoning and justification behind the Foundation’s activities. In this sense, the annual reports are a showcase in which the Foundation displays, promotes and justifies its values. -Moreover, since these reports are public, they serve two functions. The first is purely functional. The reports inform the reader of the Foundation’s activities, its financial state, and other relevant details. The second function is symbolic. As Peter Goldmark Jr. (president of the Foundation from 1988 to 1997) noted, philanthropic foundations lack the three disciplines American life has: the test of the markets, the test of the elections and the press that analyses every move. [@rockefeller_foundation_annual_1998, pp. 3] Therefore, the Foundation uses the annual reports as a form of self-evaluation, as a way to make itself accountable to the public and to offer a promotion and justification of the values that guide its activities. [@rockefeller_foundation_annual_1955, pp. 3] +Moreover, since these reports are public, they serve two functions. The first is purely functional. The reports inform the reader of the Foundation’s activities, its financial state, and other relevant details. The second function is symbolic. As Peter Goldmark Jr. (president of the Foundation from 1988 to 1997) noted, philanthropic foundations lack the three disciplines American life has: the test of the markets, the test of the elections and the press that analyses every move [@rockefeller_foundation_annual_1998, p. 3]. Therefore, the Foundation uses the annual reports as a form of self-evaluation, as a way to make itself accountable to the public and to offer a promotion and justification of the values that guide its activities [@rockefeller_foundation_annual_1955, p. 3]. ## Methodology and its twists and turns -Confronted with the enormous amount of reports to be analysed and inspired by the working paper “Bankspeak” by Moretti and Pestre, [@moretti_bankspeak_2015] we undertook a quantitative analysis of the language used in this reports. Then, guided by the results of this analysis we interpreted the activities and institutions in which the Foundation was involved to reconstruct the evolution of its concept of development. 
+Confronted with the enormous amount of reports to be analysed and inspired by the working paper “Bankspeak” by Moretti and Pestre [@moretti_bankspeak_2015], we undertook a quantitative analysis of the language used in these reports. Then, guided by the results of this analysis, we interpreted the activities and institutions in which the Foundation was involved to reconstruct the evolution of its concept of development. -We began our quantitative analysis by importing the PDF reports into R using the ‘tidy’ principle [@silge_tidytext_2016, pp.1] and then performing the necessary text cleaning to reduce the size of the corpus. This increased the efficiency and effectiveness of the analysis.[@gurusamy_preprocessing_2014] We then proceeded with the analysis itself. +We began our quantitative analysis by importing the PDF reports into R using the ‘tidy’ principle [@silge_tidytext_2016, p. 1] and then performing the necessary text cleaning to reduce the size of the corpus. This increased the efficiency and effectiveness of the analysis [@gurusamy_preprocessing_2014]. We then proceeded with the analysis itself. -Initially, we employed basic text analysis techniques, namely counting the most frequent words per year and per period and using the TF-IDF. These techniques yielded promising results but were insufficient. Although the Foundation had the same objective throughout the period – “*to promote the well-being of mankind throughout the world*” – ,[@rockefeller_foundation_annual_1915, pp.7; @rockefeller_foundation_annual_1964, pp.3; @rockefeller_foundation_annual_2014, pp.3] it used different words in absolute and relative terms to describe and justify its activities. +Initially, we employed basic text analysis techniques, namely counting the most frequent words per year and per period and using the TF-IDF. These techniques yielded promising results but were insufficient. Although the Foundation had the same objective throughout the period – “*to promote the well-being of mankind throughout the world*” – [@rockefeller_foundation_annual_1915, p. 7; @rockefeller_foundation_annual_1964, p. 3; @rockefeller_foundation_annual_2014, p. 3], it used different words in absolute and relative terms to describe and justify its activities. However, in terms of visualisation, precision and displaying temporal dynamics, the capabilities of these two techniques are worse than those of Hierarchical Clustering on Principal Components (HCPC) and Structural Topic Modelling (STM). Moreover, the former techniques are unable to create clusters and topics, unlike the latter two. -We continued with the HCPC, using only nouns, as this part of speech is the most suitable for analysing topics.[@suh_socialterm-extractor_2019, pp.2] This technique confirmed the findings of the absolute frequency analysis and the TF-IDF. That is, there is structure in the use of words by the Foundation, as reflected both in the biplot created by the Correspondence Analysis (CA) necessary to perform the HCPC and in the final clusters. In the biplot in @fig-1.top25, the documents are organised in a temporal manner and, being together with each other, this indicates that they favour and avoid the same words regardless of the number of words in each document.[@becue-bertaut_textual_2019, pp.18-19] Specifically, we observed that the Foundation used more frequently terms such as ‘infection’ or ‘hookworm’ and less frequently terms such as ‘resilience’ or ‘climate’ at the beginning of the period.
Furthermore, when clustering after the CA and analysing the words contained in each cluster, it is observed that the Foundation, over time, diversifies the topics in which it engages, following a chronological trend. However, the visualisation of the clusters does not significantly enhance our understanding of the matter. +We continued with the HCPC, using only nouns, as this part of speech is the most suitable for analysing topics [@suh_socialterm-extractor_2019, p. 2]. This technique confirmed the findings of the absolute frequency analysis and the TF-IDF. That is, there is structure in the use of words by the Foundation, as reflected both in the biplot created by the Correspondence Analysis (CA) necessary to perform the HCPC and in the final clusters. In the biplot in @fig-1.top25, the documents are organised in a temporal manner and, being together with each other, this indicates that they favour and avoid the same words regardless of the number of words in each document [@becue-bertaut_textual_2019, p. 18-19]. Specifically, we observed that the Foundation used more frequently terms such as ‘infection’ or ‘hookworm’ and less frequently terms such as ‘resilience’ or ‘climate’ at the beginning of the period. Furthermore, when clustering after the CA and analysing the words contained in each cluster, it is observed that the Foundation, over time, diversifies the topics in which it engages, following a chronological trend. However, the visualisation of the clusters does not significantly enhance our understanding of the matter. ![Top 25 contributors to the two first dimensions](images/Figure%201.Top_25_Contributors.png){#fig-1.top25} Despite offering us greater certainty regarding the temporal structure of the language used, the HCPC does not possess the precision of the next technique we employed: the Structural Topic Modelling with temporal metadata. In a CA with two dimensions, the closer a word, report, or cluster is to the origin of coordinates, the lower its explanatory power, as it represents a smaller percentage of the variance. In our case, there is one cluster almost at the origin of coordinates and two others not far from the central values of one or the other dimension. -Next, we employed the STM using also only nouns. As a topic modelling technique, the STM seeks to discover latent topics assumed to be generated by the corpus to be analysed, and the researcher must define the number of topics. Since there is no ‘correct’ number of topics for a corpus, we followed Roberts et al.’s methodology.[-@roberts_structural_2014] Thus, we quantitatively measured semantic coherence[^3] and topic exclusivity[^4] by standardising these scores and choosing a number of topics that balances them well. +Next, we employed the STM using also only nouns. As a topic modelling technique, the STM seeks to discover latent topics assumed to be generated by the corpus to be analysed, and the researcher must define the number of topics. Since there is no ‘correct’ number of topics for a corpus, we followed Roberts et al.’s methodology [-@roberts_structural_2014]. Thus, we quantitatively measured semantic coherence[^3] and topic exclusivity[^4] by standardising these scores and choosing a number of topics that balances them well. 
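The balancing step can be made concrete with a small sketch. In the project the diagnostics were computed in R with the stm package; the snippet below only illustrates the standardise-and-balance logic, and the candidate values of K and their scores are invented for the example rather than taken from our results.

```python
import numpy as np

# Hypothetical per-K averages of semantic coherence and exclusivity,
# e.g. as produced by a searchK-style diagnostic run; values are invented.
candidate_k = np.array([5, 10, 15, 20, 25])
coherence = np.array([-45.0, -52.0, -58.0, -63.0, -70.0])
exclusivity = np.array([8.2, 9.0, 9.4, 9.6, 9.7])

def zscore(x):
    return (x - x.mean()) / x.std()

# Standardise both diagnostics and pick the K with the best combined score.
combined = zscore(coherence) + zscore(exclusivity)
best_k = candidate_k[np.argmax(combined)]
print(f"Selected number of topics: {best_k}")
```
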
-[^3]: Words more likely to appear in a topic are more likely to appear together within documents -[^4]: Words more likely to appear in one topic are less likely to appear in another +[^3]: Words more likely to appear in a topic are more likely to appear together within documents. +[^4]: Words more likely to appear in one topic are less likely to appear in another. ![Table with the topical content](images/Topics_DHCH.png){#tbl-topics} -Once we chose the number of topics, we obtained two lists of nouns associated with each topic, as shown in @tbl-topics. One list groups the nouns most likely to appear in each topic (Highest Prob list), while the other groups those that are frequent and exclusive (FREX list). These lists allow us to discover the central topics without our prior biases. We then named each topic using both lists and analysed the most representative reports for each topic. Therefore, this approach is a mixture of the methods suggested by Roberts et al.[-@roberts_structural_2014, pp.1068] and Grajzl & Murrell.[-@grajzl_toward_2019, pp.10] +Once we chose the number of topics, we obtained two lists of nouns associated with each topic, as shown in @tbl-topics. One list groups the nouns most likely to appear in each topic (Highest Prob list), while the other groups those that are frequent and exclusive (FREX list). These lists allow us to discover the central topics without our prior biases. We then named each topic using both lists and analysed the most representative reports for each topic. Therefore, this approach is a mixture of the methods suggested by Roberts et al. [-@roberts_structural_2014, p. 1068] and Grajzl & Murrell [-@grajzl_toward_2019, p. 10]. ![Topical prevalence of the topics correlated with time](images/plot.expected.topic.proportions.png){#fig-topical_prev} @@ -90,4 +93,4 @@ This approach provided an innovative way to understand the main topics in which However, this methodology proved inefficient in analysing the role of the self-help mentality and the market-oriented mentality. To address this, we had to perform a close reading to conclude the centrality of both in the Foundation's thinking, especially in the 21st century. Indeed, throughout its existence, the Foundation sought to ensure that the actors it helped to develop became autonomous agents who could solve their problems without recourse to third parties. Furthermore, we observed how the importance of these actors in the development process also changed over time. At the beginning of the period, the Foundation conceived of the State as the primary catalyst for development. By the end of the period, it advocated development involving the State, private enterprise, civil society, and individuals. As the State’s credibility as a guarantor of rights and provider of welfare-related services wanes, the Foundation encourages individuals to find their own means to cope with the risks present in contemporary society without waiting for help from the State. -This limitation of STM revealed the importance of working hypotheses created through a sound bibliographical review and the hermeneutical work of the historian, despite the use of new methodologies. It was only through the insights gained from the bibliographical review that we anticipated a change in the role of different political actors in the development arena and recognised the significance of the self-help and market-oriented mentality in the Foundation’s development concept. 
When interpreting the STM results, we found that we could not answer these questions solely with the digital tools. Consequently, we had to conduct a close reading to address these issues, highlighting the critical role of hermeneutical work both in analysing the results of Digital Humanities tools and in the close reading exercise. +This limitation of STM revealed the importance of working hypotheses created through a sound bibliographical review and the hermeneutical work of the historian, despite the use of new methodologies. It was only through the insights gained from the bibliographical review that we anticipated a change in the role of different political actors in the development arena and recognised the significance of the self-help and market-oriented mentality in the Foundation’s development concept. When interpreting the STM results, we found that we could not answer these questions solely with the digital tools. Consequently, we had to conduct a close reading to address these issues, highlighting the critical role of hermeneutical work both in analysing the results of Digital Humanities tools and in the close reading exercise. \ No newline at end of file diff --git a/submissions/471/index.qmd b/submissions/471/index.qmd index 5e68dc0..bf717a5 100644 --- a/submissions/471/index.qmd +++ b/submissions/471/index.qmd @@ -38,12 +38,18 @@ date: 09-12-2024 bibliography: references.bib doi: 10.5281/zenodo.13908208 other-links: - - text: Presentation Slides + - text: Presentation Slides (PDF) href: https://doi.org/10.5281/zenodo.13908208 - text: Mini Muse Project Website href: https://mini-muse.github.io/project/ --- +::: {.callout-note appearance="simple" icon=false} + +For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/13908208/files/471_DigiHistCH24_AI-assistedSearch_Presentation.pdf). + +::: + ## Introduction Cultural digital archives are a goldmine of information for historians, offering access to digitized sources online. However, research highlights that these archives often struggle with usability issues [@vora_n00b_2010; @dani_digital_2015] including difficulties in accessing certain items and information. Moreover, several emerging challenges are affecting the work of historians. Among these are the limited ways to explore archives, because cultural digital archives typically rely heavily on keyword-based search methods, the lack of transparency about how search results are generated, and the absence of advanced search and filtering options to reduce the volume of search results. diff --git a/submissions/473/index.qmd b/submissions/473/index.qmd index f3e8f75..aa40099 100644 --- a/submissions/473/index.qmd +++ b/submissions/473/index.qmd @@ -19,8 +19,9 @@ keywords: - encyclopaedia abstract: | For some time now, there has been a desire to elevate film to the status of a scholarly publication, and thus to recognise it as a research product in its own right. Never before has this dream seemed so concrete: analogue film collections are extensively digitized and made reliably referenceable via permanent links (DOIs). However, there is often a gap between the digital present of collections and their (analogue) history – such as in the case of the Encyclopaedia Cinematographica (EC). The EC was a large-scale project of exceptional duration and scope (1952–90), conducted by the Institute for Scientific Film in Göttingen (IWF). 
The EC was intended as an encyclopedia of movement processes, comprising more than 3000 films from biology, ethnology and the technical sciences. In 2010, the IWF was dissolved and the collection was transferred to the Technische Informationsbibliothek Hannover (TIB), where the majority of it was digitized, and certain sections were made accessible online. The challenging layering of media, histories, technologies and institutional agendas that the EC presents as a research object demands various “literacies”, but also specifically designed research tools. The SNSF-funded research project “Visualpedia” (2022–2026) is dedicated to the question of how to appropriately activate and research such a collection entangled in the logics of archival digitization and digital library organization. In addition to a historical reappraisal of the institution and collection, various interfaces are being developed within the project to activate the collection with regard to different research questions. As part of the group presentation, we would like to provide insights into the research project and the interfaces developed. -date: 08-13-2024 +date: 09-12-2024 bibliography: references.bib +doi: 10.5281/zenodo.13907496 --- For some time now, there has been a desire to elevate film to the status of a scholarly publication, and thus recognize it as a research publication in its own right. Never before has this dream seemed so concrete: analog film collections are being extensively digitized and made reliably referenceable via permanent links. diff --git a/submissions/474/index.qmd b/submissions/474/index.qmd index 1d055ed..d83942e 100644 --- a/submissions/474/index.qmd +++ b/submissions/474/index.qmd @@ -14,8 +14,9 @@ keywords: - Digital Forensics - Digital Born Sources - Source Criticism -date: 07-26-2024 +date: 09-13-2024 bibliography: baselbib.bib +doi: 10.5281/zenodo.13907816 abstract: | In an era dominated by digital information processing and communication systems, digital literacy has emerged as a critical competence. This competency is vital at all educational levels, fostering a profound and critical understanding of how information is processed digitally. Especially crucial is the ability to discern information sources, evaluate their expertise, and recognize potential biases, fundamental for the stability of democratic societies. This imperative becomes even more pronounced for historians engaging with original digital sources and analytical tools. Without reflective consideration of the epistemic conditions and consequences of these methods, historians risk compromising the reproducibility of their findings and undermining their long-term epistemic authority. This paper contends that historians, with their specific expertise and perspectives, can significantly contribute to the establishment and dissemination of a digital literacy canon. Source criticism and hermeneutics, integral to interpreting information, transcend the medium, be it digital or otherwise. However, in the digital realm, source criticism requires verifying the completeness, authenticity, provenance, context, and environment of data. To achieve this, the historian's toolkit must integrate methods from digital forensics. 
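One elementary forensic step implied by this kind of source criticism is fixity checking: recording cryptographic checksums and basic file-system metadata for born-digital sources so that later copies can be verified for completeness and integrity. The R sketch below is a minimal illustration of that idea, with a hypothetical `sources/` folder and `manifest.csv` output; it is not a full forensic toolkit.

```r
# Minimal illustration: build a manifest of checksums and metadata for a folder
# of born-digital sources, against which later copies can be verified.
# The "sources" folder and "manifest.csv" output are hypothetical examples.
library(digest)  # digest() for SHA-256 file checksums

files <- list.files("sources", recursive = TRUE, full.names = TRUE)
info  <- file.info(files)

manifest <- data.frame(
  path      = files,
  sha256    = vapply(files, function(f) digest(f, algo = "sha256", file = TRUE), character(1)),
  bytes     = info$size,
  modified  = format(info$mtime, "%Y-%m-%d %H:%M:%S"),
  row.names = NULL
)

write.csv(manifest, "manifest.csv", row.names = FALSE)
# Re-running the same code on a copy and comparing the two manifests reveals
# missing, truncated or altered files: a basic check of completeness and authenticity.
```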
diff --git a/submissions/482/index.qmd b/submissions/482/index.qmd index 98ae1e0..f131554 100644 --- a/submissions/482/index.qmd +++ b/submissions/482/index.qmd @@ -21,10 +21,21 @@ keywords: - digital exploration abstract: | Our presentation reflects on the experience gained in an ongoing SNSF-funded research project investigating the internationalization of patent systems. In our research, we mix different methods: traditional historical methods allow us to shed light on the role of intergovernmental agreements and of private networks of patent specialists; digital analysis enables us to trace how internationalization stemmed from patent practice itself, and to study the activity of patentees that have left few historical traces. Relying on a large corpus (over 4 million documents) of digitized patents, we use text mining and computer vision techniques to explore the corpus and operationalize the concept of internationalization. In this paper, we focus on the challenges of matching (almost) identical drawings between patents of different countries, combining image embeddings (obtained by using a pretrained convolutional neural network) and feature matching (SIFT). -date: 08-27-2024 +date: 09-12-2024 +date-modified: 11-15-2024 +doi: 10.5281/zenodo.14171307 +other-links: + - text: Presentation Slides (PDF) + href: https://doi.org/10.5281/zenodo.14171307 bibliography: references.bib --- +::: {.callout-note appearance="simple" icon=false} + +For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/14171307/files/482_DigiHistCH_Internationalization_Slides.pdf). + +::: + ## Introduction Our presentation reflects on the experience gained in the ongoing SNSF-funded research project [The Internationalization of Patent Systems: From Patent Cultures to Global Intellectual Property](https://data.snf.ch/grants/grant/207571). As recent debates on price of, and access to, patented COVID-19 vaccines have recalled, intellectual property rights are of great importance on a global scale. Our research investigates how patents have become, albeit incompletely, such globally relevant rights. While this internationalization is often seen as the consequence of agreements between macro-actors such as states, this project argues that this internationalization stems equally, if not more, from the networks of actors, economic strategies, texts and images involved in patent practices. To explore these, our project relies on the digital analysis of a large corpus of digitized patent documents, using text mining and computer vision techniques. diff --git a/submissions/486/index.qmd b/submissions/486/index.qmd index c586aa6..a76529c 100644 --- a/submissions/486/index.qmd +++ b/submissions/486/index.qmd @@ -22,16 +22,17 @@ author: orcid: 0000-0002-4710-3440 affiliations: - ETH Zürich IT Services - keywords: - Machine Learning - Named Entity Linking - Named Entity Recognition - Historical Data - Natural Language Processing -abstract: Named Entity Linking (NEL) describes the recognition, disambiguation, and linking of so-called «Named Entities» (such as people, places, and organizations) in text. Machine-assisted linking of entities helps to identify historical actors in large source corpora and thus contributes significantly to digital approaches in historical research. However, applying NEL to historical data presents unique challenges due to issues ranging from poor OCR and alternate spellings to people in historical texts being under-represented in contemporary databases. 
Given that we often have only sparse specific information about an entity in its direct context, we are developing a robust, modular, and scalable workflow in which we «embed» the people by the context in which they appear. This gives us more information, enabling disambiguation even when only limited data is present and application of NEL to large text corpora. Such techniques have been used and described in works such as [@10.1007/978-3-030-29563-9_13] and [@vasilyev2022namedentitylinkingentity]. With developing this pipeline and the corresponding embedding knowledge base(s) of historical entities we want to enable the use of such methods in the Swiss GLAM landscape. -date: 09-02-2024 +abstract: Named Entity Linking (NEL) describes the recognition, disambiguation, and linking of so-called «Named Entities» (such as people, places, and organizations) in text. Machine-assisted linking of entities helps to identify historical actors in large source corpora and thus contributes significantly to digital approaches in historical research. However, applying NEL to historical data presents unique challenges due to issues ranging from poor OCR and alternate spellings to people in historical texts being under-represented in contemporary databases. Given that we often have only sparse specific information about an entity in its direct context, we are developing a robust, modular, and scalable workflow in which we «embed» the people by the context in which they appear. This gives us more information, enabling disambiguation even when only limited data is present and application of NEL to large text corpora. Such techniques have been used and described in works such as @10.1007/978-3-030-29563-9_13 and @vasilyev2022namedentitylinkingentity. By developing this pipeline and the corresponding embedding knowledge base(s) of historical entities, we want to enable the use of such methods in the Swiss GLAM landscape. +date: 09-13-2024 +date-modified: 10-13-2024 bibliography: references.bib +doi: 10.5281/zenodo.13907910 --- ## Introduction diff --git a/submissions/687/index.qmd index 5b1a89d..ef88c0f 100644 --- a/submissions/687/index.qmd +++ b/submissions/687/index.qmd @@ -6,17 +6,28 @@ author: - name: Ina Serif orcid: 0000-0003-2419-4252 email: ina.serif@unibas.ch - affiliation: University Basel, Switzerland + affiliation: University of Basel, Switzerland keywords: - teaching computer-assisted methods - digital history - digital literacy abstract: | The digitization of historical materials and the application of computational techniques significantly expand the spectrum of sources and questions for historical research. However, the practical use of computer-assisted methods often involves resolving technical problems unique to a specific project. When teaching such methods to history students, this is the major challenge: there isn't a simple set of commands that covers all the potential issues in a research project. Moreover, the goal is not to train humanities students to be computer scientists, but to equip them with the skills to tackle specific problems. I will discuss how, based on problems faced in my own research, I combine the teaching of computer-assisted methods with student projects to help the students understand the limitations of out-of-the-box solutions while letting them experience the possibilities of digital analyses. 
Through their own project, students learn how to break down research questions into separate, manageable technical tasks and identify which types of problems can and which can’t be resolved using digital history methods. -date: 09-09-2024 +date: 09-13-2024 +date-modified: 11-15-2024 +doi: 10.5281/zenodo.14171331 +other-links: + - text: Presentation Slides (PDF) + href: https://doi.org/10.5281/zenodo.14171331 bibliography: references.bib --- +::: {.callout-note appearance="simple" icon=false} + +For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/14171331/files/687_DigiHistCH24_GoDigital_Slides.pdf). + +::: + ## Introduction As historians today, we profit from an unmatched availability of historical sources online, with most of the information contained in these sources digitally accessible. This greatly facilitates the use of computer-assisted methods to support or augment historical analyses. How and when to use which methods in a research endeavor are questions that cannot easily be answered, as the application of appropriate techniques more often than not is something to be clarified or revised during a project. Therefore, we need to find a way to not only teach computer-assisted methods to history students, but also how to enable them to conceptualize a historical research project and how to solve technical problems along the way, empowering them to develop and apply different methods in a practical and inspiring way. In the following, I will discuss an approach that proposes designing semester-long courses with a thematic focus, where students progressively learn how to use computational tools through continuous engagement with a historical source. diff --git a/submissions/keynote/index.qmd b/submissions/keynote/index.qmd index 4daa321..aa8c3f6 100644 --- a/submissions/keynote/index.qmd +++ b/submissions/keynote/index.qmd @@ -12,12 +12,9 @@ keywords: - Digital literacy - Digitisation - Ethics -abstract: | - In recent years, the critical turn in digital humanities has sparked numerous discussions about digital literacy in the discipline of history. While critical work has focused on data, tools, and the skills that historians need in the current digital age, questions remain about the broader contours of digital literacy and the multiple meanings that could be attributed to it. Amidst the shift to a culture of digital abundance and a research environment that privileges what is available online, digitisation has brought old questions about heritage, power, and the production and construction of historical knowledge to the fore. This calls for an approach that expands our current methodological purview to include broader epistemological and normative considerations. - To this end, my talk will foreground the ethics and politics of digitisation as an essential component of digital historical literacy. I propose to do so in three intertwined steps. First comes historical context. Just as digital history urgently needs historicising, so too does digital literacy, not only as a product of precursors such as information and media literacy, but also in relation to notions of literacy and its ethical dimensions more generally. Second, thinking through digital literacy inevitably implies reckoning with the global dimensions of cultural heritage digitisation and its effects on historical knowledge production beyond the oft-posited Global North/South binary. 
Third, to exercise digital literacy is to acknowledge how ethics and politics suffuse digital epistemologies that fundamentally reframe historical research practices. - Ultimately, I argue that integrating these considerations in our discussions of digital literacy is crucial for a discipline still grappling to come to terms with the digital age. -date: 05-10-2024 -date-modified: 05-30-2024 +date: 09-12-2024 +date-modified: 11-15-2024 +doi: 10.5281/zenodo.13904623 other-links: - text: Homepage href: https://gerbenzaagsma.github.io/ @@ -27,4 +24,18 @@ other-links: href: https://x.com/gerbenzaagsma - text: C²DH href: https://www.c2dh.uni.lu/ + - text: Keynote Video Recording + href: https://doi.org/10.5281/zenodo.14340336 --- + +::: {.callout-note appearance="simple" icon=false} + +A recording of the Zoom live stream of this keynote is available [on Zenodo](https://doi.org/10.5281/zenodo.14340336). + +::: + +In recent years, the critical turn in digital humanities has sparked numerous discussions about digital literacy in the discipline of history. While critical work has focused on data, tools, and the skills that historians need in the current digital age, questions remain about the broader contours of digital literacy and the multiple meanings that could be attributed to it. Amidst the shift to a culture of digital abundance and a research environment that privileges what is available online, digitisation has brought old questions about heritage, power, and the production and construction of historical knowledge to the fore. This calls for an approach that expands our current methodological purview to include broader epistemological and normative considerations. + +To this end, my talk will foreground the ethics and politics of digitisation as an essential component of digital historical literacy. I propose to do so in three intertwined steps. First comes historical context. Just as digital history urgently needs historicising, so too does digital literacy, not only as a product of precursors such as information and media literacy, but also in relation to notions of literacy and its ethical dimensions more generally. Second, thinking through digital literacy inevitably implies reckoning with the global dimensions of cultural heritage digitisation and its effects on historical knowledge production beyond the oft-posited Global North/South binary. Third, to exercise digital literacy is to acknowledge how ethics and politics suffuse digital epistemologies that fundamentally reframe historical research practices. + +Ultimately, I argue that integrating these considerations in our discussions of digital literacy is crucial for a discipline still grappling to come to terms with the digital age. \ No newline at end of file diff --git a/submissions/poster/440/index.qmd index 6bcaca7..f889c9e 100644 --- a/submissions/poster/440/index.qmd +++ b/submissions/poster/440/index.qmd @@ -25,10 +25,20 @@ key-points: - The Swiss Federal Archives (SFA) has implemented Archipanion, built on top of vitrivr, to enable AI-driven, content-based multimedia retrieval, significantly improving access to its digital film collection "Filmwochenschau (1940-1975)". - The Archipanion platform supports multilingual query processing and multiple search modes, allowing users to perform granular content queries and discover connections within the Schweizer Filmwochenschau collection. 
- The integration of machine learning technologies into the SFA's archival processes exemplifies the potential to transform archival research, although it requires ongoing validation and human expertise to ensure meaningful and accurate results. -date: 07-25-2024 +date: 09-12-2024 +doi: 10.5281/zenodo.13908129 +other-links: + - text: Poster (PDF) + href: https://doi.org/10.5281/zenodo.13908129 bibliography: references.bib --- +::: {.callout-note appearance="simple" icon=false} + +A PDF version of the poster is available [on Zenodo (PDF)](https://zenodo.org/records/13908129/files/440_DigiHistCH24_MultimodalUI_SFA_Poster.pdf). + +::: + ## Introduction Access to archival records has been an integral part of the mission of the Swiss Federal Archives (SFA)[^1] since the founding of the Helvetic Republic in 1798, and represents a commitment to preserving and providing access to the administrative records of federal authorities such as the government, parliament and the administration. As custodian of Switzerland's historical documentation, the SFA plays a vital role in facilitating access to these records for researchers and the public. In response to the constant evolution of digital technologies and the emergence of artificial intelligence (AI), the SFA has adopted innovative approaches to improve the accessibility and usability of its extensive holdings. diff --git a/submissions/poster/463/index.qmd b/submissions/poster/463/index.qmd index 6737c28..6853257 100644 --- a/submissions/poster/463/index.qmd +++ b/submissions/poster/463/index.qmd @@ -26,10 +26,20 @@ key-points: - Transcriptions are crucial for historical research but largely inaccessible, leading to redundant work. - transcriptiones is a collaborative platform which revolutionizes the access to transcriptions and metadata. - transcriptiones takes transcriptions to the age of FAIR and open research data. -date: 07-24-2024 +date: 09-12-2024 +doi: 10.5281/zenodo.13908159 +other-links: + - text: Poster (PDF) + href: https://doi.org/10.5281/zenodo.13908159 bibliography: references.bib --- +::: {.callout-note appearance="simple" icon=false} + +A PDF version of the poster is available [on Zenodo (PDF)](https://zenodo.org/records/13908159/files/463_DigiHistCH24_transcriptiones_Poster.pdf). + +::: + ## Background The significance of Open Research Data (ORD) is rapidly increasing in the research landscape, promoting transparency, reproducibility, and reuse [For more information about ORD in the Swiss higher education system, see @swissuniversitiesSwissNationalOpen2021; and @swissuniversitiesSwissNationalStrategy2021]. In historical research, transcriptions are crucial research data, serving as indispensable resources for the interpretation of the past. Despite their immense value, transcriptions have often remained unpublished, difficult to find, and lacked a central platform for access. Therefore, historians frequently had to re-transcribe the same sources. *transcriptiones* addresses this problem by providing the infrastructure for sharing, editing and reusing transcriptions [@fuchsTranscriptiones]. 
diff --git a/submissions/poster/466/index.qmd b/submissions/poster/466/index.qmd index 91e7d59..4ab6044 100644 --- a/submissions/poster/466/index.qmd +++ b/submissions/poster/466/index.qmd @@ -36,7 +36,8 @@ author: email: katrin.fuchs@unibas.ch affiliations: - University of Basel -date: 08-28-2024 +date: 09-12-2024 +doi: 10.5281/zenodo.13908083 --- In the realm of historical data processing, machine learning has emerged as a game-changer, enabling the analysis of vast archives and complex finding aids on an unprecedented scale. One intriguing case study exemplifying the potential of these techniques is the digitization of the Historical Land Registry of the City of Basel (=Historisches Grundbuch Basel, HGB). diff --git a/submissions/poster/472/index.qmd b/submissions/poster/472/index.qmd index 1c63923..ef9c47f 100644 --- a/submissions/poster/472/index.qmd +++ b/submissions/poster/472/index.qmd @@ -13,7 +13,8 @@ author: email: kurzawe@sub.uni-goettingen.de affiliations: - SUB Göttingen -date: 08-28-2024 +date: 09-12-2024 +doi: 10.5281/zenodo.13908038 --- In this poster, we show how the Discuss Data research data platform is being expanded to include a "community space" for the digital humanities (DH). Discuss Data enables and promotes contextualized discussion about the quality and sustainability of research data directly on the object. diff --git a/submissions/poster/476/index.qmd b/submissions/poster/476/index.qmd index 8e5873d..2fd5f0a 100644 --- a/submissions/poster/476/index.qmd +++ b/submissions/poster/476/index.qmd @@ -16,8 +16,8 @@ keywords: - Artificial Intelligence abstract: | This study seeks to merge two realms: theoretical digital history, specifically the modeling in history, and economic history, with a focus on the history of income and wealth inequalities. The central objective is to apply theoretical research outcomes concerning models and their application in history to scrutinize a historical explanation of the evolution of economic inequalities between 1914 and 1950. Traditionally, predictive models with reproducible results were paramount for validating explanations through observed data. However, the role of models has expanded, moving beyond mere predictive functions. This paradigm shift, acknowledged by the philosophy of science in recent decades, emphasizes that models now serve broader purposes, guiding exploration and research rather than just prediction. These models are not merely tools for validating predictions; they serve to bring clarity to our thinking processes, establishing the conditions under which our intuitions prove valid. Beyond merely representing systematic relationships between predetermined facets of reality, models aspire to elucidate causal connections. When a historical model aims to provide causal explanations, the process involves identifying the "explanandum" (the aspect of reality being explained) as the dependent variable and working backward to pinpoint its hypothetical causes as independent. Using a diagrammatic approach, we formalized a qualitative model aligning with an historiographical explanation of the evolution of economic inequalities by Thomas Piketty during 1914-1950. The intent was to employ causal diagrams, translating the narrative embedded in Piketty's historiography of inequalities into a formal model. 
This endeavor sought to make explicit the implicit causal relationships within the historian's narrative: the resulting causal model serves as a symbolic synthesis of our comprehension of a specific subject, enabling the construction of a highly refined narrative synthesis from a complex topic. - -date: 03-12-2023 +date: 09-12-2024 +doi: 10.5281/zenodo.13908110 bibliography: references.bib --- diff --git a/submissions/poster/484/index.qmd b/submissions/poster/484/index.qmd index bec25ce..79d1aeb 100644 --- a/submissions/poster/484/index.qmd +++ b/submissions/poster/484/index.qmd @@ -18,9 +18,21 @@ author: email: matteo.lorenzini@unibas.ch affiliations: - University of Basel, University Library -date: 08-29-2024 +date: 09-12-2024 +doi: 10.5281/zenodo.13908139 +other-links: + - text: Poster (PDF) + href: https://doi.org/10.5281/zenodo.13908139 --- +::: {.callout-note appearance="simple" icon=false} + +A PDF version of the poster is available [on Zenodo (PDF)](https://zenodo.org/records/13908139/files/484_DigiHistCH24_SwissGoogleBooks_Poster.pdf). + +::: + +## Poster Abstract + The UB Bern, ZHB Lucerne, ZB Zurich and UB Basel are digitizing a large part of their holdings from the 18th and 19th centuries in collaboration with Google Books. This digital collection, which is accessible in full text, is intended to offer new possibilities for digital and data-driven research and teaching, e.g. in the context of text and data mining and distant reading. Due to its size (90 million pages), the collection offers many opportunities, but also presents libraries and researchers with new challenges. Google's algorithms are responsible for image processing, book composition and full-text recognition. Continuous data improvement/changes must therefore be expected when changed algorithms deliver new data versions. This helps to continuously improve quality, but represents a black box that makes it complicated to make transparent statements about the data production processes.