CharlesDeGaulle-GPT

Introduction

This repository contains a version of GPT-2 fine-tuned on data extracted from Charles De Gaulle's speeches during and after World War II. The base model is a GPT-2 variant further pretrained on a large, heterogeneous French corpus (~60 GB). The fine-tuned model is available on Hugging Face at https://huggingface.co/tlemenestrel/CharlesDeGaulle-GPT.
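
A minimal sketch of loading the published checkpoint and sampling from it with the Hugging Face transformers library; the prompt and generation settings below are illustrative assumptions, not values taken from this repository:

# Minimal sketch: load the checkpoint from the Hugging Face Hub and generate text.
# The prompt and sampling settings are illustrative assumptions, not repo values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tlemenestrel/CharlesDeGaulle-GPT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Françaises, Français,"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a continuation in the style of the fine-tuning corpus.
output_ids = model.generate(
    **inputs,
    max_length=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))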

Table of Contents

  1. Data Collection
  2. Training

Data Collection

Over 85 documents containing Charles De Gaulle's speeches were manually collected and converted to plain text (.txt) using spacypdfreader with the fr_core_news_sm pipeline. The resulting text was then pre-processed before being fed to the GPT-2 model for fine-tuning.
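
A minimal sketch of the PDF-to-text step, assuming the documented spacypdfreader pdf_reader entry point and a placeholder file path (the exact import path can vary between spacypdfreader versions):

import spacy
from spacypdfreader import pdf_reader  # import path may differ by spacypdfreader version

# Load the small French spaCy pipeline used for parsing.
nlp = spacy.load("fr_core_news_sm")

# Parse one speech PDF into a spaCy Doc, then dump its raw text to a .txt file.
# The path is a placeholder, not a file from the actual corpus.
doc = pdf_reader("speeches/discours_1940.pdf", nlp)
with open("speeches/discours_1940.txt", "w", encoding="utf-8") as f:
    f.write(doc.text)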

Training

The following hyperparameters were used for training on an RTX 3090 (a configuration sketch follows the list):

  1. learning_rate: 2e-05
  2. train_batch_size: 4
  3. optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  4. num_epochs: 8
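
These values map onto the Hugging Face Trainer API roughly as sketched below; this is an assumption about how the run could be configured, not the repository's actual training script, and model and dataset preparation are elided:

# Sketch: the listed hyperparameters expressed as Hugging Face TrainingArguments.
# Not the repository's training script; model and dataset setup are elided.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="CharlesDeGaulle-GPT",  # placeholder output directory
    learning_rate=2e-05,
    per_device_train_batch_size=4,
    num_train_epochs=8,
    adam_beta1=0.9,                    # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)

# trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
# trainer.train()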
