SentencePiece trained on both queries and answers (vocabulary stored as a .vocab file)
Token embeddings (stored as a .pkl file; see the sketch below)
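A minimal sketch of these two artifacts, assuming the queries and answers have been concatenated into a plain-text corpus file (one text per line); the file names, vocab size, and embedding dimension are placeholders, not values recorded here:

```python
import pickle
import numpy as np
import sentencepiece as spm

# Train a SentencePiece model on the combined queries + answers corpus.
# "corpus.txt" and vocab_size=8000 are assumed values.
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="msmarco_sp",  # writes msmarco_sp.model and msmarco_sp.vocab
    vocab_size=8000,
)
sp = spm.SentencePieceProcessor(model_file="msmarco_sp.model")

# Token embeddings pickled for reuse; random placeholders here.
embeddings = np.random.normal(size=(sp.get_piece_size(), 300)).astype("float32")
with open("token_embeddings.pkl", "wb") as f:
    pickle.dump(embeddings, f)
```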
SentencePiece with GloVe and Word2Vec (pretrained and finetuned)
BERT (pretrained and finetuned)
Random initialization (embedding options compared in the sketch below)
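A sketch of how the GloVe/Word2Vec and random options could be wired up, assuming gensim's downloader and the SentencePiece model from above; matching each piece against a word vector by stripping the boundary marker is a heuristic, not a choice recorded in these notes:

```python
import numpy as np
import torch
import torch.nn as nn
import gensim.downloader
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="msmarco_sp.model")
vectors = gensim.downloader.load("glove-wiki-gigaword-300")  # or a word2vec model
dim = vectors.vector_size

# Start from random init (the "Random" option), then overwrite pieces that
# match a pretrained word vector (the GloVe/Word2Vec option).
weight = np.random.normal(scale=0.02, size=(sp.get_piece_size(), dim)).astype("float32")
for i in range(sp.get_piece_size()):
    word = sp.id_to_piece(i).lstrip("\u2581")  # drop the word-boundary marker
    if word in vectors:
        weight[i] = vectors[word]

# freeze=True keeps the pretrained vectors fixed; freeze=False finetunes them.
embedding = nn.Embedding.from_pretrained(torch.from_numpy(weight), freeze=False)
```

For the BERT option, `transformers.AutoModel.from_pretrained("bert-base-uncased")` ships with its own tokenizer and embeddings, so nothing needs to be glued together by hand.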
Consider taking negatives from high in the score distribution (hard negatives); see the sketch below
Take all of the Bing-relevant passages (is_selected == 1)
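One reading of "high in distribution", sketched with hypothetical names: score every non-relevant candidate (with BM25, the current retriever, or similar; the scorer is an assumption) and keep the top-scoring ones as hard negatives.

```python
def hard_negatives(scored_passages, k=4):
    """Take negatives from high in the score distribution (hard negatives).

    scored_passages: list of (passage_text, score, is_selected) tuples;
    how the scores are produced is an assumption, not fixed in the notes.
    """
    candidates = [(text, score) for text, score, sel in scored_passages if sel == 0]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in candidates[:k]]
```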
Create triples (q_i, p_ij, n_ij) of query, positive passage, and negative passage
Use the datasets mapping functions (Dataset.map) to build them; see the sketch below
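A sketch of triple creation with `Dataset.map`, assuming the ms_marco schema shown below; drawing random in-batch negatives is one simple choice, not a decision recorded here:

```python
import random
from datasets import load_dataset

dataset = load_dataset("ms_marco", "v1.1")  # the DatasetDict shown below

def to_triples(batch):
    """Expand each query into (q_i, p_ij, n_ij) rows."""
    pool = [text for ps in batch["passages"] for text in ps["passage_text"]]
    out = {"query": [], "positive": [], "negative": []}
    for query, ps in zip(batch["query"], batch["passages"]):
        for text, sel in zip(ps["passage_text"], ps["is_selected"]):
            if sel == 1:
                out["query"].append(query)
                out["positive"].append(text)
                out["negative"].append(random.choice(pool))
    return out

# Batched map may return a different number of rows than it receives,
# provided the original columns are removed.
triples = dataset["train"].map(
    to_triples,
    batched=True,
    remove_columns=dataset["train"].column_names,
)
```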
```text
DatasetDict({
    validation: Dataset({
        features: ['answers', 'passages', 'query', 'query_id', 'query_type', 'wellFormedAnswers'],
        num_rows: 10047
    })
    train: Dataset({
        features: ['answers', 'passages', 'query', 'query_id', 'query_type', 'wellFormedAnswers'],
        num_rows: 82326
    })
    test: Dataset({
        features: ['answers', 'passages', 'query', 'query_id', 'query_type', 'wellFormedAnswers'],
        num_rows: 9650
    })
})
```
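The split sizes (10047 / 82326 / 9650) match the HuggingFace ms_marco v1.1 configuration, so this dict can presumably be loaded and inspected like so (the v1.1 config name is inferred from the sizes, not stated in the notes):

```python
from datasets import load_dataset

dataset = load_dataset("ms_marco", "v1.1")
print(dataset)  # should reproduce the DatasetDict above

# Each row carries parallel per-passage lists under "passages":
example = dataset["train"][0]
print(example["query"])
print(example["passages"]["is_selected"])   # 1 = Bing-relevant passage
print(example["passages"]["passage_text"][0])
```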
Share the random state (seed) so runs are reproducible; see the sketch below
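A minimal sketch of sharing the random state; the seed value and the library set (PyTorch assumed, given the training loop) are assumptions:

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Seed every RNG in use so runs are comparable across machines."""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy
    torch.manual_seed(seed)           # PyTorch CPU
    torch.cuda.manual_seed_all(seed)  # PyTorch GPU(s)

set_seed(42)
```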
Share code from last week (training loop)