\chapter{Related Work}
\label{ch:relatedwork}
% Carbon Dating The Web: Estimating the Age of Web Resources: https://arxiv.org/pdf/1304.5213.pdf
% Predicting document creation times: https://dbs.ifi.uni-heidelberg.de/files/Team/aspitz/publications/Spitz_et_al_Predicting_Document_Creation_Times.pdf
In the past decade, a number of papers have approached the problem of combining relevance and recency in ranking. Some of them deal with more specific domains than web search, such as mail search \citep{carmel2017promoting}, microblog retrieval \citep{efron2012query}, and news search \citep{dakka2012answering}. In the domain of mail search, a recency-only ranking is traditionally used. Even when time-ranked results are mixed with relevance-based results, as \citet{carmel2017promoting} report, showing duplicate results to the user is not discouraged. These two points make mail search a different problem from web search. Moreover, in news search it is assumed that the publication date of a document is available \citep{dakka2012answering}. For the majority of documents on the Web this is not the case, which makes web search a more difficult problem.
The works that directly improve recency ranking in web search are \citep{dong2010towards,dong2010time,dai2011learning,styskin2011recency}. \citet{dong2010towards} focus on breaking-news queries and build different rankers. If a query is classified as recency-sensitive, a recency-sensitive ranker is used to rank the documents matching that query. Similar to our work, they model different time slots by building language models from different sources (news content and queries) and compare them in order to classify whether a query is recency-sensitive. Furthermore, they extract recency features from documents and define the \textit{page age} as the time between the query submission time and the page publication time, which is either the page creation time or the time it was last updated. The recency features they extract from documents are content-based and link-based; in our approach, we only extract content-based evidence. Finally, they manually annotate query-URL pairs and train two separate models: a recency ranker and a relevance ranker. The key differences between their work and ours are that we do not manually annotate data but produce labels automatically, that we use a single ranking model for all queries, and that we target all types of queries.
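As an illustration, the page age of a document $d$ for a query $q$ issued at time $t_q$ can be written as
\[
\mathit{age}(d, q) = t_q - t_d,
\]
where $t_d$ is the page creation time or, if available, the time of its last update. For the query classification step, one way such a comparison between time slots can be formalised (our notation, not necessarily the exact measure used by \citet{dong2010towards}) is the Kullback--Leibler divergence between the language model $\theta_t$ of the most recent slot and a background model $\theta_B$ built from earlier slots,
\[
D_{\mathrm{KL}}(\theta_t \,\|\, \theta_B) = \sum_{w} P(w \mid \theta_t) \log \frac{P(w \mid \theta_t)}{P(w \mid \theta_B)},
\]
so that a large divergence for the query terms signals a recency-sensitive query.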
Extending this approach, \citet{dong2010time} introduce Twitter features to help with recency ranking. However, they do not use the fresh content of tweets to determine whether the query terms are recency-sensitive; instead, they employ a URL-mining approach to detect fresh URLs from tweets and learn to rank them.
Instead of having separate rankers, \citet{dai2011learning} propose a machine learning framework that optimizes recency and relevance simultaneously, focusing on improving the ranking of results for queries based on their temporal profiles. They determine the temporal profile of a query by building a time series of the content changes of the relevant documents. Next, they extract temporal features from the seasonal-trend decomposition of the different time series. We could not use this approach, as we only keep the most recent snapshot of each document in our index.
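For completeness, a seasonal-trend decomposition splits a time series $y_t$ (here, the volume of content changes in the relevant documents at time $t$) into
\[
y_t = T_t + S_t + R_t,
\]
where $T_t$ is the trend, $S_t$ the seasonal component, and $R_t$ the remainder; the temporal features are then summaries of these components. The notation is ours and only illustrates the decomposition, not the exact feature set of \citet{dai2011learning}.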
Similarly, \citet{cheng2013fresh} model the term distribution of results over time and conclude that its change is strongly correlated with the users' perception of time sensitivity. However, they focus only on improving the ranking for timely queries, i.e., queries that have no major spikes in volume over time but still favour more recently published documents. In a similar work, \citet{efron2011estimation} assume that the document publication time is known. They compute a query-specific parameter that captures recency sensitivity and is calculated from the distribution of the publication times of the top documents retrieved by a relevance-based ranker.
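To give an illustrative form of such a parameter (a sketch under our own assumptions, not the exact estimator of \citet{efron2011estimation}): if the ages of the top-$k$ documents retrieved by the relevance-based ranker are modelled with an exponential distribution, the maximum-likelihood estimate of its rate,
\[
\hat{\lambda}_q = \frac{k}{\sum_{i=1}^{k} \left( t_q - t_{d_i} \right)},
\]
is large when the top results are recent and small otherwise, and can therefore serve as a query-specific measure of recency sensitivity.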
However, when document publication time is not known, we have to look for other indicators of document recency and query recency sensitivity. For example, \citet{campos2016gte} use temporal expressions from web snippets related to the query to improve the ranking, and we do the same in this work. Other possible sources of temporal features include click logs and query logs. \citet{wang2012joint} learn the relevance and recency models separately from two different sets of features. Their divide-and-conquer learning approach is similar to that of \citet{dai2011learning}, but they avoid manual annotation and instead infer labels automatically from clickthrough data. We do not use click logs, but we do extract frequency features from our query log, similarly to \citet{metzler2009improving} and \citet{lefortier2014online}.
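As a simple illustration of such a frequency feature (our own example, not a feature prescribed by the cited works), the recent query volume can be compared against its historical average,
\[
\mathit{spike}(q) = \frac{f\bigl(q, [t_q - \delta, t_q]\bigr)}{\frac{1}{n} \sum_{j=1}^{n} f(q, I_j)},
\]
where $f(q, I)$ is the number of times $q$ was issued in the interval $I$, $\delta$ is a short recent window, and $I_1, \dots, I_n$ are earlier windows of the same length; a high ratio indicates a sudden burst of interest in the query.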
So far, we have described how related work extends existing learning-to-rank algorithms to take both relevance and recency into account. Another approach to recency ranking is an aggregated search strategy: recent results are extracted from a fresh vertical (such as news articles) and subsequently integrated into the result page. The blending of results from different verticals is called result set diversification. Examples of this approach are \citep{lefortier2014online,styskin2011recency}. In both papers, a classifier is used to score the recency sensitivity of a user query, and this score determines to what extent documents from the fresh vertical should be inserted. Even though we do not perform result set diversification, we share similarities with these papers: the query fresh intent detector of \citet{lefortier2014online} is similar to our query recency classifier in terms of the features used, and we have modelled our query ground-truth labels according to the distribution reported by \citet{styskin2011recency}.