Use graphics.py to set up the necessary graphical connections to the backend. The interface needs to display query suggestions, show the ranked list of results, and allow users to submit queries.
Implement the tokenizer (HW2), remove stopwords (HW2; Zipf's law?), stem or lemmatize the non-stopwords (HW2), and create the index structure for DC (HW3).
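A minimal sketch of the preprocessing and indexing step, assuming an in-memory inverted index that maps each term to per-document term frequencies. The names `STOPWORDS`, `tokenize`, `preprocess`, and `build_index` are hypothetical, the stopword list is a placeholder for the HW2 list, and stemming/lemmatization is left out.

```python
import re
from collections import defaultdict

# Placeholder stopword list (assumption); the real list would come from HW2.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def preprocess(text):
    """Tokenize and drop stopwords (stemming is omitted in this sketch)."""
    return [t for t in tokenize(text) if t not in STOPWORDS]


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index
```

For example, `build_index({1: "The cat and the hat", 2: "A cat in a tree"})` maps "cat" to both documents and leaves "the" out of the index entirely.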
Use a query log to generate query suggestions. Suggestions are triggered when the user types a space. Identify the candidate suggestions that include the terms typed so far, then rank each suggestion using the given equation.
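A rough sketch of the candidate step, under two assumptions that are not from the assignment: a candidate is a logged query that starts with the typed terms and adds at least one more word, and ranking uses raw frequency in the log as a stand-in for the given equation. The function name `suggest` and the parameter `k` are hypothetical.

```python
from collections import Counter


def suggest(prefix, query_log, k=5):
    """Return up to k logged queries that extend the typed terms."""
    typed = prefix.lower().split()
    counts = Counter(q.lower().strip() for q in query_log)
    candidates = []
    for query, freq in counts.items():
        words = query.split()
        # A candidate must begin with the typed terms and add at least one word.
        if len(words) > len(typed) and words[:len(typed)] == typed:
            candidates.append((query, freq))
    # Stand-in ranking (assumption): most frequent first, ties alphabetical.
    candidates.sort(key=lambda c: (-c[1], c[0]))
    return [query for query, _ in candidates[:k]]
```

Swapping the sort key for the score from the given equation would keep the rest of the flow unchanged.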
Create the candidate set CR, consisting of all documents in DC that contain every term in q. If CR has fewer than 50 documents, also add the resources that contain n-1 of the n terms in q.
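One possible way to build this set, assuming the index maps each term to a collection of document IDs (a dict keyed by ID or a set both work). The name `candidate_resources` and the `min_size` parameter are hypothetical; the default of 50 comes from the note above.

```python
from itertools import combinations


def candidate_resources(query_terms, index, min_size=50):
    """Docs containing all n query terms, widened to n-1 terms if too few."""
    terms = list(dict.fromkeys(query_terms))  # drop duplicates, keep order
    if not terms:
        return set()
    postings = [set(index.get(t, ())) for t in terms]
    result = set.intersection(*postings)
    if len(result) < min_size and len(terms) > 1:
        # Add every document that matches some subset of n-1 terms.
        for subset in combinations(postings, len(terms) - 1):
            result |= set.intersection(*subset)
    return result
```

The fallback only widens the set once; whether to keep relaxing (n-2 terms and so on) is not stated in the notes, so this sketch stops at n-1.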
Compute the relevance of each resource in CR using the given equations.
For each selected result, create a snippet that includes the title and the two sentences with the highest TF-IDF cosine similarity to q.
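A sketch of snippet generation under stated assumptions: sentences are split on terminal punctuation, document frequency is counted over the sentences of the single result (the project may instead use collection-wide IDF from the index), and the IDF is smoothed so that query terms absent from the text do not cause division by zero. The names `make_snippet`, `_tokens`, and `_cosine` are hypothetical.

```python
import math
import re
from collections import Counter


def _tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def make_snippet(title, body, query, n_sentences=2):
    """Return the title and the n sentences most similar to the query."""
    parts = re.split(r"(?<=[.!?])\s+", body)
    sentences = [s.strip() for s in parts if s.strip()]
    tokenized = [_tokens(s) for s in sentences]
    # Document frequency counted over sentences (assumption of this sketch).
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(sentences)

    def vec(toks):
        # TF-IDF weights with a smoothed IDF.
        tf = Counter(toks)
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                for t, c in tf.items()}

    q_vec = vec(_tokens(query))
    scored = [(_cosine(q_vec, vec(toks)), i) for i, toks in enumerate(tokenized)]
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:n_sentences]
    # Present the chosen sentences in their original order.
    chosen = [sentences[i] for _, i in sorted(top, key=lambda s: s[1])]
    return title, chosen
```

Returning the sentences in document order keeps the snippet readable; sorting by score instead is a one-line change if the ranking should be visible.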
Run the queries in TestSet and verify that we get the expected results.
Get the expected results from the Wikipedia search module.