
A web crawler project assignment utilizing Node.js and MariaDB


web-crawler

Telerik Alpha individual project assignment utilizing Node.js and Databases

Create a web crawler gathering and aggregating information from at least two different web sites. The crawler should support the following operations:

• npm run update
  o Gathers the information and stores it in a MariaDB/MySQL instance
• npm run statistics COMMAND:params
  o At least 3 commands for information aggregation
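The two operations above map naturally onto npm scripts; a minimal package.json fragment might look like the sketch below (the src/ file names are assumptions, not part of the assignment). Arguments after the script name, such as order-by-price, reach the script via process.argv (use `--` before flag-style arguments).

```json
{
  "scripts": {
    "update": "node src/update.js",
    "statistics": "node src/statistics.js"
  }
}
```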

Example:

Web crawler for mobile phones.
• Gathers information from Technopolis and Technomarket
• Statistics
  o Order by price
     npm run statistics order-by-price
  o Filter by RAM, screen size, or OS
     npm run statistics filter:ram:gt:4GB
     npm run statistics filter:screen-size:lt:5
  o Search for a specific requirement – e.g. 4G, Gorilla Glass, etc.
     npm run statistics search:4g
     npm run statistics search:gorilla

Web crawler for books (goodreads.com)
Web crawler for movies (IMDb)
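The COMMAND:params syntax used in the examples can be split on ":" and dispatched to handlers; a minimal sketch, assuming handler names and return strings that are illustrative rather than prescribed by the assignment:

```javascript
// Sketch: parse "filter:ram:gt:4GB"-style commands and dispatch to a handler.
const handlers = {
  'order-by-price': () => 'ordering by price',
  // each handler receives the remaining ":"-separated params as arguments
  filter: (field, op, value) => `filter where ${field} ${op} ${value}`,
  search: (term) => `search for ${term}`,
};

const runCommand = (raw) => {
  const [name, ...params] = raw.split(':');
  const handler = handlers[name];
  if (!handler) {
    throw new Error(`Unknown command: ${name}`);
  }
  return handler(...params);
};
```

For example, `runCommand('filter:ram:gt:4GB')` returns 'filter where ram gt 4GB', and an unknown command throws rather than failing silently.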

Technical Requirements:

• No UI required, only a CLI interface
• Parse HTML pages, DO NOT use APIs
• Use as much ES2015 as possible – async/await, promises, generators (if possible), etc.
• Zero ESLint errors/warnings – use the .eslintrc file from the demos
• Use MariaDB as data storage – with schemas, following good practices
• Use Sequelize
• Do not use loop constructs:
  o for (var i = 0; ...)
  o for (const el of ...)
  o for (const key in ...)
  o while (...)
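The ban on loop constructs pushes toward array methods instead of for/while; a small sketch of loop-free aggregation (the phone records are made-up sample data, not from the assignment):

```javascript
// Aggregating scraped records without for/while, using filter/sort/reduce.
const phones = [
  { name: 'A', price: 300, ram: 4 },
  { name: 'B', price: 150, ram: 2 },
  { name: 'C', price: 500, ram: 6 },
];

// filter:ram:gt:2 → keep phones with more than 2 GB of RAM
const withEnoughRam = phones.filter((p) => p.ram > 2);

// order-by-price → sort ascending on a copy, without mutating the original
const byPrice = [...phones].sort((a, b) => a.price - b.price);

// average price via reduce instead of a counting loop
const avgPrice = phones.reduce((sum, p) => sum + p.price, 0) / phones.length;
```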

Optional Requirements:

• Optimize the gathering of data – e.g. using an async queue where, at each moment in time, there are exactly 5 downloading queries
• Feel free to use any npm package available on the Web – e.g. jQuery for parsing the HTML
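One way to sketch the 5-at-a-time download queue, staying loop-free per the technical requirements (the task/limit API here is an assumption, not prescribed by the assignment):

```javascript
// Run an array of promise-returning tasks with at most `limit` in flight.
const runQueue = (tasks, limit) => {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker takes the next unclaimed task, then recursively pulls
  // another one when its task finishes – no for/while needed.
  const worker = () => {
    if (next >= tasks.length) return Promise.resolve();
    const index = next;
    next += 1;
    return tasks[index]().then((value) => {
      results[index] = value;
      return worker();
    });
  };

  // Start exactly `limit` workers (or fewer if there are fewer tasks).
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  return Promise.all(workers).then(() => results);
};
```

Each worker chains its next task onto the previous one, so at most `limit` downloads run at any moment; with `limit` set to 5 this matches the requirement above.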
