Home
Hey there! Who are you?
[I am a developer (I want to edit the source)](I am a developer) | [I am a user (I want to get the tool running)](I am a user) |
---|---|
Newscrawler is software developed by the CColon team as part of the "Softwareprojekt" lecture at the University of Konstanz in the summer term of 2016.
The team consisted of Jonathan Hassler (@JBH168), Franziska Schlor (@franziscl), Matt Sharinghousen (@msharing), Claudio Spener (@claudeeee) and Moritz Bock (@movabo).
Its goal is to download the HTML source of news articles from multiple sites, each given by a URL. In this context, a news-article is a collection of multiple articles (as, for example, on most index pages).
It relies heavily on Scrapy 1.1.
This program was originally written in Python 2.7 and is tested there. We chose Python 2.7 because Scrapy was only stable on that version; at the time of writing, Scrapy's Python 3 support is still in beta. The program can run on Python 2.7 or Python 3.5, but it is only tested on Python 2.7.
The main problem with Python 3 is the new string handling: byte strings (`bytes`) and text strings (`str`) are distinct types and are no longer converted into each other implicitly, so data has to be decoded or encoded explicitly.
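To illustrate the string-handling difference, here is a minimal sketch of how code can normalize values so that it behaves the same on Python 2.7 and Python 3.5. The helper names `to_text` and `to_bytes` and the UTF-8 default are assumptions made for this example; they are not taken from the Newscrawler source.

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers (not part of Newscrawler) that work on Python 2.7 and 3.5.


def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding byte strings if necessary."""
    # On Python 2, `bytes` is an alias for `str`, so plain str values are decoded too.
    if isinstance(value, bytes):
        # Undecodable bytes are replaced instead of raising UnicodeDecodeError.
        return value.decode(encoding, "replace")
    return value


def to_bytes(value, encoding="utf-8"):
    """Return value as a byte string, encoding text strings if necessary."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


# Example: a downloaded page usually arrives as bytes and is decoded once,
# then handled as text everywhere else.
raw_body = b"<html>Z\xc3\xbcrich</html>"  # assumed UTF-8 encoded input
html = to_text(raw_body)
```

Decoding once at the boundary (where data is downloaded or read) and encoding once when writing output keeps the rest of the code free of mixed string types.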
- [I am a developer](I am a developer)
- [I am a user](I am a user)
- [Database System](Database System)
- Logging
- Output
- Troubleshooting
- Use-cases
- Anti-crawling Issues
- Bottlenecks
- [Demo Crawls](Demo Crawls)
- IDE
- [RSS-Feed Decision](RSS-Feed Decision)
- [Thinking Process](Thinking Process)