A simple CLI tool to get everything you need from Craigslist
My interest in web scraping began in 2018, when I was desperate to buy a Modcan Dual Delay for my Eurorack collection and stumbled across WiggleHunt. Since then, I’ve found the idea of gathering used items for sale from many different websites into one place genius. I built CL Search to bring that idea to Craigslist.
- Webdriver agnostic: supports Chrome, Chromium, Edge, Firefox, & Safari webdrivers
- Supports all Craigslist locations + categories
- Supports a variety of formats to export data
- Supports headless mode in all browsers
- Full SQLite 3 support
- Download images
The CL Search CLI is available as cl.
Here is an example of how you might search for iPhones in Austin, Texas using a headless browser and export the results to a SQL database.
The resulting database will be in your current working directory.
cl -s iphone -L austin --headless -o sql
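Once the command above has finished, you may want to peek at what ended up in the database. The following is a minimal sketch using Python's built-in sqlite3 module. The file name example.db is only a placeholder (this README does not state what CL Search names the file), and list_tables is a hypothetical helper, not part of CL Search.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    # sqlite3.connect() would silently create a missing file, so check first.
    if not Path(db_path).is_file():
        raise FileNotFoundError(f"No database found at {db_path}")
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # "example.db" is a placeholder; point this at the file cl created.
    try:
        print(list_tables("example.db"))
    except FileNotFoundError as err:
        print(err)
```

Listing the tables first avoids guessing at table or column names before you query them.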
❗ Location is a required flag
Supports URLs, City Names, States, Provinces, Countries, Continents, or Craigslist
-L or --location foo
Examples:
cl -L 'New York'
💡 Use Lower 48 to search through the contiguous US 🦅
ℹ️ Default output is CSV
Currently supporting a few different formats:
- csv
- json
- excel[1]
- sqlite 3
-o or --output foo
Examples:
Simply type in the name of the format
cl -L foo -o json
or just use the extension for ease of use!
cl -L bar -o xlsx
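To illustrate how a format name and a file extension can both resolve to the same exporter, here is a rough sketch. The alias table and the normalize_output_format function are assumptions made up for this example; only the spellings that appear in this README (csv, json, excel, xlsx, sql) are taken from the source, and the real CLI may accept more or handle them differently.

```python
# Hypothetical alias table: maps what a user might type after -o
# to one canonical format name. Not taken from the CL Search source.
FORMAT_ALIASES = {
    "csv": "csv",
    "json": "json",
    "excel": "excel",
    "xlsx": "excel",
    "sqlite": "sqlite",
    "sql": "sqlite",
}


def normalize_output_format(value="csv"):
    """Resolve a format name or extension to a canonical format name."""
    # Accept "JSON", " xlsx " and ".xlsx" alike.
    key = value.strip().lower().lstrip(".")
    try:
        return FORMAT_ALIASES[key]
    except KeyError:
        raise ValueError(f"Unsupported output format: {value!r}") from None


if __name__ == "__main__":
    for name in ("json", "xlsx", "sql"):
        print(name, "resolves to", normalize_output_format(name))
```

Keeping the aliases in one dict means a new spelling is a one-line change, and the CSV default lives in a single place.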
ℹ️ Defaults to Firefox
Supports the following browsers:
- Chrome
- Chromium
- Edge
- Firefox
- Safari
-b or --browser foo
ℹ️ No default / not required
Pass a search query, or leave it out to take every listing!
-s foo
-s or --search 'foo bar'
ℹ️ False by default
Downloads images from the listings.
-i or --image
Image defaults can be set in class_cl_item.py by subclass.
if image_url_src.strip() == "":
    image_url = "No image"
    image_path = f'{path}/images/no_image.png'
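For context, the fallback above could sit inside a small helper like the one sketched here. The function name resolve_image and the handling of non-empty URLs (keeping the URL and deriving a file name from its last path segment) are assumptions for illustration; only the empty-source branch mirrors the snippet from class_cl_item.py.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_image(image_url_src, path):
    """Return (image_url, image_path) for a listing's image source."""
    # Same fallback as the snippet above: no source means the placeholder image.
    if image_url_src.strip() == "":
        return "No image", f"{path}/images/no_image.png"

    # Assumption: name the local file after the last segment of the URL path.
    image_url = image_url_src.strip()
    filename = PurePosixPath(urlparse(image_url).path).name
    return image_url, f"{path}/images/{filename}"


if __name__ == "__main__":
    print(resolve_image("", "output"))
    print(resolve_image("https://example.com/pics/synth.jpg", "output"))
```

Changing the default image or its location then only means editing the fallback branch.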
ℹ️ Defaults to all for sale
Select the category or subcategory you wish to search in.
-C or --category 'foo bar'
All categories are listed in categories.py
You can customize these categories by appending to the end of the dict.
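As a rough idea of what appending a category might look like, here is a sketch. The shape of the dict (a display name mapped to a short code) and both sample entries are assumptions for illustration; check categories.py for the actual keys and values before copying anything.

```python
# Hypothetical stand-in for the dict in categories.py; the real
# structure and entries may differ.
categories = {
    "all for sale": "sss",
    "musical instruments": "msa",
}


def add_category(mapping, name, code):
    """Append a category, refusing to overwrite an existing one."""
    key = name.strip().lower()
    if key in mapping:
        raise KeyError(f"Category already defined: {key!r}")
    mapping[key] = code
    return mapping


if __name__ == "__main__":
    # "my custom category" and "xyz" are made-up placeholders.
    add_category(categories, "My Custom Category", "xyz")
    print(list(categories))
```

Since Python dicts preserve insertion order, a new entry added this way lands at the end, matching the advice above.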
ℹ️ False by default
Deletes old listings from SQL tables
-D or --delete
You can modify the timedelta in database.py to adjust when listings are deleted
time_to_stale = current_time - timedelta(weeks=1)
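To show how a cutoff like this can drive the cleanup, here is a minimal sketch using sqlite3. The table name listings, the column name scraped_at, and the delete_stale function are all hypothetical; the actual schema and query in database.py may look quite different.

```python
import sqlite3
from datetime import datetime, timedelta


def delete_stale(conn, table, current_time, max_age=timedelta(weeks=1)):
    """Delete rows older than max_age and return how many were removed."""
    # Table names cannot be bound as SQL parameters, so validate before use.
    if not table.isidentifier():
        raise ValueError(f"Invalid table name: {table!r}")
    time_to_stale = current_time - max_age
    # Assumes timestamps are stored as ISO 8601 text, which sorts chronologically.
    cursor = conn.execute(
        f'DELETE FROM "{table}" WHERE scraped_at < ?',
        (time_to_stale.isoformat(timespec="seconds"),),
    )
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE listings (title TEXT, scraped_at TEXT)")
    now = datetime(2024, 1, 15, 12, 0, 0)
    for title, age in (("old", timedelta(days=10)), ("new", timedelta(days=1))):
        conn.execute(
            "INSERT INTO listings VALUES (?, ?)",
            (title, (now - age).isoformat(timespec="seconds")),
        )
    print(delete_stale(conn, "listings", now))
    conn.close()
```

Passing max_age as a parameter keeps the one-week default while letting you try a shorter or longer window, for example timedelta(days=3).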
Contributions to this project are welcome.
Take advantage of pre-commit to lint and test your PRs before submission.