Skip to content

Download and categorize receipts from Kroger.com by scraping a user's purchases with puppeteer and playwright.

Notifications You must be signed in to change notification settings

callaginn/kroger-sweeper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Kroger Receipt Sweeper

Downloads itemized receipts from a Kroger account with Puppeteer and categorizes them using the Kroger API and custom data cleanup scripts.

brew install 1password-cli
yarn install

How to Use

1. Scrape Kroger Data (Two Methods)

At the moment, both methods have to be manually copied/pasted into a json file, such as src/data/receipts.json.

a. Scrape with yarn sweep

This experimental script logs into Kroger and scrapes all the receipt data automatically.

b. Scrape with console scripts

If you run into issues with the automated scripts above, you can revert to a more manual process to collect the same data. This process includes logging in, navigating to the "/mypurchases" page, opening up the Chrome Console, and pasting in code from the "scrape*.md" files.

Since this process involves you controlling a regular version of Chrome, it's the least likely to be flagged as a bot. And unlike the automated scripts, it allows you to intervene if it runs into issues. We'll use scrape2b.md as an example.

This script is able to get a list of all the purchases, but often fails after collecting a few itemized receipts. I believe this is because bot protection is being triggered, but haven't found an automated way to bypass that.

If it fails when fetching a batch of receipts, you can do the following:

  • Scroll/click around the Kroger interface a bit so that they reflag you as a human. Make sure you stay on the "/mypurchases" page though. There's some tabs at top that stay on the same page.
  • In the console, manually rerun the batch that failed and the ones afterwards. For example, if it fails while grabbing the fourth batch, you'll need to run processBatch(batches, 3) and subsequent batches.
  • Repeat the steps above if additional batches fail.

2. Cleanup with yarn cleanup

This script grabs a simplified array of products from receipts.json and exports them to products.json.

3. Get product categories with yarn lookup

This script loads products.json and uses the Kroger api to request information about all the products. These categorized products are saved to src/data/categories.json.

4. Categorize items with yarn categorize

This script matches the products from products.json to receipts.json. Easiest course of action would be looping through categories.json and assigning the category based off the upc key.

Development Info and Scripts

The src/dev/docs folder contains a list of markdown files that explains how the process works.

Run yarn dev bot to get a feel for how bot detectors like Akamai's Bot Manager (used by Kroger) detect bots and ID your device by browser.

Useful References

Bot Tests

About

Download and categorize receipts from Kroger.com by scraping a user's purchases with puppeteer and playwright.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published