Performance Tweaks, Iterators, and Lazy Evaluations #308

PyWoody · 2024-04-22T19:22:59Z

A new function, dejavu.logic.decoder.find_files_g, has been added as a generator-based replacement for find_files in the same file. This allows the updated Dejavu.fingerprint_directory method to use concurrent.futures.ProcessPoolExecutor together with concurrent.futures.as_completed: each file is submitted to the executor as soon as find_files_g yields it, so processing can start immediately. Once every file has been submitted, the results are iterated over with as_completed, that is, in the order in which they finish.
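As a rough illustration of the pattern described above, here is a minimal sketch. It is not the code from this PR: the body of find_files_g, the stand-in worker file_size_worker, and the process_directory helper with its executor_cls parameter are all assumptions made for the example, and the real worker would fingerprint the audio rather than read a file size.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed
from typing import Callable, Generator, List, Optional, Tuple


def find_files_g(path: str, extensions: List[str]) -> Generator[Tuple[str, str], None, None]:
    """Lazily yield (file_path, extension) pairs found under path.

    Generator[YieldType, SendType, ReturnType]: nothing is sent in or returned,
    so both trailing type parameters are None.
    """
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower().lstrip(".")
            if ext in wanted:
                yield os.path.join(dirpath, name), ext


def file_size_worker(file_path: str, limit: Optional[int] = None) -> Tuple[str, int]:
    """Hypothetical stand-in for the fingerprinting worker (limit is unused here)."""
    return file_path, os.path.getsize(file_path)


def process_directory(
    path: str,
    extensions: List[str],
    worker: Callable[..., Tuple[str, int]] = file_size_worker,
    executor_cls=ProcessPoolExecutor,  # ThreadPoolExecutor shares the same interface
    max_workers: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Submit each file as it is yielded, then collect results as they finish."""
    results = []
    with executor_cls(max_workers=max_workers) as executor:
        # submit() is called while the directory walk is still in progress,
        # so workers can start before the full file list is known.
        futures = [
            executor.submit(worker, file_path)
            for file_path, _ext in find_files_g(path, extensions)
        ]
        # as_completed yields futures in completion order, not submission order.
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:  # one failing file should not stop the rest
                print(f"worker failed: {exc}")
    return results
```

A call such as process_directory("mp3", [".mp3"], max_workers=4) would use processes by default. On platforms that start workers with spawn, that call belongs under an `if __name__ == "__main__":` guard, and the worker has to be a module-level function so that it can be pickled.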

Together, these two modifications should allow considerable speed improvements over the existing methods. If anyone knows where to find a large number of Creative Commons or other openly licensed audio files to download and test on, I would be glad to post comparison results. I tried finding some today, but every website was either broken, required creating an account, or was so heavily rate-limited as to be nearly useless.

Other minor improvements:

Adding placeholder definitions of songs and songhashes_set in Dejavu.__init__ to take advantage of the optimized instance dict that CPython builds for attributes assigned in __init__
Changing counts and songs_matches in Dejavu.align_matches to generator expressions so that they are evaluated lazily
Adding song_hash directly to self.songhashes_set instead of assigning it to a variable first, which saves a lookup per iteration
Deferring the call to Dejavu.__load_fingerprinted_audio_hashes() until after all files have been processed
Changing both channels in dejavu.logic.decoder.read to be built with list comprehensions
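To make the generator-expression item concrete, here is a small sketch of lazily chained grouping. It is an assumption-laden illustration, not the actual body of Dejavu.align_matches: the function name top_offsets and the (song_id, offset) tuple layout are hypothetical.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def top_offsets(matches: Iterable[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Return the most frequent (song_id, offset, count) per song, best first."""
    # groupby only merges adjacent equal items, so the input is sorted first.
    sorted_matches = sorted(matches)

    # Generator expression: each (song_id, offset, count) tuple is built only
    # when the next stage asks for it; no intermediate list is stored.
    counts = (
        (song_id, offset, sum(1 for _ in group))
        for (song_id, offset), group in groupby(sorted_matches)
    )

    # Second lazy stage: the highest-count offset for each song.
    best_per_song = (
        max(group, key=lambda c: c[2])
        for _song_id, group in groupby(counts, key=lambda c: c[0])
    )

    # sorted() finally drives both generators and materializes the result.
    return sorted(best_per_song, key=lambda c: c[2], reverse=True)


print(top_offsets([(1, 10), (2, 5), (1, 10), (2, 5), (1, 12), (2, 5), (2, 7)]))
# → [(2, 5, 3), (1, 10, 2)]
```

One caveat with this style is that a generator expression can be consumed only once, so it suits values like counts that feed a single downstream step. If a value is needed more than once, a list is the safer choice.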

PyWoody added 10 commits April 22, 2024 10:00

Committing .gitignore changes from vim. Will be restored to master version later

574ee5d

Adding placeholder definitions of Dejavu.songs, Dejavu.songhashes_set to take advantage of Python's special, optimized dict build for __init__

d77a865

fingerprint_directory now uses a concurrent.futures.ProcessPoolExecutor to compute hashes. Since a ProcessPoolExecutor can pass args and kwargs, Dejavu._fingerprint_worker now takes the file_name and limit args directly.

ba77e00

Minor tweak for cleanup

6ce7dbf

Adding find_files_g which is an iterator that yields the fpath, extension tuple. Also adds the ':' to unique_hash that I forgot but am too lazy to fix in the commit

3eab4f1

Making channels in read list comps

21f68ed

Adding song hashes directly to songhashes_set to prevent an unnecessary lookup

7ab1ce5

Changing counts and song_matches in align_matches to use generator expressions for lazy evaluation

19201bf

Fixing type hint for find_files_g to correctly capture send_type and return_type

6034978

Restoring .gitignore to master

2305629
