This script searches `.xls`, `.xlsx`, and `.xlsm` files for specified keywords/phrases.
- Fast: Utilizes parallel processing to expedite searches. With four keywords and 1000 Excel files, it took only 180 seconds.
- Configurable: Set the minimum file size, input filename, output filename, and any number of keywords in a single config file (`search_config.txt`).
- Excel-compatible: Generates a CSV file that opens in Excel with UTF-8 characters intact.
- Comprehensive: Searches all files listed in a CSV file (`files_to_search.efu`, a file list saved from Everything).
- Efficient: Excludes files that have no matched keywords from the results.
- UTF-8 Support: Supports filenames and keywords/phrases in UTF-8, allowing for symbols like Cyrillic characters in the search configuration and the resulting CSV output.
- Windows: Designed primarily for Windows.
- Linux: Also compatible with Unix-like systems, provided the file list is in the specified format.
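
The parallel search named in the feature list can be pictured roughly as follows. This is a minimal sketch, not the script's actual code: the names `find_keywords`, `search_one`, `search_all`, and the `read_text` callback are hypothetical, the thread pool is a stand-in (the source does not say whether workers are threads or processes), and case-insensitive substring matching is an assumption. In real use, `read_text` would have to extract the cell text of a workbook, for example by way of `pandas.read_excel`.

```python
from concurrent.futures import ThreadPoolExecutor


def find_keywords(text, keywords):
    """Return {keyword: bool} using a case-insensitive substring match (assumption)."""
    lowered = text.lower()
    return {kw: kw.lower() in lowered for kw in keywords}


def search_one(path, keywords, read_text):
    """Search a single file; a read error is recorded instead of aborting the run."""
    try:
        hits = find_keywords(read_text(path), keywords)
        error = ""
    except Exception as exc:  # e.g. a file that is not a real workbook
        hits = {kw: False for kw in keywords}
        error = str(exc)
    return {"Filepath": path, "Error_msg": error, **hits, "Hits": sum(hits.values())}


def search_all(paths, keywords, read_text, workers=4):
    """Search all paths in parallel and drop files with neither hits nor errors."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the results in the same order as the input paths
        rows = list(pool.map(lambda p: search_one(p, keywords, read_text), paths))
    return [row for row in rows if row["Hits"] > 0 or row["Error_msg"]]
```

Passing the reader in as a callback keeps the sketch independent of any particular Excel library; the column names mirror the sample output CSV in this README.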
- Python with `pandas` (install dependencies via `requirements.txt`).
- Everything from Voidtools.
- Create a CSV file with filenames and their sizes, with a header like: `Filename,Size`
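
If Everything is not available (for example on Linux), a file list with that header could be generated with a few lines of Python. This is a sketch under assumptions: `write_file_list` and `EXCEL_SUFFIXES` are hypothetical names, sizes are written in bytes, and it is assumed that the script needs only the `Filename` and `Size` columns.

```python
import csv
from pathlib import Path

# Extensions the script is documented to search
EXCEL_SUFFIXES = {".xls", ".xlsx", ".xlsm"}


def write_file_list(root, out_path="files_to_search.efu"):
    """Walk `root` recursively and write a Filename,Size CSV; return the file count."""
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Filename", "Size"])
        for path in sorted(Path(root).rglob("*")):
            if path.is_file() and path.suffix.lower() in EXCEL_SUFFIXES:
                # Absolute path plus size in bytes
                writer.writerow([str(path.resolve()), path.stat().st_size])
                count += 1
    return count
```

For example, `write_file_list("/home/user/documents")` would write `files_to_search.efu` into the current directory; the path shown is only a placeholder.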
- Ensure Python and Everything are installed, then run `pip install -r requirements.txt`.
- Download the script and `search_config.txt`.
- Open `search_config.txt` and add key phrases after line 19 (after the comments).
- Use Everything to search for `.xls` files.
- Save the search results as `files_to_search.efu`.
- Place `files_to_search.efu` in the same directory as the script.
- Ensure `search_config.txt` is correctly configured.
- Open a console and navigate to the script directory.
- Run the script with `python .\search.py`.
- Wait for the script to complete.
- Open `output_results.csv` in Excel or a text editor to review the results.
Phrases in other languages are supported via UTF-8. All the necessary instructions are in `search_config.txt` itself.
reading from: files_to_search.efu, exporting to output_csv.csv
files bigger than 5000
with keywords: ['32', 'project', 'other project', 'third project', 'maybe']
removed 174 files that are smaller than 5000
using 32 workers
1/883 files (0.00% done). Time total 0.08 $R59GSHE.xls
2/883 files (0.01% done). Time total 0.11 $RG6HYYA.xls
...
882/883 files (97.10% done). Time total 91.13 Розклад+.xls
883/883 files (100.00% done). Time total 183.74 темп.xlsx
search done
saving
done
Total files with hits 708, files with errors 5
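
The size filter reported in the log above ("removed 174 files that are smaller than 5000") can be pictured as a single pass over the file list. This is a sketch, not the script's code: `filter_by_size` is a hypothetical name, the threshold is assumed to be in bytes, and keeping files exactly at the threshold is an assumption.

```python
import csv


def filter_by_size(efu_path, min_size=5000):
    """Return (kept_filenames, removed_count) for a Filename,Size file list."""
    kept, removed = [], 0
    # utf-8-sig also reads plain UTF-8 and drops a leading BOM if one is present
    with open(efu_path, newline="", encoding="utf-8-sig") as fh:
        for row in csv.DictReader(fh):
            size = int(row["Size"] or 0)  # an empty size is treated as 0
            if size >= min_size:
                kept.append(row["Filename"])
            else:
                removed += 1
    return kept, removed
```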
sep=;
Filepath;Error_msg;32;project;other project;third project;maybe;Hits
A:\\\\censored_name.xlsx;;True;True;False;False;False;2
C:\\\ncensored_name.xlsx;;True;True;False;False;False;2
A:\$RECYCLE.BIN\...\$R59GSHE.xls;;True;False;False;False;False;1
C:\Program Files\Microsoft Office\root\vfs\ProgramFilesX86\Microsoft Office\Office16\DCF\SyncFusion.XlsIO.Base.dll;**File is not a zip file**;False;False;False;False;False;0
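
To post-process the results outside Excel, the output can be read with Python's standard `csv` module. The sketch below assumes the layout of the sample above: an optional `sep=;` hint line, then a semicolon-delimited header row. `load_results` is a hypothetical helper, and the `utf-8-sig` encoding is an assumption that also handles a file written without a byte order mark.

```python
import csv


def load_results(path):
    """Read the results CSV into a list of dicts, honoring an optional 'sep=' line."""
    with open(path, newline="", encoding="utf-8-sig") as fh:
        lines = fh.read().splitlines(keepends=True)
    delimiter = ";"
    if lines and lines[0].lower().startswith("sep="):
        # The first line is a delimiter hint for Excel, not data
        delimiter = lines[0].strip()[4:] or ";"
        lines = lines[1:]
    return list(csv.DictReader(lines, delimiter=delimiter))
```

All values come back as strings, so a list of the files that could not be read is, for example, `[r["Filepath"] for r in load_results("output_results.csv") if r["Error_msg"]]`.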
Good luck with your search!