
ClipQuery

[unrelated AI-generated image]

Introduction

Welcome to my CLIP Image Search tool, a personal project that harnesses OpenAI's CLIP model for image- and text-based search. The tool embeds and indexes a directory of images so that it can be searched with either text or image queries, using CLIP's vector embeddings and Spotify's ANNOY library for efficient similarity searches.

Features

  • Image Upload: Users can upload a directory of images to be indexed.
  • Text and Image Search: Supports querying by both text and images.
  • Nearest Neighbor Display: The 16 most similar images are displayed in response to a query.

Setup

To get this platform running on your local machine, follow these steps in your terminal:

  1. Clone the Repository
git clone https://github.com/EzraApple/ClipQuery.git
cd ClipQuery
  2. Install Dependencies
npm install
  3. Startup
npm run start
  4. Navigate to Frontend
VITE v5.1.5  ready in 215 ms
      ➜  Local:   http://localhost:5173/

Go to the link that appears in your terminal. It should default to port 5173.

Usage

  1. Upload Images
    • Click the Upload Directory button to upload the images you want to search through.
    • Note: This process may take a few minutes depending on how large your upload is. Each image must go through the CLIP model, and then an index is created from the embeddings.
  2. Search Uploads
    • Use the search bar to enter a text query or upload an image to search by image.
    • The results will display the 16 nearest images based on the query.

Technology Stack

Backend

  • Express.js: Manages server-side logic for directory uploads and queries, routes requests to the necessary Python scripts, and returns results to the frontend.
  • Python: Alongside Node.js, Python scripts are used for specific machine learning tasks that involve image and text processing.
    • CLIP (Contrastive Language–Image Pre-training): A model developed by OpenAI, used to generate 512-dimensional vectors from images and text. This vector representation is crucial for performing content-based searches. Python scripts handle the interaction with the CLIP model, ensuring that inputs are appropriately preprocessed and embeddings are generated accurately.
    • ANNOY (Approximate Nearest Neighbors Oh Yeah): A library for performing nearest-neighbor searches. After images and text are converted into vector embeddings by the CLIP model, these vectors are indexed with ANNOY to enable fast retrieval of the most relevant images for a query; see the sketch after this list.
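
The listing below is a minimal sketch of that embed-and-index pipeline, not the repository's actual scripts: it assumes OpenAI's clip package with the ViT-B/32 checkpoint (which produces the 512-dimensional vectors mentioned above) and Spotify's annoy package, and the directory path, function names, and tree count are purely illustrative.

# Illustrative sketch only (not the repo's code): embed a directory with CLIP,
# index the vectors with ANNOY, and query by text.
import os
import torch
import clip                      # OpenAI CLIP package (assumed installed)
from PIL import Image
from annoy import AnnoyIndex     # Spotify ANNOY package (assumed installed)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # 512-dim embeddings

def embed_directory(image_dir):
    """Run every image through CLIP and return (paths, normalized 512-dim vectors)."""
    paths, vectors = [], []
    for name in sorted(os.listdir(image_dir)):
        if not name.lower().endswith((".png", ".jpg", ".jpeg")):
            continue
        path = os.path.join(image_dir, name)
        image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
        with torch.no_grad():
            vec = model.encode_image(image).squeeze(0)
        vectors.append((vec / vec.norm()).float().cpu().numpy())  # unit-length for angular distance
        paths.append(path)
    return paths, vectors

def build_index(vectors, n_trees=10):
    """Index the embeddings with ANNOY using angular (cosine-like) distance."""
    index = AnnoyIndex(512, "angular")
    for i, vec in enumerate(vectors):
        index.add_item(i, vec)
    index.build(n_trees)
    return index

def search_text(index, paths, query, k=16):
    """Embed a text query with CLIP and return the k nearest image paths."""
    tokens = clip.tokenize([query]).to(device)
    with torch.no_grad():
        vec = model.encode_text(tokens).squeeze(0)
    vec = (vec / vec.norm()).float().cpu().numpy()
    return [paths[i] for i in index.get_nns_by_vector(vec, k)]

paths, vectors = embed_directory("uploads/")   # hypothetical upload directory
index = build_index(vectors)
print(search_text(index, paths, "a squirrel", k=16))

Searching by image works the same way, except the query is passed through model.encode_image instead of model.encode_text before the ANNOY lookup.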

Frontend

  • React: Provides the user interface, enabling users to upload images, enter text queries, and view search results dynamically. It communicates with the backend via Axios to post data and retrieve results.

Data Handling

  • Node.js & Express.js: While Python handles machine learning operations, Node.js with Express.js is used for overall server management, including request handling, file uploads, and serving query results.

CIFAR100 Demo

I uploaded the CIFAR100 training dataset (50,000 32x32 PNG images in 100 classes) and searched by class label. The following are some screenshots of the first 8 results for each query. I think it worked pretty well.


[Screenshots: squirrel query, bridge query, train query, palm tree query]
