A tool to convert tabular data into images, in order to be used by CNNs. Inspired by the "DeepInsight" paper.


nicomignoni/tab2img


tab2img: from tabular data to images

A tool to convert tabular data into images for convolutional neural networks (CNNs), inspired by the DeepInsight paper.

Installation

```shell
pip install tab2img
```

Background

In the paper "DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture", the authors propose a method for converting tabular data into images, so that convolutional neural networks (CNNs) can be applied to non-image, structured data.

The figure illustrates the main idea: given a training dataset $X \in \mathbb{R}^{m \times n}$ with $m$ samples and $n$ features, the goal is to find a mapping $M: \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times d \times d}$, where $d = \lceil \sqrt{n} \rceil$, so that each sample becomes a $d \times d$ image.
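Because $d = \lceil \sqrt{n} \rceil$, the image side can be computed exactly with integer arithmetic, avoiding floating-point rounding in `sqrt`. This is a minimal sketch; `grid_side` is a hypothetical helper, not part of tab2img. Note that whenever $n$ is not a perfect square, $d^2 - n$ cells have no feature assigned to them (for example, a dataset with 54 features, such as the covtype data used in the Example, gives an $8 \times 8$ image with 10 spare cells).

```python
import math


def grid_side(n_features: int) -> int:
    """Side length d = ceil(sqrt(n)) of the square image holding n features."""
    # math.isqrt returns floor(sqrt(n)) exactly for integers
    r = math.isqrt(n_features)
    return r if r * r == n_features else r + 1


for n in (4, 10, 54):
    d = grid_side(n)
    # number of features, image side, and cells left without a feature
    print(n, d, d * d - n)
# 4 2 0
# 10 4 6
# 54 8 10
```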

There are many ways to choose $M$. In this implementation, the features are ordered according to the correlation vector $\rho(X,Y)$, where $Y \in \mathbb{R}^{m \times 1}$ is the target vector. Given $X$ and $Y$ as

$$ X = \begin{bmatrix} x^{(1)}_1 & \cdots & x^{(1)}_n \\ \vdots & \ddots & \vdots \\ x^{(m)}_1 & \cdots & x^{(m)}_n \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} $$

the $i$-th entry $\rho_i$ of this vector is the Pearson correlation coefficient between the $i$-th feature and the target, i.e.,

$$ \rho_i = \rho(X_i, Y), \quad X_i = \begin{bmatrix} x^{(1)}_i \\ \vdots \\ x^{(m)}_i \end{bmatrix} $$

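Since $X$ is a sample, the sample Pearson correlation coefficient is used; for two vectors $x, y \in \mathbb{R}^{m}$ (here $x = X_i$ and $y = Y$) it reads

$$ \rho(x,y) = \frac{\sum_{j=1}^{m}(x_{j}-\bar{x})(y_{j}-\bar{y})}{\sqrt{\sum_{j=1}^{m}(x_{j}-\bar{x})^{2}}\,\sqrt{\sum_{j=1}^{m}(y_{j}-\bar{y})^{2}}} $$

where $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$.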
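The correlation of every feature with the target can be computed at once with NumPy by centring each column and the target, then applying the formula column-wise. This is a minimal sketch: the helper name `feature_target_correlation` is hypothetical, and returning 0 for a constant (zero-variance) feature is an assumption, since the formula is $0/0$ there.

```python
import numpy as np


def feature_target_correlation(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """rho[i] = sample Pearson correlation between feature column X[:, i] and target y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)  # centred features, x_j - x_bar, column by column
    yc = y - y.mean()        # centred target, y_j - y_bar
    num = Xc.T @ yc          # sum over samples of the products, one value per feature
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Zero-variance features make the formula 0/0; report 0 instead (assumption)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(den > 0, num / den, 0.0)


X = np.array([[1.0, 2.0, 5.0], [2.0, 4.0, 5.0], [3.0, 1.0, 5.0], [4.0, 3.0, 5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
rho = feature_target_correlation(X, y)
print(rho)  # → [1. 0. 0.]
```

In this toy data the first feature rises exactly with the target, the second is uncorrelated with it, and the third is constant. The result can be cross-checked against `np.corrcoef` for any non-constant column.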

Next, $\rho_1, \dots, \rho_n$ are sorted from greatest to smallest, which yields the vector of feature indices

$$ J = (J_1, \dots, J_n), \quad \{J_1, \dots, J_n\} = \{1, \dots, n\}, \quad \rho(X_{J_{k-1}}, Y) \ge \rho(X_{J_k}, Y), \ k = 2,\dots,n $$
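Following the text (signed values, greatest first), the ordering $J$ is a single `argsort`. A sketch under stated assumptions: `correlation_order` is a hypothetical helper, indices are 0-based as usual in NumPy (the formulas above are 1-based), and ties keep their original column order. Ranking by magnitude instead would mean sorting `np.abs(rho)`.

```python
import numpy as np


def correlation_order(rho: np.ndarray) -> np.ndarray:
    """Return J: feature indices (0-based) ordered from greatest to smallest rho."""
    # argsort sorts ascending, so sort the negated values; "stable" keeps tied
    # features in their original column order
    return np.argsort(-np.asarray(rho, dtype=float), kind="stable")


rho = np.array([0.1, -0.7, 0.9, 0.4])
J = correlation_order(rho)
print(J)  # → [2 3 0 1]
```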

Finally, the sorted feature columns are laid out on a $d \times d$ grid, giving the output tensor $M$:

$$ M = \begin{bmatrix} X_{J_1} & X_{J_2} & X_{J_5} & \cdots \\ X_{J_3} & X_{J_4} & X_{J_7} & \cdots \\ X_{J_6} & X_{J_8} & X_{J_9} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} $$

The $k$-th ranked feature $X_{J_k}$ is placed at row and column $(r, c)_k$ of $M$. Writing $s = \lceil\sqrt{k}\rceil$ and $t = s^2 - k$, the placement is

$$ (r, c)_k = \begin{cases} (s, s) & \text{if } t = 0 \ \text{(i.e. } \sqrt{k} \in \mathbb{N}\text{)} \\ \left(s - \frac{t}{2},\ s\right) & \text{if } t > 0 \ \text{and} \ t \equiv 0 \pmod{2} \\ \left(s,\ s - \frac{t+1}{2}\right) & \text{if } t \equiv 1 \pmod{2} \end{cases} $$

so each new shell $s$ is filled by alternating between column $s$ (top to bottom) and row $s$ (left to right), ending on the diagonal.
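The placement rule can be checked by filling a $3 \times 3$ grid with ranks $k = 1, \dots, 9$ and comparing it with the layout of $M$ shown above. The sketch below uses a hypothetical helper `grid_position` that returns 1-based (row, column) pairs, with $s = \lceil\sqrt{k}\rceil$ and $t = s^2 - k$: ranks with $t = 0$ land on the diagonal, even $t$ fills column $s$, and odd $t$ fills row $s$.

```python
import math


def grid_position(k: int) -> tuple[int, int]:
    """1-based (row, column) of the feature with rank k, matching the layout of M."""
    s = math.isqrt(k - 1) + 1  # s = ceil(sqrt(k)) for k >= 1
    t = s * s - k              # ranks remaining until the diagonal rank s*s
    if t == 0:
        return (s, s)                  # perfect square: on the diagonal
    if t % 2 == 0:
        return (s - t // 2, s)         # even t: in column s, above the diagonal
    return (s, s - (t + 1) // 2)       # odd t: in row s, left of the diagonal


d = 3
grid = [[0] * d for _ in range(d)]
for k in range(1, d * d + 1):
    r, c = grid_position(k)
    grid[r - 1][c - 1] = k

for row in grid:
    print(row)
# [1, 2, 5]
# [3, 4, 7]
# [6, 8, 9]
```

Each shell $s$ contributes $s^2 - (s-1)^2 = 2s - 1$ cells, so every cell of the $d \times d$ grid receives exactly one rank.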

Example

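```python
from sklearn.datasets import fetch_covtype
from tab2img.converter import Tab2Img

# Forest cover type dataset: tabular features (data) and class labels (target)
dataset = fetch_covtype()

train = dataset.data
target = dataset.target

# Rank the features by correlation with the target and
# convert every sample into a d x d image
model = Tab2Img()
images = model.fit_transform(train, target)
```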
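To see the steps of the Background section side by side, here is an illustrative NumPy sketch of the whole transformation. It is not tab2img's implementation: the function name `to_images`, ranking by signed correlation, reporting 0 correlation for constant features, and zero-filling the $d^2 - n$ unused cells are all assumptions. In practice, use `Tab2Img` as in the example above.

```python
import math

import numpy as np


def to_images(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Illustrative re-implementation of the Background section, not tab2img's code.

    Returns an array of shape (m, d, d); cells beyond the n features stay 0 (assumption).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = X.shape
    d = math.isqrt(n - 1) + 1  # d = ceil(sqrt(n)) for n >= 1

    # Pearson correlation between every feature column and the target
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        rho = np.where(den > 0, (Xc.T @ yc) / den, 0.0)  # constant features -> 0 (assumption)

    J = np.argsort(-rho, kind="stable")  # greatest correlation first

    images = np.zeros((m, d, d))
    for k, feature in enumerate(J, start=1):  # k is the 1-based rank
        s = math.isqrt(k - 1) + 1  # s = ceil(sqrt(k))
        t = s * s - k
        if t % 2 == 0:
            r, c = s - t // 2, s  # column s; t == 0 gives the diagonal (s, s)
        else:
            r, c = s, s - (t + 1) // 2  # row s
        images[:, r - 1, c - 1] = X[:, feature]
    return images


X = np.array([[1.0, 2.0, 5.0], [2.0, 4.0, 5.0], [3.0, 1.0, 5.0], [4.0, 3.0, 5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
imgs = to_images(X, y)
print(imgs.shape)  # → (4, 2, 2)
print(imgs[0])     # first sample as a 2 x 2 image: rows [1, 2] and [5, 0]
```

With 3 features, $d = 2$: the three ranked features fill positions (1,1), (1,2) and (2,1), and the remaining cell (2,2) stays zero under the padding assumption.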
