- histogram of feature types (binary, integer, non-negative, character, string etc.)
- heatmap of raw data that fits on screen (k-means++ to select 1000 samples, CUR to select 100 dimensions)
- 1st moment statistics
- mean & median (line plot + heatmap)
- 2nd moment statistics
- correlation matrix (heatmap)
- matrix of energy distances (heatmap)
- density estimate
- 1D marginals (violin + jittered scatter plot of each dimension; if n > 1000 or d > 10, density heatmaps)
- 2D marginals (pairs plots for top ~8 dimensions; if n*d > 8000, 2D heatmaps)
- Outlier plot
- cluster analysis (IDT++)
- BIC curves
- mean line plot
- covariance matrix heatmaps
- spectral analysis
- cumulative variance (with elbows) of data matrix
- eigenvectors (pairs plot + heatmap)
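
The cumulative variance curve in the spectral analysis items can be sketched as follows. This is a minimal illustration, assuming the eigenvalues of the covariance matrix (or squared singular values of the centered data matrix) have already been computed elsewhere. The function names `cumulative_variance` and `elbow_index` are hypothetical, and the elbow rule used here (the point farthest above the chord joining the ends of the curve) is only one common heuristic, not necessarily the one intended by these notes.

```python
def cumulative_variance(eigenvalues):
    """Return the cumulative fraction of variance explained, largest first."""
    vals = sorted((float(v) for v in eigenvalues), reverse=True)
    total = sum(vals)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    curve, running = [], 0.0
    for v in vals:
        running += v
        curve.append(running / total)
    return curve


def elbow_index(curve):
    """Index of the point farthest above the chord joining the curve's ends.

    Assumption: this chord-distance heuristic stands in for whatever
    elbow criterion is actually wanted.
    """
    n = len(curve)
    if n < 3:
        return 0
    first, last = curve[0], curve[-1]
    best_i, best_gap = 0, float("-inf")
    for i, y in enumerate(curve):
        chord = first + (last - first) * i / (n - 1)
        gap = y - chord
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i


if __name__ == "__main__":
    curve = cumulative_variance([0.5, 5, 4, 0.5])
    print(curve, "elbow at component", elbow_index(curve) + 1)
```

The elbow index is zero-based, so adding one gives the number of leading components to keep under this heuristic.
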
- heatmap, sorted by child node
- 1st order stats per child + difference between children
- 2nd order stats per child + difference between children
- density estimate per child
- 1D marginals: violin plot, separated by child node
- 2D marginals: pairs plots, color coded by cluster, voronoi diagram overlaid
- outlier plot for each child node
- cluster analysis per child
- spectral analysis per child
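
The per-child first-order statistics and their difference between children can be sketched as below. This is an illustration only: it assumes the data is a list of equal-length numeric rows and that each row already carries a child-node label from the clustering step. The names `per_child_stats` and `difference` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def per_child_stats(rows, child_of):
    """Group rows by child label and compute per-dimension mean and median."""
    groups = defaultdict(list)
    for row, child in zip(rows, child_of):
        groups[child].append(row)
    stats = {}
    for child, members in groups.items():
        columns = list(zip(*members))  # transpose: one tuple per dimension
        stats[child] = {
            "mean": [mean(col) for col in columns],
            "median": [median(col) for col in columns],
        }
    return stats


def difference(stats, a, b, key="mean"):
    """Per-dimension difference of one statistic between children a and b."""
    return [x - y for x, y in zip(stats[a][key], stats[b][key])]


if __name__ == "__main__":
    rows = [[1.0, 10.0], [3.0, 14.0], [5.0, 20.0], [7.0, 20.0]]
    child_of = ["left", "left", "right", "right"]
    stats = per_child_stats(rows, child_of)
    print(stats)
    print(difference(stats, "right", "left"))
```

The resulting per-child vectors are what the line plots and heatmaps would draw, one trace or row per child plus one for the difference.
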
- raw
- linear options
- linear squash between 0 & 1
- mean subtract and standard deviation divide
- median subtract and median absolute deviation divide
- make unit norm
- nonlinear
- rank
- sigmoid squash
- use the geometric median and robust estimation in Banach spaces to obtain robust estimates of the 1st and 2nd moments
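
The linear and nonlinear scaling options above can be sketched as small functions on a single vector of numbers. This is a hedged illustration, not a definitive implementation. The function names are hypothetical, each function is assumed to act on one feature column (or one sample, for the unit-norm case), the standard deviation is assumed to be the population form, and constant inputs are mapped to zeros to avoid division by zero. The robust moment estimation item is not covered here.

```python
import math
from statistics import mean, median, pstdev


def minmax(xs):
    """Linear squash into [0, 1]."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in xs]


def zscore(xs):
    """Subtract the mean, divide by the (population) standard deviation."""
    mu, sd = mean(xs), pstdev(xs)
    return [(x - mu) / sd if sd else 0.0 for x in xs]


def robust_z(xs):
    """Subtract the median, divide by the median absolute deviation."""
    med = median(xs)
    mad = median([abs(x - med) for x in xs])
    return [(x - med) / mad if mad else 0.0 for x in xs]


def unit_norm(xs):
    """Scale the vector to Euclidean norm 1."""
    norm = math.sqrt(sum(x * x for x in xs))
    return [x / norm if norm else 0.0 for x in xs]


def rank(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def sigmoid(xs):
    """Logistic squash into (0, 1), written to avoid overflow in exp."""
    out = []
    for x in xs:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))
        else:
            e = math.exp(x)
            out.append(e / (1.0 + e))
    return out
```

For example, `robust_z([1.0, 2.0, 3.0, 4.0, 100.0])` leaves the four small values within two units of zero while the outlier stays far away, whereas `minmax` on the same input compresses the small values into a narrow band near 0.
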
- sort by category
- color code labels by category
- label points in scatter plots by symbol
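
The label handling options above can be sketched as a small helper that sorts rows by category and assigns each category a color and a scatter symbol. This is a minimal sketch under stated assumptions: the names `style_by_category` and `sort_by_category` are hypothetical, and the color names and marker codes are chosen to match common matplotlib conventions, although no plotting library is used here.

```python
from itertools import cycle

# Assumed palettes; they repeat if there are more categories than entries.
COLORS = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple"]
MARKERS = ["o", "s", "^", "D", "x"]


def style_by_category(labels):
    """Map each distinct category to a color and a marker symbol."""
    categories = sorted(set(labels))
    colors = dict(zip(categories, cycle(COLORS)))
    markers = dict(zip(categories, cycle(MARKERS)))
    return {c: {"color": colors[c], "marker": markers[c]} for c in categories}


def sort_by_category(rows, labels):
    """Reorder rows so that rows of the same category are contiguous.

    The sort is stable, so the original order is kept within a category.
    """
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    return [rows[i] for i in order], [labels[i] for i in order]


if __name__ == "__main__":
    rows = [[1], [2], [3], [4]]
    labels = ["b", "a", "b", "a"]
    print(sort_by_category(rows, labels))
    print(style_by_category(labels))
```

Sorting first makes category blocks visible in heatmaps, while the style map can be reused across line plots and scatter plots so that a category keeps the same color and symbol everywhere.
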