diff --git a/examples/1_basic_usage.ipynb b/examples/1_basic_usage.ipynb
index a238967..8a4d59e 100644
--- a/examples/1_basic_usage.ipynb
+++ b/examples/1_basic_usage.ipynb
@@ -63,14 +63,19 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "heading_collapsed": true
+   },
    "source": [
     "## Regression"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "heading_collapsed": true,
+    "hidden": true
+   },
    "source": [
     "### Problem definition"
    ]
@@ -78,7 +83,9 @@
   {
    "cell_type": "code",
    "execution_count": 3,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [],
    "source": [
     "cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'salary']\n",
@@ -89,7 +96,9 @@
   {
    "cell_type": "code",
    "execution_count": 4,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [],
    "source": [
     "data = (TabularList.from_df(df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs)\n",
@@ -100,14 +109,19 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "heading_collapsed": true,
+    "hidden": true
+   },
    "source": [
     "### Model"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "source": [
     "Creates a Gaussian process model:"
    ]
@@ -115,7 +129,9 @@
   {
    "cell_type": "code",
    "execution_count": 5,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [],
    "source": [
     "learn = tabularGP_learner(data)"
@@ -123,7 +139,9 @@
   },
   {
    "cell_type": "raw",
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "source": [
     "Trains the model:"
    ]
@@ -131,7 +149,9 @@
   {
    "cell_type": "code",
    "execution_count": 6,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [
     {
      "data": {
@@ -193,14 +213,19 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "heading_collapsed": true,
+    "hidden": true
+   },
    "source": [
     "### Uncertainty"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "source": [
     "Gaussian processes produce a mean (the usual output) and a standard deviation (modeling the uncertainty of the result).\n",
     "They are stored at indices 0 and 1, respectively, of the last dimension of the tensor output by the model:"
@@ -209,7 +234,9 @@
   {
    "cell_type": "code",
    "execution_count": 7,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -235,7 +262,9 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "heading_collapsed": true
+   },
    "source": [
     "### Problem definition"
    ]
@@ -243,7 +272,9 @@
   {
    "cell_type": "code",
    "execution_count": 8,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [],
    "source": [
     "cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']\n",
@@ -254,7 +285,9 @@
   {
    "cell_type": "code",
    "execution_count": 9,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [],
    "source": [
     "data = (TabularList.from_df(df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs)\n",
@@ -265,14 +298,18 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "heading_collapsed": true
+   },
    "source": [
     "### Model"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "source": [
     "Creates a Gaussian process model (notice that nothing is done to indicate that this is a classification problem):"
    ]
@@ -280,7 +317,9 @@
   {
    "cell_type": "code",
    "execution_count": 10,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [],
    "source": [
     "learn = tabularGP_learner(data)"
@@ -289,7 +328,9 @@
   {
    "cell_type": "code",
    "execution_count": 11,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [
     {
      "data": {
@@ -351,7 +392,9 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "heading_collapsed": true
+   },
    "source": [
     "### Uncertainty"
    ]
@@ -360,16 +403,19 @@
    "cell_type": "code",
    "execution_count": 12,
    "metadata": {
+    "hidden": true,
     "scrolled": true
    },
    "outputs": [],
    "source": [
-    "from loss_functions import gp_softmax"
+    "from tabularGP.loss_functions import gp_softmax"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "source": [
     "Classification models also have a standard deviation but, following the PyTorch convention, the output is a raw logit and not a genuine probability (hence the means might not sum to one):"
    ]
@@ -377,7 +423,9 @@
   {
    "cell_type": "code",
    "execution_count": 13,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -397,7 +445,9 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "source": [
     "The proper way to get probabilities is to apply `gp_softmax` to your raw output (as you would apply a `softmax` to a traditional classification output):"
    ]
@@ -405,7 +455,9 @@
   {
    "cell_type": "code",
    "execution_count": 16,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [
     {
      "name": "stdout",
diff --git a/examples/2_kernel_selection.ipynb b/examples/2_kernel_selection.ipynb
index 33b193b..9d0fc72 100644
--- a/examples/2_kernel_selection.ipynb
+++ b/examples/2_kernel_selection.ipynb
@@ -18,19 +18,23 @@
    "source": [
     "from fastai.tabular import *\n",
     "from tabularGP import tabularGP_learner\n",
-    "from kernel import *"
+    "from tabularGP.kernel import *"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "heading_collapsed": true
+   },
    "source": [
     "## Data"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "source": [
     "Builds a regression problem on a subset of the adult dataset:"
    ]
@@ -38,7 +42,9 @@
   {
    "cell_type": "code",
    "execution_count": 2,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [],
    "source": [
     "path = untar_data(URLs.ADULT_SAMPLE)\n",
@@ -49,7 +55,9 @@
   {
    "cell_type": "code",
    "execution_count": 3,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [],
    "source": [
     "cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']\n",
@@ -60,7 +68,9 @@
   {
    "cell_type": "code",
    "execution_count": 4,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [],
    "source": [
     "data = (TabularList.from_df(df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs)\n",
@@ -71,14 +81,18 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "heading_collapsed": true
+   },
    "source": [
     "## Tabular kernels"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "source": [
     "By default, tabularGP uses one kernel type for each continuous feature (a [Gaussian kernel](https://en.wikipedia.org/wiki/Radial_basis_function_kernel)) and one kernel type for each categorical feature (an [index kernel](https://gpytorch.readthedocs.io/en/latest/kernels.html#indexkernel)).  \n",
     "Using those kernels, we can compute the similarity between the individual coordinates of two points; these similarities are then combined with what we call a tabular kernel."
@@ -86,7 +100,9 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "source": [
     "The simplest tabular kernel is the `WeightedSumKernel`, which computes a weighted sum of the feature similarities.  \n",
     "It is equivalent to an `OR` type of relation: if two points have at least one similar feature, they are considered close in the input space (even if all the other features are very dissimilar)."
@@ -95,7 +111,9 @@
   {
    "cell_type": "code",
    "execution_count": 5,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [
     {
      "data": {
@@ -158,7 +176,9 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "source": [
     "Then there is the `WeightedProductKernel`, which computes a weighted geometric mean (weighted product) of the feature similarities.  \n",
     "It is equivalent to an `AND` type of relation: all features need to be similar for two points to be considered similar in the input space.\n",
@@ -168,7 +188,9 @@
   {
    "cell_type": "code",
    "execution_count": 6,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [
     {
      "data": {
@@ -231,7 +253,9 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "source": [
     "The default tabular kernel is a `ProductOfSumsKernel`, which models a combination of the form: $$s = \\prod_i{(\\sum_j{\\beta_j * s_j})^{\\alpha_i}}$$\n",
     "It is equivalent to a `WeightedProductKernel` put on top of a `WeightedSumKernel` kernel.\n",
@@ -241,7 +265,9 @@
   {
    "cell_type": "code",
    "execution_count": 7,
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "outputs": [
     {
      "data": {
@@ -304,7 +330,9 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "hidden": true
+   },
    "source": [
     "It is important to note that the choice of the tabular kernel can have a drastic impact on your loss and that you should probably always test all available kernels to find the one that is most suited to your particular problem.\n",
     "\n",
@@ -324,7 +352,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from loss_functions import *\n",
+    "from tabularGP.loss_functions import *\n",
     "from tabularGP import *"
    ]
   },
diff --git a/examples/3_prior_selection.ipynb b/examples/3_prior_selection.ipynb
index 4360f1b..8da6139 100644
--- a/examples/3_prior_selection.ipynb
+++ b/examples/3_prior_selection.ipynb
@@ -18,7 +18,7 @@
    "source": [
     "from fastai.tabular import *\n",
     "from tabularGP import tabularGP_learner\n",
-    "from prior import *"
+    "from tabularGP.prior import *"
    ]
   },
   {