- set up the local development environment
- create a copy of the notebook
- ensure that the current notebook runs without errors
- identify code smells, including (non-exhaustive):
  - [X] dead code: code that is executed but whose result is never reused (print, display, show, df.printSchema, ...)
  - [X] explicitly exposed implementation details
  - [X] duplication
  - [X] magic commands (%sql, %scala, %python)
  - [X] too many comments
- convert the notebook into a Python file
- remove dead code
- group the code, where possible, into 3 sections (global functions): extract(), transform(), load()
- resolve the dependencies between the sections (extract(), transform(), load()) and ensure that the notebook still runs successfully
- start refactoring the 3 sections, one at a time
- run a characterisation test, with pytest-watch re-running it on every change
- repeat the following cycle:
  - identify a block of code that can be extracted into a Python module
  - write the test for the Python module
  - write the Python module
  - make the test pass (read and analyze the continuous feedback from pytest-watch)
  - use the Python function in the main code
  - commit the changes
  - refactor again: go back to the first step of the cycle
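Spotting the display-only calls and magic commands listed among the code smells can be partly automated. The sketch below assumes the notebook was exported in Databricks' source format (cells separated by `# COMMAND ----------`, magic cells written as `# MAGIC %...` comments); the function names and the set of display-only calls are illustrative assumptions, not part of any existing tool. It only lists candidates: a print that feeds a log may be worth keeping.

```python
import ast
from typing import List, Tuple

# Calls whose result is only shown to a human and never reused (illustrative list).
DISPLAY_ONLY = {"print", "display", "show", "printSchema"}


def find_display_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line, name) for statements that call a display-only function
    and discard the result: candidates for dead code."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # An expression statement whose value is a call: the return value is dropped.
        if isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            func = node.value.func
            if isinstance(func, ast.Attribute):
                name = func.attr  # e.g. df.printSchema()
            elif isinstance(func, ast.Name):
                name = func.id  # e.g. display(df)
            else:
                continue
            if name in DISPLAY_ONLY:
                hits.append((node.lineno, name))
    return sorted(hits)


def find_magic_commands(source: str) -> List[int]:
    """Return line numbers of magic commands (%sql, %scala, %python, ...),
    assuming the export writes them as '# MAGIC %...' comments."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if line.lstrip().startswith("# MAGIC %")
    ]


# A made-up exported notebook; ast.parse only reads it, so `spark` need not exist.
sample = """# Databricks notebook source
df = spark.read.table("sales")
df.printSchema()
display(df)
# COMMAND ----------
# MAGIC %sql
# MAGIC SELECT COUNT(*) FROM sales
total = df.count()
print(total)
"""

print(find_display_calls(sample))  # [(3, 'printSchema'), (4, 'display'), (9, 'print')]
print(find_magic_commands(sample))  # [6]
```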
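Grouping the code into extract(), transform() and load() largely means replacing variables shared between notebook cells with explicit arguments and return values. This is one way to resolve the dependencies between the sections. The following is a minimal sketch, assuming plain Python lists of dicts in place of Spark DataFrames so it runs without a cluster. The rows, field names and the in-memory sink are made up for illustration.

```python
"""pipeline.py: a hypothetical notebook converted to a plain Python file."""
from typing import Dict, List

Row = Dict[str, object]


def extract() -> List[Row]:
    # Placeholder for the notebook's read cells (e.g. a spark.read call).
    return [
        {"id": 1, "amount": "10.5", "country": "fr"},
        {"id": 2, "amount": "3", "country": "de"},
    ]


def transform(rows: List[Row]) -> List[Row]:
    # Placeholder for the cleaning cells: cast amounts, upper-case country codes.
    return [
        {"id": r["id"], "amount": float(r["amount"]), "country": str(r["country"]).upper()}
        for r in rows
    ]


def load(rows: List[Row], sink: List[Row]) -> None:
    # Placeholder for the write cells; here the rows are appended to an in-memory sink.
    sink.extend(rows)


def main() -> List[Row]:
    # The sections depend on each other only through the data passed between them.
    sink: List[Row] = []
    load(transform(extract()), sink)
    return sink


if __name__ == "__main__":
    print(main())
```

Because each section now takes its inputs as arguments, it can be called and tested in isolation, which the later extraction steps rely on.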
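A characterisation test records what the code does today and fails if that ever changes, whether or not today's output is "correct". Below is a golden-master sketch for pytest. The snapshot path and run_pipeline() are assumptions; run_pipeline() stands in for the converted notebook's real entry point.

```python
import json
from pathlib import Path
from typing import Any

# Where the recorded behaviour lives; commit this file alongside the tests (assumed path).
SNAPSHOT = Path("tests/snapshots/pipeline_output.json")


def run_pipeline() -> Any:
    # Hypothetical stand-in for the converted notebook's entry point; in the
    # project this would be an import such as `from pipeline import main`.
    return [{"id": 1, "amount": 10.5, "country": "FR"}]


def assert_matches_snapshot(result: Any, snapshot: Path) -> None:
    """Golden-master check: record the output on the first run, compare afterwards."""
    # Round-trip through JSON so values (e.g. tuples -> lists) compare like the stored copy.
    actual = json.loads(json.dumps(result, sort_keys=True))
    if not snapshot.exists():
        # First run: capture the current behaviour as the reference.
        snapshot.parent.mkdir(parents=True, exist_ok=True)
        snapshot.write_text(json.dumps(actual, indent=2, sort_keys=True))
    expected = json.loads(snapshot.read_text())
    assert actual == expected, "pipeline output changed during refactoring"


def test_pipeline_behaviour_is_unchanged() -> None:
    # Collected by pytest; pytest-watch re-runs it whenever a watched file changes.
    assert_matches_snapshot(run_pipeline(), SNAPSHOT)
```

With `ptw` running in the project root, every save re-runs the suite, so any change in output shows up immediately. Delete the snapshot only when a change in behaviour is intended.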
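One pass through the test-first cycle might look like the sketch below. It assumes the block found in transform() normalizes column names to snake_case. The names clean_column_names, transformations.py and test_transformations.py are hypothetical. The test is written first and fails until the module exists, and pytest-watch then shows it turning green.

```python
# --- transformations.py (hypothetical module extracted from transform()) ---
import re
from typing import List


def clean_column_names(columns: List[str]) -> List[str]:
    """Lower-case names and replace runs of non-alphanumeric characters with '_'."""
    cleaned = []
    for name in columns:
        snake = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
        cleaned.append(snake)
    return cleaned


# --- test_transformations.py (written before the module above) ---
# In the real layout this file would start with:
#   from transformations import clean_column_names


def test_clean_column_names() -> None:
    assert clean_column_names([" Order ID", "Amount (EUR)", "country-code"]) == [
        "order_id",
        "amount_eur",
        "country_code",
    ]
```

Once the test passes, the inline cell code in transform() is replaced by a call to clean_column_names(...). The characterisation test confirms the pipeline output is unchanged, and the change is committed before the next pass.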