set up the local dev environment
create a copy of the notebook
ensure that the current notebook runs without errors
identify code smells:
  [X] dead code (code that is executed but whose result is never reused): print, display, show, df.printSchema ...
  [X] explicitly exposed implementation details
  [X] duplication
  [X] magic commands (%sql, %scala, %python)
  [X] too many comments
  etc.
convert the notebook into a python file
remove dead code
group the code, where possible, into 3 sections (global functions): extract(), transform(), load()
resolve dependencies between sections (extract(), transform(), load()) and ensure that the notebook runs successfully
Start the refactoring phase of the 3 sections (one by one)
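
A minimal sketch of what the converted file could look like once the code is grouped into the 3 sections. It is only an illustration: it uses plain Python lists and the standard csv module as a stand-in for the notebook's DataFrames, and RAW_CSV, the column names, the sink argument and main() are hypothetical names, not part of the original notebook. The point is that each section depends only on the return value of the previous one, so the sections can be refactored one by one.

```python
from __future__ import annotations

import csv
import io

# Hypothetical inline input standing in for the notebook's real source.
RAW_CSV = "id,amount\n1,10.5\n2,\n3,4.0\n"


def extract(source: str) -> list[dict]:
    """Read raw rows from the source; no cleaning happens here."""
    return list(csv.DictReader(io.StringIO(source)))


def transform(rows: list[dict]) -> list[dict]:
    """Drop rows with a missing amount and cast the remaining fields."""
    return [
        {"id": int(row["id"]), "amount": float(row["amount"])}
        for row in rows
        if row["amount"]
    ]


def load(rows: list[dict], sink: list) -> int:
    """Append rows to the sink and return how many were written."""
    sink.extend(rows)
    return len(rows)


def main() -> list[dict]:
    # The only coupling between sections is the value passed along here.
    sink: list[dict] = []
    load(transform(extract(RAW_CSV)), sink)
    return sink


if __name__ == "__main__":
    print(main())
```

Because nothing is shared through notebook-level globals, each function can be run and checked on its own before moving on to the next one.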
